Hi, I have a question about mirroring. I would like to create a highly available Kafka service that runs on AWS and can survive an AZ failure. Based on what I've read, I plan to create a Kafka cluster in each AZ and use MirrorMaker to replicate one cluster to the other.

I'll call the two clusters A and B, after their respective availability zones. A is the primary, which is replicated to B. Normally, all consumers consume from A and record their current offset in a persistent store that is replicated across A and B (like Dynamo). If I detect that A has failed, producers and consumers will fail over to B. That's the basic idea.
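
To make the design concrete, here is a rough sketch of the flow I have in mind. It is a toy in-memory model, not real Kafka or Dynamo code: `Cluster`, `OffsetStore`, `mirror`, `consume`, and `demo` are all hypothetical names I made up for illustration, and each cluster is reduced to a single append-only list.

```python
class Cluster:
    """Toy stand-in for one Kafka cluster: a single append-only log."""

    def __init__(self, name):
        self.name = name
        self.log = []
        self.healthy = True

    def append(self, event):
        self.log.append(event)
        return len(self.log) - 1  # offset assigned by this cluster

    def read(self, offset):
        return self.log[offset] if offset < len(self.log) else None


class OffsetStore:
    """Toy stand-in for the replicated store: consumer id -> next offset to read."""

    def __init__(self):
        self._next = {}

    def load(self, consumer_id):
        return self._next.get(consumer_id, 0)

    def save(self, consumer_id, offset):
        self._next[consumer_id] = offset


def mirror(source, target):
    """Copy every event from source to target (assumes target starts empty)."""
    for event in source.log:
        target.append(event)


def consume(consumer_id, primary, replica, store, max_events):
    """Read from A while it is healthy, otherwise from B, checkpointing each offset."""
    active = primary if primary.healthy else replica
    seen = []
    for _ in range(max_events):
        offset = store.load(consumer_id)
        event = active.read(offset)
        if event is None:
            break
        seen.append(event)
        store.save(consumer_id, offset + 1)
    return active.name, seen


def demo():
    a, b, store = Cluster("A"), Cluster("B"), OffsetStore()
    for i in range(5):
        a.append(f"event-{i}")
    mirror(a, b)
    first = consume("c1", a, b, store, max_events=3)
    a.healthy = False  # simulate losing the AZ that hosts A
    second = consume("c1", a, b, store, max_events=10)
    return first, second
```

In this toy, the consumer resumes on B exactly where it stopped on A, but only because the sketch assumes that offset N names the same event in both logs.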
Now, the question: Can I rely on the offset kept in the persistent store to refer to the same event in each cluster? Or is it possible for the two to get out of sync over time (I don't know why, failures of some kind maybe), in which case an offset from A might not be valid with respect to the replica B? If that is possible, I'm wondering what I can or should do about it to achieve a clean failover.

I realize that the replication may lag behind, so some events from A may be lost when there is a failover. That is okay.

I've been told that creating a single cluster that spans AZs and relying on the new replication functionality in 0.8 is a bad idea, as ZooKeeper isn't well behaved in that case. Hence my alternative design.

Thanks in advance.

Seth