I am running two Kafka 0.11 clusters. Cluster A has two 0.11.0.0 brokers
with 3 ZooKeeper nodes. Cluster B has 4 0.11.0.1 brokers with 5 ZooKeeper
nodes.

We recently upgraded both clients and brokers from 0.8.2 to 0.11. In
addition, we added two Kafka Streams applications, each with its own
group.id, that consume from one of the topics that all of the old code
also consumes from.

Most of the time, scaling the streams clients up or down works as
expected. The streams clients go into a rebalance and come up with all
consumer offsets correct for the topic.

However, I have found two cases where a severe loss of offsets is occurring:

On Cluster A (min.insync.replicas=1), I do a normal "cycle" of the brokers:
I stop and start them one at a time, giving the brokers time to handshake
and exchange leadership as necessary. I have done this twice now, and both
times the two Kafka Streams consumer groups rebalanced and came back with
completely wrong offsets. The offsets for one group.id were set to
5,000,000 for all partitions, and the offsets for the other group.id were
set to a number just short of 7,000,000.
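
To pin down exactly when the jump happens, the committed offsets per
partition can be recorded before and after cycling a broker and then
compared. The following is only a minimal sketch of that comparison,
assuming the two snapshots have already been collected by some other means
(for example from the consumer group tooling). The class name, method name,
and the maxAdvance threshold are hypothetical and not part of any Kafka API.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: compares two snapshots of committed offsets
// (partition -> offset) taken before and after a broker restart.
class OffsetSnapshotDiff {

    // Returns partition -> (after - before) for every partition whose
    // committed offset moved backwards or advanced by more than maxAdvance.
    // Partitions missing from the "after" snapshot are skipped here and
    // would need to be reported separately.
    static Map<Integer, Long> suspicious(Map<Integer, Long> before,
                                         Map<Integer, Long> after,
                                         long maxAdvance) {
        Map<Integer, Long> flagged = new TreeMap<>();
        for (Map.Entry<Integer, Long> entry : before.entrySet()) {
            Long now = after.get(entry.getKey());
            if (now == null) {
                continue; // no committed offset after the restart
            }
            long delta = now - entry.getValue();
            if (delta < 0 || delta > maxAdvance) {
                flagged.put(entry.getKey(), delta);
            }
        }
        return flagged;
    }
}
```

The maxAdvance value is a judgment call: it should be larger than the
number of records a partition can plausibly receive while a broker is
being cycled.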

On Cluster B (min.insync.replicas=2), I am running the exact same streams
code. I have seen cases where if I scale up or down too quickly (i.e. add
or remove too many streams clients at once) before a rebalance has
finished, the offsets for the group.ids are completely lost. This causes
the streams consumers to reset according to "auto.offset.reset".
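
For context, here is a minimal sketch of the settings involved, using plain
java.util.Properties. The class name, application id, and broker address
are hypothetical placeholders and not our actual configuration; the
property keys are the standard Kafka ones.

```java
import java.util.Properties;

// Hypothetical helper; the values passed in are illustrative only.
class StreamsOffsetConfigSketch {

    static Properties build(String applicationId, String bootstrapServers) {
        Properties props = new Properties();
        // Kafka Streams uses application.id as the consumer group.id.
        props.setProperty("application.id", applicationId);
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Consulted only when the group has no valid committed offset,
        // which is the situation the streams clients end up in when the
        // offsets are lost.
        props.setProperty("auto.offset.reset", "earliest");
        return props;
    }
}
```

With "earliest" a lost offset leads to reprocessing (double counting), and
with "latest" it leads to skipped data, so neither value is safe for
counting once the committed offsets are gone.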

In both cases, streams is calculating real-time metrics for data flowing
through our brokers. These are serious issues because they cause the
streams applications to get the counts completely wrong, either double
counting or skipping data altogether. I have scoured the web and have been
unable to find anyone else having this issue with streams.

I should also mention that all of our old 0.8.2 consumer code (which has
been updated to the 0.11 client library) never has any problems with
offsets. My guess is that this is because it still stores its offsets in
ZooKeeper.

This implies to me that the __consumer_offsets topic isn't being utilized
by streams clients correctly.

I'm at a total loss at this point and would greatly appreciate any advice.
Thank you.
