Thank you for the response! The offsets.topic.replication.factor is set to 2 for Cluster A (the size of the cluster). It is 3 for Cluster B, but the number of in-sync replicas for the __consumer_offsets topic was manually increased to 4 (the cluster size) after the cluster was created.
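For what it's worth, the per-partition replicas and ISR for the offsets topic can be confirmed with the topics tool, e.g. (the ZooKeeper address below is a placeholder for our actual ensemble):

  bin/kafka-topics.sh --zookeeper zk1:2181 --describe --topic __consumer_offsets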
In addition, these topics are written to at least once a minute, so as far as I can tell it is not a case of a retention interval being exceeded and the offsets being purged.

On Fri, Dec 22, 2017 at 4:01 PM, Matthias J. Sax <matth...@confluent.io> wrote:
> Thanks for reporting this.
>
> What is your `offsets.topic.replication.factor`?
>
> -Matthias
>
> On 12/19/17 8:32 AM, Adam Gurson wrote:
> > I am running two Kafka 0.11 clusters. Cluster A has two 0.11.0.0
> > brokers with 3 ZooKeeper nodes. Cluster B has 4 0.11.0.1 brokers
> > with 5 ZooKeeper nodes.
> >
> > We have recently updated from running 0.8.2 clients and brokers to
> > 0.11. In addition, we added two Kafka Streams group.ids that process
> > data from one of the topics that all of the old code processes from.
> >
> > Most of the time, scaling the Streams clients up or down works as
> > expected. The Streams clients go into a rebalance and come up with
> > all consumer offsets correct for the topic.
> >
> > However, I have found two cases where a severe loss of offsets is
> > occurring:
> >
> > On Cluster A (min.insync.replicas=1), I do a normal "cycle" of the
> > brokers, stopping and restarting them one at a time and giving the
> > brokers time to hand off leadership as necessary. Twice now I have
> > done this, and both Kafka Streams consumers have rebalanced only to
> > come up with totally wrong offsets. The offsets for one group.id
> > were set to 5,000,000 for all partitions, and the offsets for the
> > other group.id were set to a number just short of 7,000,000.
> >
> > On Cluster B (min.insync.replicas=2), I am running the exact same
> > Streams code. I have seen cases where, if I scale up or down too
> > quickly (i.e., add or remove too many Streams clients at once)
> > before a rebalance has finished, the offsets for the group.ids are
> > completely lost. This causes the Streams consumers to reset
> > according to "auto.offset.reset".
> >
> > In both cases, Streams is calculating real-time metrics for data
> > flowing through our brokers. These are serious issues because they
> > cause the counting to be completely wrong, either double-counting
> > or skipping data altogether. I have scoured the web and have been
> > unable to find anyone else reporting this issue with Streams.
> >
> > I should also mention that our old 0.8.2 consumer code (which has
> > been updated to the 0.11 client library) never has any problems
> > with offsets. My guess is that this is because it still uses
> > ZooKeeper to store its offsets.
> >
> > This implies to me that the __consumer_offsets topic isn't being
> > used correctly by the Streams clients.
> >
> > I'm at a total loss at this point and would greatly appreciate any
> > advice. Thank you.
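P.S. In case it helps anyone else debugging this: the committed offsets for a group can be spot-checked against the brokers with the consumer-groups tool, e.g. (broker address and group name are placeholders):

  bin/kafka-consumer-groups.sh --bootstrap-server broker1:9092 --describe --group my-streams-app

The broker-side expiry window for committed offsets is offsets.retention.minutes (default 1440, i.e. 24 hours, in 0.11), which is the retention interval I ruled out above.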