Sorry for the long delay. Just rediscovered this... Hard to tell without logs. Can you still reproduce the issue? Debug logs for the broker and the Streams application would be helpful to dig into it.
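As a sketch of what "debug logs" could mean in practice (logger names per a standard Kafka 0.11 distribution's log4j setup; adjust paths and logger names to your deployment):

```properties
# Broker side: in config/log4j.properties, raise the group-coordinator
# and server loggers to DEBUG to trace offset-commit handling
log4j.logger.kafka.coordinator.group=DEBUG
log4j.logger.kafka.server=DEBUG

# Streams application side (if the app also uses log4j): raise the
# Streams and consumer client loggers to DEBUG
log4j.logger.org.apache.kafka.streams=DEBUG
log4j.logger.org.apache.kafka.clients.consumer=DEBUG
```

DEBUG output is verbose, so it is usually enabled only while reproducing the issue.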
-Matthias

On 1/2/18 6:26 AM, Adam Gurson wrote:
> Thank you for the response! The offsets.topic.replication.factor is set to
> 2 for Cluster A (the size of the cluster). It is 3 for Cluster B, but the
> number of in-sync replicas was manually increased to 4 (the cluster size)
> for the __consumer_offsets topic after the cluster was created.
>
> In addition, these topics are written to at least once a minute, so as far
> as I can tell it is not the case that a retention interval is being
> exceeded and the offsets purged.
>
> On Fri, Dec 22, 2017 at 4:01 PM, Matthias J. Sax <matth...@confluent.io>
> wrote:
>
>> Thanks for reporting this.
>>
>> What is your `offsets.topic.replication.factor`?
>>
>> -Matthias
>>
>> On 12/19/17 8:32 AM, Adam Gurson wrote:
>>> I am running two Kafka 0.11 clusters. Cluster A has two 0.11.0.0 brokers
>>> with 3 ZooKeepers. Cluster B has 4 0.11.0.1 brokers with 5 ZooKeepers.
>>>
>>> We recently upgraded from 0.8.2 clients and brokers to 0.11. In
>>> addition, we added two Kafka Streams group.ids that process data from
>>> one of the topics that all of the old code consumes from.
>>>
>>> Most of the time, scaling the Streams clients up or down works as
>>> expected. The Streams clients go into a rebalance and come up with all
>>> consumer offsets correct for the topic.
>>>
>>> However, I have found two cases where a severe loss of offsets occurs:
>>>
>>> On Cluster A (min.insync.replicas=1), I do a normal "cycle" of the
>>> brokers, stopping/starting them one at a time and giving the brokers
>>> time to hand off leadership as necessary. Twice now I have done this,
>>> and both Kafka Streams consumers have rebalanced only to come up with
>>> totally messed-up offsets. The offsets for one group.id were set to
>>> 5,000,000 for all partitions, and the other group.id's offsets were set
>>> to a number just short of 7,000,000.
>>>
>>> On Cluster B (min.insync.replicas=2), I am running the exact same
>>> Streams code. I have seen cases where, if I scale up or down too
>>> quickly (i.e., add or remove too many Streams clients at once) before a
>>> rebalance has finished, the offsets for the group.ids are completely
>>> lost. This causes the Streams consumers to reset according to
>>> "auto.offset.reset".
>>>
>>> In both cases, Streams is calculating real-time metrics for data
>>> flowing through our brokers. These are serious issues because they
>>> cause the counts to be completely wrong, either double-counting or
>>> skipping data altogether. I have scoured the web and have been unable
>>> to find anyone else having this issue with Streams.
>>>
>>> I should also mention that none of our old 0.8.2 consumer code (now
>>> updated to the 0.11 client library) ever has any problems with offsets.
>>> My guess is that this is because it still uses ZooKeeper to store its
>>> offsets.
>>>
>>> This implies to me that the __consumer_offsets topic isn't being used
>>> correctly by the Streams clients.
>>>
>>> I'm at a total loss at this point and would greatly appreciate any
>>> advice. Thank you.
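As a first diagnostic step, a sketch of how the committed offsets and the offsets topic could be inspected with the CLI tools shipped with Kafka 0.11 (the bootstrap/ZooKeeper addresses and the group.id `my-streams-app` are placeholders, not names from this thread):

```shell
# Show the committed offsets and lag for the Streams application's group,
# before and after a rolling broker restart, to see when they change
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --describe --group my-streams-app

# Describe the offsets topic itself to check its replication factor,
# partition leaders, and in-sync replica sets
bin/kafka-topics.sh --zookeeper localhost:2181 \
  --describe --topic __consumer_offsets
```

Comparing the ISR list of the `__consumer_offsets` partitions during the broker cycle against the moment the offsets jump could help narrow down whether an under-replicated offsets partition is involved.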