Re: Kafka Streams 0.11 consumers losing offsets for all group.ids

2018-02-14 Thread Matthias J. Sax
Sorry for the long delay. Just rediscovered this...

Hard to tell without logs. Can you still reproduce the issue? Debug logs
for broker and stream application would be helpful to dig into it.

-Matthias

On 1/2/18 6:26 AM, Adam Gurson wrote:
> Thank you for the response! The offsets.topic.replication.factor is set to
> 2 for Cluster A (the size of the cluster). It is 3 for Cluster B, but the
> number of in-sync replicas was manually increased to 4 (the cluster size)
> for the __consumer_offsets topic after the cluster was created.
> 
> In addition, these topics are written to at least once a minute, so as far
> as I can tell it is not a case of a retention interval being exceeded and
> the offsets being purged.
> 
> On Fri, Dec 22, 2017 at 4:01 PM, Matthias J. Sax 
> wrote:
> 
>> Thanks for reporting this.
>>
>> What is your `offsets.topic.replication.factor`?
>>
>>
>>
>> -Matthias
>>
>>
>>
>> On 12/19/17 8:32 AM, Adam Gurson wrote:
>>> I am running two kafka 0.11 clusters. Cluster A has two 0.11.0.0 brokers
>>> with 3 zookeepers. Cluster B has 4 0.11.0.1 brokers with 5 zookeepers.
>>>
>>> We have recently updated from running 0.8.2 clients and brokers to 0.11.
>>> In addition, we added two Kafka Streams group.ids that process data from
>>> one of the topics that all of the old code consumes from.
>>>
>>> Most of the time, scaling the streams clients up or down works as
>>> expected. The streams clients go into a rebalance and come up with all
>>> consumer offsets correct for the topic.
>>>
>>> However, I have found two cases where a severe loss of offsets is occurring:
>>>
>>> On Cluster A (min.insync.replicas=1), I do a normal "cycle" of the
>>> brokers, stopping and starting them one at a time and giving the brokers
>>> time to handshake and exchange leadership as necessary. Twice now I have
>>> done this, and both Kafka Streams consumers have rebalanced only to come
>>> up with totally messed up offsets. The offsets for one group.id were set
>>> to 5,000,000 for all partitions, and the offsets for the other group.id
>>> were set to a number just short of 7,000,000.
>>>
>>> On Cluster B (min.insync.replicas=2), I am running the exact same streams
>>> code. I have seen cases where, if I scale up or down too quickly (i.e. add
>>> or remove too many streams clients at once) before a rebalance has
>>> finished, the offsets for the group.ids are completely lost. This causes
>>> the streams consumers to reset according to "auto.offset.reset".
>>>
>>> In both cases, streams is calculating real-time metrics for data flowing
>>> through our brokers. This is a serious issue because it causes the counts
>>> to be completely wrong, either double-counting or skipping data
>>> altogether. I have scoured the web and have been unable to find anyone
>>> else having this issue with streams.
>>>
>>> I should also mention that all of our old 0.8.2 consumer code (which has
>>> been updated to the 0.11 client library) never has any problems with
>>> offsets. My guess is that this is because it still uses ZooKeeper to
>>> store its offsets.
>>>
>>> This implies to me that the __consumer_offsets topic isn't being used
>>> correctly by the streams clients.
>>>
>>> I'm at a total loss at this point and would greatly appreciate any
>>> advice. Thank you.
>>>
>>
>>
> 





Re: Kafka Streams 0.11 consumers losing offsets for all group.ids

2017-12-22 Thread Matthias J. Sax
Thanks for reporting this.

What is your `offsets.topic.replication.factor`?



-Matthias



On 12/19/17 8:32 AM, Adam Gurson wrote:
> I am running two kafka 0.11 clusters. Cluster A has two 0.11.0.0 brokers
> with 3 zookeepers. Cluster B has 4 0.11.0.1 brokers with 5 zookeepers.
> 
> We have recently updated from running 0.8.2 clients and brokers to 0.11. In
> addition, we added two Kafka Streams group.ids that process data from one
> of the topics that all of the old code consumes from.
> 
> Most of the time, scaling the streams clients up or down works as
> expected. The streams clients go into a rebalance and come up with all
> consumer offsets correct for the topic.
> 
> However, I have found two cases where a severe loss of offsets is occurring:
> 
> On Cluster A (min.insync.replicas=1), I do a normal "cycle" of the brokers,
> stopping and starting them one at a time and giving the brokers time to
> handshake and exchange leadership as necessary. Twice now I have done this,
> and both Kafka Streams consumers have rebalanced only to come up with
> totally messed up offsets. The offsets for one group.id were set to
> 5,000,000 for all partitions, and the offsets for the other group.id were
> set to a number just short of 7,000,000.
> 
> On Cluster B (min.insync.replicas=2), I am running the exact same streams
> code. I have seen cases where, if I scale up or down too quickly (i.e. add
> or remove too many streams clients at once) before a rebalance has
> finished, the offsets for the group.ids are completely lost. This causes
> the streams consumers to reset according to "auto.offset.reset".
> 
> In both cases, streams is calculating real-time metrics for data flowing
> through our brokers. This is a serious issue because it causes the counts
> to be completely wrong, either double-counting or skipping data
> altogether. I have scoured the web and have been unable to find anyone else
> having this issue with streams.
> 
> I should also mention that all of our old 0.8.2 consumer code (which has
> been updated to the 0.11 client library) never has any problems with
> offsets. My guess is that this is because it still uses ZooKeeper to store
> its offsets.
> 
> This implies to me that the __consumer_offsets topic isn't being used
> correctly by the streams clients.
> 
> I'm at a total loss at this point and would greatly appreciate any advice.
> Thank you.
> 





Kafka Streams 0.11 consumers losing offsets for all group.ids

2017-12-19 Thread Adam Gurson
I am running two kafka 0.11 clusters. Cluster A has two 0.11.0.0 brokers
with 3 zookeepers. Cluster B has 4 0.11.0.1 brokers with 5 zookeepers.

We have recently updated from running 0.8.2 clients and brokers to 0.11. In
addition, we added two Kafka Streams group.ids that process data from one of
the topics that all of the old code consumes from.

Most of the time, scaling the streams clients up or down works as
expected. The streams clients go into a rebalance and come up with all
consumer offsets correct for the topic.

However, I have found two cases where a severe loss of offsets is occurring:

On Cluster A (min.insync.replicas=1), I do a normal "cycle" of the brokers,
stopping and starting them one at a time and giving the brokers time to
handshake and exchange leadership as necessary. Twice now I have done this,
and both Kafka Streams consumers have rebalanced only to come up with
totally messed up offsets. The offsets for one group.id were set to
5,000,000 for all partitions, and the offsets for the other group.id were
set to a number just short of 7,000,000.

On Cluster B (min.insync.replicas=2), I am running the exact same streams
code. I have seen cases where, if I scale up or down too quickly (i.e. add
or remove too many streams clients at once) before a rebalance has
finished, the offsets for the group.ids are completely lost. This causes
the streams consumers to reset according to "auto.offset.reset".
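When that happens, it may help to confirm what is actually stored in
__consumer_offsets for the streams group; the group.id of a Streams
application is its application.id. A minimal sketch for reading the
committed offsets (the group.id, topic name, and bootstrap address below are
illustrative):

    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.serialization.ByteArrayDeserializer;

    public class CheckCommittedOffsets {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // illustrative
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-streams-app");        // the streams application.id
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());

            try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
                // Fetch the committed offset for every partition of the input topic.
                consumer.partitionsFor("input-topic").forEach(pi -> {
                    TopicPartition tp = new TopicPartition(pi.topic(), pi.partition());
                    OffsetAndMetadata committed = consumer.committed(tp);
                    System.out.println(tp + " -> "
                            + (committed == null ? "no committed offset" : committed.offset()));
                });
            }
        }
    }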

In both cases, streams is calculating real-time metrics for data flowing
through our brokers. This is a serious issue because it causes the counts
to be completely wrong, either double-counting or skipping data
altogether. I have scoured the web and have been unable to find anyone else
having this issue with streams.

I should also mention that all of our old 0.8.2 consumer code (which has
been updated to the 0.11 client library) never has any problems with
offsets. My guess is that this is because it still uses ZooKeeper to store
its offsets.

This implies to me that the __consumer_offsets topic isn't being used
correctly by the streams clients.
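For context, Streams does commit through the new consumer protocol: its
embedded consumers use the application.id as their group.id, so their
offsets live in __consumer_offsets like any other new-consumer group. A
minimal configuration sketch (names and addresses are illustrative):

    import java.util.Properties;
    import org.apache.kafka.streams.StreamsConfig;

    public class StreamsProps {
        static Properties streamsConfig() {
            Properties props = new Properties();
            // The application.id doubles as the consumer group.id that owns the
            // entries in __consumer_offsets for this streams application.
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");  // illustrative
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // illustrative
            // How often Streams commits consumer offsets (default 30000 ms).
            props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 30000);
            return props;
        }
    }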

I'm at a total loss at this point and would greatly appreciate any advice.
Thank you.