Re: Random consumer offset rewinds/resets

2017-08-10 Thread Ismael Juma
Great to hear, thanks for the update.

Ismael

On Wed, Aug 9, 2017 at 4:14 PM, Christiane Lemke  wrote:

> Hi Ismael,
>
> thanks a lot for your answer, it was indeed exactly the issue we had!
>
> We had seen the ticket for the issue before, but the steps to reproduce
> included a consumer group restart, which we didn't do, so we thought our
> problem was different.
>
> However, since patching with the fixing commit, everything works fine.
>
> Thanks again! Christiane
>
> On 7 August 2017 at 22:07, Ismael Juma  wrote:
>
> > Hi Christiane,
> >
> > Thanks for the email. That looks like
> > https://issues.apache.org/jira/browse/KAFKA-5600
> >
> > Ismael
> >
> > On Mon, Aug 7, 2017 at 7:04 PM, Christiane Lemke <
> > christiane.le...@gmail.com
> > > wrote:
> >
> > > Hi all,
> > >
> > > we are fighting with offset rewinds of seemingly random size and
> hitting
> > > seemingly random partitions on restarting any node in our kafka
> cluster.
> > We
> > > are running out of ideas - any help or pointers to things to
> investigate
> > > are highly appreciated.
> > >
> > > Our kafka setup is dual data center with two local broker clusters (3
> > nodes
> > > each) and two aggregate broker clusters (5 nodes each), the latter
> > running
> > > mirror maker to consume messages from the local cluster.
> > >
> > > Issues seem to have appeared since we upgraded from 0.10.1.0 to 0.11,
> but
> > > not entirely sure it’s related.
> > >
> > > We first had the theory of too big a consumer offset topic (we use
> > > compaction for it) causing the issues on restart, and indeed we found
> > that
> > > cleaner threads had died after the upgrade. But restarting and cleaning
> > > this topic did not help the issue.
> > >
> > > Logs are pretty silent when it happens, before we cleaned the consumer
> > > offset topic, we got a few of these every time it happened, but no
> longer
> > > now:
> > >
> > > [2017-08-04 11:19:25,970] ERROR [Group Metadata Manager on Broker 472]:
> > > Error loading offsets from __consumer_offsets-14
> > > (kafka.coordinator.group.GroupMetadataManager)
> > > java.lang.IllegalStateException: Unexpected unload of active group
> > > tns-ticket-store-b144c9d1-425a-4b90-8310-f6e886741494 while loading
> > > partition __consumer_offsets-14
> > > at
> > > kafka.coordinator.group.GroupMetadataManager$$anonfun$
> > > loadGroupsAndOffsets$6.apply(GroupMetadataManager.scala:600)
> > > at
> > > kafka.coordinator.group.GroupMetadataManager$$anonfun$
> > > loadGroupsAndOffsets$6.apply(GroupMetadataManager.scala:595)
> > > at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
> > > at
> > > kafka.coordinator.group.GroupMetadataManager.loadGroupsAndOffsets(
> > > GroupMetadataManager.scala:595)
> > > at
> > > kafka.coordinator.group.GroupMetadataManager.kafka$coordinator$group$
> > > GroupMetadataManager$$doLoadGroupsAndOffsets$1(
> > GroupMetadataManager.scala:
> > > 455)
> > > at
> > > kafka.coordinator.group.GroupMetadataManager$$anonfun$
> > > loadGroupsForPartition$1.apply$mcV$sp(GroupMetadataManager.scala:441)
> > > at
> > > kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(
> > > KafkaScheduler.scala:110)
> > > at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:57)
> > > at
> > > java.util.concurrent.Executors$RunnableAdapter.
> call(Executors.java:511)
> > > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> > > at
> > > java.util.concurrent.ScheduledThreadPoolExecutor$
> > > ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> > > at
> > > java.util.concurrent.ScheduledThreadPoolExecutor$
> > ScheduledFutureTask.run(
> > > ScheduledThreadPoolExecutor.java:293)
> > > at
> > > java.util.concurrent.ThreadPoolExecutor.runWorker(
> > > ThreadPoolExecutor.java:1142)
> > > at
> > > java.util.concurrent.ThreadPoolExecutor$Worker.run(
> > > ThreadPoolExecutor.java:617)
> > > at java.lang.Thread.run(Thread.java:748)
> > >
> > > Does this seem familiar to anyone? Is there any suggestion as to what
> to
> > > look into closer to investigate this issue? Happy to give more details
> > > about anything that might be helpful.
> > >
> > > Thanks a lot in advance,
> > >
> > > Christiane
> > >
> >
>


Re: Random consumer offset rewinds/resets

2017-08-09 Thread Christiane Lemke
Hi Ismael,

thanks a lot for your answer, it was indeed exactly the issue we had!

We had seen the ticket for the issue before, but the steps to reproduce
included a consumer group restart, which we didn't do, so we thought our
problem was different.

However, since patching with the fixing commit, everything works fine.

Thanks again! Christiane

On 7 August 2017 at 22:07, Ismael Juma  wrote:

> Hi Christiane,
>
> Thanks for the email. That looks like
> https://issues.apache.org/jira/browse/KAFKA-5600
>
> Ismael
>


Re: Random consumer offset rewinds/resets

2017-08-07 Thread Ismael Juma
Hi Christiane,

Thanks for the email. That looks like
https://issues.apache.org/jira/browse/KAFKA-5600

Ismael

On Mon, Aug 7, 2017 at 7:04 PM, Christiane Lemke  wrote:

> Hi all,
>
> we are fighting with consumer offset rewinds of seemingly random size,
> hitting seemingly random partitions, whenever we restart any node in our
> Kafka cluster. We are running out of ideas - any help or pointers to
> things to investigate would be highly appreciated.
>
> Our Kafka setup spans two data centers, with two local broker clusters
> (3 nodes each) and two aggregate broker clusters (5 nodes each), the
> latter running MirrorMaker to consume messages from the local cluster.
>
> The issues seem to have appeared since we upgraded from 0.10.1.0 to 0.11,
> but we are not entirely sure it's related.
>
> Our first theory was that an oversized consumer offsets topic (we use
> compaction for it) was causing the issues on restart, and indeed we found
> that the cleaner threads had died after the upgrade. But restarting and
> cleaning this topic did not help.
>
> The logs are pretty silent when it happens. Before we cleaned the consumer
> offsets topic, we got a few of these every time it happened, but not any
> more:
>
> [2017-08-04 11:19:25,970] ERROR [Group Metadata Manager on Broker 472]: Error loading offsets from __consumer_offsets-14 (kafka.coordinator.group.GroupMetadataManager)
> java.lang.IllegalStateException: Unexpected unload of active group tns-ticket-store-b144c9d1-425a-4b90-8310-f6e886741494 while loading partition __consumer_offsets-14
>   at kafka.coordinator.group.GroupMetadataManager$$anonfun$loadGroupsAndOffsets$6.apply(GroupMetadataManager.scala:600)
>   at kafka.coordinator.group.GroupMetadataManager$$anonfun$loadGroupsAndOffsets$6.apply(GroupMetadataManager.scala:595)
>   at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
>   at kafka.coordinator.group.GroupMetadataManager.loadGroupsAndOffsets(GroupMetadataManager.scala:595)
>   at kafka.coordinator.group.GroupMetadataManager.kafka$coordinator$group$GroupMetadataManager$$doLoadGroupsAndOffsets$1(GroupMetadataManager.scala:455)
>   at kafka.coordinator.group.GroupMetadataManager$$anonfun$loadGroupsForPartition$1.apply$mcV$sp(GroupMetadataManager.scala:441)
>   at kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110)
>   at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:57)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:748)
>
> Does this seem familiar to anyone? Are there any suggestions as to what to
> look at more closely to investigate this issue? We are happy to give more
> details about anything that might be helpful.
>
> Thanks a lot in advance,
>
> Christiane
>
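
For anyone verifying the same symptom, one way to confirm whether committed offsets really
move backwards across a broker restart is to read them back with the plain Java consumer
before and after the restart and compare. Below is a minimal sketch; the bootstrap server,
topic name and group id are placeholders to adjust to your own setup:

    import java.util.Properties;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.PartitionInfo;
    import org.apache.kafka.common.TopicPartition;

    public class CommittedOffsetsCheck {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
            props.put("group.id", "my-group");                // placeholder: group to inspect
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.ByteArrayDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.ByteArrayDeserializer");

            try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
                // Print the committed offset for every partition of the topic.
                for (PartitionInfo pi : consumer.partitionsFor("my-topic")) { // placeholder topic
                    TopicPartition tp = new TopicPartition(pi.topic(), pi.partition());
                    OffsetAndMetadata committed = consumer.committed(tp); // null if none committed
                    System.out.printf("%s -> %s%n", tp,
                            committed == null ? "no committed offset" : committed.offset());
                }
            }
        }
    }

Running this once before and once after a broker restart makes any rewind of the committed
positions visible without waiting for consumers to reprocess old messages.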
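
On the dead log cleaner threads mentioned above: a simple way to watch for this is the
broker gauge kafka.log:type=LogCleanerManager,name=time-since-last-run-ms, which keeps
growing once the cleaner threads are no longer running. A minimal sketch of reading it over
JMX, assuming JMX is enabled on the broker (host and port below are placeholders):

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class LogCleanerCheck {
        public static void main(String[] args) throws Exception {
            // Placeholder host/port; the broker must be started with JMX enabled (e.g. JMX_PORT=9999).
            JMXServiceURL url =
                    new JMXServiceURL("service:jmx:rmi:///jndi/rmi://broker-host:9999/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbsc = connector.getMBeanServerConnection();
                ObjectName cleaner = new ObjectName(
                        "kafka.log:type=LogCleanerManager,name=time-since-last-run-ms");
                // Kafka's gauge beans expose the current reading as the "Value" attribute.
                Object value = mbsc.getAttribute(cleaner, "Value");
                System.out.println("time-since-last-run-ms = " + value);
            } finally {
                connector.close();
            }
        }
    }

A value that keeps growing instead of resetting after each cleaner run is a strong hint that
the cleaner threads have died and __consumer_offsets is no longer being compacted.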