Hi Jason

If I understand correctly, when the coordinator changes, the consumer
should get a 'NotCoordinatorForGroup' exception, not 'IllegalGenerationId'.
As for a topic metadata change: do you mean something like a change in the
number of partitions?
I was testing on a fairly stable cluster, and the issue was reproduced
several times. There was no such issue once we changed the session timeout
to 3 minutes.
Does this rule out a topic metadata change?
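
To make the distinction concrete, here is a minimal Python sketch of how a
client could react to the group-related error codes in a heartbeat or offset
commit response. The numeric codes are the ones defined by the Kafka
protocol; the `Action` names and the `action_for` helper are hypothetical
and only illustrate the expected behaviour, not the actual logic of the Java
or Erlang client.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical client reactions, named for illustration only."""
    CONTINUE = "continue"
    REDISCOVER_COORDINATOR = "rediscover_coordinator"
    REJOIN_GROUP = "rejoin_group"
    FAIL = "fail"


# Error codes as defined by the Kafka protocol.
NONE = 0
GROUP_COORDINATOR_NOT_AVAILABLE = 15
NOT_COORDINATOR_FOR_GROUP = 16
ILLEGAL_GENERATION = 22
UNKNOWN_MEMBER_ID = 25
REBALANCE_IN_PROGRESS = 27


def action_for(error_code: int) -> Action:
    """Map an error code from a heartbeat/commit response to a reaction."""
    if error_code == NONE:
        return Action.CONTINUE
    if error_code in (GROUP_COORDINATOR_NOT_AVAILABLE, NOT_COORDINATOR_FOR_GROUP):
        # The coordinator moved or is unavailable: find it again, then retry.
        return Action.REDISCOVER_COORDINATOR
    if error_code in (ILLEGAL_GENERATION, UNKNOWN_MEMBER_ID, REBALANCE_IN_PROGRESS):
        # The group generation changed or the member was dropped:
        # only a new JoinGroup request gets the member back in.
        return Action.REJOIN_GROUP
    # Anything else is outside the scope of this sketch.
    return Action.FAIL
```

Under this mapping, a coordinator change would show up as code 16 and lead
to coordinator rediscovery, whereas the code 22 we observed means the
member's generation is no longer current.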

The logs are lost: I was running our Erlang client in debug mode to help
debug this issue for my colleague, who is using the new Java client.
He has very likely observed the same pattern I described above, and he is
trying to put together a minimal setup that reproduces it reliably.

I will also try to reproduce it in Erlang, and post here a (hopefully
sensible) sequence of timestamped heartbeat and commit requests and
responses.

Will ask more questions if we have new findings.

Regards
-Zaiming



On Fri, Mar 25, 2016 at 5:43 PM, Jason Gustafson <ja...@confluent.io> wrote:

> Hi Zaiming,
>
> It rules out the most likely cause of rebalance, but not the only one.
> Rebalances can also be caused by a topic metadata change or a coordinator
> change. Can you post some logs from the consumer around the time that the
> unexpected rebalance occurred?
>
> -Jason
>
> On Fri, Mar 25, 2016 at 12:09 AM, Zaiming Shi <zmst...@gmail.com> wrote:
>
> > Hi Jason
> >
> > thanks for the reply!
> >
> > Forgot to mention that we tried to test the simplest scenario, in which
> > there was only one member in the group. I think that should rule out
> > group rebalancing, right?
> >
> > On Thursday, March 24, 2016, Jason Gustafson <ja...@confluent.io> wrote:
> >
> > > Hi Zaiming,
> > >
> > > I think the problem is not that commit requests aren't considered as
> > > effective as heartbeats (they are), but that you can't rejoin the group
> > > using only commits/heartbeats. Every time the group rebalances, all
> > > members must rejoin the group by sending a JoinGroup request. Once a
> > > rebalance has begun (e.g. because a new consumer has been started), then
> > > each member must send the JoinGroup before expiration of the session
> > > timeout. If not, then they will be kicked out of the group even if they
> > > are still sending heartbeats. Does that make sense?
> > >
> > > -Jason
> > >
> > >
> > >
> > > On Wed, Mar 23, 2016 at 10:03 AM, Zaiming Shi <zmst...@gmail.com> wrote:
> > >
> > > > Hi there!
> > > >
> > > > We have noticed that when commit requests are sent intensively, we
> > > > receive IllegalGenerationId.
> > > > Here are the settings we had the problem with: session-timeout: 30 sec,
> > > > heartbeat-rate: 3 sec.
> > > > The problem was resolved by increasing the session timeout to 180 sec.
> > > >
> > > > So I suppose that, for whatever reason (either the client didn't send a
> > > > heartbeat, or the broker didn't process the heartbeats in time), the
> > > > session was considered dead by the group coordinator.
> > > >
> > > > My question is: why can't commit requests be taken as an indicator that
> > > > the member is alive, so that the session is not killed?
> > > >
> > > > Regards
> > > > -Zaiming
> > > >
> > >
> >
>
