Re: [DISCUSS] KIP-180: Add a broker metric specifying the number of consumer group rebalances in progress

2017-08-09 Thread Ismael Juma
Sounds good to me.

On Wed, Aug 9, 2017 at 11:32 PM, Colin McCabe  wrote:

> Hi Apurva,
>
> Thanks for taking another look.  Responses below:
>
> On Mon, Aug 7, 2017, at 17:47, Apurva Mehta wrote:
> > The KIP looks good to me. In your latest proposal, the change of state
> > would be captured as followed in the metrics for groups using Kafka for
> > membership management:
> >
> > PreparingRebalance -> CompletingRebalance -> Stable -> Dead?
>
> Right.  As described in
> core/src/main/scala/kafka/coordinator/group/GroupMetadata.scala
>
> >
> > If a group is just being used to store offsets, then it is always Empty?
>
> I believe so.  Groups can also be Empty if they have no more members,
> but the offsets have not yet expired.
>
> Of course, this KIP is just exposing the metrics, not changing how
> groups work in any way.
>
> Maybe we should start a vote, if this looks good to everyone?
>
> best,
> Colin
>
>
> >
> > If so, this makes sense to me.
> >
> > Thanks,
> > Apurva
> >
> > On Mon, Aug 7, 2017 at 5:09 PM, Colin McCabe  wrote:
> >
> > > How about PreparingRebalance / CompletingRebalance?
> > >
> > > cheers,
> > > Colin
> > >
> > >
> > > On Fri, Aug 4, 2017, at 09:03, Ismael Juma wrote:
> > > > I agree that we should make them consistent. I think RebalanceJoin
> and
> > > > RebalanceAssignment are reasonable names. I think they are a bit more
> > > > descriptive than `PreparingRebalance` and `CompletingRebalance`. If
> we
> > > > need
> > > > to add more states, it seems a little easier to do if the states are
> a
> > > > bit
> > > > more descriptive. I am OK with either of the 2 options as I think
> they
> > > > are
> > > > both better than the status quo.
> > > >
> > > > Ismael
> > > >
> > > > On Fri, Aug 4, 2017 at 4:52 PM, Jason Gustafson 
> > > > wrote:
> > > >
> > > > > Hey Guozhang,
> > > > >
> > > > > Usually I think such naming inconsistencies are best avoided. It
> adds
> > > > > another level of confusion for people who have to dip into the
> code,
> > > figure
> > > > > out a problem, and ultimately explain it. Since we already have the
> > > > > PreparingRebalance state, maybe we could just rename the
> AwaitingSync
> > > state
> > > > > to CompletingRebalance?
> > > > >
> > > > > -Jason
> > > > >
> > > > > On Thu, Aug 3, 2017 at 6:09 PM, Guozhang Wang 
> > > wrote:
> > > > >
> > > > > > From an ops person's view who are mostly likely watching the
> metrics
> > > > > these
> > > > > > names may not be very clear as people may not know the internals
> > > well.
> > > > > I'd
> > > > > > prefer PrepareRebalance and CompleteRebalance since they may be
> > > easier to
> > > > > > understand thought not 100 percent accurately match to internal
> > > > > > implementation.
> > > > > >
> > > > > >
> > > > > > Guozhang
> > > > > >
> > > > > >
> > > > > > On Thu, Aug 3, 2017 at 4:14 PM, Jason Gustafson <
> ja...@confluent.io>
> > > > > > wrote:
> > > > > >
> > > > > > > Hey Colin, Guozhang,
> > > > > > >
> > > > > > > I agree the current state names are not ideal for end users. I
> > > tend to
> > > > > > see
> > > > > > > the rebalance as joining the group and receiving the
> assignment.
> > > Maybe
> > > > > > the
> > > > > > > states could be named in those terms? For example:
> RebalanceJoin
> > > and
> > > > > > > RebalanceAssignment. What do you think?
> > > > > > >
> > > > > > > -Jason
> > > > > > >
> > > > > > > On Fri, Jul 28, 2017 at 11:18 AM, Guozhang Wang <
> > > wangg...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > I feel we can change `AwaitSync` to `completeRebalance` while
> > > keeping
> > > > > > the
> > > > > > > > other as is.
> > > > > > > >
> > > > > > > > cc Jason?
> > > > > > > >
> > > > > > > > Guozhang
> > > > > > > >
> > > > > > > > On Fri, Jul 28, 2017 at 10:08 AM, Colin McCabe <
> > > cmcc...@apache.org>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > >> Thanks for the explanation.  I guess maybe we should just
> keep
> > > the
> > > > > > group
> > > > > > > >> names as they are, then?
> > > > > > > >>
> > > > > > > >> best,
> > > > > > > >> Colin
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> On Wed, Jul 26, 2017, at 11:25, Guozhang Wang wrote:
> > > > > > > >> > To me `PreparingRebalance` sounds better than
> > > `StartingRebalance`
> > > > > > > since
> > > > > > > >> > only by the end of that stage we have formed a new group.
> More
> > > > > > > >> > specifically, this this the workflow from the
> coordinator's
> > > point
> > > > > of
> > > > > > > >> > view:
> > > > > > > >> >
> > > > > > > >> > 1. decided to trigger a rebalance, enter
> PreparingRebalance
> > > phase;
> > > > > > > >> >   |
> > > > > > > >> >   |   send out error code for all
> heartbeat
> > > > > reponses
> > > > > > > >> >  \|/
> > > > > > > >> >   |
> > > > > > > >> >   |   waiting for join group requests from
> 

Re: [DISCUSS] KIP-180: Add a broker metric specifying the number of consumer group rebalances in progress

2017-08-09 Thread Colin McCabe
Hi Apurva,

Thanks for taking another look.  Responses below:

On Mon, Aug 7, 2017, at 17:47, Apurva Mehta wrote:
> The KIP looks good to me. In your latest proposal, the change of state
> would be captured as followed in the metrics for groups using Kafka for
> membership management:
> 
> PreparingRebalance -> CompletingRebalance -> Stable -> Dead?

Right.  As described in
core/src/main/scala/kafka/coordinator/group/GroupMetadata.scala

> 
> If a group is just being used to store offsets, then it is always Empty?

I believe so.  Groups can also be Empty if they have no more members,
but the offsets have not yet expired.

Of course, this KIP is just exposing the metrics, not changing how
groups work in any way.

Maybe we should start a vote, if this looks good to everyone?

best,
Colin


> 
> If so, this makes sense to me.
> 
> Thanks,
> Apurva
> 
> On Mon, Aug 7, 2017 at 5:09 PM, Colin McCabe  wrote:
> 
> > How about PreparingRebalance / CompletingRebalance?
> >
> > cheers,
> > Colin
> >
> >
> > On Fri, Aug 4, 2017, at 09:03, Ismael Juma wrote:
> > > I agree that we should make them consistent. I think RebalanceJoin and
> > > RebalanceAssignment are reasonable names. I think they are a bit more
> > > descriptive than `PreparingRebalance` and `CompletingRebalance`. If we
> > > need
> > > to add more states, it seems a little easier to do if the states are a
> > > bit
> > > more descriptive. I am OK with either of the 2 options as I think they
> > > are
> > > both better than the status quo.
> > >
> > > Ismael
> > >
> > > On Fri, Aug 4, 2017 at 4:52 PM, Jason Gustafson 
> > > wrote:
> > >
> > > > Hey Guozhang,
> > > >
> > > > Usually I think such naming inconsistencies are best avoided. It adds
> > > > another level of confusion for people who have to dip into the code,
> > figure
> > > > out a problem, and ultimately explain it. Since we already have the
> > > > PreparingRebalance state, maybe we could just rename the AwaitingSync
> > state
> > > > to CompletingRebalance?
> > > >
> > > > -Jason
> > > >
> > > > On Thu, Aug 3, 2017 at 6:09 PM, Guozhang Wang 
> > wrote:
> > > >
> > > > > From an ops person's view who are mostly likely watching the metrics
> > > > these
> > > > > names may not be very clear as people may not know the internals
> > well.
> > > > I'd
> > > > > prefer PrepareRebalance and CompleteRebalance since they may be
> > easier to
> > > > > understand thought not 100 percent accurately match to internal
> > > > > implementation.
> > > > >
> > > > >
> > > > > Guozhang
> > > > >
> > > > >
> > > > > On Thu, Aug 3, 2017 at 4:14 PM, Jason Gustafson 
> > > > > wrote:
> > > > >
> > > > > > Hey Colin, Guozhang,
> > > > > >
> > > > > > I agree the current state names are not ideal for end users. I
> > tend to
> > > > > see
> > > > > > the rebalance as joining the group and receiving the assignment.
> > Maybe
> > > > > the
> > > > > > states could be named in those terms? For example: RebalanceJoin
> > and
> > > > > > RebalanceAssignment. What do you think?
> > > > > >
> > > > > > -Jason
> > > > > >
> > > > > > On Fri, Jul 28, 2017 at 11:18 AM, Guozhang Wang <
> > wangg...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > I feel we can change `AwaitSync` to `completeRebalance` while
> > keeping
> > > > > the
> > > > > > > other as is.
> > > > > > >
> > > > > > > cc Jason?
> > > > > > >
> > > > > > > Guozhang
> > > > > > >
> > > > > > > On Fri, Jul 28, 2017 at 10:08 AM, Colin McCabe <
> > cmcc...@apache.org>
> > > > > > wrote:
> > > > > > >
> > > > > > >> Thanks for the explanation.  I guess maybe we should just keep
> > the
> > > > > group
> > > > > > >> names as they are, then?
> > > > > > >>
> > > > > > >> best,
> > > > > > >> Colin
> > > > > > >>
> > > > > > >>
> > > > > > >> On Wed, Jul 26, 2017, at 11:25, Guozhang Wang wrote:
> > > > > > >> > To me `PreparingRebalance` sounds better than
> > `StartingRebalance`
> > > > > > since
> > > > > > >> > only by the end of that stage we have formed a new group. More
> > > > > > >> > specifically, this this the workflow from the coordinator's
> > point
> > > > of
> > > > > > >> > view:
> > > > > > >> >
> > > > > > >> > 1. decided to trigger a rebalance, enter PreparingRebalance
> > phase;
> > > > > > >> >   |
> > > > > > >> >   |   send out error code for all heartbeat
> > > > reponses
> > > > > > >> >  \|/
> > > > > > >> >   |
> > > > > > >> >   |   waiting for join group requests from
> > members
> > > > > > >> >  \|/
> > > > > > >> > 2. formed a new group, increment the generation number, now
> > start
> > > > > > >> > rebalancing, entering AwaitSync phase:
> > > > > > >> >   |
> > > > > > >> >   |   send out the join group responses for
> > > > whoever
> > > > > > >> > requested join
> > > > > > >> >  \|/
> > > > > > >> >

Re: [DISCUSS] KIP-180: Add a broker metric specifying the number of consumer group rebalances in progress

2017-08-07 Thread Apurva Mehta
Hi Colin,

The KIP looks good to me. In your latest proposal, the change of state
would be captured as followed in the metrics for groups using Kafka for
membership management:

PreparingRebalance -> CompletingRebalance -> Stable -> Dead?

If a group is just being used to store offsets, then it is always Empty?

If so, this makes sense to me.

Thanks,
Apurva

On Mon, Aug 7, 2017 at 5:09 PM, Colin McCabe  wrote:

> How about PreparingRebalance / CompletingRebalance?
>
> cheers,
> Colin
>
>
> On Fri, Aug 4, 2017, at 09:03, Ismael Juma wrote:
> > I agree that we should make them consistent. I think RebalanceJoin and
> > RebalanceAssignment are reasonable names. I think they are a bit more
> > descriptive than `PreparingRebalance` and `CompletingRebalance`. If we
> > need
> > to add more states, it seems a little easier to do if the states are a
> > bit
> > more descriptive. I am OK with either of the 2 options as I think they
> > are
> > both better than the status quo.
> >
> > Ismael
> >
> > On Fri, Aug 4, 2017 at 4:52 PM, Jason Gustafson 
> > wrote:
> >
> > > Hey Guozhang,
> > >
> > > Usually I think such naming inconsistencies are best avoided. It adds
> > > another level of confusion for people who have to dip into the code,
> figure
> > > out a problem, and ultimately explain it. Since we already have the
> > > PreparingRebalance state, maybe we could just rename the AwaitingSync
> state
> > > to CompletingRebalance?
> > >
> > > -Jason
> > >
> > > On Thu, Aug 3, 2017 at 6:09 PM, Guozhang Wang 
> wrote:
> > >
> > > > From an ops person's view who are mostly likely watching the metrics
> > > these
> > > > names may not be very clear as people may not know the internals
> well.
> > > I'd
> > > > prefer PrepareRebalance and CompleteRebalance since they may be
> easier to
> > > > understand thought not 100 percent accurately match to internal
> > > > implementation.
> > > >
> > > >
> > > > Guozhang
> > > >
> > > >
> > > > On Thu, Aug 3, 2017 at 4:14 PM, Jason Gustafson 
> > > > wrote:
> > > >
> > > > > Hey Colin, Guozhang,
> > > > >
> > > > > I agree the current state names are not ideal for end users. I
> tend to
> > > > see
> > > > > the rebalance as joining the group and receiving the assignment.
> Maybe
> > > > the
> > > > > states could be named in those terms? For example: RebalanceJoin
> and
> > > > > RebalanceAssignment. What do you think?
> > > > >
> > > > > -Jason
> > > > >
> > > > > On Fri, Jul 28, 2017 at 11:18 AM, Guozhang Wang <
> wangg...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > I feel we can change `AwaitSync` to `completeRebalance` while
> keeping
> > > > the
> > > > > > other as is.
> > > > > >
> > > > > > cc Jason?
> > > > > >
> > > > > > Guozhang
> > > > > >
> > > > > > On Fri, Jul 28, 2017 at 10:08 AM, Colin McCabe <
> cmcc...@apache.org>
> > > > > wrote:
> > > > > >
> > > > > >> Thanks for the explanation.  I guess maybe we should just keep
> the
> > > > group
> > > > > >> names as they are, then?
> > > > > >>
> > > > > >> best,
> > > > > >> Colin
> > > > > >>
> > > > > >>
> > > > > >> On Wed, Jul 26, 2017, at 11:25, Guozhang Wang wrote:
> > > > > >> > To me `PreparingRebalance` sounds better than
> `StartingRebalance`
> > > > > since
> > > > > >> > only by the end of that stage we have formed a new group. More
> > > > > >> > specifically, this this the workflow from the coordinator's
> point
> > > of
> > > > > >> > view:
> > > > > >> >
> > > > > >> > 1. decided to trigger a rebalance, enter PreparingRebalance
> phase;
> > > > > >> >   |
> > > > > >> >   |   send out error code for all heartbeat
> > > reponses
> > > > > >> >  \|/
> > > > > >> >   |
> > > > > >> >   |   waiting for join group requests from
> members
> > > > > >> >  \|/
> > > > > >> > 2. formed a new group, increment the generation number, now
> start
> > > > > >> > rebalancing, entering AwaitSync phase:
> > > > > >> >   |
> > > > > >> >   |   send out the join group responses for
> > > whoever
> > > > > >> > requested join
> > > > > >> >  \|/
> > > > > >> >   |
> > > > > >> >   |   waiting for the sync group request from
> the
> > > > > leader
> > > > > >> >  \|/
> > > > > >> > 3. received assignment from the leader; the rebalance has
> ended,
> > > > start
> > > > > >> > ticking for all members, entering Stable phase.
> > > > > >> >   |
> > > > > >> >   |   for whoever else sending the sync group
> > > > request,
> > > > > >> > reply with the assignment
> > > > > >> >  \|/
> > > > > >> >
> > > > > >> > So from the coordinator's point of view the rebalance starts
> at
> > > > > >> beginning
> > > > > >> > of step 2 and ends at beginning of step 3. Maybe we can rename
> > > > > >> > `AwaitSync`
> > > 

Re: [DISCUSS] KIP-180: Add a broker metric specifying the number of consumer group rebalances in progress

2017-08-07 Thread Colin McCabe
How about PreparingRebalance / CompletingRebalance?

cheers,
Colin


On Fri, Aug 4, 2017, at 09:03, Ismael Juma wrote:
> I agree that we should make them consistent. I think RebalanceJoin and
> RebalanceAssignment are reasonable names. I think they are a bit more
> descriptive than `PreparingRebalance` and `CompletingRebalance`. If we
> need
> to add more states, it seems a little easier to do if the states are a
> bit
> more descriptive. I am OK with either of the 2 options as I think they
> are
> both better than the status quo.
> 
> Ismael
> 
> On Fri, Aug 4, 2017 at 4:52 PM, Jason Gustafson 
> wrote:
> 
> > Hey Guozhang,
> >
> > Usually I think such naming inconsistencies are best avoided. It adds
> > another level of confusion for people who have to dip into the code, figure
> > out a problem, and ultimately explain it. Since we already have the
> > PreparingRebalance state, maybe we could just rename the AwaitingSync state
> > to CompletingRebalance?
> >
> > -Jason
> >
> > On Thu, Aug 3, 2017 at 6:09 PM, Guozhang Wang  wrote:
> >
> > > From an ops person's view who are mostly likely watching the metrics
> > these
> > > names may not be very clear as people may not know the internals well.
> > I'd
> > > prefer PrepareRebalance and CompleteRebalance since they may be easier to
> > > understand thought not 100 percent accurately match to internal
> > > implementation.
> > >
> > >
> > > Guozhang
> > >
> > >
> > > On Thu, Aug 3, 2017 at 4:14 PM, Jason Gustafson 
> > > wrote:
> > >
> > > > Hey Colin, Guozhang,
> > > >
> > > > I agree the current state names are not ideal for end users. I tend to
> > > see
> > > > the rebalance as joining the group and receiving the assignment. Maybe
> > > the
> > > > states could be named in those terms? For example: RebalanceJoin and
> > > > RebalanceAssignment. What do you think?
> > > >
> > > > -Jason
> > > >
> > > > On Fri, Jul 28, 2017 at 11:18 AM, Guozhang Wang 
> > > > wrote:
> > > >
> > > > > I feel we can change `AwaitSync` to `completeRebalance` while keeping
> > > the
> > > > > other as is.
> > > > >
> > > > > cc Jason?
> > > > >
> > > > > Guozhang
> > > > >
> > > > > On Fri, Jul 28, 2017 at 10:08 AM, Colin McCabe 
> > > > wrote:
> > > > >
> > > > >> Thanks for the explanation.  I guess maybe we should just keep the
> > > group
> > > > >> names as they are, then?
> > > > >>
> > > > >> best,
> > > > >> Colin
> > > > >>
> > > > >>
> > > > >> On Wed, Jul 26, 2017, at 11:25, Guozhang Wang wrote:
> > > > >> > To me `PreparingRebalance` sounds better than `StartingRebalance`
> > > > since
> > > > >> > only by the end of that stage we have formed a new group. More
> > > > >> > specifically, this this the workflow from the coordinator's point
> > of
> > > > >> > view:
> > > > >> >
> > > > >> > 1. decided to trigger a rebalance, enter PreparingRebalance phase;
> > > > >> >   |
> > > > >> >   |   send out error code for all heartbeat
> > reponses
> > > > >> >  \|/
> > > > >> >   |
> > > > >> >   |   waiting for join group requests from members
> > > > >> >  \|/
> > > > >> > 2. formed a new group, increment the generation number, now start
> > > > >> > rebalancing, entering AwaitSync phase:
> > > > >> >   |
> > > > >> >   |   send out the join group responses for
> > whoever
> > > > >> > requested join
> > > > >> >  \|/
> > > > >> >   |
> > > > >> >   |   waiting for the sync group request from the
> > > > leader
> > > > >> >  \|/
> > > > >> > 3. received assignment from the leader; the rebalance has ended,
> > > start
> > > > >> > ticking for all members, entering Stable phase.
> > > > >> >   |
> > > > >> >   |   for whoever else sending the sync group
> > > request,
> > > > >> > reply with the assignment
> > > > >> >  \|/
> > > > >> >
> > > > >> > So from the coordinator's point of view the rebalance starts at
> > > > >> beginning
> > > > >> > of step 2 and ends at beginning of step 3. Maybe we can rename
> > > > >> > `AwaitSync`
> > > > >> > itself to `CompletingRebalance`.
> > > > >> >
> > > > >> > Guozhang
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> > On Tue, Jul 25, 2017 at 6:44 AM, Ismael Juma 
> > > > wrote:
> > > > >> >
> > > > >> > > Hi Guozhang,
> > > > >> > >
> > > > >> > > Thanks for the clarification. The naming does seem a bit
> > unclear.
> > > > >> Maybe
> > > > >> > > `PreparingRebalance` could be `StartingRebalance` or something
> > > that
> > > > >> makes
> > > > >> > > it clear that it is part of the rebalance instead of a step
> > before
> > > > the
> > > > >> > > actual rebalance. `AwaitingSync` could also be
> > > > `CompletingRebalance`,
> > > > >> but
> > > > >> > > not sure if that's 

Re: [DISCUSS] KIP-180: Add a broker metric specifying the number of consumer group rebalances in progress

2017-08-04 Thread Ismael Juma
I agree that we should make them consistent. I think RebalanceJoin and
RebalanceAssignment are reasonable names. I think they are a bit more
descriptive than `PreparingRebalance` and `CompletingRebalance`. If we need
to add more states, it seems a little easier to do if the states are a bit
more descriptive. I am OK with either of the 2 options as I think they are
both better than the status quo.

Ismael

On Fri, Aug 4, 2017 at 4:52 PM, Jason Gustafson  wrote:

> Hey Guozhang,
>
> Usually I think such naming inconsistencies are best avoided. It adds
> another level of confusion for people who have to dip into the code, figure
> out a problem, and ultimately explain it. Since we already have the
> PreparingRebalance state, maybe we could just rename the AwaitingSync state
> to CompletingRebalance?
>
> -Jason
>
> On Thu, Aug 3, 2017 at 6:09 PM, Guozhang Wang  wrote:
>
> > From an ops person's view who are mostly likely watching the metrics
> these
> > names may not be very clear as people may not know the internals well.
> I'd
> > prefer PrepareRebalance and CompleteRebalance since they may be easier to
> > understand thought not 100 percent accurately match to internal
> > implementation.
> >
> >
> > Guozhang
> >
> >
> > On Thu, Aug 3, 2017 at 4:14 PM, Jason Gustafson 
> > wrote:
> >
> > > Hey Colin, Guozhang,
> > >
> > > I agree the current state names are not ideal for end users. I tend to
> > see
> > > the rebalance as joining the group and receiving the assignment. Maybe
> > the
> > > states could be named in those terms? For example: RebalanceJoin and
> > > RebalanceAssignment. What do you think?
> > >
> > > -Jason
> > >
> > > On Fri, Jul 28, 2017 at 11:18 AM, Guozhang Wang 
> > > wrote:
> > >
> > > > I feel we can change `AwaitSync` to `completeRebalance` while keeping
> > the
> > > > other as is.
> > > >
> > > > cc Jason?
> > > >
> > > > Guozhang
> > > >
> > > > On Fri, Jul 28, 2017 at 10:08 AM, Colin McCabe 
> > > wrote:
> > > >
> > > >> Thanks for the explanation.  I guess maybe we should just keep the
> > group
> > > >> names as they are, then?
> > > >>
> > > >> best,
> > > >> Colin
> > > >>
> > > >>
> > > >> On Wed, Jul 26, 2017, at 11:25, Guozhang Wang wrote:
> > > >> > To me `PreparingRebalance` sounds better than `StartingRebalance`
> > > since
> > > >> > only by the end of that stage we have formed a new group. More
> > > >> > specifically, this this the workflow from the coordinator's point
> of
> > > >> > view:
> > > >> >
> > > >> > 1. decided to trigger a rebalance, enter PreparingRebalance phase;
> > > >> >   |
> > > >> >   |   send out error code for all heartbeat
> reponses
> > > >> >  \|/
> > > >> >   |
> > > >> >   |   waiting for join group requests from members
> > > >> >  \|/
> > > >> > 2. formed a new group, increment the generation number, now start
> > > >> > rebalancing, entering AwaitSync phase:
> > > >> >   |
> > > >> >   |   send out the join group responses for
> whoever
> > > >> > requested join
> > > >> >  \|/
> > > >> >   |
> > > >> >   |   waiting for the sync group request from the
> > > leader
> > > >> >  \|/
> > > >> > 3. received assignment from the leader; the rebalance has ended,
> > start
> > > >> > ticking for all members, entering Stable phase.
> > > >> >   |
> > > >> >   |   for whoever else sending the sync group
> > request,
> > > >> > reply with the assignment
> > > >> >  \|/
> > > >> >
> > > >> > So from the coordinator's point of view the rebalance starts at
> > > >> beginning
> > > >> > of step 2 and ends at beginning of step 3. Maybe we can rename
> > > >> > `AwaitSync`
> > > >> > itself to `CompletingRebalance`.
> > > >> >
> > > >> > Guozhang
> > > >> >
> > > >> >
> > > >> >
> > > >> > On Tue, Jul 25, 2017 at 6:44 AM, Ismael Juma 
> > > wrote:
> > > >> >
> > > >> > > Hi Guozhang,
> > > >> > >
> > > >> > > Thanks for the clarification. The naming does seem a bit
> unclear.
> > > >> Maybe
> > > >> > > `PreparingRebalance` could be `StartingRebalance` or something
> > that
> > > >> makes
> > > >> > > it clear that it is part of the rebalance instead of a step
> before
> > > the
> > > >> > > actual rebalance. `AwaitingSync` could also be
> > > `CompletingRebalance`,
> > > >> but
> > > >> > > not sure if that's better.
> > > >> > >
> > > >> > > Ismael
> > > >> > >
> > > >> > > On Mon, Jul 24, 2017 at 7:02 PM, Guozhang Wang <
> > wangg...@gmail.com>
> > > >> wrote:
> > > >> > >
> > > >> > > > Actually Rebalancing includes two steps, and we name them
> > > >> > > PrepareRebalance
> > > >> > > > and WaitSync (arguably they may not be the best names). But
> > these
> > > >> two
> > > >> > > steps
> > > >> > 

Re: [DISCUSS] KIP-180: Add a broker metric specifying the number of consumer group rebalances in progress

2017-08-04 Thread Guozhang Wang
Yeah I think consistency in present continuous tense sounds better:
CompletingRebalance.

Guozhang

On Fri, Aug 4, 2017 at 8:52 AM, Jason Gustafson  wrote:

> Hey Guozhang,
>
> Usually I think such naming inconsistencies are best avoided. It adds
> another level of confusion for people who have to dip into the code, figure
> out a problem, and ultimately explain it. Since we already have the
> PreparingRebalance state, maybe we could just rename the AwaitingSync state
> to CompletingRebalance?
>
> -Jason
>
> On Thu, Aug 3, 2017 at 6:09 PM, Guozhang Wang  wrote:
>
> > From an ops person's view who are mostly likely watching the metrics
> these
> > names may not be very clear as people may not know the internals well.
> I'd
> > prefer PrepareRebalance and CompleteRebalance since they may be easier to
> > understand thought not 100 percent accurately match to internal
> > implementation.
> >
> >
> > Guozhang
> >
> >
> > On Thu, Aug 3, 2017 at 4:14 PM, Jason Gustafson 
> > wrote:
> >
> > > Hey Colin, Guozhang,
> > >
> > > I agree the current state names are not ideal for end users. I tend to
> > see
> > > the rebalance as joining the group and receiving the assignment. Maybe
> > the
> > > states could be named in those terms? For example: RebalanceJoin and
> > > RebalanceAssignment. What do you think?
> > >
> > > -Jason
> > >
> > > On Fri, Jul 28, 2017 at 11:18 AM, Guozhang Wang 
> > > wrote:
> > >
> > > > I feel we can change `AwaitSync` to `completeRebalance` while keeping
> > the
> > > > other as is.
> > > >
> > > > cc Jason?
> > > >
> > > > Guozhang
> > > >
> > > > On Fri, Jul 28, 2017 at 10:08 AM, Colin McCabe 
> > > wrote:
> > > >
> > > >> Thanks for the explanation.  I guess maybe we should just keep the
> > group
> > > >> names as they are, then?
> > > >>
> > > >> best,
> > > >> Colin
> > > >>
> > > >>
> > > >> On Wed, Jul 26, 2017, at 11:25, Guozhang Wang wrote:
> > > >> > To me `PreparingRebalance` sounds better than `StartingRebalance`
> > > since
> > > >> > only by the end of that stage we have formed a new group. More
> > > >> > specifically, this this the workflow from the coordinator's point
> of
> > > >> > view:
> > > >> >
> > > >> > 1. decided to trigger a rebalance, enter PreparingRebalance phase;
> > > >> >   |
> > > >> >   |   send out error code for all heartbeat
> reponses
> > > >> >  \|/
> > > >> >   |
> > > >> >   |   waiting for join group requests from members
> > > >> >  \|/
> > > >> > 2. formed a new group, increment the generation number, now start
> > > >> > rebalancing, entering AwaitSync phase:
> > > >> >   |
> > > >> >   |   send out the join group responses for
> whoever
> > > >> > requested join
> > > >> >  \|/
> > > >> >   |
> > > >> >   |   waiting for the sync group request from the
> > > leader
> > > >> >  \|/
> > > >> > 3. received assignment from the leader; the rebalance has ended,
> > start
> > > >> > ticking for all members, entering Stable phase.
> > > >> >   |
> > > >> >   |   for whoever else sending the sync group
> > request,
> > > >> > reply with the assignment
> > > >> >  \|/
> > > >> >
> > > >> > So from the coordinator's point of view the rebalance starts at
> > > >> beginning
> > > >> > of step 2 and ends at beginning of step 3. Maybe we can rename
> > > >> > `AwaitSync`
> > > >> > itself to `CompletingRebalance`.
> > > >> >
> > > >> > Guozhang
> > > >> >
> > > >> >
> > > >> >
> > > >> > On Tue, Jul 25, 2017 at 6:44 AM, Ismael Juma 
> > > wrote:
> > > >> >
> > > >> > > Hi Guozhang,
> > > >> > >
> > > >> > > Thanks for the clarification. The naming does seem a bit
> unclear.
> > > >> Maybe
> > > >> > > `PreparingRebalance` could be `StartingRebalance` or something
> > that
> > > >> makes
> > > >> > > it clear that it is part of the rebalance instead of a step
> before
> > > the
> > > >> > > actual rebalance. `AwaitingSync` could also be
> > > `CompletingRebalance`,
> > > >> but
> > > >> > > not sure if that's better.
> > > >> > >
> > > >> > > Ismael
> > > >> > >
> > > >> > > On Mon, Jul 24, 2017 at 7:02 PM, Guozhang Wang <
> > wangg...@gmail.com>
> > > >> wrote:
> > > >> > >
> > > >> > > > Actually Rebalancing includes two steps, and we name them
> > > >> > > PrepareRebalance
> > > >> > > > and WaitSync (arguably they may not be the best names). But
> > these
> > > >> two
> > > >> > > steps
> > > >> > > > together should be treated as the complete rebalance cycle.
> > > >> > > >
> > > >> > > >
> > > >> > > > Guozhang
> > > >> > > >
> > > >> > > > On Mon, Jul 24, 2017 at 10:46 AM, Colin McCabe <
> > > cmcc...@apache.org>
> > > >> > > wrote:
> > > >> > > >
> > > >> > > > > Hi all,
> > > >> > > > >
> > > >> 

Re: [DISCUSS] KIP-180: Add a broker metric specifying the number of consumer group rebalances in progress

2017-08-04 Thread Jason Gustafson
Hey Guozhang,

Usually I think such naming inconsistencies are best avoided. It adds
another level of confusion for people who have to dip into the code, figure
out a problem, and ultimately explain it. Since we already have the
PreparingRebalance state, maybe we could just rename the AwaitingSync state
to CompletingRebalance?

-Jason

On Thu, Aug 3, 2017 at 6:09 PM, Guozhang Wang  wrote:

> From an ops person's view who are mostly likely watching the metrics these
> names may not be very clear as people may not know the internals well. I'd
> prefer PrepareRebalance and CompleteRebalance since they may be easier to
> understand thought not 100 percent accurately match to internal
> implementation.
>
>
> Guozhang
>
>
> On Thu, Aug 3, 2017 at 4:14 PM, Jason Gustafson 
> wrote:
>
> > Hey Colin, Guozhang,
> >
> > I agree the current state names are not ideal for end users. I tend to
> see
> > the rebalance as joining the group and receiving the assignment. Maybe
> the
> > states could be named in those terms? For example: RebalanceJoin and
> > RebalanceAssignment. What do you think?
> >
> > -Jason
> >
> > On Fri, Jul 28, 2017 at 11:18 AM, Guozhang Wang 
> > wrote:
> >
> > > I feel we can change `AwaitSync` to `completeRebalance` while keeping
> the
> > > other as is.
> > >
> > > cc Jason?
> > >
> > > Guozhang
> > >
> > > On Fri, Jul 28, 2017 at 10:08 AM, Colin McCabe 
> > wrote:
> > >
> > >> Thanks for the explanation.  I guess maybe we should just keep the
> group
> > >> names as they are, then?
> > >>
> > >> best,
> > >> Colin
> > >>
> > >>
> > >> On Wed, Jul 26, 2017, at 11:25, Guozhang Wang wrote:
> > >> > To me `PreparingRebalance` sounds better than `StartingRebalance`
> > since
> > >> > only by the end of that stage we have formed a new group. More
> > >> > specifically, this this the workflow from the coordinator's point of
> > >> > view:
> > >> >
> > >> > 1. decided to trigger a rebalance, enter PreparingRebalance phase;
> > >> >   |
> > >> >   |   send out error code for all heartbeat reponses
> > >> >  \|/
> > >> >   |
> > >> >   |   waiting for join group requests from members
> > >> >  \|/
> > >> > 2. formed a new group, increment the generation number, now start
> > >> > rebalancing, entering AwaitSync phase:
> > >> >   |
> > >> >   |   send out the join group responses for whoever
> > >> > requested join
> > >> >  \|/
> > >> >   |
> > >> >   |   waiting for the sync group request from the
> > leader
> > >> >  \|/
> > >> > 3. received assignment from the leader; the rebalance has ended,
> start
> > >> > ticking for all members, entering Stable phase.
> > >> >   |
> > >> >   |   for whoever else sending the sync group
> request,
> > >> > reply with the assignment
> > >> >  \|/
> > >> >
> > >> > So from the coordinator's point of view the rebalance starts at
> > >> beginning
> > >> > of step 2 and ends at beginning of step 3. Maybe we can rename
> > >> > `AwaitSync`
> > >> > itself to `CompletingRebalance`.
> > >> >
> > >> > Guozhang
> > >> >
> > >> >
> > >> >
> > >> > On Tue, Jul 25, 2017 at 6:44 AM, Ismael Juma 
> > wrote:
> > >> >
> > >> > > Hi Guozhang,
> > >> > >
> > >> > > Thanks for the clarification. The naming does seem a bit unclear.
> > >> Maybe
> > >> > > `PreparingRebalance` could be `StartingRebalance` or something
> that
> > >> makes
> > >> > > it clear that it is part of the rebalance instead of a step before
> > the
> > >> > > actual rebalance. `AwaitingSync` could also be
> > `CompletingRebalance`,
> > >> but
> > >> > > not sure if that's better.
> > >> > >
> > >> > > Ismael
> > >> > >
> > >> > > On Mon, Jul 24, 2017 at 7:02 PM, Guozhang Wang <
> wangg...@gmail.com>
> > >> wrote:
> > >> > >
> > >> > > > Actually Rebalancing includes two steps, and we name them
> > >> > > PrepareRebalance
> > >> > > > and WaitSync (arguably they may not be the best names). But
> these
> > >> two
> > >> > > steps
> > >> > > > together should be treated as the complete rebalance cycle.
> > >> > > >
> > >> > > >
> > >> > > > Guozhang
> > >> > > >
> > >> > > > On Mon, Jul 24, 2017 at 10:46 AM, Colin McCabe <
> > cmcc...@apache.org>
> > >> > > wrote:
> > >> > > >
> > >> > > > > Hi all,
> > >> > > > >
> > >> > > > > I think maybe it makes sense to rename the
> "PreparingRebalance"
> > >> > > consumer
> > >> > > > > group state to "Rebalancing."  To me, "Preparing" implies that
> > >> there
> > >> > > > > will be a later "rebalance" state that follows-- but there is
> > not.
> > >> > > > > Since we're now exposing this state name publicly in these
> > >> metrics,
> > >> > > > > perhaps it makes sense to do this rename now.  Thoughts?
> > >> > > > >
> > >> > > > > best,
> 

Re: [DISCUSS] KIP-180: Add a broker metric specifying the number of consumer group rebalances in progress

2017-08-03 Thread Guozhang Wang
>From an ops person's view who are mostly likely watching the metrics these
names may not be very clear as people may not know the internals well. I'd
prefer PrepareRebalance and CompleteRebalance since they may be easier to
understand thought not 100 percent accurately match to internal
implementation.


Guozhang


On Thu, Aug 3, 2017 at 4:14 PM, Jason Gustafson  wrote:

> Hey Colin, Guozhang,
>
> I agree the current state names are not ideal for end users. I tend to see
> the rebalance as joining the group and receiving the assignment. Maybe the
> states could be named in those terms? For example: RebalanceJoin and
> RebalanceAssignment. What do you think?
>
> -Jason
>
> On Fri, Jul 28, 2017 at 11:18 AM, Guozhang Wang 
> wrote:
>
> > I feel we can change `AwaitSync` to `completeRebalance` while keeping the
> > other as is.
> >
> > cc Jason?
> >
> > Guozhang
> >
> > On Fri, Jul 28, 2017 at 10:08 AM, Colin McCabe 
> wrote:
> >
> >> Thanks for the explanation.  I guess maybe we should just keep the group
> >> names as they are, then?
> >>
> >> best,
> >> Colin
> >>
> >>
> >> On Wed, Jul 26, 2017, at 11:25, Guozhang Wang wrote:
> >> > To me `PreparingRebalance` sounds better than `StartingRebalance`
> since
> >> > only by the end of that stage we have formed a new group. More
> >> > specifically, this this the workflow from the coordinator's point of
> >> > view:
> >> >
> >> > 1. decided to trigger a rebalance, enter PreparingRebalance phase;
> >> >   |
> >> >   |   send out error code for all heartbeat reponses
> >> >  \|/
> >> >   |
> >> >   |   waiting for join group requests from members
> >> >  \|/
> >> > 2. formed a new group, increment the generation number, now start
> >> > rebalancing, entering AwaitSync phase:
> >> >   |
> >> >   |   send out the join group responses for whoever
> >> > requested join
> >> >  \|/
> >> >   |
> >> >   |   waiting for the sync group request from the
> leader
> >> >  \|/
> >> > 3. received assignment from the leader; the rebalance has ended, start
> >> > ticking for all members, entering Stable phase.
> >> >   |
> >> >   |   for whoever else sending the sync group request,
> >> > reply with the assignment
> >> >  \|/
> >> >
> >> > So from the coordinator's point of view the rebalance starts at
> >> beginning
> >> > of step 2 and ends at beginning of step 3. Maybe we can rename
> >> > `AwaitSync`
> >> > itself to `CompletingRebalance`.
> >> >
> >> > Guozhang
> >> >
> >> >
> >> >
> >> > On Tue, Jul 25, 2017 at 6:44 AM, Ismael Juma 
> wrote:
> >> >
> >> > > Hi Guozhang,
> >> > >
> >> > > Thanks for the clarification. The naming does seem a bit unclear.
> >> Maybe
> >> > > `PreparingRebalance` could be `StartingRebalance` or something that
> >> makes
> >> > > it clear that it is part of the rebalance instead of a step before
> the
> >> > > actual rebalance. `AwaitingSync` could also be
> `CompletingRebalance`,
> >> but
> >> > > not sure if that's better.
> >> > >
> >> > > Ismael
> >> > >
> >> > > On Mon, Jul 24, 2017 at 7:02 PM, Guozhang Wang 
> >> wrote:
> >> > >
> >> > > > Actually Rebalancing includes two steps, and we name them
> >> > > PrepareRebalance
> >> > > > and WaitSync (arguably they may not be the best names). But these
> >> two
> >> > > steps
> >> > > > together should be treated as the complete rebalance cycle.
> >> > > >
> >> > > >
> >> > > > Guozhang
> >> > > >
> >> > > > On Mon, Jul 24, 2017 at 10:46 AM, Colin McCabe <
> cmcc...@apache.org>
> >> > > wrote:
> >> > > >
> >> > > > > Hi all,
> >> > > > >
> >> > > > > I think maybe it makes sense to rename the "PreparingRebalance"
> >> > > consumer
> >> > > > > group state to "Rebalancing."  To me, "Preparing" implies that
> >> there
> >> > > > > will be a later "rebalance" state that follows-- but there is
> not.
> >> > > > > Since we're now exposing this state name publicly in these
> >> metrics,
> >> > > > > perhaps it makes sense to do this rename now.  Thoughts?
> >> > > > >
> >> > > > > best,
> >> > > > > Colin
> >> > > > >
> >> > > > >
> >> > > > > On Fri, Jul 21, 2017, at 13:52, Colin McCabe wrote:
> >> > > > > > That's a good point.  I revised the KIP to add metrics for all
> >> the
> >> > > > group
> >> > > > > > states.
> >> > > > > >
> >> > > > > > best,
> >> > > > > > Colin
> >> > > > > >
> >> > > > > >
> >> > > > > > On Fri, Jul 21, 2017, at 12:08, Guozhang Wang wrote:
> >> > > > > > > Ah, that's right Jason.
> >> > > > > > >
> >> > > > > > > With that I can be convinced to add one metric per each
> state.
> >> > > > > > >
> >> > > > > > > Guozhang
> >> > > > > > >
> >> > > > > > > On Fri, Jul 21, 2017 at 11:44 AM, Jason Gustafson <
> >> > > > ja...@confluent.io>

Re: [DISCUSS] KIP-180: Add a broker metric specifying the number of consumer group rebalances in progress

2017-08-03 Thread Jason Gustafson
Hey Colin, Guozhang,

I agree the current state names are not ideal for end users. I tend to see
the rebalance as joining the group and receiving the assignment. Maybe the
states could be named in those terms? For example: RebalanceJoin and
RebalanceAssignment. What do you think?

-Jason

On Fri, Jul 28, 2017 at 11:18 AM, Guozhang Wang  wrote:

> I feel we can change `AwaitSync` to `completeRebalance` while keeping the
> other as is.
>
> cc Jason?
>
> Guozhang
>
> On Fri, Jul 28, 2017 at 10:08 AM, Colin McCabe  wrote:
>
>> Thanks for the explanation.  I guess maybe we should just keep the group
>> names as they are, then?
>>
>> best,
>> Colin
>>
>>
>> On Wed, Jul 26, 2017, at 11:25, Guozhang Wang wrote:
>> > To me `PreparingRebalance` sounds better than `StartingRebalance` since
>> > only by the end of that stage we have formed a new group. More
>> > specifically, this this the workflow from the coordinator's point of
>> > view:
>> >
>> > 1. decided to trigger a rebalance, enter PreparingRebalance phase;
>> >   |
>> >   |   send out error code for all heartbeat reponses
>> >  \|/
>> >   |
>> >   |   waiting for join group requests from members
>> >  \|/
>> > 2. formed a new group, increment the generation number, now start
>> > rebalancing, entering AwaitSync phase:
>> >   |
>> >   |   send out the join group responses for whoever
>> > requested join
>> >  \|/
>> >   |
>> >   |   waiting for the sync group request from the leader
>> >  \|/
>> > 3. received assignment from the leader; the rebalance has ended, start
>> > ticking for all members, entering Stable phase.
>> >   |
>> >   |   for whoever else sending the sync group request,
>> > reply with the assignment
>> >  \|/
>> >
>> > So from the coordinator's point of view the rebalance starts at
>> beginning
>> > of step 2 and ends at beginning of step 3. Maybe we can rename
>> > `AwaitSync`
>> > itself to `CompletingRebalance`.
>> >
>> > Guozhang
>> >
>> >
>> >
>> > On Tue, Jul 25, 2017 at 6:44 AM, Ismael Juma  wrote:
>> >
>> > > Hi Guozhang,
>> > >
>> > > Thanks for the clarification. The naming does seem a bit unclear.
>> Maybe
>> > > `PreparingRebalance` could be `StartingRebalance` or something that
>> makes
>> > > it clear that it is part of the rebalance instead of a step before the
>> > > actual rebalance. `AwaitingSync` could also be `CompletingRebalance`,
>> but
>> > > not sure if that's better.
>> > >
>> > > Ismael
>> > >
>> > > On Mon, Jul 24, 2017 at 7:02 PM, Guozhang Wang 
>> wrote:
>> > >
>> > > > Actually Rebalancing includes two steps, and we name them
>> > > PrepareRebalance
>> > > > and WaitSync (arguably they may not be the best names). But these
>> two
>> > > steps
>> > > > together should be treated as the complete rebalance cycle.
>> > > >
>> > > >
>> > > > Guozhang
>> > > >
>> > > > On Mon, Jul 24, 2017 at 10:46 AM, Colin McCabe 
>> > > wrote:
>> > > >
>> > > > > Hi all,
>> > > > >
>> > > > > I think maybe it makes sense to rename the "PreparingRebalance"
>> > > consumer
>> > > > > group state to "Rebalancing."  To me, "Preparing" implies that
>> there
>> > > > > will be a later "rebalance" state that follows-- but there is not.
>> > > > > Since we're now exposing this state name publicly in these
>> metrics,
>> > > > > perhaps it makes sense to do this rename now.  Thoughts?
>> > > > >
>> > > > > best,
>> > > > > Colin
>> > > > >
>> > > > >
>> > > > > On Fri, Jul 21, 2017, at 13:52, Colin McCabe wrote:
>> > > > > > That's a good point.  I revised the KIP to add metrics for all
>> the
>> > > > group
>> > > > > > states.
>> > > > > >
>> > > > > > best,
>> > > > > > Colin
>> > > > > >
>> > > > > >
>> > > > > > On Fri, Jul 21, 2017, at 12:08, Guozhang Wang wrote:
>> > > > > > > Ah, that's right Jason.
>> > > > > > >
>> > > > > > > With that I can be convinced to add one metric per each state.
>> > > > > > >
>> > > > > > > Guozhang
>> > > > > > >
>> > > > > > > On Fri, Jul 21, 2017 at 11:44 AM, Jason Gustafson <
>> > > > ja...@confluent.io>
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > > >
>> > > > > > > > > "Dead" and "Empty" states are transient: groups usually
>> only
>> > > > > leaves in
>> > > > > > > > this
>> > > > > > > > > state for a short while and then being deleted or
>> transited to
>> > > > > other
>> > > > > > > > > states.
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > This is not strictly true for the "Empty" state which we
>> also use
>> > > > to
>> > > > > > > > represent simple groups which only use the coordinator to
>> store
>> > > > > offsets. I
>> > > > > > > > think we may as well cover all the states if we're going to
>> cover
>> > > > > any of
>> > > > > > > > them 

Re: [DISCUSS] KIP-180: Add a broker metric specifying the number of consumer group rebalances in progress

2017-07-28 Thread Guozhang Wang
I feel we can change `AwaitSync` to `completeRebalance` while keeping the
other as is.

cc Jason?

Guozhang

On Fri, Jul 28, 2017 at 10:08 AM, Colin McCabe  wrote:

> Thanks for the explanation.  I guess maybe we should just keep the group
> names as they are, then?
>
> best,
> Colin
>
>
> On Wed, Jul 26, 2017, at 11:25, Guozhang Wang wrote:
> > To me `PreparingRebalance` sounds better than `StartingRebalance` since
> > only by the end of that stage we have formed a new group. More
> > specifically, this this the workflow from the coordinator's point of
> > view:
> >
> > 1. decided to trigger a rebalance, enter PreparingRebalance phase;
> >   |
> >   |   send out error code for all heartbeat reponses
> >  \|/
> >   |
> >   |   waiting for join group requests from members
> >  \|/
> > 2. formed a new group, increment the generation number, now start
> > rebalancing, entering AwaitSync phase:
> >   |
> >   |   send out the join group responses for whoever
> > requested join
> >  \|/
> >   |
> >   |   waiting for the sync group request from the leader
> >  \|/
> > 3. received assignment from the leader; the rebalance has ended, start
> > ticking for all members, entering Stable phase.
> >   |
> >   |   for whoever else sending the sync group request,
> > reply with the assignment
> >  \|/
> >
> > So from the coordinator's point of view the rebalance starts at beginning
> > of step 2 and ends at beginning of step 3. Maybe we can rename
> > `AwaitSync`
> > itself to `CompletingRebalance`.
> >
> > Guozhang
> >
> >
> >
> > On Tue, Jul 25, 2017 at 6:44 AM, Ismael Juma  wrote:
> >
> > > Hi Guozhang,
> > >
> > > Thanks for the clarification. The naming does seem a bit unclear. Maybe
> > > `PreparingRebalance` could be `StartingRebalance` or something that
> makes
> > > it clear that it is part of the rebalance instead of a step before the
> > > actual rebalance. `AwaitingSync` could also be `CompletingRebalance`,
> but
> > > not sure if that's better.
> > >
> > > Ismael
> > >
> > > On Mon, Jul 24, 2017 at 7:02 PM, Guozhang Wang 
> wrote:
> > >
> > > > Actually Rebalancing includes two steps, and we name them
> > > PrepareRebalance
> > > > and WaitSync (arguably they may not be the best names). But these two
> > > steps
> > > > together should be treated as the complete rebalance cycle.
> > > >
> > > >
> > > > Guozhang
> > > >
> > > > On Mon, Jul 24, 2017 at 10:46 AM, Colin McCabe 
> > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I think maybe it makes sense to rename the "PreparingRebalance"
> > > consumer
> > > > > group state to "Rebalancing."  To me, "Preparing" implies that
> there
> > > > > will be a later "rebalance" state that follows-- but there is not.
> > > > > Since we're now exposing this state name publicly in these metrics,
> > > > > perhaps it makes sense to do this rename now.  Thoughts?
> > > > >
> > > > > best,
> > > > > Colin
> > > > >
> > > > >
> > > > > On Fri, Jul 21, 2017, at 13:52, Colin McCabe wrote:
> > > > > > That's a good point.  I revised the KIP to add metrics for all
> the
> > > > group
> > > > > > states.
> > > > > >
> > > > > > best,
> > > > > > Colin
> > > > > >
> > > > > >
> > > > > > On Fri, Jul 21, 2017, at 12:08, Guozhang Wang wrote:
> > > > > > > Ah, that's right Jason.
> > > > > > >
> > > > > > > With that I can be convinced to add one metric per each state.
> > > > > > >
> > > > > > > Guozhang
> > > > > > >
> > > > > > > On Fri, Jul 21, 2017 at 11:44 AM, Jason Gustafson <
> > > > ja...@confluent.io>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > >
> > > > > > > > > "Dead" and "Empty" states are transient: groups usually
> only
> > > > > leaves in
> > > > > > > > this
> > > > > > > > > state for a short while and then being deleted or
> transited to
> > > > > other
> > > > > > > > > states.
> > > > > > > >
> > > > > > > >
> > > > > > > > This is not strictly true for the "Empty" state which we
> also use
> > > > to
> > > > > > > > represent simple groups which only use the coordinator to
> store
> > > > > offsets. I
> > > > > > > > think we may as well cover all the states if we're going to
> cover
> > > > > any of
> > > > > > > > them specifically.
> > > > > > > >
> > > > > > > > -Jason
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, Jul 21, 2017 at 9:45 AM, Guozhang Wang <
> > > wangg...@gmail.com
> > > > >
> > > > > wrote:
> > > > > > > >
> > > > > > > > > My two cents:
> > > > > > > > >
> > > > > > > > > "Dead" and "Empty" states are transient: groups usually
> only
> > > > > leaves in
> > > > > > > > this
> > > > > > > > > state for a short while and then being deleted or
> transited to
> > > > > other
> > > > > > > > > 

Re: [DISCUSS] KIP-180: Add a broker metric specifying the number of consumer group rebalances in progress

2017-07-28 Thread Colin McCabe
Thanks for the explanation.  I guess maybe we should just keep the group
names as they are, then?

best,
Colin


On Wed, Jul 26, 2017, at 11:25, Guozhang Wang wrote:
> To me `PreparingRebalance` sounds better than `StartingRebalance` since
> only by the end of that stage we have formed a new group. More
> specifically, this this the workflow from the coordinator's point of
> view:
> 
> 1. decided to trigger a rebalance, enter PreparingRebalance phase;
>   |
>   |   send out error code for all heartbeat reponses
>  \|/
>   |
>   |   waiting for join group requests from members
>  \|/
> 2. formed a new group, increment the generation number, now start
> rebalancing, entering AwaitSync phase:
>   |
>   |   send out the join group responses for whoever
> requested join
>  \|/
>   |
>   |   waiting for the sync group request from the leader
>  \|/
> 3. received assignment from the leader; the rebalance has ended, start
> ticking for all members, entering Stable phase.
>   |
>   |   for whoever else sending the sync group request,
> reply with the assignment
>  \|/
> 
> So from the coordinator's point of view the rebalance starts at beginning
> of step 2 and ends at beginning of step 3. Maybe we can rename
> `AwaitSync`
> itself to `CompletingRebalance`.
> 
> Guozhang
> 
> 
> 
> On Tue, Jul 25, 2017 at 6:44 AM, Ismael Juma  wrote:
> 
> > Hi Guozhang,
> >
> > Thanks for the clarification. The naming does seem a bit unclear. Maybe
> > `PreparingRebalance` could be `StartingRebalance` or something that makes
> > it clear that it is part of the rebalance instead of a step before the
> > actual rebalance. `AwaitingSync` could also be `CompletingRebalance`, but
> > not sure if that's better.
> >
> > Ismael
> >
> > On Mon, Jul 24, 2017 at 7:02 PM, Guozhang Wang  wrote:
> >
> > > Actually Rebalancing includes two steps, and we name them
> > PrepareRebalance
> > > and WaitSync (arguably they may not be the best names). But these two
> > steps
> > > together should be treated as the complete rebalance cycle.
> > >
> > >
> > > Guozhang
> > >
> > > On Mon, Jul 24, 2017 at 10:46 AM, Colin McCabe 
> > wrote:
> > >
> > > > Hi all,
> > > >
> > > > I think maybe it makes sense to rename the "PreparingRebalance"
> > consumer
> > > > group state to "Rebalancing."  To me, "Preparing" implies that there
> > > > will be a later "rebalance" state that follows-- but there is not.
> > > > Since we're now exposing this state name publicly in these metrics,
> > > > perhaps it makes sense to do this rename now.  Thoughts?
> > > >
> > > > best,
> > > > Colin
> > > >
> > > >
> > > > On Fri, Jul 21, 2017, at 13:52, Colin McCabe wrote:
> > > > > That's a good point.  I revised the KIP to add metrics for all the
> > > group
> > > > > states.
> > > > >
> > > > > best,
> > > > > Colin
> > > > >
> > > > >
> > > > > On Fri, Jul 21, 2017, at 12:08, Guozhang Wang wrote:
> > > > > > Ah, that's right Jason.
> > > > > >
> > > > > > With that I can be convinced to add one metric per each state.
> > > > > >
> > > > > > Guozhang
> > > > > >
> > > > > > On Fri, Jul 21, 2017 at 11:44 AM, Jason Gustafson <
> > > ja...@confluent.io>
> > > > > > wrote:
> > > > > >
> > > > > > > >
> > > > > > > > "Dead" and "Empty" states are transient: groups usually only
> > > > leaves in
> > > > > > > this
> > > > > > > > state for a short while and then being deleted or transited to
> > > > other
> > > > > > > > states.
> > > > > > >
> > > > > > >
> > > > > > > This is not strictly true for the "Empty" state which we also use
> > > to
> > > > > > > represent simple groups which only use the coordinator to store
> > > > offsets. I
> > > > > > > think we may as well cover all the states if we're going to cover
> > > > any of
> > > > > > > them specifically.
> > > > > > >
> > > > > > > -Jason
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Jul 21, 2017 at 9:45 AM, Guozhang Wang <
> > wangg...@gmail.com
> > > >
> > > > wrote:
> > > > > > >
> > > > > > > > My two cents:
> > > > > > > >
> > > > > > > > "Dead" and "Empty" states are transient: groups usually only
> > > > leaves in
> > > > > > > this
> > > > > > > > state for a short while and then being deleted or transited to
> > > > other
> > > > > > > > states.
> > > > > > > >
> > > > > > > > Since we have the existing "*NumGroups*" metric, `*NumGroups -
> > > > > > > > **NumGroupsRebalancing
> > > > > > > > - **NumGroupsAwaitingSync`* should cover the above three, where
> > > > "Stable"
> > > > > > > > should be contributing most of the counts: If we have a bug
> > that
> > > > causes
> > > > > > > the
> > > > > > > > num.Dead / Empty to keep increasing, then we would observe
> > > > `NumGroups`
> > > > > > > keep
> > 

Re: [DISCUSS] KIP-180: Add a broker metric specifying the number of consumer group rebalances in progress

2017-07-26 Thread Guozhang Wang
To me `PreparingRebalance` sounds better than `StartingRebalance` since
only by the end of that stage we have formed a new group. More
specifically, this this the workflow from the coordinator's point of view:

1. decided to trigger a rebalance, enter PreparingRebalance phase;
  |
  |   send out error code for all heartbeat reponses
 \|/
  |
  |   waiting for join group requests from members
 \|/
2. formed a new group, increment the generation number, now start
rebalancing, entering AwaitSync phase:
  |
  |   send out the join group responses for whoever
requested join
 \|/
  |
  |   waiting for the sync group request from the leader
 \|/
3. received assignment from the leader; the rebalance has ended, start
ticking for all members, entering Stable phase.
  |
  |   for whoever else sending the sync group request,
reply with the assignment
 \|/

So from the coordinator's point of view the rebalance starts at beginning
of step 2 and ends at beginning of step 3. Maybe we can rename `AwaitSync`
itself to `CompletingRebalance`.

Guozhang



On Tue, Jul 25, 2017 at 6:44 AM, Ismael Juma  wrote:

> Hi Guozhang,
>
> Thanks for the clarification. The naming does seem a bit unclear. Maybe
> `PreparingRebalance` could be `StartingRebalance` or something that makes
> it clear that it is part of the rebalance instead of a step before the
> actual rebalance. `AwaitingSync` could also be `CompletingRebalance`, but
> not sure if that's better.
>
> Ismael
>
> On Mon, Jul 24, 2017 at 7:02 PM, Guozhang Wang  wrote:
>
> > Actually Rebalancing includes two steps, and we name them
> PrepareRebalance
> > and WaitSync (arguably they may not be the best names). But these two
> steps
> > together should be treated as the complete rebalance cycle.
> >
> >
> > Guozhang
> >
> > On Mon, Jul 24, 2017 at 10:46 AM, Colin McCabe 
> wrote:
> >
> > > Hi all,
> > >
> > > I think maybe it makes sense to rename the "PreparingRebalance"
> consumer
> > > group state to "Rebalancing."  To me, "Preparing" implies that there
> > > will be a later "rebalance" state that follows-- but there is not.
> > > Since we're now exposing this state name publicly in these metrics,
> > > perhaps it makes sense to do this rename now.  Thoughts?
> > >
> > > best,
> > > Colin
> > >
> > >
> > > On Fri, Jul 21, 2017, at 13:52, Colin McCabe wrote:
> > > > That's a good point.  I revised the KIP to add metrics for all the
> > group
> > > > states.
> > > >
> > > > best,
> > > > Colin
> > > >
> > > >
> > > > On Fri, Jul 21, 2017, at 12:08, Guozhang Wang wrote:
> > > > > Ah, that's right Jason.
> > > > >
> > > > > With that I can be convinced to add one metric per each state.
> > > > >
> > > > > Guozhang
> > > > >
> > > > > On Fri, Jul 21, 2017 at 11:44 AM, Jason Gustafson <
> > ja...@confluent.io>
> > > > > wrote:
> > > > >
> > > > > > >
> > > > > > > "Dead" and "Empty" states are transient: groups usually only
> > > leaves in
> > > > > > this
> > > > > > > state for a short while and then being deleted or transited to
> > > other
> > > > > > > states.
> > > > > >
> > > > > >
> > > > > > This is not strictly true for the "Empty" state which we also use
> > to
> > > > > > represent simple groups which only use the coordinator to store
> > > offsets. I
> > > > > > think we may as well cover all the states if we're going to cover
> > > any of
> > > > > > them specifically.
> > > > > >
> > > > > > -Jason
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, Jul 21, 2017 at 9:45 AM, Guozhang Wang <
> wangg...@gmail.com
> > >
> > > wrote:
> > > > > >
> > > > > > > My two cents:
> > > > > > >
> > > > > > > "Dead" and "Empty" states are transient: groups usually only
> > > leaves in
> > > > > > this
> > > > > > > state for a short while and then being deleted or transited to
> > > other
> > > > > > > states.
> > > > > > >
> > > > > > > Since we have the existing "*NumGroups*" metric, `*NumGroups -
> > > > > > > **NumGroupsRebalancing
> > > > > > > - **NumGroupsAwaitingSync`* should cover the above three, where
> > > "Stable"
> > > > > > > should be contributing most of the counts: If we have a bug
> that
> > > causes
> > > > > > the
> > > > > > > num.Dead / Empty to keep increasing, then we would observe
> > > `NumGroups`
> > > > > > keep
> > > > > > > increasing which should be sufficient for alerting. And trouble
> > > shooting
> > > > > > of
> > > > > > > the issue could be relying on the log4j.
> > > > > > >
> > > > > > > *Guozhang*
> > > > > > >
> > > > > > > On Fri, Jul 21, 2017 at 7:19 AM, Ismael Juma <
> ism...@juma.me.uk>
> > > wrote:
> > > > > > >
> > > > > > > > Thanks for the KIP, Colin. This will definitely be useful.
> One
> > > > > > question:
> > > > > > > > would it 

Re: [DISCUSS] KIP-180: Add a broker metric specifying the number of consumer group rebalances in progress

2017-07-25 Thread Ismael Juma
Hi Guozhang,

Thanks for the clarification. The naming does seem a bit unclear. Maybe
`PreparingRebalance` could be `StartingRebalance` or something that makes
it clear that it is part of the rebalance instead of a step before the
actual rebalance. `AwaitingSync` could also be `CompletingRebalance`, but
not sure if that's better.

Ismael

On Mon, Jul 24, 2017 at 7:02 PM, Guozhang Wang  wrote:

> Actually Rebalancing includes two steps, and we name them PrepareRebalance
> and WaitSync (arguably they may not be the best names). But these two steps
> together should be treated as the complete rebalance cycle.
>
>
> Guozhang
>
> On Mon, Jul 24, 2017 at 10:46 AM, Colin McCabe  wrote:
>
> > Hi all,
> >
> > I think maybe it makes sense to rename the "PreparingRebalance" consumer
> > group state to "Rebalancing."  To me, "Preparing" implies that there
> > will be a later "rebalance" state that follows-- but there is not.
> > Since we're now exposing this state name publicly in these metrics,
> > perhaps it makes sense to do this rename now.  Thoughts?
> >
> > best,
> > Colin
> >
> >
> > On Fri, Jul 21, 2017, at 13:52, Colin McCabe wrote:
> > > That's a good point.  I revised the KIP to add metrics for all the
> group
> > > states.
> > >
> > > best,
> > > Colin
> > >
> > >
> > > On Fri, Jul 21, 2017, at 12:08, Guozhang Wang wrote:
> > > > Ah, that's right Jason.
> > > >
> > > > With that I can be convinced to add one metric per each state.
> > > >
> > > > Guozhang
> > > >
> > > > On Fri, Jul 21, 2017 at 11:44 AM, Jason Gustafson <
> ja...@confluent.io>
> > > > wrote:
> > > >
> > > > > >
> > > > > > "Dead" and "Empty" states are transient: groups usually only
> > leaves in
> > > > > this
> > > > > > state for a short while and then being deleted or transited to
> > other
> > > > > > states.
> > > > >
> > > > >
> > > > > This is not strictly true for the "Empty" state which we also use
> to
> > > > > represent simple groups which only use the coordinator to store
> > offsets. I
> > > > > think we may as well cover all the states if we're going to cover
> > any of
> > > > > them specifically.
> > > > >
> > > > > -Jason
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Jul 21, 2017 at 9:45 AM, Guozhang Wang  >
> > wrote:
> > > > >
> > > > > > My two cents:
> > > > > >
> > > > > > "Dead" and "Empty" states are transient: groups usually only
> > leaves in
> > > > > this
> > > > > > state for a short while and then being deleted or transited to
> > other
> > > > > > states.
> > > > > >
> > > > > > Since we have the existing "*NumGroups*" metric, `*NumGroups -
> > > > > > **NumGroupsRebalancing
> > > > > > - **NumGroupsAwaitingSync`* should cover the above three, where
> > "Stable"
> > > > > > should be contributing most of the counts: If we have a bug that
> > causes
> > > > > the
> > > > > > num.Dead / Empty to keep increasing, then we would observe
> > `NumGroups`
> > > > > keep
> > > > > > increasing which should be sufficient for alerting. And trouble
> > shooting
> > > > > of
> > > > > > the issue could be relying on the log4j.
> > > > > >
> > > > > > *Guozhang*
> > > > > >
> > > > > > On Fri, Jul 21, 2017 at 7:19 AM, Ismael Juma 
> > wrote:
> > > > > >
> > > > > > > Thanks for the KIP, Colin. This will definitely be useful. One
> > > > > question:
> > > > > > > would it be useful to have a metric for for the number of
> groups
> > in
> > > > > each
> > > > > > > possible state? The KIP suggests "PreparingRebalance" and
> > > > > "AwaitingSync".
> > > > > > > That leaves "Stable", "Dead" and "Empty". Are those not useful?
> > > > > > >
> > > > > > > Ismael
> > > > > > >
> > > > > > > On Thu, Jul 20, 2017 at 6:52 PM, Colin McCabe <
> > cmcc...@apache.org>
> > > > > > wrote:
> > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > I posted "KIP-180: Add a broker metric specifying the number
> of
> > > > > > consumer
> > > > > > > > group rebalances in progress" for discussion:
> > > > > > > >
> > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > > > 180%3A+Add+a+broker+metric+specifying+the+number+of+
> > > > > > > > consumer+group+rebalances+in+progress
> > > > > > > >
> > > > > > > > Check it out.
> > > > > > > >
> > > > > > > > cheers,
> > > > > > > > Colin
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > -- Guozhang
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > -- Guozhang
> >
>
>
>
> --
> -- Guozhang
>


Re: [DISCUSS] KIP-180: Add a broker metric specifying the number of consumer group rebalances in progress

2017-07-24 Thread Guozhang Wang
Actually Rebalancing includes two steps, and we name them PrepareRebalance
and WaitSync (arguably they may not be the best names). But these two steps
together should be treated as the complete rebalance cycle.


Guozhang

On Mon, Jul 24, 2017 at 10:46 AM, Colin McCabe  wrote:

> Hi all,
>
> I think maybe it makes sense to rename the "PreparingRebalance" consumer
> group state to "Rebalancing."  To me, "Preparing" implies that there
> will be a later "rebalance" state that follows-- but there is not.
> Since we're now exposing this state name publicly in these metrics,
> perhaps it makes sense to do this rename now.  Thoughts?
>
> best,
> Colin
>
>
> On Fri, Jul 21, 2017, at 13:52, Colin McCabe wrote:
> > That's a good point.  I revised the KIP to add metrics for all the group
> > states.
> >
> > best,
> > Colin
> >
> >
> > On Fri, Jul 21, 2017, at 12:08, Guozhang Wang wrote:
> > > Ah, that's right Jason.
> > >
> > > With that I can be convinced to add one metric per each state.
> > >
> > > Guozhang
> > >
> > > On Fri, Jul 21, 2017 at 11:44 AM, Jason Gustafson 
> > > wrote:
> > >
> > > > >
> > > > > "Dead" and "Empty" states are transient: groups usually only
> leaves in
> > > > this
> > > > > state for a short while and then being deleted or transited to
> other
> > > > > states.
> > > >
> > > >
> > > > This is not strictly true for the "Empty" state which we also use to
> > > > represent simple groups which only use the coordinator to store
> offsets. I
> > > > think we may as well cover all the states if we're going to cover
> any of
> > > > them specifically.
> > > >
> > > > -Jason
> > > >
> > > >
> > > >
> > > > On Fri, Jul 21, 2017 at 9:45 AM, Guozhang Wang 
> wrote:
> > > >
> > > > > My two cents:
> > > > >
> > > > > "Dead" and "Empty" states are transient: groups usually only
> leaves in
> > > > this
> > > > > state for a short while and then being deleted or transited to
> other
> > > > > states.
> > > > >
> > > > > Since we have the existing "*NumGroups*" metric, `*NumGroups -
> > > > > **NumGroupsRebalancing
> > > > > - **NumGroupsAwaitingSync`* should cover the above three, where
> "Stable"
> > > > > should be contributing most of the counts: If we have a bug that
> causes
> > > > the
> > > > > num.Dead / Empty to keep increasing, then we would observe
> `NumGroups`
> > > > keep
> > > > > increasing which should be sufficient for alerting. And trouble
> shooting
> > > > of
> > > > > the issue could be relying on the log4j.
> > > > >
> > > > > *Guozhang*
> > > > >
> > > > > On Fri, Jul 21, 2017 at 7:19 AM, Ismael Juma 
> wrote:
> > > > >
> > > > > > Thanks for the KIP, Colin. This will definitely be useful. One
> > > > question:
> > > > > > would it be useful to have a metric for for the number of groups
> in
> > > > each
> > > > > > possible state? The KIP suggests "PreparingRebalance" and
> > > > "AwaitingSync".
> > > > > > That leaves "Stable", "Dead" and "Empty". Are those not useful?
> > > > > >
> > > > > > Ismael
> > > > > >
> > > > > > On Thu, Jul 20, 2017 at 6:52 PM, Colin McCabe <
> cmcc...@apache.org>
> > > > > wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I posted "KIP-180: Add a broker metric specifying the number of
> > > > > consumer
> > > > > > > group rebalances in progress" for discussion:
> > > > > > >
> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > > 180%3A+Add+a+broker+metric+specifying+the+number+of+
> > > > > > > consumer+group+rebalances+in+progress
> > > > > > >
> > > > > > > Check it out.
> > > > > > >
> > > > > > > cheers,
> > > > > > > Colin
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > -- Guozhang
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > -- Guozhang
>



-- 
-- Guozhang


Re: [DISCUSS] KIP-180: Add a broker metric specifying the number of consumer group rebalances in progress

2017-07-24 Thread Ismael Juma
Sounds good to me.

Ismael

On Mon, Jul 24, 2017 at 6:46 PM, Colin McCabe  wrote:

> Hi all,
>
> I think maybe it makes sense to rename the "PreparingRebalance" consumer
> group state to "Rebalancing."  To me, "Preparing" implies that there
> will be a later "rebalance" state that follows-- but there is not.
> Since we're now exposing this state name publicly in these metrics,
> perhaps it makes sense to do this rename now.  Thoughts?
>
> best,
> Colin
>
>
> On Fri, Jul 21, 2017, at 13:52, Colin McCabe wrote:
> > That's a good point.  I revised the KIP to add metrics for all the group
> > states.
> >
> > best,
> > Colin
> >
> >
> > On Fri, Jul 21, 2017, at 12:08, Guozhang Wang wrote:
> > > Ah, that's right Jason.
> > >
> > > With that I can be convinced to add one metric per each state.
> > >
> > > Guozhang
> > >
> > > On Fri, Jul 21, 2017 at 11:44 AM, Jason Gustafson 
> > > wrote:
> > >
> > > > >
> > > > > "Dead" and "Empty" states are transient: groups usually only
> leaves in
> > > > this
> > > > > state for a short while and then being deleted or transited to
> other
> > > > > states.
> > > >
> > > >
> > > > This is not strictly true for the "Empty" state which we also use to
> > > > represent simple groups which only use the coordinator to store
> offsets. I
> > > > think we may as well cover all the states if we're going to cover
> any of
> > > > them specifically.
> > > >
> > > > -Jason
> > > >
> > > >
> > > >
> > > > On Fri, Jul 21, 2017 at 9:45 AM, Guozhang Wang 
> wrote:
> > > >
> > > > > My two cents:
> > > > >
> > > > > "Dead" and "Empty" states are transient: groups usually only
> leaves in
> > > > this
> > > > > state for a short while and then being deleted or transited to
> other
> > > > > states.
> > > > >
> > > > > Since we have the existing "*NumGroups*" metric, `*NumGroups -
> > > > > **NumGroupsRebalancing
> > > > > - **NumGroupsAwaitingSync`* should cover the above three, where
> "Stable"
> > > > > should be contributing most of the counts: If we have a bug that
> causes
> > > > the
> > > > > num.Dead / Empty to keep increasing, then we would observe
> `NumGroups`
> > > > keep
> > > > > increasing which should be sufficient for alerting. And trouble
> shooting
> > > > of
> > > > > the issue could be relying on the log4j.
> > > > >
> > > > > *Guozhang*
> > > > >
> > > > > On Fri, Jul 21, 2017 at 7:19 AM, Ismael Juma 
> wrote:
> > > > >
> > > > > > Thanks for the KIP, Colin. This will definitely be useful. One
> > > > question:
> > > > > > would it be useful to have a metric for for the number of groups
> in
> > > > each
> > > > > > possible state? The KIP suggests "PreparingRebalance" and
> > > > "AwaitingSync".
> > > > > > That leaves "Stable", "Dead" and "Empty". Are those not useful?
> > > > > >
> > > > > > Ismael
> > > > > >
> > > > > > On Thu, Jul 20, 2017 at 6:52 PM, Colin McCabe <
> cmcc...@apache.org>
> > > > > wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I posted "KIP-180: Add a broker metric specifying the number of
> > > > > consumer
> > > > > > > group rebalances in progress" for discussion:
> > > > > > >
> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > > 180%3A+Add+a+broker+metric+specifying+the+number+of+
> > > > > > > consumer+group+rebalances+in+progress
> > > > > > >
> > > > > > > Check it out.
> > > > > > >
> > > > > > > cheers,
> > > > > > > Colin
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > -- Guozhang
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > -- Guozhang
>


Re: [DISCUSS] KIP-180: Add a broker metric specifying the number of consumer group rebalances in progress

2017-07-24 Thread Colin McCabe
Hi all,

I think maybe it makes sense to rename the "PreparingRebalance" consumer
group state to "Rebalancing."  To me, "Preparing" implies that there
will be a later "rebalance" state that follows-- but there is not. 
Since we're now exposing this state name publicly in these metrics,
perhaps it makes sense to do this rename now.  Thoughts?

best,
Colin


On Fri, Jul 21, 2017, at 13:52, Colin McCabe wrote:
> That's a good point.  I revised the KIP to add metrics for all the group
> states.
> 
> best,
> Colin
> 
> 
> On Fri, Jul 21, 2017, at 12:08, Guozhang Wang wrote:
> > Ah, that's right Jason.
> > 
> > With that I can be convinced to add one metric per each state.
> > 
> > Guozhang
> > 
> > On Fri, Jul 21, 2017 at 11:44 AM, Jason Gustafson 
> > wrote:
> > 
> > > >
> > > > "Dead" and "Empty" states are transient: groups usually only leaves in
> > > this
> > > > state for a short while and then being deleted or transited to other
> > > > states.
> > >
> > >
> > > This is not strictly true for the "Empty" state which we also use to
> > > represent simple groups which only use the coordinator to store offsets. I
> > > think we may as well cover all the states if we're going to cover any of
> > > them specifically.
> > >
> > > -Jason
> > >
> > >
> > >
> > > On Fri, Jul 21, 2017 at 9:45 AM, Guozhang Wang  wrote:
> > >
> > > > My two cents:
> > > >
> > > > "Dead" and "Empty" states are transient: groups usually only leaves in
> > > this
> > > > state for a short while and then being deleted or transited to other
> > > > states.
> > > >
> > > > Since we have the existing "*NumGroups*" metric, `*NumGroups -
> > > > **NumGroupsRebalancing
> > > > - **NumGroupsAwaitingSync`* should cover the above three, where "Stable"
> > > > should be contributing most of the counts: If we have a bug that causes
> > > the
> > > > num.Dead / Empty to keep increasing, then we would observe `NumGroups`
> > > keep
> > > > increasing which should be sufficient for alerting. And trouble shooting
> > > of
> > > > the issue could be relying on the log4j.
> > > >
> > > > *Guozhang*
> > > >
> > > > On Fri, Jul 21, 2017 at 7:19 AM, Ismael Juma  wrote:
> > > >
> > > > > Thanks for the KIP, Colin. This will definitely be useful. One
> > > question:
> > > > > would it be useful to have a metric for for the number of groups in
> > > each
> > > > > possible state? The KIP suggests "PreparingRebalance" and
> > > "AwaitingSync".
> > > > > That leaves "Stable", "Dead" and "Empty". Are those not useful?
> > > > >
> > > > > Ismael
> > > > >
> > > > > On Thu, Jul 20, 2017 at 6:52 PM, Colin McCabe 
> > > > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > I posted "KIP-180: Add a broker metric specifying the number of
> > > > consumer
> > > > > > group rebalances in progress" for discussion:
> > > > > >
> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > 180%3A+Add+a+broker+metric+specifying+the+number+of+
> > > > > > consumer+group+rebalances+in+progress
> > > > > >
> > > > > > Check it out.
> > > > > >
> > > > > > cheers,
> > > > > > Colin
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > -- Guozhang
> > > >
> > >
> > 
> > 
> > 
> > -- 
> > -- Guozhang


Re: [DISCUSS] KIP-180: Add a broker metric specifying the number of consumer group rebalances in progress

2017-07-21 Thread Colin McCabe
That's a good point.  I revised the KIP to add metrics for all the group
states.

best,
Colin


On Fri, Jul 21, 2017, at 12:08, Guozhang Wang wrote:
> Ah, that's right Jason.
> 
> With that I can be convinced to add one metric per each state.
> 
> Guozhang
> 
> On Fri, Jul 21, 2017 at 11:44 AM, Jason Gustafson 
> wrote:
> 
> > >
> > > "Dead" and "Empty" states are transient: groups usually only leaves in
> > this
> > > state for a short while and then being deleted or transited to other
> > > states.
> >
> >
> > This is not strictly true for the "Empty" state which we also use to
> > represent simple groups which only use the coordinator to store offsets. I
> > think we may as well cover all the states if we're going to cover any of
> > them specifically.
> >
> > -Jason
> >
> >
> >
> > On Fri, Jul 21, 2017 at 9:45 AM, Guozhang Wang  wrote:
> >
> > > My two cents:
> > >
> > > "Dead" and "Empty" states are transient: groups usually only leaves in
> > this
> > > state for a short while and then being deleted or transited to other
> > > states.
> > >
> > > Since we have the existing "*NumGroups*" metric, `*NumGroups -
> > > **NumGroupsRebalancing
> > > - **NumGroupsAwaitingSync`* should cover the above three, where "Stable"
> > > should be contributing most of the counts: If we have a bug that causes
> > the
> > > num.Dead / Empty to keep increasing, then we would observe `NumGroups`
> > keep
> > > increasing which should be sufficient for alerting. And trouble shooting
> > of
> > > the issue could be relying on the log4j.
> > >
> > > *Guozhang*
> > >
> > > On Fri, Jul 21, 2017 at 7:19 AM, Ismael Juma  wrote:
> > >
> > > > Thanks for the KIP, Colin. This will definitely be useful. One
> > question:
> > > > would it be useful to have a metric for for the number of groups in
> > each
> > > > possible state? The KIP suggests "PreparingRebalance" and
> > "AwaitingSync".
> > > > That leaves "Stable", "Dead" and "Empty". Are those not useful?
> > > >
> > > > Ismael
> > > >
> > > > On Thu, Jul 20, 2017 at 6:52 PM, Colin McCabe 
> > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I posted "KIP-180: Add a broker metric specifying the number of
> > > consumer
> > > > > group rebalances in progress" for discussion:
> > > > >
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > 180%3A+Add+a+broker+metric+specifying+the+number+of+
> > > > > consumer+group+rebalances+in+progress
> > > > >
> > > > > Check it out.
> > > > >
> > > > > cheers,
> > > > > Colin
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > -- Guozhang
> > >
> >
> 
> 
> 
> -- 
> -- Guozhang


Re: [DISCUSS] KIP-180: Add a broker metric specifying the number of consumer group rebalances in progress

2017-07-21 Thread Guozhang Wang
Ah, that's right Jason.

With that I can be convinced to add one metric per each state.

Guozhang

On Fri, Jul 21, 2017 at 11:44 AM, Jason Gustafson 
wrote:

> >
> > "Dead" and "Empty" states are transient: groups usually only leaves in
> this
> > state for a short while and then being deleted or transited to other
> > states.
>
>
> This is not strictly true for the "Empty" state which we also use to
> represent simple groups which only use the coordinator to store offsets. I
> think we may as well cover all the states if we're going to cover any of
> them specifically.
>
> -Jason
>
>
>
> On Fri, Jul 21, 2017 at 9:45 AM, Guozhang Wang  wrote:
>
> > My two cents:
> >
> > "Dead" and "Empty" states are transient: groups usually only leaves in
> this
> > state for a short while and then being deleted or transited to other
> > states.
> >
> > Since we have the existing "*NumGroups*" metric, `*NumGroups -
> > **NumGroupsRebalancing
> > - **NumGroupsAwaitingSync`* should cover the above three, where "Stable"
> > should be contributing most of the counts: If we have a bug that causes
> the
> > num.Dead / Empty to keep increasing, then we would observe `NumGroups`
> keep
> > increasing which should be sufficient for alerting. And trouble shooting
> of
> > the issue could be relying on the log4j.
> >
> > *Guozhang*
> >
> > On Fri, Jul 21, 2017 at 7:19 AM, Ismael Juma  wrote:
> >
> > > Thanks for the KIP, Colin. This will definitely be useful. One
> question:
> > > would it be useful to have a metric for for the number of groups in
> each
> > > possible state? The KIP suggests "PreparingRebalance" and
> "AwaitingSync".
> > > That leaves "Stable", "Dead" and "Empty". Are those not useful?
> > >
> > > Ismael
> > >
> > > On Thu, Jul 20, 2017 at 6:52 PM, Colin McCabe 
> > wrote:
> > >
> > > > Hi all,
> > > >
> > > > I posted "KIP-180: Add a broker metric specifying the number of
> > consumer
> > > > group rebalances in progress" for discussion:
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > 180%3A+Add+a+broker+metric+specifying+the+number+of+
> > > > consumer+group+rebalances+in+progress
> > > >
> > > > Check it out.
> > > >
> > > > cheers,
> > > > Colin
> > > >
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>



-- 
-- Guozhang


Re: [DISCUSS] KIP-180: Add a broker metric specifying the number of consumer group rebalances in progress

2017-07-21 Thread Jason Gustafson
>
> "Dead" and "Empty" states are transient: groups usually only leaves in this
> state for a short while and then being deleted or transited to other
> states.


This is not strictly true for the "Empty" state which we also use to
represent simple groups which only use the coordinator to store offsets. I
think we may as well cover all the states if we're going to cover any of
them specifically.

-Jason



On Fri, Jul 21, 2017 at 9:45 AM, Guozhang Wang  wrote:

> My two cents:
>
> "Dead" and "Empty" states are transient: groups usually only leaves in this
> state for a short while and then being deleted or transited to other
> states.
>
> Since we have the existing "*NumGroups*" metric, `*NumGroups -
> **NumGroupsRebalancing
> - **NumGroupsAwaitingSync`* should cover the above three, where "Stable"
> should be contributing most of the counts: If we have a bug that causes the
> num.Dead / Empty to keep increasing, then we would observe `NumGroups` keep
> increasing which should be sufficient for alerting. And trouble shooting of
> the issue could be relying on the log4j.
>
> *Guozhang*
>
> On Fri, Jul 21, 2017 at 7:19 AM, Ismael Juma  wrote:
>
> > Thanks for the KIP, Colin. This will definitely be useful. One question:
> > would it be useful to have a metric for for the number of groups in each
> > possible state? The KIP suggests "PreparingRebalance" and "AwaitingSync".
> > That leaves "Stable", "Dead" and "Empty". Are those not useful?
> >
> > Ismael
> >
> > On Thu, Jul 20, 2017 at 6:52 PM, Colin McCabe 
> wrote:
> >
> > > Hi all,
> > >
> > > I posted "KIP-180: Add a broker metric specifying the number of
> consumer
> > > group rebalances in progress" for discussion:
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 180%3A+Add+a+broker+metric+specifying+the+number+of+
> > > consumer+group+rebalances+in+progress
> > >
> > > Check it out.
> > >
> > > cheers,
> > > Colin
> > >
> >
>
>
>
> --
> -- Guozhang
>


Re: [DISCUSS] KIP-180: Add a broker metric specifying the number of consumer group rebalances in progress

2017-07-21 Thread Guozhang Wang
My two cents:

"Dead" and "Empty" states are transient: groups usually only leaves in this
state for a short while and then being deleted or transited to other states.

Since we have the existing "*NumGroups*" metric, `*NumGroups -
**NumGroupsRebalancing
- **NumGroupsAwaitingSync`* should cover the above three, where "Stable"
should be contributing most of the counts: If we have a bug that causes the
num.Dead / Empty to keep increasing, then we would observe `NumGroups` keep
increasing which should be sufficient for alerting. And trouble shooting of
the issue could be relying on the log4j.

*Guozhang*

On Fri, Jul 21, 2017 at 7:19 AM, Ismael Juma  wrote:

> Thanks for the KIP, Colin. This will definitely be useful. One question:
> would it be useful to have a metric for for the number of groups in each
> possible state? The KIP suggests "PreparingRebalance" and "AwaitingSync".
> That leaves "Stable", "Dead" and "Empty". Are those not useful?
>
> Ismael
>
> On Thu, Jul 20, 2017 at 6:52 PM, Colin McCabe  wrote:
>
> > Hi all,
> >
> > I posted "KIP-180: Add a broker metric specifying the number of consumer
> > group rebalances in progress" for discussion:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 180%3A+Add+a+broker+metric+specifying+the+number+of+
> > consumer+group+rebalances+in+progress
> >
> > Check it out.
> >
> > cheers,
> > Colin
> >
>



-- 
-- Guozhang


Re: [DISCUSS] KIP-180: Add a broker metric specifying the number of consumer group rebalances in progress

2017-07-21 Thread Ismael Juma
Thanks for the KIP, Colin. This will definitely be useful. One question:
would it be useful to have a metric for for the number of groups in each
possible state? The KIP suggests "PreparingRebalance" and "AwaitingSync".
That leaves "Stable", "Dead" and "Empty". Are those not useful?

Ismael

On Thu, Jul 20, 2017 at 6:52 PM, Colin McCabe  wrote:

> Hi all,
>
> I posted "KIP-180: Add a broker metric specifying the number of consumer
> group rebalances in progress" for discussion:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 180%3A+Add+a+broker+metric+specifying+the+number+of+
> consumer+group+rebalances+in+progress
>
> Check it out.
>
> cheers,
> Colin
>


Re: [DISCUSS] KIP-180: Add a broker metric specifying the number of consumer group rebalances in progress

2017-07-20 Thread Guozhang Wang
Colin,

Thanks for the writeup! This KIP lgtm.


Guozhang



On Thu, Jul 20, 2017 at 10:52 AM, Colin McCabe  wrote:

> Hi all,
>
> I posted "KIP-180: Add a broker metric specifying the number of consumer
> group rebalances in progress" for discussion:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 180%3A+Add+a+broker+metric+specifying+the+number+of+
> consumer+group+rebalances+in+progress
>
> Check it out.
>
> cheers,
> Colin
>



-- 
-- Guozhang