Re: [DISCUSS] KIP-726: Make the CooperativeStickyAssignor as the default assignor

2021-06-22 Thread Luke Chen
+1 to getting it into V3.1.
And the partial, risk-free proposal to change the default to ["range",
"cooperative-sticky"] in 3.0 is a brilliant idea.
Sounds good to me.

I can work on it for V3.0 if there are no objections.
Thank you.

Luke

On Wed, Jun 23, 2021 at 12:38 PM Ismael Juma  wrote:

> +1 to making the switch in 3.1 with more time versus rushing it in 3.0.
>
> Ismael
>
> On Tue, Jun 22, 2021 at 8:58 PM Sophie Blee-Goldman
>  wrote:
>
> > I believe we've figured out the root cause of KAFKA-12896, and should
> > have a fix prepared shortly. See the linked issues for more details.
> >
> > Regarding KIP-726 itself, given that the latest proposal is fully
> > compatible and does not require any breaking changes, we may or may not
> > push to get this into 3.0. Cooperative rebalancing is a huge improvement
> > for the majority of consumer applications, but it's already available for
> > those who want it, so I don't feel too badly about letting it slip from
> 3.0
> > in order to focus on other things. I would also personally feel better
> > about merging it at the beginning of 3.1 so we have the entire release
> > cycle to flush out any potential regressions, rather than rushing it in
> at
> > the last moment (even though the majority of the work has been done for a
> > while).
> >
> > If anyone feels strongly about getting this into 3.0, please let me know,
> > as it's definitely still within reach. Otherwise I'll probably aim for
> 3.1
> > instead.
> >
> > Note (@Luke in particular): we could opt for a partial and risk-free
> > improvement in 3.0, that would make it easier for users to turn on
> > cooperative rebalancing without actually enabling it for them (yet). If
> we
> > change the default config to ["range", "cooperative-sticky"] in 3.0, then
> > the RangeAssignor will remain the default but any application can upgrade
> > to the CooperativeStickyAssignor with just a single rolling bounce (to
> > remove the "range" assignor), rather than the usual two. We can then go
> > ahead and fully make the CooperativeStickyAssignor the default assignor
> in
> > 3.1 by changing the default config to ["cooperative-sticky", "range"] as
> > this KIP has proposed.
> >
> > Thoughts?
> >
> > On Thu, Jun 10, 2021 at 1:46 AM Luke Chen  wrote:
> >
> > > Hi Ryan,
> > > Thanks for your good comments. I've listed your comments in "Rejected
> > > Alternatives" in KIP.
> > >
> > > 1. Some cooperative-sticky related defects might not free before V3.0
> > > → We've marked important defects as blocker for V3.0, ex:
> > KAFKA-12896.
> > > Please raise any important defect if you found any.
> > >
> > > 2. Cooperative-sticky assignor is also very new for C/C++ users in
> > > librdkafka, so not many in that community have tried incremental
> > > cooperative yet. And bugs are still recently being worked out there
> too.
> > > → Thanks for raising this. I checked this library and found
> currently
> > > only 1 cooperative-sticky related bug open, which is good (10 bugs are
> > > fixed). Anyway, I think the clients can always change the assignor to
> > other
> > > assignors if there are still bugs in the library.
> > >
> > > Thank you.
> > > Luke
> > >
> > > On Thu, Jun 10, 2021 at 12:40 PM Ryan Leslie 
> > > wrote:
> > >
> > > > Thanks for the quick replies, Luke and Sophie.
> > > >
> > > > I've not voted, but I agree with accepting the KIP since it's a
> > superior
> > > > feature. I was just reacting mostly to this comment since it didn't
> > > mention
> > > > open issues:
> > > >
> > > > > > > Thanks Luke. We may as well get this KIP in to 3.0 so that we
> can
> > > > fully
> > > > > > > enable cooperative rebalancing
> > > > > > > by default in 3.0 if we have KAFKA-12477 done in time, and if
> we
> > > > don't
> > > > > > then
> > > > > > > there's no harm as it's
> > > > > > > not going to change the behavior.
> > > >
> > > > But I see now, as Luke said, that the main issue is already
> considered
> > a
> > > > blocker so it was assumed. Though, I did also wonder if any bugs that
> > may
> > > > have existed since several version ago should actually hold up 3.0,
> > > which I
> > > > know is especially about moving away from ZooKeeper.
> > > >
> > > > My sentiment was just that during many release cycles of Kafka since
> > > > cooperative was introduced, there have been issues discovered. And
> that
> > > > makes sense given that the implementation was complex and quite a lot
> > of
> > > > code changed to make it happen. Hopefully the last of the kinks will
> > have
> > > > been worked out before 3.0. I just wondered if it should be a default
> > in
> > > > 3.0 if it hasn't yet been free of defects for a significant period of
> > > time.
> > > > KIP-726 doesn't list any drawbacks for cooperative-sticky, but
> perhaps
> > > this
> > > > is one. I also appreciate that it's already successfully adopted by
> > many,
> > > > particularly streams / connect users. But 

Re: [DISCUSS] KIP-726: Make the CooperativeStickyAssignor as the default assignor

2021-06-22 Thread Ismael Juma
+1 to making the switch in 3.1 with more time versus rushing it in 3.0.

Ismael

On Tue, Jun 22, 2021 at 8:58 PM Sophie Blee-Goldman
 wrote:

> I believe we've figured out the root cause of KAFKA-12896, and should
> have a fix prepared shortly. See the linked issues for more details.
>
> Regarding KIP-726 itself, given that the latest proposal is fully
> compatible and does not require any breaking changes, we may or may not
> push to get this into 3.0. Cooperative rebalancing is a huge improvement
> for the majority of consumer applications, but it's already available for
> those who want it, so I don't feel too badly about letting it slip from 3.0
> in order to focus on other things. I would also personally feel better
> about merging it at the beginning of 3.1 so we have the entire release
> cycle to flush out any potential regressions, rather than rushing it in at
> the last moment (even though the majority of the work has been done for a
> while).
>
> If anyone feels strongly about getting this into 3.0, please let me know,
> as it's definitely still within reach. Otherwise I'll probably aim for 3.1
> instead.
>
> Note (@Luke in particular): we could opt for a partial and risk-free
> improvement in 3.0, that would make it easier for users to turn on
> cooperative rebalancing without actually enabling it for them (yet). If we
> change the default config to ["range", "cooperative-sticky"] in 3.0, then
> the RangeAssignor will remain the default but any application can upgrade
> to the CooperativeStickyAssignor with just a single rolling bounce (to
> remove the "range" assignor), rather than the usual two. We can then go
> ahead and fully make the CooperativeStickyAssignor the default assignor in
> 3.1 by changing the default config to ["cooperative-sticky", "range"] as
> this KIP has proposed.
>
> Thoughts?
>
> On Thu, Jun 10, 2021 at 1:46 AM Luke Chen  wrote:
>
> > Hi Ryan,
> > Thanks for your good comments. I've listed your comments in "Rejected
> > Alternatives" in KIP.
> >
> > 1. Some cooperative-sticky related defects might not free before V3.0
> > → We've marked important defects as blocker for V3.0, ex:
> KAFKA-12896.
> > Please raise any important defect if you found any.
> >
> > 2. Cooperative-sticky assignor is also very new for C/C++ users in
> > librdkafka, so not many in that community have tried incremental
> > cooperative yet. And bugs are still recently being worked out there too.
> > → Thanks for raising this. I checked this library and found currently
> > only 1 cooperative-sticky related bug open, which is good (10 bugs are
> > fixed). Anyway, I think the clients can always change the assignor to
> other
> > assignors if there are still bugs in the library.
> >
> > Thank you.
> > Luke
> >
> > On Thu, Jun 10, 2021 at 12:40 PM Ryan Leslie 
> > wrote:
> >
> > > Thanks for the quick replies, Luke and Sophie.
> > >
> > > I've not voted, but I agree with accepting the KIP since it's a
> superior
> > > feature. I was just reacting mostly to this comment since it didn't
> > mention
> > > open issues:
> > >
> > > > > > Thanks Luke. We may as well get this KIP in to 3.0 so that we can
> > > fully
> > > > > > enable cooperative rebalancing
> > > > > > by default in 3.0 if we have KAFKA-12477 done in time, and if we
> > > don't
> > > > > then
> > > > > > there's no harm as it's
> > > > > > not going to change the behavior.
> > >
> > > But I see now, as Luke said, that the main issue is already considered
> a
> > > blocker so it was assumed. Though, I did also wonder if any bugs that
> may
> > > have existed since several version ago should actually hold up 3.0,
> > which I
> > > know is especially about moving away from ZooKeeper.
> > >
> > > My sentiment was just that during many release cycles of Kafka since
> > > cooperative was introduced, there have been issues discovered. And that
> > > makes sense given that the implementation was complex and quite a lot
> of
> > > code changed to make it happen. Hopefully the last of the kinks will
> have
> > > been worked out before 3.0. I just wondered if it should be a default
> in
> > > 3.0 if it hasn't yet been free of defects for a significant period of
> > time.
> > > KIP-726 doesn't list any drawbacks for cooperative-sticky, but perhaps
> > this
> > > is one. I also appreciate that it's already successfully adopted by
> many,
> > > particularly streams / connect users. But this may also be where the
> > > feature has the most benefit due to expensive setup/teardown during
> > > rebalance, and stop-the-world can be less of a concern for many
> "regular
> > > consumers".
> > >
> > > This is probably irrelevant here, but another thing to mention is that
> > the
> > > feature is also very new for C/C++ users in librdkafka, so not many in
> > that
> > > community have tried incremental cooperative yet. And bugs are still
> > > recently being worked out there too.
> > >
> > > Just playing 

Re: [DISCUSS] KIP-726: Make the CooperativeStickyAssignor as the default assignor

2021-06-22 Thread Sophie Blee-Goldman
I believe we've figured out the root cause of KAFKA-12896, and should have
a fix prepared shortly. See the linked issues for more details.

Regarding KIP-726 itself, given that the latest proposal is fully
compatible and does not require any breaking changes, we may or may not
push to get this into 3.0. Cooperative rebalancing is a huge improvement
for the majority of consumer applications, but it's already available for
those who want it, so I don't feel too badly about letting it slip from 3.0
in order to focus on other things. I would also personally feel better
about merging it at the beginning of 3.1 so we have the entire release
cycle to flush out any potential regressions, rather than rushing it in at
the last moment (even though the majority of the work has been done for a
while).

If anyone feels strongly about getting this into 3.0, please let me know,
as it's definitely still within reach. Otherwise I'll probably aim for 3.1
instead.

Note (@Luke in particular): we could opt for a partial and risk-free
improvement in 3.0 that would make it easier for users to turn on
cooperative rebalancing without actually enabling it for them (yet). If we
change the default config to ["range", "cooperative-sticky"] in 3.0, then
the RangeAssignor will remain the default but any application can upgrade
to the CooperativeStickyAssignor with just a single rolling bounce (to
remove the "range" assignor), rather than the usual two. We can then go
ahead and fully make the CooperativeStickyAssignor the default assignor in
3.1 by changing the default config to ["cooperative-sticky", "range"] as
this KIP has proposed.
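
To make the single-bounce path concrete, here is a minimal sketch of the two
config states, assuming the list is expressed through the consumer's
partition.assignment.strategy property. It uses plain java.util.Properties so
it runs without the Kafka client on the classpath. The class and method names
(AssignorUpgradeSketch, proposedDefault30, afterRollingBounce) are
illustrative only and are not part of the KIP.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Illustrative sketch only: models the two config states discussed above.
final class AssignorUpgradeSketch {
    // Consumer config key that holds the ordered list of assignors.
    static final String STRATEGY_KEY = "partition.assignment.strategy";
    static final String RANGE =
            "org.apache.kafka.clients.consumer.RangeAssignor";
    static final String COOPERATIVE_STICKY =
            "org.apache.kafka.clients.consumer.CooperativeStickyAssignor";

    // Proposed 3.0 default: range is listed first, so it stays the
    // effective assignor, while cooperative-sticky is already advertised.
    static Properties proposedDefault30() {
        Properties props = new Properties();
        props.setProperty(STRATEGY_KEY, RANGE + "," + COOPERATIVE_STICKY);
        return props;
    }

    // What each member is restarted with during the single rolling bounce:
    // "range" is dropped, leaving only cooperative-sticky.
    static Properties afterRollingBounce() {
        Properties props = new Properties();
        props.setProperty(STRATEGY_KEY, COOPERATIVE_STICKY);
        return props;
    }

    // Helper (hypothetical): split the configured value into assignor names.
    static List<String> assignors(Properties props) {
        return Arrays.asList(props.getProperty(STRATEGY_KEY).split(","));
    }

    public static void main(String[] args) {
        System.out.println("3.0 proposal: " + assignors(proposedDefault30()));
        System.out.println("after bounce: " + assignors(afterRollingBounce()));
    }
}
```

Because every member would already list cooperative-sticky under the proposed
3.0 default, the first of the usual two bounces (adding the cooperative
assignor alongside the existing one) would no longer be needed.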

Thoughts?

On Thu, Jun 10, 2021 at 1:46 AM Luke Chen  wrote:

> Hi Ryan,
> Thanks for your good comments. I've listed your comments in "Rejected
> Alternatives" in KIP.
>
> 1. Some cooperative-sticky related defects might not free before V3.0
> → We've marked important defects as blocker for V3.0, ex: KAFKA-12896.
> Please raise any important defect if you found any.
>
> 2. Cooperative-sticky assignor is also very new for C/C++ users in
> librdkafka, so not many in that community have tried incremental
> cooperative yet. And bugs are still recently being worked out there too.
> → Thanks for raising this. I checked this library and found currently
> only 1 cooperative-sticky related bug open, which is good (10 bugs are
> fixed). Anyway, I think the clients can always change the assignor to other
> assignors if there are still bugs in the library.
>
> Thank you.
> Luke
>
> On Thu, Jun 10, 2021 at 12:40 PM Ryan Leslie 
> wrote:
>
> > Thanks for the quick replies, Luke and Sophie.
> >
> > I've not voted, but I agree with accepting the KIP since it's a superior
> > feature. I was just reacting mostly to this comment since it didn't
> mention
> > open issues:
> >
> > > > > Thanks Luke. We may as well get this KIP in to 3.0 so that we can
> > fully
> > > > > enable cooperative rebalancing
> > > > > by default in 3.0 if we have KAFKA-12477 done in time, and if we
> > don't
> > > > then
> > > > > there's no harm as it's
> > > > > not going to change the behavior.
> >
> > But I see now, as Luke said, that the main issue is already considered a
> > blocker so it was assumed. Though, I did also wonder if any bugs that may
> > have existed since several version ago should actually hold up 3.0,
> which I
> > know is especially about moving away from ZooKeeper.
> >
> > My sentiment was just that during many release cycles of Kafka since
> > cooperative was introduced, there have been issues discovered. And that
> > makes sense given that the implementation was complex and quite a lot of
> > code changed to make it happen. Hopefully the last of the kinks will have
> > been worked out before 3.0. I just wondered if it should be a default in
> > 3.0 if it hasn't yet been free of defects for a significant period of
> time.
> > KIP-726 doesn't list any drawbacks for cooperative-sticky, but perhaps
> this
> > is one. I also appreciate that it's already successfully adopted by many,
> > particularly streams / connect users. But this may also be where the
> > feature has the most benefit due to expensive setup/teardown during
> > rebalance, and stop-the-world can be less of a concern for many "regular
> > consumers".
> >
> > This is probably irrelevant here, but another thing to mention is that
> the
> > feature is also very new for C/C++ users in librdkafka, so not many in
> that
> > community have tried incremental cooperative yet. And bugs are still
> > recently being worked out there too.
> >
> > Just playing devil's advocate here, sorry to come across as a negative
> > nancy!
> >
> > On 2021/06/09 00:05:41, Sophie Blee-Goldman wrote:
> > > Hey Ryan,
> > >
> > > Yes, I believe any open bugs regarding the cooperative-sticky assignor
> > > should be considered as blockers
> > > to it being made the default, if not blockers to the release in
> general.
> > I

Re: [DISCUSS] KIP-726: Make the CooperativeStickyAssignor as the default assignor

2021-06-10 Thread Luke Chen
Hi Ryan,
Thanks for your good comments. I've listed your comments in "Rejected
Alternatives" in the KIP.

1. Some cooperative-sticky related defects might not be fixed before V3.0
→ We've marked important defects as blockers for V3.0, e.g. KAFKA-12896.
Please raise any important defects if you find any.

2. Cooperative-sticky assignor is also very new for C/C++ users in
librdkafka, so not many in that community have tried incremental
cooperative yet. And bugs are still recently being worked out there too.
→ Thanks for raising this. I checked this library and found that currently
only 1 cooperative-sticky related bug is open, which is good (10 bugs have
been fixed). Anyway, I think clients can always switch to another assignor
if there are still bugs in the library.

Thank you.
Luke

On Thu, Jun 10, 2021 at 12:40 PM Ryan Leslie  wrote:

> Thanks for the quick replies, Luke and Sophie.
>
> I've not voted, but I agree with accepting the KIP since it's a superior
> feature. I was just reacting mostly to this comment since it didn't mention
> open issues:
>
> > > > Thanks Luke. We may as well get this KIP in to 3.0 so that we can
> fully
> > > > enable cooperative rebalancing
> > > > by default in 3.0 if we have KAFKA-12477 done in time, and if we
> don't
> > > then
> > > > there's no harm as it's
> > > > not going to change the behavior.
>
> But I see now, as Luke said, that the main issue is already considered a
> blocker so it was assumed. Though, I did also wonder if any bugs that may
> have existed since several version ago should actually hold up 3.0, which I
> know is especially about moving away from ZooKeeper.
>
> My sentiment was just that during many release cycles of Kafka since
> cooperative was introduced, there have been issues discovered. And that
> makes sense given that the implementation was complex and quite a lot of
> code changed to make it happen. Hopefully the last of the kinks will have
> been worked out before 3.0. I just wondered if it should be a default in
> 3.0 if it hasn't yet been free of defects for a significant period of time.
> KIP-726 doesn't list any drawbacks for cooperative-sticky, but perhaps this
> is one. I also appreciate that it's already successfully adopted by many,
> particularly streams / connect users. But this may also be where the
> feature has the most benefit due to expensive setup/teardown during
> rebalance, and stop-the-world can be less of a concern for many "regular
> consumers".
>
> This is probably irrelevant here, but another thing to mention is that the
> feature is also very new for C/C++ users in librdkafka, so not many in that
> community have tried incremental cooperative yet. And bugs are still
> recently being worked out there too.
>
> Just playing devil's advocate here, sorry to come across as a negative
> nancy!
>
> On 2021/06/09 00:05:41, Sophie Blee-Goldman 
> wrote:
> > Hey Ryan,
> >
> > Yes, I believe any open bugs regarding the cooperative-sticky assignor
> > should be considered as blockers
> > to it being made the default, if not blockers to the release in general.
> I
> > don't think they need to block the
> > acceptance of this KIP, though, just possibly the implementation of it.
>


Re: [DISCUSS] KIP-726: Make the CooperativeStickyAssignor as the default assignor

2021-06-09 Thread Ryan Leslie
Thanks for the quick replies, Luke and Sophie.

I've not voted, but I agree with accepting the KIP since it's a superior 
feature. I was just reacting mostly to this comment since it didn't mention 
open issues:

> > > Thanks Luke. We may as well get this KIP in to 3.0 so that we can fully
> > > enable cooperative rebalancing
> > > by default in 3.0 if we have KAFKA-12477 done in time, and if we don't
> > then
> > > there's no harm as it's
> > > not going to change the behavior.

But I see now, as Luke said, that the main issue is already considered a
blocker so it was assumed. Though, I did also wonder if any bugs that may have
existed for several versions should actually hold up 3.0, which I know is
especially about moving away from ZooKeeper.

My sentiment was just that during many release cycles of Kafka since 
cooperative was introduced, there have been issues discovered. And that makes 
sense given that the implementation was complex and quite a lot of code changed 
to make it happen. Hopefully the last of the kinks will have been worked out 
before 3.0. I just wondered if it should be a default in 3.0 if it hasn't yet 
been free of defects for a significant period of time. KIP-726 doesn't list any 
drawbacks for cooperative-sticky, but perhaps this is one. I also appreciate 
that it's already successfully adopted by many, particularly streams / connect 
users. But this may also be where the feature has the most benefit due to 
expensive setup/teardown during rebalance, and stop-the-world can be less of a 
concern for many "regular consumers".

This is probably irrelevant here, but another thing to mention is that the 
feature is also very new for C/C++ users in librdkafka, so not many in that 
community have tried incremental cooperative yet. And bugs are still recently 
being worked out there too.

Just playing devil's advocate here, sorry to come across as a negative nancy!

On 2021/06/09 00:05:41, Sophie Blee-Goldman wrote:
> Hey Ryan,
> 
> Yes, I believe any open bugs regarding the cooperative-sticky assignor
> should be considered as blockers
> to it being made the default, if not blockers to the release in general. I
> don't think they need to block the
> acceptance of this KIP, though, just possibly the implementation of it.


Re: [DISCUSS] KIP-726: Make the CooperativeStickyAssignor as the default assignor

2021-06-08 Thread Luke Chen
Hi Konstantine,
Thanks for your good comments. I've updated the KIP.

Thank you.
Luke


On Wed, Jun 9, 2021 at 2:42 AM Konstantine Karantasis
 wrote:

> Thanks for the KIP Luke.
>
> Looks good overall. Just a few minor suggestions:
>
> 1. How about replacing
> "Note that this change will also propagate to Connect." with -> "Note that
> this change will also automatically be inherited by sink connectors, like
> any other application that uses Kafka consumers, as long as a consumer
> assignor is not explicitly defined in their configuration."
>
> The current sentence is a bit general and it would be good to avoid any
> confusion with Connect's rebalancing protocol for tasks, which already
> supports a single bounce upgrade given that rebalancing protocols in
> Connect have a linear lineage until today.
>
> 2. Let's remove the placeholder text from the Rejected Alternatives and
> simply state that there aren't any. Unless something is worth mentioning in
> that section.
>
> Thanks,
> Konstantine
>
> On Tue, Jun 8, 2021 at 5:20 AM Luke Chen  wrote:
>
> > Hi Ryan,
> > Thanks for the comments. The KAFKA-12896
> > <https://issues.apache.org/jira/browse/KAFKA-12896> is already set as
> > blocker for V3.0, which means, V3.0 won't be released before KAFKA-12896
> > <https://issues.apache.org/jira/browse/KAFKA-12896> is fixed. I think
> this
> > KIP and KAFKA-12896 <https://issues.apache.org/jira/browse/KAFKA-12896>
> > and
> > work in parallel for V3.0 without conflict.
> >
> > Thank you.
> > Luke
> >
> > On Tue, Jun 8, 2021 at 11:30 AM Ryan Leslie (BLOOMBERG/ 919 3RD A) <
> > rles...@bloomberg.net> wrote:
> >
> > > Hey guys,
> > >
> > > Should open bugs concerning cooperative-sticky also be considered
> > blockers
> > > to making it the default? For example, KAFKA-12896 is perhaps still
> being
> > > investigated:
> > >
> > > https://issues.apache.org/jira/browse/KAFKA-12896
> > >
> > > Thanks,
> > >
> > > Ryan
> > >
> > > From: dev@kafka.apache.org At: 06/07/21 19:37:45 UTC-4:00 To:
> > > dev@kafka.apache.org
> > > Subject: Re: [DISCUSS] KIP-726: Make the CooperativeStickyAssignor as
> the
> > > default assignor
> > >
> > > Thanks Luke. We may as well get this KIP in to 3.0 so that we can fully
> > > enable cooperative rebalancing
> > > by default in 3.0 if we have KAFKA-12477 done in time, and if we don't
> > then
> > > there's no harm as it's
> > > not going to change the behavior.
> > >
> > > On Wed, Jun 2, 2021 at 7:28 PM Luke Chen  wrote:
> > >
> > > > Hi Sophie,
> > > > Thanks for the reminder. Yes, I was thinking this KIP doesn't have to
> > be
> > > > put into a major release since it will be fully backward compatible,
> so
> > > no
> > > > need to push it. Currently, if we want to work on this KIP, we need
> > > > KAFKA-12477 and KAFKA-12487. But you're right, we can at least try
> our
> > > best
> > > > to see if we can make it into V3.0 since cooperative rebalancing is a
> > > major
> > > > improvement. I'll kick off a vote later.
> > > >
> > > > Thank you.
> > > > Luke
> > > >
> > > > On Thu, Jun 3, 2021 at 7:08 AM Sophie Blee-Goldman
> > > >  wrote:
> > > >
> > > > > Hey Luke,
> > > > >
> > > > > It's been a while since the last update on this, which is mostly my
> > > fault
> > > > > for picking up
> > > > > other things in the meantime. I'm planning to get back to work
> > > > > on KAFKA-12477 next
> > > > > week but there are minimal changes to the current implementation
> > given
> > > > the
> > > > > proposal
> > > > > I put forth earlier in this KIP discussion, so I think we're good
> to
> > > go.
> > > > >
> > > > > Although this KIP no longer requires a major release since it
> should
> > be
> > > > > fully compatible, I
> > > > > still hope we can get it in to 3.0 since cooperative rebalancing
> is a
> > > > major
> > > > > improvement to
> > > > > the life of a consumer group (and its operator). Can we make sure
> the
> > > KIP
> > > > > reflects the latest
> > > > > and then kick off a vote by next Monday at the late

Re: [DISCUSS] KIP-726: Make the CooperativeStickyAssignor as the default assignor

2021-06-08 Thread Sophie Blee-Goldman
Hey Ryan,

Yes, I believe any open bugs regarding the cooperative-sticky assignor
should be considered as blockers
to it being made the default, if not blockers to the release in general. I
don't think they need to block the
acceptance of this KIP, though, just possibly the implementation of it.

On Tue, Jun 8, 2021 at 11:42 AM Konstantine Karantasis
 wrote:

> Thanks for the KIP Luke.
>
> Looks good overall. Just a few minor suggestions:
>
> 1. How about replacing
> "Note that this change will also propagate to Connect." with -> "Note that
> this change will also automatically be inherited by sink connectors, like
> any other application that uses Kafka consumers, as long as a consumer
> assignor is not explicitly defined in their configuration."
>
> The current sentence is a bit general and it would be good to avoid any
> confusion with Connect's rebalancing protocol for tasks, which already
> supports a single bounce upgrade given that rebalancing protocols in
> Connect have a linear lineage until today.
>
> 2. Let's remove the placeholder text from the Rejected Alternatives and
> simply state that there aren't any. Unless something is worth mentioning in
> that section.
>
> Thanks,
> Konstantine
>
> On Tue, Jun 8, 2021 at 5:20 AM Luke Chen  wrote:
>
> > Hi Ryan,
> > Thanks for the comments. The KAFKA-12896
> > <https://issues.apache.org/jira/browse/KAFKA-12896> is already set as
> > blocker for V3.0, which means, V3.0 won't be released before KAFKA-12896
> > <https://issues.apache.org/jira/browse/KAFKA-12896> is fixed. I think
> this
> > KIP and KAFKA-12896 <https://issues.apache.org/jira/browse/KAFKA-12896>
> > and
> > work in parallel for V3.0 without conflict.
> >
> > Thank you.
> > Luke
> >
> > On Tue, Jun 8, 2021 at 11:30 AM Ryan Leslie (BLOOMBERG/ 919 3RD A) <
> > rles...@bloomberg.net> wrote:
> >
> > > Hey guys,
> > >
> > > Should open bugs concerning cooperative-sticky also be considered
> > blockers
> > > to making it the default? For example, KAFKA-12896 is perhaps still
> being
> > > investigated:
> > >
> > > https://issues.apache.org/jira/browse/KAFKA-12896
> > >
> > > Thanks,
> > >
> > > Ryan
> > >
> > > From: dev@kafka.apache.org At: 06/07/21 19:37:45 UTC-4:00 To:
> > > dev@kafka.apache.org
> > > Subject: Re: [DISCUSS] KIP-726: Make the CooperativeStickyAssignor as
> the
> > > default assignor
> > >
> > > Thanks Luke. We may as well get this KIP in to 3.0 so that we can fully
> > > enable cooperative rebalancing
> > > by default in 3.0 if we have KAFKA-12477 done in time, and if we don't
> > then
> > > there's no harm as it's
> > > not going to change the behavior.
> > >
> > > On Wed, Jun 2, 2021 at 7:28 PM Luke Chen  wrote:
> > >
> > > > Hi Sophie,
> > > > Thanks for the reminder. Yes, I was thinking this KIP doesn't have to
> > be
> > > > put into a major release since it will be fully backward compatible,
> so
> > > no
> > > > need to push it. Currently, if we want to work on this KIP, we need
> > > > KAFKA-12477 and KAFKA-12487. But you're right, we can at least try
> our
> > > best
> > > > to see if we can make it into V3.0 since cooperative rebalancing is a
> > > major
> > > > improvement. I'll kick off a vote later.
> > > >
> > > > Thank you.
> > > > Luke
> > > >
> > > > On Thu, Jun 3, 2021 at 7:08 AM Sophie Blee-Goldman
> > > >  wrote:
> > > >
> > > > > Hey Luke,
> > > > >
> > > > > It's been a while since the last update on this, which is mostly my
> > > fault
> > > > > for picking up
> > > > > other things in the meantime. I'm planning to get back to work
> > > > > on KAFKA-12477 next
> > > > > week but there are minimal changes to the current implementation
> > given
> > > > the
> > > > > proposal
> > > > > I put forth earlier in this KIP discussion, so I think we're good
> to
> > > go.
> > > > >
> > > > > Although this KIP no longer requires a major release since it
> should
> > be
> > > > > fully compatible, I
> > > > > still hope we can get it in to 3.0 since cooperative rebalancing
> is a
> > > > major
> > > > > improvement to
> > > > > t

Re: [DISCUSS] KIP-726: Make the CooperativeStickyAssignor as the default assignor

2021-06-08 Thread Konstantine Karantasis
Thanks for the KIP Luke.

Looks good overall. Just a few minor suggestions:

1. How about replacing
"Note that this change will also propagate to Connect." with -> "Note that
this change will also automatically be inherited by sink connectors, like
any other application that uses Kafka consumers, as long as a consumer
assignor is not explicitly defined in their configuration."

The current sentence is a bit general and it would be good to avoid any
confusion with Connect's rebalancing protocol for tasks, which already
supports a single bounce upgrade given that rebalancing protocols in
Connect have a linear lineage until today.

2. Let's remove the placeholder text from the Rejected Alternatives and
simply state that there aren't any. Unless something is worth mentioning in
that section.
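
As an illustration of the "explicitly defined" case in suggestion 1, the
sketch below shows the two places where a sink connector's consumer assignor
is usually pinned: a "consumer."-prefixed property in the worker config, or a
"consumer.override."-prefixed property in the connector config. Whether the
per-connector override is accepted depends on the worker's
connector.client.config.override.policy. The class and the pinsAssignor
helper are hypothetical, written only for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: detects whether a Connect config pins the
// consumer assignor, in which case a changed default would not apply.
final class SinkAssignorOverrideSketch {
    // Worker-level key: applies to the consumers of all sink tasks.
    static final String WORKER_KEY =
            "consumer.partition.assignment.strategy";
    // Connector-level key: applies to one connector, if the worker allows it.
    static final String CONNECTOR_KEY =
            "consumer.override.partition.assignment.strategy";

    // Hypothetical helper: true if either key is set explicitly.
    static boolean pinsAssignor(Map<String, String> config) {
        return config.containsKey(WORKER_KEY)
                || config.containsKey(CONNECTOR_KEY);
    }

    // Example worker config that keeps the RangeAssignor explicitly.
    static Map<String, String> workerWithExplicitAssignor() {
        Map<String, String> config = new HashMap<>();
        config.put(WORKER_KEY,
                "org.apache.kafka.clients.consumer.RangeAssignor");
        return config;
    }

    public static void main(String[] args) {
        System.out.println("empty config pins assignor: "
                + pinsAssignor(new HashMap<>()));
        System.out.println("explicit worker config pins assignor: "
                + pinsAssignor(workerWithExplicitAssignor()));
    }
}
```

A config with neither key set is the case that would inherit the new default.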

Thanks,
Konstantine

On Tue, Jun 8, 2021 at 5:20 AM Luke Chen  wrote:

> Hi Ryan,
> Thanks for the comments. The KAFKA-12896
> <https://issues.apache.org/jira/browse/KAFKA-12896> is already set as
> blocker for V3.0, which means, V3.0 won't be released before KAFKA-12896
> <https://issues.apache.org/jira/browse/KAFKA-12896> is fixed. I think this
> KIP and KAFKA-12896 <https://issues.apache.org/jira/browse/KAFKA-12896>
> and
> work in parallel for V3.0 without conflict.
>
> Thank you.
> Luke
>
> On Tue, Jun 8, 2021 at 11:30 AM Ryan Leslie (BLOOMBERG/ 919 3RD A) <
> rles...@bloomberg.net> wrote:
>
> > Hey guys,
> >
> > Should open bugs concerning cooperative-sticky also be considered
> blockers
> > to making it the default? For example, KAFKA-12896 is perhaps still being
> > investigated:
> >
> > https://issues.apache.org/jira/browse/KAFKA-12896
> >
> > Thanks,
> >
> > Ryan
> >
> > From: dev@kafka.apache.org At: 06/07/21 19:37:45 UTC-4:00 To:
> > dev@kafka.apache.org
> > Subject: Re: [DISCUSS] KIP-726: Make the CooperativeStickyAssignor as the
> > default assignor
> >
> > Thanks Luke. We may as well get this KIP in to 3.0 so that we can fully
> > enable cooperative rebalancing
> > by default in 3.0 if we have KAFKA-12477 done in time, and if we don't
> then
> > there's no harm as it's
> > not going to change the behavior.
> >
> > On Wed, Jun 2, 2021 at 7:28 PM Luke Chen  wrote:
> >
> > > Hi Sophie,
> > > Thanks for the reminder. Yes, I was thinking this KIP doesn't have to
> be
> > > put into a major release since it will be fully backward compatible, so
> > no
> > > need to push it. Currently, if we want to work on this KIP, we need
> > > KAFKA-12477 and KAFKA-12487. But you're right, we can at least try our
> > best
> > > to see if we can make it into V3.0 since cooperative rebalancing is a
> > major
> > > improvement. I'll kick off a vote later.
> > >
> > > Thank you.
> > > Luke
> > >
> > > On Thu, Jun 3, 2021 at 7:08 AM Sophie Blee-Goldman
> > >  wrote:
> > >
> > > > Hey Luke,
> > > >
> > > > It's been a while since the last update on this, which is mostly my
> > fault
> > > > for picking up
> > > > other things in the meantime. I'm planning to get back to work
> > > > on KAFKA-12477 next
> > > > week but there are minimal changes to the current implementation
> given
> > > the
> > > > proposal
> > > > I put forth earlier in this KIP discussion, so I think we're good to
> > go.
> > > >
> > > > Although this KIP no longer requires a major release since it should
> be
> > > > fully compatible, I
> > > > still hope we can get it in to 3.0 since cooperative rebalancing is a
> > > major
> > > > improvement to
> > > > the life of a consumer group (and its operator). Can we make sure the
> > KIP
> > > > reflects the latest
> > > > and then kick off a vote by next Monday at the latest so we can make
> > KIP
> > > > freeze?
> > > >
> > > > Thanks!
> > > > Sophie
> > > >
> > > > On Fri, Apr 16, 2021 at 2:33 PM Guozhang Wang 
> > > wrote:
> > > >
> > > > > 1) From user's perspective, it is always possible that a commit
> > within
> > > > > onPartitionsRevoked throw in practice (e.g. if the member missed
> the
> > > > > previous rebalance where its assigned partitions are already
> > > re-assigned)
> > > > > -- and the onPartitionsLost was introduced for that exact reason,
> > i.e.
> > > it
> > 

Re: [DISCUSS] KIP-726: Make the CooperativeStickyAssignor as the default assignor

2021-06-08 Thread Luke Chen
Hi Ryan,
Thanks for the comments. KAFKA-12896
<https://issues.apache.org/jira/browse/KAFKA-12896> is already set as a
blocker for V3.0, which means V3.0 won't be released before KAFKA-12896
<https://issues.apache.org/jira/browse/KAFKA-12896> is fixed. I think this
KIP and KAFKA-12896 <https://issues.apache.org/jira/browse/KAFKA-12896> can
work in parallel for V3.0 without conflict.

Thank you.
Luke

On Tue, Jun 8, 2021 at 11:30 AM Ryan Leslie (BLOOMBERG/ 919 3RD A) <
rles...@bloomberg.net> wrote:

> Hey guys,
>
> Should open bugs concerning cooperative-sticky also be considered blockers
> to making it the default? For example, KAFKA-12896 is perhaps still being
> investigated:
>
> https://issues.apache.org/jira/browse/KAFKA-12896
>
> Thanks,
>
> Ryan
>
> From: dev@kafka.apache.org At: 06/07/21 19:37:45 UTC-4:00
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-726: Make the CooperativeStickyAssignor as the
> default assignor
>
> Thanks Luke. We may as well get this KIP in to 3.0 so that we can fully
> enable cooperative rebalancing
> by default in 3.0 if we have KAFKA-12477 done in time, and if we don't then
> there's no harm as it's
> not going to change the behavior.
>
> On Wed, Jun 2, 2021 at 7:28 PM Luke Chen  wrote:
>
> > Hi Sophie,
> > Thanks for the reminder. Yes, I was thinking this KIP doesn't have to be
> > put into a major release since it will be fully backward compatible, so
> no
> > need to push it. Currently, if we want to work on this KIP, we need
> > KAFKA-12477 and KAFKA-12487. But you're right, we can at least try our
> best
> > to see if we can make it into V3.0 since cooperative rebalancing is a
> major
> > improvement. I'll kick off a vote later.
> >
> > Thank you.
> > Luke
> >
> > On Thu, Jun 3, 2021 at 7:08 AM Sophie Blee-Goldman
> >  wrote:
> >
> > > Hey Luke,
> > >
> > > It's been a while since the last update on this, which is mostly my
> fault
> > > for picking up
> > > other things in the meantime. I'm planning to get back to work
> > > on KAFKA-12477 next
> > > week but there are minimal changes to the current implementation given
> > the
> > > proposal
> > > I put forth earlier in this KIP discussion, so I think we're good to
> go.
> > >
> > > Although this KIP no longer requires a major release since it should be
> > > fully compatible, I
> > > still hope we can get it in to 3.0 since cooperative rebalancing is a
> > major
> > > improvement to
> > > the life of a consumer group (and its operator). Can we make sure the
> KIP
> > > reflects the latest
> > > and then kick off a vote by next Monday at the latest so we can make
> KIP
> > > freeze?
> > >
> > > Thanks!
> > > Sophie
> > >
> > > On Fri, Apr 16, 2021 at 2:33 PM Guozhang Wang 
> > wrote:
> > >
> > > > 1) From user's perspective, it is always possible that a commit
> within
> > > > onPartitionsRevoked throw in practice (e.g. if the member missed the
> > > > previous rebalance where its assigned partitions are already
> > re-assigned)
> > > > -- and the onPartitionsLost was introduced for that exact reason,
> i.e.
> > it
> > > > is primarily for optimizations, but not for correctness guarantees --
> > on
> > > > the other hand, it would be surprising to users to see the commit
> > returns
> > > > and then later found it not going through. Given that, I'd suggest we
> > > still
> > > > throw the exception right away. Regarding the flag itself though, I
> > agree
> > > > that keeping it set until the next succeeded join group makes sense
> to
> > be
> > > > safer.
> > > >
> > > > 2) That's crystal, thank you for the clarification.
> > > >
> > > > On Wed, Apr 14, 2021 at 6:46 PM Sophie Blee-Goldman
> > > >  wrote:
> > > >
> > > > > 1) Once the short-circuit is triggered, the member will downgrade
> to
> > > the
> > > > > EAGER protocol, but
> > > > > won't necessarily try to rejoin the group right away.
> > > > >
> > > > > In the "happy path", the user has implemented #onPartitionsLost
> > > correctly
> > > > > and will not attempt
> > > > > to commit partitions that are lost. And since these partitions have
> > > > indeed
> > > > > been revoked, the user
> > > > > app

Re: [DISCUSS] KIP-726: Make the CooperativeStickyAssignor as the default assignor

2021-06-07 Thread Ryan Leslie (BLOOMBERG/ 919 3RD A)
Hey guys,

Should open bugs concerning cooperative-sticky also be considered blockers to 
making it the default? For example, KAFKA-12896 is perhaps still being 
investigated:

https://issues.apache.org/jira/browse/KAFKA-12896

Thanks,

Ryan

From: dev@kafka.apache.org At: 06/07/21 19:37:45 UTC-4:00
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-726: Make the CooperativeStickyAssignor as the 
default assignor

Thanks Luke. We may as well get this KIP in to 3.0 so that we can fully
enable cooperative rebalancing
by default in 3.0 if we have KAFKA-12477 done in time, and if we don't then
there's no harm as it's
not going to change the behavior.

On Wed, Jun 2, 2021 at 7:28 PM Luke Chen  wrote:

> Hi Sophie,
> Thanks for the reminder. Yes, I was thinking this KIP doesn't have to be
> put into a major release since it will be fully backward compatible, so no
> need to push it. Currently, if we want to work on this KIP, we need
> KAFKA-12477 and KAFKA-12487. But you're right, we can at least try our best
> to see if we can make it into V3.0 since cooperative rebalancing is a major
> improvement. I'll kick off a vote later.
>
> Thank you.
> Luke
>
> On Thu, Jun 3, 2021 at 7:08 AM Sophie Blee-Goldman
>  wrote:
>
> > Hey Luke,
> >
> > It's been a while since the last update on this, which is mostly my fault
> > for picking up
> > other things in the meantime. I'm planning to get back to work
> > on KAFKA-12477 next
> > week but there are minimal changes to the current implementation given
> the
> > proposal
> > I put forth earlier in this KIP discussion, so I think we're good to go.
> >
> > Although this KIP no longer requires a major release since it should be
> > fully compatible, I
> > still hope we can get it in to 3.0 since cooperative rebalancing is a
> major
> > improvement to
> > the life of a consumer group (and its operator). Can we make sure the KIP
> > reflects the latest
> > and then kick off a vote by next Monday at the latest so we can make KIP
> > freeze?
> >
> > Thanks!
> > Sophie
> >
> > On Fri, Apr 16, 2021 at 2:33 PM Guozhang Wang 
> wrote:
> >
> > > 1) From user's perspective, it is always possible that a commit within
> > > onPartitionsRevoked throw in practice (e.g. if the member missed the
> > > previous rebalance where its assigned partitions are already
> re-assigned)
> > > -- and the onPartitionsLost was introduced for that exact reason, i.e.
> it
> > > is primarily for optimizations, but not for correctness guarantees --
> on
> > > the other hand, it would be surprising to users to see the commit
> returns
> > > and then later found it not going through. Given that, I'd suggest we
> > still
> > > throw the exception right away. Regarding the flag itself though, I
> agree
> > > that keeping it set until the next succeeded join group makes sense to
> be
> > > safer.
> > >
> > > 2) That's crystal, thank you for the clarification.
> > >
> > > On Wed, Apr 14, 2021 at 6:46 PM Sophie Blee-Goldman
> > >  wrote:
> > >
> > > > 1) Once the short-circuit is triggered, the member will downgrade to
> > the
> > > > EAGER protocol, but
> > > > won't necessarily try to rejoin the group right away.
> > > >
> > > > In the "happy path", the user has implemented #onPartitionsLost
> > correctly
> > > > and will not attempt
> > > > to commit partitions that are lost. And since these partitions have
> > > indeed
> > > > been revoked, the user
> > > > application should not attempt to commit those partitions after this
> > > point.
> > > > In this case, there's no
> > > > reason for the consumer to immediately rejoin the group. Since a
> > > > non-cooperative assignor was
> > > > selected, we know that all partitions have been assigned. This member
> > can
> > > > continue on as usual,
> > > > processing the remaining un-revoked partitions and will follow the
> > EAGER
> > > > protocol in the next
> > > > rebalance. There's no user-facing impact or handling required; all
> that
> > > > happens is that the work
> > > > since the last commit on those revoked partitions has been lost.
> > > >
> > > > In the less-happy path, the user has implemented #onPartitionsLost
> > > > incorrectly or not implemented
> > > > it at all, falling back on the default which invokes
> > #onPartitionsRevoked
> > > &

Re: [DISCUSS] KIP-726: Make the CooperativeStickyAssignor as the default assignor

2021-06-07 Thread Sophie Blee-Goldman
Thanks Luke. We may as well get this KIP into 3.0 so that we can fully
enable cooperative rebalancing by default in 3.0 if we have KAFKA-12477
done in time; if we don't, there's no harm, as it's not going to change
the behavior.

On Wed, Jun 2, 2021 at 7:28 PM Luke Chen  wrote:

> Hi Sophie,
> Thanks for the reminder. Yes, I was thinking this KIP doesn't have to be
> put into a major release since it will be fully backward compatible, so no
> need to push it. Currently, if we want to work on this KIP, we need
> KAFKA-12477 and KAFKA-12487. But you're right, we can at least try our best
> to see if we can make it into V3.0 since cooperative rebalancing is a major
> improvement. I'll kick off a vote later.
>
> Thank you.
> Luke
>
> On Thu, Jun 3, 2021 at 7:08 AM Sophie Blee-Goldman
>  wrote:
>
> > Hey Luke,
> >
> > It's been a while since the last update on this, which is mostly my fault
> > for picking up
> > other things in the meantime. I'm planning to get back to work
> > on KAFKA-12477 next
> > week but there are minimal changes to the current implementation given
> the
> > proposal
> > I put forth earlier in this KIP discussion, so I think we're good to go.
> >
> > Although this KIP no longer requires a major release since it should be
> > fully compatible, I
> > still hope we can get it in to 3.0 since cooperative rebalancing is a
> major
> > improvement to
> > the life of a consumer group (and its operator). Can we make sure the KIP
> > reflects the latest
> > and then kick off a vote by next Monday at the latest so we can make KIP
> > freeze?
> >
> > Thanks!
> > Sophie
> >
> > On Fri, Apr 16, 2021 at 2:33 PM Guozhang Wang 
> wrote:
> >
> > > 1) From user's perspective, it is always possible that a commit within
> > > onPartitionsRevoked throw in practice (e.g. if the member missed the
> > > previous rebalance where its assigned partitions are already
> re-assigned)
> > > -- and the onPartitionsLost was introduced for that exact reason, i.e.
> it
> > > is primarily for optimizations, but not for correctness guarantees --
> on
> > > the other hand, it would be surprising to users to see the commit
> returns
> > > and then later found it not going through. Given that, I'd suggest we
> > still
> > > throw the exception right away. Regarding the flag itself though, I
> agree
> > > that keeping it set until the next succeeded join group makes sense to
> be
> > > safer.
> > >
> > > 2) That's crystal, thank you for the clarification.
> > >
> > > On Wed, Apr 14, 2021 at 6:46 PM Sophie Blee-Goldman
> > >  wrote:
> > >
> > > > 1) Once the short-circuit is triggered, the member will downgrade to
> > the
> > > > EAGER protocol, but
> > > > won't necessarily try to rejoin the group right away.
> > > >
> > > > In the "happy path", the user has implemented #onPartitionsLost
> > correctly
> > > > and will not attempt
> > > > to commit partitions that are lost. And since these partitions have
> > > indeed
> > > > been revoked, the user
> > > > application should not attempt to commit those partitions after this
> > > point.
> > > > In this case, there's no
> > > > reason for the consumer to immediately rejoin the group. Since a
> > > > non-cooperative assignor was
> > > > selected, we know that all partitions have been assigned. This member
> > can
> > > > continue on as usual,
> > > > processing the remaining un-revoked partitions and will follow the
> > EAGER
> > > > protocol in the next
> > > > rebalance. There's no user-facing impact or handling required; all
> that
> > > > happens is that the work
> > > > since the last commit on those revoked partitions has been lost.
> > > >
> > > > In the less-happy path, the user has implemented #onPartitionsLost
> > > > incorrectly or not implemented
> > > > it at all, falling back on the default which invokes
> > #onPartitionsRevoked
> > > > which in turn will attempt to
> > > > commit those partitions during the rebalance callback. In this case
> we
> > > rely
> > > > on the flag to prevent
> > > > this commit request from being sent to the broker.
> > > >
> > > > Originally I was thinking we should throw a CommitFailedException up
> > > > through the #onPartitionsLost
> > > > callback, and eventually up through poll(), then rejoin the group.
> But
> > > now
> > > > I'm wondering if this is really
> > > > necessary -- the important point in all cases is just to prevent the
> > > > commit, but there's no reason the
> > > > consumer should not be allowed to continue processing its other
> > > partitions,
> > > > and it hasn't dropped out
> > > > of the group. What do you think about this slight amendment to my
> > > original
> > > > proposal: if a user does end up
> > > > calling commit for whatever reason when invoking #onPartitionsLost,
> > we'll
> > > > just swallow the resulting
> > > > CommitFailedException. So the user application wouldn't see anything,
> > and
> > > > the only impact would be
> > > > that these partitions were not able to commit those last 

Re: [DISCUSS] KIP-726: Make the CooperativeStickyAssignor as the default assignor

2021-06-02 Thread Luke Chen
Hi Sophie,
Thanks for the reminder. Yes, I was thinking this KIP doesn't have to be
put into a major release since it will be fully backward compatible, so
there's no need to push it. Currently, before we can work on this KIP, we
need KAFKA-12477 and KAFKA-12487. But you're right, we can at least try our
best to see if we can make it into V3.0, since cooperative rebalancing is a
major improvement. I'll kick off a vote later.

Thank you.
Luke

On Thu, Jun 3, 2021 at 7:08 AM Sophie Blee-Goldman
 wrote:

> Hey Luke,
>
> It's been a while since the last update on this, which is mostly my fault
> for picking up
> other things in the meantime. I'm planning to get back to work
> on KAFKA-12477 next
> week but there are minimal changes to the current implementation given the
> proposal
> I put forth earlier in this KIP discussion, so I think we're good to go.
>
> Although this KIP no longer requires a major release since it should be
> fully compatible, I
> still hope we can get it in to 3.0 since cooperative rebalancing is a major
> improvement to
> the life of a consumer group (and its operator). Can we make sure the KIP
> reflects the latest
> and then kick off a vote by next Monday at the latest so we can make KIP
> freeze?
>
> Thanks!
> Sophie
>
> On Fri, Apr 16, 2021 at 2:33 PM Guozhang Wang  wrote:
>
> > 1) From user's perspective, it is always possible that a commit within
> > onPartitionsRevoked throw in practice (e.g. if the member missed the
> > previous rebalance where its assigned partitions are already re-assigned)
> > -- and the onPartitionsLost was introduced for that exact reason, i.e. it
> > is primarily for optimizations, but not for correctness guarantees -- on
> > the other hand, it would be surprising to users to see the commit returns
> > and then later found it not going through. Given that, I'd suggest we
> still
> > throw the exception right away. Regarding the flag itself though, I agree
> > that keeping it set until the next succeeded join group makes sense to be
> > safer.
> >
> > 2) That's crystal, thank you for the clarification.
> >
> > On Wed, Apr 14, 2021 at 6:46 PM Sophie Blee-Goldman
> >  wrote:
> >
> > > 1) Once the short-circuit is triggered, the member will downgrade to
> the
> > > EAGER protocol, but
> > > won't necessarily try to rejoin the group right away.
> > >
> > > In the "happy path", the user has implemented #onPartitionsLost
> correctly
> > > and will not attempt
> > > to commit partitions that are lost. And since these partitions have
> > indeed
> > > been revoked, the user
> > > application should not attempt to commit those partitions after this
> > point.
> > > In this case, there's no
> > > reason for the consumer to immediately rejoin the group. Since a
> > > non-cooperative assignor was
> > > selected, we know that all partitions have been assigned. This member
> can
> > > continue on as usual,
> > > processing the remaining un-revoked partitions and will follow the
> EAGER
> > > protocol in the next
> > > rebalance. There's no user-facing impact or handling required; all that
> > > happens is that the work
> > > since the last commit on those revoked partitions has been lost.
> > >
> > > In the less-happy path, the user has implemented #onPartitionsLost
> > > incorrectly or not implemented
> > > it at all, falling back on the default which invokes
> #onPartitionsRevoked
> > > which in turn will attempt to
> > > commit those partitions during the rebalance callback. In this case we
> > rely
> > > on the flag to prevent
> > > this commit request from being sent to the broker.
> > >
> > > Originally I was thinking we should throw a CommitFailedException up
> > > through the #onPartitionsLost
> > > callback, and eventually up through poll(), then rejoin the group. But
> > now
> > > I'm wondering if this is really
> > > necessary -- the important point in all cases is just to prevent the
> > > commit, but there's no reason the
> > > consumer should not be allowed to continue processing its other
> > partitions,
> > > and it hasn't dropped out
> > > of the group. What do you think about this slight amendment to my
> > original
> > > proposal: if a user does end up
> > > calling commit for whatever reason when invoking #onPartitionsLost,
> we'll
> > > just swallow the resulting
> > > CommitFailedException. So the user application wouldn't see anything,
> and
> > > the only impact would be
> > > that these partitions were not able to commit those last set of offsets
> > on
> > > the revoked partitions.
> > >
> > > WDYT? My only concern there is that the user might have some implicit
> > > assumption that unless a
> > > CommitFailedException was thrown, the offsets of revoked partitions
> were
> > > successfully committed
> > > and they may have some downstream logic that should trigger only in
> this
> > > case. If that's a concern,
> > > then I would keep the original proposal which says a
> > CommitFailedException
> > > will be thrown up through
> > > poll(), and leave it 

Re: [DISCUSS] KIP-726: Make the CooperativeStickyAssignor as the default assignor

2021-06-02 Thread Sophie Blee-Goldman
Hey Luke,

It's been a while since the last update on this, which is mostly my fault
for picking up other things in the meantime. I'm planning to get back to
work on KAFKA-12477 next week, but only minimal changes to the current
implementation are needed given the proposal I put forth earlier in this
KIP discussion, so I think we're good to go.

Although this KIP no longer requires a major release since it should be
fully compatible, I still hope we can get it into 3.0, since cooperative
rebalancing is a major improvement to the life of a consumer group (and
its operator). Can we make sure the KIP reflects the latest proposal and
then kick off a vote by next Monday at the latest so we can make KIP
freeze?

Thanks!
Sophie

On Fri, Apr 16, 2021 at 2:33 PM Guozhang Wang  wrote:

> 1) From the user's perspective, it is always possible that a commit within
> onPartitionsRevoked throws in practice (e.g. if the member missed the
> previous rebalance, in which its assigned partitions were already
> re-assigned) -- and onPartitionsLost was introduced for that exact reason,
> i.e. it is primarily for optimization, not for correctness guarantees. On
> the other hand, it would be surprising for users to see the commit return
> and then later find that it did not go through. Given that, I'd suggest we
> still throw the exception right away. Regarding the flag itself though, I
> agree that keeping it set until the next successful join group makes sense,
> to be safer.
>
> 2) That's crystal, thank you for the clarification.
>
> On Wed, Apr 14, 2021 at 6:46 PM Sophie Blee-Goldman
>  wrote:
>
> > 1) Once the short-circuit is triggered, the member will downgrade to the
> > EAGER protocol, but
> > won't necessarily try to rejoin the group right away.
> >
> > In the "happy path", the user has implemented #onPartitionsLost correctly
> > and will not attempt
> > to commit partitions that are lost. And since these partitions have
> indeed
> > been revoked, the user
> > application should not attempt to commit those partitions after this
> point.
> > In this case, there's no
> > reason for the consumer to immediately rejoin the group. Since a
> > non-cooperative assignor was
> > selected, we know that all partitions have been assigned. This member can
> > continue on as usual,
> > processing the remaining un-revoked partitions and will follow the EAGER
> > protocol in the next
> > rebalance. There's no user-facing impact or handling required; all that
> > happens is that the work
> > since the last commit on those revoked partitions has been lost.
> >
> > In the less-happy path, the user has implemented #onPartitionsLost
> > incorrectly or not implemented
> > it at all, falling back on the default which invokes #onPartitionsRevoked
> > which in turn will attempt to
> > commit those partitions during the rebalance callback. In this case we
> rely
> > on the flag to prevent
> > this commit request from being sent to the broker.
> >
> > Originally I was thinking we should throw a CommitFailedException up
> > through the #onPartitionsLost
> > callback, and eventually up through poll(), then rejoin the group. But
> now
> > I'm wondering if this is really
> > necessary -- the important point in all cases is just to prevent the
> > commit, but there's no reason the
> > consumer should not be allowed to continue processing its other
> partitions,
> > and it hasn't dropped out
> > of the group. What do you think about this slight amendment to my
> original
> > proposal: if a user does end up
> > calling commit for whatever reason when invoking #onPartitionsLost, we'll
> > just swallow the resulting
> > CommitFailedException. So the user application wouldn't see anything, and
> > the only impact would be
> > that these partitions were not able to commit those last set of offsets
> on
> > the revoked partitions.
> >
> > WDYT? My only concern there is that the user might have some implicit
> > assumption that unless a
> > CommitFailedException was thrown, the offsets of revoked partitions were
> > successfully committed
> > and they may have some downstream logic that should trigger only in this
> > case. If that's a concern,
> > then I would keep the original proposal which says a
> CommitFailedException
> > will be thrown up through
> > poll(), and leave it up to the user to decide if they want to trigger a
> new
> > rebalance/rejoin the group or not.
> >
> > Regarding the flag which prevents committing the revoked partitions, this
> > will need to continue
> > blocking such commit attempts until the next time the consumer rejoins
> the
> > group, ie until the end
> > of the next successful rebalance. Technically this shouldn't matter,
> since
> > the consumer no longer
> > owns those partitions this member shouldn't attempt to commit them
> anyways.
> > Usually we can
> > rely on the broker rejecting commit attempts on partitions that are not
> > owned, in which case the
> > consumer will throw a CommitFailedException. This is similar, except that
> > we 

Re: [DISCUSS] KIP-726: Make the CooperativeStickyAssignor as the default assignor

2021-04-16 Thread Guozhang Wang
1) From the user's perspective, it is always possible that a commit within
onPartitionsRevoked throws in practice (e.g. if the member missed the
previous rebalance, in which its assigned partitions were already
re-assigned) -- and onPartitionsLost was introduced for that exact reason,
i.e. it is primarily for optimization, not for correctness guarantees. On
the other hand, it would be surprising for users to see the commit return
and then later find that it did not go through. Given that, I'd suggest we
still throw the exception right away. Regarding the flag itself though, I
agree that keeping it set until the next successful join group makes sense,
to be safer.

2) That's crystal, thank you for the clarification.
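
To make point 1) concrete, here is a minimal Python sketch of the behaviour
suggested there: a commit on a lost partition fails right away, and the flag
stays set until the next successful join. This is only a toy model, not the
Kafka client. ToyMember, lose, join_succeeded and CommitFailed are
hypothetical names made up for illustration (CommitFailed stands in for
CommitFailedException), and the flag is modelled as a per-partition set,
which is an assumption rather than anything decided in this thread.

```python
class CommitFailed(Exception):
    """Hypothetical stand-in for Kafka's CommitFailedException."""


class ToyMember:
    """Toy model of one group member; not the real consumer client."""

    def __init__(self, owned):
        self.owned = set(owned)
        self.blocked = set()   # partitions covered by the commit-blocking flag
        self.committed = {}    # partition -> last committed offset

    def lose(self, partitions):
        # The short-circuit fired: these partitions now belong to someone else.
        lost = set(partitions) & self.owned
        self.owned -= lost
        self.blocked |= lost

    def commit(self, offsets):
        # Fail immediately instead of returning and silently dropping the commit.
        rejected = sorted(p for p in offsets if p in self.blocked)
        if rejected:
            raise CommitFailed(f"commit blocked for {rejected}")
        self.committed.update(offsets)

    def join_succeeded(self, assignment):
        # Only a successful join clears the flag.
        self.owned = set(assignment)
        self.blocked.clear()
```

Repeated commit attempts on a lost partition keep failing in this model until
join_succeeded is called, which is the "safer" lifetime for the flag.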

On Wed, Apr 14, 2021 at 6:46 PM Sophie Blee-Goldman
 wrote:

> 1) Once the short-circuit is triggered, the member will downgrade to the
> EAGER protocol, but
> won't necessarily try to rejoin the group right away.
>
> In the "happy path", the user has implemented #onPartitionsLost correctly
> and will not attempt
> to commit partitions that are lost. And since these partitions have indeed
> been revoked, the user
> application should not attempt to commit those partitions after this point.
> In this case, there's no
> reason for the consumer to immediately rejoin the group. Since a
> non-cooperative assignor was
> selected, we know that all partitions have been assigned. This member can
> continue on as usual,
> processing the remaining un-revoked partitions and will follow the EAGER
> protocol in the next
> rebalance. There's no user-facing impact or handling required; all that
> happens is that the work
> since the last commit on those revoked partitions has been lost.
>
> In the less-happy path, the user has implemented #onPartitionsLost
> incorrectly or not implemented
> it at all, falling back on the default which invokes #onPartitionsRevoked
> which in turn will attempt to
> commit those partitions during the rebalance callback. In this case we rely
> on the flag to prevent
> this commit request from being sent to the broker.
>
> Originally I was thinking we should throw a CommitFailedException up
> through the #onPartitionsLost
> callback, and eventually up through poll(), then rejoin the group. But now
> I'm wondering if this is really
> necessary -- the important point in all cases is just to prevent the
> commit, but there's no reason the
> consumer should not be allowed to continue processing its other partitions,
> and it hasn't dropped out
> of the group. What do you think about this slight amendment to my original
> proposal: if a user does end up
> calling commit for whatever reason when invoking #onPartitionsLost, we'll
> just swallow the resulting
> CommitFailedException. So the user application wouldn't see anything, and
> the only impact would be
> that these partitions were not able to commit those last set of offsets on
> the revoked partitions.
>
> WDYT? My only concern there is that the user might have some implicit
> assumption that unless a
> CommitFailedException was thrown, the offsets of revoked partitions were
> successfully committed
> and they may have some downstream logic that should trigger only in this
> case. If that's a concern,
> then I would keep the original proposal which says a CommitFailedException
> will be thrown up through
> poll(), and leave it up to the user to decide if they want to trigger a new
> rebalance/rejoin the group or not.
>
> Regarding the flag which prevents committing the revoked partitions, this
> will need to continue
> blocking such commit attempts until the next time the consumer rejoins the
> group, ie until the end
> of the next successful rebalance. Technically this shouldn't matter, since
> the consumer no longer
> owns those partitions this member shouldn't attempt to commit them anyways.
> Usually we can
> rely on the broker rejecting commit attempts on partitions that are not
> owned, in which case the
> consumer will throw a CommitFailedException. This is similar, except that
> we can't rely on the
> broker having been informed of the change in ownership before this consumer
> might attempt to
> commit. So to avoid this race condition, we'll keep the "blockCommit" flag
> until the next rebalance
> when we can be certain that the broker is clear on this
> partition's ownership.
>
> 2) I guess maybe the wording here is unclear -- what I meant is that all
> 3.0 applications will *eventually*
> enable cooperative rebalancing in the stable state. This doesn't mean that
> it will select COOPERATIVE
> when it first starts up, and in order for this dynamic protocol upgrade to
> be safe we do indeed need to
> start off with EAGER and only upgrade once the selected assignor indicates
> that it's safe to do so.
> (This only applies if multiple assignors are used, if the assignors are
> "cooperative-sticky" only then it
> will just start out and forever remain on COOPERATIVE, like in Streams)
>
> Since it's just the first rebalance, the 

Re: [DISCUSS] KIP-726: Make the CooperativeStickyAssignor as the default assignor

2021-04-14 Thread Sophie Blee-Goldman
1) Once the short-circuit is triggered, the member will downgrade to the
EAGER protocol, but
won't necessarily try to rejoin the group right away.

In the "happy path", the user has implemented #onPartitionsLost correctly
and will not attempt
to commit partitions that are lost. And since these partitions have indeed
been revoked, the user
application should not attempt to commit those partitions after this point.
In this case, there's no
reason for the consumer to immediately rejoin the group. Since a
non-cooperative assignor was
selected, we know that all partitions have been assigned. This member can
continue on as usual,
processing the remaining un-revoked partitions and will follow the EAGER
protocol in the next
rebalance. There's no user-facing impact or handling required; all that
happens is that the work
since the last commit on those revoked partitions has been lost.

In the less-happy path, the user has implemented #onPartitionsLost
incorrectly or not implemented
it at all, falling back on the default which invokes #onPartitionsRevoked
which in turn will attempt to
commit those partitions during the rebalance callback. In this case we rely
on the flag to prevent
this commit request from being sent to the broker.
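
As a rough illustration of the two paths above, here is a small Python toy
model (not Kafka client code). All class and method names are hypothetical;
the only behaviour taken from the real interface is that the default
onPartitionsLost falls back to onPartitionsRevoked. What the caller sees when
the flag intercepts a commit is deliberately left open here, so the sketch
just returns False.

```python
class ToyConsumer:
    """Toy consumer that records which commit requests reach the broker."""

    def __init__(self):
        self.block_commit = False
        self.requests_sent = []   # commit requests that were actually sent

    def commit(self, partitions):
        if self.block_commit:
            return False          # intercepted client-side, nothing is sent
        self.requests_sent.append(sorted(partitions))
        return True

    def short_circuit(self, listener, lost):
        # Set the flag first, then notify the user's listener.
        self.block_commit = True
        listener.on_partitions_lost(lost)


class ToyListener:
    """Listener that does not override the lost callback (less-happy path)."""

    def __init__(self, consumer):
        self.consumer = consumer

    def on_partitions_revoked(self, partitions):
        # Typical user code: commit progress for partitions being given up.
        self.consumer.commit(partitions)

    def on_partitions_lost(self, partitions):
        # Default behaviour: fall back to the revoked callback.
        self.on_partitions_revoked(partitions)


class LostAwareListener(ToyListener):
    """Listener that handles lost partitions correctly (happy path)."""

    def on_partitions_lost(self, partitions):
        # The partitions are already gone, so do not try to commit them.
        pass
```

In both cases no commit request for the lost partitions reaches the broker:
in the happy path because the user never issues one, and in the less-happy
path because the flag intercepts it.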

Originally I was thinking we should throw a CommitFailedException up
through the #onPartitionsLost
callback, and eventually up through poll(), then rejoin the group. But now
I'm wondering if this is really
necessary -- the important point in all cases is just to prevent the
commit, but there's no reason the
consumer should not be allowed to continue processing its other partitions,
and it hasn't dropped out
of the group. What do you think about this slight amendment to my original
proposal: if a user does end up
calling commit for whatever reason when invoking #onPartitionsLost, we'll
just swallow the resulting
CommitFailedException. So the user application wouldn't see anything, and
the only impact would be
that the last set of offsets on the revoked partitions would not be
committed.

WDYT? My only concern there is that the user might have some implicit
assumption that unless a
CommitFailedException was thrown, the offsets of revoked partitions were
successfully committed
and they may have some downstream logic that should trigger only in this
case. If that's a concern,
then I would keep the original proposal which says a CommitFailedException
will be thrown up through
poll(), and leave it up to the user to decide if they want to trigger a new
rebalance/rejoin the group or not.

Regarding the flag which prevents committing the revoked partitions, this
will need to continue
blocking such commit attempts until the next time the consumer rejoins the
group, ie until the end
of the next successful rebalance. Technically this shouldn't matter: since
the consumer no longer
owns those partitions, this member shouldn't attempt to commit them anyway.
Usually we can
rely on the broker rejecting commit attempts on partitions that are not
owned, in which case the
consumer will throw a CommitFailedException. This is similar, except that
we can't rely on the
broker having been informed of the change in ownership before this consumer
might attempt to
commit. So to avoid this race condition, we'll keep the "blockCommit" flag
until the next rebalance
when we can be certain that the broker is clear on this
partition's ownership.
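
The lifetime of the flag described above can be sketched as follows. This is a hypothetical model, not Kafka internals: the class `CommitGuard` and its method names are assumptions made for illustration, and partitions are represented as plain strings.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the "blockCommit" guard; names are illustrative only.
class CommitGuard {
    private final Set<String> blocked = new HashSet<>();

    // Called when the short-circuit fires and the listed partitions are lost.
    void blockCommitsFor(Set<String> lostPartitions) {
        blocked.addAll(lostPartitions);
    }

    // Called at the end of the next successful rebalance, when the broker
    // is known to agree with this member about partition ownership.
    void onRebalanceComplete() {
        blocked.clear();
    }

    // Returns true if a commit for this partition may be sent to the broker.
    boolean mayCommit(String partition) {
        return !blocked.contains(partition);
    }
}
```

The point of the client-side check is that it closes the window in which the broker has not yet learned of the ownership change and would therefore still accept the commit.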

2) I guess maybe the wording here is unclear -- what I meant is that all
3.0 applications will *eventually*
enable cooperative rebalancing in the stable state. This doesn't mean that
it will select COOPERATIVE
when it first starts up, and in order for this dynamic protocol upgrade to
be safe we do indeed need to
start off with EAGER and only upgrade once the selected assignor indicates
that it's safe to do so.
(This only applies if multiple assignors are used; if the only assignor is
"cooperative-sticky", then it
will just start out on COOPERATIVE and remain there forever, like in Streams.)

Since it's just the first rebalance, the choice of COOPERATIVE vs EAGER
actually doesn't matter at
all since the consumer won't own any partitions until it's joined the
group. So we may as well continue
the initial protocol selection strategy of "highest commonly supported
protocol", but the point is that
3.0 applications will upgrade to COOPERATIVE as soon as they have any
partitions. If you can think
of a better way to phrase "New applications on 3.0 will enable cooperative
rebalancing by default" then
please let me know.
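
As a rough illustration of the "highest commonly supported protocol" rule mentioned above, the following Java sketch computes the initial protocol from the protocols each configured assignor supports. The enum `Proto` and the class `InitialProtocol` are hypothetical names, and the protocol sets in the usage note are assumptions about what each assignor advertises.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Declared from lowest to highest so that iteration order reflects preference.
enum Proto { EAGER, COOPERATIVE }

class InitialProtocol {
    // Each element is the set of protocols one configured assignor supports.
    static Proto highestCommon(List<Set<Proto>> supportedByAssignor) {
        EnumSet<Proto> common = EnumSet.allOf(Proto.class);
        for (Set<Proto> supported : supportedByAssignor) {
            common.retainAll(supported); // keep only protocols every assignor supports
        }
        if (common.isEmpty()) {
            throw new IllegalArgumentException("assignors share no rebalance protocol");
        }
        Proto highest = Proto.EAGER;
        for (Proto p : common) { // EnumSet iterates in declaration order
            highest = p;
        }
        return highest;
    }
}
```

Assuming "cooperative-sticky" supports both protocols and "range" supports only EAGER, the pair yields EAGER at startup, while "cooperative-sticky" alone yields COOPERATIVE, matching the two cases described in this message.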


Thanks for the response -- hope this makes sense so far, but I'm happy to
elaborate on any aspects of the
proposal that aren't clear. I'll also update the ticket description
for KAFKA-12477 with the latest.


On Wed, Apr 14, 2021 at 12:03 PM Guozhang Wang  wrote:

> Hello Sophie,
>
> Thanks for the detailed explanation, a few clarifying questions:
>
> 1) when the short-circuit is triggered, what would happen next? Would the
> consumers switch back to EAGER, 

Re: [DISCUSS] KIP-726: Make the CooperativeStickyAssignor as the default assignor

2021-04-14 Thread Guozhang Wang
Hello Sophie,

Thanks for the detailed explanation, a few clarifying questions:

1) When the short-circuit is triggered, what would happen next? Would the
consumers switch back to EAGER, try to rejoin the group, and then, once the
next rebalance succeeds, reset the flag to allow committing? Or would
we just fail the consumer immediately?

2) In the overview you mentioned "New applications on 3.0 will enable
cooperative rebalancing by default", but the detailed description says
"With ["cooperative-sticky", "range"], the initial protocol will be EAGER
when the member first joins the group", which seems contradictory. If we
want cooperative behavior to be the default, then with the
default ["cooperative-sticky", "range"] the member would start with the
COOPERATIVE protocol right away.


Guozhang



On Mon, Apr 12, 2021 at 5:19 AM Chris Egerton 
wrote:

> Whoops, small correction--meant to say
> ConsumerRebalanceListener::onPartitionsLost, not Consumer::onPartitionsLost
>
> On Mon, Apr 12, 2021 at 8:17 AM Chris Egerton  wrote:
>
> > Hi Sophie,
> >
> > This sounds fantastic. I've made a note on KAFKA-12487 about being sure
> to
> > implement Consumer::onPartitionsLost to avoid unnecessary task failures
> on
> > consumer protocol downgrade, but besides that, I don't think things could
> > get any smoother for Connect users or developers. The automatic protocol
> > upgrade/downgrade behavior appears safe, intuitive, and pain-free.
> >
> > Really excited for this development and hoping we can see it come to
> > fruition in time for the 3.0 release!
> >
> > Cheers,
> >
> > Chris
> >
> > On Fri, Apr 9, 2021 at 2:43 PM Sophie Blee-Goldman
> >  wrote:
> >
> >> 1) Yes, all of the above will be part of KAFKA-12477 (not KIP-726)
> >>
> >> 2) No, KAFKA-12638 would be nice to have but I don't think it's
> >> appropriate
> >> to remove
> >> the default implementation of #onPartitionsLost in 3.0 since we never
> gave
> >> any indication
> >> yet that we intend to remove it
> >>
> >> 3) Yes, this would be similar to when a Consumer drops out of the group.
> >> It's always been
> >> possible for a member to miss a rebalance and have its partition be
> >> reassigned to another
> >> member, during which time both members would claim to own said
> partition.
> >> But this is safe
> >> because the member who dropped out is blocked from committing offsets on
> >> that partition.
> >>
> >> On Fri, Apr 9, 2021 at 2:46 AM Luke Chen  wrote:
> >>
> >> > Hi Sophie,
> >> > That sounds great to take care of each case I can think of.
> >> > Questions:
> >> > 1. Do you mean the short-Circuit will also be implemented in
> >> KAFKA-12477?
> >> > 2. I don't think KAFKA-12638 is the blocker of this KIP-726, Am I
> right?
> >> > 3. So, does that mean we still have possibility to have multiple
> >> consumer
> >> > owned the same topic partition? And in this situation, we avoid them
> >> doing
> >> > committing, and waiting for next rebalance (should be soon). Is my
> >> > understanding correct?
> >> >
> >> > Thank you very much for finding this great solution.
> >> >
> >> > Luke
> >> >
> >> > On Fri, Apr 9, 2021 at 11:37 AM Sophie Blee-Goldman
> >> >  wrote:
> >> >
> >> > > Alright, here's the detailed proposal for KAFKA-12477. This assumes
> we
> >> > will
> >> > > change the default assignor to ["cooperative-sticky", "range"] in
> >> > KIP-726.
> >> > > It also acknowledges that users may attempt any kind of upgrade
> >> without
> >> > > reading the docs, and so we need to put in safeguards against data
> >> > > corruption rather than assume everyone will follow the safe upgrade
> >> path.
> >> > >
> >> > > With this proposal,
> >> > > 1) New applications on 3.0 will enable cooperative rebalancing by
> >> default
> >> > > 2) Existing applications which don’t set an assignor can safely
> >> upgrade
> >> > to
> >> > > 3.0 using a single rolling bounce with no extra steps, and will
> >> > > automatically transition to cooperative rebalancing
> >> > > 3) Existing applications which do set an assignor that uses EAGER
> can
> >> > > likewise upgrade their applications to COOPERATIVE with a single
> >> rolling
> >> > > bounce
> >> > > 4) Once on 3.0, applications can safely go back and forth between
> >> EAGER
> >> > and
> >> > > COOPERATIVE
> >> > > 5) Applications can safely downgrade away from 3.0
> >> > >
> >> > > The high-level idea for dynamic protocol upgrades is that the group
> >> will
> >> > > leverage the assignor selected by the group coordinator to determine
> >> when
> >> > > it’s safe to upgrade to COOPERATIVE, and trigger a fail-safe to
> >> protect
> >> > the
> >> > > group in case of rare events or user misconfiguration. The group
> >> > > coordinator selects the most preferred assignor that’s supported by
> >> all
> >> > > members of the group, so we know that all members will support
> >> > COOPERATIVE
> >> > > once we receive the “cooperative-sticky” assignor after a rebalance.
> >> At
> >> > > this point, each member can 

Re: [DISCUSS] KIP-726: Make the CooperativeStickyAssignor as the default assignor

2021-04-12 Thread Chris Egerton
Whoops, small correction--meant to say
ConsumerRebalanceListener::onPartitionsLost, not Consumer::onPartitionsLost

On Mon, Apr 12, 2021 at 8:17 AM Chris Egerton  wrote:

> Hi Sophie,
>
> This sounds fantastic. I've made a note on KAFKA-12487 about being sure to
> implement Consumer::onPartitionsLost to avoid unnecessary task failures on
> consumer protocol downgrade, but besides that, I don't think things could
> get any smoother for Connect users or developers. The automatic protocol
> upgrade/downgrade behavior appears safe, intuitive, and pain-free.
>
> Really excited for this development and hoping we can see it come to
> fruition in time for the 3.0 release!
>
> Cheers,
>
> Chris
>
> On Fri, Apr 9, 2021 at 2:43 PM Sophie Blee-Goldman
>  wrote:
>
>> 1) Yes, all of the above will be part of KAFKA-12477 (not KIP-726)
>>
>> 2) No, KAFKA-12638 would be nice to have but I don't think it's
>> appropriate
>> to remove
>> the default implementation of #onPartitionsLost in 3.0 since we never gave
>> any indication
>> yet that we intend to remove it
>>
>> 3) Yes, this would be similar to when a Consumer drops out of the group.
>> It's always been
>> possible for a member to miss a rebalance and have its partition be
>> reassigned to another
>> member, during which time both members would claim to own said partition.
>> But this is safe
>> because the member who dropped out is blocked from committing offsets on
>> that partition.
>>
>> On Fri, Apr 9, 2021 at 2:46 AM Luke Chen  wrote:
>>
>> > Hi Sophie,
>> > That sounds great to take care of each case I can think of.
>> > Questions:
>> > 1. Do you mean the short-Circuit will also be implemented in
>> KAFKA-12477?
>> > 2. I don't think KAFKA-12638 is the blocker of this KIP-726, Am I right?
>> > 3. So, does that mean we still have possibility to have multiple
>> consumer
>> > owned the same topic partition? And in this situation, we avoid them
>> doing
>> > committing, and waiting for next rebalance (should be soon). Is my
>> > understanding correct?
>> >
>> > Thank you very much for finding this great solution.
>> >
>> > Luke
>> >
>> > On Fri, Apr 9, 2021 at 11:37 AM Sophie Blee-Goldman
>> >  wrote:
>> >
>> > > Alright, here's the detailed proposal for KAFKA-12477. This assumes we
>> > will
>> > > change the default assignor to ["cooperative-sticky", "range"] in
>> > KIP-726.
>> > > It also acknowledges that users may attempt any kind of upgrade
>> without
>> > > reading the docs, and so we need to put in safeguards against data
>> > > corruption rather than assume everyone will follow the safe upgrade
>> path.
>> > >
>> > > With this proposal,
>> > > 1) New applications on 3.0 will enable cooperative rebalancing by
>> default
>> > > 2) Existing applications which don’t set an assignor can safely
>> upgrade
>> > to
>> > > 3.0 using a single rolling bounce with no extra steps, and will
>> > > automatically transition to cooperative rebalancing
>> > > 3) Existing applications which do set an assignor that uses EAGER can
>> > > likewise upgrade their applications to COOPERATIVE with a single
>> rolling
>> > > bounce
>> > > 4) Once on 3.0, applications can safely go back and forth between
>> EAGER
>> > and
>> > > COOPERATIVE
>> > > 5) Applications can safely downgrade away from 3.0
>> > >
>> > > The high-level idea for dynamic protocol upgrades is that the group
>> will
>> > > leverage the assignor selected by the group coordinator to determine
>> when
>> > > it’s safe to upgrade to COOPERATIVE, and trigger a fail-safe to
>> protect
>> > the
>> > > group in case of rare events or user misconfiguration. The group
>> > > coordinator selects the most preferred assignor that’s supported by
>> all
>> > > members of the group, so we know that all members will support
>> > COOPERATIVE
>> > > once we receive the “cooperative-sticky” assignor after a rebalance.
>> At
>> > > this point, each member can upgrade their own protocol to COOPERATIVE.
>> > > However, there may be situations in which an EAGER member may join the
>> > > group even after upgrading to COOPERATIVE. For example, during a
>> rolling
>> > > upgrade if the last remaining member on the old bytecode misses a
>> > > rebalance, the other members will be allowed to upgrade to
>> COOPERATIVE.
>> > If
>> > > the old member rejoins and is chosen to be the group leader before
>> it’s
>> > > upgraded to 3.0, it won’t be aware that the other members of the group
>> > have
>> > > not yet revoked their partitions when computing the assignment.
>> > >
>> > > Short Circuit:
>> > > The risk of mixing the cooperative and eager rebalancing protocols is
>> > that
>> > > a partition may be assigned to one member while it has yet to be
>> revoked
>> > > from its previous owner. The danger is that the new owner may begin
>> > > processing and committing offsets for this partition while the
>> previous
>> > > owner is also committing offsets in its #onPartitionsRevoked callback,
>> > > which is invoked at the end of the rebalance 

Re: [DISCUSS] KIP-726: Make the CooperativeStickyAssignor as the default assignor

2021-04-12 Thread Chris Egerton
Hi Sophie,

This sounds fantastic. I've made a note on KAFKA-12487 about being sure to
implement Consumer::onPartitionsLost to avoid unnecessary task failures on
consumer protocol downgrade, but besides that, I don't think things could
get any smoother for Connect users or developers. The automatic protocol
upgrade/downgrade behavior appears safe, intuitive, and pain-free.

Really excited for this development and hoping we can see it come to
fruition in time for the 3.0 release!

Cheers,

Chris

On Fri, Apr 9, 2021 at 2:43 PM Sophie Blee-Goldman
 wrote:

> 1) Yes, all of the above will be part of KAFKA-12477 (not KIP-726)
>
> 2) No, KAFKA-12638 would be nice to have but I don't think it's appropriate
> to remove
> the default implementation of #onPartitionsLost in 3.0 since we never gave
> any indication
> yet that we intend to remove it
>
> 3) Yes, this would be similar to when a Consumer drops out of the group.
> It's always been
> possible for a member to miss a rebalance and have its partition be
> reassigned to another
> member, during which time both members would claim to own said partition.
> But this is safe
> because the member who dropped out is blocked from committing offsets on
> that partition.
>
> On Fri, Apr 9, 2021 at 2:46 AM Luke Chen  wrote:
>
> > Hi Sophie,
> > That sounds great to take care of each case I can think of.
> > Questions:
> > 1. Do you mean the short-Circuit will also be implemented in KAFKA-12477?
> > 2. I don't think KAFKA-12638 is the blocker of this KIP-726, Am I right?
> > 3. So, does that mean we still have possibility to have multiple consumer
> > owned the same topic partition? And in this situation, we avoid them
> doing
> > committing, and waiting for next rebalance (should be soon). Is my
> > understanding correct?
> >
> > Thank you very much for finding this great solution.
> >
> > Luke
> >
> > On Fri, Apr 9, 2021 at 11:37 AM Sophie Blee-Goldman
> >  wrote:
> >
> > > Alright, here's the detailed proposal for KAFKA-12477. This assumes we
> > will
> > > change the default assignor to ["cooperative-sticky", "range"] in
> > KIP-726.
> > > It also acknowledges that users may attempt any kind of upgrade without
> > > reading the docs, and so we need to put in safeguards against data
> > > corruption rather than assume everyone will follow the safe upgrade
> path.
> > >
> > > With this proposal,
> > > 1) New applications on 3.0 will enable cooperative rebalancing by
> default
> > > 2) Existing applications which don’t set an assignor can safely upgrade
> > to
> > > 3.0 using a single rolling bounce with no extra steps, and will
> > > automatically transition to cooperative rebalancing
> > > 3) Existing applications which do set an assignor that uses EAGER can
> > > likewise upgrade their applications to COOPERATIVE with a single
> rolling
> > > bounce
> > > 4) Once on 3.0, applications can safely go back and forth between EAGER
> > and
> > > COOPERATIVE
> > > 5) Applications can safely downgrade away from 3.0
> > >
> > > The high-level idea for dynamic protocol upgrades is that the group
> will
> > > leverage the assignor selected by the group coordinator to determine
> when
> > > it’s safe to upgrade to COOPERATIVE, and trigger a fail-safe to protect
> > the
> > > group in case of rare events or user misconfiguration. The group
> > > coordinator selects the most preferred assignor that’s supported by all
> > > members of the group, so we know that all members will support
> > COOPERATIVE
> > > once we receive the “cooperative-sticky” assignor after a rebalance. At
> > > this point, each member can upgrade their own protocol to COOPERATIVE.
> > > However, there may be situations in which an EAGER member may join the
> > > group even after upgrading to COOPERATIVE. For example, during a
> rolling
> > > upgrade if the last remaining member on the old bytecode misses a
> > > rebalance, the other members will be allowed to upgrade to COOPERATIVE.
> > If
> > > the old member rejoins and is chosen to be the group leader before it’s
> > > upgraded to 3.0, it won’t be aware that the other members of the group
> > have
> > > not yet revoked their partitions when computing the assignment.
> > >
> > > Short Circuit:
> > > The risk of mixing the cooperative and eager rebalancing protocols is
> > that
> > > a partition may be assigned to one member while it has yet to be
> revoked
> > > from its previous owner. The danger is that the new owner may begin
> > > processing and committing offsets for this partition while the previous
> > > owner is also committing offsets in its #onPartitionsRevoked callback,
> > > which is invoked at the end of the rebalance in the cooperative
> protocol.
> > > This can result in these consumers overwriting each other’s offsets and
> > > getting a corrupted view of the partition. Note that it’s not possible
> to
> > > commit during a rebalance, so we can protect against offset corruption
> by
> > > blocking further commits after we detect that the 

Re: [DISCUSS] KIP-726: Make the CooperativeStickyAssignor as the default assignor

2021-04-09 Thread Sophie Blee-Goldman
1) Yes, all of the above will be part of KAFKA-12477 (not KIP-726)

2) No, KAFKA-12638 would be nice to have, but I don't think it's appropriate
to remove
the default implementation of #onPartitionsLost in 3.0 since we have not yet
given any indication
that we intend to remove it.

3) Yes, this would be similar to when a Consumer drops out of the group.
It's always been
possible for a member to miss a rebalance and have its partition be
reassigned to another
member, during which time both members would claim to own said partition.
But this is safe
because the member who dropped out is blocked from committing offsets on
that partition.

On Fri, Apr 9, 2021 at 2:46 AM Luke Chen  wrote:

> Hi Sophie,
> That sounds great to take care of each case I can think of.
> Questions:
> 1. Do you mean the short-Circuit will also be implemented in KAFKA-12477?
> 2. I don't think KAFKA-12638 is the blocker of this KIP-726, Am I right?
> 3. So, does that mean we still have possibility to have multiple consumer
> owned the same topic partition? And in this situation, we avoid them doing
> committing, and waiting for next rebalance (should be soon). Is my
> understanding correct?
>
> Thank you very much for finding this great solution.
>
> Luke
>
> On Fri, Apr 9, 2021 at 11:37 AM Sophie Blee-Goldman
>  wrote:
>
> > Alright, here's the detailed proposal for KAFKA-12477. This assumes we
> will
> > change the default assignor to ["cooperative-sticky", "range"] in
> KIP-726.
> > It also acknowledges that users may attempt any kind of upgrade without
> > reading the docs, and so we need to put in safeguards against data
> > corruption rather than assume everyone will follow the safe upgrade path.
> >
> > With this proposal,
> > 1) New applications on 3.0 will enable cooperative rebalancing by default
> > 2) Existing applications which don’t set an assignor can safely upgrade
> to
> > 3.0 using a single rolling bounce with no extra steps, and will
> > automatically transition to cooperative rebalancing
> > 3) Existing applications which do set an assignor that uses EAGER can
> > likewise upgrade their applications to COOPERATIVE with a single rolling
> > bounce
> > 4) Once on 3.0, applications can safely go back and forth between EAGER
> and
> > COOPERATIVE
> > 5) Applications can safely downgrade away from 3.0
> >
> > The high-level idea for dynamic protocol upgrades is that the group will
> > leverage the assignor selected by the group coordinator to determine when
> > it’s safe to upgrade to COOPERATIVE, and trigger a fail-safe to protect
> the
> > group in case of rare events or user misconfiguration. The group
> > coordinator selects the most preferred assignor that’s supported by all
> > members of the group, so we know that all members will support
> COOPERATIVE
> > once we receive the “cooperative-sticky” assignor after a rebalance. At
> > this point, each member can upgrade their own protocol to COOPERATIVE.
> > However, there may be situations in which an EAGER member may join the
> > group even after upgrading to COOPERATIVE. For example, during a rolling
> > upgrade if the last remaining member on the old bytecode misses a
> > rebalance, the other members will be allowed to upgrade to COOPERATIVE.
> If
> > the old member rejoins and is chosen to be the group leader before it’s
> > upgraded to 3.0, it won’t be aware that the other members of the group
> have
> > not yet revoked their partitions when computing the assignment.
> >
> > Short Circuit:
> > The risk of mixing the cooperative and eager rebalancing protocols is
> that
> > a partition may be assigned to one member while it has yet to be revoked
> > from its previous owner. The danger is that the new owner may begin
> > processing and committing offsets for this partition while the previous
> > owner is also committing offsets in its #onPartitionsRevoked callback,
> > which is invoked at the end of the rebalance in the cooperative protocol.
> > This can result in these consumers overwriting each other’s offsets and
> > getting a corrupted view of the partition. Note that it’s not possible to
> > commit during a rebalance, so we can protect against offset corruption by
> > blocking further commits after we detect that the group leader may not
> > understand COOPERATIVE, but before we invoke #onPartitionsRevoked. This
> is
> > the “short-circuit” — if we detect that the group is in an unsafe state,
> we
> > invoke #onPartitionsLost instead of #onPartitionsRevoked and explicitly
> > prevent offsets from being committed on those revoked partitions.
> >
> > Consumer procedure:
> > Upon startup, the consumer will initially select the highest
> > commonly-supported protocol across its configured assignors. With
> > ["cooperative-sticky", "range"], the initial protocol will be EAGER when
> > the member first joins the group. Following a rebalance, each member will
> > check the selected assignor. If the chosen assignor supports COOPERATIVE,
> > the member can upgrade their 

Re: [DISCUSS] KIP-726: Make the CooperativeStickyAssignor as the default assignor

2021-04-09 Thread Luke Chen
Hi Sophie,
That sounds great; it takes care of every case I can think of.
Questions:
1. Do you mean the short-circuit will also be implemented in KAFKA-12477?
2. I don't think KAFKA-12638 is a blocker for KIP-726. Am I right?
3. So, does that mean it is still possible for multiple consumers to
own the same topic partition? And in this situation, we prevent them from
committing and wait for the next rebalance (which should be soon). Is my
understanding correct?

Thank you very much for finding this great solution.

Luke

On Fri, Apr 9, 2021 at 11:37 AM Sophie Blee-Goldman
 wrote:

> Alright, here's the detailed proposal for KAFKA-12477. This assumes we will
> change the default assignor to ["cooperative-sticky", "range"] in KIP-726.
> It also acknowledges that users may attempt any kind of upgrade without
> reading the docs, and so we need to put in safeguards against data
> corruption rather than assume everyone will follow the safe upgrade path.
>
> With this proposal,
> 1) New applications on 3.0 will enable cooperative rebalancing by default
> 2) Existing applications which don’t set an assignor can safely upgrade to
> 3.0 using a single rolling bounce with no extra steps, and will
> automatically transition to cooperative rebalancing
> 3) Existing applications which do set an assignor that uses EAGER can
> likewise upgrade their applications to COOPERATIVE with a single rolling
> bounce
> 4) Once on 3.0, applications can safely go back and forth between EAGER and
> COOPERATIVE
> 5) Applications can safely downgrade away from 3.0
>
> The high-level idea for dynamic protocol upgrades is that the group will
> leverage the assignor selected by the group coordinator to determine when
> it’s safe to upgrade to COOPERATIVE, and trigger a fail-safe to protect the
> group in case of rare events or user misconfiguration. The group
> coordinator selects the most preferred assignor that’s supported by all
> members of the group, so we know that all members will support COOPERATIVE
> once we receive the “cooperative-sticky” assignor after a rebalance. At
> this point, each member can upgrade their own protocol to COOPERATIVE.
> However, there may be situations in which an EAGER member may join the
> group even after upgrading to COOPERATIVE. For example, during a rolling
> upgrade if the last remaining member on the old bytecode misses a
> rebalance, the other members will be allowed to upgrade to COOPERATIVE. If
> the old member rejoins and is chosen to be the group leader before it’s
> upgraded to 3.0, it won’t be aware that the other members of the group have
> not yet revoked their partitions when computing the assignment.
>
> Short Circuit:
> The risk of mixing the cooperative and eager rebalancing protocols is that
> a partition may be assigned to one member while it has yet to be revoked
> from its previous owner. The danger is that the new owner may begin
> processing and committing offsets for this partition while the previous
> owner is also committing offsets in its #onPartitionsRevoked callback,
> which is invoked at the end of the rebalance in the cooperative protocol.
> This can result in these consumers overwriting each other’s offsets and
> getting a corrupted view of the partition. Note that it’s not possible to
> commit during a rebalance, so we can protect against offset corruption by
> blocking further commits after we detect that the group leader may not
> understand COOPERATIVE, but before we invoke #onPartitionsRevoked. This is
> the “short-circuit” — if we detect that the group is in an unsafe state, we
> invoke #onPartitionsLost instead of #onPartitionsRevoked and explicitly
> prevent offsets from being committed on those revoked partitions.
>
> Consumer procedure:
> Upon startup, the consumer will initially select the highest
> commonly-supported protocol across its configured assignors. With
> ["cooperative-sticky", "range"], the initial protocol will be EAGER when
> the member first joins the group. Following a rebalance, each member will
> check the selected assignor. If the chosen assignor supports COOPERATIVE,
> the member can upgrade their used protocol to COOPERATIVE and no further
> action is required. If the member is already on COOPERATIVE but the
> selected assignor does NOT support it, then we need to trigger the
> short-circuit. In this case we will invoke #onPartitionsLost instead of
> #onPartitionsRevoked, and set a flag to block any attempts at committing
> those partitions which have been revoked. If a commit is attempted, as may
> be the case if the user does not implement #onPartitionsLost (see
> KAFKA-12638), we will throw a CommitFailedException which will be bubbled
> up through poll() after completing the rebalance. The member will then
> downgrade its protocol to EAGER for the next rebalance.
>
> Let me know what you think,
> Sophie
>
> On Fri, Apr 2, 2021 at 7:08 PM Luke Chen  wrote:
>
> > Hi Sophie,
> > Making the default to "cooperative-sticky, range" is a 

Re: [DISCUSS] KIP-726: Make the CooperativeStickyAssignor as the default assignor

2021-04-08 Thread Sophie Blee-Goldman
Alright, here's the detailed proposal for KAFKA-12477. This assumes we will
change the default assignor to ["cooperative-sticky", "range"] in KIP-726.
It also acknowledges that users may attempt any kind of upgrade without
reading the docs, and so we need to put in safeguards against data
corruption rather than assume everyone will follow the safe upgrade path.

With this proposal,
1) New applications on 3.0 will enable cooperative rebalancing by default
2) Existing applications which don’t set an assignor can safely upgrade to
3.0 using a single rolling bounce with no extra steps, and will
automatically transition to cooperative rebalancing
3) Existing applications which do set an assignor that uses EAGER can
likewise upgrade their applications to COOPERATIVE with a single rolling
bounce
4) Once on 3.0, applications can safely go back and forth between EAGER and
COOPERATIVE
5) Applications can safely downgrade away from 3.0
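
For reference, an application that wants the proposed assignor list today could set it explicitly. The sketch below only builds a `java.util.Properties` object and does not create a consumer; `AssignorConfigExample` is a hypothetical class name. The list is ordered by preference, and whether it becomes the shipped default is still under discussion in this thread.

```java
import java.util.Properties;

class AssignorConfigExample {
    static Properties consumerProps() {
        Properties props = new Properties();
        // Ordered by preference: the most preferred assignor comes first.
        props.setProperty("partition.assignment.strategy",
            "org.apache.kafka.clients.consumer.CooperativeStickyAssignor,"
                + "org.apache.kafka.clients.consumer.RangeAssignor");
        return props;
    }
}
```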

The high-level idea for dynamic protocol upgrades is that the group will
leverage the assignor selected by the group coordinator to determine when
it’s safe to upgrade to COOPERATIVE, and trigger a fail-safe to protect the
group in case of rare events or user misconfiguration. The group
coordinator selects the most preferred assignor that’s supported by all
members of the group, so we know that all members will support COOPERATIVE
once we receive the “cooperative-sticky” assignor after a rebalance. At
this point, each member can upgrade their own protocol to COOPERATIVE.
However, there may be situations in which an EAGER member joins the
group even after the others have upgraded to COOPERATIVE. For example, during a rolling
upgrade if the last remaining member on the old bytecode misses a
rebalance, the other members will be allowed to upgrade to COOPERATIVE. If
the old member rejoins and is chosen to be the group leader before it’s
upgraded to 3.0, it won’t be aware that the other members of the group have
not yet revoked their partitions when computing the assignment.
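
The following Java sketch shows why a single old member is enough to move the whole group off "cooperative-sticky". It is a simplified model under stated assumptions: the broker's actual selection logic is not shown in this thread, so the vote among commonly supported assignors below is an assumption, and `AssignorSelection` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AssignorSelection {
    // Each inner list is one member's assignors in preference order.
    static String select(List<List<String>> memberPreferences) {
        if (memberPreferences.isEmpty()) {
            throw new IllegalArgumentException("group has no members");
        }
        // Candidates: assignors supported by every member of the group.
        List<String> candidates = new ArrayList<>(memberPreferences.get(0));
        for (List<String> prefs : memberPreferences) {
            candidates.retainAll(prefs);
        }
        if (candidates.isEmpty()) {
            throw new IllegalStateException("members share no common assignor");
        }
        // Assumed rule: each member votes for its most preferred candidate.
        Map<String, Integer> votes = new HashMap<>();
        for (List<String> prefs : memberPreferences) {
            for (String assignor : prefs) {
                if (candidates.contains(assignor)) {
                    votes.merge(assignor, 1, Integer::sum);
                    break;
                }
            }
        }
        String winner = null;
        for (String candidate : candidates) {
            if (winner == null
                    || votes.getOrDefault(candidate, 0) > votes.getOrDefault(winner, 0)) {
                winner = candidate;
            }
        }
        return winner;
    }
}
```

If every member lists ["cooperative-sticky", "range"], the result is "cooperative-sticky". As soon as one member that only lists "range" joins, the result becomes "range", which is the signal the upgraded members use to detect the unsafe state.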

Short Circuit:
The risk of mixing the cooperative and eager rebalancing protocols is that
a partition may be assigned to one member while it has yet to be revoked
from its previous owner. The danger is that the new owner may begin
processing and committing offsets for this partition while the previous
owner is also committing offsets in its #onPartitionsRevoked callback,
which is invoked at the end of the rebalance in the cooperative protocol.
This can result in these consumers overwriting each other’s offsets and
getting a corrupted view of the partition. Note that it’s not possible to
commit during a rebalance, so we can protect against offset corruption by
blocking further commits after we detect that the group leader may not
understand COOPERATIVE, but before we invoke #onPartitionsRevoked. This is
the “short-circuit” — if we detect that the group is in an unsafe state, we
invoke #onPartitionsLost instead of #onPartitionsRevoked and explicitly
prevent offsets from being committed on those revoked partitions.

Consumer procedure:
Upon startup, the consumer will initially select the highest
commonly-supported protocol across its configured assignors. With
["cooperative-sticky", "range"], the initial protocol will be EAGER when
the member first joins the group. Following a rebalance, each member will
check the selected assignor. If the chosen assignor supports COOPERATIVE,
the member can upgrade its protocol to COOPERATIVE and no further
action is required. If the member is already on COOPERATIVE but the
selected assignor does NOT support it, then we need to trigger the
short-circuit. In this case we will invoke #onPartitionsLost instead of
#onPartitionsRevoked, and set a flag to block any attempts at committing
those partitions which have been revoked. If a commit is attempted, as may
be the case if the user does not implement #onPartitionsLost (see
KAFKA-12638), we will throw a CommitFailedException which will be bubbled
up through poll() after completing the rebalance. The member will then
downgrade its protocol to EAGER for the next rebalance.
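The consumer procedure above can be sketched as a small state machine. This is only an illustration under stated assumptions, not the real consumer code: the class, enum and method names are hypothetical, partitions are plain strings, IllegalStateException stands in for CommitFailedException so the example has no Kafka dependency, and clearing the commit block later is left out because the proposal doesn't spell it out.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical per-member model of the proposed upgrade / short-circuit logic.
final class ProtocolUpgradeSketch {
    enum Protocol { EAGER, COOPERATIVE }
    enum Callback { NONE, ON_PARTITIONS_REVOKED, ON_PARTITIONS_LOST }

    // With ["cooperative-sticky", "range"] the member starts out on EAGER.
    private Protocol protocol = Protocol.EAGER;
    private final Set<String> commitBlocked = new HashSet<>();

    Protocol protocol() {
        return protocol;
    }

    // Called once a rebalance completes; returns the callback to invoke
    // for the partitions this member has to give up.
    Callback afterRebalance(boolean assignorSupportsCooperative, Set<String> revoked) {
        if (assignorSupportsCooperative) {
            boolean alreadyCooperative = protocol == Protocol.COOPERATIVE;
            protocol = Protocol.COOPERATIVE;
            // Normal cooperative path: revoke at the end of the rebalance.
            return alreadyCooperative && !revoked.isEmpty()
                    ? Callback.ON_PARTITIONS_REVOKED
                    : Callback.NONE;
        }
        if (protocol == Protocol.COOPERATIVE) {
            // Short-circuit: the leader may not understand COOPERATIVE, so
            // block commits for the revoked partitions and fall back to EAGER.
            commitBlocked.addAll(revoked);
            protocol = Protocol.EAGER;
            return Callback.ON_PARTITIONS_LOST;
        }
        // Plain EAGER: partitions were already revoked before rejoining.
        return Callback.NONE;
    }

    // Stand-in for an offset commit; the real consumer would throw
    // CommitFailedException here.
    void commit(String partition) {
        if (commitBlocked.contains(partition)) {
            throw new IllegalStateException("commit blocked for " + partition);
        }
    }

    public static void main(String[] args) {
        ProtocolUpgradeSketch member = new ProtocolUpgradeSketch();
        System.out.println(member.afterRebalance(true, Set.of()));
        System.out.println(member.afterRebalance(false, Set.of("topic-0")));
        System.out.println(member.protocol());
    }
}
```

The important property is that the commit block is set before any user callback runs, so a user who only implements #onPartitionsRevoked cannot overwrite the new owner's offsets.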

Let me know what you think,
Sophie

On Fri, Apr 2, 2021 at 7:08 PM Luke Chen  wrote:

> Hi Sophie,
> Making the default to "cooperative-sticky, range" is a smart idea, to
> ensure we can at least fall back to rangeAssignor if consumers are not
> following our recommended upgrade path. I updated the KIP accordingly.
>
> Hi Chris,
> No problem, I updated the KIP to include the change in Connect.
>
> Thank you very much.
>
> Luke
>
> On Thu, Apr 1, 2021 at 3:24 AM Chris Egerton 
> wrote:
>
> > Hi all,
> >
> > @Sophie - I like the sound of the dual-protocol default. The smooth
> upgrade
> > path it permits sounds fantastic!
> >
> > @Luke - Do you think we can also include Connect in this KIP? Right now
> we
> > don't set any custom partition assignment strategies for the consumer
> > groups we bring up for sink tasks, and if we continue to just use the
> > default, the assignment 

Re: [DISCUSS] KIP-726: Make the CooperativeStickyAssignor as the default assignor

2021-04-02 Thread Luke Chen
Hi Sophie,
Making the default ["cooperative-sticky", "range"] is a smart idea: it
ensures we can at least fall back to the RangeAssignor if consumers do not
follow our recommended upgrade path. I have updated the KIP accordingly.

Hi Chris,
No problem, I updated the KIP to include the change in Connect.

Thank you very much.

Luke

On Thu, Apr 1, 2021 at 3:24 AM Chris Egerton 
wrote:

> Hi all,
>
> @Sophie - I like the sound of the dual-protocol default. The smooth upgrade
> path it permits sounds fantastic!
>
> @Luke - Do you think we can also include Connect in this KIP? Right now we
> don't set any custom partition assignment strategies for the consumer
> groups we bring up for sink tasks, and if we continue to just use the
> default, the assignment strategy for those consumer groups would change on
> Connect clusters once people upgrade to 3.0. I think this is fine (assuming
> we can take care of https://issues.apache.org/jira/browse/KAFKA-12487
> before then, which I'm fairly optimistic about), but it might be worth a
> sentence or two in the KIP explaining that the change in default will
> intentionally propagate to Connect. And, if we think Connect should be left
> out of this change and stay on the range assignor instead, we should
> probably call that fact out in the KIP as well and state that Connect will
> now override the default partition assignment strategy to be the range
> assignor (assuming the user hasn't specified a value for
> consumer.partition.assignment.strategy in their worker config or for
> consumer.override.partition.assignment.strategy in their connector config).
>
> Cheers,
>
> Chris
>
> On Wed, Mar 31, 2021 at 12:18 AM Sophie Blee-Goldman
>  wrote:
>
> > Ok I'm still fleshing out all the details of KAFKA-12477 but I think we
> can
> > simplify some things a bit, and avoid
> > any kind of "fail-fast" which will require user intervention. In fact I
> > think we can avoid requiring the user to make
> > any changes at all for KIP-726, so we don't have to worry about whether
> > they actually read our documentation:
> >
> > Instead of making ["cooperative-sticky"] the default, we change the
> default
> > to ["cooperative-sticky", "range"].
> > Since "range" is the old default, this is equivalent to the first rolling
> > bounce of the safe upgrade path in KIP-429.
> >
> > Of course this also means that under the current protocol selection
> > mechanism we won't actually upgrade to
> > cooperative rebalancing with the default assignor. But that's where
> > KAFKA-12477 will come in.
> >
> > @Guozhang Wang   I'll get back to you with a
> > concrete proposal and answer your questions, I just want to point out
> > that it's possible to side-step the risk of users shooting themselves in
> > the foot (well, at least in this one specific case,
> > obviously they always find a way)
> >
> > On Tue, Mar 30, 2021 at 10:37 AM Guozhang Wang 
> wrote:
> >
> > > Hi Sophie,
> > >
> > > My question is more related to KAFKA-12477, but since your latest
> replies
> > > are on this thread I figured we can follow-up on the same venue. Just
> so
> > I
> > > understand your latest comments above about the approach:
> > >
> > > * I think, we would need to persist this decision so that the group
> would
> > > never go back to the eager protocol, this bit would be written to the
> > > internal topic's assignment message. Is that correct?
> > > * Maybe you can describe the steps, after the group has decided to move
> > > forward with cooperative protocols, when:
> > > 1) a new member joined the group with the old version, and hence only
> > > recognized eager protocol and executing the eager protocol with its
> first
> > > rebalance, what would happen.
> > > 2) in addition to 1), the new member joined the group with the old
> > version
> > > and only recognized the old subscription format, and was selected as
> the
> > > leader, what would happen.
> > >
> > > Guozhang
> > >
> > >
> > >
> > >
> > > On Mon, Mar 29, 2021 at 10:30 PM Luke Chen  wrote:
> > >
> > > > Hi Sophie & Ismael,
> > > > Thank you for your feedback.
> > > > No problem, let's pause this KIP and wait for this improvement:
> > > KAFKA-12477
> > > > .
> > > >
> > > > Stay tuned :)
> > > >
> > > > Thank you.
> > > > Luke
> > > >
> > > > On Tue, Mar 30, 2021 at 3:14 AM Ismael Juma 
> wrote:
> > > >
> > > > > Hi Sophie,
> > > > >
> > > > > I didn't analyze the KIP in detail, but the two suggestions you
> > > mentioned
> > > > > sound like great improvements.
> > > > >
> > > > > A bit more context: breaking changes for a widely used product like
> > > Kafka
> > > > > are costly and hence why we try as hard as we can to avoid them.
> When
> > > it
> > > > > comes to the brokers, they are often managed by a central group (or
> > > > they're
> > > > > in the Cloud), so they're a bit easier to manage. Even so, it's
> still
> > > > > possible to upgrade from 0.8.x directly to 2.7 since all protocol
> > > > 

Re: [DISCUSS] KIP-726: Make the CooperativeStickyAssignor as the default assignor

2021-03-31 Thread Chris Egerton
Hi all,

@Sophie - I like the sound of the dual-protocol default. The smooth upgrade
path it permits sounds fantastic!

@Luke - Do you think we can also include Connect in this KIP? Right now we
don't set any custom partition assignment strategies for the consumer
groups we bring up for sink tasks, and if we continue to just use the
default, the assignment strategy for those consumer groups would change on
Connect clusters once people upgrade to 3.0. I think this is fine (assuming
we can take care of https://issues.apache.org/jira/browse/KAFKA-12487
before then, which I'm fairly optimistic about), but it might be worth a
sentence or two in the KIP explaining that the change in default will
intentionally propagate to Connect. And, if we think Connect should be left
out of this change and stay on the range assignor instead, we should
probably call that fact out in the KIP as well and state that Connect will
now override the default partition assignment strategy to be the range
assignor (assuming the user hasn't specified a value for
consumer.partition.assignment.strategy in their worker config or for
consumer.override.partition.assignment.strategy in their connector config).

Cheers,

Chris

On Wed, Mar 31, 2021 at 12:18 AM Sophie Blee-Goldman
 wrote:

> Ok I'm still fleshing out all the details of KAFKA-12477 but I think we can
> simplify some things a bit, and avoid
> any kind of "fail-fast" which will require user intervention. In fact I
> think we can avoid requiring the user to make
> any changes at all for KIP-726, so we don't have to worry about whether
> they actually read our documentation:
>
> Instead of making ["cooperative-sticky"] the default, we change the default
> to ["cooperative-sticky", "range"].
> Since "range" is the old default, this is equivalent to the first rolling
> bounce of the safe upgrade path in KIP-429.
>
> Of course this also means that under the current protocol selection
> mechanism we won't actually upgrade to
> cooperative rebalancing with the default assignor. But that's where
> KAFKA-12477 will come in.
>
> @Guozhang Wang   I'll get back to you with a
> concrete proposal and answer your questions, I just want to point out
> that it's possible to side-step the risk of users shooting themselves in
> the foot (well, at least in this one specific case,
> obviously they always find a way)
>
> On Tue, Mar 30, 2021 at 10:37 AM Guozhang Wang  wrote:
>
> > Hi Sophie,
> >
> > My question is more related to KAFKA-12477, but since your latest replies
> > are on this thread I figured we can follow-up on the same venue. Just so
> I
> > understand your latest comments above about the approach:
> >
> > * I think, we would need to persist this decision so that the group would
> > never go back to the eager protocol, this bit would be written to the
> > internal topic's assignment message. Is that correct?
> > * Maybe you can describe the steps, after the group has decided to move
> > forward with cooperative protocols, when:
> > 1) a new member joined the group with the old version, and hence only
> > recognized eager protocol and executing the eager protocol with its first
> > rebalance, what would happen.
> > 2) in addition to 1), the new member joined the group with the old
> version
> > and only recognized the old subscription format, and was selected as the
> > leader, what would happen.
> >
> > Guozhang
> >
> >
> >
> >
> > On Mon, Mar 29, 2021 at 10:30 PM Luke Chen  wrote:
> >
> > > Hi Sophie & Ismael,
> > > Thank you for your feedback.
> > > No problem, let's pause this KIP and wait for this improvement:
> > KAFKA-12477
> > > .
> > >
> > > Stay tuned :)
> > >
> > > Thank you.
> > > Luke
> > >
> > > On Tue, Mar 30, 2021 at 3:14 AM Ismael Juma  wrote:
> > >
> > > > Hi Sophie,
> > > >
> > > > I didn't analyze the KIP in detail, but the two suggestions you
> > mentioned
> > > > sound like great improvements.
> > > >
> > > > A bit more context: breaking changes for a widely used product like
> > Kafka
> > > > are costly and hence why we try as hard as we can to avoid them. When
> > it
> > > > comes to the brokers, they are often managed by a central group (or
> > > they're
> > > > in the Cloud), so they're a bit easier to manage. Even so, it's still
> > > > possible to upgrade from 0.8.x directly to 2.7 since all protocol
> > > versions
> > > > are still supported. When it comes to the basic clients (producer,
> > > > consumer, admin client), they're often embedded in applications so we
> > > have
> > > > to be even more conservative.
> > > >
> > > > Ismael
> > > >
> > > > On Mon, Mar 29, 2021 at 10:50 AM Sophie Blee-Goldman
> > > >  wrote:
> > > >
> > > > > Ismael,
> > > > >
> > > > > It seems like given 3.0 is a breaking release, we have to rely on
> > users
> > > > > being aware of this and responsible
> > > > > enough to read the upgrade guide. Otherwise we could never ever
> make
> > > any
> > > > > breaking changes beyond just

Re: [DISCUSS] KIP-726: Make the CooperativeStickyAssignor as the default assignor

2021-03-30 Thread Sophie Blee-Goldman
Ok, I'm still fleshing out all the details of KAFKA-12477, but I think we can
simplify some things a bit and avoid any kind of "fail-fast" which would
require user intervention. In fact, I think we can avoid requiring the user
to make any changes at all for KIP-726, so we don't have to worry about
whether they actually read our documentation:

Instead of making ["cooperative-sticky"] the default, we change the default
to ["cooperative-sticky", "range"].
Since "range" is the old default, this is equivalent to the first rolling
bounce of the safe upgrade path in KIP-429.
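For anyone who wants to see what that looks like today, here is a minimal sketch of setting the same list explicitly through partition.assignment.strategy. It only builds the Properties object so that it runs without the Kafka client on the classpath; the bootstrap server and group id are placeholders, and the class and method names are made up for the example.

```java
import java.util.Properties;

// Illustrative only: builds the consumer config, does not create a consumer.
final class AssignorConfigSketch {

    static Properties consumerProps() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
        props.setProperty("group.id", "example-group");           // placeholder
        // Order expresses preference: cooperative-sticky first, range as fallback.
        props.setProperty("partition.assignment.strategy",
                "org.apache.kafka.clients.consumer.CooperativeStickyAssignor,"
                + "org.apache.kafka.clients.consumer.RangeAssignor");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(consumerProps().getProperty("partition.assignment.strategy"));
    }
}
```

These properties would normally be passed straight to the KafkaConsumer constructor along with the deserializer settings.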

Of course this also means that under the current protocol selection
mechanism we won't actually upgrade to
cooperative rebalancing with the default assignor. But that's where
KAFKA-12477 will come in.

@Guozhang Wang: I'll get back to you with a concrete proposal and answer
your questions. I just want to point out that it's possible to side-step the
risk of users shooting themselves in the foot (well, at least in this one
specific case; obviously they always find a way).

On Tue, Mar 30, 2021 at 10:37 AM Guozhang Wang  wrote:

> Hi Sophie,
>
> My question is more related to KAFKA-12477, but since your latest replies
> are on this thread I figured we can follow-up on the same venue. Just so I
> understand your latest comments above about the approach:
>
> * I think, we would need to persist this decision so that the group would
> never go back to the eager protocol, this bit would be written to the
> internal topic's assignment message. Is that correct?
> * Maybe you can describe the steps, after the group has decided to move
> forward with cooperative protocols, when:
> 1) a new member joined the group with the old version, and hence only
> recognized eager protocol and executing the eager protocol with its first
> rebalance, what would happen.
> 2) in addition to 1), the new member joined the group with the old version
> and only recognized the old subscription format, and was selected as the
> leader, what would happen.
>
> Guozhang
>
>
>
>
> On Mon, Mar 29, 2021 at 10:30 PM Luke Chen  wrote:
>
> > Hi Sophie & Ismael,
> > Thank you for your feedback.
> > No problem, let's pause this KIP and wait for this improvement:
> KAFKA-12477
> > .
> >
> > Stay tuned :)
> >
> > Thank you.
> > Luke
> >
> > On Tue, Mar 30, 2021 at 3:14 AM Ismael Juma  wrote:
> >
> > > Hi Sophie,
> > >
> > > I didn't analyze the KIP in detail, but the two suggestions you
> mentioned
> > > sound like great improvements.
> > >
> > > A bit more context: breaking changes for a widely used product like
> Kafka
> > > are costly and hence why we try as hard as we can to avoid them. When
> it
> > > comes to the brokers, they are often managed by a central group (or
> > they're
> > > in the Cloud), so they're a bit easier to manage. Even so, it's still
> > > possible to upgrade from 0.8.x directly to 2.7 since all protocol
> > versions
> > > are still supported. When it comes to the basic clients (producer,
> > > consumer, admin client), they're often embedded in applications so we
> > have
> > > to be even more conservative.
> > >
> > > Ismael
> > >
> > > On Mon, Mar 29, 2021 at 10:50 AM Sophie Blee-Goldman
> > >  wrote:
> > >
> > > > Ismael,
> > > >
> > > > It seems like given 3.0 is a breaking release, we have to rely on
> users
> > > > being aware of this and responsible
> > > > enough to read the upgrade guide. Otherwise we could never ever make
> > any
> > > > breaking changes beyond just
> > > > removing deprecated APIs or other compilation-breaking errors that
> > would
> > > be
> > > > immediately visible, no?
> > > >
> > > > That said, obviously it's better to have a circuit-breaker that will
> > fail
> > > > fast in case of a user misconfiguration
> > > > rather than silently corrupting the consumer group state -- eg for
> two
> > > > consumers to overlap in their ownership
> > > > of the same partition(s). We could definitely implement this, and now
> > > that
> > > > I think about it this might solve a
> > > > related problem in KAFKA-12477
> > > > . We just add a
> new
> > > > field to the Assignment in which the group leader
> > > > indicates whether it's on a recent enough version to understand
> > > cooperative
> > > > rebalancing. If an upgraded member
> > > > joins the group, it'll only be allowed to start following the new
> > > > rebalancing protocol after receiving the go-ahead
> > > > from the group leader.
> > > >
> > > > If we do go ahead and add this new field in the Assignment then I'm
> > > pretty
> > > > confident we can reduce the number
> > > > of required rolling bounces to just one with KAFKA-12477
> > > > . In that case we
> > > > should
> > > > be in much better shape to
> > > > feel good about changing the default to the
> CooperativeStickyAssignor.
> > > How
> > > > does that sound?
> > > >
> > 

Re: [DISCUSS] KIP-726: Make the CooperativeStickyAssignor as the default assignor

2021-03-30 Thread Guozhang Wang
Hi Sophie,

My question is more related to KAFKA-12477, but since your latest replies
are on this thread I figured we can follow-up on the same venue. Just so I
understand your latest comments above about the approach:

* I think we would need to persist this decision so that the group would
never go back to the eager protocol; this bit would be written to the
internal topic's assignment message. Is that correct?
* Maybe you can describe what would happen, after the group has decided to
move forward with the cooperative protocol, when:
1) a new member joins the group on the old version, and hence recognizes
only the eager protocol and executes it on its first rebalance.
2) in addition to 1), the new member recognizes only the old subscription
format and is selected as the leader.

Guozhang




On Mon, Mar 29, 2021 at 10:30 PM Luke Chen  wrote:

> Hi Sophie & Ismael,
> Thank you for your feedback.
> No problem, let's pause this KIP and wait for this improvement: KAFKA-12477
> .
>
> Stay tuned :)
>
> Thank you.
> Luke
>
> On Tue, Mar 30, 2021 at 3:14 AM Ismael Juma  wrote:
>
> > Hi Sophie,
> >
> > I didn't analyze the KIP in detail, but the two suggestions you mentioned
> > sound like great improvements.
> >
> > A bit more context: breaking changes for a widely used product like Kafka
> > are costly and hence why we try as hard as we can to avoid them. When it
> > comes to the brokers, they are often managed by a central group (or
> they're
> > in the Cloud), so they're a bit easier to manage. Even so, it's still
> > possible to upgrade from 0.8.x directly to 2.7 since all protocol
> versions
> > are still supported. When it comes to the basic clients (producer,
> > consumer, admin client), they're often embedded in applications so we
> have
> > to be even more conservative.
> >
> > Ismael
> >
> > On Mon, Mar 29, 2021 at 10:50 AM Sophie Blee-Goldman
> >  wrote:
> >
> > > Ismael,
> > >
> > > It seems like given 3.0 is a breaking release, we have to rely on users
> > > being aware of this and responsible
> > > enough to read the upgrade guide. Otherwise we could never ever make
> any
> > > breaking changes beyond just
> > > removing deprecated APIs or other compilation-breaking errors that
> would
> > be
> > > immediately visible, no?
> > >
> > > That said, obviously it's better to have a circuit-breaker that will
> fail
> > > fast in case of a user misconfiguration
> > > rather than silently corrupting the consumer group state -- eg for two
> > > consumers to overlap in their ownership
> > > of the same partition(s). We could definitely implement this, and now
> > that
> > > I think about it this might solve a
> > > related problem in KAFKA-12477
> > > . We just add a new
> > > field to the Assignment in which the group leader
> > > indicates whether it's on a recent enough version to understand
> > cooperative
> > > rebalancing. If an upgraded member
> > > joins the group, it'll only be allowed to start following the new
> > > rebalancing protocol after receiving the go-ahead
> > > from the group leader.
> > >
> > > If we do go ahead and add this new field in the Assignment then I'm
> > pretty
> > > confident we can reduce the number
> > > of required rolling bounces to just one with KAFKA-12477
> > > . In that case we
> > > should
> > > be in much better shape to
> > > feel good about changing the default to the CooperativeStickyAssignor.
> > How
> > > does that sound?
> > >
> > > To be clear, I'm not proposing we do this as part of KIP-726. Here's my
> > > take:
> > >
> > > Let's pause this KIP while I work on making these two improvements in
> > > KAFKA-12477 . Once
> I
> > > can
> > > confirm the
> > > short-circuit and single rolling bounce will be available for 3.0, I'll
> > > report back on this thread. Then we can move
> > > forward with this KIP again.
> > >
> > > Thoughts?
> > > Sophie
> > >
> > > On Mon, Mar 29, 2021 at 12:01 AM Luke Chen  wrote:
> > >
> > > > Hi Ismael,
> > > > Thanks for your good question. Answer them below:
> > > > *1. Are we saying that every consumer upgraded would have to follow
> the
> > > > complex path described in the KIP? *
> > > > --> We suggest that every consumer did these 2 steps of rolling
> > upgrade.
> > > > And after KAFKA-12477 <
> > https://issues.apache.org/jira/browse/KAFKA-12477
> > > >
> > > > is completed, it can be reduced to 1 rolling upgrade.
> > > >
> > > > *2. what happens if they don't read the instructions and upgrade as
> > they
> > > > have in the past?*
> > > > --> The reason we want 2 steps of rolling upgrade is that we want to
> > > avoid
> > > > the situation where leader is on old byte-code and only recognize
> > > "eager",
> > > > but due to 

Re: [DISCUSS] KIP-726: Make the CooperativeStickyAssignor as the default assignor

2021-03-29 Thread Luke Chen
Hi Sophie & Ismael,
Thank you for your feedback.
No problem, let's pause this KIP and wait for this improvement: KAFKA-12477.

Stay tuned :)

Thank you.
Luke

On Tue, Mar 30, 2021 at 3:14 AM Ismael Juma  wrote:

> Hi Sophie,
>
> I didn't analyze the KIP in detail, but the two suggestions you mentioned
> sound like great improvements.
>
> A bit more context: breaking changes for a widely used product like Kafka
> are costly and hence why we try as hard as we can to avoid them. When it
> comes to the brokers, they are often managed by a central group (or they're
> in the Cloud), so they're a bit easier to manage. Even so, it's still
> possible to upgrade from 0.8.x directly to 2.7 since all protocol versions
> are still supported. When it comes to the basic clients (producer,
> consumer, admin client), they're often embedded in applications so we have
> to be even more conservative.
>
> Ismael
>
> On Mon, Mar 29, 2021 at 10:50 AM Sophie Blee-Goldman
>  wrote:
>
> > Ismael,
> >
> > It seems like given 3.0 is a breaking release, we have to rely on users
> > being aware of this and responsible
> > enough to read the upgrade guide. Otherwise we could never ever make any
> > breaking changes beyond just
> > removing deprecated APIs or other compilation-breaking errors that would
> be
> > immediately visible, no?
> >
> > That said, obviously it's better to have a circuit-breaker that will fail
> > fast in case of a user misconfiguration
> > rather than silently corrupting the consumer group state -- eg for two
> > consumers to overlap in their ownership
> > of the same partition(s). We could definitely implement this, and now
> that
> > I think about it this might solve a
> > related problem in KAFKA-12477
> > . We just add a new
> > field to the Assignment in which the group leader
> > indicates whether it's on a recent enough version to understand
> cooperative
> > rebalancing. If an upgraded member
> > joins the group, it'll only be allowed to start following the new
> > rebalancing protocol after receiving the go-ahead
> > from the group leader.
> >
> > If we do go ahead and add this new field in the Assignment then I'm
> pretty
> > confident we can reduce the number
> > of required rolling bounces to just one with KAFKA-12477
> > . In that case we
> > should
> > be in much better shape to
> > feel good about changing the default to the CooperativeStickyAssignor.
> How
> > does that sound?
> >
> > To be clear, I'm not proposing we do this as part of KIP-726. Here's my
> > take:
> >
> > Let's pause this KIP while I work on making these two improvements in
> > KAFKA-12477 . Once I
> > can
> > confirm the
> > short-circuit and single rolling bounce will be available for 3.0, I'll
> > report back on this thread. Then we can move
> > forward with this KIP again.
> >
> > Thoughts?
> > Sophie
> >
> > On Mon, Mar 29, 2021 at 12:01 AM Luke Chen  wrote:
> >
> > > Hi Ismael,
> > > Thanks for your good question. Answer them below:
> > > *1. Are we saying that every consumer upgraded would have to follow the
> > > complex path described in the KIP? *
> > > --> We suggest that every consumer did these 2 steps of rolling
> upgrade.
> > > And after KAFKA-12477 <
> https://issues.apache.org/jira/browse/KAFKA-12477
> > >
> > > is completed, it can be reduced to 1 rolling upgrade.
> > >
> > > *2. what happens if they don't read the instructions and upgrade as
> they
> > > have in the past?*
> > > --> The reason we want 2 steps of rolling upgrade is that we want to
> > avoid
> > > the situation where leader is on old byte-code and only recognize
> > "eager",
> > > but due to compatibility would still be able to deserialize the new
> > > protocol data from newer versioned members, and hence just go ahead and
> > do
> > > the assignment while new versioned members did not revoke their
> > partitions
> > > before joining the group.
> > >
> > > But I'd say, the new default assignor "CooperativeStickyAssignor" was
> > > already introduced in V2.4.0, and it should be long enough for user to
> > > upgrade to the new byte-code to recognize the "cooperative" protocol.
> > >
> > > What do you think?
> > >
> > > Thank you.
> > > Luke
> > >
> > > On Mon, Mar 29, 2021 at 12:14 PM Ismael Juma 
> wrote:
> > >
> > > > Thanks for the KIP. Are we saying that every consumer upgraded would
> > have
> > > > to follow the complex path described in the KIP? Also, what happens
> if
> > > they
> > > > don't read the instructions and upgrade as they have in the past?
> > > >
> > > > Ismael
> > > >
> > > > On Fri, Mar 26, 2021, 1:53 AM Luke Chen  wrote:
> > > >
> > > > > Hi everyone,
> > > > > 
> > > > >
> > > > > I'd like to discuss the following proposal to make the
> > > > > CooperativeStickyAssignor as the default assignor.
> > > > >
> > > > >
> > > 

Re: [DISCUSS] KIP-726: Make the CooperativeStickyAssignor as the default assignor

2021-03-29 Thread Ismael Juma
Hi Sophie,

I didn't analyze the KIP in detail, but the two suggestions you mentioned
sound like great improvements.

A bit more context: breaking changes for a widely used product like Kafka
are costly, which is why we try as hard as we can to avoid them. When it
comes to the brokers, they are often managed by a central group (or they're
in the Cloud), so they're a bit easier to manage. Even so, it's still
possible to upgrade from 0.8.x directly to 2.7 since all protocol versions
are still supported. When it comes to the basic clients (producer,
consumer, admin client), they're often embedded in applications so we have
to be even more conservative.

Ismael

On Mon, Mar 29, 2021 at 10:50 AM Sophie Blee-Goldman
 wrote:

> Ismael,
>
> It seems like given 3.0 is a breaking release, we have to rely on users
> being aware of this and responsible
> enough to read the upgrade guide. Otherwise we could never ever make any
> breaking changes beyond just
> removing deprecated APIs or other compilation-breaking errors that would be
> immediately visible, no?
>
> That said, obviously it's better to have a circuit-breaker that will fail
> fast in case of a user misconfiguration
> rather than silently corrupting the consumer group state -- eg for two
> consumers to overlap in their ownership
> of the same partition(s). We could definitely implement this, and now that
> I think about it this might solve a
> related problem in KAFKA-12477
> . We just add a new
> field to the Assignment in which the group leader
> indicates whether it's on a recent enough version to understand cooperative
> rebalancing. If an upgraded member
> joins the group, it'll only be allowed to start following the new
> rebalancing protocol after receiving the go-ahead
> from the group leader.
>
> If we do go ahead and add this new field in the Assignment then I'm pretty
> confident we can reduce the number
> of required rolling bounces to just one with KAFKA-12477
> . In that case we
> should
> be in much better shape to
> feel good about changing the default to the CooperativeStickyAssignor. How
> does that sound?
>
> To be clear, I'm not proposing we do this as part of KIP-726. Here's my
> take:
>
> Let's pause this KIP while I work on making these two improvements in
> KAFKA-12477 . Once I
> can
> confirm the
> short-circuit and single rolling bounce will be available for 3.0, I'll
> report back on this thread. Then we can move
> forward with this KIP again.
>
> Thoughts?
> Sophie
>
> On Mon, Mar 29, 2021 at 12:01 AM Luke Chen  wrote:
>
> > Hi Ismael,
> > Thanks for your good question. Answer them below:
> > *1. Are we saying that every consumer upgraded would have to follow the
> > complex path described in the KIP? *
> > --> We suggest that every consumer did these 2 steps of rolling upgrade.
> > And after KAFKA-12477  >
> > is completed, it can be reduced to 1 rolling upgrade.
> >
> > *2. what happens if they don't read the instructions and upgrade as they
> > have in the past?*
> > --> The reason we want 2 steps of rolling upgrade is that we want to
> avoid
> > the situation where leader is on old byte-code and only recognize
> "eager",
> > but due to compatibility would still be able to deserialize the new
> > protocol data from newer versioned members, and hence just go ahead and
> do
> > the assignment while new versioned members did not revoke their
> partitions
> > before joining the group.
> >
> > But I'd say, the new default assignor "CooperativeStickyAssignor" was
> > already introduced in V2.4.0, and it should be long enough for user to
> > upgrade to the new byte-code to recognize the "cooperative" protocol.
> >
> > What do you think?
> >
> > Thank you.
> > Luke
> >
> > On Mon, Mar 29, 2021 at 12:14 PM Ismael Juma  wrote:
> >
> > > Thanks for the KIP. Are we saying that every consumer upgraded would
> have
> > > to follow the complex path described in the KIP? Also, what happens if
> > they
> > > don't read the instructions and upgrade as they have in the past?
> > >
> > > Ismael
> > >
> > > On Fri, Mar 26, 2021, 1:53 AM Luke Chen  wrote:
> > >
> > > > Hi everyone,
> > > > 
> > > >
> > > > I'd like to discuss the following proposal to make the
> > > > CooperativeStickyAssignor as the default assignor.
> > > >
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-726%3A+Make+the+CooperativeStickyAssignor+as+the+default+assignor
> > > >
> > > > Any comments are welcomed.
> > > >
> > > > Thank you.
> > > > Luke
> > > >
> > >
> >
>


Re: [DISCUSS] KIP-726: Make the CooperativeStickyAssignor as the default assignor

2021-03-29 Thread Sophie Blee-Goldman
Ismael,

It seems like given 3.0 is a breaking release, we have to rely on users
being aware of this and responsible enough to read the upgrade guide.
Otherwise we could never ever make any breaking changes beyond just
removing deprecated APIs or other compilation-breaking errors that would be
immediately visible, no?

That said, obviously it's better to have a circuit-breaker that will fail
fast in case of a user misconfiguration rather than silently corrupting the
consumer group state -- e.g. by letting two consumers overlap in their
ownership of the same partition(s). We could definitely implement this, and
now that I think about it this might solve a related problem in
KAFKA-12477. We just add a new field to the Assignment in which the group
leader indicates whether it's on a recent enough version to understand
cooperative rebalancing. If an upgraded member joins the group, it'll only
be allowed to start following the new rebalancing protocol after receiving
the go-ahead from the group leader.
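
To make the go-ahead idea concrete, here is a minimal sketch of the check an
upgraded member might perform. It is a hypothetical model only, not the
actual consumer protocol schema: the class GoAheadSketch, the field name
leaderSupportsCooperative and the method protocolFor are all assumptions made
up for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model only: these are not Kafka's real classes or wire format.
class GoAheadSketch {
    enum Protocol { EAGER, COOPERATIVE }

    // Stand-in for an Assignment that carries the proposed extra field.
    static final class Assignment {
        final List<Integer> partitions;
        // Assumed name for the new field set by the group leader.
        final boolean leaderSupportsCooperative;

        Assignment(List<Integer> partitions, boolean leaderSupportsCooperative) {
            this.partitions = partitions;
            this.leaderSupportsCooperative = leaderSupportsCooperative;
        }
    }

    // An upgraded member only switches to COOPERATIVE after the leader's
    // go-ahead; in every other case it keeps following EAGER.
    static Protocol protocolFor(boolean memberSupportsCooperative, Assignment received) {
        if (memberSupportsCooperative && received.leaderSupportsCooperative) {
            return Protocol.COOPERATIVE;
        }
        return Protocol.EAGER;
    }

    public static void main(String[] args) {
        Assignment fromOldLeader = new Assignment(Arrays.asList(0, 1), false);
        Assignment fromNewLeader = new Assignment(Arrays.asList(0, 1), true);
        System.out.println(protocolFor(true, fromOldLeader)); // EAGER
        System.out.println(protocolFor(true, fromNewLeader)); // COOPERATIVE
    }
}
```

The point of the sketch is only that the decision is driven by what the
leader reports, so a member on new byte-code cannot skip revocation while an
old leader is still assuming eager semantics.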

If we do go ahead and add this new field in the Assignment then I'm pretty
confident we can reduce the number of required rolling bounces to just one
with KAFKA-12477. In that case we should be in much better shape to feel
good about changing the default to the CooperativeStickyAssignor. How does
that sound?

To be clear, I'm not proposing we do this as part of KIP-726. Here's my
take:

Let's pause this KIP while I work on making these two improvements in
KAFKA-12477. Once I can confirm the short-circuit and single rolling bounce
will be available for 3.0, I'll report back on this thread. Then we can move
forward with this KIP again.

Thoughts?
Sophie

On Mon, Mar 29, 2021 at 12:01 AM Luke Chen  wrote:

> Hi Ismael,
> Thanks for the good questions. Answers below:
> *1. Are we saying that every consumer upgraded would have to follow the
> complex path described in the KIP?*
> --> We suggest that every consumer follow these 2 steps of rolling upgrade.
> After KAFKA-12477 is completed, this can be reduced to 1 rolling upgrade.
>
> *2. What happens if they don't read the instructions and upgrade as they
> have in the past?*
> --> The reason we want 2 steps of rolling upgrade is that we want to avoid
> the situation where the leader is on old byte-code and only recognizes
> "eager", but due to compatibility is still able to deserialize the new
> protocol data from newer-versioned members, and hence just goes ahead and
> does the assignment even though the newer-versioned members did not revoke
> their partitions before joining the group.
>
> But I'd say the new default assignor, "CooperativeStickyAssignor", was
> already introduced in V2.4.0, which should have been long enough for users
> to upgrade to byte-code that recognizes the "cooperative" protocol.
>
> What do you think?
>
> Thank you.
> Luke
>
> On Mon, Mar 29, 2021 at 12:14 PM Ismael Juma  wrote:
>
> > Thanks for the KIP. Are we saying that every consumer upgraded would have
> > to follow the complex path described in the KIP? Also, what happens if
> > they don't read the instructions and upgrade as they have in the past?
> >
> > Ismael
> >
> > On Fri, Mar 26, 2021, 1:53 AM Luke Chen  wrote:
> >
> > > Hi everyone,
> > > 
> > >
> > > I'd like to discuss the following proposal to make the
> > > CooperativeStickyAssignor as the default assignor.
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-726%3A+Make+the+CooperativeStickyAssignor+as+the+default+assignor
> > >
> > > Any comments are welcome.
> > >
> > > Thank you.
> > > Luke
> > >
> >
>


Re: [DISCUSS] KIP-726: Make the CooperativeStickyAssignor as the default assignor

2021-03-29 Thread Luke Chen
Hi Ismael,
Thanks for the good questions. Answers below:
*1. Are we saying that every consumer upgraded would have to follow the
complex path described in the KIP?*
--> We suggest that every consumer follow these 2 steps of rolling upgrade.
After KAFKA-12477 is completed, this can be reduced to 1 rolling upgrade.

*2. What happens if they don't read the instructions and upgrade as they
have in the past?*
--> The reason we want 2 steps of rolling upgrade is that we want to avoid
the situation where the leader is on old byte-code and only recognizes
"eager", but due to compatibility is still able to deserialize the new
protocol data from newer-versioned members, and hence just goes ahead and
does the assignment even though the newer-versioned members did not revoke
their partitions before joining the group.
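
For reference, the 2 steps could look roughly like the sketch below, as I
understand the KIP-429 upgrade path. It only builds the consumer properties
for each bounce with plain java.util.Properties; the class and method names
(TwoBounceUpgradeSketch, firstBounce, secondBounce) are made up for
illustration, and keeping RangeAssignor as the "old" assignor assumes the
application was on the previous default.

```java
import java.util.Properties;

// Illustrative sketch of the two rolling bounces; helper names are hypothetical.
class TwoBounceUpgradeSketch {
    static final String STRATEGY_KEY = "partition.assignment.strategy";
    static final String RANGE =
        "org.apache.kafka.clients.consumer.RangeAssignor";
    static final String COOPERATIVE_STICKY =
        "org.apache.kafka.clients.consumer.CooperativeStickyAssignor";

    // Bounce 1: list the cooperative assignor alongside the old one, so every
    // member still shares an eager-capable assignor with the rest of the group.
    static Properties firstBounce() {
        Properties props = new Properties();
        props.setProperty(STRATEGY_KEY, COOPERATIVE_STICKY + "," + RANGE);
        return props;
    }

    // Bounce 2: once all members are on the new byte-code, drop the old
    // assignor so only the cooperative one remains.
    static Properties secondBounce() {
        Properties props = new Properties();
        props.setProperty(STRATEGY_KEY, COOPERATIVE_STICKY);
        return props;
    }

    public static void main(String[] args) {
        System.out.println("bounce 1: " + firstBounce().getProperty(STRATEGY_KEY));
        System.out.println("bounce 2: " + secondBounce().getProperty(STRATEGY_KEY));
    }
}
```

The order of the two assignors in the first bounce is an assumption here;
please check the upgrade guide before relying on it.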

But I'd say the new default assignor, "CooperativeStickyAssignor", was
already introduced in V2.4.0, which should have been long enough for users
to upgrade to byte-code that recognizes the "cooperative" protocol.

What do you think?

Thank you.
Luke

On Mon, Mar 29, 2021 at 12:14 PM Ismael Juma  wrote:

> Thanks for the KIP. Are we saying that every consumer upgraded would have
> to follow the complex path described in the KIP? Also, what happens if they
> don't read the instructions and upgrade as they have in the past?
>
> Ismael
>
> On Fri, Mar 26, 2021, 1:53 AM Luke Chen  wrote:
>
> > Hi everyone,
> > 
> >
> > I'd like to discuss the following proposal to make the
> > CooperativeStickyAssignor as the default assignor.
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-726%3A+Make+the+CooperativeStickyAssignor+as+the+default+assignor
> >
> > Any comments are welcome.
> >
> > Thank you.
> > Luke
> >
>


Re: [DISCUSS] KIP-726: Make the CooperativeStickyAssignor as the default assignor

2021-03-28 Thread Ismael Juma
Thanks for the KIP. Are we saying that every consumer upgraded would have
to follow the complex path described in the KIP? Also, what happens if they
don't read the instructions and upgrade as they have in the past?

Ismael

On Fri, Mar 26, 2021, 1:53 AM Luke Chen  wrote:

> Hi everyone,
> 
>
> I'd like to discuss the following proposal to make the
> CooperativeStickyAssignor as the default assignor.
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-726%3A+Make+the+CooperativeStickyAssignor+as+the+default+assignor
>
> Any comments are welcome.
>
> Thank you.
> Luke
>


Re: [DISCUSS] KIP-726: Make the CooperativeStickyAssignor as the default assignor

2021-03-28 Thread Luke Chen
Hi Sophie,
Thanks for the good suggestion. I've updated KIP-726 accordingly.

Thank you.
Luke

On Sat, Mar 27, 2021 at 3:24 AM Sophie Blee-Goldman
 wrote:

> Thanks for the KIP! I'm 100% on board with this (obviously :P) and the KIP
> itself looks good to me overall. Just one clarification I think you should
> make:
>
> In the *Public Interfaces* section you say "It won't affect the current
> consumers" -- this is only true if those current consumers have explicitly
> set the *partition.assignment.strategy* config on their clients. If they've
> been relying on the default thus far, they will need to either follow the
> safe upgrade path as described in KIP-429, or else set the
> *partition.assignment.strategy* config to one of the non-cooperative
> assignors during the rolling upgrade if they wish to remain on EAGER and/or
> perform the upgrade in just a single rolling bounce.
>
> On that note:
> We have some thoughts for improving the upgrade experience to actually
> reduce the safe upgrade path to just a single rolling bounce. This is still
> in progress and we need to work out all the kinks, so I wouldn't commit to a
> single rolling bounce upgrade as part of this KIP, but it's worth noting
> that by 3.0 we may be in an even better position with regard to this
> assignor. The ticket is https://issues.apache.org/jira/browse/KAFKA-12477
>
> Cheers,
> Sophie
>
> On Fri, Mar 26, 2021 at 1:53 AM Luke Chen  wrote:
>
> > Hi everyone,
> > 
> >
> > I'd like to discuss the following proposal to make the
> > CooperativeStickyAssignor as the default assignor.
> >
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-726%3A+Make+the+CooperativeStickyAssignor+as+the+default+assignor
> >
> > Any comments are welcomed.
> >
> > Thank you.
> > Luke
> >
>


Re: [DISCUSS] KIP-726: Make the CooperativeStickyAssignor as the default assignor

2021-03-26 Thread Sophie Blee-Goldman
Thanks for the KIP! I'm 100% on board with this (obviously :P) and the KIP
itself looks good to me overall. Just one clarification I think you should
make:

In the *Public Interfaces* section you say "It won't affect the current
consumers" -- this is only true if those current consumers have explicitly
set the *partition.assignment.strategy* config on their clients. If they've
been relying on the default thus far, they will need to either follow the
safe upgrade path as described in KIP-429, or else set the
*partition.assignment.strategy* config to one of the non-cooperative
assignors during the rolling upgrade if they wish to remain on EAGER and/or
perform the upgrade in just a single rolling bounce.
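
As a rough illustration of the second option, a consumer that wants to stay
on EAGER could pin a non-cooperative assignor explicitly before the rolling
upgrade. The sketch below just builds the properties with plain
java.util.Properties; the class StayOnEagerSketch, the method consumerProps
and the bootstrap/group values are placeholders I made up, and RangeAssignor
is picked only because it was the default so far.

```java
import java.util.Properties;

// Illustrative only: class, method and example values are hypothetical.
class StayOnEagerSketch {
    static Properties consumerProps(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        // Explicitly pin a non-cooperative assignor so that a change of the
        // default does not switch this consumer to cooperative rebalancing.
        props.setProperty("partition.assignment.strategy",
            "org.apache.kafka.clients.consumer.RangeAssignor");
        return props;
    }

    public static void main(String[] args) {
        Properties props = consumerProps("localhost:9092", "example-group");
        System.out.println(props.getProperty("partition.assignment.strategy"));
    }
}
```

Any of the other non-cooperative assignors would serve the same purpose; the
only thing that matters is that the config is set explicitly rather than
left to the default.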

On that note:
We have some thoughts for improving the upgrade experience to actually
reduce the safe upgrade path to just a single rolling bounce. This is still
in progress and we need to work out all the kinks, so I wouldn't commit to a
single rolling bounce upgrade as part of this KIP, but it's worth noting
that by 3.0 we may be in an even better position with regard to this
assignor. The ticket is https://issues.apache.org/jira/browse/KAFKA-12477

Cheers,
Sophie

On Fri, Mar 26, 2021 at 1:53 AM Luke Chen  wrote:

> Hi everyone,
> 
>
> I'd like to discuss the following proposal to make the
> CooperativeStickyAssignor as the default assignor.
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-726%3A+Make+the+CooperativeStickyAssignor+as+the+default+assignor
>
> Any comments are welcome.
>
> Thank you.
> Luke
>