[jira] [Commented] (KAFKA-10284) Group membership update due to static member rejoin should be persisted

2020-09-09 Thread feyman (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17192885#comment-17192885
 ] 

feyman commented on KAFKA-10284:


Just FYI, created a PR: https://github.com/apache/kafka/pull/9270

> Group membership update due to static member rejoin should be persisted
> -----------------------------------------------------------------------
>
>                  Key: KAFKA-10284
>                  URL: https://issues.apache.org/jira/browse/KAFKA-10284
>              Project: Kafka
>           Issue Type: Bug
>           Components: consumer
>     Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.6.0
>             Reporter: Boyang Chen
>             Assignee: Boyang Chen
>             Priority: Critical
>               Labels: help-wanted
>          Attachments: How to reproduce the issue in KAFKA-10284.md
>
>
> When a known static member rejoins, we update its corresponding member.id 
> without triggering a new rebalance. This avoids an unnecessary rebalance for 
> static membership, and it also fences anyone still using the old member.id. 
> The bug is that we don't actually persist the membership update, so if no 
> subsequent rebalance is triggered, the new member.id is lost during a group 
> coordinator migration, which brings the zombie member identity back.
> Credit for finding the bug goes to [~hachikuji] 
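
As a rough illustration of the description above, here is a minimal sketch that treats the coordinator's durable state as an append-only log which a new coordinator replays after a migration. The function names, the record layout, and the `persist` flag are all hypothetical; this is a toy model, not Kafka's GroupMetadataManager code or the change in the linked PR.

```python
# Toy model only: every name below is hypothetical, not a Kafka API.

def replay(log):
    """Rebuild the instance.id -> member.id map by replaying log records."""
    members = {}
    for instance_id, member_id in log:
        members[instance_id] = member_id
    return members


def static_rejoin(members, log, instance_id, new_member_id, persist):
    """Swap the member.id of a known static member without a rebalance."""
    members[instance_id] = new_member_id
    if persist:
        # What the ticket title asks for: also write the update durably.
        log.append((instance_id, new_member_id))


for persist in (False, True):
    log = [("instance-1", "M1")]        # assumed written at the last rebalance
    members = replay(log)
    static_rejoin(members, log, "instance-1", "M2", persist)
    after_migration = replay(log)       # what a new coordinator would load
    print(persist, after_migration["instance-1"])
```

Without the durable write the replayed state falls back to M1; with it, M2 survives the migration.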



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10284) Group membership update due to static member rejoin should be persisted

2020-08-23 Thread feyman (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17182608#comment-17182608
 ] 

feyman commented on KAFKA-10284:


I checked with [~bchen225242] offline and reproduced the issue using the 
procedure in the attachment. I'd be happy to hear about any other related 
failure scenarios. [^How to reproduce the issue in KAFKA-10284.md]



[jira] [Commented] (KAFKA-10284) Group membership update due to static member rejoin should be persisted

2020-08-12 Thread Boyang Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17176639#comment-17176639
 ] 

Boyang Chen commented on KAFKA-10284:
-

I would say for all.

> Group membership update due to static member rejoin should be persisted
> -----------------------------------------------------------------------
>
>                  Key: KAFKA-10284
>                  URL: https://issues.apache.org/jira/browse/KAFKA-10284
>              Project: Kafka
>           Issue Type: Bug
>           Components: consumer
>     Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.6.0
>             Reporter: Boyang Chen
>             Priority: Major
>               Labels: help-wanted
>              Fix For: 2.6.1


[jira] [Commented] (KAFKA-10284) Group membership update due to static member rejoin should be persisted

2020-08-11 Thread Sophie Blee-Goldman (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175905#comment-17175905
 ] 

Sophie Blee-Goldman commented on KAFKA-10284:
-

[~bchen225242] [~hachikuji] Is this a correctness problem for all applications 
or only EOS?



[jira] [Commented] (KAFKA-10284) Group membership update due to static member rejoin should be persisted

2020-08-11 Thread Boyang Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175669#comment-17175669
 ] 

Boyang Chen commented on KAFKA-10284:
-

I'm stepping away from this ticket for now; others should feel free to pick it up.



[jira] [Commented] (KAFKA-10284) Group membership update due to static member rejoin should be persisted

2020-07-28 Thread Boyang Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166681#comment-17166681
 ] 

Boyang Chen commented on KAFKA-10284:
-

That's a good observation. I forgot that static members don't actually issue 
another offset fetch immediately after they rejoin the group. You should be 
right about this; I hadn't thought about the scenario where EOS and static 
membership are both turned on.

> Group membership update due to static member rejoin should be persisted
> -----------------------------------------------------------------------
>
>                  Key: KAFKA-10284
>                  URL: https://issues.apache.org/jira/browse/KAFKA-10284
>              Project: Kafka
>           Issue Type: Bug
>           Components: consumer
>     Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.6.0
>             Reporter: Boyang Chen
>             Assignee: Boyang Chen
>             Priority: Major
>              Fix For: 2.6.1


[jira] [Commented] (KAFKA-10284) Group membership update due to static member rejoin should be persisted

2020-07-27 Thread Jason Gustafson (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165892#comment-17165892
 ] 

Jason Gustafson commented on KAFKA-10284:
-

Hmm, I'm not totally sure I buy this:

> A static member X joins the group and updates member.id to M1, then gets stuck
> Another static member Y with the same instance.id joins and updates member.id 
> to M2, then starts working and commits offsets
> The group coordinator migrates, and the member.id for the same static member 
> rewinds to M1
> The static member X comes back online and is validated. It would try to fetch 
> from Y's committed offset

Why would member X fetch Y's committed offset? If it doesn't know it had been 
fenced temporarily, it might just commit its latest offsets. This does seem 
like a correctness problem to me.
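
To make the concern concrete, here is a small sketch of how a stale commit could rewind the group's position once the coordinator again expects the old member.id. The `commit` helper and the offsets 250 and 100 are made up for illustration, and the real coordinator validates more than the member.id, so treat this as a simplified model under those assumptions.

```python
# Simplified model: `commit` and the offsets used here are hypothetical.

def commit(expected_member_id, committed, member_id, offset):
    """Accept a commit only if member_id is the one the coordinator expects."""
    if member_id != expected_member_id:
        return False                      # the caller is fenced
    committed["offset"] = offset
    return True


committed = {"offset": 0}

# Before the migration the coordinator expects M2 (member Y).
y_ok = commit("M2", committed, "M2", 250)        # Y commits its progress
x_fenced = not commit("M2", committed, "M1", 100)  # a stuck X is rejected

# After the migration the unpersisted update is gone and M1 is expected again,
# so X's stale commit is accepted and overwrites Y's progress.
x_ok = commit("M1", committed, "M1", 100)
print(y_ok, x_fenced, x_ok, committed["offset"])
```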



[jira] [Commented] (KAFKA-10284) Group membership update due to static member rejoin should be persisted

2020-07-24 Thread Sophie Blee-Goldman (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164531#comment-17164531
 ] 

Sophie Blee-Goldman commented on KAFKA-10284:
-

Nope, it just spun in a loop where it would poll and then call 
Consumer#position to try to initialize some metadata. I guess poll just 
returned empty, and Consumer#position continued to throw TimeoutException. 

The TimeoutException part seems pretty weird, so maybe it was an unrelated bug 
(or some other issue; a hung socket was the primary/only guess we had). I 
just thought of it because it definitely happened immediately after an "id does 
not match expected member.id" error.

I actually still have some of the broker and client side logs from around this 
incident, if that might help. But again, I'm not really sure whether it is 
related to this or not.



[jira] [Commented] (KAFKA-10284) Group membership update due to static member rejoin should be persisted

2020-07-23 Thread Akshay Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164146#comment-17164146
 ] 

Akshay Sharma commented on KAFKA-10284:
---

Initially, the group coordinator assigns a member.id, say X, to the newly 
joined consumer at generation 1. When the consumer drops out at that point and 
tries to rejoin, there is no rebalance (the generation stays at 1), but the 
group coordinator assigns the consumer a new member.id, say Y. If the broker 
then restarts for some reason, GroupMetadataManager reloads the metadata and 
expects the member.id it finds there, which is still X. That is why the 
consumer is fenced.

1) Is this expected behaviour?

2) When the consumer's member.id changes, does GroupMetadataManager not record 
the new member.id?

3) Why is the consumer fenced as a new instance even though only a single 
consumer is running?

[~bchen225242] [~guozhang]

 

 



[jira] [Commented] (KAFKA-10284) Group membership update due to static member rejoin should be persisted

2020-07-23 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164063#comment-17164063
 ] 

Guozhang Wang commented on KAFKA-10284:
---

What [~ableegoldman] described seems aligned with this, BUT I thought that the 
fenced instance Y should still try to rejoin the group, whereas in 
[~ableegoldman]'s case that thread did not try to rejoin at all?



[jira] [Commented] (KAFKA-10284) Group membership update due to static member rejoin should be persisted

2020-07-23 Thread Boyang Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17163871#comment-17163871
 ] 

Boyang Chen commented on KAFKA-10284:
-

[~akshaysh] I didn't see any trace that the group coordinator gets migrated in 
the pasted ticket, so it might be a separate issue.

[~ableegoldman] Well, the symptom matches, but I don't know for sure if this is 
the same cause :)



[jira] [Commented] (KAFKA-10284) Group membership update due to static member rejoin should be persisted

2020-07-23 Thread Sophie Blee-Goldman (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17163796#comment-17163796
 ] 

Sophie Blee-Goldman commented on KAFKA-10284:
-

You know, I think we actually hit this too, but weren't able to recognize the 
problem at the time. A few weeks ago one of our StreamThreads/Consumers seemed 
to "take off" from the group at some point, as evidenced by the steadily 
increasing last-rebalance-seconds-ago metric (whereas the other members had 
rebalanced multiple times since then). Right before this occurred we saw that 
same error message in the logs:

 
{code:java}
ERROR given member.id X is identified as a known static member 1,but not 
matching the expected member.id Y (kafka.coordinator.group.GroupMetadata)
{code}
Unfortunately we killed the client while trying to remotely debug it, so we 
couldn't get any more useful information. Would you say that this mysterious 
encounter was likely due to the bug reported here? [~guozhang] [~bchen225242]

 



[jira] [Commented] (KAFKA-10284) Group membership update due to static member rejoin should be persisted

2020-07-23 Thread Akshay Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17163248#comment-17163248
 ] 

Akshay Sharma commented on KAFKA-10284:
---

Hi [~bchen225242]/[~guozhang],

I've raised a bug related to the same issue. Please take a look:

https://issues.apache.org/jira/browse/KAFKA-10285

Analysis:

When I restarted the consumer without restarting the broker, I saw the logs 
below:
```
[2020-07-16 13:56:17,189] INFO [GroupCoordinator 1001]: Preparing to rebalance 
group 0 in state PreparingRebalance with old generation 0 
(__consumer_offsets-48) (reason: Adding new member 1-159490144 with group 
instanceid Some(1)) (kafka.coordinator.group.GroupCoordinator)
[2020-07-16 13:56:17,236] INFO [GroupCoordinator 1001]: Stabilized group 0 
generation 1 (__consumer_offsets-48) (kafka.coordinator.group.GroupCoordinator)
[2020-07-16 13:56:17,282] INFO [GroupCoordinator 1001]: Assignment received 
from leader for group 0 for generation 1 
(kafka.coordinator.group.GroupCoordinator)


[2020-07-16 13:59:33,613] INFO [GroupCoordinator 1001]: Static member Some(1) 
with unknown member id rejoins, assigning new member id 1-159490335, while 
old member 1-159490144 will be removed. 
(kafka.coordinator.group.GroupCoordinator)
[2020-07-16 13:59:33,635] INFO [GroupCoordinator 1001]: Static member joins 
during Stable stage will not trigger rebalance. 
(kafka.coordinator.group.GroupCoordinator)
```
When I then restarted the broker, I saw that the broker was expecting a 
different member.id (the consumer's old member.id):
```
[2020-07-16 14:04:04,953] ERROR given member.id 1-159490335 is identified as 
a known static member 1,but not matching the expected member.id 1-159490144 
(kafka.coordinator.group.GroupMetadata)
```
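
For anyone scanning broker logs for the same symptom, a small parser like the one below may help. The regular expression is my own guess derived from the single error line quoted above (unwrapped onto one line), and the helper name `parse_mismatch` is hypothetical; the exact message format may differ between Kafka versions.

```python
import re

# Assumption: the pattern is inferred from the one error line quoted above.
PATTERN = re.compile(
    r"given member\.id (?P<given>\S+) is identified as a known static member "
    r"(?P<instance>[^,]+),\s*but not matching the expected member\.id "
    r"(?P<expected>\S+)"
)


def parse_mismatch(line):
    """Return the given/instance/expected ids from the error line, or None."""
    match = PATTERN.search(line)
    return match.groupdict() if match else None


line = (
    "[2020-07-16 14:04:04,953] ERROR given member.id 1-159490335 is identified as "
    "a known static member 1,but not matching the expected member.id 1-159490144 "
    "(kafka.coordinator.group.GroupMetadata)"
)
result = parse_mismatch(line)
print(result)
```

Here the `expected` value is the member.id from the earlier "Adding new member" log line, and the `given` value is the one assigned on the static rejoin.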

 

 



[jira] [Commented] (KAFKA-10284) Group membership update due to static member rejoin should be persisted

2020-07-17 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17160225#comment-17160225
 ] 

Guozhang Wang commented on KAFKA-10284:
---

Fair enough. So it does not seem to be a correctness-breaking issue at the moment.



[jira] [Commented] (KAFKA-10284) Group membership update due to static member rejoin should be persisted

2020-07-17 Thread Boyang Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17160216#comment-17160216
 ] 

Boyang Chen commented on KAFKA-10284:
-

Consider a combined scenario like the one below:


 # A static member X joins the group and updates member.id to M1, then gets 
stuck
 # Another static member Y with the same instance.id joins and updates 
member.id to M2, then starts working and commits offsets
 # The group coordinator migrates, and the member.id for the same static member 
rewinds to M1
 # The static member X comes back online and is validated. It would try to fetch 
from Y's committed offset

In this flow, I don't think we are violating the offset committing policy. 

The only downside I can think of is that a single member, Y, will get fenced 
after the migration, as stated in the KIP. [~guozhang]
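
The steps above can be sketched with a toy coordinator that keeps an in-memory view next to a persisted store. The `Coordinator` class and its methods are hypothetical stand-ins rather than Kafka's classes, and the sketch assumes M1 was persisted by the rebalance that followed X's initial join.

```python
# Toy stand-in for the group coordinator; not Kafka's actual classes.

class Coordinator:
    def __init__(self, persisted):
        self.persisted = persisted        # survives a coordinator migration
        self.members = dict(persisted)    # in-memory view, rebuilt on load

    def static_join(self, instance_id, new_member_id):
        # The bug being modelled: only the in-memory view is updated.
        self.members[instance_id] = new_member_id

    def is_valid(self, instance_id, member_id):
        return self.members.get(instance_id) == member_id

    def migrate(self):
        # A new coordinator loads only what was persisted.
        return Coordinator(self.persisted)


store = {}
coord = Coordinator(store)
coord.static_join("instance-1", "M1")   # step 1: X joins, then gets stuck
store["instance-1"] = "M1"              # assumption: a rebalance persisted M1
coord.static_join("instance-1", "M2")   # step 2: Y replaces X, not persisted
before = (coord.is_valid("instance-1", "M1"), coord.is_valid("instance-1", "M2"))
coord = coord.migrate()                 # step 3: member.id rewinds to M1
after = (coord.is_valid("instance-1", "M1"), coord.is_valid("instance-1", "M2"))
print(before, after)                    # step 4: X is validated, Y is fenced
```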



[jira] [Commented] (KAFKA-10284) Group membership update due to static member rejoin should be persisted

2020-07-17 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17160152#comment-17160152
 ] 

Guozhang Wang commented on KAFKA-10284:
---

[~bchen225242] Thanks for the find. Just to clarify: "bringing up the zombie 
member identity" would break correctness, since if the zombie member still 
exists, it may be able to commit offsets that would rewind the position. Is 
that right?



[jira] [Commented] (KAFKA-10284) Group membership update due to static member rejoin should be persisted

2020-07-16 Thread Boyang Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159616#comment-17159616
 ] 

Boyang Chen commented on KAFKA-10284:
-

[https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/coordinator/group/GroupCoordinator.scala#L1042]
