[jira] [Commented] (KAFKA-12920) Consumer's cooperative sticky assignor need to clear generation / assignment data upon `onPartitionsLost`
[ https://issues.apache.org/jira/browse/KAFKA-12920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17370370#comment-17370370 ] Guozhang Wang commented on KAFKA-12920: --- Adding the other created root cause issue ticket here. > Consumer's cooperative sticky assignor need to clear generation / assignment > data upon `onPartitionsLost` > - > > Key: KAFKA-12920 > URL: https://issues.apache.org/jira/browse/KAFKA-12920 > Project: Kafka > Issue Type: Bug >Reporter: Guozhang Wang >Priority: Major > Labels: bug, consumer > > Consumer's cooperative-sticky assignor does not track the owned partitions > inside the assignor --- i.e. when it reset its state in event of > ``onPartitionsLost``, the ``memberAssignment`` and ``generation`` inside the > assignor would not be cleared. This would cause a member to join with empty > generation on the protocol while with non-empty user-data encoding the old > assignment still (and hence pass the validation check on broker side during > JoinGroup), and eventually cause a single partition to be assigned to > multiple consumers within a generation. > We should let the assignor to also clear its assignment/generation when > ``onPartitionsLost`` is triggered in order to avoid this scenario. > Note that 1) for the regular sticky assignor the generation would still have > an older value, and this would cause the previously owned partitions to be > discarded during the assignment, and 2) for Streams' sticky assignor, it’s > encoding would indeed be cleared along with ``onPartitionsLost``. Hence only > Consumer's cooperative-sticky assignor have this issue to solve. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-12920) Consumer's cooperative sticky assignor need to clear generation / assignment data upon `onPartitionsLost`
[ https://issues.apache.org/jira/browse/KAFKA-12920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17360480#comment-17360480 ] A. Sophie Blee-Goldman commented on KAFKA-12920: I believe we've found the real issue, and are just discussing what the correct behavior here should be. I'm going to close this ticket as 'Not a Bug' and file a separate issue with the problems we've found in the Consumer/AbstractCoordinator that have resulted in [this|https://issues.apache.org/jira/browse/KAFKA-12896] odd behavior we observed with the cooperative-sticky assignor > Consumer's cooperative sticky assignor need to clear generation / assignment > data upon `onPartitionsLost` > - > > Key: KAFKA-12920 > URL: https://issues.apache.org/jira/browse/KAFKA-12920 > Project: Kafka > Issue Type: Bug >Reporter: Guozhang Wang >Priority: Major > Labels: bug, consumer > > Consumer's cooperative-sticky assignor does not track the owned partitions > inside the assignor --- i.e. when it reset its state in event of > ``onPartitionsLost``, the ``memberAssignment`` and ``generation`` inside the > assignor would not be cleared. This would cause a member to join with empty > generation on the protocol while with non-empty user-data encoding the old > assignment still (and hence pass the validation check on broker side during > JoinGroup), and eventually cause a single partition to be assigned to > multiple consumers within a generation. > We should let the assignor to also clear its assignment/generation when > ``onPartitionsLost`` is triggered in order to avoid this scenario. > Note that 1) for the regular sticky assignor the generation would still have > an older value, and this would cause the previously owned partitions to be > discarded during the assignment, and 2) for Streams' sticky assignor, it’s > encoding would indeed be cleared along with ``onPartitionsLost``. Hence only > Consumer's cooperative-sticky assignor have this issue to solve. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-12920) Consumer's cooperative sticky assignor need to clear generation / assignment data upon `onPartitionsLost`
[ https://issues.apache.org/jira/browse/KAFKA-12920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17360328#comment-17360328 ] A. Sophie Blee-Goldman commented on KAFKA-12920: While there does seem to be a possible issue with the cooperative-sticky assignor, I don't believe we've found it yet. It's not expected that the cooperative-sticky assignor would have cleared the `memberAssignment` and `generation` since only the plain sticky assignor uses that stored info. The cooperative-sticky assignor gets the partitions for the previous assignment not from the `memberAssignment` but instead directly from the SubscriptionState. And this _should_ be cleared anytime `onPartitionsLost` is invoked. However we can keep this ticket open for now to track the investigation into the cooperative-sticky assignor. For some context, we've seen a report of continuous rebalancing with JoinGroup requests that seem to encode the same partition in the previous assignment of two consumers. It's not clear how this situation arose, but once we have these initial conditions to the cooperative-sticky assignor, it will detect an issue and throw an IllegalStateException. At the very least we should definitely improve the assignor to check for this condition and handle it by invalidating those previous assignments, rather than just throwing an exception repeatedly. But what's still unclear is how we got these conditions to begin with – it doesn't seem possible for the assignor to have produced an assignment with double partition ownership, as it would have thrown this IllegalStateException. That's what still needs to be investigated, and what this report was hoping to have uncovered > Consumer's cooperative sticky assignor need to clear generation / assignment > data upon `onPartitionsLost` > - > > Key: KAFKA-12920 > URL: https://issues.apache.org/jira/browse/KAFKA-12920 > Project: Kafka > Issue Type: Bug >Reporter: Guozhang Wang >Priority: Major > Labels: bug, consumer > > Consumer's cooperative-sticky assignor does not track the owned partitions > inside the assignor --- i.e. when it reset its state in event of > ``onPartitionsLost``, the ``memberAssignment`` and ``generation`` inside the > assignor would not be cleared. This would cause a member to join with empty > generation on the protocol while with non-empty user-data encoding the old > assignment still (and hence pass the validation check on broker side during > JoinGroup), and eventually cause a single partition to be assigned to > multiple consumers within a generation. > We should let the assignor to also clear its assignment/generation when > ``onPartitionsLost`` is triggered in order to avoid this scenario. > Note that 1) for the regular sticky assignor the generation would still have > an older value, and this would cause the previously owned partitions to be > discarded during the assignment, and 2) for Streams' sticky assignor, it’s > encoding would indeed be cleared along with ``onPartitionsLost``. Hence only > Consumer's cooperative-sticky assignor have this issue to solve. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-12920) Consumer's cooperative sticky assignor need to clear generation / assignment data upon `onPartitionsLost`
[ https://issues.apache.org/jira/browse/KAFKA-12920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17359812#comment-17359812 ] Bruno Cadonna commented on KAFKA-12920: --- Is this regression? Should this be a blocker for 3.0? > Consumer's cooperative sticky assignor need to clear generation / assignment > data upon `onPartitionsLost` > - > > Key: KAFKA-12920 > URL: https://issues.apache.org/jira/browse/KAFKA-12920 > Project: Kafka > Issue Type: Bug >Reporter: Guozhang Wang >Priority: Major > Labels: bug, consumer > > Consumer's cooperative-sticky assignor does not track the owned partitions > inside the assignor --- i.e. when it reset its state in event of > ``onPartitionsLost``, the ``memberAssignment`` and ``generation`` inside the > assignor would not be cleared. This would cause a member to join with empty > generation on the protocol while with non-empty user-data encoding the old > assignment still (and hence pass the validation check on broker side during > JoinGroup), and eventually cause a single partition to be assigned to > multiple consumers within a generation. > We should let the assignor to also clear its assignment/generation when > ``onPartitionsLost`` is triggered in order to avoid this scenario. > Note that 1) for the regular sticky assignor the generation would still have > an older value, and this would cause the previously owned partitions to be > discarded during the assignment, and 2) for Streams' sticky assignor, it’s > encoding would indeed be cleared along with ``onPartitionsLost``. Hence only > Consumer's cooperative-sticky assignor have this issue to solve. -- This message was sent by Atlassian Jira (v8.3.4#803005)