Luke, I would argue that https://issues.apache.org/jira/browse/KAFKA-13636 is a critical defect as it can have a very serious impact.
We run on AWS MSK, which supports these versions: https://docs.aws.amazon.com/msk/latest/developerguide/supported-kafka-versions.html. We are currently on 2.7.2. I note that MSK does not support any 3.x (maybe they're not ready for the ZooKeeper removal). So I suspect we're going to need a 2.x release if MSK is going to adopt the fix any time soon. I'd be happier with a 2.7.3 incorporating KAFKA-13636 in order to minimise the risk of introducing other issues, or 2.8.2 if that's not possible.

What can we do to make this happen ASAP?

Regards, James.

On 29/04/2022, at 14:50, Luke Chen <show...@gmail.com> wrote:

Hi James,

So far, v2.8.2 is not planned. Usually a minor version gets only one patch release, that is, v2.8.0 followed by v2.8.1. There are of course exceptions where a version has had 2 or 3 patch releases.

For KAFKA-13658, you can check KAFKA-13658 <https://issues.apache.org/jira/browse/KAFKA-13658>, which is included in v3.0.1, v3.1.1, and v3.2.0. So far, v3.0.1 has been released, and v3.1.1 and v3.2.0 will be coming soon.

Thank you.
Luke

On Fri, Apr 29, 2022 at 8:53 AM James Olsen <ja...@inaseq.com> wrote:

Luke,

Do you know if 2.8.2 will be released anytime soon? It appears to be waiting on https://issues.apache.org/jira/browse/KAFKA-13805, for which fixes are available.

Regards, James.

On 11/04/2022, at 14:22, Luke Chen <show...@gmail.com> wrote:

Hi James,

This looks like the known issue KAFKA-13636 <https://issues.apache.org/jira/browse/KAFKA-13636>, which should be fixed in the newer version.

Thank you.
Luke

On Mon, Apr 11, 2022 at 9:18 AM James Olsen <ja...@inaseq.com> wrote:

I recently observed the following series of events for a particular partition (MyTopic-6):

2022-03-18 03:18:28,562 INFO [org.apache.kafka.clients.consumer.internals.ConsumerCoordinator] 'executor-thread-2' [Consumer clientId=consumer-MyTopicService-group-3, groupId=MyTopicService-group] Setting offset for partition MyTopic-6 to the committed offset FetchPosition{offset=438, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[b-2.redacted.kafka.us-east-1.amazonaws.com:9094 (id: 2 rack: use1-az4)], epoch=64}}

-- RESTART (bring up new consumer node)

2022-04-01 15:17:47,943 INFO [org.apache.kafka.clients.consumer.internals.ConsumerCoordinator] 'executor-thread-6' [Consumer clientId=consumer-MyTopicService-group-7, groupId=MyTopicService-group] Setting offset for partition MyTopic-6 to the committed offset FetchPosition{offset=449, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[b-2.redacted.kafka.us-east-1.amazonaws.com:9094 (id: 2 rack: use1-az4)], epoch=64}}

-- REBALANCE (drop old consumer node)

2022-04-01 15:18:24,414 INFO [org.apache.kafka.clients.consumer.internals.ConsumerCoordinator] 'executor-thread-2' [Consumer clientId=consumer-MyTopicService-group-3, groupId=MyTopicService-group] Found no committed offset for partition MyTopic-6

2022-04-01 15:18:24,474 INFO [org.apache.kafka.clients.consumer.internals.SubscriptionState] 'executor-thread-2' [Consumer clientId=consumer-MyTopicService-group-3, groupId=MyTopicService-group] Resetting offset for partition MyTopic-6 to position FetchPosition{offset=411, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[b-2.redacted.kafka.us-east-1.amazonaws.com:9094 (id: 2 rack: use1-az4)], epoch=64}}.

Seems odd that no offsets were found at 2022-04-01 15:18:24,414 when they were clearly present 36 seconds earlier at 2022-04-01 15:17:47,943.

This resulted in message replay from offset 411-449. This was in a test system only and we have duplicate detection in place but I'd still like to avoid similar occurrences in production if we can.

There has clearly been a low volume of traffic but there have been active consumers all the time. We have log.retention.ms=1814400000 (3 weeks) which I believe explains why it resumed from 411 as messages prior to that will have been deleted.

There may not have been any new traffic in the last 7 days (we have the default offset retention) so I'm wondering if there is a chance the offsets were deleted during the rebalance when I presume there's a brief moment when there is no active consumer. My understanding is that they shouldn't be deleted until there has been no consumer for 7 days (https://kafka.apache.org/27/documentation.html#brokerconfigs_offsets.retention.minutes - not using static assignment). Is it possible the logic is actually checking for no consumer now and no offsets for 7 days instead?

Server and Client are 2.7.2. Sorry I don't have any more detailed server-side logs.

Regards, James.
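
The two expiry rules contrasted in the original report can be made concrete with a small model. What follows is a minimal sketch in plain Python, not the broker's actual code: the names `GroupState`, `expired_as_documented` and `expired_as_suspected` are hypothetical, and the scenario values (last commit 8 days old, group empty for one second during the rebalance) are assumptions chosen to match the "no new traffic in the last 7 days" guess. Only `log.retention.ms=1814400000` and the 7-day default offset retention come from the thread.

```python
from dataclasses import dataclass
from typing import Optional

MS_PER_DAY = 24 * 60 * 60 * 1000

# Values taken from the report.
LOG_RETENTION_MS = 1_814_400_000            # log.retention.ms
OFFSETS_RETENTION_MS = 7 * MS_PER_DAY       # default offsets.retention.minutes (7 days)


@dataclass
class GroupState:
    """Toy view of a consumer group for one partition (hypothetical model)."""
    empty_since_ms: Optional[int]  # when the group last became empty; None if it has members
    last_commit_ms: int            # time of the most recent offset commit


def expired_as_documented(state: GroupState, now_ms: int, retention_ms: int) -> bool:
    """Rule as the reporter understands it: the group must have been empty
    for the whole retention period before its offsets are discarded."""
    if state.empty_since_ms is None:
        return False
    return now_ms - state.empty_since_ms >= retention_ms


def expired_as_suspected(state: GroupState, now_ms: int, retention_ms: int) -> bool:
    """Rule the reporter suspects: the group is empty *right now* and the
    last commit is older than the retention period."""
    if state.empty_since_ms is None:
        return False
    return now_ms - state.last_commit_ms >= retention_ms


if __name__ == "__main__":
    # Sanity check of the "3 weeks" figure quoted for log.retention.ms.
    print("log retention in days:", LOG_RETENTION_MS / MS_PER_DAY)

    # Assumed scenario: quiet topic, last commit 8 days ago, and the group is
    # momentarily empty (1 second) while the rebalance is in progress.
    now = 30 * MS_PER_DAY
    during_rebalance = GroupState(empty_since_ms=now - 1_000,
                                  last_commit_ms=now - 8 * MS_PER_DAY)

    print("documented rule expires offsets:",
          expired_as_documented(during_rebalance, now, OFFSETS_RETENTION_MS))
    print("suspected rule expires offsets:",
          expired_as_suspected(during_rebalance, now, OFFSETS_RETENTION_MS))
```

Under these assumptions the first rule keeps the offsets and the second discards them, which is the difference the question is asking about. The model says nothing about which rule a given broker version actually applies; KAFKA-13636 is the place to check that.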