Re: Unexpected loss of Offsets

James Olsen Thu, 28 Apr 2022 20:08:35 -0700

Luke,

I would argue that https://issues.apache.org/jira/browse/KAFKA-13636 is a
critical defect, as its impact can be very serious: committed offsets can be
silently lost, causing message replay.


We run on AWS MSK, which supports these versions:
https://docs.aws.amazon.com/msk/latest/developerguide/supported-kafka-versions.html
We are currently on 2.7.2.

I note that MSK does not support any 3.x release (maybe they're not ready for
the ZooKeeper removal), so I suspect we'll need a 2.x fix if MSK is going to
adopt it any time soon. I'd be happier with a 2.7.3 incorporating KAFKA-13636,
to minimise the risk of introducing other issues, or with 2.8.2 if that's not
possible.

What can we do to make this happen ASAP?

Regards, James.

On 29/04/2022, at 14:50, Luke Chen <show...@gmail.com> wrote:

Hi James,

So far, v2.8.2 is not planned. Usually a release line gets only one patch
release, i.e. v2.8.0 is followed by v2.8.1 and nothing more.
There are exceptions, of course: some release lines have had two or three
patch releases.

For KAFKA-13658, see https://issues.apache.org/jira/browse/KAFKA-13658; the
fix is included in v3.0.1, v3.1.1, and v3.2.0.
So far, v3.0.1 has been released, and v3.1.1 and v3.2.0 are coming soon.

Thank you.
Luke

On Fri, Apr 29, 2022 at 8:53 AM James Olsen <ja...@inaseq.com> wrote:
Luke,

Do you know if 2.8.2 will be released any time soon? It appears to be waiting
on https://issues.apache.org/jira/browse/KAFKA-13805, for which fixes are
already available.

Regards, James.

On 11/04/2022, at 14:22, Luke Chen <show...@gmail.com> wrote:

Hi James,

This looks like the known issue KAFKA-13636
(https://issues.apache.org/jira/browse/KAFKA-13636), which should be fixed
in newer versions.

Thank you.
Luke

On Mon, Apr 11, 2022 at 9:18 AM James Olsen <ja...@inaseq.com> wrote:

I recently observed the following series of events for a particular
partition (MyTopic-6):

2022-03-18 03:18:28,562 INFO
[org.apache.kafka.clients.consumer.internals.ConsumerCoordinator]
'executor-thread-2' [Consumer clientId=consumer-MyTopicService-group-3,
groupId=MyTopicService-group] Setting offset for partition MyTopic-6 to the
committed offset FetchPosition{offset=438, offsetEpoch=Optional.empty,
currentLeader=LeaderAndEpoch{leader=Optional[b-2.redacted.kafka.us-east-1.amazonaws.com:9094
(id: 2 rack: use1-az4)], epoch=64}}

-- RESTART (bring up new consumer node)

2022-04-01 15:17:47,943 INFO
[org.apache.kafka.clients.consumer.internals.ConsumerCoordinator]
'executor-thread-6' [Consumer clientId=consumer-MyTopicService-group-7,
groupId=MyTopicService-group] Setting offset for partition MyTopic-6 to the
committed offset FetchPosition{offset=449, offsetEpoch=Optional.empty,
currentLeader=LeaderAndEpoch{leader=Optional[b-2.redacted.kafka.us-east-1.amazonaws.com:9094
(id: 2 rack: use1-az4)], epoch=64}}

-- REBALANCE (drop old consumer node)

2022-04-01 15:18:24,414 INFO
[org.apache.kafka.clients.consumer.internals.ConsumerCoordinator]
'executor-thread-2' [Consumer clientId=consumer-MyTopicService-group-3,
groupId=MyTopicService-group] Found no committed offset for partition
MyTopic-6
2022-04-01 15:18:24,474 INFO
[org.apache.kafka.clients.consumer.internals.SubscriptionState]
'executor-thread-2' [Consumer clientId=consumer-MyTopicService-group-3,
groupId=MyTopicService-group] Resetting offset for partition MyTopic-6 to
position FetchPosition{offset=411, offsetEpoch=Optional.empty,
currentLeader=LeaderAndEpoch{leader=Optional[b-2.redacted.kafka.us-east-1.amazonaws.com:9094
(id: 2 rack: use1-az4)], epoch=64}}.

It seems odd that no committed offset was found at 2022-04-01 15:18:24,414
when one was clearly present 36 seconds earlier, at 2022-04-01 15:17:47,943.
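For reference, the gap between the two log lines can be checked with Python's standard datetime module. The timestamps are copied from the logs above; nothing else is assumed.

```python
from datetime import datetime

# Log timestamp format used above: "YYYY-MM-DD HH:MM:SS,mmm".
FMT = "%Y-%m-%d %H:%M:%S,%f"

# Consumer-7 was given committed offset 449 at this moment...
found = datetime.strptime("2022-04-01 15:17:47,943", FMT)
# ...and consumer-3 reported "Found no committed offset" here.
missing = datetime.strptime("2022-04-01 15:18:24,414", FMT)

gap_seconds = (missing - found).total_seconds()
print(f"{gap_seconds:.3f} s")  # 36.471 s
```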

This resulted in message replay from offset 411 to 449. This was only in a
test system and we have duplicate detection in place, but I'd still like to
avoid similar occurrences in production if we can.
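To size the replay: a Kafka committed offset is the offset of the *next* record to read, so offset 449 itself had not been consumed yet. A minimal Python check, assuming every offset in the range holds a user record (transaction markers or compaction would make the real count smaller):

```python
# Offsets taken from the log lines above.
committed_before = 449  # committed offset = next record the group would have read
reset_to = 411          # position chosen after "Found no committed offset"

# Everything in [reset_to, committed_before) had already been processed
# and was delivered a second time after the reset.
replayed = range(reset_to, committed_before)
print(len(replayed), replayed[0], replayed[-1])  # 38 411 448
```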

Traffic has clearly been low volume, but there have been active consumers the
whole time. We have log.retention.ms=1814400000 (3 weeks), which I believe
explains why it resumed from 411: messages prior to that will have been
deleted.
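The retention arithmetic and the resume point can be sketched in Python. `reset_position` is a hypothetical helper that models the `earliest`/`latest` values of the consumer's `auto.offset.reset` setting; it is not a Kafka client API. The thread does not state which reset policy is configured, so `earliest` is an assumption, as is a log end offset of 449 (no new traffic after the last commit).

```python
from datetime import timedelta

# Broker setting quoted above.
LOG_RETENTION_MS = 1_814_400_000
log_retention = timedelta(milliseconds=LOG_RETENTION_MS)
print(log_retention)  # 21 days, 0:00:00


def reset_position(policy: str, log_start_offset: int, log_end_offset: int) -> int:
    """Hypothetical model of where a consumer starts when no committed offset exists."""
    if policy == "earliest":
        return log_start_offset  # oldest record still retained on the broker
    if policy == "latest":
        return log_end_offset    # only records produced after the reset
    raise ValueError(f"no automatic reset for policy {policy!r}")


# Assumptions: retention removed everything below 411, so 411 is the log start
# offset, and the log end offset is 449 because nothing new was produced.
print(reset_position("earliest", 411, 449))  # 411
```

Under these assumptions an `earliest` policy reproduces the observed resume at 411, while `latest` would have skipped to 449 instead of replaying.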

There may not have been any new traffic in the last 7 days (we use the
default offset retention), so I'm wondering whether the offsets could have
been deleted during the rebalance, when I presume there is a brief moment with
no active consumer. My understanding is that they shouldn't be deleted until
there has been no consumer for 7 days
(https://kafka.apache.org/27/documentation.html#brokerconfigs_offsets.retention.minutes;
we are not using static assignment). Is it possible the logic actually checks
for "no consumer now and no offsets committed for 7 days" instead?
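The two readings contrasted here can be made concrete with a small model. This is a hypothetical Python sketch, not the broker's actual group-metadata logic: `GroupSnapshot`, `expired_documented` and `expired_suspected` are invented names, and a group is reduced to three facts. The real broker has more cases (for example, offsets of topics an active group no longer subscribes to), which are left out. The last-commit date below is invented for illustration, since the thread only says there may have been no traffic for 7 days.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Default offsets.retention.minutes: 10080 minutes = 7 days.
RETENTION = timedelta(minutes=10_080)


@dataclass
class GroupSnapshot:
    """Hypothetical, simplified view of one consumer group at a single instant."""
    has_members: bool                # is any consumer currently in the group?
    empty_since: Optional[datetime]  # when the group last became empty (None while it has members)
    last_commit: datetime            # when an offset was last committed for the partition


def expired_documented(g: GroupSnapshot, now: datetime) -> bool:
    """Expected reading: expire only after the group has been empty for the whole retention period."""
    return (not g.has_members
            and g.empty_since is not None
            and now - g.empty_since >= RETENTION)


def expired_suspected(g: GroupSnapshot, now: datetime) -> bool:
    """Suspected reading: no consumer right now AND no commit within the retention period."""
    return not g.has_members and now - g.last_commit >= RETENTION


# Hypothetical instant during the rebalance: the group is empty for about a
# second and the (invented) last commit is more than 7 days old.
now = datetime(2022, 4, 1, 15, 18, 24)
snap = GroupSnapshot(has_members=False,
                     empty_since=datetime(2022, 4, 1, 15, 18, 23),
                     last_commit=datetime(2022, 3, 24, 12, 0, 0))

print(expired_documented(snap, now))  # False
print(expired_suspected(snap, now))   # True
```

Only the second predicate deletes offsets during a momentary empty group, which matches the symptom described; Luke's reply attributes the actual behaviour to KAFKA-13636.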

Server and client are both 2.7.2. Sorry, I don't have any more detailed
server-side logs.

Regards, James.
