Anna Povzner created KAFKA-7415:
-----------------------------------

             Summary: OffsetsForLeaderEpoch may incorrectly respond with undefined epoch causing truncation to HW
                 Key: KAFKA-7415
                 URL: https://issues.apache.org/jira/browse/KAFKA-7415
             Project: Kafka
          Issue Type: Bug
          Components: replication
    Affects Versions: 2.0.0
            Reporter: Anna Povzner


If the follower's last appended epoch is ahead of the leader's last appended 
epoch, the leader incorrectly responds to OffsetsForLeaderEpoch with 
(UNDEFINED_EPOCH, UNDEFINED_EPOCH_OFFSET), and the follower truncates to its 
HW. In some rare cases this can lead to data loss: two leader elections happen 
back to back (one leader fails, then the next leader is quickly replaced by 
preferred leader election while the surviving replicas are all still in the 
ISR), and then the third leader fails.

The bug is in LeaderEpochFileCache.endOffsetFor(), which returns 
(UNDEFINED_EPOCH, UNDEFINED_EPOCH_OFFSET) if the requested leader epoch is 
ahead of the last leader epoch in the cache. The method should return (last 
leader epoch in the cache, LEO) in this scenario.
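A minimal Java sketch of the lookup described above, to make the bug and the proposed fix concrete. This is a hypothetical, simplified model, not Kafka's actual LeaderEpochFileCache (which is written in Scala and handles more cases); the class and method names, the `buggy` flag, and modeling both UNDEFINED constants as -1 are assumptions for illustration. The cache holds (epoch, start offset) entries; an epoch ends where the next cached epoch starts, and the latest epoch ends at the LEO.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a leader epoch cache (not Kafka's code).
public class EpochCacheSketch {
    static final int UNDEFINED_EPOCH = -1;           // assumed value for the sketch
    static final long UNDEFINED_EPOCH_OFFSET = -1L;  // assumed value for the sketch

    // First offset written in a given leader epoch.
    record EpochEntry(int epoch, long startOffset) {}

    // What the leader returns in the OffsetsForLeaderEpoch response.
    record EpochEndOffset(int epoch, long endOffset) {}

    private final List<EpochEntry> entries = new ArrayList<>(); // ascending by epoch

    void assign(int epoch, long startOffset) {
        entries.add(new EpochEntry(epoch, startOffset));
    }

    // buggy = true models the 2.0.0 behavior reported here; false models the fix.
    EpochEndOffset endOffsetFor(int requestedEpoch, long logEndOffset, boolean buggy) {
        if (entries.isEmpty()) {
            return new EpochEndOffset(UNDEFINED_EPOCH, UNDEFINED_EPOCH_OFFSET);
        }
        EpochEntry latest = entries.get(entries.size() - 1);
        if (requestedEpoch > latest.epoch()) {
            // Requested epoch is ahead of everything in the cache.
            return buggy
                    ? new EpochEndOffset(UNDEFINED_EPOCH, UNDEFINED_EPOCH_OFFSET)
                    : new EpochEndOffset(latest.epoch(), logEndOffset);
        }
        if (requestedEpoch == latest.epoch()) {
            // The latest epoch is still open, so it ends at the LEO.
            return new EpochEndOffset(latest.epoch(), logEndOffset);
        }
        // Largest cached epoch <= requestedEpoch ends where the next one starts.
        for (int i = entries.size() - 2; i >= 0; i--) {
            if (entries.get(i).epoch() <= requestedEpoch) {
                return new EpochEndOffset(entries.get(i).epoch(), entries.get(i + 1).startOffset());
            }
        }
        // Requested epoch predates every cached epoch (simplified handling).
        return new EpochEndOffset(UNDEFINED_EPOCH, UNDEFINED_EPOCH_OFFSET);
    }

    public static void main(String[] args) {
        // Replica 3 in the scenario: only epoch 0 cached, LEO = 10.
        EpochCacheSketch r3 = new EpochCacheSketch();
        r3.assign(0, 0L);
        System.out.println("buggy: " + r3.endOffsetFor(1, 10L, true));
        System.out.println("fixed: " + r3.endOffsetFor(1, 10L, false));
    }
}
```

Running `main` prints `buggy: EpochEndOffset[epoch=-1, endOffset=-1]` and `fixed: EpochEndOffset[epoch=0, endOffset=10]`: a request for epoch 1 against a cache whose last epoch is 0 takes the `requestedEpoch > latest` branch, which is exactly the case this issue is about.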

 

Here is an example scenario in which the issue leads to data loss.

Suppose we have three replicas: r1, r2, and r3. Initially, the ISR is 
(r1, r2, r3) and the leader is r1. Data up to offset 10 (offsets 0 through 9, 
matching r1's HW of 10) has been committed to the ISR. Here is the initial 
state:
{code:java}
Leader: r1
leader epoch: 0
ISR(r1, r2, r3)
r1: [hw=10, leo=10]
r2: [hw=8, leo=10]
r3: [hw=5, leo=10]
{code}
Replica 1 fails and leaves the ISR, and Replica 2 becomes the new leader with 
leader epoch 1. The leader appends a batch at offset 10, but it has not yet 
been replicated to the followers.
{code:java}
Leader: r2
leader epoch: 1
ISR(r2, r3)
r1: [hw=10, leo=10]
r2: [hw=8, leo=11]
r3: [hw=5, leo=10]
{code}
Replica 3 is elected leader with leader epoch 2 (due to preferred leader 
election) before it has a chance to truncate.
{code:java}
Leader: r3
leader epoch: 2
ISR(r2, r3)
r1: [hw=10, leo=10]
r2: [hw=8, leo=11]
r3: [hw=5, leo=10]
{code}
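Replica 2 sends OffsetsForLeaderEpoch(leader epoch = 1) to Replica 3. The last 
epoch in Replica 3's cache is still 0, which is behind the requested epoch 1, 
so Replica 3 incorrectly replies with (UNDEFINED_EPOCH, UNDEFINED_EPOCH_OFFSET) 
instead of (0, 10). Replica 2 then truncates to its HW of 8, discarding the 
committed offsets 8 and 9 along with the uncommitted offset 10. If Replica 3 
fails before Replica 2 re-fetches the data, offsets 8 and 9 may be lost.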
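The effect on Replica 2 can be replayed with a small Java sketch. The follower rule used here (fall back to the HW on an undefined epoch, otherwise truncate to the smaller of the leader's end offset and the follower's own LEO) is a simplified assumption rather than the exact 2.0.0 fetcher logic, and the class and method names are hypothetical. The inputs are the numbers from the scenario: Replica 2 has HW 8 and LEO 11, offsets 0 through 9 are committed, and the two candidate responses are the buggy (UNDEFINED_EPOCH, UNDEFINED_EPOCH_OFFSET) and the proposed (0, 10).

```java
// Hypothetical sketch replaying Replica 2's truncation decision in the scenario.
public class TruncationReplay {
    static final int UNDEFINED_EPOCH = -1; // assumed value for the sketch

    // Simplified follower rule (assumption): use the HW when the leader answers
    // with an undefined epoch, otherwise min(leader end offset, follower LEO).
    static long truncationOffset(int respEpoch, long respEndOffset, long followerHw, long followerLeo) {
        if (respEpoch == UNDEFINED_EPOCH) {
            return followerHw;
        }
        return Math.min(respEndOffset, followerLeo);
    }

    // How many committed offsets (those below committedEnd) the truncation removes
    // from the follower's log; they are only lost if the leader then fails.
    static long droppedCommitted(long truncateTo, long committedEnd) {
        return Math.max(0L, committedEnd - truncateTo);
    }

    public static void main(String[] args) {
        long r2Hw = 8L, r2Leo = 11L;
        long committedEnd = 10L; // offsets 0..9 were committed under epoch 0

        long buggy = truncationOffset(UNDEFINED_EPOCH, -1L, r2Hw, r2Leo);
        long fixed = truncationOffset(0, 10L, r2Hw, r2Leo);

        System.out.println("buggy: truncate to " + buggy
                + ", committed offsets dropped: " + droppedCommitted(buggy, committedEnd));
        System.out.println("fixed: truncate to " + fixed
                + ", committed offsets dropped: " + droppedCommitted(fixed, committedEnd));
    }
}
```

Under these assumptions the buggy response makes Replica 2 truncate to 8 and drop two committed offsets (8 and 9), while the corrected response truncates to 10 and only removes the uncommitted offset 10.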



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
