Anna Povzner created KAFKA-6975:
-----------------------------------

             Summary: AdminClient.deleteRecords() may cause replicas unable to 
fetch from beginning
                 Key: KAFKA-6975
                 URL: https://issues.apache.org/jira/browse/KAFKA-6975
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 1.1.0
            Reporter: Anna Povzner
            Assignee: Anna Povzner


AdminClient.deleteRecords(beforeOffset(offset)) sets the log start offset to 
the requested offset. If the requested offset falls in the middle of a batch, 
a replica will not be able to fetch from that offset: the leader responds with 
the whole batch, whose base offset is lower than the offset requested. 
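To make "in the middle of a batch" concrete, here is a minimal sketch of the check. It is not Kafka code: the class BatchAlignment and both method names are hypothetical, and the log is modeled only as a sorted array of batch base offsets, assuming contiguous batches and an offset below the log end offset.

```java
import java.util.Arrays;

/** Toy model of batch boundaries; hypothetical names, not part of Kafka. */
final class BatchAlignment {
    private BatchAlignment() {}

    /**
     * Returns the base offset of the batch that contains the given offset.
     * baseOffsets must be sorted ascending; batch i is assumed to cover
     * every offset from baseOffsets[i] up to (not including) baseOffsets[i + 1].
     */
    static long containingBatchBase(long[] baseOffsets, long offset) {
        int idx = Arrays.binarySearch(baseOffsets, offset);
        if (idx >= 0) {
            return baseOffsets[idx]; // offset is exactly a batch boundary
        }
        int insertionPoint = -idx - 1;
        if (insertionPoint == 0) {
            throw new IllegalArgumentException("offset precedes the first batch");
        }
        return baseOffsets[insertionPoint - 1];
    }

    /** True when the offset lies inside a batch rather than at its base offset. */
    static boolean isMidBatch(long[] baseOffsets, long offset) {
        return containingBatchBase(baseOffsets, offset) != offset;
    }

    public static void main(String[] args) {
        long[] baseOffsets = {0L, 5L, 10L};
        System.out.println(isMidBatch(baseOffsets, 5L));          // false
        System.out.println(isMidBatch(baseOffsets, 7L));          // true
        System.out.println(containingBatchBase(baseOffsets, 7L)); // 5
    }
}
```

In this model, deleting records before offset 7 would leave the log start offset at 7 while the batch that holds offset 7 still starts at 5.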

One use case where this causes problems is replica re-assignment. Suppose we 
have a topic partition with 3 initial replicas, and at some point the user 
issues AdminClient.deleteRecords() for an offset that falls in the middle of a 
batch. That offset becomes the log start offset (LSO) for this topic 
partition. Suppose that, at some later time, the user starts partition 
re-assignment to 3 new replicas. The new replicas (followers) start with 
HW = 0 and try to fetch from offset 0, then get "out of order offset" because 
0 < LSO. Each follower is able to reset its fetch offset to the leader's LSO 
and fetch from LSO, but the leader responds with a batch whose base offset is 
< LSO. This causes another "out of order offset" on the follower, which stops 
the fetcher thread. The end result is that the new replicas cannot start 
fetching until the LSO moves to an offset that is not in the middle of a 
batch, and the re-assignment may be stuck for a very long time. 
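
The sequence above can be traced with a small toy model. This is only a sketch of the steps as described in this ticket, not the actual replica fetcher logic: the class ReassignmentFetchSketch, its methods, and the Outcome values are all hypothetical, and the leader's log is assumed to be a sorted array of contiguous batch base offsets with LSO at or above the first one.

```java
import java.util.Arrays;

/** Toy trace of a new follower's first fetches; hypothetical, not Kafka code. */
final class ReassignmentFetchSketch {
    private ReassignmentFetchSketch() {}

    enum Outcome { OUT_OF_RANGE, APPENDED, FETCHER_STOPPED }

    /**
     * Leader side: returns the base offset of the batch sent back for
     * fetchOffset, or -1 if fetchOffset is below the log start offset.
     */
    static long leaderFetch(long[] baseOffsets, long logStartOffset, long fetchOffset) {
        if (fetchOffset < logStartOffset) {
            return -1L;
        }
        int idx = Arrays.binarySearch(baseOffsets, fetchOffset);
        // On a miss, -idx - 2 is the index of the batch containing fetchOffset.
        return idx >= 0 ? baseOffsets[idx] : baseOffsets[-idx - 2];
    }

    /** Follower side: a single fetch attempt starting at fetchOffset. */
    static Outcome followerFetch(long[] baseOffsets, long logStartOffset, long fetchOffset) {
        long returnedBase = leaderFetch(baseOffsets, logStartOffset, fetchOffset);
        if (returnedBase < 0) {
            return Outcome.OUT_OF_RANGE;
        }
        // A batch that starts before the expected offset is rejected, which
        // in the ticket's description stops the fetcher thread.
        return returnedBase < fetchOffset ? Outcome.FETCHER_STOPPED : Outcome.APPENDED;
    }

    /** New replica: fetch from 0, and on rejection retry from the leader's LSO. */
    static Outcome simulateNewReplica(long[] baseOffsets, long logStartOffset) {
        Outcome first = followerFetch(baseOffsets, logStartOffset, 0L);
        if (first != Outcome.OUT_OF_RANGE) {
            return first;
        }
        return followerFetch(baseOffsets, logStartOffset, logStartOffset);
    }

    public static void main(String[] args) {
        long[] baseOffsets = {0L, 5L, 10L};
        System.out.println(simulateNewReplica(baseOffsets, 7L)); // FETCHER_STOPPED
        System.out.println(simulateNewReplica(baseOffsets, 5L)); // APPENDED
    }
}
```

With LSO = 7 inside the batch that starts at 5, the retry from LSO receives a batch with base offset 5 and the fetcher stops. With LSO = 5, on a batch boundary, the same retry succeeds.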



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
