Justine Olshan created KAFKA-15984:
--------------------------------------

             Summary: Client disconnections can cause hanging transactions on __consumer_offsets
                 Key: KAFKA-15984
                 URL: https://issues.apache.org/jira/browse/KAFKA-15984
             Project: Kafka
          Issue Type: Task
            Reporter: Justine Olshan


When investigating frequent hanging transactions on __consumer_offsets 
partitions, we realized that many of them were caused by the same offset being 
committed more than once, with one of the duplicate requests logged with 
`"isDisconnectedClient":true`. 

TxnOffsetCommits do not have sequence numbers and thus are not protected 
against duplicates in the same way idempotent produce requests are. Thus, when 
a client is disconnected (and flushes its requests), we may see the duplicate 
get appended to the log. 
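
To illustrate the difference, here is a toy Python model (not Kafka code; the 
class and method names are hypothetical, and the sequence check is much simpler 
than the broker's) of a log that drops a retried produce batch by sequence 
number but has nothing to compare a resent offset commit against:

```python
class ToyPartitionLog:
    """Toy model of a partition log; not Kafka's implementation."""

    def __init__(self):
        self.records = []
        self.last_sequence = {}  # producer_id -> last appended sequence

    def append_produce(self, producer_id, sequence, value):
        # Idempotent produce: a retry carries the same sequence number,
        # so it can be recognised and dropped.
        if sequence <= self.last_sequence.get(producer_id, -1):
            return False
        self.last_sequence[producer_id] = sequence
        self.records.append(("produce", producer_id, value))
        return True

    def append_offset_commit(self, producer_id, offset):
        # No sequence number to compare against: a resent request is
        # indistinguishable from a new one and gets appended again.
        self.records.append(("offset_commit", producer_id, offset))
        return True


log = ToyPartitionLog()
produce_results = [log.append_produce(1, 0, "a"), log.append_produce(1, 0, "a")]
commit_results = [log.append_offset_commit(1, 42), log.append_offset_commit(1, 42)]
print(produce_results)  # [True, False]
print(commit_results)   # [True, True]
```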

KIP-890 part 1 should protect against this, as the duplicate will not succeed 
verification. KIP-890 part 2 strengthens this further: duplicates (from 
previous transactions) cannot be added to a new transaction even if the 
partition is re-added, since the epoch will be bumped. 
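
The two layers of protection can be sketched with a toy model. This is an 
illustration only, assuming that verification means "the partition is part of 
the ongoing transaction" and that part 2 bumps the epoch when a transaction 
ends; `ToyTxnCoordinator` and `replay_late_duplicate` are hypothetical names, 
not Kafka APIs:

```python
class ToyTxnCoordinator:
    """Toy transaction state for a single producer; not Kafka's implementation."""

    def __init__(self):
        self.epoch = 0
        self.partitions = set()  # partitions in the ongoing transaction

    def add_partition(self, partition):
        self.partitions.add(partition)

    def end_transaction(self, bump_epoch):
        self.partitions.clear()
        if bump_epoch:  # assumed part 2 behaviour
            self.epoch += 1

    def verify(self, partition, request_epoch):
        # Assumed part 1 check, extended with an epoch comparison.
        return partition in self.partitions and request_epoch == self.epoch


def replay_late_duplicate(bump_epoch):
    partition = "__consumer_offsets-0"
    coord = ToyTxnCoordinator()
    coord.add_partition(partition)
    stale_epoch = coord.epoch  # epoch carried by the original and its duplicate
    coord.end_transaction(bump_epoch)
    # The duplicate arrives after the transaction has ended.
    rejected_after_end = not coord.verify(partition, stale_epoch)
    # A new transaction re-adds the partition, then the duplicate arrives.
    coord.add_partition(partition)
    accepted_in_new_txn = coord.verify(partition, stale_epoch)
    return rejected_after_end, accepted_in_new_txn


print(replay_late_duplicate(bump_epoch=False))  # (True, True)
print(replay_late_duplicate(bump_epoch=True))   # (True, False)
```

Without the epoch bump, the stale duplicate passes verification once the 
partition is re-added; with it, the duplicate is rejected in both cases.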

Another possible solution is to do duplicate checking on the group coordinator 
side when the request comes in. This solution could be used instead of KIP-890 
part 1 to prevent hanging transactions, but given that part 1 has only one open 
PR remaining, we may not need to do this. However, coordinator-side checking 
can also prevent duplicates from being added to a new transaction – something 
that otherwise only part 2 will protect against.
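
A minimal sketch of what such coordinator-side checking might look like, 
assuming a request can be identified by its producer id, epoch, group, 
partition and offset. The class, the key layout and the returned status 
strings are all hypothetical and are not taken from the Kafka code base:

```python
class ToyGroupCoordinator:
    """Toy duplicate filter for TxnOffsetCommit-like requests; not Kafka code."""

    def __init__(self):
        self.seen = set()  # identities of commits already appended
        self.log = []

    def handle_txn_offset_commit(self, producer_id, producer_epoch,
                                 group, partition, offset):
        # Hypothetical request identity; a real check would need to pick
        # its own key and decide how long to remember it.
        key = (producer_id, producer_epoch, group, partition, offset)
        if key in self.seen:
            return "DUPLICATE_IGNORED"
        self.seen.add(key)
        self.log.append(key)
        return "APPENDED"


coordinator = ToyGroupCoordinator()
first = coordinator.handle_txn_offset_commit(7, 0, "group-a", 3, 42)
resent = coordinator.handle_txn_offset_commit(7, 0, "group-a", 3, 42)
print(first, resent)         # APPENDED DUPLICATE_IGNORED
print(len(coordinator.log))  # 1
```

Because the remembered keys outlive the transaction in this sketch, a stale 
duplicate is also dropped after a new transaction starts. The same property 
would reject a legitimate re-commit of the same offset after an abort, so a 
real implementation would need a way to tell the two cases apart.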



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
