[ 
https://issues.apache.org/jira/browse/KAFKA-13794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-13794.
-------------------------------------
    Fix Version/s: 3.1.1
                   3.0.2
       Resolution: Fixed

> Producer batch lost silently in TransactionManager
> --------------------------------------------------
>
>                 Key: KAFKA-13794
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13794
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: xuexiaoyue
>            Priority: Major
>             Fix For: 3.1.1, 3.0.2
>
>
> Under the case of idempotence is enabled, when a batch reaches its 
> request.timeout.ms but not yet reaches delivery.timeout.ms, it will be 
> retried and wait for another request.timeout.ms. During the time of this 
> interval, the delivery.timeout.ms may be reached and Sender will remove this 
> in flight batch and bump the producer epoch because of the unresolved 
> sequence, then the sequence of this partition will be reset to 0.
> At this time, if a new batch is sent to the same partition and the former 
> batch reaches request.timeout.ms again, we will see an exception being thrown 
> out by NetworkClient:
> {code:java}
> [ERROR] [kafka-producer-network-thread | producer-1] 
> org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] 
> Uncaught error in request completion:
>  java.lang.IllegalStateException: We are re-enqueueing a batch which is not 
> tracked as part of the in flight requests. batch.topicPartition: 
> txn_test_1648891362900-2; batch.baseSequence: 0
>    at 
> org.apache.kafka.clients.producer.internals.RecordAccumulator.insertInSequenceOrder(RecordAccumulator.java:388)
>  ~[kafka-transaction-test-1.0-SNAPSHOT.jar:?]
>    at 
> org.apache.kafka.clients.producer.internals.RecordAccumulator.reenqueue(RecordAccumulator.java:334)
>  ~[kafka-transaction-test-1.0-SNAPSHOT.jar:?]
>    at 
> org.apache.kafka.clients.producer.internals.Sender.reenqueueBatch(Sender.java:668)
>  ~[kafka-transaction-test-1.0-SNAPSHOT.jar:?]
>    at 
> org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:622)
>  ~[kafka-transaction-test-1.0-SNAPSHOT.jar:?]
>    at 
> org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:548)
>  ~[kafka-transaction-test-1.0-SNAPSHOT.jar:?]
>    at 
> org.apache.kafka.clients.producer.internals.Sender.lambda$sendProduceRequest$5(Sender.java:836)
>  ~[kafka-transaction-test-1.0-SNAPSHOT.jar:?]
>    at 
> org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109) 
> ~[kafka-transaction-test-1.0-SNAPSHOT.jar:?]
>    at 
> org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:583)
>  ~[kafka-transaction-test-1.0-SNAPSHOT.jar:?]
>    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:575) 
> ~[kafka-transaction-test-1.0-SNAPSHOT.jar:?]
>    at 
> org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:328) 
> ~[kafka-transaction-test-1.0-SNAPSHOT.jar:?]
>    at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:243) 
> ~[kafka-transaction-test-1.0-SNAPSHOT.jar:?]
>    at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_102] {code}
> The cause of this is the inflightBatchesBySequence in TransactionManager is 
> not being remove correctly. One batch may be removed by another batch with 
> the same sequence number.
> The potential consequence of this I can think out is that the send progress 
> will be blocked until the latter batch being expired by delivery.timeout.ms
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to