[ https://issues.apache.org/jira/browse/KAFKA-13794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Gustafson resolved KAFKA-13794. ------------------------------------- Fix Version/s: 3.1.1 3.0.2 Resolution: Fixed > Producer batch lost silently in TransactionManager > -------------------------------------------------- > > Key: KAFKA-13794 > URL: https://issues.apache.org/jira/browse/KAFKA-13794 > Project: Kafka > Issue Type: Bug > Reporter: xuexiaoyue > Priority: Major > Fix For: 3.1.1, 3.0.2 > > > Under the case of idempotence is enabled, when a batch reaches its > request.timeout.ms but not yet reaches delivery.timeout.ms, it will be > retried and wait for another request.timeout.ms. During the time of this > interval, the delivery.timeout.ms may be reached and Sender will remove this > in flight batch and bump the producer epoch because of the unresolved > sequence, then the sequence of this partition will be reset to 0. > At this time, if a new batch is sent to the same partition and the former > batch reaches request.timeout.ms again, we will see an exception being thrown > out by NetworkClient: > {code:java} > [ERROR] [kafka-producer-network-thread | producer-1] > org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] > Uncaught error in request completion: > java.lang.IllegalStateException: We are re-enqueueing a batch which is not > tracked as part of the in flight requests. batch.topicPartition: > txn_test_1648891362900-2; batch.baseSequence: 0 > at > org.apache.kafka.clients.producer.internals.RecordAccumulator.insertInSequenceOrder(RecordAccumulator.java:388) > ~[kafka-transaction-test-1.0-SNAPSHOT.jar:?] > at > org.apache.kafka.clients.producer.internals.RecordAccumulator.reenqueue(RecordAccumulator.java:334) > ~[kafka-transaction-test-1.0-SNAPSHOT.jar:?] > at > org.apache.kafka.clients.producer.internals.Sender.reenqueueBatch(Sender.java:668) > ~[kafka-transaction-test-1.0-SNAPSHOT.jar:?] > at > org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:622) > ~[kafka-transaction-test-1.0-SNAPSHOT.jar:?] > at > org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:548) > ~[kafka-transaction-test-1.0-SNAPSHOT.jar:?] > at > org.apache.kafka.clients.producer.internals.Sender.lambda$sendProduceRequest$5(Sender.java:836) > ~[kafka-transaction-test-1.0-SNAPSHOT.jar:?] > at > org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109) > ~[kafka-transaction-test-1.0-SNAPSHOT.jar:?] > at > org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:583) > ~[kafka-transaction-test-1.0-SNAPSHOT.jar:?] > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:575) > ~[kafka-transaction-test-1.0-SNAPSHOT.jar:?] > at > org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:328) > ~[kafka-transaction-test-1.0-SNAPSHOT.jar:?] > at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:243) > ~[kafka-transaction-test-1.0-SNAPSHOT.jar:?] > at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_102] {code} > The cause of this is the inflightBatchesBySequence in TransactionManager is > not being remove correctly. One batch may be removed by another batch with > the same sequence number. > The potential consequence of this I can think out is that the send progress > will be blocked until the latter batch being expired by delivery.timeout.ms > -- This message was sent by Atlassian Jira (v8.20.1#820001)