Piotr Nowojski created FLINK-8132:
-------------------------------------

Summary: FlinkKafkaProducer011 can commit incorrect transaction during recovery
Key: FLINK-8132
URL: https://issues.apache.org/jira/browse/FLINK-8132
Project: Flink
Issue Type: Bug
Components: Kafka Connector
Reporter: Piotr Nowojski
Assignee: Piotr Nowojski
Priority: Blocker
Fix For: 1.4.0

Faulty scenario with a producer pool of 2:

1. Started transaction 1 (txn1) with producerA, wrote record 42.
2. Checkpoint 1 triggered: pre-committing txn1, started txn2 with producerB, wrote record 43.
3. Checkpoint 1 completed: committing txn1, returning producerA to the pool.
4. Checkpoint 2 triggered: pre-committing txn2, started txn3 with producerA, wrote record 44.
5. Crash.
6. Recovery to checkpoint 1: txn1 from producerA is found in "pendingCommitTransactions", so recoverAndCommit(txn1) is attempted.
7. Unfortunately, txn1 and txn3 come from the same producer and are identical from the Kafka broker's perspective, so txn3 is committed.

The result is that both records 42 and 44 are committed, although record 44 was written after checkpoint 1 and should have been discarded on recovery.

The proposed solution is to postpone returning a producer to the pool until we are sure that the checkpoint for which it was used will no longer be used for recovery (that is, until at least one more checkpoint has completed).
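
The proposed fix can be illustrated with a small, self-contained Java sketch. It is not the actual FlinkKafkaProducer011 code: the class `DelayedProducerPool` and its methods `acquire`, `onCheckpointCompleted` and `availableCount` are hypothetical names invented for this illustration, and producers are modelled as plain string ids. The idea shown is only the bookkeeping: a producer whose transaction was committed by checkpoint N stays out of the pool until checkpoint N+1 completes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch: a pool that delays reuse of transactional producers. */
class DelayedProducerPool {
    // Producers that are safe to start a new transaction with.
    private final Deque<String> available = new ArrayDeque<>();
    // Producers committed by the latest completed checkpoint. That checkpoint
    // may still be used for recovery, so these must not be reused yet.
    private List<String> committedByLatestCheckpoint = new ArrayList<>();

    DelayedProducerPool(List<String> producerIds) {
        available.addAll(producerIds);
    }

    /** Takes a producer for a new transaction; fails if none is safe to reuse. */
    String acquire() {
        String id = available.poll();
        if (id == null) {
            throw new IllegalStateException("no producer is safe to reuse yet");
        }
        return id;
    }

    /** Called once a checkpoint has completed and committed the given producers' transactions. */
    void onCheckpointCompleted(List<String> producersCommittedNow) {
        // A newer checkpoint exists, so the previous one is no longer a
        // recovery target: its producers can go back to the pool.
        available.addAll(committedByLatestCheckpoint);
        committedByLatestCheckpoint = new ArrayList<>(producersCommittedNow);
    }

    int availableCount() {
        return available.size();
    }

    public static void main(String[] args) {
        DelayedProducerPool pool =
                new DelayedProducerPool(Arrays.asList("producerA", "producerB", "producerC"));
        String txn1 = pool.acquire();                                // step 1
        String txn2 = pool.acquire();                                // step 2
        pool.onCheckpointCompleted(Collections.singletonList(txn1)); // step 3
        String txn3 = pool.acquire();                                // step 4
        System.out.println(txn1 + " " + txn2 + " " + txn3);
    }
}
```

In this sketch txn3 no longer shares a producer with txn1, so a recoverAndCommit(txn1) after restoring checkpoint 1 could not accidentally commit txn3. Note that with this bookkeeping a pool of 2 would have no free producer at step 4, which is why the example uses three ids; how the real connector should size its pool is outside the scope of this illustration.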