Piotr Nowojski created FLINK-8132:
-------------------------------------

             Summary: FlinkKafkaProducer011 can commit incorrect transaction 
during recovery
                 Key: FLINK-8132
                 URL: https://issues.apache.org/jira/browse/FLINK-8132
             Project: Flink
          Issue Type: Bug
          Components: Kafka Connector
            Reporter: Piotr Nowojski
            Assignee: Piotr Nowojski
            Priority: Blocker
             Fix For: 1.4.0


Faulty scenario with a producer pool of size 2:

1. Transaction 1 (txn1) is started with producerA; record 42 is written.
2. Checkpoint 1 is triggered: txn1 is pre-committed, txn2 is started with 
producerB, and record 43 is written.
3. Checkpoint 1 completes: txn1 is committed and producerA is returned to the 
pool.
4. Checkpoint 2 is triggered: txn2 is pre-committed, txn3 is started with 
producerA, and record 44 is written.
5. Crash.
6. Recovery to checkpoint 1: txn1 from producerA is found in 
"pendingCommitTransactions", so recoverAndCommit(txn1) is attempted.
7. Unfortunately, txn1 and txn3 come from the same producer and are identical 
from the Kafka broker's perspective, so txn3 is committed instead.

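The result is that both records 42 and 44 are committed, although record 44 
was written after checkpoint 1 and should not be committed when recovering to 
that checkpoint.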
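
The scenario can be replayed with a toy model. This is only a sketch: 
ToyBroker is a hypothetical stand-in, not Kafka's API, and it assumes the 
broker tracks at most one open transaction per producer identity, so a commit 
request for that identity commits whatever transaction is currently open.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TxnCollisionSketch {

    /** Hypothetical broker model: one open transaction per producer identity. */
    static class ToyBroker {
        final Map<String, List<Integer>> open = new HashMap<>();
        final List<Integer> committed = new ArrayList<>();

        void begin(String producer) {
            open.put(producer, new ArrayList<>());
        }

        void write(String producer, int record) {
            open.get(producer).add(record);
        }

        /** Commits whatever is currently open for this producer, if anything. */
        void commit(String producer) {
            List<Integer> records = open.remove(producer);
            if (records != null) {
                committed.addAll(records);
            }
        }
    }

    static List<Integer> runScenario() {
        ToyBroker broker = new ToyBroker();
        broker.begin("producerA");          // step 1: txn1
        broker.write("producerA", 42);
        broker.begin("producerB");          // step 2: txn2
        broker.write("producerB", 43);
        broker.commit("producerA");         // step 3: txn1 committed, producerA reused
        broker.begin("producerA");          // step 4: txn3 on the same producer
        broker.write("producerA", 44);
        // step 5: crash; step 6: recovery replays the commit of txn1 ...
        broker.commit("producerA");         // step 7: ... which commits txn3
        return broker.committed;
    }

    public static void main(String[] args) {
        System.out.println(runScenario()); // prints [42, 44]
    }
}
```

Record 43 stays uncommitted in this model, while record 44 is committed by the 
replayed commit of txn1.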

The proposed solution is to postpone returning a producer to the pool until we 
are sure that the checkpoint for which it was used will no longer be used for 
recovery, i.e. until at least one more checkpoint has completed.
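
A minimal sketch of that idea follows. The class and method names are 
hypothetical and are not taken from FlinkKafkaProducer011; producers are 
represented by plain strings. A producer committed by checkpoint N is parked 
and only becomes available again once a later checkpoint completes.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class DeferredReturnPoolSketch {
    private final Deque<String> available = new ArrayDeque<>();
    // checkpoint id -> producer whose transaction that checkpoint committed
    private final Map<Long, String> awaitingRelease = new HashMap<>();

    DeferredReturnPoolSketch(String... producers) {
        for (String producer : producers) {
            available.add(producer);
        }
    }

    String acquire() {
        String producer = available.poll();
        if (producer == null) {
            throw new IllegalStateException("producer pool exhausted");
        }
        return producer;
    }

    /** Parks the committed producer; releases those from older checkpoints. */
    void onCheckpointCompleted(long checkpointId, String committedProducer) {
        awaitingRelease.entrySet().removeIf(entry -> {
            if (entry.getKey() < checkpointId) {
                available.add(entry.getValue());
                return true;
            }
            return false;
        });
        awaitingRelease.put(checkpointId, committedProducer);
    }

    static String demo() {
        DeferredReturnPoolSketch pool =
                new DeferredReturnPoolSketch("producerA", "producerB", "producerC");
        String txn1 = pool.acquire();            // producerA
        String txn2 = pool.acquire();            // producerB
        pool.onCheckpointCompleted(1, txn1);     // producerA parked, not reusable yet
        String txn3 = pool.acquire();            // producerC, not producerA
        pool.onCheckpointCompleted(2, txn2);     // producerA released, producerB parked
        String txn4 = pool.acquire();            // producerA is safe to reuse now
        return txn3 + "," + txn4;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints producerC,producerA
    }
}
```

Note that in this sketch a pool of size 2 would be exhausted at step 4 of the 
scenario above, so deferring the return implies a pool of at least 3 producers 
here.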



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
