Nico Kruber created FLINK-23839: ----------------------------------- Summary: Unclear severity of Kafka transaction recommit warning in logs Key: FLINK-23839 URL: https://issues.apache.org/jira/browse/FLINK-23839 Project: Flink Issue Type: Sub-task Components: Connectors / Kafka Affects Versions: 1.13.2, 1.12.5, 1.11.4 Reporter: Nico Kruber
In a transactional Kafka sink, after recovery, all transactions from the recovered checkpoint are recommitted even though they may have already been committed before because this is not part of the checkpoint. This second commit can lead to a number of WARN entries in the logs coming from [KafkaCommitter|https://github.com/apache/flink/blob/6c9818323b41a84137c52822d2993df788dbc9bb/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/connector/kafka/sink/KafkaCommitter.java#L66] or [FlinkKafkaProducer|https://github.com/apache/flink/blob/6c9818323b41a84137c52822d2993df788dbc9bb/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaProducer.java#L1057]. Examples: {code} WARN [org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer] ... Encountered error org.apache.kafka.common.errors.InvalidTxnStateException: The producer attempted a transactional operation in an invalid state. while recovering transaction KafkaTransactionState [transactionalId=..., producerId=12345, epoch=123]. Presumably this transaction has been already committed before. WARN [org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer] ... Encountered error org.apache.kafka.common.errors.ProducerFencedException: Producer attempted an operation with an old epoch. Either there is a newer producer with the same transactionalId, or the producer's transaction has been expired by the broker. while recovering transaction KafkaTransactionState [transactionalId=..., producerId=12345, epoch=12345]. Presumably this transaction has been already committed before {code} It sounds to me like the second exception is useful and indicates that the transaction timeout is too short. The first exception, however, seems superfluous and rather alerts the user more than it helps. Or what would you do with it? Can we instead filter out superfluous exceptions and at least put these onto DEBUG logs instead? -- This message was sent by Atlassian Jira (v8.3.4#803005)