Nico Kruber created FLINK-23839:
-----------------------------------

             Summary: Unclear severity of Kafka transaction recommit warning in 
logs
                 Key: FLINK-23839
                 URL: https://issues.apache.org/jira/browse/FLINK-23839
             Project: Flink
          Issue Type: Sub-task
          Components: Connectors / Kafka
    Affects Versions: 1.13.2, 1.12.5, 1.11.4
            Reporter: Nico Kruber


In a transactional Kafka sink, after recovery, all transactions from the 
recovered checkpoint are recommitted even though they may have already been 
committed before because this is not part of the checkpoint.

This second commit can lead to a number of WARN entries in the logs coming from 
[KafkaCommitter|https://github.com/apache/flink/blob/6c9818323b41a84137c52822d2993df788dbc9bb/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/connector/kafka/sink/KafkaCommitter.java#L66]
 or 
[FlinkKafkaProducer|https://github.com/apache/flink/blob/6c9818323b41a84137c52822d2993df788dbc9bb/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaProducer.java#L1057].

Examples:
{code}
WARN [org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer] ... 
Encountered error org.apache.kafka.common.errors.InvalidTxnStateException: The 
producer attempted a transactional operation in an invalid state. while 
recovering transaction KafkaTransactionState [transactionalId=..., 
producerId=12345, epoch=123]. Presumably this transaction has been already 
committed before.

WARN [org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer] ... 
Encountered error org.apache.kafka.common.errors.ProducerFencedException: 
Producer attempted an operation with an old epoch. Either there is a newer 
producer with the same transactionalId, or the producer's transaction has been 
expired by the broker. while recovering transaction KafkaTransactionState 
[transactionalId=..., producerId=12345, epoch=12345]. Presumably this 
transaction has been already committed before
{code}

It sounds to me like the second exception is useful and indicates that the 
transaction timeout is too short. The first exception, however, seems 
superfluous and rather alerts the user more than it helps. Or what would you do 
with it?


Can we instead filter out superfluous exceptions and at least put these onto 
DEBUG logs instead?




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to