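Hi all,

I have recently been using Flink to write to Kafka and am seeing a very high checkpoint failure rate: roughly 50% of checkpoints fail. The checkpoint interval is between 30 and 60 s, and there is no large state; the job essentially only maintains offset state.

I would appreciate help working out what is causing this and whether the checkpoint failure rate can be reduced.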

The main errors are:

Caused by: org.apache.kafka.common.errors.ProducerFencedException: Producer attempted an operation with an old epoch. Either there is a newer producer with the same transactionalId, or the producer's transaction has been expired by the broker.
Caused by: org.apache.flink.util.FlinkRuntimeException: Committing one of transactions failed, logging first encountered failure

The Kafka log shows the following line (it appears four times in a row):

org.apache.kafka.common.errors.ProducerFencedException: Producer's epoch is no longer valid. There is probably another producer with a newer epoch. 16744 (request epoch), 16745 (server epoch)

So far I have not found anything abnormal on the HDFS side.
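
One thing the exception text itself points at is the case where "the producer's transaction has been expired by the broker". A minimal sketch of how one might sanity-check that, assuming the sink is transactional and that the Kafka producer setting `transaction.timeout.ms` is left at Kafka's default of 60000 ms unless set explicitly. The class name, the helper `timeoutCoversCheckpoint`, and the 30 s worst-case checkpoint duration are hypothetical and only for illustration; they are not taken from the job above, and this is not a confirmed diagnosis.

```java
import java.util.Properties;

class TransactionTimeoutCheck {
    // Real Kafka producer config key.
    static final String TRANSACTION_TIMEOUT_MS = "transaction.timeout.ms";
    // Kafka's documented default for the key above (assumed when the key is unset).
    static final String KAFKA_DEFAULT_TIMEOUT_MS = "60000";

    // Hypothetical helper: returns true when the transaction timeout is strictly
    // larger than the checkpoint interval plus an assumed worst-case checkpoint
    // duration, i.e. the transaction should still be open when it is committed.
    static boolean timeoutCoversCheckpoint(Properties producerProps,
                                           long checkpointIntervalMs,
                                           long maxCheckpointDurationMs) {
        long timeoutMs = Long.parseLong(
                producerProps.getProperty(TRANSACTION_TIMEOUT_MS, KAFKA_DEFAULT_TIMEOUT_MS));
        return timeoutMs > checkpointIntervalMs + maxCheckpointDurationMs;
    }

    public static void main(String[] args) {
        Properties props = new Properties();

        // 60 s timeout against a 60 s interval plus an assumed 30 s checkpoint duration.
        props.setProperty(TRANSACTION_TIMEOUT_MS, "60000");
        System.out.println(timeoutCoversCheckpoint(props, 60_000L, 30_000L));

        // 15 min timeout, which matches the broker-side default of transaction.max.timeout.ms.
        props.setProperty(TRANSACTION_TIMEOUT_MS, String.valueOf(15 * 60 * 1000));
        System.out.println(timeoutCoversCheckpoint(props, 60_000L, 30_000L));
    }
}
```

If the check fails for the real settings, raising `transaction.timeout.ms` (while staying at or below the broker's `transaction.max.timeout.ms`) would be one thing to try. The other cause named in the message, a second producer using the same transactionalId, would need to be ruled out separately.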
