I had the same issue, and enabling checkpointing seems to solve the problem. Can
you please explain how enabling checkpointing fixes the issue? Thanks!
--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Thanks for the reply, Nico.
I've been testing with OffsetCommitMode.ON_CHECKPOINTS, and I can confirm
that this fixes the issue -- even if a single commit times out when
communicating with Kafka, subsequent offset commits are still successful.
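
For anyone wondering why checkpointing changes the behavior: as far as I understand it, the Flink Kafka consumer derives its offset commit mode from three settings. Without checkpointing it relies on the Kafka client's own periodic auto-commit; with checkpointing it commits offsets itself each time a checkpoint completes, so a failed commit is followed by a fresh attempt at the next checkpoint. The sketch below is a simplified model of that decision, not Flink's source. The class, enum, and parameter names are made up for illustration.

```java
// Simplified, self-contained model of how the commit mode is chosen.
// OffsetCommitSketch, Mode, and resolve() are hypothetical names, not Flink API.
public class OffsetCommitSketch {

    public enum Mode { DISABLED, ON_CHECKPOINTS, KAFKA_PERIODIC }

    /**
     * @param kafkaAutoCommit      value of the Kafka property enable.auto.commit
     * @param commitOnCheckpoints  whether the consumer should commit on checkpoints
     * @param checkpointingEnabled whether checkpointing is enabled for the job
     */
    public static Mode resolve(boolean kafkaAutoCommit,
                               boolean commitOnCheckpoints,
                               boolean checkpointingEnabled) {
        if (checkpointingEnabled) {
            // Flink commits offsets itself when a checkpoint completes;
            // the Kafka client's auto-commit setting is ignored in this branch.
            return commitOnCheckpoints ? Mode.ON_CHECKPOINTS : Mode.DISABLED;
        }
        // No checkpointing: fall back to the Kafka client's periodic auto-commit.
        return kafkaAutoCommit ? Mode.KAFKA_PERIODIC : Mode.DISABLED;
    }

    public static void main(String[] args) {
        // Auto-commit on, checkpointing off: the mode that showed the problem.
        System.out.println(resolve(true, true, false));
        // Same settings with checkpointing on: the mode I tested with.
        System.out.println(resolve(true, true, true));
    }
}
```

In an actual job, the switch would presumably come from calling `enableCheckpointing(...)` on the `StreamExecutionEnvironment` and leaving `setCommitOffsetsOnCheckpoints(true)` set on the consumer.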
Hi Edward,
looking through the Kafka code, I do see a path where they deliberately
do not want recursive retries, i.e. if the coordinator is unknown. It
seems like you are getting into this scenario.
I'm no expert on Kafka and therefore I'm not sure about the implications or
ways to circumvent/fix
We have noticed that the Kafka offset auto-commit functionality seems to stop
working after it encounters a timeout. It appears in the logs like this:
2018-03-04 07:02:54,779 INFO
org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Marking
the coordinator kafka06:9092 (id: