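Hi,

This error looks like a retriable exception, but Flink does not implement any retry logic for it [1]/[2]; it only logs the exception and records the corresponding metrics. Your job already has checkpointing enabled, so this WARN log has no actual impact. The community has discussed this issue before [3]/[4]. If the error is caused by a Kafka broker restart, you can try upgrading the Kafka version as suggested in [4].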
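To make the point about the WARN log concrete, here is a minimal sketch in plain Java. It assumes a simplified model: the classes `FakeBroker` and `SketchReader` and the counter `commitsFailed` are hypothetical stand-ins written for illustration, not Flink or Kafka classes. It shows a failed commit being logged and counted without a retry, while the offsets stored in the checkpoint snapshot stay intact.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model. None of these types are Flink or Kafka classes.
class OffsetCommitSketch {

    /** Stands in for a retriable commit failure such as "coordinator not available". */
    static class RetriableCommitException extends RuntimeException {
        RetriableCommitException(String message) {
            super(message);
        }
    }

    /** Stands in for the broker-side committed offsets of one consumer group. */
    static class FakeBroker {
        final Map<Integer, Long> committed = new HashMap<>();
        boolean coordinatorAvailable = true;

        void commit(Map<Integer, Long> offsets) {
            if (!coordinatorAvailable) {
                throw new RetriableCommitException("The coordinator is not available.");
            }
            committed.putAll(offsets);
        }
    }

    /** Stands in for a source reader that commits offsets when a checkpoint completes. */
    static class SketchReader {
        final Map<Integer, Long> position = new HashMap<>();               // partition -> next offset
        final Map<Long, Map<Integer, Long>> checkpoints = new HashMap<>(); // checkpoint id -> snapshot
        int commitsFailed = 0;                                             // plays the role of a failure metric
        private final FakeBroker broker;

        SketchReader(FakeBroker broker) {
            this.broker = broker;
        }

        void read(int partition, long count) {
            position.merge(partition, count, Long::sum);
        }

        /** The snapshot is what recovery relies on; it never touches the broker. */
        void snapshotState(long checkpointId) {
            checkpoints.put(checkpointId, new HashMap<>(position));
        }

        /** The commit is best effort: log, count, and move on without retrying. */
        void notifyCheckpointComplete(long checkpointId) {
            try {
                broker.commit(checkpoints.get(checkpointId));
            } catch (RetriableCommitException e) {
                commitsFailed++;
                System.out.println("WARN Failed to commit consumer offsets for checkpoint "
                        + checkpointId + ": " + e.getMessage());
            }
        }

        /** Recovery reads the snapshot, not the broker-side committed offsets. */
        Map<Integer, Long> restoreFrom(long checkpointId) {
            return new HashMap<>(checkpoints.get(checkpointId));
        }
    }

    public static void main(String[] args) {
        FakeBroker broker = new FakeBroker();
        SketchReader reader = new SketchReader(broker);

        reader.read(0, 100);
        reader.snapshotState(1);
        reader.notifyCheckpointComplete(1);   // commit succeeds

        broker.coordinatorAvailable = false;  // simulate the coordinator being unavailable
        reader.read(0, 50);
        reader.snapshotState(2);
        reader.notifyCheckpointComplete(2);   // commit fails, only a WARN and a counter bump

        System.out.println("committed on broker: " + broker.committed);
        System.out.println("restorable from checkpoint 2: " + reader.restoreFrom(2));
        System.out.println("failed commits: " + reader.commitsFailed);
    }
}
```

In this model the broker-side offset lags behind after the failed commit, which would show up as consumer lag in monitoring, but a restore from checkpoint 2 still resumes at the correct position. The next successful commit brings the broker-side offset up to date again.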
[1] https://github.com/apache/flink-connector-kafka/blob/main/flink-connector-kafka/src/main/java/org/apache/flink/connector/kafka/source/reader/KafkaPartitionSplitReader.java#L249
[2] https://github.com/apache/flink-connector-kafka/blob/main/flink-connector-kafka/src/main/java/org/apache/flink/connector/kafka/source/reader/KafkaSourceReader.java#L149
[3] https://issues.apache.org/jira/browse/FLINK-25293
[4] https://issues.apache.org/jira/browse/FLINK-28060

--
Best,
Matt Wang

---- Replied Message ----
From: zhan...@eastcom-sw.com <zhan...@eastcom-sw.com>
Date: 05/6/2023 09:19
To: user-zh <user-zh@flink.apache.org>
Subject: Re: Re: checkpoint Kafka Offset commit failed

Hi, thanks for the answer.

The Flink cluster and the Kafka cluster are on the same network segment, and I have checked that the network is fine.
On Flink 1.14, a "Time should be non negative" exception appears once every few days. After the job restarts automatically, offsets are committed normally again.

java.lang.IllegalArgumentException: Time should be non negative
    at org.apache.flink.util.Preconditions.checkArgument(Preconditions.java:138)
    at org.apache.flink.runtime.throughput.ThroughputEMA.calculateThroughput(ThroughputEMA.java:44)
    at org.apache.flink.runtime.throughput.ThroughputCalculator.calculateThroughput(ThroughputCalculator.java:80)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.debloat(StreamTask.java:792)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$null$4(StreamTask.java:784)
    at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50)
    at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90)
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:338)
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:324)
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:201)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:809)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:761)
    at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958)
    at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:937)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575)
    at java.lang.Thread.run(Thread.java:748)

From: Shammon FY
Date: 2023-05-05 09:48
To: user-zh
Subject: Re: checkpoint Kafka Offset commit failed

Hi,

This looks like a network problem that prevents the source tasks of the Flink job from connecting to Kafka. You could check whether there is a problem with the network of the Kafka cluster or of the Flink job's source tasks.

Best,
Shammon FY

On Fri, May 5, 2023 at 9:41 AM Leonard Xu <xbjt...@gmail.com> wrote:

To unsubscribe from the user-zh@flink.apache.org mailing list, send an email with any content to user-zh-unsubscr...@flink.apache.org. For managing mailing list subscriptions, see [1].

Best regards,
Leonard

[1] https://flink.apache.org/zh/community/#%e9%82%ae%e4%bb%b6%e5%88%97%e8%a1%a8

On May 4, 2023, at 9:00 PM, wuzhongxiu <go574...@163.com> wrote:

Unsubscribe

go574...@163.com
Email: go574...@163.com

---- Original Message ----
From: zhan...@eastcom-sw.com
Date: 2023-05-04 14:54
To: user-zh <user-zh@flink.apache.org>
Cc:
Subject: checkpoint Kafka Offset commit failed

Hi, on Flink (1.14 and 1.16), committing Kafka offsets on checkpoint (10s) reports "The coordinator is not available". The Kafka cluster logs all look normal, and committing offsets manually works correctly. After restarting the Flink job the commits also succeed, but they fail again once the job has run for a while. Are there any parameters I can tune?

The Flink log is as follows:

2023-05-04 11:31:02,636 WARN org.apache.flink.connector.kafka.source.reader.KafkaSourceReader [] - Failed to commit consumer offsets for checkpoint 69153
org.apache.kafka.clients.consumer.RetriableCommitFailedException: Offset commit failed with a retriable exception. You should retry committing the latest consumed offsets.
Caused by: org.apache.kafka.common.errors.CoordinatorNotAvailableException: The coordinator is not available.