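Hi, this error looks like a retriable exception, but Flink does not implement any retry logic for it [1]/[2]; it only logs the exception and records the corresponding metrics.
Your job already has checkpointing enabled, so this WARN log actually has no impact. The community has discussed this issue before [3]/[4].
If the error is caused by a Kafka broker restart, you could try upgrading the Kafka version as described in [4].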


[1] https://github.com/apache/flink-connector-kafka/blob/main/flink-connector-kafka/src/main/java/org/apache/flink/connector/kafka/source/reader/KafkaPartitionSplitReader.java#L249
[2] https://github.com/apache/flink-connector-kafka/blob/main/flink-connector-kafka/src/main/java/org/apache/flink/connector/kafka/source/reader/KafkaSourceReader.java#L149
[3] https://issues.apache.org/jira/browse/FLINK-25293
[4] https://issues.apache.org/jira/browse/FLINK-28060
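
On the question of whether a parameter can be tuned: with checkpointing enabled, the Kafka source restores from the offsets stored in Flink's checkpoint state, and the commit back to Kafka mainly exposes consumer progress for monitoring. If the WARN is only noise and nothing depends on the committed offsets, one possible option is to switch the commit off with the connector property `commit.offsets.on.checkpoint`. The sketch below only assembles the properties with the JDK; the class name, method name and group id are made up for illustration, and the properties would still have to be passed to the source (for example via `KafkaSource.builder().setProperties(props)`).

```java
import java.util.Properties;

public class OffsetCommitConfigSketch {

    // Property key of the Flink Kafka source that controls whether offsets are
    // committed back to Kafka when a checkpoint completes.
    public static final String COMMIT_ON_CHECKPOINT = "commit.offsets.on.checkpoint";

    /**
     * Builds the properties for the Kafka source.
     * The groupId value is a placeholder chosen for this sketch.
     */
    public static Properties buildSourceProperties(String groupId, boolean commitOnCheckpoint) {
        Properties props = new Properties();
        props.setProperty("group.id", groupId);
        props.setProperty(COMMIT_ON_CHECKPOINT, Boolean.toString(commitOnCheckpoint));
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical group id; false turns off the commit on checkpoint.
        Properties props = buildSourceProperties("my-flink-job", false);
        // Print the resulting settings (iteration order is not guaranteed).
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

The trade-off is that the consumer group's committed offsets in Kafka stop advancing, so lag monitoring based on them no longer reflects the job's progress, and a job started without checkpoint or savepoint state cannot resume from them.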






--

Best,
Matt Wang


---- Replied Message ----
| From | zhan...@eastcom-sw.com<zhan...@eastcom-sw.com> |
| Date | 05/6/2023 09:19 |
| To | user-zh<user-zh@flink.apache.org> |
| Subject | Re: Re: checkpoint Kafka Offset commit failed |
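Hi, thanks for the answer.

The Flink cluster and the Kafka cluster are on the same network segment, and I have checked that the network is fine.
On Flink 1.14, a "Time should be non negative" exception occurs once every few days; after the job restarts automatically, offsets are committed normally again.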

java.lang.IllegalArgumentException: Time should be non negative
    at org.apache.flink.util.Preconditions.checkArgument(Preconditions.java:138)
    at org.apache.flink.runtime.throughput.ThroughputEMA.calculateThroughput(ThroughputEMA.java:44)
    at org.apache.flink.runtime.throughput.ThroughputCalculator.calculateThroughput(ThroughputCalculator.java:80)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.debloat(StreamTask.java:792)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$null$4(StreamTask.java:784)
    at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50)
    at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90)
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:338)
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:324)
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:201)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:809)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:761)
    at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958)
    at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:937)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575)
    at java.lang.Thread.run(Thread.java:748)

From: Shammon FY
Date: 2023-05-05 09:48
To: user-zh
Subject: Re: checkpoint Kafka Offset commit failed
Hi

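This looks like a network problem that prevents the source operators of the Flink job from connecting to Kafka. You could check whether there is a network issue on the Kafka cluster or on the nodes running the job's source tasks.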

Best,
Shammon FY

On Fri, May 5, 2023 at 9:41 AM Leonard Xu <xbjt...@gmail.com> wrote:

To unsubscribe from the user-zh@flink.apache.org mailing list, send an email with any content to
user-zh-unsubscr...@flink.apache.org. For managing mailing list subscriptions, see [1].

Best regards,
Leonard
[1]
https://flink.apache.org/zh/community/#%e9%82%ae%e4%bb%b6%e5%88%97%e8%a1%a8

On May 4, 2023, at 9:00 PM, wuzhongxiu <go574...@163.com> wrote:

Unsubscribe






---- Original Message ----
| From | zhan...@eastcom-sw.com |
| Date | 2023-05-04 14:54 |
| To | user-zh<user-zh@flink.apache.org> |
| Cc | |
| Subject | checkpoint Kafka Offset commit failed |
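Hi, on Flink (1.14 and 1.16), when a checkpoint (10s) commits Kafka offsets, it reports "The coordinator is not available".

The Kafka cluster logs all look normal, and offsets can be committed correctly by hand. After restarting the Flink job, commits succeed again, but they start failing once the job has run for a while. Are there any parameters that could be tuned?

The Flink log is as follows: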
2023-05-04 11:31:02,636 WARN  org.apache.flink.connector.kafka.source.reader.KafkaSourceReader [] - Failed to commit consumer offsets for checkpoint 69153
org.apache.kafka.clients.consumer.RetriableCommitFailedException: Offset commit failed with a retriable exception. You should retry committing the latest consumed offsets.
Caused by: org.apache.kafka.common.errors.CoordinatorNotAvailableException: The coordinator is not available.

