Hi,

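This looks like the client failing to trigger the savepoint, rather than the savepoint itself timing out end-to-end. I suggest checking the JobManager log to see whether it contains any entries about triggering a savepoint at the time you issued the command. You can also check in the Flink web UI whether the corresponding savepoint appears in the history on the checkpoint tab.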
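
The same history that the checkpoint tab shows can also be read from the JobManager's REST endpoint. Below is a minimal sketch, not a definitive tool. It assumes the REST address is http://localhost:8081 (hypothetical, use your own), and that the response of /jobs/<jobid>/checkpoints has a "history" list whose entries carry "is_savepoint", "status" and "trigger_timestamp" fields; please verify those names against the REST documentation for your Flink version.

```python
import json
from urllib.request import urlopen


def savepoints_in_history(checkpoint_stats):
    """Return only the savepoint entries from a parsed checkpoint-statistics response."""
    history = checkpoint_stats.get("history", [])
    return [entry for entry in history if entry.get("is_savepoint")]


def fetch_checkpoint_stats(rest_url, job_id):
    """Fetch checkpoint statistics for one job from the JobManager REST endpoint."""
    with urlopen(f"{rest_url}/jobs/{job_id}/checkpoints", timeout=10) as resp:
        return json.load(resp)


# Example usage (address and job ID are placeholders):
#   stats = fetch_checkpoint_stats("http://localhost:8081", "<job-id>")
#   for sp in savepoints_in_history(stats):
#       print(sp.get("id"), sp.get("status"), sp.get("trigger_timestamp"))
```

If the list stays empty after you trigger the savepoint, the request most likely never reached the JobManager, which would match a client-side timeout.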

Best regards,
唐云
________________________________
From: 仙剑……情动人间 <[email protected]>
Sent: Tuesday, July 13, 2021 17:31
To: Flink mailing list <[email protected]>
Subject: Flink: triggering a savepoint fails

Hi All,


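Triggering a Flink savepoint always fails for me with the error below. It keeps reporting a timeout, but there is no further information to look at. I read that the checkpoint timeout can be configured, so I set it to 2 min, but the savepoint trigger fails before 2 min have elapsed. Also, my state is not large.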


------------------------------------------------------------
 The program finished with the following exception:

org.apache.flink.util.FlinkException: Triggering a savepoint for the job 00000000000000000000000000000000 failed.
        at org.apache.flink.client.cli.CliFrontend.triggerSavepoint(CliFrontend.java:777)
        at org.apache.flink.client.cli.CliFrontend.lambda$savepoint$9(CliFrontend.java:754)
        at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:1002)
        at org.apache.flink.client.cli.CliFrontend.savepoint(CliFrontend.java:751)
        at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1072)
        at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
        at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132)
Caused by: java.util.concurrent.TimeoutException
        at org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:1255)
        at org.apache.flink.runtime.concurrent.DirectExecutorService.execute(DirectExecutorService.java:217)
        at org.apache.flink.runtime.concurrent.FutureUtils.lambda$orTimeout$15(FutureUtils.java:582)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
