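Hi, it looks like this error means the file used for stdout output could not be found. You could try adding the "taskmanager.log.path" configuration and running again, so that you can track down the root cause of the task timeouts. You could also use a flame graph or jstack to see which method those tasks are stuck in when they time out.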
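In case it is useful, here is a minimal sketch of how a saved jstack dump could be summarized to see where the stuck tasks are sitting. It assumes the dump was already saved as text (for example with `jstack <pid>`) and that the task threads can be picked out by name. The thread names and frames in `SAMPLE_DUMP` and the helper `top_frames` are hypothetical and only for illustration, not output from the job in question.

```python
import re
from collections import Counter

# In jstack output, each thread section starts with the quoted thread name.
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')
# Stack frames are indented and start with "at "; capture the method only.
FRAME = re.compile(r'^\s+at\s+(?P<frame>[^(\s]+)')


def top_frames(dump_text, name_pattern):
    """Count the top-most stack frame of every thread whose name matches."""
    wanted = re.compile(name_pattern)
    counts = Counter()
    current = None      # name of the thread being read, if it matches
    seen_frame = False  # whether its first frame was already recorded
    for line in dump_text.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            name = header.group("name")
            current = name if wanted.search(name) else None
            seen_frame = False
            continue
        frame = FRAME.match(line)
        if current and frame and not seen_frame:
            counts[frame.group("frame")] += 1
            seen_frame = True
    return counts


# Hypothetical dump excerpt; real thread names depend on the job graph.
SAMPLE_DUMP = '''\
"Sink: mongo (1/48)#0" #101 prio=5 os_prio=0 tid=0x01 nid=0x11 runnable
   java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:171)

"Sink: mongo (2/48)#0" #102 prio=5 os_prio=0 tid=0x02 nid=0x12 runnable
   java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)

"Source: kafka (1/48)#0" #103 prio=5 os_prio=0 tid=0x03 nid=0x13 waiting on condition
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
'''

if __name__ == "__main__":
    # Show where the (hypothetical) sink threads are currently executing.
    for frame, n in top_frames(SAMPLE_DUMP, r"^Sink").most_common():
        print(n, frame)
```

If the same top frame keeps showing up for the timed-out subtasks across several dumps taken a few seconds apart, that method is a reasonable place to start looking.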
--
Best!
Xuyang

At 2022-08-29 16:19:15, "casel.chen" <casel_c...@126.com> wrote:
>We have a production Flink job that failed when a savepoint was triggered manually. The job has two operators, one reading from Kafka and one writing to MongoDB, both with a parallelism of 48. After the failure we saw that, of the 48 tasks of the MongoDB sink operator, 45 had completed and 3 had timed out (the timeout is set to 3 minutes). Normally a checkpoint completes in 4 seconds and the state size is only 23.7 KB. The job log after the failure is shown below. After the savepoint failed, the job's periodic checkpoints all failed as well (3 tasks timed out in each operator). We use FileStateBackend, and the DFS is Alibaba Cloud OSS. What could be causing this?
>
>2022-08-29 15:38:32,617 ERROR org.apache.flink.runtime.rest.handler.taskmanager.TaskManagerStdoutFileHandler [] - Failed to transfer file from TaskExecutor sqrc-session-prod-taskmanager-1-30.
>java.util.concurrent.CompletionException: org.apache.flink.util.FlinkException: The file STDOUT does not exist on the TaskExecutor.
>    at org.apache.flink.runtime.taskexecutor.TaskExecutor.lambda$requestFileUploadByFilePath$24(TaskExecutor.java:2064) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
>    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604) ~[?:1.8.0_312]
>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_312]
>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_312]
>    at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_312]
>Caused by: org.apache.flink.util.FlinkException: The file STDOUT does not exist on the TaskExecutor.
>    ... 5 more
>2022-08-29 15:38:32,617 ERROR org.apache.flink.runtime.rest.handler.taskmanager.TaskManagerStdoutFileHandler [] - Unhandled exception.
>org.apache.flink.util.FlinkException: The file STDOUT does not exist on the TaskExecutor.
>    at org.apache.flink.runtime.taskexecutor.TaskExecutor.lambda$requestFileUploadByFilePath$24(TaskExecutor.java:2064) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
>    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604) ~[?:1.8.0_312]
>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_312]
>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_312]
>    at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_312]