Could you share your HA configuration? In particular high-availability.storageDir; I suspect it may not be configured at all.

Best,
tison.

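As a rough way to check this before posting the full configuration, the sketch below reads flink-conf.yaml-style text and lists which HA keys are missing. The three keys are the standard ZooKeeper HA options; the helper names, the ZooKeeper hostnames, and the hdfs:///flink/ha path in the sample are hypothetical and not taken from the reporter's setup.

```python
# Minimal sketch: report HA-related keys that are absent from a
# flink-conf.yaml-style text. Only flat "key: value" lines are handled.

REQUIRED_HA_KEYS = (
    "high-availability",
    "high-availability.storageDir",
    "high-availability.zookeeper.quorum",
)


def parse_flink_conf(text):
    """Parse flat 'key: value' lines, ignoring blanks and '#' comments."""
    conf = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or ":" not in line:
            continue
        # Split on the first colon only, so values such as
        # hdfs:///flink/ha or zk1:2181 stay intact.
        key, value = line.split(":", 1)
        conf[key.strip()] = value.strip()
    return conf


def missing_ha_keys(conf):
    """Return the required HA keys that are unset or empty."""
    return [key for key in REQUIRED_HA_KEYS if not conf.get(key)]


# Hypothetical sample in which storageDir is commented out.
SAMPLE = """
high-availability: zookeeper
high-availability.zookeeper.quorum: zk1:2181,zk2:2181,zk3:2181
# high-availability.storageDir: hdfs:///flink/ha
"""

if __name__ == "__main__":
    print(missing_ha_keys(parse_flink_conf(SAMPLE)))
```

Run against the real conf/flink-conf.yaml on each JobManager host, an empty result would suggest the keys are at least present, though it says nothing about whether the paths they point to are correct.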
On Mon, Mar 25, 2019 at 7:26 PM, Han Xiao <[email protected]> wrote:

> Hello everyone, I am new to Flink and ran into a problem while deploying
> Flink HA; I would appreciate your help.
>
> After starting Flink HA, the JobManager process hangs right away. I am
> using Flink 1.7.2. The log below contains this error:
>
> File does not exist: /flink/ha/zookeeper/submittedJobGraphb05001535f91
>
> What puzzles me is that neither my checkpoint directory nor my HA
> directory is this path, so why does Flink look there? No JobGraph has been
> generated under the directories I configured. Flink keeps trying to
> retrieve the job graph /a5ffe00b0bc5688d9a7de5c62b8150e6 and cannot find
> it. I deleted all the related configured paths and rebuilt the cluster,
> but it still looks for this job graph on startup. How can I stop Flink
> from looking for this JobGraph so that my HA cluster runs healthily?
>
> Error log:
>
> 2019-03-25 18:55:00,742 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Fatal error occurred in the cluster entrypoint.
> java.lang.RuntimeException: org.apache.flink.util.FlinkException: Could not retrieve submitted JobGraph from state handle under /a5ffe00b0bc5688d9a7de5c62b8150e6. This indicates that the retrieved state handle is broken. Try cleaning the state handle store.
>     at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:199)
>     at org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:74)
>     at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
>     .......
> Caused by: org.apache.flink.util.FlinkException: Could not retrieve submitted JobGraph from state handle under /a5ffe00b0bc5688d9a7de5c62b8150e6. This indicates that the retrieved state handle is broken. Try cleaning the state handle store.
>     at org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore.recoverJobGraph(ZooKeeperSubmittedJobGraphStore.java:208)
>     at org.apache.flink.runtime.dispatcher.Dispatcher.recoverJob(Dispatcher.java:696)
>     at org.apache.flink.runtime.dispatcher.Dispatcher.recoverJobGraphs(Dispatcher.java:681)
>     ........
> Caused by: java.io.FileNotFoundException: File does not exist: /flink/ha/zookeeper/submittedJobGraphb05001535f91
>     at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
>     at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2100)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2070)
>     .......
> Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /flink/ha/zookeeper/submittedJobGraphb05001535f91
>     at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
>     at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2100)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2070)
>     .......
>
> Thanks!
