Hi Till,
1. the files no longer exist in HDFS.
2. yes, I stopped and started the cluster with the bin scripts.
3. unfortunately I deleted the log.. :(
I wondered whether this code could cause the issue, i.e. the way I set up checkpointing:

    StateBackend sb = new FsStateBackend("hdfs://***/flink/my_city/checkpoints");
    env.setStateBackend(sb);
    env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.AT_LEAST_ONCE);
    env.getCheckpointConfig().setCheckpointInterval(60000);

On Fri, Jun 1, 2018 at 6:19 PM Till Rohrmann <trohrm...@apache.org> wrote:

> Hi Miki,
>
> could you check whether the files are really no longer stored on HDFS? How
> did you terminate the cluster? Simply calling `bin/stop-cluster.sh`? I just
> tried it locally and it could recover the job after calling
> `bin/start-cluster.sh` again.
>
> What would be helpful are the logs from the initial run of the job. So if
> you can reproduce the problem, then this log would be very helpful.
>
> Cheers,
> Till
>
> On Thu, May 31, 2018 at 6:14 PM, miki haiat <miko5...@gmail.com> wrote:
>
>> Hi,
>>
>> I'm having a weird issue with JobManager recovery.
>> I'm using HDFS and ZooKeeper for an HA standalone cluster.
>>
>> I stopped the cluster and changed some parameters in the Flink conf (memory).
>> But now when I start the cluster again I get an error that prevents
>> the JM from starting: somehow the checkpoint file doesn't exist in
>> Hadoop, so the JM won't start.
>>
>> Full JM log file:
>> <https://gist.github.com/miko-code/28d57b32cb9c4f1aa96fa9873e10e53c>
>>
>>> 2018-05-31 11:57:05,568 ERROR
>>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Fatal error
>>> occurred in the cluster entrypoint.
>>
>> Caused by: java.lang.Exception: Cannot set up the user code libraries:
>> File does not exist:
>> /flink1.5/ha/default/blob/job_5c545fc3f43d69325fb9966b8dd4c8f3/blob_p-5d9f3be555d3b05f90b5e148235d25730eb65b3d-ae486e221962f7b96e36da18fe1c57ca
>> at
>> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:72)
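For context, here is a minimal, self-contained sketch of the checkpoint setup quoted above, assuming the Flink 1.5 DataStream API; the class name, the surrounding job skeleton, and the masked HDFS path are illustrative, not taken from the original mail:

    // Sketch only: wraps the four configuration lines from the thread in a
    // compilable skeleton. The path placeholder and class name are assumptions.
    import org.apache.flink.runtime.state.StateBackend;
    import org.apache.flink.runtime.state.filesystem.FsStateBackend;
    import org.apache.flink.streaming.api.CheckpointingMode;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class CheckpointSetupSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // File-system state backend writing checkpoints to HDFS
            // (path masked as in the mail).
            StateBackend sb = new FsStateBackend("hdfs://***/flink/my_city/checkpoints");
            env.setStateBackend(sb);

            // At-least-once checkpoints every 60 seconds, as in the original snippet.
            env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.AT_LEAST_ONCE);
            env.getCheckpointConfig().setCheckpointInterval(60000);

            // ... job definition and env.execute(...) would go here ...
        }
    }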