Hi Till, I've managed to reproduce it. Full log: faild_jm.log <https://gist.githubusercontent.com/miko-code/e634164404354c4c590be84292fd8cb2/raw/baeee310cd50cfa79303b328e3334d960c8e98e6/faild_jm.log>
On Mon, Jun 4, 2018 at 10:33 AM Till Rohrmann <trohrm...@apache.org> wrote:

> Hmmm, Flink should not delete the stored blobs on the HA storage. Could
> you try to reproduce the problem and then send us the logs on DEBUG level?
> Please also check, before shutting the cluster down, that the files were
> there.
>
> Cheers,
> Till
>
> On Sun, Jun 3, 2018 at 1:10 PM miki haiat <miko5...@gmail.com> wrote:
>
>> Hi Till,
>>
>> 1. The files no longer exist in HDFS.
>> 2. Yes, I stopped and started the cluster with the bin commands.
>> 3. Unfortunately I deleted the log... :(
>>
>> I wonder whether this code could cause the issue, i.e. the way I am
>> using checkpoints:
>>
>> StateBackend sb = new FsStateBackend("hdfs://***/flink/my_city/checkpoints");
>> env.setStateBackend(sb);
>> env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.AT_LEAST_ONCE);
>> env.getCheckpointConfig().setCheckpointInterval(60000);
>>
>> On Fri, Jun 1, 2018 at 6:19 PM Till Rohrmann <trohrm...@apache.org> wrote:
>>
>>> Hi Miki,
>>>
>>> could you check whether the files are really no longer stored on HDFS?
>>> How did you terminate the cluster? Simply calling `bin/stop-cluster.sh`? I
>>> just tried it locally and it could recover the job after calling
>>> `bin/start-cluster.sh` again.
>>>
>>> What would be helpful are the logs from the initial run of the job. So
>>> if you can reproduce the problem, then this log would be very helpful.
>>>
>>> Cheers,
>>> Till
>>>
>>> On Thu, May 31, 2018 at 6:14 PM, miki haiat <miko5...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm having a weird issue with JobManager recovery.
>>>> I am using HDFS and ZooKeeper for an HA standalone cluster.
>>>>
>>>> I stopped the cluster to change some parameters in the Flink conf
>>>> (memory), but now when I start the cluster again I get an error that
>>>> prevents the JM from starting: somehow the checkpoint file doesn't
>>>> exist in Hadoop and the JM won't start.
>>>>
>>>> Full JM log file
>>>> <https://gist.github.com/miko-code/28d57b32cb9c4f1aa96fa9873e10e53c>
>>>>
>>>>> 2018-05-31 11:57:05,568 ERROR
>>>>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Fatal error
>>>>> occurred in the cluster entrypoint.
>>>>
>>>> Caused by: java.lang.Exception: Cannot set up the user code libraries:
>>>> File does not exist:
>>>> /flink1.5/ha/default/blob/job_5c545fc3f43d69325fb9966b8dd4c8f3/blob_p-5d9f3be555d3b05f90b5e148235d25730eb65b3d-ae486e221962f7b96e36da18fe1c57ca
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:72)
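[Editorial note for readers of the archive: the snippet quoted above, expanded into a minimal, self-contained sketch against the Flink 1.5 DataStream API. The namenode address, checkpoint path, job name and the dummy fromElements/print pipeline are placeholders added for illustration; only the FsStateBackend, the at-least-once mode and the 60-second interval come from the original message.]

import org.apache.flink.runtime.state.StateBackend;
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointConfigSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpoint data is written to HDFS; this URI is a placeholder,
        // the original used hdfs://***/flink/my_city/checkpoints.
        StateBackend backend =
                new FsStateBackend("hdfs://namenode:8020/flink/checkpoints");
        env.setStateBackend(backend);

        // Same settings as in the quoted code: at-least-once checkpoints
        // every 60 seconds (a non-zero interval enables checkpointing).
        env.getCheckpointConfig()
           .setCheckpointingMode(CheckpointingMode.AT_LEAST_ONCE);
        env.getCheckpointConfig().setCheckpointInterval(60_000L);

        // Dummy pipeline so the sketch actually builds a job graph.
        env.fromElements(1, 2, 3).print();

        env.execute("checkpoint-config-sketch");
    }
}

Note that the FsStateBackend path only controls where checkpoint data goes; the missing blob in the stack trace (/flink1.5/ha/default/blob/...) appears to live under the directory configured as high-availability.storageDir, which Flink manages separately. Before restarting the cluster, the contents of that directory can be inspected with, for example, `hdfs dfs -ls /flink1.5/ha/default/blob/` (path taken from the error message), as Till suggested.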