Hi Till,
I've managed to reproduce it.
Full log: faild_jm.log
<https://gist.githubusercontent.com/miko-code/e634164404354c4c590be84292fd8cb2/raw/baeee310cd50cfa79303b328e3334d960c8e98e6/faild_jm.log>




On Mon, Jun 4, 2018 at 10:33 AM Till Rohrmann <trohrm...@apache.org> wrote:

> Hmmm, Flink should not delete the stored blobs on the HA storage. Could
> you try to reproduce the problem and then send us the logs on DEBUG level?
> Please also check before shutting the cluster down, that the files were
> there.
>
> Cheers,
> Till
>
> On Sun, Jun 3, 2018 at 1:10 PM miki haiat <miko5...@gmail.com> wrote:
>
>> Hi Till,
>>
>>    1. The files no longer exist in HDFS.
>>    2. Yes, I stopped and started the cluster with the bin commands.
>>    3. Unfortunately, I deleted the log. :(
>>
>>
>> I wonder whether this code could cause the issue, i.e. the way I am
>> configuring checkpointing:
>>
>> StateBackend sb = new FsStateBackend("hdfs://***/flink/my_city/checkpoints");
>> env.setStateBackend(sb);
>> env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.AT_LEAST_ONCE);
>> env.getCheckpointConfig().setCheckpointInterval(60000);
>>
>> On Fri, Jun 1, 2018 at 6:19 PM Till Rohrmann <trohrm...@apache.org>
>> wrote:
>>
>>> Hi Miki,
>>>
>>> could you check whether the files are really no longer stored on HDFS?
>>> How did you terminate the cluster? Simply calling `bin/stop-cluster.sh`? I
>>> just tried it locally and it could recover the job after calling
>>> `bin/start-cluster.sh` again.
>>>
>>> What would be helpful are the logs from the initial run of the job. So
>>> if you can reproduce the problem, then this log would be very helpful.
>>>
>>> Cheers,
>>> Till
>>>
>>> On Thu, May 31, 2018 at 6:14 PM, miki haiat <miko5...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm having a weird issue with JM recovery.
>>>> I'm using HDFS and ZooKeeper for HA on a standalone cluster.
>>>>
>>>> I stopped the cluster and changed some parameters in the Flink conf (memory).
>>>> Now, when I start the cluster again, I get an error that prevents the
>>>> JM from starting.
>>>> Somehow the checkpoint file doesn't exist in Hadoop, and the JM won't start.
>>>>
>>>> Full JM log file:
>>>> <https://gist.github.com/miko-code/28d57b32cb9c4f1aa96fa9873e10e53c>
>>>>
>>>>
>>>>> 2018-05-31 11:57:05,568 ERROR
>>>>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Fatal error
>>>>> occurred in the cluster entrypoint.
>>>>
>>>> Caused by: java.lang.Exception: Cannot set up the user code libraries:
>>>> File does not exist:
>>>> /flink1.5/ha/default/blob/job_5c545fc3f43d69325fb9966b8dd4c8f3/blob_p-5d9f3be555d3b05f90b5e148235d25730eb65b3d-ae486e221962f7b96e36da18fe1c57ca
>>>> at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:72)
>>>>
>>>
