Hi Vinay,

Oh! You use a collection source? That's the problem. Please use a general source like Kafka or others. Maybe your job stopped before any checkpoint was triggered.

Thanks, vino.

2018-07-27 16:07 GMT+08:00 Vinay Patil <vinay18.pa...@gmail.com>:

> Hi Vino,
>
> Yes, I am enabling checkpointing in the code as follows:
>
> StreamExecutionEnvironment env =
>     StreamExecutionEnvironment.createRemoteEnvironment("<job_manager_host>", <job_manager_port>, getJobConfiguration(), jarPath);
>
> env.enableCheckpointing(1000);
> env.setStateBackend(new FsStateBackend("file:///<shared_mount_point_location>"));
> env.getCheckpointConfig().setMinPauseBetweenCheckpoints(1000);
>
> In the getJobConfiguration method I have set HA-related properties like
> HA_STORAGE_PATH, HA_ZOOKEEPER_QUORUM, HA_ZOOKEEPER_ROOT, HA_MODE,
> HA_JOB_MANAGER_PORT_RANGE, HA_CLUSTER_ID
>
> I can see the error in the Job Manager logs where it says "Collection
> Source is not being executed at the moment. Aborting checkpoint." In the
> pipeline I have a stream initialized using "fromCollection". I think I
> will have to get rid of this.
>
> What do you suggest?
>
> Regards,
> Vinay Patil
>
>
> On Thu, Jul 26, 2018 at 12:04 PM vino yang <yanghua1...@gmail.com> wrote:
>
>> Hi Vinay:
>>
>> Did you call the specific config API described in this documentation [1]?
>>
>> Can you share your job program and JM log? Or does the JM log contain a
>> message matching this pattern: "Triggering checkpoint {} @ {} for job {}."?
>>
>> [1]: https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/stream/state/checkpointing.html#enabling-and-configuring-checkpointing
>>
>> Thanks, vino.
>>
>> 2018-07-25 19:43 GMT+08:00 Chesnay Schepler <ches...@apache.org>:
>>
>>> Can you provide us with the job code?
>>>
>>> I assume that checkpointing runs properly if you submit the same job to
>>> a normal cluster?
>>>
>>>
>>> On 25.07.2018 13:15, Vinay Patil wrote:
>>>
>>> No error in the logs. That is why I am not able to understand why
>>> checkpoints are not getting triggered.
>>>
>>> Regards,
>>> Vinay Patil
>>>
>>>
>>> On Wed, Jul 25, 2018 at 4:44 PM Vinay Patil <vinay18.pa...@gmail.com>
>>> wrote:
>>>
>>>> Hi Chesnay,
>>>>
>>>> No error in the logs. That is why I am not able to understand why
>>>> checkpoints are getting triggered.
>>>>
>>>> Regards,
>>>> Vinay Patil
>>>>
>>>>
>>>> On Wed, Jul 25, 2018 at 4:36 PM Chesnay Schepler <ches...@apache.org>
>>>> wrote:
>>>>
>>>>> Please check the job- and taskmanager logs for anything suspicious.
>>>>>
>>>>> On 25.07.2018 12:33, Vinay Patil wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I am starting the cluster using a bootstrap application in which I
>>>>> call the Job Manager and Task Manager main classes to form the
>>>>> cluster. The HA cluster is formed correctly and I am able to submit
>>>>> jobs to this cluster using RemoteExecutionEnvironment, but when I
>>>>> enable checkpointing in code I do not see any checkpoints triggered
>>>>> on the Flink UI.
>>>>>
>>>>> Am I missing any configuration that needs to be set on the
>>>>> RemoteExecutionEnvironment for checkpointing to work?
>>>>>
>>>>>
>>>>> Regards,
>>>>> Vinay Patil