Re: Where is the "Latest Savepoint" information saved?

Hao Sun Sun, 11 Nov 2018 10:57:02 -0800

Thanks, I'll check it out.

On Sun, Nov 11, 2018 at 10:52 AM Ufuk Celebi <u...@apache.org> wrote:


> I think there should be a log message, but I don't know what the exact
> format is (you would need to look through it and search for something
> related to CompletedCheckpointStore).
>
> An alternative is the web UI checkpointing tab. It shows the latest
> checkpoint used for restore of the job. You should see your savepoint
> there.
>
> Best,
>
> Ufuk
>
>
> On Sun, Nov 11, 2018 at 7:45 PM Hao Sun <ha...@zendesk.com> wrote:
> >
> > This is great, I will try option 3 and let you know.
> > Can I log some message so I know job is recovered from the latest
> savepoint?
> >
> > On Sun, Nov 11, 2018 at 10:42 AM Ufuk Celebi <u...@apache.org> wrote:
> >>
> >> Hey Hao and Paul,
> >>
> >> 1) Fetch checkpoint info manually from ZK (problematic, not recommended)
> >> - As Paul pointed out, this is problematic as the node is a serialized
> >> pointer (StateHandle) to a CompletedCheckpoint in the HA storage
> >> directory and not a path [1].
> >> - I would not recommend this approach at the moment
> >>
> >> 2) Using the HTTP API to fetch the latest savepoint pointer (possible,
> >> but cumbersome)
> >> - As Paul proposed, you could use /jobs/:jobid/checkpoints to fetch
> >> checkpoint statistics about the latest savepoint
> >> - The latest savepoint is available as a separate entry under
> >> `latest.savepoint` (If I'm reading the docs [2] correctly)
> >> - You would need to manually do this before shutting down (requires
> >> custom tooling to automate)
> >>
> >> 3) Use cancelWithSavepoint
> >> - If you keep `high-availability.cluster-id` consistent between
> >> executions of your job cluster, using cancelWithSavepoint [3] should
> >> add the the savepoint to ZK before cancelling the job
> >> - On the next execution of your job cluster, Flink should
> >> automatically pick it up (no need to attach a savepoint restore path
> >> manually)
> >>
> >> I've not tried 3) myself yet, but believe it should work. If you have
> >> time to try it out, I'd be happy to hear whether it works as expected
> >> for you.
> >>
> >> – Ufuk
> >>
> >> [1] I believe this is overly complicated and should be simplified in
> the future.
> >> [2] Search /jobs/:jobid/checkpoints in
> >>
> https://ci.apache.org/projects/flink/flink-docs-release-1.6/monitoring/rest_api.html
> <https://ci.apache.org/projects/flink/flink-docs-release-1.6/monitoring/rest_api.html>
> >> [3]
> https://ci.apache.org/projects/flink/flink-docs-release-1.6/monitoring/rest_api.html#job-cancellation
> <https://ci.apache.org/projects/flink/flink-docs-release-1.6/monitoring/rest_api.html#job-cancellation>
> >>
> >> On Fri, Nov 9, 2018 at 5:03 PM Hao Sun <ha...@zendesk.com> wrote:
> >> >
> >> > Can we add an option to allow job cluster mode to start from the
> latest save point? Otherwise I have to somehow get the info from ZK, before
> job cluster's container started by K8s.
> >> >
> >> > On Fri, Nov 9, 2018, 01:29 Paul Lam <paullin3...@gmail.com> wrote:
> >> >>
> >> >> Hi Hao,
> >> >>
> >> >> The savepoint path is stored in ZK, but it’s in binary format, so in
> order to retrieve the path you have to deserialize it back to some Flink
> internal object.
> >> >>
> >> >> A better approach would be using REST api to get the path. You could
> find it here[1].
> >> >>
> >> >> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.6/monitoring/rest_api.html
> <https://ci.apache.org/projects/flink/flink-docs-release-1.6/monitoring/rest_api.html>
> >> >>
> >> >> Best,
> >> >> Paul Lam
> >> >>
> >> >>
> >> >> 在 2018年11月9日，13:55，Hao Sun <ha...@zendesk.com> 写道：
> >> >>
> >> >> Since this save point path is very useful to application updates,
> where is this information stored? Can we keep it in ZK or S3 for retrieval?
> >> >>
> >> >> <image.png>
> >> >>
> >> >>
>

Re: Where is the "Latest Savepoint" information saved?

Reply via email to