Hi Yun. The UI was not useful for this case. I had a hunch beforehand about what the issue was. We refactored the state, and now the checkpoint is 10x faster.
On Mon, Jun 14, 2021 at 5:47 AM Yun Gao <yungao...@aliyun.com> wrote:

> Hi Dan,
>
> Flink already has a tool integrated into the web UI for monitoring
> detailed checkpoint statistics [1]. It shows the time consumed by each
> part of the checkpoint and by each task, so it can be used to debug
> checkpoint timeouts.
>
> Best,
> Yun
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-master/docs/ops/monitoring/checkpoint_monitoring/
>
> ------------------Original Mail ------------------
> *Sender:*Dan Hill <quietgol...@gmail.com>
> *Send Date:*Sat Jun 12 09:15:50 2021
> *Recipients:*user <user@flink.apache.org>
> *Subject:*Checkpoint is timing out - inspecting state
>
>> Hi.
>>
>> We're doing something bad with our Flink state. We just launched a
>> feature that creates very big values (lists of objects that we append to)
>> in MapState.
>>
>> Our checkpoints time out (after 10 minutes). I'm assuming the values are
>> too big. Backpressure is okay, and CPU and memory metrics look okay.
>>
>> Questions
>>
>> 1. Is there an easy tool for inspecting the Flink state?
>>
>> I found this post about drilling into Flink state
>> <https://flink.apache.org/news/2020/01/29/state-unlocked-interacting-with-state-in-apache-flink.html>.
>> I was hoping for something more like a CLI.
>>
>> 2. Is there a way to break down the time spent during a checkpoint if it
>> times out?
>>
>> Thanks!
>> - Dan
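The thread doesn't say what the state refactor was. The toy model below is only a sketch of why "very big values (lists of objects that we append to) in MapState" can get expensive. It is plain Python, not Flink code, and it assumes a backend that stores each MapState value as one serialized blob, so an append means re-serializing and rewriting the whole list. Layout B, with one map entry per element keyed by a sequence number, is a hypothetical alternative for comparison. It is not necessarily what was actually done here.

```python
import pickle

# Toy model only, not Flink code. ASSUMPTION: the state backend stores each
# MapState value as one serialized blob, so changing a value means
# re-serializing and rewriting the whole value.

def bytes_rewritten_list_value(n_appends):
    """Layout A: one map key holds a list that grows with every append."""
    values = []
    total = 0
    for i in range(n_appends):
        values.append(f"event-{i}")
        # The entire list is serialized again on each append.
        total += len(pickle.dumps(values))
    return total

def bytes_written_per_element(n_appends):
    """Layout B (hypothetical): one map entry per element, keyed by a sequence number."""
    total = 0
    for i in range(n_appends):
        # Only the new (sequence_number, element) pair is serialized.
        total += len(pickle.dumps((i, f"event-{i}")))
    return total

if __name__ == "__main__":
    for n in (1_000, 2_000):
        print(n, bytes_rewritten_list_value(n), bytes_written_per_element(n))
```

Under this assumption, the bytes written in layout A grow roughly quadratically with the number of appends: doubling the appends roughly quadruples the total. Layout B grows linearly.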