Hi Dan,

Flink already integrates a tool in the web UI for monitoring detailed
checkpoint statistics [1]. It shows the time consumed in each phase and
by each task, so it can be used to debug the checkpoint timeout.
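
If you prefer something closer to a CLI, the same statistics are also served
by the JobManager REST API under /jobs/<jobid>/checkpoints. Below is a minimal
sketch, not a definitive tool. The function names are hypothetical, the base
URL (for example http://localhost:8081) is an assumption about your setup, and
the JSON field names ("history", "end_to_end_duration", "state_size") are what
I would expect from the REST API docs, so please check them against your Flink
version.

```python
import json
from urllib.request import urlopen


def fetch_checkpoint_stats(base_url, job_id):
    """Fetch checkpoint statistics for one job from the JobManager REST API.

    base_url is assumed to look like "http://localhost:8081".
    """
    with urlopen(f"{base_url}/jobs/{job_id}/checkpoints") as resp:
        return json.load(resp)


def summarize_history(stats):
    """Return (id, status, duration_ms, state_size_bytes) tuples, slowest first.

    The field names are assumptions; missing fields fall back to defaults.
    """
    rows = [
        (
            c.get("id"),
            c.get("status"),
            c.get("end_to_end_duration", 0),
            c.get("state_size", 0),
        )
        for c in stats.get("history", [])
    ]
    # Sort by end-to-end duration so the slowest checkpoints come first.
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

Calling summarize_history(fetch_checkpoint_stats(base_url, job_id)) lists the
most recent checkpoints with the slowest ones at the top, which should make a
checkpoint that ran into the 10 minute timeout easy to spot.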

Best,
Yun



[1] 
https://ci.apache.org/projects/flink/flink-docs-master/docs/ops/monitoring/checkpoint_monitoring/


 ------------------Original Mail ------------------
Sender:Dan Hill <quietgol...@gmail.com>
Send Date:Sat Jun 12 09:15:50 2021
Recipients:user <user@flink.apache.org>
Subject:Checkpoint is timing out - inspecting state

Hi.

We're doing something bad with our Flink state.  We just launched a feature 
that creates very big values (lists of objects that we append to) in MapState.

Our checkpoints time out (10 minutes).  I'm assuming the values are too big.  
Backpressure is okay and cpu+memory metrics look okay.

Questions

1. Is there an easy tool for inspecting the Flink state?

I found this post about drilling into Flink state.  I was hoping for something 
more like a CLI.

2. Is there a way to break down the time spent during a checkpoint if it times 
out?

Thanks!
- Dan

