Hi
I recently encountered a problem in production: checkpoints take too much
time, although this doesn't affect job execution.
I am using FsStateBackend with asynchronousSnapshots enabled, writing the
data to an HDFS checkpointDataUri. I print the metrics
“lastCheckpointDuration” and “lastCheckpointSize”: “lastCheckpointSize” is
about 80 KB, but “lastCheckpointDuration” is about 160 s! Because the
checkpoint data is so small, I don't think it should take that long. I don't
know why this happens, or which conditions can influence the checkpoint time.
Has anyone encountered such a problem?
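For reference, the two metrics can be combined into an effective write rate, which makes the mismatch concrete. This is a minimal sketch that assumes, as in the Flink metrics documentation, that lastCheckpointSize is reported in bytes and lastCheckpointDuration in milliseconds, and it takes 80 KB as 80 × 1024 bytes. The class and method names are hypothetical and not part of any Flink API.

```java
import java.util.Locale;

public class CheckpointRate {
    // Effective write rate in bytes per second, given the checkpoint size in
    // bytes and the checkpoint duration in milliseconds (assumed units).
    static double bytesPerSecond(long sizeBytes, long durationMillis) {
        if (durationMillis <= 0) {
            throw new IllegalArgumentException("duration must be positive");
        }
        return sizeBytes * 1000.0 / durationMillis;
    }

    public static void main(String[] args) {
        long lastCheckpointSize = 80L * 1024;      // ~80 KB, as reported (assumed bytes)
        long lastCheckpointDuration = 160_000L;    // ~160 s, as reported (assumed ms)
        double rate = bytesPerSecond(lastCheckpointSize, lastCheckpointDuration);
        // Locale.ROOT keeps the decimal separator as '.' regardless of system locale.
        System.out.printf(Locale.ROOT, "Effective rate: %.1f bytes/s%n", rate);
    }
}
```

With the numbers above this comes to roughly 512 bytes/s, far below what an HDFS write normally sustains, which suggests the 160 s is probably spent on something other than writing the checkpoint bytes themselves.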
Thanks a lot.
Best
Henry