Hi,
In my experience, this is most likely because one subtask is blocked
doing some long-running operation.
Try running the TaskManager with a profiler (such as VisualVM) and check
for hot spots.
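If attaching a profiler is inconvenient, a few thread dumps taken several seconds apart (for example with the JDK's jstack <pid> against the TaskManager process) can give a similar picture. A subtask thread that shows the same state and the same top stack frame in every dump is a good candidate for the blocked one. The sketch below only illustrates that heuristic with the standard java.lang.management API. The class StuckThreadFinder and the "slow-subtask" thread are hypothetical, the code inspects only the JVM it runs in, and it is not a Flink feature.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: compares two thread snapshots of the current JVM.
public class StuckThreadFinder {

    // Snapshot of every live thread in *this* JVM:
    // thread id -> "name STATE topFrame".
    static Map<Long, String> snapshot() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Long, String> tops = new HashMap<>();
        // dumpAllThreads(lockedMonitors, lockedSynchronizers)
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            StackTraceElement[] stack = info.getStackTrace();
            String top = stack.length > 0 ? stack[0].toString() : "<no frames>";
            tops.put(info.getThreadId(),
                     info.getThreadName() + " " + info.getThreadState() + " " + top);
        }
        return tops;
    }

    // Threads whose name, state and top frame are identical in both snapshots.
    // These are only candidates: idle JVM threads (and threads waiting for
    // input) also stay in the same frame, so the output needs manual review.
    static List<String> unchanged(Map<Long, String> before, Map<Long, String> after) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<Long, String> e : after.entrySet()) {
            if (e.getValue().equals(before.get(e.getKey()))) {
                result.add(e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical stand-in for a subtask stuck in a long-running call.
        Thread slow = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException ignored) {
                // exit quietly when interrupted
            }
        }, "slow-subtask");
        slow.setDaemon(true);
        slow.start();

        Thread.sleep(200);                 // give it time to enter sleep()
        Map<Long, String> first = snapshot();
        Thread.sleep(1_000);
        Map<Long, String> second = snapshot();
        for (String line : unchanged(first, second)) {
            System.out.println(line);
        }
    }
}
```

In a real TaskManager, the task threads are typically named after the operator and its subtask index, so the stuck one should be recognizable among the unchanged entries.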
Regards,
Kien
On 10/24/2018 4:02 PM, 徐涛 wrote:
Hi
I am running a Flink application with parallelism 64. I left the
checkpoint timeout at its default value of 10 minutes, the state size is less
than 1 MB, and I am using the FsStateBackend.
The application triggers some checkpoints, but all of them fail with
"Checkpoint expired before completing". I checked the checkpoint history and found
that 63 subtasks acknowledged, but one is still n/a, and the alignment
duration is quite long, about 5m27s.
Why does one subtask not acknowledge the checkpoint? And since the
alignment duration is so long, what influences the alignment
duration?
Thanks a lot.
Best
Henry