Re: Unaligned checkpoint waiting in 'start delay' with AsyncDataStream

Chesnay Schepler Wed, 13 Jul 2022 02:01:58 -0700

Ah OK I could reproduce the problem.

Seems to be tied to the capacity of the async operator; if you halve that, the start delay is doubled. It looks like classic back-pressure delaying checkpoints, which kinda makes sense, if you ignore that unaligned checkpoints are enabled, which are supposed to prevent that from happening.

I think it'd be best to create a ticket; either something isn't behaving as it should or the documentation is incomplete.
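
For anyone trying to reproduce this, a rough sketch of the checkpointing settings involved, assuming they are set in flink-conf.yaml rather than in the job code. The thread does not show the actual configuration, so the interval below is a made-up placeholder. The async operator's capacity is not a config option; it is the capacity argument passed to AsyncDataStream.orderedWait / unorderedWait in the job itself.

```yaml
# Sketch only: the values actually used in the reproduction are not shown in this thread.
execution.checkpointing.interval: 10s      # placeholder interval (assumption)
execution.checkpointing.unaligned: true    # enables unaligned checkpoints
```

With these set, the expectation is that checkpoint barriers can overtake buffered records under back-pressure, which is why the long start delay is surprising.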


On 12/07/2022 20:43, Nathan Sharp wrote:

I have not found a solution yet, but some points:
  - A co-worker has reproduced this issue on their own box using the recipe 
given below
  - I have tried using rocksdb state backend, which did not help
  - I have tried adding additional TaskWorkers, which did not help
  - I have checked the TaskWorker stats and nothing seems awry. No unusual 
memory consumption, for example. Nothing obvious in the stack traces
  - If I change the code to be sequential instead of async, checkpoints work 
fine
  - The log file merely shows the checkpoint being triggered, then it being 
completed 47 seconds later. No additional information is logged.
  - See the attached image for the UI representation, which shows that the delay is under 
the "Start Time" column.

  Chesnay, how was your Flink cluster configured when it worked for you? Are 
you able to reproduce it using my docker-compose file?

Thanks again!
   Nathan

-----Original Message-----
From: Nathan Sharp
Sent: Monday, July 4, 2022 10:00 AM
To: 'Chesnay Schepler' <ches...@apache.org>; user@flink.apache.org
Subject: RE: Unaligned checkpoint waiting in 'start delay' with AsyncDataStream

Thank you for trying it out! Hopefully, there is just some setting that needs 
to be changed.

I have an Ubuntu VM where I created a single node Docker swarm. Then I used the 
following command to run Flink 1.15.0 using the docker-compose.yml file in the 
repository:

docker stack up -c docker-compose.yml flink

Then I used Flink's web UI to upload the .jar file and run it with default 
settings.

   Nathan
