Hi all,

I have a job with a large amount of broadcast state (62 MB).

I took a savepoint when my workflow was running with parallelism 300.

I then restarted the workflow with parallelism 400.

The first 297 sub-tasks restored their broadcast state fairly quickly, but 
after that it slowed to a crawl (roughly 2 sub-tasks finishing per minute).

After 10 minutes we killed the job, so I don’t know if it would have ultimately 
succeeded.

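Is this expected? It seems like it could lead to a bad situation: at roughly 
2 sub-tasks per minute, the remaining 103 sub-tasks would need about 50 more 
minutes, so restarting the workflow would take close to an hour.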

Thanks,

— Ken

--------------------------
Ken Krugler
http://www.scaleunlimited.com
Custom big data solutions
Flink, Pinot, Solr, Elasticsearch
