Hi Andrey,
Weird that I didn't see your reply in my email inbox. My colleague happened
to see it in the Apache archive :)
Nope, we didn't experience it with 1.4 (the previous version).
Yes, we do use an HA setup:
high-availability: zookeeper
high-availability.zookeeper.quorum: ...
Hi Steven,
Did you not experience this problem with the previous Flink release (you
marked the topic with 1.7)?
Do you use an HA setup?
Without an HA setup, the blob data that belongs to the job is
distributed from the job master node to all task executors.
Depending on the size of the blob data (jars,
When we started a high-parallelism (1,600) job without any
checkpoint/savepoint, the job struggled to deploy. After a few
restarts, it eventually got deployed and ran fine after the initial
struggle. The jobmanager was very busy and the web UI was very slow. I saw
these two exceptions/failures