I'm running a YARN cluster of eight 4-core instances (32 cores in total),
configured with 3 slots per TM. The cluster is dedicated to a single job
that runs at full capacity in "FLIP6" mode, so the job's parallelism is
21 (7 TMs * 3 slots, with one container reserved for the Job Manager).

When I run the job on 1.6.0, seven Task Managers are spun up as expected.
But when I run it on 1.6.2, only four Task Managers spin up and the job
hangs waiting for more resources.
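
To make the slot arithmetic explicit, here is a small Python sketch. The
function name slot_budget is only for illustration, and it assumes each of
the four TMs on 1.6.2 still offers 3 slots:

```python
def slot_budget(containers, slots_per_tm, jm_containers=1):
    """Return (task_managers, total_slots) for a dedicated cluster.

    Assumes one YARN container per instance and that every container
    not used by the Job Manager hosts exactly one TM.
    """
    task_managers = containers - jm_containers
    return task_managers, task_managers * slots_per_tm


# Expected on 1.6.0: 8 containers, 1 for the Job Manager, 3 slots per TM.
expected_tms, expected_slots = slot_budget(containers=8, slots_per_tm=3)

# Observed on 1.6.2: only 4 TMs come up (assumed to have 3 slots each).
observed_slots = 4 * 3

# Slots the job is still waiting for at parallelism 21.
missing = expected_slots - observed_slots

print(expected_tms, expected_slots, observed_slots, missing)
```

Under those assumptions, four TMs leave the job 9 slots short of the 21 it
needs, which would match the hang I'm seeing.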

Our Flink distribution is built from source and set up by a script, so
aside from the Flink jars, the 1.6.0 and 1.6.2 directories are identical.
The job is the same, restarted from a savepoint. The problem is repeatable.

Has something changed in 1.6.2, and if so can it be remedied with a config
change?
