Hi, I'm a developer of Apache Flink (incubating). Flink has a YARN client that allows users to start Flink in their YARN cluster. The current implementation does not do fine-grained resource isolation: we just allocate our master and worker containers in YARN and keep them running until the user stops the cluster again.
The issue I'm facing is that I cannot allocate all containers in my YARN cluster. My cluster has 42 nodes with 46.09 GB each (according to the NodeManager). If my Flink client requests 41 + 1 (workers + master) containers with 46 GB each, I'm only getting 41 containers (instead of 42).

My ApplicationMaster keeps sending allocate requests until all worker nodes have been started (a simplified sketch of the full loop is at the end of this mail):

    AllocateResponse response = rmClient.allocate(0);
    for (Container container : response.getAllocatedContainers()) {
        ...
    }

My code never gets the allocation response for the last container from YARN, even though one NodeManager in the cluster is still free. Is there anything I'm doing wrong here? How can I resolve the issue?

Another question: some of our users reported issues with (worker) containers that keep running after the Flink YARN session has been stopped. The NodeManagers reported that no containers were running, but "jps" showed that the JVMs were still there. I could not reproduce the issue on Amazon EMR, but on Google Compute Cloud (HD 2.4.1) I did see it. Is this a configuration / YARN version issue? Am I, as a YARN application, supposed to kill my own JVMs, or is YARN doing that for me?

Cheers,
Robert
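
For reference, here is a simplified sketch of how the ApplicationMaster drives the allocation, not our exact code; the worker count, memory size, sleep interval and class name are illustrative:

    import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
    import org.apache.hadoop.yarn.api.records.Container;
    import org.apache.hadoop.yarn.api.records.Priority;
    import org.apache.hadoop.yarn.api.records.Resource;
    import org.apache.hadoop.yarn.client.api.AMRMClient;
    import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    public class AllocationLoopSketch {
        public static void main(String[] args) throws Exception {
            YarnConfiguration conf = new YarnConfiguration();

            // Register this ApplicationMaster with the ResourceManager.
            AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
            rmClient.init(conf);
            rmClient.start();
            rmClient.registerApplicationMaster("", 0, "");

            int numWorkers = 41;                                       // illustrative: 41 workers + this AM
            Resource capability = Resource.newInstance(46 * 1024, 1);  // ~46 GB, 1 vcore (illustrative)
            Priority priority = Priority.newInstance(0);

            // One container request per worker; no locality constraints.
            for (int i = 0; i < numWorkers; i++) {
                rmClient.addContainerRequest(new ContainerRequest(capability, null, null, priority));
            }

            // Poll the ResourceManager until all workers have been allocated.
            int allocated = 0;
            while (allocated < numWorkers) {
                AllocateResponse response = rmClient.allocate(0);
                for (Container container : response.getAllocatedContainers()) {
                    allocated++;
                    // ... launch the Flink worker in this container via NMClient ...
                }
                Thread.sleep(100); // avoid hammering the RM between heartbeats
            }
        }
    }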

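Regarding the second question: to make it concrete, this is roughly what explicit cleanup on our side would look like if the application is expected to stop its own containers. This is a sketch only; the method name and the passed-in clients/containers are illustrative, not our current code:

    import org.apache.hadoop.yarn.api.records.Container;
    import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
    import org.apache.hadoop.yarn.client.api.AMRMClient;
    import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
    import org.apache.hadoop.yarn.client.api.NMClient;

    public class ShutdownSketch {

        // Stop every worker container we launched, then deregister the AM.
        static void shutdownCluster(AMRMClient<ContainerRequest> rmClient,
                                    NMClient nmClient,
                                    Iterable<Container> launchedContainers) throws Exception {
            for (Container container : launchedContainers) {
                // Ask the NodeManager to kill the container process (the worker JVM).
                nmClient.stopContainer(container.getId(), container.getNodeId());
                // Tell the ResourceManager we no longer need this container.
                rmClient.releaseAssignedContainer(container.getId());
            }
            // Deregistering marks the application as finished; YARN should then
            // clean up anything that is still running on its behalf.
            rmClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "", "");
        }
    }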