Hi, I'm a developer of Apache Flink (incubating). Flink has a YARN client that allows users to start Flink in their YARN cluster. The current implementation does not do fine-grained resource isolation: we just allocate our master and worker containers in YARN and keep them running until the user stops the cluster again.
The issue I'm facing is that I cannot allocate all containers in my YARN cluster. My cluster has 42 nodes with 46.09 GB each (according to the NodeManager). If my Flink client requests 41 + 1 (workers + master) containers with 46 GB each, I'm only getting 41 containers (instead of 42).

My ApplicationMaster keeps sending allocate requests until all worker nodes have been started (a simplified sketch of the full loop is at the end of this mail):

    AllocateResponse response = rmClient.allocate(0);
    for (Container container : response.getAllocatedContainers()) {
        ...
    }

My code never gets the allocation response for the last container from YARN, even though one NodeManager in the cluster is still free. Is there anything I'm doing wrong here? How can I resolve the issue?

Another question: some of our users reported issues with (worker) containers that keep running after the Flink YARN session has been stopped. The NodeManagers reported that no containers were running, but "jps" showed that the JVMs were still there. I could not reproduce the issue on Amazon EMR, but on Google Compute Cloud (HD 2.4.1) I did see it. Is this a configuration / YARN version issue? Am I, as a YARN application, supposed to kill my own JVMs, or is YARN doing that for me?

Cheers,
Robert
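
For reference, here is a simplified sketch of how the ApplicationMaster drives the allocation, not our exact code; the worker count, memory size, sleep interval and class name are illustrative:

    import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
    import org.apache.hadoop.yarn.api.records.Container;
    import org.apache.hadoop.yarn.api.records.Priority;
    import org.apache.hadoop.yarn.api.records.Resource;
    import org.apache.hadoop.yarn.client.api.AMRMClient;
    import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    public class AllocationLoopSketch {
        public static void main(String[] args) throws Exception {
            YarnConfiguration conf = new YarnConfiguration();

            // Register this ApplicationMaster with the ResourceManager.
            AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
            rmClient.init(conf);
            rmClient.start();
            rmClient.registerApplicationMaster("", 0, "");

            int numWorkers = 41;                                       // illustrative: 41 workers + this AM
            Resource capability = Resource.newInstance(46 * 1024, 1);  // ~46 GB, 1 vcore (illustrative)
            Priority priority = Priority.newInstance(0);

            // One container request per worker; no locality constraints.
            for (int i = 0; i < numWorkers; i++) {
                rmClient.addContainerRequest(new ContainerRequest(capability, null, null, priority));
            }

            // Poll the ResourceManager until all workers have been allocated.
            int allocated = 0;
            while (allocated < numWorkers) {
                AllocateResponse response = rmClient.allocate(0);
                for (Container container : response.getAllocatedContainers()) {
                    allocated++;
                    // ... launch the Flink worker in this container via NMClient ...
                }
                Thread.sleep(100); // avoid hammering the RM between heartbeats
            }
        }
    }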

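Regarding the second question: to make it concrete, this is roughly what explicit cleanup on our side would look like if the application is expected to stop its own containers. This is a sketch only; the method name and the passed-in clients/containers are illustrative, not our current code:

    import org.apache.hadoop.yarn.api.records.Container;
    import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
    import org.apache.hadoop.yarn.client.api.AMRMClient;
    import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
    import org.apache.hadoop.yarn.client.api.NMClient;

    public class ShutdownSketch {

        // Stop every worker container we launched, then deregister the AM.
        static void shutdownCluster(AMRMClient<ContainerRequest> rmClient,
                                    NMClient nmClient,
                                    Iterable<Container> launchedContainers) throws Exception {
            for (Container container : launchedContainers) {
                // Ask the NodeManager to kill the container process (the worker JVM).
                nmClient.stopContainer(container.getId(), container.getNodeId());
                // Tell the ResourceManager we no longer need this container.
                rmClient.releaseAssignedContainer(container.getId());
            }
            // Deregistering marks the application as finished; YARN should then
            // clean up anything that is still running on its behalf.
            rmClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "", "");
        }
    }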