Hi Ravi, thanks for your reply.

I have two screenshots that show the nodes and the scheduler output:

Nodes: http://img42.com/gfrGH
Scheduler: http://img42.com/6i8x8

Maybe I'm doing something wrong in my code. The screenshots indicate that
the scheduler assigned the resources to my ApplicationMaster, but it does
not allocate the container. The only thing I'm doing in my code is calling
rmClient.allocate(0) and checking response.getAllocatedContainers() until
all containers are allocated.
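To make that concrete, here is a simplified sketch of the loop. The names
waitForContainers and numWorkers are just placeholders, and I'm assuming
rmClient is an AMRMClient<ContainerRequest> that has already been started
and registered, with one ContainerRequest per worker added via
addContainerRequest():

    import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
    import org.apache.hadoop.yarn.api.records.Container;
    import org.apache.hadoop.yarn.client.api.AMRMClient;
    import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

    // Block until the expected number of containers has been allocated.
    int waitForContainers(AMRMClient<ContainerRequest> rmClient,
                          int numWorkers) throws Exception {
        int allocated = 0;
        while (allocated < numWorkers) {
            // Heartbeat to the ResourceManager; the argument is the
            // progress indicator, which we don't use here.
            AllocateResponse response = rmClient.allocate(0);
            for (Container container : response.getAllocatedContainers()) {
                allocated++;
                // ... launch the worker in this container via NMClient ...
            }
            Thread.sleep(250); // avoid flooding the RM with heartbeats
        }
        return allocated;
    }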
But my code never receives the response for that last container.

Is there anything else I have to do in the ApplicationMaster? Are there
any log files you suggest looking into?

Best,
Robert

On Fri, Nov 21, 2014 at 12:03 AM, Ravi Prakash <[email protected]> wrote:

> Hi Robert!
>
> Do you see a NodeManager really with 46GB free in the web UI
> (http://resourcemanager:8088/cluster/nodes)? How big is your
> ApplicationMaster itself (because that too is launched in a container)?
> What happens if you ask for, say, 30GB containers?
>
> Cheers,
> Ravi
>
>
> On Thursday, November 20, 2014 3:48 AM, Robert Metzger <
> [email protected]> wrote:
>
> Hi,
>
> I'm a developer of Apache Flink (incubating). Flink has a YARN client
> that allows users to start Flink in their YARN cluster. The current
> implementation does not do fine-grained resource isolation; we just
> allocate our master and worker nodes within YARN and keep them running
> until the user stops the cluster again.
>
> The issue I'm facing is that I cannot allocate all containers in my
> YARN cluster.
>
> My cluster has 42 nodes with 46.09 GB each (according to the
> NodeManager). If my Flink client requests 41 + 1 (workers + master)
> containers with 46 GB each, I only get 41 containers (instead of 42).
> My ApplicationMaster sends allocate requests until all worker nodes
> have been started:
>
> AllocateResponse response = rmClient.allocate(0);
> for (Container container : response.getAllocatedContainers()) ...
>
> My code never gets the allocation response for the last container from
> YARN, even though one NodeManager in the cluster is free.
>
> Is there anything that I'm doing wrong here? How can I resolve the
> issue?
>
> Another question: some of our users reported issues with (worker)
> containers that keep running after the Flink YARN session has been
> stopped. The NodeManagers reported that no containers were running,
> but "jps" showed that the JVMs were still there.
> I tried to reproduce the issue on Amazon EMR, which was impossible,
> but on Google Compute Cloud (HD 2.4.1) I was seeing the issue as well.
> Is this a configuration / YARN version issue? Am I, as a YARN
> application, supposed to kill my own JVMs, or is YARN doing that for
> me?
>
> Cheers,
> Robert
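P.S. Regarding the cleanup question from my first mail above: this is the
shutdown sequence I would expect to be sufficient, as a simplified sketch.
Here nmClient is assumed to be a started NMClient, and launchedContainers
a list of the containers we launched; both names are placeholders:

    import java.util.List;
    import org.apache.hadoop.yarn.api.records.Container;
    import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
    import org.apache.hadoop.yarn.client.api.AMRMClient;
    import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
    import org.apache.hadoop.yarn.client.api.NMClient;

    // Stop every container we launched, then unregister the AM. Once the
    // AM has unregistered, the RM should reclaim any containers that are
    // still running.
    void stopSession(AMRMClient<ContainerRequest> rmClient,
                     NMClient nmClient,
                     List<Container> launchedContainers) throws Exception {
        for (Container c : launchedContainers) {
            nmClient.stopContainer(c.getId(), c.getNodeId());
        }
        rmClient.unregisterApplicationMaster(
                FinalApplicationStatus.SUCCEEDED, "Flink session stopped", "");
        nmClient.stop();
        rmClient.stop();
    }

If YARN is supposed to kill the container processes after the AM
unregisters, then the surviving JVMs would point to a NodeManager or
configuration issue rather than to our code.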
