RE: Leak in RM Capacity scheduler leading to OOM

Rohith Sharma K S Wed, 23 Mar 2016 19:14:25 -0700

I think you might be hitting YARN-2997. That issue fixes the sending of 
duplicate completed containers to the RM.


Thanks & Regards
Rohith Sharma K S

-----Original Message-----
From: Sharad Agarwal [mailto:[email protected]] 
Sent: 24 March 2016 08:58
To: Sharad Agarwal
Cc: [email protected]; [email protected]
Subject: Re: Leak in RM Capacity scheduler leading to OOM

Ticket for this is here ->
https://issues.apache.org/jira/browse/YARN-4852

On Wed, Mar 23, 2016 at 5:50 PM, Sharad Agarwal <[email protected]> wrote:

> A dump of the 8 GB heap shows about 18 million 
> org.apache.hadoop.yarn.proto.YarnProtos$ApplicationIdProto objects.
>
> The counts are similar for ApplicationAttempt and ContainerId. All 
> seem to be linked via 
> org.apache.hadoop.yarn.proto.YarnProtos$ContainerStatusProto, the 
> count of which is also about 18 million.
>
> On further debugging, looking at the CapacityScheduler code:
>
> It seems to add duplicate entries of UpdatedContainerInfo objects for 
> the completed containers. In the same dump I see about 0.5 million 
> UpdatedContainerInfo objects.
>
> This issue only surfaces if the scheduler thread is not able to drain 
> the UpdatedContainerInfo objects fast enough, which happens only in a 
> big cluster.
>
> Has anyone noticed the same? We are running Hadoop 2.6.0.
>
> Sharad
>
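
The growth pattern described in the quoted message can be illustrated with a minimal sketch. This is not the CapacityScheduler or RMNodeImpl code; the class HeartbeatQueueSketch, its Update type, and every method name below are hypothetical and exist only for illustration. It assumes a node re-reports the same completed containers on every heartbeat while the consumer never drains the queue, and it compares a queue that enqueues everything with one that skips container IDs it has already queued.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

public class HeartbeatQueueSketch {
    /** Hypothetical stand-in for one heartbeat's batch of completed containers. */
    static final class Update {
        final List<String> completedContainerIds;
        Update(List<String> ids) { this.completedContainerIds = new ArrayList<>(ids); }
    }

    private final Queue<Update> pending = new ArrayDeque<>();
    // IDs already queued once; only consulted when dedupe is enabled.
    // A real system would also need to evict IDs eventually; omitted here.
    private final Set<String> alreadyQueued = new HashSet<>();
    private final boolean dedupe;

    HeartbeatQueueSketch(boolean dedupe) { this.dedupe = dedupe; }

    /** Called once per heartbeat with whatever the node reports as completed. */
    void onHeartbeat(List<String> reportedCompleted) {
        List<String> fresh = new ArrayList<>();
        for (String id : reportedCompleted) {
            // Without dedupe every report is kept, including repeats.
            if (!dedupe || alreadyQueued.add(id)) {
                fresh.add(id);
            }
        }
        if (!fresh.isEmpty()) {
            pending.add(new Update(fresh));
        }
    }

    /** Number of container statuses currently retained in the queue. */
    int pendingStatuses() {
        int n = 0;
        for (Update u : pending) {
            n += u.completedContainerIds.size();
        }
        return n;
    }

    /** Consumer side: empties the queue and returns how many statuses it held. */
    int drain() {
        int n = 0;
        Update u;
        while ((u = pending.poll()) != null) {
            n += u.completedContainerIds.size();
        }
        return n;
    }

    public static void main(String[] args) {
        List<String> completed = Arrays.asList("container_1", "container_2");
        HeartbeatQueueSketch naive = new HeartbeatQueueSketch(false);
        HeartbeatQueueSketch deduped = new HeartbeatQueueSketch(true);
        // The consumer never drains during these 1000 heartbeats.
        for (int beat = 0; beat < 1000; beat++) {
            naive.onHeartbeat(completed);
            deduped.onHeartbeat(completed);
        }
        System.out.println("naive retained:   " + naive.pendingStatuses());
        System.out.println("deduped retained: " + deduped.pendingStatuses());
    }
}
```

In this toy model the retained count of the naive queue grows with the number of heartbeats, while the deduplicating queue stays bounded by the number of distinct containers. Whether the real fix deduplicates on the RM side or stops the resend at the source is not something this sketch claims to show.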
