I think you might be hitting YARN-2997. That issue fixes the sending of duplicate completed containers to the RM.
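
For illustration, here is a minimal Java sketch of the pattern described in the report quoted below: completed-container updates are re-sent on every heartbeat, and the pending queue grows when the consumer cannot drain it fast enough. The class and method names (CompletedContainerQueue, onHeartbeat, drain) are hypothetical. This is not the actual YARN code or the YARN-2997 patch, only an assumption about how such duplicates could be filtered before they are queued.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

// Hypothetical illustration only; these are not the real YARN classes.
public class CompletedContainerQueue {
    // Updates waiting for the (simulated) scheduler thread to drain them.
    private final Queue<String> pending = new ArrayDeque<>();
    // Container IDs already queued once. A real implementation would have to
    // expire entries (for example once they are acknowledged), otherwise this
    // set itself grows without bound.
    private final Set<String> seen = new HashSet<>();

    // Called once per simulated heartbeat with the IDs reported as completed.
    public synchronized void onHeartbeat(List<String> completedIds) {
        for (String id : completedIds) {
            // Set.add returns false for an ID that was already reported,
            // so a repeated report is not queued a second time.
            if (seen.add(id)) {
                pending.add(id);
            }
        }
    }

    // Simulates the scheduler thread draining the queue; returns the count drained.
    public synchronized int drain() {
        int drained = pending.size();
        pending.clear();
        return drained;
    }

    public synchronized int pendingSize() {
        return pending.size();
    }

    public static void main(String[] args) {
        CompletedContainerQueue queue = new CompletedContainerQueue();
        // The same two completed containers are reported on 1000 heartbeats.
        for (int i = 0; i < 1000; i++) {
            queue.onHeartbeat(Arrays.asList("container_1", "container_2"));
        }
        // Prints 2: without the 'seen' check the queue would hold 2000 entries.
        System.out.println(queue.pendingSize());
    }
}
```

Without the duplicate check, each heartbeat would add the same entries again, which matches the growth seen in the heap dump when draining falls behind.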

Thanks & Regards
Rohith Sharma K S

-----Original Message-----
From: Sharad Agarwal [mailto:sha...@apache.org]
Sent: 24 March 2016 08:58
To: Sharad Agarwal
Cc: yarn-...@hadoop.apache.org; user@hadoop.apache.org
Subject: Re: Leak in RM Capacity scheduler leading to OOM

Ticket for this is here -> https://issues.apache.org/jira/browse/YARN-4852

On Wed, Mar 23, 2016 at 5:50 PM, Sharad Agarwal <sha...@apache.org> wrote:
> Taking a dump of the 8 GB heap shows about 18 million
> org.apache.hadoop.yarn.proto.YarnProtos$ApplicationIdProto objects.
>
> Similar counts are there for ApplicationAttempt and ContainerId. All
> seem to be linked via
> org.apache.hadoop.yarn.proto.YarnProtos$ContainerStatusProto, the
> count of which is also about 18 million.
>
> On further debugging, looking at the CapacityScheduler code:
>
> It seems to add duplicate entries of UpdatedContainerInfo objects for
> the completed containers. The same dump shows about 0.5 million
> UpdatedContainerInfo objects.
>
> This issue only surfaces if the scheduler thread is not able to drain
> the UpdatedContainerInfo objects fast enough, which happens only in a
> big cluster.
>
> Has anyone noticed the same? We are running Hadoop 2.6.0.
>
> Sharad