[ 
https://issues.apache.org/jira/browse/YARN-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15342008#comment-15342008
 ] 

Jason Lowe commented on YARN-4862:
----------------------------------

Agree that the RM needs to inform the NM to stop tracking a container if the RM 
receives an update for a container it doesn't know about.  As a sanity check, we 
could put in some logic similar to what was done in YARN-5197.  Then the RM can 
detect when the NM has stopped reporting a completed container so we can also 
remove it from the completedContainers tracking set.  That should prevent any 
leak from occurring in the new set even if the RM somehow fails to send the 
container removal event to the RMNodeImpl.
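
To make the hardening idea concrete, here is a minimal sketch rather than the 
actual RMNodeImpl code. It assumes each heartbeat carries the full list of 
completed containers the NM is still reporting, uses String in place of 
ContainerId, and the class and method names (CompletedContainerTracker, 
onHeartbeat, trackedCount) are hypothetical. Whether YARN-5197 is structured 
this way is not something this sketch claims.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CompletedContainerTracker {
    // IDs of completed containers already seen; String stands in for ContainerId.
    private final Set<String> completedContainers = new HashSet<>();

    /**
     * Processes one heartbeat (hypothetical method).
     * Returns the IDs that were reported as completed for the first time.
     */
    public List<String> onHeartbeat(List<String> reportedCompleted) {
        List<String> newlyCompleted = new ArrayList<>();
        for (String id : reportedCompleted) {
            // add() returns true only if the ID was not already tracked.
            if (completedContainers.add(id)) {
                newlyCompleted.add(id);
            }
        }
        // Sanity check: forget any ID the NM has stopped reporting, so the
        // set cannot grow without bound even if an explicit removal is missed.
        completedContainers.retainAll(new HashSet<>(reportedCompleted));
        return newlyCompleted;
    }

    /** Number of IDs currently tracked (exposed for illustration only). */
    public int trackedCount() {
        return completedContainers.size();
    }
}
```

With this shape, the tracking set is bounded by the size of the most recent 
heartbeat, independent of whether the removal event ever reaches the node.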

Besides the suggestion above to harden against leaks, another minor comment on 
the patch:
{code}
  if (!completedContainers.contains(containerId)) {
    completedContainers.add(containerId);
    newlyCompletedContainers.add(remoteContainer);
  }
{code}
The above should be simplified to the following to avoid the double-lookup on 
the completedContainers set:
{code}
  if (completedContainers.add(containerId)) {
    newlyCompletedContainers.add(remoteContainer);
  }
{code}
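
The simplification relies on java.util.Set#add returning true only when the 
element was not already present. A small standalone sketch showing that the 
two forms produce the same result; the class and method names here are 
hypothetical and String stands in for ContainerId:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AddReturnValueDemo {
    // Original form: contains() then add(), i.e. two lookups for each new ID.
    static List<String> dedupWithContains(List<String> ids) {
        Set<String> seen = new HashSet<>();
        List<String> fresh = new ArrayList<>();
        for (String id : ids) {
            if (!seen.contains(id)) {
                seen.add(id);
                fresh.add(id);
            }
        }
        return fresh;
    }

    // Simplified form: add() both inserts and reports whether the ID was new.
    static List<String> dedupWithAdd(List<String> ids) {
        Set<String> seen = new HashSet<>();
        List<String> fresh = new ArrayList<>();
        for (String id : ids) {
            if (seen.add(id)) {
                fresh.add(id);
            }
        }
        return fresh;
    }
}
```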


> Handle duplicate completed containers in RMNodeImpl
> ---------------------------------------------------
>
>                 Key: YARN-4862
>                 URL: https://issues.apache.org/jira/browse/YARN-4862
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>         Attachments: 0001-YARN-4862.patch, 0002-YARN-4862.patch
>
>
> As per 
> [comment|https://issues.apache.org/jira/browse/YARN-4852?focusedCommentId=15209689&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15209689]
>  from [~sharadag], there should be a safeguard against duplicate container 
> statuses in RMNodeImpl before creating UpdatedContainerInfo. 
> Otherwise, in a heavily loaded cluster where event processing gradually slows 
> down, if any duplicate containers are sent to the RM (possibly due to a bug in 
> the NM), RMNodeImpl always creates an UpdatedContainerInfo for the duplicated 
> containers, which has a significant impact. This increases heap memory usage 
> and causes problems like YARN-4852.
> This is an optimization for issues of the same kind as YARN-4852.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
