[ 
https://issues.apache.org/jira/browse/YARN-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-4862:
------------------------------------
    Attachment: YARN-4862-004.patch

Updating the patch to handle a completed-container leak. The scenario: whenever the RM 
does not track a container, RMNodeImpl still adds its containerId to the 
completedContainer list. Since the container is not tracked by the RM, the RM simply 
ignores it, so the entry is never removed. This causes a leak in completedContainer. 

I have updated the patch to fix the leak by triggering an event to RMNodeImpl. This 
is basically the same issue as YARN-5279, but I would prefer to fix it in this JIRA 
itself rather than committing it separately.
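
The leak and the removal event described above can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not the code in the patch: CompletedContainerTracker, onContainerCompleted and onContainersRemovable are hypothetical names, and container IDs are modelled as plain strings rather than ContainerId objects.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the node-side bookkeeping; NOT the real RMNodeImpl.
class CompletedContainerTracker {
    // IDs of containers that the node has reported as completed.
    private final Set<String> completedContainers = new HashSet<>();

    // Called when a node heartbeat reports a completed container.
    void onContainerCompleted(String containerId) {
        completedContainers.add(containerId);
    }

    // Called when the RM signals that these IDs need not be kept any longer,
    // either because the AM pulled them or because the RM never tracked them.
    void onContainersRemovable(Iterable<String> containerIds) {
        for (String id : containerIds) {
            completedContainers.remove(id);
        }
    }

    int size() {
        return completedContainers.size();
    }
}

class LeakDemo {
    public static void main(String[] args) {
        Set<String> trackedByRm = new HashSet<>(); // the RM tracks nothing here
        CompletedContainerTracker node = new CompletedContainerTracker();

        node.onContainerCompleted("container_1");

        // Without this branch the entry would stay in the set indefinitely,
        // because nothing else ever asks the node to drop it.
        if (!trackedByRm.contains("container_1")) {
            node.onContainersRemovable(Collections.singletonList("container_1"));
        }
        System.out.println("entries still held: " + node.size());
    }
}
```

The point of the sketch is that the same removal path can serve both cases, which is the argument for reusing one event type.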

The latest attached patch also combines the patch from YARN-5279. With 
respect to the review comments on YARN-5279, I have not created a different event 
class and name as suggested. Instead, I have reused the existing event type 
FINISHED_CONTAINERS_PULLED_BY_AM and its class 
RMNodeFinishedContainersPulledByAMEvent, because both events are the same to 
RMNodeImpl. Maybe I can rename the existing event type 
FINISHED_CONTAINERS_PULLED_BY_AM to CONTAINERS_TO_BE_REMOVED_FROM_NM. Thoughts?

> Handle duplicate completed containers in RMNodeImpl
> ---------------------------------------------------
>
>                 Key: YARN-4862
>                 URL: https://issues.apache.org/jira/browse/YARN-4862
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>         Attachments: 0001-YARN-4862.patch, 0002-YARN-4862.patch, 
> 0003-YARN-4862.patch, YARN-4862-004.patch
>
>
> As per 
> [comment|https://issues.apache.org/jira/browse/YARN-4852?focusedCommentId=15209689&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15209689]
>  from [~sharadag], there should be a safeguard against duplicated container statuses 
> in RMNodeImpl before creating UpdatedContainerInfo. 
> Otherwise, in a heavily loaded cluster where event processing gradually slows down, 
> if any duplicated containers are sent to the RM (possibly due to a bug in the NM), there is 
> a significant impact because RMNodeImpl always creates UpdatedContainerInfo for 
> duplicated containers. This results in increased heap memory usage and causes 
> problems like YARN-4852.
> This is an optimization for issues like YARN-4852.
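
One way to picture the safeguard requested in the description is a set-based filter that is applied before any update object is built. The sketch below is an assumption-laden illustration, not the actual patch: DuplicateStatusFilter and filterNew are hypothetical names, and container statuses are reduced to string IDs.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; the real check would live inside RMNodeImpl.
class DuplicateStatusFilter {
    // IDs of completed containers that have already been processed once.
    private final Set<String> seenCompleted = new HashSet<>();

    // Returns only the IDs that were not reported before, in report order.
    // An update object would then be created for these IDs only.
    List<String> filterNew(List<String> reportedCompleted) {
        List<String> fresh = new ArrayList<>();
        for (String id : reportedCompleted) {
            // Set.add returns false when the ID is already present.
            if (seenCompleted.add(id)) {
                fresh.add(id);
            }
        }
        return fresh;
    }
}
```

With such a filter, a heartbeat that repeats an already seen container yields an empty list, so no extra update object is allocated for it. Note that the set itself must be pruned at some point, otherwise it becomes the kind of leak discussed in the comment above.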



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
