[ https://issues.apache.org/jira/browse/YARN-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15344921#comment-15344921 ]

Sunil G commented on YARN-4862:
-------------------------------

I agree with the discussion here.

Basically, it's better that we always do a sanity check like the one in 
YARN-5197. It helps minimize the risk of leaks and keeps NM and RM in sync much 
better. I also do not see this as a performance bottleneck, since per heartbeat 
we only operate on a node's small set of running vs. finished containers.
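A minimal sketch of such a per-heartbeat sanity check, written against the duplicate-completed-container problem this issue describes. `ReportedStatus` and `CompletedContainerDeduplicator` are hypothetical, simplified stand-ins, not the real `RMNodeImpl`, `ContainerStatus`, or `UpdatedContainerInfo` code; the sketch only shows the idea of forwarding a completed container the first time it is reported and dropping any repeat.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-in for an NM-reported container status;
// not the real org.apache.hadoop.yarn.api.records.ContainerStatus.
record ReportedStatus(String containerId, boolean completed) {}

// Hypothetical per-node filter: a completed container produces update info
// only the first time it is reported, so duplicates cannot pile up on the heap.
class CompletedContainerDeduplicator {
  // IDs of completed containers this node has already passed on.
  private final Set<String> alreadyCompleted = new HashSet<>();

  /** Returns only the statuses from one heartbeat that need new update info. */
  List<ReportedStatus> filter(List<ReportedStatus> heartbeat) {
    List<ReportedStatus> fresh = new ArrayList<>();
    for (ReportedStatus s : heartbeat) {
      if (!s.completed()) {
        fresh.add(s); // running container: forwarded as-is
      } else if (alreadyCompleted.add(s.containerId())) {
        fresh.add(s); // first completion report for this container
      }
      // otherwise: duplicate completion report, dropped
    }
    return fresh;
  }
}
```

The cost per heartbeat is one set lookup per reported container, which fits the point above that only a node's small running/finished set is touched. In this sketch `alreadyCompleted` grows without bound; a real implementation would need to prune it, for example once the NM stops reporting the container.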

Regarding YARN-5279: interestingly, preemption uses the KILL_CONTAINER event to 
kill a container forcefully from the RM. Even though the preemption module 
informs the AM that a container is to be preempted, for AMs that don't handle 
these preemption messages the RM is forced to kill the container with 
KILL_CONTAINER.
So I think we need not inform the attempt immediately on KILL_CONTAINER. 
Instead, we can add the container to RMNodeImpl's {{containersToCleanUp}} list 
and wait for the NM to report it back in the completed container list. This will 
slow down cleanup when we preempt an AM container, but may be cleaner. Will this 
be fine for the preemption scenario? Thoughts?
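A minimal sketch of the proposed flow, assuming a simplified per-node model. `NodeCleanupModel` and its methods are hypothetical names for illustration, not YARN's API; only the idea of a `containersToCleanUp` list comes from the comment above. A kill only queues the container, and the attempt is informed once the NM reports the container as completed.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the proposal; names are illustrative, not YARN's API.
// KILL_CONTAINER only queues the container for cleanup on its node, and the
// attempt is informed once the NM reports the container as completed.
class NodeCleanupModel {
  // Analogous to RMNodeImpl's containersToCleanUp: kills still awaiting the NM.
  private final Set<String> containersToCleanUp = new LinkedHashSet<>();
  // Completed containers that have been passed on to the application attempt.
  private final List<String> notifiedToAttempt = new ArrayList<>();

  /** Kill request from the RM (e.g. preemption): queue it, do not notify yet. */
  void onKillContainer(String containerId) {
    containersToCleanUp.add(containerId);
  }

  /**
   * Containers to ask the NM to kill in the next heartbeat response. Entries
   * stay queued until the NM confirms completion, so a lost response is retried.
   */
  List<String> pendingKillsForHeartbeat() {
    return new ArrayList<>(containersToCleanUp);
  }

  /** NM heartbeat reports completed containers: only now inform the attempt. */
  void onNodeReportedCompleted(List<String> completedIds) {
    for (String id : completedIds) {
      containersToCleanUp.remove(id);
      notifiedToAttempt.add(id);
    }
  }

  List<String> notifiedToAttempt() {
    return notifiedToAttempt;
  }
}
```

Keeping an entry queued until the NM confirms it (rather than clearing the list after each heartbeat response) is a design choice of this sketch. It also shows the cost mentioned above: between `onKillContainer` and the NM's report, the attempt, including one whose AM container was preempted, has not yet been told the container is gone.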

> Handle duplicate completed containers in RMNodeImpl
> ---------------------------------------------------
>
>                 Key: YARN-4862
>                 URL: https://issues.apache.org/jira/browse/YARN-4862
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>         Attachments: 0001-YARN-4862.patch, 0002-YARN-4862.patch
>
>
> As per 
> [comment|https://issues.apache.org/jira/browse/YARN-4852?focusedCommentId=15209689&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15209689]
> from [~sharadag], there should be a safeguard against duplicated container 
> statuses in RMNodeImpl before creating UpdatedContainerInfo. 
> Otherwise, in a heavily loaded cluster where event processing gradually slows 
> down, if any duplicated containers are sent to the RM (possibly due to a bug in 
> the NM), there is a significant impact: RMNodeImpl creates an 
> UpdatedContainerInfo for every duplicated container. This increases heap 
> memory usage and causes problems like YARN-4852.
> This is an optimization for issues of the YARN-4852 kind.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
