[
https://issues.apache.org/jira/browse/YARN-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15344921#comment-15344921
]
Sunil G commented on YARN-4862:
-------------------------------
I agree with the discussion here.
Basically, it is better to always do a sanity check like the one in
YARN-5197. It helps minimize the risk of leaks and keeps NM and RM in sync
much more reliably. I also do not see this as a performance bottleneck, since
we operate on only a small set of running vs. finished containers for a node
per heartbeat.
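As a rough illustration of why such a per-heartbeat check stays cheap, here is
a minimal sketch of a duplicate filter for completed containers. It is not the
actual RMNodeImpl or YARN-5197 code: the class and method names are
hypothetical, and container IDs are plain strings for brevity.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for per-node state; not the real RMNodeImpl. */
class CompletedContainerGuard {
    // Completed container IDs already seen from this node and not yet acknowledged.
    private final Set<String> seenCompleted = new HashSet<>();

    /** Returns only the completed container IDs not seen in an earlier heartbeat. */
    List<String> filterNewlyCompleted(List<String> reportedCompleted) {
        List<String> fresh = new ArrayList<>();
        for (String id : reportedCompleted) {
            // Set.add() returns false when the ID is already present (a duplicate).
            if (seenCompleted.add(id)) {
                fresh.add(id);
            }
        }
        return fresh;
    }

    /** Forgets a container once its cleanup is acknowledged, keeping the set small. */
    void acknowledge(String id) {
        seenCompleted.remove(id);
    }
}
```

Each heartbeat costs one hash lookup per reported container, and only the
returned IDs would go on to produce an update for the scheduler.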
Regarding YARN-5279: interestingly, preemption was trying to use the
KILL_CONTAINER event to forcefully kill a container from the RM. Even though
the preemption module informs the AM that a container is to be preempted, for
AMs that do not handle these preemption messages the RM is forced to kill the
container with KILL_CONTAINER.
So I think we need not inform the attempt immediately on KILL_CONTAINER.
Instead, we can add the container to RMNodeImpl's {{containersToCleanUp}} list
and wait for the NM to report it back in the completed container list. This
will slow down the cleanup if we preempt an AM container, but it may be
cleaner. Will this be fine for the preemption scenario? Thoughts?
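
To make the proposal concrete, here is a minimal sketch of the deferred flow,
assuming a simplified node model. Only the {{containersToCleanUp}} name comes
from RMNodeImpl; the class, the methods, and the string container IDs are
hypothetical, and the sketch tracks killed containers only.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of the proposed flow; not the real RMNodeImpl. */
class DeferredKillNode {
    // Containers the RM wants the NM to clean up.
    private final Set<String> containersToCleanUp = new HashSet<>();
    // Containers whose completion has been passed on to the attempt.
    private final List<String> notifiedToAttempt = new ArrayList<>();

    /** On KILL_CONTAINER: only queue the container, do not notify the attempt yet. */
    void killContainer(String id) {
        containersToCleanUp.add(id);
    }

    /**
     * On a node heartbeat: notify the attempt for queued containers that the NM
     * now reports as completed, then return what is still pending cleanup.
     */
    List<String> onHeartbeat(List<String> completedReportedByNm) {
        for (String id : completedReportedByNm) {
            if (containersToCleanUp.remove(id)) {
                notifiedToAttempt.add(id);
            }
        }
        return new ArrayList<>(containersToCleanUp);
    }

    List<String> notifiedToAttempt() {
        return new ArrayList<>(notifiedToAttempt);
    }
}
```

In this model the attempt learns about the kill at least one heartbeat later
than it does today, which is the slowdown mentioned above for a preempted AM
container.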
> Handle duplicate completed containers in RMNodeImpl
> ---------------------------------------------------
>
> Key: YARN-4862
> URL: https://issues.apache.org/jira/browse/YARN-4862
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Reporter: Rohith Sharma K S
> Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4862.patch, 0002-YARN-4862.patch
>
>
> As per
> [comment|https://issues.apache.org/jira/browse/YARN-4852?focusedCommentId=15209689&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15209689]
> from [~sharadag], there should be a safeguard against duplicated container
> statuses in RMNodeImpl before creating UpdatedContainerInfo.
> Otherwise, in a heavily loaded cluster where event processing gradually
> slows down, if any duplicated containers are sent to the RM (possibly due to
> a bug in the NM), there is a significant impact because RMNodeImpl always
> creates UpdatedContainerInfo for the duplicated containers. This increases
> heap memory usage and causes problems like YARN-4852.
> This is an optimization for issues like YARN-4852.