[
https://issues.apache.org/jira/browse/YARN-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rohith Sharma K S updated YARN-4862:
------------------------------------
Attachment: YARN-4862-006.patch
bq. I'm also curious why a sleep was added instead of something like a
drainEvents call.
I used drainEvents in the attached patch.
I retained sending 2 container statuses in the node heartbeat for verification.
There are 2 scenarios that can occur in reality when the NM reports container
statuses to the RM.
# The application that the container statuses belong to is tracked by the RM.
Here, RMNodeImpl triggers an event to the scheduler for the completed containers.
# The application that the container statuses belong to is *NOT* tracked by the
RM. Here, RMNodeImpl triggers an event to the scheduler with only one container
as completed. All remaining containers that belong to this application are
skipped.
In earlier patches, the test case sent container statuses matching the 2nd
scenario. In the latest patch, I have modified the test code to cover the 1st
scenario.
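To make the two scenarios concrete, here is a minimal, self-contained sketch of the behaviour described above. It is not the RMNodeImpl code: the class HeartbeatScenarioSketch, the Status type and the completedForScheduler method are hypothetical names, and container and application IDs are plain strings.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration only; this is not the actual RMNodeImpl logic. */
class HeartbeatScenarioSketch {

    /** Hypothetical stand-in for a completed container status in a heartbeat. */
    static final class Status {
        final String containerId;
        final String appId;

        Status(String containerId, String appId) {
            this.containerId = containerId;
            this.appId = appId;
        }
    }

    /** Returns the container IDs that would be reported to the scheduler. */
    static List<String> completedForScheduler(List<Status> statuses,
                                              Set<String> trackedApps) {
        List<String> reported = new ArrayList<>();
        // Untracked applications for which one container was already reported.
        Set<String> untrackedSeen = new HashSet<>();
        for (Status s : statuses) {
            if (trackedApps.contains(s.appId)) {
                // Scenario 1: application is tracked, report every container.
                reported.add(s.containerId);
            } else if (untrackedSeen.add(s.appId)) {
                // Scenario 2: application is not tracked, report only the
                // first container; the rest of its containers are skipped.
                reported.add(s.containerId);
            }
        }
        return reported;
    }
}
```

With two statuses for the same application, the sketch reports both containers when the application is tracked and only the first one when it is not.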
I think we can still optimize this so that, if the application is not tracked by
the RM, RMNodeImpl does not report even one completed container to the
scheduler. I am open to handling that in this JIRA or creating a new JIRA to
optimize this scenario.
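For reference, the safeguard requested in the issue description (dropping duplicated completed-container statuses before any per-heartbeat update object is built) could look roughly like the sketch below. This is an assumption-laden illustration, not the attached patch: CompletedContainerFilter, newlyCompleted and forget are hypothetical names, and container IDs are plain strings.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper; it does not reproduce the real RMNodeImpl fields. */
class CompletedContainerFilter {
    // IDs of completed containers that were already passed on once.
    private final Set<String> alreadyReported = new HashSet<>();

    /** Returns only the completed container IDs not seen in earlier heartbeats. */
    List<String> newlyCompleted(List<String> completedInHeartbeat) {
        List<String> fresh = new ArrayList<>();
        for (String containerId : completedInHeartbeat) {
            // Set.add returns false when the ID is already present,
            // so a duplicate never reaches the result list.
            if (alreadyReported.add(containerId)) {
                fresh.add(containerId);
            }
        }
        return fresh;
    }

    /** Drops an ID once it no longer needs tracking, so the set stays bounded. */
    void forget(String containerId) {
        alreadyReported.remove(containerId);
    }
}
```

If a heartbeat contains only duplicates, newlyCompleted returns an empty list and the caller can skip creating an update for it, which is the memory saving the issue is after.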
> Handle duplicate completed containers in RMNodeImpl
> ---------------------------------------------------
>
> Key: YARN-4862
> URL: https://issues.apache.org/jira/browse/YARN-4862
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Reporter: Rohith Sharma K S
> Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4862.patch, 0002-YARN-4862.patch,
> 0003-YARN-4862.patch, YARN-4862-004.patch, YARN-4862-005.patch,
> YARN-4862-006.patch
>
>
> As per
> [comment|https://issues.apache.org/jira/browse/YARN-4852?focusedCommentId=15209689&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15209689]
> from [~sharadag], there should be a safeguard against duplicated container
> statuses in RMNodeImpl before creating UpdatedContainerInfo.
> Otherwise, in a heavily loaded cluster where event processing gradually slows
> down, if any duplicated containers are sent to the RM (possibly due to a bug
> in the NM), there is a significant impact because RMNodeImpl always creates
> UpdatedContainerInfo for the duplicated containers. This results in increased
> heap memory usage and causes problems like YARN-4852.
> This is an optimization for issues like YARN-4852.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)