[jira] [Updated] (YARN-4862) Handle duplicate completed containers in RMNodeImpl

Rohith Sharma K S (JIRA) Wed, 02 Nov 2016 00:23:45 -0700

     [ 
https://issues.apache.org/jira/browse/YARN-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Rohith Sharma K S updated YARN-4862:
------------------------------------
    Attachment: YARN-4862-006.patch

bq. I'm also curious why a sleep was added instead of something like a 
drainEvents call.
I have used drainEvents in the attached patch.

I have retained sending 2 container statuses in the node heartbeat for 
verification. There are 2 scenarios that can occur in reality when the NM 
reports container statuses to the RM. 
# The application that the container statuses belong to is tracked by the RM. 
Here, RMNodeImpl triggers an event to the scheduler reporting the containers 
as completed. 
# The application that the container statuses belong to is *NOT* tracked by 
the RM. Here, RMNodeImpl triggers an event to the scheduler with only one 
container as completed. All remaining containers belonging to this 
application are skipped. 
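
The two scenarios can be sketched roughly as follows. This is a simplified, hypothetical model and not the code in the patch: the class HeartbeatScenarioSketch, the CompletedStatus type, and the containersToReport method are names made up for illustration, and plain strings stand in for the real container and application ids. It assumes that in the 2nd scenario exactly one container per untracked application is forwarded.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified model of the two scenarios; not the actual RMNodeImpl code. */
public class HeartbeatScenarioSketch {

  /** Minimal stand-in for a completed-container status reported by the NM. */
  public static final class CompletedStatus {
    final String containerId;
    final String appId;

    public CompletedStatus(String containerId, String appId) {
      this.containerId = containerId;
      this.appId = appId;
    }
  }

  /**
   * Returns the ids of the containers that would be reported to the scheduler
   * as completed, given the set of applications the RM currently tracks.
   */
  public static List<String> containersToReport(
      List<CompletedStatus> statuses, Set<String> trackedApps) {
    List<String> reported = new ArrayList<>();
    Set<String> untrackedAppsSeen = new HashSet<>();
    for (CompletedStatus status : statuses) {
      if (trackedApps.contains(status.appId)) {
        // Scenario 1: application is tracked, so every completed container is reported.
        reported.add(status.containerId);
      } else if (untrackedAppsSeen.add(status.appId)) {
        // Scenario 2: application is not tracked, so only the first container
        // of that application is reported and the rest are skipped.
        reported.add(status.containerId);
      }
    }
    return reported;
  }
}
```

With two statuses for the same application, this model reports both containers when the application is tracked and only the first one when it is not.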

In the earlier patches, the test case was sending container statuses for the 
2nd scenario. In the latest patch, I have modified the test code to cover the 
1st scenario. 

I still think we can optimize this further: if the application is not tracked 
by the RM, RMNodeImpl need not report even one completed container to the 
scheduler. I am open to handling it in this JIRA or creating a new JIRA to 
optimize this scenario. 

> Handle duplicate completed containers in RMNodeImpl
> ---------------------------------------------------
>
>                 Key: YARN-4862
>                 URL: https://issues.apache.org/jira/browse/YARN-4862
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>         Attachments: 0001-YARN-4862.patch, 0002-YARN-4862.patch, 
> 0003-YARN-4862.patch, YARN-4862-004.patch, YARN-4862-005.patch, 
> YARN-4862-006.patch
>
>
> As per 
> [comment|https://issues.apache.org/jira/browse/YARN-4852?focusedCommentId=15209689&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15209689]
>  from [~sharadag], there should be a safeguard against duplicated container 
> statuses in RMNodeImpl before creating UpdatedContainerInfo. 
> Otherwise, in a heavily loaded cluster where event processing gradually 
> slows down, if any duplicated containers are sent to the RM (possibly due to 
> a bug in the NM), there is a significant impact because RMNodeImpl always 
> creates UpdatedContainerInfo for the duplicated containers. This results in 
> increased heap memory usage and causes problems like YARN-4852.
> This is an optimization for issues like YARN-4852.
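
The safeguard described in the issue could look something like the sketch below. It is only an illustration under stated assumptions, not the attached patch: CompletedContainerDedupSketch and filterNew are hypothetical names, string ids stand in for real container ids, and removing ids from the remembered set once they are no longer needed is left out.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of a duplicate guard; not the actual RMNodeImpl code. */
public class CompletedContainerDedupSketch {

  // Ids of completed containers already seen in an earlier heartbeat.
  private final Set<String> seenCompleted = new HashSet<>();

  /**
   * Returns only the completed container ids that have not been seen before,
   * preserving the order in which they were reported. An update object would
   * be created only when the returned list is non-empty.
   */
  public List<String> filterNew(List<String> reportedCompleted) {
    List<String> fresh = new ArrayList<>();
    for (String containerId : reportedCompleted) {
      // Set.add returns false when the id is already present, i.e. a duplicate.
      if (seenCompleted.add(containerId)) {
        fresh.add(containerId);
      }
    }
    return fresh;
  }
}
```

In this model, a heartbeat that repeats only already-seen containers yields an empty list, so no new update object would need to be allocated for it.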



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
