[ https://issues.apache.org/jira/browse/YARN-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16033637#comment-16033637 ]

Robert Kanter commented on YARN-5464:
-------------------------------------

I won't have time to work on this for quite a while, so I'm unassigning myself 
in case someone else wants to pick it up.  I did manage to implement the state 
store logic (storing the decommissioning nodes' data, updating the data, 
recovering the data, unit tests, etc.).  However, it's incomplete because 
recovery doesn't work properly: there's a timing issue between recovering 
the node data, loading the include and exclude files, and the NMs 
heartbeating.  As it stands, when the data gets recovered, recovery can't 
find the {{NodeId}} because the node has already been removed due to the 
exclude file.  I'll attach the WIP patch in case anyone picks this up.
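
For anyone picking this up, the ordering problem described above can be 
modelled roughly as follows.  This is only an illustrative sketch and is not 
taken from the patch or from the YARN code base: the class 
{{RecoveryOrderingSketch}}, its methods, and the "host:port" string keys are 
all hypothetical stand-ins, and the NM heartbeat side of the race is left out.

```java
import java.util.LinkedHashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model only: none of these names exist in YARN or in the WIP patch.
class RecoveryOrderingSketch {
    // Nodes the modelled RM currently knows about, keyed by "host:port".
    private final Set<String> activeNodes = new HashSet<>();
    // Decommissioning timeouts (seconds) as they would sit in a state store.
    private final Map<String, Integer> storedTimeouts = new LinkedHashMap<>();

    void registerNode(String nodeId) {
        activeNodes.add(nodeId);
    }

    void storeDecommissioning(String nodeId, int timeoutSecs) {
        storedTimeouts.put(nodeId, timeoutSecs);
    }

    // Models loading the exclude file: excluded nodes are dropped outright.
    void applyExcludeList(Set<String> excluded) {
        activeNodes.removeAll(excluded);
    }

    // Models recovery: only entries whose node is still known can be re-attached.
    Map<String, Integer> recover() {
        Map<String, Integer> reattached = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : storedTimeouts.entrySet()) {
            if (activeNodes.contains(e.getKey())) {
                reattached.put(e.getKey(), e.getValue());
            }
        }
        return reattached;
    }
}
```

In this model, calling {{applyExcludeList}} before {{recover}} for a node that 
is still gracefully decommissioning leaves nothing to re-attach, whereas 
calling {{recover}} first keeps the stored timeout.  A real fix would 
presumably need to enforce an ordering like the latter, or make the exclude 
handling aware of recovered entries.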

> Server-Side NM Graceful Decommissioning with RM HA
> --------------------------------------------------
>
>                 Key: YARN-5464
>                 URL: https://issues.apache.org/jira/browse/YARN-5464
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: graceful
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>            Priority: Blocker
>         Attachments: YARN-5464.wip.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
