[ https://issues.apache.org/jira/browse/YARN-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16033637#comment-16033637 ]
Robert Kanter commented on YARN-5464:
-------------------------------------
I won't have time to work on this for quite a while, so I'm unassigning myself
in case someone else wants to pick it up. I did manage to implement the state
store logic (storing the decommissioning nodes' data, updating the data,
recovering the data, unit tests, etc.). However, it's incomplete because
recovery doesn't work properly: there's a timing issue between recovering the
node data, loading the include and exclude files, and the NMs heartbeating.
As it stands, when the data gets recovered, the recovery code can't find the
{{NodeId}} because the node has already been removed due to the exclude file.
I'll attach the WIP patch in case anyone picks this up.
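
To make the ordering problem concrete, here is a minimal, self-contained
sketch. It is not taken from the WIP patch and does not use any YARN classes:
the class name, both method names, the use of plain strings for node ids, and
the timeout value are assumptions made purely for illustration. It only shows
that a stored entry cannot be matched once the exclude list has already
removed the node from the active set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names exist in YARN.
class DecommissionRecoverySketch {

    // Mimics the effect of loading the exclude file: excluded nodes are
    // dropped from the set of nodes the RM currently knows about.
    static void applyExcludeFile(Set<String> activeNodes, Set<String> excluded) {
        activeNodes.removeAll(excluded);
    }

    // Mimics recovery: returns the stored node ids that cannot be matched
    // to a node in the active set.
    static List<String> unresolvedOnRecovery(Map<String, Integer> storedTimeouts,
                                             Set<String> activeNodes) {
        List<String> unresolved = new ArrayList<>();
        for (String nodeId : storedTimeouts.keySet()) {
            if (!activeNodes.contains(nodeId)) {
                unresolved.add(nodeId);
            }
        }
        return unresolved;
    }

    public static void main(String[] args) {
        // Assumed state-store content: one decommissioning node and its timeout in seconds.
        Map<String, Integer> stored = new TreeMap<>();
        stored.put("nm1:8041", 3600);

        Set<String> active = new HashSet<>(Arrays.asList("nm1:8041", "nm2:8041"));

        // The exclude file is applied before recovery runs, so nm1 is already gone.
        applyExcludeFile(active, Collections.singleton("nm1:8041"));

        System.out.println(unresolvedOnRecovery(stored, active)); // prints [nm1:8041]
    }
}
```

If recovery ran before the exclude list was applied, the lookup in this toy
model would succeed; whether reordering is a workable fix in the real RM is
not something this sketch can establish.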
> Server-Side NM Graceful Decommissioning with RM HA
> --------------------------------------------------
>
> Key: YARN-5464
> URL: https://issues.apache.org/jira/browse/YARN-5464
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: graceful
> Reporter: Robert Kanter
> Assignee: Robert Kanter
> Priority: Blocker
> Attachments: YARN-5464.wip.patch
>
>
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)