[ 
https://issues.apache.org/jira/browse/YARN-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202069#comment-14202069
 ] 

Jason Lowe commented on YARN-2632:
----------------------------------

Thanks for taking this up, Junping, and for the reviews, Varun!  Further 
comments on the latest patch:

In Preconditions the wording of the first sentence is a bit off and we're 
inconsistent with the use of "nodemanager" vs. "NodeManager."  I suggest 
something like the following:
{noformat}
Ephemeral ports (port 0, which is the default) cannot be used for the 
NodeManager address because the NodeManager may restart with a different 
address.
{noformat}

"that is waitting" s/b "that are waiting"

Wondering if we should simply put what's currently in the Preconditions section 
down in the steps for enabling NM restart.  Arguably it's just the third step 
in the config, and we can put a line or two of explanation next to the 
instructions.  That way if someone just scans down to the steps to enable it, 
they will also see that they have to not only set 
yarn.nodemanager.recovery.enabled and yarn.nodemanager.recovery.dir but also 
change yarn.nodemanager.address.
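
For illustration, the combined steps might end up looking something like the 
following in yarn-site.xml.  The recovery directory and the port number here 
are placeholders I picked for the example, not recommendations; the only 
point is that the address uses a fixed port rather than port 0:
{noformat}
<!-- Illustrative values only -->
<property>
  <name>yarn.nodemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.nodemanager.recovery.dir</name>
  <!-- placeholder path: a local directory the NM can write to -->
  <value>/var/lib/hadoop-yarn/nm-recovery</value>
</property>
<property>
  <name>yarn.nodemanager.address</name>
  <!-- placeholder port: fixed, not 0, so the NM restarts on the same address -->
  <value>0.0.0.0:45454</value>
</property>
{noformat}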

The mapreduce_shuffle auxiliary service and any other auxiliary services also 
need to be configured to support NM restart (e.g., avoid using ephemeral ports 
or otherwise support recovering with the same address).  mapreduce_shuffle uses 
mapreduce.shuffle.port for the shuffle port, for example, and it, too, doesn't 
support ephemeral ports for restart.
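
As a rough sketch of what the doc could show for the shuffle case, assuming 
the property is set in the configuration the NM reads (the port value below 
is just an example of a fixed, non-zero port):
{noformat}
<!-- Illustrative only: any fixed, non-zero port that is free on the node -->
<property>
  <name>mapreduce.shuffle.port</name>
  <value>13562</value>
</property>
{noformat}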

There should also be a caveat that configured auxiliary services must support 
recovery; otherwise NM functionality in those areas may be affected upon 
restart.

"To enable NM Restart functionality, set ..." should just be "Set ..." because 
the line just before this already says "Enabling NM Restart consists of ...".


> Document NM Restart feature
> ---------------------------
>
>                 Key: YARN-2632
>                 URL: https://issues.apache.org/jira/browse/YARN-2632
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Blocker
>         Attachments: YARN-2632-v2.patch, YARN-2632-v3.patch, YARN-2632.patch
>
>
> As a new feature to YARN, we should document this feature's behavior, 
> configuration, and things to pay attention to.



