[ https://issues.apache.org/jira/browse/YARN-3811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14590207#comment-14590207 ]

Jason Lowe commented on YARN-3811:
----------------------------------

bq. this is not possible to do as the NM needs to report the RPC server port 
during registration - so, server start should happen before registration.

Yes, but that's a limitation in the RPC layer.  If we could bind the server 
before we start it, then we could learn the port, register with the RM, and 
only then start the server.  IMHO the RPC layer should support this, but I 
understand we'll have to work around the lack of that in the interim.  I think 
we can all agree the retry exception is just a hack being used because we 
can't keep the client service from serving too soon.
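
To illustrate the ordering I mean (bind, register, then serve), here is a minimal sketch using plain java.net sockets rather than the Hadoop RPC classes.  The class name BindBeforeServe and the registerWithRm helper are hypothetical stand-ins, not existing YARN or RPC APIs, and the real RPC server would need its own support for separating bind from start.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class BindBeforeServe {

    // Step 1: bind to an ephemeral port without accepting any connections yet.
    static ServerSocket bind() throws IOException {
        ServerSocket server = new ServerSocket();
        server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
        return server;
    }

    // Step 2: hypothetical stand-in for NM registration with the RM.
    // It only echoes the port it was given.
    static int registerWithRm(int rpcPort) {
        System.out.println("registered with RPC port " + rpcPort);
        return rpcPort;
    }

    // Step 3: start handling requests only after registration has completed.
    // This sketch serves a single connection and then exits.
    static Thread startServing(ServerSocket server) {
        Thread acceptor = new Thread(() -> {
            try (Socket client = server.accept()) {
                client.getOutputStream()
                      .write("ready\n".getBytes(StandardCharsets.US_ASCII));
            } catch (IOException e) {
                // Socket closed during shutdown; nothing to do in this sketch.
            }
        });
        acceptor.setDaemon(true);
        acceptor.start();
        return acceptor;
    }

    public static void main(String[] args) throws Exception {
        try (ServerSocket server = bind()) {
            int port = server.getLocalPort();   // port is known before serving
            registerWithRm(port);
            Thread acceptor = startServing(server);
            // Probe the server once to show it is now answering.
            try (Socket probe = new Socket(InetAddress.getLoopbackAddress(), port)) {
                probe.getInputStream().read();
            }
            acceptor.join();
        }
    }
}
```

One caveat with the plain-socket version: ServerSocket.bind also puts the socket into the listening state, so a client that connects before step 3 would sit in the accept backlog rather than be refused.  It still would not be served before registration, which is the property we care about here.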

> NM restarts could lead to app failures
> --------------------------------------
>
>                 Key: YARN-3811
>                 URL: https://issues.apache.org/jira/browse/YARN-3811
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.7.0
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>            Priority: Critical
>
> Consider the following scenario:
> 1. RM assigns a container on node N to an app A.
> 2. Node N is restarted
> 3. A tries to launch container on node N.
> Step 3 could lead to an NMNotYetReadyException, depending on whether NM N 
> has registered with the RM. In MR, this is considered a task attempt 
> failure. A few of these could lead to a task/job failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
