[ 
https://issues.apache.org/jira/browse/YARN-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086061#comment-15086061
 ] 

Jason Lowe commented on YARN-4414:
----------------------------------

Thanks for the patch, Chang!  I'm a bit curious on the naming convention of the 
patches.  Why .1.2 and .1.3 instead of just .2 and .3?  In the future, I'd 
recommend using the patch naming conventions as described in 
http://wiki.apache.org/hadoop/HowToContribute#Naming_your_patch to be 
consistent with other contributors and help reduce confusion.

As for the patch the main change looks OK to me, but I have some nits with the 
test:
- Why are we explicitly setting the NM port to 1234?  Shouldn't we inherit the 
same NM port setting from the base conf as the other connection retry tests 
already do?
- getNMProxy2 should just be getNMProxy, overloaded for the Configuration 
parameter.
- Rather than copying the entire method, getProxy() should be implemented in 
terms of getProxy(Configuration).


> Nodemanager connection errors are retried at multiple levels
> ------------------------------------------------------------
>
>                 Key: YARN-4414
>                 URL: https://issues.apache.org/jira/browse/YARN-4414
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.7.1, 2.6.2
>            Reporter: Jason Lowe
>            Assignee: Chang Li
>         Attachments: YARN-4414.1.2.patch, YARN-4414.1.2.patch, 
> YARN-4414.1.3.patch, YARN-4414.1.patch
>
>
> This is related to YARN-3238.  Ran into more scenarios where connection 
> errors are being retried at multiple levels, like NoRouteToHostException.  
> The fix for YARN-3238 was too specific, and I think we need a more general 
> solution to catch a wider array of connection errors that can occur to avoid 
> retrying them both at the RPC layer and at the NM proxy layer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to