[
https://issues.apache.org/jira/browse/YARN-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086061#comment-15086061
]
Jason Lowe commented on YARN-4414:
----------------------------------
Thanks for the patch, Chang! I'm a bit curious on the naming convention of the
patches. Why .1.2 and .1.3 instead of just .2 and .3? In the future, I'd
recommend using the patch naming conventions as described in
http://wiki.apache.org/hadoop/HowToContribute#Naming_your_patch to be
consistent with other contributors and help reduce confusion.
As for the patch the main change looks OK to me, but I have some nits with the
test:
- Why are we explicitly setting the NM port to 1234? Shouldn't we inherit the
same NM port setting from the base conf as the other connection retry tests
already do?
- getNMProxy2 should just be getNMProxy, overloaded for the Configuration
parameter.
- Rather than copying the entire method, getProxy() should be implemented in
terms of getProxy(Configuration).
> Nodemanager connection errors are retried at multiple levels
> ------------------------------------------------------------
>
> Key: YARN-4414
> URL: https://issues.apache.org/jira/browse/YARN-4414
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 2.7.1, 2.6.2
> Reporter: Jason Lowe
> Assignee: Chang Li
> Attachments: YARN-4414.1.2.patch, YARN-4414.1.2.patch,
> YARN-4414.1.3.patch, YARN-4414.1.patch
>
>
> This is related to YARN-3238. Ran into more scenarios where connection
> errors are being retried at multiple levels, like NoRouteToHostException.
> The fix for YARN-3238 was too specific, and I think we need a more general
> solution to catch a wider array of connection errors that can occur to avoid
> retrying them both at the RPC layer and at the NM proxy layer.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)