[ https://issues.apache.org/jira/browse/YARN-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Lowe resolved YARN-3364.
------------------------------
Resolution: Duplicate
Closing this as a duplicate of HADOOP-11398.
> Clarify Naming of yarn.client.nodemanager-connect.max-wait-ms and
> yarn.resourcemanager.connect.max-wait.ms
> -----------------------------------------------------------------------------------------------------------
>
> Key: YARN-3364
> URL: https://issues.apache.org/jira/browse/YARN-3364
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: yarn
> Reporter: Andrew Johnson
>
> I encountered an issue recently where the ApplicationMaster for MapReduce
> jobs would spend hours attempting to connect to a node in my cluster that had
> died due to a hardware fault. After debugging this, I found that the
> yarn.client.nodemanager-connect.max-wait-ms property did not behave as I had
> expected. Based on the name, I had thought this would set a maximum time
> limit for attempting to connect to a NodeManager. The code in
> org.apache.hadoop.yarn.client.NMProxy corroborated this thought - it used a
> RetryUpToMaximumTimeWithFixedSleep policy when a ConnectTimeoutException was
> thrown, as it was in my case with a dead node.
> However, the RetryUpToMaximumTimeWithFixedSleep policy doesn't actually set a
> time limit, but instead divides the maximum time by the sleep period to set a
> total number of retries, regardless of how long those retries take. As such,
> I was seeing the ApplicationMaster spend much longer attempting to make a
> connection than I had anticipated.
> The yarn.resourcemanager.connect.max-wait.ms property has the same behavior.
> These properties would be better named something like
> yarn.client.nodemanager-connect.max.retries and
> yarn.resourcemanager.connect.max.retries to align with their actual
> behavior.
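> A minimal, self-contained sketch of that arithmetic (not the actual Hadoop
> implementation; the class name, property values, and per-attempt timeout
> below are hypothetical, chosen only for illustration) shows why the
> wall-clock time can far exceed the configured max wait:
> {code:java}
> // Illustrative only: mimics how RetryUpToMaximumTimeWithFixedSleep turns a
> // max-wait value into a fixed retry count rather than a real time limit.
> public class RetryMathDemo {
>   public static void main(String[] args) {
>     long maxWaitMs = 180_000;       // e.g. yarn.client.nodemanager-connect.max-wait-ms
>     long retryIntervalMs = 10_000;  // e.g. yarn.client.nodemanager-connect.retry-interval-ms
>
>     // The policy only fixes the number of retries; it never checks elapsed time.
>     long maxRetries = maxWaitMs / retryIntervalMs;  // 18 retries
>
>     // If each attempt itself blocks before failing (say a 20s connect timeout
>     // against a dead node), the real wait is retries * (attempt + sleep).
>     long perAttemptMs = 20_000;     // hypothetical per-attempt connect timeout
>     long actualWaitMs = maxRetries * (perAttemptMs + retryIntervalMs);
>
>     System.out.println("retries allowed:        " + maxRetries);
>     System.out.println("nominal max wait (ms):  " + maxWaitMs);
>     System.out.println("actual wall clock (ms): " + actualWaitMs);  // 540000 ms, 3x the nominal cap
>   }
> }
> {code}
> If each attempt additionally goes through lower-level IPC connect retries,
> the gap widens further, which is consistent with the hours-long waits
> described above.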
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)