[ https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vinod Kumar Vavilapalli updated YARN-3238:
------------------------------------------
Fix Version/s: 2.6.1
Pulled this into 2.6.1. Ran compilation before the push. Patch applied cleanly.
> Connection timeouts to nodemanagers are retried at multiple levels
> ------------------------------------------------------------------
>
> Key: YARN-3238
> URL: https://issues.apache.org/jira/browse/YARN-3238
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.6.0
> Reporter: Jason Lowe
> Assignee: Jason Lowe
> Priority: Blocker
> Labels: 2.6.1-candidate
> Fix For: 2.7.0, 2.6.1
>
> Attachments: YARN-3238.001.patch
>
>
> The IPC layer will retry connection timeouts automatically (see Client.java),
> but we are also retrying them with YARN's RetryPolicy put in place when the
> NM proxy is created. This causes a two-level retry mechanism where the IPC
> layer has already retried quite a few times (45 by default) for each YARN
> RetryPolicy error that is retried. The end result is that NM clients can
> wait a very, very long time for the connection to finally fail.
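
To illustrate the multiplication described above, here is a minimal Java sketch of two nested retry loops. It is not the Hadoop code: the class and method names are hypothetical, the connect call is a stand-in that always times out, and the outer retry count of 3 is an assumed value chosen only for the example. The inner count of 45 is the default mentioned in the description.

```java
import java.net.SocketTimeoutException;

// Hypothetical simulation of a two-level retry; not taken from Hadoop.
public class NestedRetrySketch {
    private int connectAttempts = 0;

    // Stand-in for a connection attempt that always times out.
    private void connect() throws SocketTimeoutException {
        connectAttempts++;
        throw new SocketTimeoutException("simulated connect timeout");
    }

    // Inner layer (plays the role of the IPC client): one initial
    // attempt plus up to innerRetries retries on timeout.
    private void innerCall(int innerRetries) throws SocketTimeoutException {
        for (int retry = 0; ; retry++) {
            try {
                connect();
                return;
            } catch (SocketTimeoutException e) {
                if (retry >= innerRetries) {
                    throw e;
                }
            }
        }
    }

    // Outer layer (plays the role of a retry policy wrapped around the
    // proxy): retries the whole inner call, including its own retries.
    private void outerCall(int innerRetries, int outerRetries)
            throws SocketTimeoutException {
        for (int retry = 0; ; retry++) {
            try {
                innerCall(innerRetries);
                return;
            } catch (SocketTimeoutException e) {
                if (retry >= outerRetries) {
                    throw e;
                }
            }
        }
    }

    // Returns the total number of connection attempts made before giving up.
    public static int countAttempts(int innerRetries, int outerRetries) {
        NestedRetrySketch sketch = new NestedRetrySketch();
        try {
            sketch.outerCall(innerRetries, outerRetries);
        } catch (SocketTimeoutException expected) {
            // Every attempt times out in this simulation.
        }
        return sketch.connectAttempts;
    }

    public static void main(String[] args) {
        // 45 inner retries; 3 outer retries is an assumed value.
        System.out.println(countAttempts(45, 3)); // prints 184
    }
}
```

Under these assumptions the client makes (45 + 1) * (3 + 1) = 184 connection attempts before the failure surfaces, and the total wait is that count multiplied by the per-attempt connect timeout.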