Jason Lowe created YARN-3238:
--------------------------------
Summary: Connection timeouts to nodemanagers are retried at
multiple levels
Key: YARN-3238
URL: https://issues.apache.org/jira/browse/YARN-3238
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Priority: Blocker
The IPC layer retries connection timeouts automatically (see Client.java),
but we also retry them through the YARN RetryPolicy that is put in place when
the NM proxy is created. The result is two nested levels of retry: for every
error that the YARN RetryPolicy retries, the IPC layer has already retried the
connection many times (45 by default). The two retry counts therefore multiply,
and NM clients can wait a very long time for the connection to finally fail.
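
To make the multiplication concrete, here is a rough back-of-the-envelope
sketch of how the two levels compound. It is not taken from Client.java or
from the NM proxy code. Only the 45 IPC retries come from the description
above; the 20-second connect timeout, the 3 outer retries, the 10-second sleep
between outer retries, and the class and method names are assumptions chosen
for illustration. It also assumes each level makes one initial attempt plus
its configured number of retries.

```java
// Hypothetical estimate of nested retry cost; not actual Hadoop code.
public class NestedRetryEstimate {

    // Total connection attempts when every outer (RetryPolicy-level) attempt
    // triggers a full round of inner (IPC-level) attempts.
    static long totalConnectAttempts(int ipcRetries, int outerRetries) {
        return (long) (ipcRetries + 1) * (outerRetries + 1);
    }

    // Worst-case wall-clock wait in milliseconds, assuming every connection
    // attempt runs until the connect timeout expires and the outer policy
    // sleeps a fixed interval between its retries.
    static long worstCaseWaitMillis(int ipcRetries, long connectTimeoutMs,
                                    int outerRetries, long outerSleepMs) {
        long perOuterAttemptMs = (long) (ipcRetries + 1) * connectTimeoutMs;
        return (outerRetries + 1) * perOuterAttemptMs
                + (long) outerRetries * outerSleepMs;
    }

    public static void main(String[] args) {
        int ipcRetries = 45;             // from the issue description
        long connectTimeoutMs = 20_000L; // assumed connect timeout
        int outerRetries = 3;            // assumed RetryPolicy retry count
        long outerSleepMs = 10_000L;     // assumed sleep between outer retries

        System.out.println("connect attempts: "
                + totalConnectAttempts(ipcRetries, outerRetries));
        System.out.println("worst-case wait (minutes): "
                + worstCaseWaitMillis(ipcRetries, connectTimeoutMs,
                        outerRetries, outerSleepMs) / 60_000.0);
    }
}
```

Under these assumed numbers a single outer attempt already takes about 15
minutes to give up, so even a handful of outer retries pushes the total wait
to roughly an hour.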