[jira] [Updated] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels

2015-09-01 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3238:
--
Fix Version/s: 2.6.1

Pulled this into 2.6.1. Ran compilation before the push. Patch applied cleanly.

> Connection timeouts to nodemanagers are retried at multiple levels
> --
>
> Key: YARN-3238
> URL: https://issues.apache.org/jira/browse/YARN-3238
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Blocker
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0, 2.6.1
>
> Attachments: YARN-3238.001.patch
>
>
> The IPC layer will retry connection timeouts automatically (see Client.java), 
> but we are also retrying them with YARN's RetryPolicy put in place when the 
> NM proxy is created.  This causes a two-level retry mechanism where the IPC 
> layer has already retried quite a few times (45 by default) for each YARN 
> RetryPolicy error that is retried.  The end result is that NM clients can 
> wait a very, very long time for the connection to finally fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels

2015-07-27 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-3238:
--
Labels: 2.6.1-candidate  (was: )

 Connection timeouts to nodemanagers are retried at multiple levels
 --

 Key: YARN-3238
 URL: https://issues.apache.org/jira/browse/YARN-3238
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Blocker
  Labels: 2.6.1-candidate
 Fix For: 2.7.0

 Attachments: YARN-3238.001.patch


 The IPC layer will retry connection timeouts automatically (see Client.java), 
 but we are also retrying them with YARN's RetryPolicy put in place when the 
 NM proxy is created.  This causes a two-level retry mechanism where the IPC 
 layer has already retried quite a few times (45 by default) for each YARN 
 RetryPolicy error that is retried.  The end result is that NM clients can 
 wait a very, very long time for the connection to finally fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels

2015-02-20 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-3238:
-
Attachment: YARN-3238.001.patch

Since the IPC layer is already retrying it doesn't make sense to also retry at 
the YARN layer.  Attaching a patch that removes socket connection timeouts from 
the list of errors we retry at the YARN layer.  An alternate approach would be 
to retry at the YARN layer but explicitly tell the IPC layer to _not_ retry 
socket timeouts when creating the proxy.  This change seemed simpler and is 
what we've been doing all along before YARN-2613.

 Connection timeouts to nodemanagers are retried at multiple levels
 --

 Key: YARN-3238
 URL: https://issues.apache.org/jira/browse/YARN-3238
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Priority: Blocker
 Attachments: YARN-3238.001.patch


 The IPC layer will retry connection timeouts automatically (see Client.java), 
 but we are also retrying them with YARN's RetryPolicy put in place when the 
 NM proxy is created.  This causes a two-level retry mechanism where the IPC 
 layer has already retried quite a few times (45 by default) for each YARN 
 RetryPolicy error that is retried.  The end result is that NM clients can 
 wait a very, very long time for the connection to finally fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)