[jira] [Commented] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels

2020-02-07 Thread Jason Darrell Lowe (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17032359#comment-17032359
 ] 

Jason Darrell Lowe commented on YARN-3238:
--

This did go into 3.0.0-alpha1 (see commit 
92d67ace3248930c0c0335070cc71a480c566a36) but was later superseded by YARN-4414.

> Connection timeouts to nodemanagers are retried at multiple levels
> --
>
> Key: YARN-3238
> URL: https://issues.apache.org/jira/browse/YARN-3238
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Jason Darrell Lowe
>Assignee: Jason Darrell Lowe
>Priority: Blocker
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0, 2.6.1, 3.0.0-alpha1
>
> Attachments: YARN-3238.001.patch
>
>
> The IPC layer retries connection timeouts automatically (see Client.java), 
> but we also retry them with the YARN RetryPolicy that is put in place when 
> the NM proxy is created. This creates a two-level retry mechanism: for each 
> error the YARN RetryPolicy retries, the IPC layer has already retried the 
> connection many times (45 by default). The end result is that NM clients 
> can wait a very, very long time for the connection to finally fail.
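To make the multiplication concrete, here is a small, stdlib-only Java model of the nesting described in the issue. It is a sketch, not Hadoop code: the outer loop stands in for the YARN RetryPolicy on the NM proxy and the inner loop for the IPC client's connect-timeout retries. Only the 45 comes from the description; the outer retry count of 3 and the 20-second per-attempt connect timeout are hypothetical values chosen for illustration, and sleeps between retries are ignored.

```java
/**
 * Illustrative model (not Hadoop code) of the two-level retry in YARN-3238:
 * an outer retry loop (standing in for the YARN RetryPolicy on the NM proxy)
 * wraps an inner loop (standing in for the IPC client's connect-timeout retries).
 */
public class NestedRetryModel {

    /** Connection attempts made when every attempt times out. */
    static long totalAttempts(int ipcRetries, int yarnRetries) {
        long attempts = 0;
        // Outer layer: the initial call plus each YARN-level retry.
        for (int outer = 0; outer <= yarnRetries; outer++) {
            // Inner layer: the initial connect plus each IPC-level retry.
            for (int inner = 0; inner <= ipcRetries; inner++) {
                attempts++; // one socket connect that times out
            }
        }
        return attempts;
    }

    /** Worst-case time spent waiting on connect timeouts; sleeps between retries are ignored. */
    static long worstCaseMillis(int ipcRetries, int yarnRetries, long connectTimeoutMs) {
        return totalAttempts(ipcRetries, yarnRetries) * connectTimeoutMs;
    }

    public static void main(String[] args) {
        int ipcRetries = 45;             // "45 by default" per the issue description
        int yarnRetries = 3;             // hypothetical outer retry count
        long connectTimeoutMs = 20_000L; // hypothetical per-attempt connect timeout

        System.out.println("nested attempts:   " + totalAttempts(ipcRetries, yarnRetries));
        System.out.println("IPC-only attempts: " + totalAttempts(ipcRetries, 0));
        System.out.println("nested worst case: "
                + worstCaseMillis(ipcRetries, yarnRetries, connectTimeoutMs) / 60_000 + " min");
    }
}
```

With these assumed numbers the nested loops make 184 connection attempts (about 61 minutes of timeouts), while the IPC layer alone would make 46: the outer retry count multiplies the total rather than adding to it.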



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels

2020-02-06 Thread Lakshmi Manasa Gaduputi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17032029#comment-17032029
 ] 

Lakshmi Manasa Gaduputi commented on YARN-3238:
---

This patch is not in 3.0.0-alpha1; see 
[branch-3.0.0-alpha1|https://github.com/apache/hadoop/blob/branch-3.0.0-alpha1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ServerProxy.java#L80]
 and [release-3.0.0-alpha1|https://github.com/apache/hadoop/blob/release-3.0.0-alpha1-RC0/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ServerProxy.java#L80].

Many versions do not have this patch. Is there a reason it was not pulled 
into later versions? Did something underlying change so that this patch is 
no longer needed?




[jira] [Commented] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels

2015-08-25 Thread Benoit Sigoure (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711738#comment-14711738
 ] 

Benoit Sigoure commented on YARN-3238:
--

What's the setting to tune down to avoid the 45-minute timeout? I'd like the 
code to fail fast.



[jira] [Commented] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels

2015-07-27 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14642805#comment-14642805
 ] 

Sangjin Lee commented on YARN-3238:
---

The patch applies to 2.6.0 cleanly.



[jira] [Commented] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels

2015-02-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14332136#comment-14332136
 ] 

Hudson commented on YARN-3238:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #112 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/112/])
YARN-3238. Connection timeouts to nodemanagers are retried at multiple (xgong: 
rev 92d67ace3248930c0c0335070cc71a480c566a36)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ServerProxy.java




[jira] [Commented] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels

2015-02-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14332131#comment-14332131
 ] 

Hudson commented on YARN-3238:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #846 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/846/])
YARN-3238. Connection timeouts to nodemanagers are retried at multiple (xgong: 
rev 92d67ace3248930c0c0335070cc71a480c566a36)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ServerProxy.java




[jira] [Commented] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels

2015-02-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14332172#comment-14332172
 ] 

Hudson commented on YARN-3238:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2044 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2044/])
YARN-3238. Connection timeouts to nodemanagers are retried at multiple (xgong: 
rev 92d67ace3248930c0c0335070cc71a480c566a36)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ServerProxy.java




[jira] [Commented] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels

2015-02-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14332178#comment-14332178
 ] 

Hudson commented on YARN-3238:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #103 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/103/])
YARN-3238. Connection timeouts to nodemanagers are retried at multiple (xgong: 
rev 92d67ace3248930c0c0335070cc71a480c566a36)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ServerProxy.java
* hadoop-yarn-project/CHANGES.txt




[jira] [Commented] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels

2015-02-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14332198#comment-14332198
 ] 

Hudson commented on YARN-3238:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #112 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/112/])
YARN-3238. Connection timeouts to nodemanagers are retried at multiple (xgong: 
rev 92d67ace3248930c0c0335070cc71a480c566a36)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ServerProxy.java
* hadoop-yarn-project/CHANGES.txt




[jira] [Commented] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels

2015-02-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14332226#comment-14332226
 ] 

Hudson commented on YARN-3238:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2062 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2062/])
YARN-3238. Connection timeouts to nodemanagers are retried at multiple (xgong: 
rev 92d67ace3248930c0c0335070cc71a480c566a36)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ServerProxy.java




[jira] [Commented] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels

2015-02-21 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14331958#comment-14331958
 ] 

Xuan Gong commented on YARN-3238:
-

Committed to trunk and branch-2. Thanks, Jason!



[jira] [Commented] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels

2015-02-21 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14331991#comment-14331991
 ] 

Jian He commented on YARN-3238:
---

I think this is related to the RetryPolicy library we use from the common 
module. The implementation of {{RetryPolicies.retryUpToMaximumTimeWithFixedSleep}} 
doesn't match its semantics: it should retry based on the overall time taken 
rather than on the number of retries. HADOOP-11398 is trying to fix this.
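A minimal Java sketch of the distinction drawn in this comment, under stated assumptions: it models the count-based behavior the comment describes (a retry limit derived up front as maxTime / sleepTime) against a policy that checks actual elapsed time. It does not call Hadoop's RetryPolicies; the class and method names are hypothetical, time is simulated rather than measured, and the 15-minute budget, 10-second sleep, and 20-second failing attempt are illustrative values.

```java
/**
 * Hypothetical comparison (not Hadoop's RetryPolicies) of two ways to bound
 * retries by a time budget. Time is simulated, so the result is deterministic.
 */
public class RetryBudgetSketch {

    /** Count-based: converts the time budget into a fixed retry count up front. */
    static boolean countBasedShouldRetry(int retriesSoFar, long maxTimeMs, long sleepMs) {
        long maxRetries = maxTimeMs / sleepMs; // ignores how long each attempt takes
        return retriesSoFar < maxRetries;
    }

    /** Time-based: retries only if the next sleep still fits in the elapsed-time budget. */
    static boolean timeBasedShouldRetry(long elapsedMs, long maxTimeMs, long sleepMs) {
        return elapsedMs + sleepMs <= maxTimeMs;
    }

    /**
     * Simulates a call that always fails after attemptMs, with sleepMs between
     * attempts, and returns the simulated elapsed time when the policy gives up.
     */
    static long elapsedAtGiveUp(boolean timeBased, long maxTimeMs, long sleepMs, long attemptMs) {
        long elapsed = 0;
        int retries = 0;
        while (true) {
            elapsed += attemptMs; // the attempt runs and fails
            boolean retry = timeBased
                    ? timeBasedShouldRetry(elapsed, maxTimeMs, sleepMs)
                    : countBasedShouldRetry(retries, maxTimeMs, sleepMs);
            if (!retry) {
                return elapsed;
            }
            elapsed += sleepMs; // fixed sleep before the next attempt
            retries++;
        }
    }

    public static void main(String[] args) {
        long maxTimeMs = 15 * 60_000L; // illustrative 15-minute budget
        long sleepMs = 10_000L;        // illustrative fixed sleep
        long attemptMs = 20_000L;      // illustrative time for one failing attempt

        System.out.println("count-based gives up after "
                + elapsedAtGiveUp(false, maxTimeMs, sleepMs, attemptMs) / 60_000 + " min");
        System.out.println("time-based gives up after "
                + elapsedAtGiveUp(true, maxTimeMs, sleepMs, attemptMs) / 60_000 + " min");
    }
}
```

With these numbers the count-based version allows 90 retries no matter how long each attempt takes and gives up after roughly 45 minutes, while the time-based version stops after roughly 15 minutes (it can overshoot the budget by at most one attempt's duration). When attempts fail instantly, both stop at exactly 15 minutes.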



[jira] [Commented] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels

2015-02-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14331960#comment-14331960
 ] 

Hudson commented on YARN-3238:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7175 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7175/])
YARN-3238. Connection timeouts to nodemanagers are retried at multiple (xgong: 
rev 92d67ace3248930c0c0335070cc71a480c566a36)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ServerProxy.java
* hadoop-yarn-project/CHANGES.txt




[jira] [Commented] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels

2015-02-21 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14331954#comment-14331954
 ] 

Xuan Gong commented on YARN-3238:
-

+1 LGTM. Will commit.



[jira] [Commented] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels

2015-02-20 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329998#comment-14329998
 ] 

Mit Desai commented on YARN-3238:
-

+1 (non-binding)
Looks good to me.



[jira] [Commented] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels

2015-02-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329818#comment-14329818
 ] 

Hadoop QA commented on YARN-3238:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12699964/YARN-3238.001.patch
  against trunk revision f56c65b.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6687//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6687//console

This message is automatically generated.
