[jira] [Commented] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels
[ https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17032359#comment-17032359 ]

Jason Darrell Lowe commented on YARN-3238:
------------------------------------------

This did go into 3.0.0-alpha1 (see commit 92d67ace3248930c0c0335070cc71a480c566a36) but was later superseded by YARN-4414.

> Connection timeouts to nodemanagers are retried at multiple levels
> ------------------------------------------------------------------
>
> Key: YARN-3238
> URL: https://issues.apache.org/jira/browse/YARN-3238
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.6.0
> Reporter: Jason Darrell Lowe
> Assignee: Jason Darrell Lowe
> Priority: Blocker
> Labels: 2.6.1-candidate
> Fix For: 2.7.0, 2.6.1, 3.0.0-alpha1
>
> Attachments: YARN-3238.001.patch
>
>
> The IPC layer will retry connection timeouts automatically (see Client.java),
> but we are also retrying them with YARN's RetryPolicy put in place when the
> NM proxy is created. This causes a two-level retry mechanism where the IPC
> layer has already retried quite a few times (45 by default) for each YARN
> RetryPolicy error that is retried. The end result is that NM clients can
> wait a very, very long time for the connection to finally fail.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
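The two-level retry described in the issue multiplies rather than adds: each attempt made by the outer (YARN RetryPolicy) layer can itself contain every attempt of the inner (IPC) layer. The following is a minimal arithmetic sketch of that effect, not Hadoop code. Only the figure of 45 inner attempts comes from the issue text; the 20-second connect timeout, the outer attempt count, and the outer sleep are assumptions chosen for illustration, and the class and method names are hypothetical.

```java
import java.time.Duration;

public class NestedRetryEstimate {

    /** Worst case for one outer attempt: every inner attempt waits out the full connect timeout. */
    public static Duration innerWorstCase(Duration connectTimeout, int innerAttempts) {
        return connectTimeout.multipliedBy(innerAttempts);
    }

    /** Worst case overall: each outer attempt pays the full inner cost, plus a sleep between outer attempts. */
    public static Duration totalWorstCase(Duration connectTimeout, int innerAttempts,
                                          int outerAttempts, Duration outerSleep) {
        Duration perOuterAttempt = innerWorstCase(connectTimeout, innerAttempts);
        int sleeps = Math.max(0, outerAttempts - 1);
        return perOuterAttempt.multipliedBy(outerAttempts)
                .plus(outerSleep.multipliedBy(sleeps));
    }

    public static void main(String[] args) {
        Duration connectTimeout = Duration.ofSeconds(20); // assumed per-attempt connect timeout
        int innerAttempts = 45;                           // "45 by default" per the issue text
        int outerAttempts = 4;                            // hypothetical outer retry count
        Duration outerSleep = Duration.ofSeconds(10);     // hypothetical sleep between outer attempts

        Duration inner = innerWorstCase(connectTimeout, innerAttempts);
        Duration total = totalWorstCase(connectTimeout, innerAttempts, outerAttempts, outerSleep);

        System.out.println("one outer attempt: " + inner.toMinutes() + " min");
        System.out.println("all outer attempts: " + total.toMinutes() + " min");
    }
}
```

Under these assumed numbers a single outer attempt already takes 15 minutes before the outer policy sees one failure, which is why removing one of the two retry levels shortens the wait so sharply.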
[jira] [Commented] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels
[ https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17032029#comment-17032029 ]

Lakshmi Manasa Gaduputi commented on YARN-3238:
-----------------------------------------------

This patch is not available in 3.0.0-alpha1, neither in branch-3.0.0-alpha1
(https://github.com/apache/hadoop/blob/branch-3.0.0-alpha1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ServerProxy.java#L80)
nor in release-3.0.0-alpha1
(https://github.com/apache/hadoop/blob/release-3.0.0-alpha1-RC0/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ServerProxy.java#L80).

A lot of the versions do not have this patch. Is there a reason for not pulling it into higher versions? Did something underlying change so that this patch is no longer needed?
[jira] [Commented] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels
[ https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711738#comment-14711738 ]

Benoit Sigoure commented on YARN-3238:
--------------------------------------

What's the setting to tune down to avoid the 45min timeout? I'd like the code to fail fast.
[jira] [Commented] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels
[ https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642805#comment-14642805 ]

Sangjin Lee commented on YARN-3238:
-----------------------------------

The patch applies to 2.6.0 cleanly.
[jira] [Commented] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels
[ https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332136#comment-14332136 ]

Hudson commented on YARN-3238:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #112 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/112/])
YARN-3238. Connection timeouts to nodemanagers are retried at multiple (xgong: rev 92d67ace3248930c0c0335070cc71a480c566a36)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ServerProxy.java
[jira] [Commented] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels
[ https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332131#comment-14332131 ]

Hudson commented on YARN-3238:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk #846 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/846/])
YARN-3238. Connection timeouts to nodemanagers are retried at multiple (xgong: rev 92d67ace3248930c0c0335070cc71a480c566a36)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ServerProxy.java
[jira] [Commented] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels
[ https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332172#comment-14332172 ]

Hudson commented on YARN-3238:
------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk #2044 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2044/])
YARN-3238. Connection timeouts to nodemanagers are retried at multiple (xgong: rev 92d67ace3248930c0c0335070cc71a480c566a36)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ServerProxy.java
[jira] [Commented] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels
[ https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332178#comment-14332178 ]

Hudson commented on YARN-3238:
------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #103 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/103/])
YARN-3238. Connection timeouts to nodemanagers are retried at multiple (xgong: rev 92d67ace3248930c0c0335070cc71a480c566a36)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ServerProxy.java
* hadoop-yarn-project/CHANGES.txt
[jira] [Commented] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels
[ https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332198#comment-14332198 ]

Hudson commented on YARN-3238:
------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #112 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/112/])
YARN-3238. Connection timeouts to nodemanagers are retried at multiple (xgong: rev 92d67ace3248930c0c0335070cc71a480c566a36)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ServerProxy.java
* hadoop-yarn-project/CHANGES.txt
[jira] [Commented] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels
[ https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332226#comment-14332226 ]

Hudson commented on YARN-3238:
------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2062 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2062/])
YARN-3238. Connection timeouts to nodemanagers are retried at multiple (xgong: rev 92d67ace3248930c0c0335070cc71a480c566a36)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ServerProxy.java
[jira] [Commented] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels
[ https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14331958#comment-14331958 ]

Xuan Gong commented on YARN-3238:
---------------------------------

Committed into trunk/branch-2. Thanks, Jason!
[jira] [Commented] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels
[ https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14331991#comment-14331991 ]

Jian He commented on YARN-3238:
-------------------------------

I think this is related to the RetryPolicy library we use from the common module. The implementation of {{RetryPolicies.retryUpToMaximumTimeWithFixedSleep}} doesn't match its semantics: it should retry based on the overall time taken instead of the number of retries. HADOOP-11398 is trying to fix this.
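The distinction drawn in the comment above, between a retry budget counted in attempts and one measured in elapsed time, can be sketched with a small simulation. This is an illustrative model, not the Hadoop implementation: the class and method names are hypothetical, deriving the attempt count as `maxTime / sleep` is an assumption made for the sketch, and time is simulated with plain counters rather than real sleeps.

```java
public class RetryBudgetSketch {

    /**
     * Count-based budget: the number of retries is fixed up front as maxTime / sleep,
     * so the time each failed attempt consumes is never charged against the budget.
     */
    public static long elapsedWithCountBudget(long maxTimeMs, long sleepMs, long attemptCostMs) {
        long maxRetries = maxTimeMs / sleepMs;
        long elapsed = attemptCostMs;            // the initial attempt fails
        for (long i = 0; i < maxRetries; i++) {
            elapsed += sleepMs + attemptCostMs;  // sleep, then another failed attempt
        }
        return elapsed;
    }

    /**
     * Time-based budget: another retry is started only while the overall elapsed
     * time is still below maxTime, so slow attempts use up the budget sooner.
     */
    public static long elapsedWithDeadline(long maxTimeMs, long sleepMs, long attemptCostMs) {
        long elapsed = attemptCostMs;            // the initial attempt fails
        while (elapsed < maxTimeMs) {
            elapsed += sleepMs + attemptCostMs;
        }
        return elapsed;
    }

    public static void main(String[] args) {
        long maxTimeMs = 60_000;     // hypothetical overall budget
        long sleepMs = 10_000;       // hypothetical fixed sleep between retries
        long attemptCostMs = 30_000; // hypothetical time each failed attempt takes

        System.out.println("count-based:    " + elapsedWithCountBudget(maxTimeMs, sleepMs, attemptCostMs) + " ms");
        System.out.println("deadline-based: " + elapsedWithDeadline(maxTimeMs, sleepMs, attemptCostMs) + " ms");
    }
}
```

When attempts fail instantly (`attemptCostMs = 0`) the two budgets give the same total in this model; they diverge only when each attempt is itself slow, which is the situation the issue describes, where every attempt sits behind the IPC layer's own retries.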
[jira] [Commented] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels
[ https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14331960#comment-14331960 ]

Hudson commented on YARN-3238:
------------------------------

FAILURE: Integrated in Hadoop-trunk-Commit #7175 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7175/])
YARN-3238. Connection timeouts to nodemanagers are retried at multiple (xgong: rev 92d67ace3248930c0c0335070cc71a480c566a36)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ServerProxy.java
* hadoop-yarn-project/CHANGES.txt
[jira] [Commented] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels
[ https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14331954#comment-14331954 ]

Xuan Gong commented on YARN-3238:
---------------------------------

+1 LGTM. Will commit
[jira] [Commented] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels
[ https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329998#comment-14329998 ]

Mit Desai commented on YARN-3238:
---------------------------------

+1 (non binding)
Looks good to me
[jira] [Commented] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels
[ https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329818#comment-14329818 ]

Hadoop QA commented on YARN-3238:
---------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12699964/YARN-3238.001.patch
  against trunk revision f56c65b.

    +1 @author. The patch does not contain any @author tags.

    -1 tests included. The patch doesn't appear to include any new or modified tests.
       Please justify why no new tests are needed for this patch.
       Also please list what manual steps were performed to verify this patch.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 javadoc. There were no new javadoc warning messages.

    +1 eclipse:eclipse. The patch built with eclipse:eclipse.

    +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6687//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6687//console

This message is automatically generated.