[jira] [Commented] (YARN-2010) Handle app-recovery failures gracefully
[ https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195847#comment-14195847 ] Karthik Kambatla commented on YARN-2010: Jian - thanks for taking the time to look at this closely. The patch looks mostly good. However, it only catches delegation-token renewal issues during app recovery; errors on the remaining recovery code paths are still not handled gracefully. For instance, an error during app.recoverAppAttempts can still affect the health of the RM. > Handle app-recovery failures gracefully > --- > > Key: YARN-2010 > URL: https://issues.apache.org/jira/browse/YARN-2010 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager > Affects Versions: 2.3.0 > Reporter: bc Wong > Assignee: Karthik Kambatla > Priority: Blocker > Attachments: YARN-2010.1.patch, YARN-2010.patch, issue-stacktrace.rtf, yarn-2010-10.patch, yarn-2010-11.patch, yarn-2010-2.patch, yarn-2010-3.patch, yarn-2010-3.patch, yarn-2010-4.patch, yarn-2010-5.patch, yarn-2010-6.patch, yarn-2010-7.patch, yarn-2010-8.patch, yarn-2010-9.patch > > > Sometimes, the RM fails to recover an application. The cause could be turning security on, token expiry, issues connecting to HDFS, etc. The causes can be classified as (1) transient, (2) specific to one application, or (3) permanent and applying to multiple (or all) applications. Today, the RM fails to transition to Active, ends up in the STOPPED state, and can never be transitioned to Active again. > The initial stacktrace reported is at https://issues.apache.org/jira/secure/attachment/12676476/issue-stacktrace.rtf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
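The graceful handling discussed above amounts to isolating per-application failures so that one unrecoverable app cannot abort the whole recovery and leave the RM in STOPPED state. A minimal sketch of that pattern follows; the class and method names are hypothetical illustrations, not the actual YARN-2010 patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: recover each application independently so a
// per-app failure is recorded instead of aborting the whole recovery.
public class RecoverySketch {
    static class AppState {
        final String appId;
        AppState(String appId) { this.appId = appId; }
    }

    // Returns the apps whose recovery failed; the caller can then decide
    // whether the failures are transient, app-specific, or fatal.
    static List<String> recoverAll(List<AppState> apps) {
        List<String> failed = new ArrayList<>();
        for (AppState app : apps) {
            try {
                recoverApplication(app);
            } catch (Exception e) {
                // App-specific failure: record and continue, rather than
                // letting the RM fail its transition to Active.
                failed.add(app.appId);
            }
        }
        return failed;
    }

    // Stand-in for the real recovery work (token renewal, attempt recovery).
    static void recoverApplication(AppState app) throws Exception {
        if (app.appId.contains("expired-token")) {
            throw new Exception("token renewal failed for " + app.appId);
        }
    }

    public static void main(String[] args) {
        List<AppState> apps = new ArrayList<>();
        apps.add(new AppState("app-1"));
        apps.add(new AppState("app-2-expired-token"));
        apps.add(new AppState("app-3"));
        List<String> failed = recoverAll(apps);
        System.out.println(failed.size() + " app(s) failed recovery: " + failed);
    }
}
```

Whether a recorded failure should then be retried, surfaced, or treated as fatal depends on which of the three cause classes it falls into.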
[jira] [Commented] (YARN-2802) add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue.
[ https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195819#comment-14195819 ] Hadoop QA commented on YARN-2802: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12679156/YARN-2802.001.patch against trunk revision 2bb327e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5719//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5719//console This message is automatically generated. > add AM container launch and register delay metrics in QueueMetrics to help > diagnose performance issue. 
> -- > > Key: YARN-2802 > URL: https://issues.apache.org/jira/browse/YARN-2802 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2802.000.patch, YARN-2802.001.patch > > > add AM container launch and register delay metrics in QueueMetrics to help > diagnose performance issue. > Added two metrics in QueueMetrics: > aMLaunchDelay: the time spent from sending event AMLauncherEventType.LAUNCH > to receiving event RMAppAttemptEventType.LAUNCHED in RMAppAttemptImpl. > aMRegisterDelay: the time waiting from receiving event > RMAppAttemptEventType.LAUNCHED to receiving event > RMAppAttemptEventType.REGISTERED(ApplicationMasterService#registerApplicationMaster) > in RMAppAttemptImpl. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
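The two delay metrics described above are elapsed-time measurements between attempt lifecycle events. A minimal sketch of the idea, using a hypothetical class rather than the actual QueueMetrics code:

```java
// Hypothetical sketch of per-attempt delay tracking in the spirit of
// aMLaunchDelay / aMRegisterDelay. Timestamps are supplied explicitly
// so the arithmetic is easy to see; real code would use a clock.
public class AmDelayMetrics {
    private long launchRequestTime;   // AMLauncherEventType.LAUNCH sent
    private long launchedTime;        // RMAppAttemptEventType.LAUNCHED received
    private long registeredTime;      // RMAppAttemptEventType.REGISTERED received

    void onLaunchRequested(long now) { launchRequestTime = now; }
    void onLaunched(long now)        { launchedTime = now; }
    void onRegistered(long now)      { registeredTime = now; }

    // Time from asking the launcher to start the AM to the AM being launched.
    long launchDelayMs()   { return launchedTime - launchRequestTime; }
    // Time from launch to the AM registering with the RM.
    long registerDelayMs() { return registeredTime - launchedTime; }

    public static void main(String[] args) {
        AmDelayMetrics m = new AmDelayMetrics();
        m.onLaunchRequested(1000);
        m.onLaunched(1400);
        m.onRegistered(2100);
        System.out.println("launchDelay=" + m.launchDelayMs()
            + "ms registerDelay=" + m.registerDelayMs() + "ms");
    }
}
```

In the real patch these values would feed a metrics sink (e.g. a rate metric in QueueMetrics) rather than being stored per object.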
[jira] [Updated] (YARN-2802) add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue.
[ https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2802: Issue Type: Improvement (was: Bug) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2802) add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue.
[ https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195746#comment-14195746 ] zhihai xu commented on YARN-2802: - attached the patch YARN-2802.001.patch to fix the test error. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2802) add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue.
[ https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2802: Attachment: YARN-2802.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2802) add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue.
[ https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195658#comment-14195658 ] Hadoop QA commented on YARN-2802: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12679114/YARN-2802.000.patch against trunk revision c5a46d4. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.TestRMAppAttemptTransitions org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage org.apache.hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. 
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5718//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5718//console This message is automatically generated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2010) Handle app-recovery failures gracefully
[ https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195647#comment-14195647 ] Hadoop QA commented on YARN-2010: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12679113/yarn-2010-11.patch against trunk revision c5a46d4. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5717//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5717//console This message is automatically generated. 
[jira] [Commented] (YARN-2505) Support get/add/remove/change labels in RM REST API
[ https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195634#comment-14195634 ] Xuan Gong commented on YARN-2505: - +1 for the latest patch. Leaving it until tomorrow in case Vinod has further comments. > Support get/add/remove/change labels in RM REST API > --- > > Key: YARN-2505 > URL: https://issues.apache.org/jira/browse/YARN-2505 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager > Reporter: Wangda Tan > Assignee: Craig Welch > Attachments: YARN-2505.1.patch, YARN-2505.11.patch, YARN-2505.12.patch, YARN-2505.13.patch, YARN-2505.14.patch, YARN-2505.15.patch, YARN-2505.3.patch, YARN-2505.4.patch, YARN-2505.5.patch, YARN-2505.6.patch, YARN-2505.7.patch, YARN-2505.8.patch, YARN-2505.9.patch, YARN-2505.9.patch, YARN-2505.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2604) Scheduler should consider max-allocation-* in conjunction with the largest node
[ https://issues.apache.org/jira/browse/YARN-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195630#comment-14195630 ] Hadoop QA commented on YARN-2604: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12679108/YARN-2604.patch against trunk revision c5a46d4. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5716//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5716//console This message is automatically generated. 
> Scheduler should consider max-allocation-* in conjunction with the largest > node > --- > > Key: YARN-2604 > URL: https://issues.apache.org/jira/browse/YARN-2604 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.5.1 >Reporter: Karthik Kambatla >Assignee: Robert Kanter > Attachments: YARN-2604.patch, YARN-2604.patch, YARN-2604.patch > > > If the scheduler max-allocation-* values are larger than the resources > available on the largest node in the cluster, an application requesting > resources between the two values will be accepted by the scheduler but the > requests will never be satisfied. The app essentially hangs forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
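The improvement described above is essentially a sanity check: a request should be validated against both the configured max-allocation-* values and the largest node actually in the cluster, since the scheduler can never place a container larger than the largest node. A hypothetical sketch of the check (not the actual patch):

```java
// Hypothetical sketch: cap the effective maximum allocation at the
// largest node's capacity, so unsatisfiable requests are rejected
// up front instead of hanging forever.
public class MaxAllocationCheck {
    static long effectiveMaxMb(long configuredMaxMb, long largestNodeMb) {
        // No node can ever satisfy more than the largest node holds.
        return Math.min(configuredMaxMb, largestNodeMb);
    }

    static boolean isSatisfiable(long requestMb, long configuredMaxMb,
                                 long largestNodeMb) {
        return requestMb <= effectiveMaxMb(configuredMaxMb, largestNodeMb);
    }

    public static void main(String[] args) {
        // max-allocation-mb = 16384, but the largest node has only 8192 MB.
        // A request between the two values is accepted today but can never run.
        System.out.println(isSatisfiable(12288, 16384, 8192)); // false
        System.out.println(isSatisfiable(4096, 16384, 8192));  // true
    }
}
```

A real implementation would also track node removals, since the largest node (and hence the effective maximum) can shrink while the cluster runs.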
[jira] [Commented] (YARN-2505) Support get/add/remove/change labels in RM REST API
[ https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195587#comment-14195587 ] Hadoop QA commented on YARN-2505: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12679097/YARN-2505.15.patch against trunk revision c5a46d4. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5715//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5715//console This message is automatically generated. 
[jira] [Commented] (YARN-2010) Handle app-recovery failures gracefully
[ https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195581#comment-14195581 ] Jian He commented on YARN-2010: --- [~kasha], I reviewed your patch and made some edits on top of it; could you please take a look? Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2802) add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue.
[ https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2802: Attachment: YARN-2802.000.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2802) add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue.
[ https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2802: Attachment: (was: YARN-2802.000.patch) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2010) Handle app-recovery failures gracefully
[ https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2010: -- Attachment: yarn-2010-11.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2604) Scheduler should consider max-allocation-* in conjunction with the largest node
[ https://issues.apache.org/jira/browse/YARN-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated YARN-2604: Attachment: YARN-2604.patch The findbugs warning had to do with the lock I added. During initialization it doesn't use the lock, which should be fine, so I've added a findbugs exclusion for it. (Also, the findbugs warning seemed backwards about which places it labeled synchronized and unsynchronized.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2604) Scheduler should consider max-allocation-* in conjunction with the largest node
[ https://issues.apache.org/jira/browse/YARN-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195536#comment-14195536 ] Hadoop QA commented on YARN-2604: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12679085/YARN-2604.patch against trunk revision 35d353e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5713//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5713//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5713//console This message is automatically generated. 
[jira] [Commented] (YARN-2804) Timeline server .out log have JAXB binding exceptions and warnings.
[ https://issues.apache.org/jira/browse/YARN-2804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195535#comment-14195535 ] Hadoop QA commented on YARN-2804: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12679093/YARN-2804.1.patch against trunk revision c5a46d4. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5714//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5714//console This message is automatically generated. > Timeline server .out log have JAXB binding exceptions and warnings. 
> --- > > Key: YARN-2804 > URL: https://issues.apache.org/jira/browse/YARN-2804 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zhijie Shen >Assignee: Zhijie Shen >Priority: Critical > Attachments: YARN-2804.1.patch > > > Unlike other daemon, timeline server binds JacksonJaxbJsonProvider to resolve > the resources. However, there are noises in .out log: > {code} > SEVERE: Failed to generate the schema for the JAX-B elements > com.sun.xml.bind.v2.runtime.IllegalAnnotationsException: 2 counts of > IllegalAnnotationExceptions > java.util.Map is an interface, and JAXB can't handle interfaces. > this problem is related to the following location: > at java.util.Map > at public java.util.Map > org.apache.hadoop.yarn.api.records.timeline.TimelineEvent.getEventInfo() > at org.apache.hadoop.yarn.api.records.timeline.TimelineEvent > at public java.util.List > org.apache.hadoop.yarn.api.records.timeline.TimelineEntity.getEvents() > at org.apache.hadoop.yarn.api.records.timeline.TimelineEntity > at public java.util.List > org.apache.hadoop.yarn.api.records.timeline.TimelineEntities.getEntities() > at org.apache.hadoop.yarn.api.records.timeline.TimelineEntities > java.util.Map does not have a no-arg default constructor. 
> this problem is related to the following location: > at java.util.Map > at public java.util.Map > org.apache.hadoop.yarn.api.records.timeline.TimelineEvent.getEventInfo() > at org.apache.hadoop.yarn.api.records.timeline.TimelineEvent > at public java.util.List > org.apache.hadoop.yarn.api.records.timeline.TimelineEntity.getEvents() > at org.apache.hadoop.yarn.api.records.timeline.TimelineEntity > at public java.util.List > org.apache.hadoop.yarn.api.records.timeline.TimelineEntities.getEntities() > at org.apache.hadoop.yarn.api.records.timeline.TimelineEntities > at > com.sun.xml.bind.v2.runtime.IllegalAnnotationsException$Builder.check(IllegalAnnotationsException.java:106) > at > com.sun.xml.bind.v2.runtime.JAXBContextImpl.getTypeInfoSet(JAXBContextImpl.java:489) > at > com.sun.xml.bind.v2.runtime.JAXBContextImpl.(JAXBContextImpl.java:319) > at > com.sun.xml.bind.v2.runtime.JAXBContextImpl$JAXBContextBuilder.build(JAXBContextImpl.java:1170) > at > com.sun.xml.bind.v2.ContextFactory.createContext(ContextFactory.java:145) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:248) >
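The root cause in the trace above is that JAXB cannot bind the java.util.Map interface directly (no concrete type, no no-arg constructor). One standard remedy, shown here as a generic sketch rather than the fix applied in this JIRA, is the XmlAdapter pattern: marshal the Map through a concrete, bindable entry type.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Generic sketch of the XmlAdapter pattern. In real JAXB code this
// logic would live in a class extending
// javax.xml.bind.annotation.adapters.XmlAdapter and be registered on
// the getter with @XmlJavaTypeAdapter; the conversion itself is shown
// here as plain methods so the sketch runs on any JDK.
public class MapAdapterSketch {
    // Bindable stand-in for a map entry: concrete, public fields,
    // no-arg constructor -- everything java.util.Map lacks for JAXB.
    static class MapEntry {
        public String key;
        public String value;
        public MapEntry() { }
        MapEntry(String k, String v) { key = k; value = v; }
    }

    static List<MapEntry> marshal(Map<String, String> map) {
        List<MapEntry> out = new ArrayList<>();
        for (Map.Entry<String, String> e : map.entrySet()) {
            out.add(new MapEntry(e.getKey(), e.getValue()));
        }
        return out;
    }

    static Map<String, String> unmarshal(List<MapEntry> entries) {
        Map<String, String> map = new LinkedHashMap<>();
        for (MapEntry e : entries) {
            map.put(e.key, e.value);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> eventInfo = new LinkedHashMap<>();
        eventInfo.put("eventType", "APP_STARTED");
        // Round-trip through the bindable form loses nothing.
        System.out.println(unmarshal(marshal(eventInfo)).equals(eventInfo));
    }
}
```

Whether an adapter, a concrete HashMap field, or simply suppressing the JAXB schema-generation noise is the right fix here depends on the patch; the sketch only illustrates why the Map interface trips JAXB up.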
[jira] [Commented] (YARN-2803) MR distributed cache not working correctly on Windows after NodeManager privileged account changes.
[ https://issues.apache.org/jira/browse/YARN-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195532#comment-14195532 ] Craig Welch commented on YARN-2803: --- I'll have a look. [~rusanu], did this happen when you ran the unit tests? Can you have a look as well? > MR distributed cache not working correctly on Windows after NodeManager > privileged account changes. > --- > > Key: YARN-2803 > URL: https://issues.apache.org/jira/browse/YARN-2803 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Chris Nauroth >Priority: Critical > > This problem is visible by running {{TestMRJobs#testDistributedCache}} or > {{TestUberAM#testDistributedCache}} on Windows. Both tests fail. Running > git bisect, I traced it to the YARN-2198 patch to remove the need to run > NodeManager as a privileged account. The tests started failing when that > patch was committed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2738) Add FairReservationSystem for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195526#comment-14195526 ] Subru Krishnan commented on YARN-2738: -- [~adhoot], I personally prefer per-queue configuration, and not just because it enables configuration of org-specific agents/policies. I do not believe it adds significant overhead, while at the same time it lets the reservation system run side by side with the existing queue mechanism. It provides greater flexibility for trying out reservations on only part of the cluster, as partitioned by a leaf queue, and for phased migration if required. What is the additional complexity of per-queue versus system-wide settings, given that we have global defaults which should work for the majority of scenarios? > Add FairReservationSystem for FairScheduler > --- > > Key: YARN-2738 > URL: https://issues.apache.org/jira/browse/YARN-2738 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-2738.001.patch, YARN-2738.002.patch > > > Need to create a FairReservationSystem that will implement ReservationSystem > for FairScheduler -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2802) add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue.
[ https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195524#comment-14195524 ] Hadoop QA commented on YARN-2802: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12679084/YARN-2802.000.patch against trunk revision 35d353e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup org.apache.hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.TestRMAppAttemptTransitions {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. 
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5712//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5712//console This message is automatically generated. > add AM container launch and register delay metrics in QueueMetrics to help > diagnose performance issue. > -- > > Key: YARN-2802 > URL: https://issues.apache.org/jira/browse/YARN-2802 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2802.000.patch > > > add AM container launch and register delay metrics in QueueMetrics to help > diagnose performance issue. > Added two metrics in QueueMetrics: > aMLaunchDelay: the time spent from sending event AMLauncherEventType.LAUNCH > to receiving event RMAppAttemptEventType.LAUNCHED in RMAppAttemptImpl. > aMRegisterDelay: the time waiting from receiving event > RMAppAttemptEventType.LAUNCHED to receiving event > RMAppAttemptEventType.REGISTERED(ApplicationMasterService#registerApplicationMaster) > in RMAppAttemptImpl. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
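The two metrics proposed above each measure the elapsed time between two RMAppAttempt lifecycle events. A minimal sketch of that bookkeeping follows; the class and method names are hypothetical, not the actual QueueMetrics API, which is built on Hadoop's metrics2 library:

```java
// Hypothetical sketch of the two proposed delay metrics. The real patch
// records these through QueueMetrics; here plain longs stand in for the
// metrics2 gauges so the timing arithmetic is visible.
class AmDelaySketch {
    private long launchEventTime;   // AMLauncherEventType.LAUNCH sent
    private long launchedEventTime; // RMAppAttemptEventType.LAUNCHED received
    private long registeredTime;    // RMAppAttemptEventType.REGISTERED received

    void onLaunch(long now)     { launchEventTime = now; }
    void onLaunched(long now)   { launchedEventTime = now; }
    void onRegistered(long now) { registeredTime = now; }

    // aMLaunchDelay: LAUNCH sent -> LAUNCHED received
    long launchDelay()   { return launchedEventTime - launchEventTime; }

    // aMRegisterDelay: LAUNCHED received -> REGISTERED received
    long registerDelay() { return registeredTime - launchedEventTime; }
}
```

A large launchDelay points at slow container allocation/launch on the NM side, while a large registerDelay points at slow AM startup before it calls registerApplicationMaster.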
[jira] [Commented] (YARN-2505) Support get/add/remove/change labels in RM REST API
[ https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195504#comment-14195504 ] Wangda Tan commented on YARN-2505: -- host or host:0 should be host *and* host:0 > Support get/add/remove/change labels in RM REST API > --- > > Key: YARN-2505 > URL: https://issues.apache.org/jira/browse/YARN-2505 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Craig Welch > Attachments: YARN-2505.1.patch, YARN-2505.11.patch, > YARN-2505.12.patch, YARN-2505.13.patch, YARN-2505.14.patch, > YARN-2505.15.patch, YARN-2505.3.patch, YARN-2505.4.patch, YARN-2505.5.patch, > YARN-2505.6.patch, YARN-2505.7.patch, YARN-2505.8.patch, YARN-2505.9.patch, > YARN-2505.9.patch, YARN-2505.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2505) Support get/add/remove/change labels in RM REST API
[ https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195499#comment-14195499 ] Wangda Tan commented on YARN-2505: -- bq. As I understand it, it is possible to run more than one nodemanager on a host, in which case they are distinguished by the port they listen on, so there is also a practical/functional reason why the port needs to be retained in the id. Additionally, the node id is well established as the host:port combo throughout, so it's good to keep that consistent. I see. I don't have a strong opinion about whether we should return host or host:0 to the user when port=0. But I still prefer to support both host and host:0 when users input hosts. :) > Support get/add/remove/change labels in RM REST API > --- > > Key: YARN-2505 > URL: https://issues.apache.org/jira/browse/YARN-2505 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Craig Welch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2505) Support get/add/remove/change labels in RM REST API
[ https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2505: -- Attachment: YARN-2505.15.patch This patch drops the single-label-at-a-time operations for both cluster- and node-level labels, to avoid duplication with the group operations. In addition, the cluster node-label and individual node-label aggregate operations have been harmonized to use suffixes on the POST URL (.../add, .../remove, .../replace), keeping them consistent with one another (and enabling group changes everywhere). The only overlap now is between node-to-labels and the node-level node-label operations, but both are likely to be useful in different scenarios, so it makes sense to keep them. > Support get/add/remove/change labels in RM REST API > --- > > Key: YARN-2505 > URL: https://issues.apache.org/jira/browse/YARN-2505 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Craig Welch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2505) Support get/add/remove/change labels in RM REST API
[ https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195483#comment-14195483 ] Craig Welch commented on YARN-2505: --- As I understand it, it is possible to run more than one nodemanager on a host, in which case they are distinguished by the port they listen on, so there is also a practical/functional reason why the port needs to be retained in the id. Additionally, the node id is well established as the host:port combo throughout, so it's good to keep that consistent. > Support get/add/remove/change labels in RM REST API > --- > > Key: YARN-2505 > URL: https://issues.apache.org/jira/browse/YARN-2505 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Craig Welch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2804) Timeline server .out log have JAXB binding exceptions and warnings.
[ https://issues.apache.org/jira/browse/YARN-2804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2804: -- Attachment: YARN-2804.1.patch In the patch, I made a compromise when changing TimelineEntity and TimelineEvent, to keep the Java API compatible while also satisfying JAXB. For the put-domain response, I changed it to return an empty TimelinePutResponse instead of using a Jersey Response. After these changes, the exceptions and the warnings are gone from .out. > Timeline server .out log have JAXB binding exceptions and warnings. > --- > > Key: YARN-2804 > URL: https://issues.apache.org/jira/browse/YARN-2804 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zhijie Shen >Assignee: Zhijie Shen >Priority: Critical > Attachments: YARN-2804.1.patch > >
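The compromise described in the update above reflects a standard JAXB constraint: the schema generator cannot bind interface types such as java.util.Map (no concrete class, no no-arg constructor), so a class can hold a concrete HashMap for binding while keeping the interface in its public API. A rough illustration of the pattern — the class and method names are illustrative, not the actual TimelineEvent code, and the JAXB annotations are omitted to keep the sketch dependency-free:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: JAXB can bind HashMap (concrete type with a no-arg
// constructor) but not the Map interface, so state is stored as HashMap and
// a separate JAXB-facing accessor exposes the concrete type.
class EventInfoSketch {
    private HashMap<String, Object> eventInfo = new HashMap<>();

    // Public API keeps the interface type for callers.
    public Map<String, Object> getEventInfo() { return eventInfo; }

    // A JAXB-facing accessor would carry the binding annotation and return
    // the concrete HashMap; both accessors share the same underlying map.
    public HashMap<String, Object> getEventInfoJAXB() { return eventInfo; }

    public void addEventInfo(String key, Object value) { eventInfo.put(key, value); }
}
```

The point of the compromise is that existing callers still see Map, so the Java API stays compatible, while the schema generator only ever encounters the bindable HashMap.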
[jira] [Commented] (YARN-1922) Process group remains alive after container process is killed externally
[ https://issues.apache.org/jira/browse/YARN-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195478#comment-14195478 ] Vinod Kumar Vavilapalli commented on YARN-1922: --- Thanks for the reviews, [~vvasudev]! > Process group remains alive after container process is killed externally > > > Key: YARN-1922 > URL: https://issues.apache.org/jira/browse/YARN-1922 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.4.0 > Environment: CentOS 6.4 >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi > Fix For: 2.6.0 > > Attachments: YARN-1922.1.patch, YARN-1922.2.patch, YARN-1922.3.patch, > YARN-1922.4.patch, YARN-1922.5.patch, YARN-1922.6.patch > > > If the main container process is killed externally, ContainerLaunch does not > kill the rest of the process group. Before sending the event that results in > the ContainerLaunch.containerCleanup method being called, ContainerLaunch > sets the "completed" flag to true. Then when cleaning up, it doesn't try to > read the pid file if the completed flag is true. If it read the pid file, it > would proceed to send the container a kill signal. In the case of the > DefaultContainerExecutor, this would kill the process group. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2804) Timeline server .out log have JAXB binding exceptions and warnings.
[ https://issues.apache.org/jira/browse/YARN-2804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195475#comment-14195475 ] Zhijie Shen commented on YARN-2804: --- If the map interface issue is resolved, another issue that didn't occur before will show up too: {code} java.lang.IllegalAccessException: Class com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8 can not access a member of class javax.ws.rs.core.Response with modifiers "protected" at sun.reflect.Reflection.ensureMemberAccess(Reflection.java:65) at java.lang.Class.newInstance0(Class.java:349) at java.lang.Class.newInstance(Class.java:308) at com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8.resolve(WadlGeneratorJAXBGrammarGenerator.java:467) at com.sun.jersey.server.wadl.WadlGenerator$ExternalGrammarDefinition.resolve(WadlGenerator.java:181) at com.sun.jersey.server.wadl.ApplicationDescription.resolve(ApplicationDescription.java:81) at com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator.attachTypes(WadlGeneratorJAXBGrammarGenerator.java:518) at com.sun.jersey.server.wadl.WadlBuilder.generate(WadlBuilder.java:124) at com.sun.jersey.server.impl.wadl.WadlApplicationContextImpl.getApplication(WadlApplicationContextImpl.java:104) at com.sun.jersey.server.impl.wadl.WadlResource.getWadl(WadlResource.java:89) {code} This needs to be fixed as well to completely eliminate the excessive logging, though it may become unnecessary if we upgrade Jersey (see [here|https://java.net/projects/jersey/lists/users/archive/2011-10/message/117]). > Timeline server .out log have JAXB binding exceptions and warnings. > --- > > Key: YARN-2804 > URL: https://issues.apache.org/jira/browse/YARN-2804 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zhijie Shen >Assignee: Zhijie Shen >Priority: Critical > >
[jira] [Created] (YARN-2804) Timeline server .out log have JAXB binding exceptions and warnings.
Zhijie Shen created YARN-2804: - Summary: Timeline server .out log have JAXB binding exceptions and warnings. Key: YARN-2804 URL: https://issues.apache.org/jira/browse/YARN-2804 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Critical Unlike other daemon, timeline server binds JacksonJaxbJsonProvider to resolve the resources. However, there are noises in .out log: {code} SEVERE: Failed to generate the schema for the JAX-B elements com.sun.xml.bind.v2.runtime.IllegalAnnotationsException: 2 counts of IllegalAnnotationExceptions java.util.Map is an interface, and JAXB can't handle interfaces. this problem is related to the following location: at java.util.Map at public java.util.Map org.apache.hadoop.yarn.api.records.timeline.TimelineEvent.getEventInfo() at org.apache.hadoop.yarn.api.records.timeline.TimelineEvent at public java.util.List org.apache.hadoop.yarn.api.records.timeline.TimelineEntity.getEvents() at org.apache.hadoop.yarn.api.records.timeline.TimelineEntity at public java.util.List org.apache.hadoop.yarn.api.records.timeline.TimelineEntities.getEntities() at org.apache.hadoop.yarn.api.records.timeline.TimelineEntities java.util.Map does not have a no-arg default constructor. 
this problem is related to the following location: at java.util.Map at public java.util.Map org.apache.hadoop.yarn.api.records.timeline.TimelineEvent.getEventInfo() at org.apache.hadoop.yarn.api.records.timeline.TimelineEvent at public java.util.List org.apache.hadoop.yarn.api.records.timeline.TimelineEntity.getEvents() at org.apache.hadoop.yarn.api.records.timeline.TimelineEntity at public java.util.List org.apache.hadoop.yarn.api.records.timeline.TimelineEntities.getEntities() at org.apache.hadoop.yarn.api.records.timeline.TimelineEntities at com.sun.xml.bind.v2.runtime.IllegalAnnotationsException$Builder.check(IllegalAnnotationsException.java:106) at com.sun.xml.bind.v2.runtime.JAXBContextImpl.getTypeInfoSet(JAXBContextImpl.java:489) at com.sun.xml.bind.v2.runtime.JAXBContextImpl.(JAXBContextImpl.java:319) at com.sun.xml.bind.v2.runtime.JAXBContextImpl$JAXBContextBuilder.build(JAXBContextImpl.java:1170) at com.sun.xml.bind.v2.ContextFactory.createContext(ContextFactory.java:145) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:248) at javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:235) at javax.xml.bind.ContextFinder.find(ContextFinder.java:432) at javax.xml.bind.JAXBContext.newInstance(JAXBContext.java:637) at javax.xml.bind.JAXBContext.newInstance(JAXBContext.java:584) at com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator.buildModelAndSchemas(WadlGeneratorJAXBGrammarGenerator.java:412) at com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator.createExternalGrammar(WadlGeneratorJAXBGrammarGenerator.java:352) at com.sun.jersey.server.wadl.WadlBuilder.generate(WadlBuilder.java:115) at 
com.sun.jersey.server.impl.wadl.WadlApplicationContextImpl.getApplication(WadlApplicationContextImpl.java:104) at com.sun.jersey.server.impl.wadl.WadlApplicationContextImpl.getApplication(WadlApplicationContextImpl.java:120) at com.sun.jersey.server.impl.wadl.WadlMethodFactory$WadlOptionsMethodDispatcher.dispatch(WadlMethodFactory.java:98) at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288) at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108) at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84) at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469) at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400) at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349) at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339) at
[jira] [Commented] (YARN-1922) Process group remains alive after container process is killed externally
[ https://issues.apache.org/jira/browse/YARN-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195468#comment-14195468 ] Hudson commented on YARN-1922: -- FAILURE: Integrated in Hadoop-trunk-Commit #6432 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6432/]) YARN-1922. Fixed NodeManager to kill process-trees correctly in the presence of races between the launch and the stop-container call and when root processes crash. Contributed by Billie Rinaldi. (vinodkv: rev c5a46d4c8ca236ff641a309f256bbbdf4dd56db5) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerLaunch.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java > Process group remains alive after container process is killed externally > > > Key: YARN-1922 > URL: https://issues.apache.org/jira/browse/YARN-1922 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.4.0 > Environment: CentOS 6.4 >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi > >
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2505) Support get/add/remove/change labels in RM REST API
[ https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195460#comment-14195460 ] Wangda Tan commented on YARN-2505: -- bq. -re 1 and 2, there are two kinds of consistency in play here - with the other node label apis and also with current apis in the web service. There are quite a few artifacts in the dao which are working with ids, including node, and they don't use "id" to specify it - I think it's assumed as there's no other way to refer to them in a web service context except via id. So, to stay consistent with the other web service apis, I don't think we should add "id" to the dao names. I think I was wrong above: what you have treats the passed-in string as a nodeId and tries to create a nodeId from it, which is what I expected. And yes, it is consistent with the other methods of the web service. Another thing I noticed is that you used ConverterUtils.toNodeId(..), which only accepts patterns like "host:port". I think we should also let users specify the host alone, without a port. Even though we assume port=0 covers the whole host, if the user doesn't specify a port we should treat it as port=0 rather than failing. Also, I would prefer to return only the host when we return a NodeId to the user and port=0. The port=0 magic is more of an implementation detail, and we should avoid exposing it to the user where we can. Does this make sense to you?
Thanks, Wangda > Support get/add/remove/change labels in RM REST API > --- > > Key: YARN-2505 > URL: https://issues.apache.org/jira/browse/YARN-2505 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Craig Welch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
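The lenient handling discussed in this thread — accept either "host" or "host:port" on input, default a missing port to 0, and hide the ":0" suffix on output — could be sketched roughly as follows. This is a hypothetical helper, not the actual ConverterUtils.toNodeId (which requires the host:port form), and it ignores edge cases such as IPv6 literals:

```java
// Sketch of lenient node-id handling: a missing port on input defaults to 0,
// and the ":0" implementation detail is hidden again on output.
class NodeIdSketch {
    final String host;
    final int port;

    private NodeIdSketch(String host, int port) {
        this.host = host;
        this.port = port;
    }

    static NodeIdSketch parse(String s) {
        int idx = s.lastIndexOf(':');
        if (idx < 0) {
            return new NodeIdSketch(s, 0); // "host" alone: treat as port 0
        }
        return new NodeIdSketch(s.substring(0, idx),
                                Integer.parseInt(s.substring(idx + 1)));
    }

    @Override
    public String toString() {
        // port 0 stands for "the whole host", so don't expose it to the user
        return port == 0 ? host : host + ":" + port;
    }
}
```

Under this scheme "nm1", "nm1:0", and "nm1:45454" all parse, and the first two render back identically as "nm1".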
[jira] [Commented] (YARN-1922) Process group remains alive after container process is killed externally
[ https://issues.apache.org/jira/browse/YARN-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195452#comment-14195452 ] Vinod Kumar Vavilapalli commented on YARN-1922: --- Sorry, I didn't look at your previous comment given the progress on other patches. So, I think we overall need to do the following: {code} while (pidFile is not present && the process has not crashed) { // loop } {code} This is the same as your do {} while {} loop. +1 for your YARN-1922.5.patch. Checking this in. > Process group remains alive after container process is killed externally > > > Key: YARN-1922 > URL: https://issues.apache.org/jira/browse/YARN-1922 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.4.0 > Environment: CentOS 6.4 >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
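The pseudocode in the comment above can be fleshed out into a small polling helper. The names here are hypothetical and the real ContainerLaunch logic differs in detail; this only illustrates the loop's shape: keep waiting for the pid file unless the launched process is already known to have crashed, in which case nothing will ever write it.

```java
import java.io.File;
import java.util.function.BooleanSupplier;

// Sketch of the wait loop: poll for the pid file until it appears, the
// process crashes, or we exhaust the attempt budget.
class PidFileWaiter {
    // Returns true if the pid file showed up, false if the process crashed
    // (or we gave up) before the file was written.
    static boolean waitForPidFile(File pidFile, BooleanSupplier processCrashed,
                                  int maxAttempts, long sleepMillis) {
        for (int i = 0; i < maxAttempts; i++) {
            if (pidFile.exists()) {
                return true;
            }
            if (processCrashed.getAsBoolean()) {
                return false; // nothing left to write the pid file
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return pidFile.exists();
            }
        }
        return pidFile.exists();
    }
}
```

Reading the pid file once it appears is what lets cleanup signal the container's process group even when the main process was killed externally.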
[jira] [Commented] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor
[ https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195453#comment-14195453 ] Chris Nauroth commented on YARN-2198: - It appears that this patch has broken some MR distributed cache functionality on Windows, or at least caused a failure in {{TestMRJobs#testDistributedCache}}. Please see YARN-2803 for more details. > Remove the need to run NodeManager as privileged account for Windows Secure > Container Executor > -- > > Key: YARN-2198 > URL: https://issues.apache.org/jira/browse/YARN-2198 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Remus Rusanu >Assignee: Remus Rusanu > Labels: security, windows > Fix For: 2.6.0 > > Attachments: .YARN-2198.delta.10.patch, YARN-2198.1.patch, > YARN-2198.11.patch, YARN-2198.12.patch, YARN-2198.13.patch, > YARN-2198.14.patch, YARN-2198.15.patch, YARN-2198.16.patch, > YARN-2198.2.patch, YARN-2198.3.patch, YARN-2198.delta.4.patch, > YARN-2198.delta.5.patch, YARN-2198.delta.6.patch, YARN-2198.delta.7.patch, > YARN-2198.separation.patch, YARN-2198.trunk.10.patch, > YARN-2198.trunk.4.patch, YARN-2198.trunk.5.patch, YARN-2198.trunk.6.patch, > YARN-2198.trunk.8.patch, YARN-2198.trunk.9.patch > > > YARN-1972 introduces a Secure Windows Container Executor. However, this > executor requires the process launching the container to be LocalSystem or a > member of a local Administrators group. Since the process in question is > the NodeManager, the requirement translates to the entire NM running as a > privileged account, a very large surface area to review and protect. > This proposal is to move the privileged operations into a dedicated NT > service. The NM can run as a low-privilege account and communicate with the > privileged NT service when it needs to launch a container. This would reduce > the surface exposed to the high privileges. 
> There has to exist a secure, authenticated and authorized channel of > communication between the NM and the privileged NT service. Possible > alternatives are a new TCP endpoint, Java RPC etc. My proposal though would > be to use Windows LPC (Local Procedure Calls), which is a Windows platform > specific inter-process communication channel that satisfies all requirements > and is easy to deploy. The privileged NT service would register and listen on > an LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop > with libwinutils which would host the LPC client code. The client would > connect to the LPC port (NtConnectPort) and send a message requesting a > container launch (NtRequestWaitReplyPort). LPC provides authentication and > the privileged NT service can use authorization API (AuthZ) to validate the > caller. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2803) MR distributed cache not working correctly on Windows after NodeManager privileged account changes.
[ https://issues.apache.org/jira/browse/YARN-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195451#comment-14195451 ] Chris Nauroth commented on YARN-2803: - Here is the stack trace from a failure. {code} testDistributedCache(org.apache.hadoop.mapreduce.v2.TestMRJobs) Time elapsed: 16.844 sec <<< FAILURE! java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.mapreduce.v2.TestMRJobs._testDistributedCache(TestMRJobs.java:881) at org.apache.hadoop.mapreduce.v2.TestMRJobs.testDistributedCache(TestMRJobs.java:891) {code} The task log shows the assertion failing when it tries to find job.jar/lib/lib2.jar. {code} 2014-11-03 15:36:33,652 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.AssertionError at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertNotNull(Assert.java:621) at org.junit.Assert.assertNotNull(Assert.java:631) at org.apache.hadoop.mapreduce.v2.TestMRJobs$DistributedCacheChecker.setup(TestMRJobs.java:764) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:169) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1640) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164) {code} > MR distributed cache not working correctly on Windows after NodeManager > privileged account changes. 
> --- > > Key: YARN-2803 > URL: https://issues.apache.org/jira/browse/YARN-2803 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Chris Nauroth >Priority: Critical > > This problem is visible by running {{TestMRJobs#testDistributedCache}} or > {{TestUberAM#testDistributedCache}} on Windows. Both tests fail. Running > git bisect, I traced it to the YARN-2198 patch to remove the need to run > NodeManager as a privileged account. The tests started failing when that > patch was committed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2803) MR distributed cache not working correctly on Windows after NodeManager privileged account changes.
Chris Nauroth created YARN-2803: --- Summary: MR distributed cache not working correctly on Windows after NodeManager privileged account changes. Key: YARN-2803 URL: https://issues.apache.org/jira/browse/YARN-2803 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Chris Nauroth Priority: Critical This problem is visible by running {{TestMRJobs#testDistributedCache}} or {{TestUberAM#testDistributedCache}} on Windows. Both tests fail. Running git bisect, I traced it to the YARN-2198 patch to remove the need to run NodeManager as a privileged account. The tests started failing when that patch was committed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2079) Recover NonAggregatingLogHandler state upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195442#comment-14195442 ] Hadoop QA commented on YARN-2079: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12679073/YARN-2079.patch against trunk revision 35d353e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5711//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5711//console This message is automatically generated. > Recover NonAggregatingLogHandler state upon nodemanager restart > --- > > Key: YARN-2079 > URL: https://issues.apache.org/jira/browse/YARN-2079 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.4.0 >Reporter: Jason Lowe >Assignee: Jason Lowe > Attachments: YARN-2079.patch > > > The state of NonAggregatingLogHandler needs to be persisted so logs are > properly deleted across a nodemanager restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
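The recovery behavior described in YARN-2079 can be outlined as: persist each application's scheduled log-deletion time, and on NM restart recompute how long each deletion should still wait, firing immediately for any deletion that came due while the NM was down. The sketch below is illustrative only — the class and method names (LogDeletionState, recoverDelays) are hypothetical, not the names used in the actual patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of persisting and recovering log-deletion schedules.
class LogDeletionState {
    // In the real NM this map would live in a persistent state store.
    private final Map<String, Long> deletionTimeByApp = new HashMap<>();

    // Record the absolute time at which an app's logs should be deleted.
    void schedule(String appId, long deleteAtMs) {
        deletionTimeByApp.put(appId, deleteAtMs);
    }

    // On restart: compute the remaining delay for each app, clamping to 0
    // so deletions that came due during the outage fire immediately.
    Map<String, Long> recoverDelays(long nowMs) {
        Map<String, Long> delays = new HashMap<>();
        for (Map.Entry<String, Long> e : deletionTimeByApp.entrySet()) {
            delays.put(e.getKey(), Math.max(0L, e.getValue() - nowMs));
        }
        return delays;
    }
}
```

An app whose deletion time already passed gets a delay of 0, matching the intent that logs are still deleted even if the NM was down when the timer would have fired.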
[jira] [Updated] (YARN-2604) Scheduler should consider max-allocation-* in conjunction with the largest node
[ https://issues.apache.org/jira/browse/YARN-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated YARN-2604: Attachment: YARN-2604.patch The new patch fixes the test failures: - TestContainerAllocation: Minor adjustment to memory allocation amount - TestFairScheduler: This failing test becomes obsolete with the patch, so I removed it - TestCapacityScheduler: I had to use more fine-grained locking on {{maximumAllocation}} to fix this, so I gave it its own {{ReentrantReadWriteLock}} instead of just using {{synchronized}} - (TestAMRestart was unrelated) > Scheduler should consider max-allocation-* in conjunction with the largest > node > --- > > Key: YARN-2604 > URL: https://issues.apache.org/jira/browse/YARN-2604 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.5.1 >Reporter: Karthik Kambatla >Assignee: Robert Kanter > Attachments: YARN-2604.patch, YARN-2604.patch > > > If the scheduler max-allocation-* values are larger than the resources > available on the largest node in the cluster, an application requesting > resources between the two values will be accepted by the scheduler but the > requests will never be satisfied. The app essentially hangs forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
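The fine-grained locking change mentioned above — a dedicated {{ReentrantReadWriteLock}} guarding {{maximumAllocation}} instead of coarse {{synchronized}} blocks — can be sketched as follows. This is a minimal illustration under assumed names (MaxAllocationHolder, maximumAllocationMb), not the actual YARN-2604 code.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: many scheduler threads read maximumAllocation on
// every allocation, while updates (e.g. when nodes join/leave) are rare,
// so a read-write lock avoids serializing the common read path.
class MaxAllocationHolder {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private long maximumAllocationMb = 8192; // placeholder initial value

    long getMaximumAllocationMb() {
        lock.readLock().lock();   // multiple readers may hold this at once
        try {
            return maximumAllocationMb;
        } finally {
            lock.readLock().unlock();
        }
    }

    void setMaximumAllocationMb(long mb) {
        lock.writeLock().lock();  // writers get exclusive access
        try {
            maximumAllocationMb = mb;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

The design choice mirrors the comment: {{synchronized}} gives every caller exclusive access, whereas a read-write lock lets concurrent readers proceed and only blocks them during the rare updates.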
[jira] [Commented] (YARN-2505) Support get/add/remove/change labels in RM REST API
[ https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195431#comment-14195431 ] Craig Welch commented on YARN-2505: --- [~leftnoteasy] re 1 and 2: there are two kinds of consistency in play here - with the other node label APIs and also with the current APIs in the web service. There are quite a few artifacts in the DAO which work with ids, including node, and they don't use "id" to specify it - I think it's assumed, as there's no other way to refer to them in a web service context except via id. So, to stay consistent with the other web service APIs, I don't think we should add "id" to the DAO names. As far as the duplication of the put and delete operations on the cluster node labels, I tend to agree; it seemed like there were too many ways to do that once the new APIs were added, so I'll remove those. I do think that the /nodes/nodeid/labels APIs should stay (I believe you are saying the same thing there...), as those are useful for more easily/conveniently working with individual nodes. Will post the updated patch in a few. > Support get/add/remove/change labels in RM REST API > --- > > Key: YARN-2505 > URL: https://issues.apache.org/jira/browse/YARN-2505 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Craig Welch > Attachments: YARN-2505.1.patch, YARN-2505.11.patch, > YARN-2505.12.patch, YARN-2505.13.patch, YARN-2505.14.patch, > YARN-2505.3.patch, YARN-2505.4.patch, YARN-2505.5.patch, YARN-2505.6.patch, > YARN-2505.7.patch, YARN-2505.8.patch, YARN-2505.9.patch, YARN-2505.9.patch, > YARN-2505.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2690) Make ReservationSystem and its dependent classes independent of Scheduler type
[ https://issues.apache.org/jira/browse/YARN-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195430#comment-14195430 ] Hadoop QA commented on YARN-2690: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12679061/YARN-2690.004.patch against trunk revision 734eeb4. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5708//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5708//console This message is automatically generated. 
> Make ReservationSystem and its dependent classes independent of Scheduler > type > > > Key: YARN-2690 > URL: https://issues.apache.org/jira/browse/YARN-2690 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-2690.001.patch, YARN-2690.002.patch, > YARN-2690.002.patch, YARN-2690.003.patch, YARN-2690.004.patch, > YARN-2690.004.patch > > > A lot of common reservation classes depend on CapacityScheduler and > specifically its configuration. This jira is to make them ready for other > Schedulers by abstracting out the configuration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2800) Should print WARN log in both RM/RMAdminCLI side when MemoryRMNodeLabelsManager is enabled
[ https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195424#comment-14195424 ] Wangda Tan commented on YARN-2800: -- [~ozawa], Storing to LevelDB or RocksDB may be an option; I think it's worth investigating the pros/cons of moving the node labels store to it. But I would be against making the NodeLabelsServer an independent process. There's one major difference between TimelineServer and NodeLabelsServer: TimelineServer is a store of historical data, mainly for retrieval, but the NodeLabelsManager is a central piece of scheduling. The RM shouldn't be able to schedule if the "NodeLabelsServer" is gone, because the resources it would schedule are not what's expected. In the near future, the scale of NodeLabelsManager will not be large enough to justify a separate process, and lots of synchronization between processes would need to be handled; we should avoid such complexity until we can see the value of doing that :). Does this make sense to you? Thanks, Wangda > Should print WARN log in both RM/RMAdminCLI side when > MemoryRMNodeLabelsManager is enabled > -- > > Key: YARN-2800 > URL: https://issues.apache.org/jira/browse/YARN-2800 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch > > > Even though we have documented this, it will be better to explicitly > print a message on both the RM and RMAdminCLI side to say that the node > label being added will be lost across RM restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2802) add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue.
[ https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195420#comment-14195420 ] zhihai xu commented on YARN-2802: - TestRMProxyUsersConf passes in my local build: --- T E S T S --- Running org.apache.hadoop.yarn.server.resourcemanager.TestRMProxyUsersConf Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 31.919 sec - in org.apache.hadoop.yarn.server.resourcemanager.TestRMProxyUsersConf Results : Tests run: 3, Failures: 0, Errors: 0, Skipped: 0 Also, the Findbugs warning is not related to my changes: I didn't touch the file RMAppImpl.java in my patch. Bug type REC_CATCH_EXCEPTION (click for details) In class org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition In method org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl, RMAppEvent) At RMAppImpl.java:[line 842] Restarting the Hadoop QA test. > add AM container launch and register delay metrics in QueueMetrics to help > diagnose performance issue. > -- > > Key: YARN-2802 > URL: https://issues.apache.org/jira/browse/YARN-2802 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2802.000.patch > > > add AM container launch and register delay metrics in QueueMetrics to help > diagnose performance issue. > Added two metrics in QueueMetrics: > aMLaunchDelay: the time spent from sending event AMLauncherEventType.LAUNCH > to receiving event RMAppAttemptEventType.LAUNCHED in RMAppAttemptImpl. > aMRegisterDelay: the time waiting from receiving event > RMAppAttemptEventType.LAUNCHED to receiving event > RMAppAttemptEventType.REGISTERED(ApplicationMasterService#registerApplicationMaster) > in RMAppAttemptImpl. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
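The two metrics described in the quoted text boil down to timestamping three event transitions (LAUNCH sent, LAUNCHED received, REGISTERED received) and reporting the differences. A minimal sketch of that bookkeeping, with an injectable clock so it can be tested deterministically, might look like the following — the class and method names are hypothetical, not those in the patch:

```java
import java.util.function.LongSupplier;

// Hypothetical sketch: record timestamps at each event and derive the
// aMLaunchDelay / aMRegisterDelay style metrics as differences.
class AmDelayTracker {
    private final LongSupplier clock;      // injectable time source, in ms
    private long launchRequestedAt = -1;   // AMLauncherEventType.LAUNCH sent
    private long launchedAt = -1;          // RMAppAttemptEventType.LAUNCHED received
    private long amLaunchDelayMs = -1;
    private long amRegisterDelayMs = -1;

    AmDelayTracker(LongSupplier clock) { this.clock = clock; }

    void onLaunchRequested() { launchRequestedAt = clock.getAsLong(); }

    void onLaunched() {
        launchedAt = clock.getAsLong();
        amLaunchDelayMs = launchedAt - launchRequestedAt;
    }

    void onRegistered() {
        amRegisterDelayMs = clock.getAsLong() - launchedAt;
    }

    long getAmLaunchDelayMs() { return amLaunchDelayMs; }
    long getAmRegisterDelayMs() { return amRegisterDelayMs; }
}
```

In the real patch the values would be handed to QueueMetrics rather than stored on the tracker; the sketch only shows where the two intervals start and stop.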
[jira] [Commented] (YARN-2010) Handle app-recovery failures gracefully
[ https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195421#comment-14195421 ] Karthik Kambatla commented on YARN-2010: The tests pass locally, the findbugs warning is to do with catching Exception instead of just IOException and InterruptedException in RMAppRecoveredTransition > Handle app-recovery failures gracefully > --- > > Key: YARN-2010 > URL: https://issues.apache.org/jira/browse/YARN-2010 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.3.0 >Reporter: bc Wong >Assignee: Karthik Kambatla >Priority: Blocker > Attachments: YARN-2010.1.patch, YARN-2010.patch, > issue-stacktrace.rtf, yarn-2010-10.patch, yarn-2010-2.patch, > yarn-2010-3.patch, yarn-2010-3.patch, yarn-2010-4.patch, yarn-2010-5.patch, > yarn-2010-6.patch, yarn-2010-7.patch, yarn-2010-8.patch, yarn-2010-9.patch > > > Sometimes, the RM fails to recover an application. It could be because of > turning security on, token expiry, or issues connecting to HDFS etc. The > causes could be classified into (1) transient, (2) specific to one > application, and (3) permanent and apply to multiple (all) applications. > Today, the RM fails to transition to Active and ends up in STOPPED state and > can never be transitioned to Active again. > The initial stacktrace reported is at > https://issues.apache.org/jira/secure/attachment/12676476/issue-stacktrace.rtf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
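The "graceful" handling under discussion — catching a broad Exception around each application's recovery so that one failing app cannot abort the RM's transition to Active — can be sketched generically. The names below (AppRecovery, recoverAll) are illustrative only, not the actual RMAppRecoveredTransition code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: wrap each app's recovery in its own try/catch so a
// failure affects only that app instead of the whole recovery pass.
class AppRecovery {
    // Attempts recovery of every app; returns the ids that failed.
    static List<String> recoverAll(List<String> appIds, Consumer<String> recoverOne) {
        List<String> failed = new ArrayList<>();
        for (String appId : appIds) {
            try {
                recoverOne.accept(appId);
            } catch (Exception e) {
                // Log and continue: one bad app must not stop the RM.
                failed.add(appId);
            }
        }
        return failed;
    }
}
```

This is also why findbugs flags REC_CATCH_EXCEPTION here: catching Exception is usually a smell, but in this pattern it is the point — any per-app failure, expected or not, is contained.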
[jira] [Updated] (YARN-2802) add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue.
[ https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2802: Attachment: YARN-2802.000.patch > add AM container launch and register delay metrics in QueueMetrics to help > diagnose performance issue. > -- > > Key: YARN-2802 > URL: https://issues.apache.org/jira/browse/YARN-2802 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2802.000.patch > > > add AM container launch and register delay metrics in QueueMetrics to help > diagnose performance issue. > Added two metrics in QueueMetrics: > aMLaunchDelay: the time spent from sending event AMLauncherEventType.LAUNCH > to receiving event RMAppAttemptEventType.LAUNCHED in RMAppAttemptImpl. > aMRegisterDelay: the time waiting from receiving event > RMAppAttemptEventType.LAUNCHED to receiving event > RMAppAttemptEventType.REGISTERED(ApplicationMasterService#registerApplicationMaster) > in RMAppAttemptImpl. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2802) add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue.
[ https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2802: Attachment: (was: YARN-2802.000.patch) > add AM container launch and register delay metrics in QueueMetrics to help > diagnose performance issue. > -- > > Key: YARN-2802 > URL: https://issues.apache.org/jira/browse/YARN-2802 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2802.000.patch > > > add AM container launch and register delay metrics in QueueMetrics to help > diagnose performance issue. > Added two metrics in QueueMetrics: > aMLaunchDelay: the time spent from sending event AMLauncherEventType.LAUNCH > to receiving event RMAppAttemptEventType.LAUNCHED in RMAppAttemptImpl. > aMRegisterDelay: the time waiting from receiving event > RMAppAttemptEventType.LAUNCHED to receiving event > RMAppAttemptEventType.REGISTERED(ApplicationMasterService#registerApplicationMaster) > in RMAppAttemptImpl. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2800) Should print WARN log in both RM/RMAdminCLI side when MemoryRMNodeLabelsManager is enabled
[ https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195407#comment-14195407 ] Tsuyoshi OZAWA commented on YARN-2800: -- [~leftnoteasy], If we treat the labels as configuration that can be updated frequently, ZK is not a good option, as you mentioned. In this case, I think NodeLabelsManager, whose backend can be LevelDB or RocksDB, should be loosely coupled with the RM, like TimelineServer, for stabilization of the RM. One option is making NodeLabelsManager a NodeLabelsServer. It means the RM should work correctly even if NodeLabelsManager is temporarily unavailable, and update operations should only affect NodeLabelsManager (not the RM). For example, the RM pulls the label information from NodeLabelsServer periodically. The RM treats the label information as a hint and schedules based on it. Even without the information, the RM should schedule apps. I think this weak-consistency approach is suitable for large-scale updating. > Should print WARN log in both RM/RMAdminCLI side when > MemoryRMNodeLabelsManager is enabled > -- > > Key: YARN-2800 > URL: https://issues.apache.org/jira/browse/YARN-2800 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch > > > Even though we have documented this, it will be better to explicitly > print a message on both the RM and RMAdminCLI side to say that the node > label being added will be lost across RM restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2786) Create yarn cluster CLI to enable list node labels collection
[ https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195406#comment-14195406 ] Hadoop QA commented on YARN-2786: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12679065/YARN-2786-20141103-1-without-yarn.cmd.patch against trunk revision 35d353e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.client.TestResourceTrackerOnHA {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5710//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5710//console This message is automatically generated. 
> Create yarn cluster CLI to enable list node labels collection > - > > Key: YARN-2786 > URL: https://issues.apache.org/jira/browse/YARN-2786 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, > YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch, > YARN-2786-20141103-1-full.patch, YARN-2786-20141103-1-without-yarn.cmd.patch > > > With YARN-2778, we can list node labels on existing RM nodes. But it is not > enough, we should be able to: > 1) list node labels collection > The command should start with "yarn cluster ...", in the future, we can add > more functionality to the "yarnClusterCLI" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2802) add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue.
[ https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195404#comment-14195404 ] Hadoop QA commented on YARN-2802: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12679054/YARN-2802.000.patch against trunk revision 734eeb4. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMProxyUsersConf {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5705//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5705//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5705//console This message is automatically generated. > add AM container launch and register delay metrics in QueueMetrics to help > diagnose performance issue. 
> -- > > Key: YARN-2802 > URL: https://issues.apache.org/jira/browse/YARN-2802 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2802.000.patch > > > add AM container launch and register delay metrics in QueueMetrics to help > diagnose performance issue. > Added two metrics in QueueMetrics: > aMLaunchDelay: the time spent from sending event AMLauncherEventType.LAUNCH > to receiving event RMAppAttemptEventType.LAUNCHED in RMAppAttemptImpl. > aMRegisterDelay: the time waiting from receiving event > RMAppAttemptEventType.LAUNCHED to receiving event > RMAppAttemptEventType.REGISTERED(ApplicationMasterService#registerApplicationMaster) > in RMAppAttemptImpl. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2010) Handle app-recovery failures gracefully
[ https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195405#comment-14195405 ] Hadoop QA commented on YARN-2010: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12679057/yarn-2010-10.patch against trunk revision 734eeb4. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMProxyUsersConf {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. 
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5707//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5707//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5707//console This message is automatically generated. > Handle app-recovery failures gracefully > --- > > Key: YARN-2010 > URL: https://issues.apache.org/jira/browse/YARN-2010 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.3.0 >Reporter: bc Wong >Assignee: Karthik Kambatla >Priority: Blocker > Attachments: YARN-2010.1.patch, YARN-2010.patch, > issue-stacktrace.rtf, yarn-2010-10.patch, yarn-2010-2.patch, > yarn-2010-3.patch, yarn-2010-3.patch, yarn-2010-4.patch, yarn-2010-5.patch, > yarn-2010-6.patch, yarn-2010-7.patch, yarn-2010-8.patch, yarn-2010-9.patch > > > Sometimes, the RM fails to recover an application. It could be because of > turning security on, token expiry, or issues connecting to HDFS etc. The > causes could be classified into (1) transient, (2) specific to one > application, and (3) permanent and apply to multiple (all) applications. > Today, the RM fails to transition to Active and ends up in STOPPED state and > can never be transitioned to Active again. > The initial stacktrace reported is at > https://issues.apache.org/jira/secure/attachment/12676476/issue-stacktrace.rtf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2010) Handle app-recovery failures gracefully
[ https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195394#comment-14195394 ] Hadoop QA commented on YARN-2010: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12679057/yarn-2010-10.patch against trunk revision 734eeb4. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. 
The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.TestRMContainerImpl org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices org.apache.hadoop.yarn.server.resourcemanager.webapp.TestNodesPage org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens org.apache.hadoop.yarn.server.resourcemanager.TestResourceManager org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart org.apache.hadoop.yarn.server.resourcemanager.TestFifoScheduler org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCResponseId org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebApp org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebappAuthentication org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication org.apache.hadoop.yarn.server.resourcemanager.TestAppManager {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. 
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5706//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5706//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5706//console This message is automatically generated. > Handle app-recovery failures gracefully > --- > > Key: YARN-2010 > URL: https://issues.apache.org/jira/browse/YARN-2010 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.3.0 >Reporter: bc Wong >Assignee: Karthik Kambatla >Priority: Blocker > Attachments: YARN-2010.1.patch, YARN-2010.patch, > issue-stacktrace.rtf, yarn-2010-10.patch, yarn-2010-2.patch, > yarn-2010-3.patch, yarn-2010-3.patch, yarn-2010-4.patch, yarn-2010-5.patch, > yarn-2010-6.patch, yarn-2010-7.patch, yarn-2010-8.patch, yarn-2010-9.patch > > > Sometimes, the RM fails to recover an application. It could be because of > turning security on, token expiry, or issues connecting to HDFS etc. The > causes could be classified into (1) transient, (2) specific to one > application, and (3) permanent and apply to multiple (all) applications. > Today, the RM fails to transition to Active and ends up in STOPPED state and > can never be transitioned to Active again. > The initial stacktrace reported is at > https://issues.apache.org/jira/secure/attachment/12676476/issue-stacktrace.rtf -- Th
[jira] [Commented] (YARN-2505) Support get/add/remove/change labels in RM REST API
[ https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195371#comment-14195371 ] Wangda Tan commented on YARN-2505: -- 1) NodeToLabelsInfo should be NodeIdToLabelsInfo, since we should be able to specify a nodeId in the REST API to be consistent with the YarnClient APIs and the RM admin CLI. 2) We also need to change the name of NodeToLabelsInfo#getNodeToLabels to getNodeIdToLabels if you agree with #1. 3) I would prefer dropping the REST APIs that modify a single nodeId or nodeLabel, like {code} + @DELETE + @Path("/node-labels/{nodeLabel}") {code} and likewise addLabelsToNode/removeLabelsFromNode, etc., since we already have {code} + @POST + @Path("/node-labels/remove") {code} The reason is: having both single and batch operations seems a little duplicated to me, and setting a map of nodeId -> labels is not a big burden to the end user, regarding both API complexity and performance. However, we can keep the get API for labels on a node: {code} + @GET + @Path("/nodes/{nodeId}/labels") {code} since, for performance reasons, it may not always be necessary to return all node-to-labels mappings. > Support get/add/remove/change labels in RM REST API > --- > > Key: YARN-2505 > URL: https://issues.apache.org/jira/browse/YARN-2505 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Craig Welch > Attachments: YARN-2505.1.patch, YARN-2505.11.patch, > YARN-2505.12.patch, YARN-2505.13.patch, YARN-2505.14.patch, > YARN-2505.3.patch, YARN-2505.4.patch, YARN-2505.5.patch, YARN-2505.6.patch, > YARN-2505.7.patch, YARN-2505.8.patch, YARN-2505.9.patch, YARN-2505.9.patch, > YARN-2505.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
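The batch shape argued for above can be sketched with plain Java collections (illustrative only; these are not the actual REST payload classes, and the host names are made up): a single nodeId -> labels map covers what a series of per-node endpoints would otherwise need.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class NodeIdToLabelsExample {
    // Build one map covering label assignments for many nodes at once,
    // instead of one REST call per node.
    public static Map<String, Set<String>> buildBatch() {
        Map<String, Set<String>> nodeIdToLabels = new HashMap<>();
        nodeIdToLabels.put("host1:45454", new HashSet<>(Arrays.asList("gpu")));
        nodeIdToLabels.put("host2:45454", new HashSet<>(Arrays.asList("gpu", "large-mem")));
        return nodeIdToLabels;
    }
}
```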
[jira] [Updated] (YARN-2079) Recover NonAggregatingLogHandler state upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-2079: - Attachment: YARN-2079.patch Patch that saves the state of scheduled LogDeleterRunnable objects to the state store and reschedules them upon recovery. Added unit tests for both the leveldb state store changes and log handler recovery. > Recover NonAggregatingLogHandler state upon nodemanager restart > --- > > Key: YARN-2079 > URL: https://issues.apache.org/jira/browse/YARN-2079 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.4.0 >Reporter: Jason Lowe >Assignee: Jason Lowe > Attachments: YARN-2079.patch > > > The state of NonAggregatingLogHandler needs to be persisted so logs are > properly deleted across a nodemanager restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2079) Recover NonAggregatingLogHandler state upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe reassigned YARN-2079: Assignee: Jason Lowe > Recover NonAggregatingLogHandler state upon nodemanager restart > --- > > Key: YARN-2079 > URL: https://issues.apache.org/jira/browse/YARN-2079 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.4.0 >Reporter: Jason Lowe >Assignee: Jason Lowe > > The state of NonAggregatingLogHandler needs to be persisted so logs are > properly deleted across a nodemanager restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2786) Create yarn cluster CLI to enable list node labels collection
[ https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2786: - Attachment: YARN-2786-20141103-1-without-yarn.cmd.patch Uploaded a patch without yarn.cmd to kick Jenkins > Create yarn cluster CLI to enable list node labels collection > - > > Key: YARN-2786 > URL: https://issues.apache.org/jira/browse/YARN-2786 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, > YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch, > YARN-2786-20141103-1-full.patch, YARN-2786-20141103-1-without-yarn.cmd.patch > > > With YARN-2778, we can list node labels on existing RM nodes. But it is not > enough, we should be able to: > 1) list node labels collection > The command should start with "yarn cluster ...", in the future, we can add > more functionality to the "yarnClusterCLI" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2786) Create yarn cluster CLI to enable list node labels collection
[ https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195326#comment-14195326 ] Hadoop QA commented on YARN-2786: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12679064/YARN-2786-20141103-1-full.patch against trunk revision 35d353e. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5709//console This message is automatically generated. > Create yarn cluster CLI to enable list node labels collection > - > > Key: YARN-2786 > URL: https://issues.apache.org/jira/browse/YARN-2786 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, > YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch, > YARN-2786-20141103-1-full.patch > > > With YARN-2778, we can list node labels on existing RM nodes. But it is not > enough, we should be able to: > 1) list node labels collection > The command should start with "yarn cluster ...", in the future, we can add > more functionality to the "yarnClusterCLI" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2786) Create yarn cluster CLI to enable list node labels collection
[ https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2786: - Attachment: YARN-2786-20141103-1-full.patch > Create yarn cluster CLI to enable list node labels collection > - > > Key: YARN-2786 > URL: https://issues.apache.org/jira/browse/YARN-2786 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, > YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch, > YARN-2786-20141103-1-full.patch > > > With YARN-2778, we can list node labels on existing RM nodes. But it is not > enough, we should be able to: > 1) list node labels collection > The command should start with "yarn cluster ...", in the future, we can add > more functionality to the "yarnClusterCLI" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2800) Should print WARN log in both RM/RMAdminCLI side when MemoryRMNodeLabelsManager is enabled
[ https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195317#comment-14195317 ] Wangda Tan commented on YARN-2800: -- [~ozawa], Moving it to RMStateStore is not a bad idea, since the node label store itself can be treated as a part of the RM's state. However, the RMStateStore is hard-coded to use only one storage backend, which I'm a little concerned about. I don't quite agree that ZK can handle it well, since we shouldn't assume this feature won't be used in a large cluster or with high-frequency label updates. Node label updates are different from RMStateStore updates: the client side can change the labels of all nodes (say 10k nodes) in one command, but 10k applications cannot complete in a short period (around seconds), at least for now. A WAL-based solution may perform better in such a scenario, and I think ZK is not a good backend for WAL storage. > Should print WARN log in both RM/RMAdminCLI side when > MemoryRMNodeLabelsManager is enabled > -- > > Key: YARN-2800 > URL: https://issues.apache.org/jira/browse/YARN-2800 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch > > > Even though we have documented this, it will be better to explicitly > print a message on both the RM and RMAdminCLI side saying that the node > labels being added will be lost across an RM restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated YARN-1680: -- Assignee: Craig Welch (was: Chen He) > availableResources sent to applicationMaster in heartbeat should exclude > blacklistedNodes free memory. > -- > > Key: YARN-1680 > URL: https://issues.apache.org/jira/browse/YARN-1680 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.2.0, 2.3.0 > Environment: SuSE 11 SP2 + Hadoop-2.3 >Reporter: Rohith >Assignee: Craig Welch > Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, > YARN-1680-v2.patch, YARN-1680.patch > > > There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster > slow start is set to 1. > The running job's reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4) > became unstable (3 Map tasks got killed), so the MRAppMaster blacklisted the unstable > NodeManager (NM-4). All reducer tasks are now running in the cluster. > The MRAppMaster does not preempt the reducers because, for the reducer preemption > calculation, the headroom includes the blacklisted nodes' memory. This makes > jobs hang forever (the ResourceManager does not assign any new containers on > blacklisted nodes but returns an availableResource that counts the cluster's free > memory). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195313#comment-14195313 ] Chen He commented on YARN-1680: --- Hi, [~cwelch], I just assigned it to you. I am busy dealing with a move and may not have time to work on this in the short term. > availableResources sent to applicationMaster in heartbeat should exclude > blacklistedNodes free memory. > -- > > Key: YARN-1680 > URL: https://issues.apache.org/jira/browse/YARN-1680 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.2.0, 2.3.0 > Environment: SuSE 11 SP2 + Hadoop-2.3 >Reporter: Rohith >Assignee: Craig Welch > Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, > YARN-1680-v2.patch, YARN-1680.patch > > > There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster > slow start is set to 1. > The running job's reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4) > became unstable (3 Map tasks got killed), so the MRAppMaster blacklisted the unstable > NodeManager (NM-4). All reducer tasks are now running in the cluster. > The MRAppMaster does not preempt the reducers because, for the reducer preemption > calculation, the headroom includes the blacklisted nodes' memory. This makes > jobs hang forever (the ResourceManager does not assign any new containers on > blacklisted nodes but returns an availableResource that counts the cluster's free > memory). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
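The fix being discussed in YARN-1680 can be sketched as follows (a minimal illustration only; the class, method, and field names here are hypothetical, not the actual scheduler API): headroom reported to the AM should exclude free memory on nodes that AM has blacklisted, otherwise the reducer-preemption math believes more memory is usable than really is.

```java
import java.util.Map;
import java.util.Set;

public class HeadroomCalculator {
    // Sum free memory across the cluster, skipping nodes the AM has
    // blacklisted, since the RM will not place that AM's containers there.
    public static long usableHeadroomMb(Map<String, Long> freeMbByNode,
                                        Set<String> blacklistedNodes) {
        long headroom = 0;
        for (Map.Entry<String, Long> e : freeMbByNode.entrySet()) {
            if (!blacklistedNodes.contains(e.getKey())) {
                headroom += e.getValue();
            }
        }
        return headroom;
    }
}
```

With the scenario from the issue description (NM-4 blacklisted), its free memory no longer inflates the headroom, so the AM can decide to preempt reducers instead of waiting forever.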
[jira] [Updated] (YARN-2690) Make ReservationSystem and its dependent classes independent of Scheduler type
[ https://issues.apache.org/jira/browse/YARN-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-2690: Attachment: YARN-2690.004.patch Fixed javac warning. That was some preexisting code unchanged by the patch. > Make ReservationSystem and its dependent classes independent of Scheduler > type > > > Key: YARN-2690 > URL: https://issues.apache.org/jira/browse/YARN-2690 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-2690.001.patch, YARN-2690.002.patch, > YARN-2690.002.patch, YARN-2690.003.patch, YARN-2690.004.patch, > YARN-2690.004.patch > > > A lot of common reservation classes depend on CapacityScheduler and > specifically its configuration. This jira is to make them ready for other > Schedulers by abstracting out the configuration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195288#comment-14195288 ] Craig Welch commented on YARN-1680: --- Hey [~airbots], any luck on this? If you're too busy to get to it, mind if I take it on? > availableResources sent to applicationMaster in heartbeat should exclude > blacklistedNodes free memory. > -- > > Key: YARN-1680 > URL: https://issues.apache.org/jira/browse/YARN-1680 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.2.0, 2.3.0 > Environment: SuSE 11 SP2 + Hadoop-2.3 >Reporter: Rohith >Assignee: Chen He > Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, > YARN-1680-v2.patch, YARN-1680.patch > > > There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster > slow start is set to 1. > The running job's reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4) > became unstable (3 Map tasks got killed), so the MRAppMaster blacklisted the unstable > NodeManager (NM-4). All reducer tasks are now running in the cluster. > The MRAppMaster does not preempt the reducers because, for the reducer preemption > calculation, the headroom includes the blacklisted nodes' memory. This makes > jobs hang forever (the ResourceManager does not assign any new containers on > blacklisted nodes but returns an availableResource that counts the cluster's free > memory). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2505) Support get/add/remove/change labels in RM REST API
[ https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195291#comment-14195291 ] Hadoop QA commented on YARN-2505: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12679036/YARN-2505.14.patch against trunk revision 237890f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5704//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5704//console This message is automatically generated. 
> Support get/add/remove/change labels in RM REST API > --- > > Key: YARN-2505 > URL: https://issues.apache.org/jira/browse/YARN-2505 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Craig Welch > Attachments: YARN-2505.1.patch, YARN-2505.11.patch, > YARN-2505.12.patch, YARN-2505.13.patch, YARN-2505.14.patch, > YARN-2505.3.patch, YARN-2505.4.patch, YARN-2505.5.patch, YARN-2505.6.patch, > YARN-2505.7.patch, YARN-2505.8.patch, YARN-2505.9.patch, YARN-2505.9.patch, > YARN-2505.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2794) Fix log msgs about distributing system-credentials
[ https://issues.apache.org/jira/browse/YARN-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195277#comment-14195277 ] Tsuyoshi OZAWA commented on YARN-2794: -- Or, just put a new HashMap via setSystemCredentialsForApps. > Fix log msgs about distributing system-credentials > --- > > Key: YARN-2794 > URL: https://issues.apache.org/jira/browse/YARN-2794 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-2794.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2794) Fix log msgs about distributing system-credentials
[ https://issues.apache.org/jira/browse/YARN-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195273#comment-14195273 ] Tsuyoshi OZAWA commented on YARN-2794: -- [~jianhe], oops, I got it. The update to systemCredentials is done only via setSystemCredentials, so your solution is enough. One minor nit: TestLogAggregationService#testAddNewTokenSentFromRMForLogAggregation calls {{this.context.getSystemCredentialsForApps().put(application1, credentials);}}. We should use a ConcurrentHashMap for the test case. > Fix log msgs about distributing system-credentials > --- > > Key: YARN-2794 > URL: https://issues.apache.org/jira/browse/YARN-2794 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-2794.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
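The pattern agreed on in the two comments above (a volatile reference swapped wholesale, rather than a ConcurrentHashMap) can be sketched like this. The class and method names are hypothetical stand-ins for the NM Context API, not the real signatures: the key idea is that the single writer publishes a fresh copy of the map through a volatile field, so readers always see a consistent snapshot without locking.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class SystemCredentialsHolder {
    // Readers see either the old snapshot or the new one, never a
    // half-updated map, because the map itself is never mutated.
    private volatile Map<String, byte[]> credentials = Collections.emptyMap();

    // Writer path: copy, then publish atomically via the volatile write.
    public void setSystemCredentialsForApps(Map<String, byte[]> updated) {
        credentials = Collections.unmodifiableMap(new HashMap<>(updated));
    }

    // Reader path: a plain volatile read; no lock needed.
    public Map<String, byte[]> getSystemCredentialsForApps() {
        return credentials;
    }
}
```

This is why the test-only call to {{put()}} on the returned map is the odd one out: in this copy-on-write scheme the returned snapshot is immutable, so the test either needs the setter or a ConcurrentHashMap as noted above.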
[jira] [Updated] (YARN-2010) Handle app-recovery failures gracefully
[ https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2010: --- Attachment: yarn-2010-10.patch Updated patch to address review comments. > Handle app-recovery failures gracefully > --- > > Key: YARN-2010 > URL: https://issues.apache.org/jira/browse/YARN-2010 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.3.0 >Reporter: bc Wong >Assignee: Karthik Kambatla >Priority: Blocker > Attachments: YARN-2010.1.patch, YARN-2010.patch, > issue-stacktrace.rtf, yarn-2010-10.patch, yarn-2010-2.patch, > yarn-2010-3.patch, yarn-2010-3.patch, yarn-2010-4.patch, yarn-2010-5.patch, > yarn-2010-6.patch, yarn-2010-7.patch, yarn-2010-8.patch, yarn-2010-9.patch > > > Sometimes, the RM fails to recover an application. It could be because of > turning security on, token expiry, or issues connecting to HDFS etc. The > causes could be classified into (1) transient, (2) specific to one > application, and (3) permanent and apply to multiple (all) applications. > Today, the RM fails to transition to Active and ends up in STOPPED state and > can never be transitioned to Active again. > The initial stacktrace reported is at > https://issues.apache.org/jira/secure/attachment/12676476/issue-stacktrace.rtf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2794) Fix log msgs about distributing system-credentials
[ https://issues.apache.org/jira/browse/YARN-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195258#comment-14195258 ] Tsuyoshi OZAWA commented on YARN-2794: -- [~jianhe], thanks for taking this JIRA. Shouldn't we use a ConcurrentHashMap? IIUC, making the variable volatile is not enough to synchronize in this case. Please correct me if I'm wrong. > Fix log msgs about distributing system-credentials > --- > > Key: YARN-2794 > URL: https://issues.apache.org/jira/browse/YARN-2794 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-2794.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2802) add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue.
[ https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2802: Attachment: YARN-2802.000.patch > add AM container launch and register delay metrics in QueueMetrics to help > diagnose performance issue. > -- > > Key: YARN-2802 > URL: https://issues.apache.org/jira/browse/YARN-2802 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2802.000.patch > > > add AM container launch and register delay metrics in QueueMetrics to help > diagnose performance issue. > Added two metrics in QueueMetrics: > aMLaunchDelay: the time spent from sending event AMLauncherEventType.LAUNCH > to receiving event RMAppAttemptEventType.LAUNCHED in RMAppAttemptImpl. > aMRegisterDelay: the time waiting from receiving event > RMAppAttemptEventType.LAUNCHED to receiving event > RMAppAttemptEventType.REGISTERED(ApplicationMasterService#registerApplicationMaster) > in RMAppAttemptImpl. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2802) add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue.
zhihai xu created YARN-2802: --- Summary: add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue. Key: YARN-2802 URL: https://issues.apache.org/jira/browse/YARN-2802 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue. Added two metrics in QueueMetrics: aMLaunchDelay: the time spent from sending event AMLauncherEventType.LAUNCH to receiving event RMAppAttemptEventType.LAUNCHED in RMAppAttemptImpl. aMRegisterDelay: the time waiting from receiving event RMAppAttemptEventType.LAUNCHED to receiving event RMAppAttemptEventType.REGISTERED(ApplicationMasterService#registerApplicationMaster) in RMAppAttemptImpl. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
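The two metrics described above can be sketched as a small recorder (the metric names follow the JIRA description, but this recorder class and its timestamp bookkeeping are an illustrative assumption, not the actual QueueMetrics change): stamp the attempt when AMLauncherEventType.LAUNCH is sent, then measure elapsed time when LAUNCHED and REGISTERED arrive at RMAppAttemptImpl.

```java
import java.util.HashMap;
import java.util.Map;

public class AmDelayRecorder {
    private final Map<String, Long> launchSentMs = new HashMap<>();
    private final Map<String, Long> launchedMs = new HashMap<>();

    // Stamp when AMLauncherEventType.LAUNCH is sent for this attempt.
    public void onLaunchSent(String attemptId, long nowMs) {
        launchSentMs.put(attemptId, nowMs);
    }

    // aMLaunchDelay: LAUNCH sent -> LAUNCHED received.
    public long onLaunched(String attemptId, long nowMs) {
        launchedMs.put(attemptId, nowMs);
        return nowMs - launchSentMs.get(attemptId);
    }

    // aMRegisterDelay: LAUNCHED received -> REGISTERED received.
    public long onRegistered(String attemptId, long nowMs) {
        return nowMs - launchedMs.get(attemptId);
    }
}
```

In the real patch these elapsed values would feed QueueMetrics rate counters; the sketch only shows where each interval starts and ends.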
[jira] [Commented] (YARN-2800) Should print WARN log in both RM/RMAdminCLI side when MemoryRMNodeLabelsManager is enabled
[ https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195220#comment-14195220 ] Tsuyoshi OZAWA commented on YARN-2800: -- [~leftnoteasy], Thanks for your clarification. Essentially, the configurations about labels are a part of the RM's state. IMHO, we should ideally move this essential configuration onto the RMStateStore to prevent the mismatch. I think ZK can handle it since the frequency of label updates is not so high and the number of labels is not so large. cc: [~jianhe], [~kkambatl], what do you think? > Should print WARN log in both RM/RMAdminCLI side when > MemoryRMNodeLabelsManager is enabled > -- > > Key: YARN-2800 > URL: https://issues.apache.org/jira/browse/YARN-2800 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch > > > Even though we have documented this, it will be better to explicitly > print a message on both the RM and RMAdminCLI side saying that the node > labels being added will be lost across an RM restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2786) Create yarn cluster CLI to enable list node labels collection
[ https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195204#comment-14195204 ] Wangda Tan commented on YARN-2786: -- [~cwelch], Thanks for the comments, bq. bin/yarn drop s Addressed bq. listLables should be listNodeLabels ... Addressed bq. Can't we use the visible for test annotation? Addressed bq. test is still using the node-labels command instead of cluster ... Oh, my bad, I forgot to change that. The field will not be ignored on the Java side, so both the test case and actual use of "yarn cluster ..." will succeed. Will upload a patch soon. Thanks, Wangda > Create yarn cluster CLI to enable list node labels collection > - > > Key: YARN-2786 > URL: https://issues.apache.org/jira/browse/YARN-2786 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, > YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch > > > With YARN-2778, we can list node labels on existing RM nodes. But it is not > enough, we should be able to: > 1) list node labels collection > The command should start with "yarn cluster ...", in the future, we can add > more functionality to the "yarnClusterCLI" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2795) Resource Manager fails startup with HDFS label storage and secure cluster
[ https://issues.apache.org/jira/browse/YARN-2795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195174#comment-14195174 ] Wangda Tan commented on YARN-2795: -- Thanks Vinod's review and commit! > Resource Manager fails startup with HDFS label storage and secure cluster > - > > Key: YARN-2795 > URL: https://issues.apache.org/jira/browse/YARN-2795 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Phil D'Amore >Assignee: Wangda Tan > Fix For: 2.6.0 > > Attachments: YARN-2795-20141101-1.patch, YARN-2795-20141102-1.patch, > YARN-2795-20141102-2.patch > > > When node labels are in use, and yarn.node-labels.fs-store.root-dir is set to > a hdfs:// path, and the cluster is using kerberos, the RM fails to start > while trying to unmarshal the label store. The following error/stack trace > is observed: > {code} > 2014-10-31 11:55:53,807 INFO service.AbstractService > (AbstractService.java:noteFailure(272)) - Service o > rg.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in state > INITED; cause: java.io.IOExcepti > on: Failed on local exception: java.io.IOException: > javax.security.sasl.SaslException: GSS initiate faile > d [Caused by GSSException: No valid credentials provided (Mechanism level: > Failed to find any Kerberos tg > t)]; Host Details : local host is: "host.running.rm/10.0.0.34"; destination > hos > t is: "host.running.nn":8020; > java.io.IOException: Failed on local exception: java.io.IOException: > javax.security.sasl.SaslException: G > SS initiate failed [Caused by GSSException: No valid credentials provided > (Mechanism level: Failed to fin > d any Kerberos tgt)]; Host Details : local host is: > "host.running.rm/10.0.0.34" > ; destination host is: "host.running.nn":8020; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764) > at org.apache.hadoop.ipc.Client.call(Client.java:1472) > at org.apache.hadoop.ipc.Client.call(Client.java:1399) > at > 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) > at com.sun.proxy.$Proxy14.mkdirs(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProt > ocolTranslatorPB.java:539) > at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187 > ) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy15.mkdirs(Unknown Source) > at > org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2731) > at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2702) > at > org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:870) > at > org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:866) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:866) > at > org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:859) > at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1817) > at > org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.init(FileSystemNodeLabelsStore.java:87) > at > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:206) > at > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceInit(CommonNodeLabelsManager.java:199) > at > org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.serviceInit(RMNodeLabelsManager.java:62) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > 
org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:547) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:986) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:245) > at > org.apache.hadoop.service
[jira] [Commented] (YARN-2795) Resource Manager fails startup with HDFS label storage and secure cluster
[ https://issues.apache.org/jira/browse/YARN-2795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195170#comment-14195170 ] Hudson commented on YARN-2795: -- FAILURE: Integrated in Hadoop-trunk-Commit #6429 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6429/]) YARN-2795. Fixed ResourceManager to not crash loading node-label data from HDFS in secure mode. Contributed by Wangda Tan. (vinodkv: rev ec6cbece8e7772868ce8ad996135d3136bd32245) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java > Resource Manager fails startup with HDFS label storage and secure cluster > - > > Key: YARN-2795 > URL: https://issues.apache.org/jira/browse/YARN-2795 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Phil D'Amore >Assignee: Wangda Tan > Fix For: 2.6.0 > > Attachments: YARN-2795-20141101-1.patch, YARN-2795-20141102-1.patch, > YARN-2795-20141102-2.patch > > > When node labels are in use, and yarn.node-labels.fs-store.root-dir is set to > a hdfs:// path, and the cluster is using kerberos, the RM fails to start > while trying to unmarshal the label store. 
The following error/stack trace > is observed: > {code} > 2014-10-31 11:55:53,807 INFO service.AbstractService > (AbstractService.java:noteFailure(272)) - Service o > rg.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in state > INITED; cause: java.io.IOExcepti > on: Failed on local exception: java.io.IOException: > javax.security.sasl.SaslException: GSS initiate faile > d [Caused by GSSException: No valid credentials provided (Mechanism level: > Failed to find any Kerberos tg > t)]; Host Details : local host is: "host.running.rm/10.0.0.34"; destination > hos > t is: "host.running.nn":8020; > java.io.IOException: Failed on local exception: java.io.IOException: > javax.security.sasl.SaslException: G > SS initiate failed [Caused by GSSException: No valid credentials provided > (Mechanism level: Failed to fin > d any Kerberos tgt)]; Host Details : local host is: > "host.running.rm/10.0.0.34" > ; destination host is: "host.running.nn":8020; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764) > at org.apache.hadoop.ipc.Client.call(Client.java:1472) > at org.apache.hadoop.ipc.Client.call(Client.java:1399) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) > at com.sun.proxy.$Proxy14.mkdirs(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProt > ocolTranslatorPB.java:539) > at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187 > ) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy15.mkdirs(Unknown Source) > at > org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2731) > at 
org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2702) > at > org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:870) > at > org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:866) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:866) > at > org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:859) > at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1817) > at > org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.init(FileSystemNodeLabelsStore.java:87) > at > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(C
[jira] [Commented] (YARN-2800) Should print WARN log in both RM/RMAdminCLI side when MemoryRMNodeLabelsManager is enabled
[ https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195154#comment-14195154 ] Wangda Tan commented on YARN-2800: -- Hi [~ozawa], bq. Let me clarify this case - do you mean RM will fail to allocate containers on labeled nodes after RM restart since RM uses MemoryRMNodeLabelsManager and forget the mapping of node-to-labels? Not exactly; actually the RM will fail to start, because we have accessible-node-labels in queues, and during CS initialization we check whether such labels exist in the node labels manager. With the mem-based RMNodeLabelsManager, after an RM restart the CS cannot find the labels in the node labels manager, so the RM fails to start entirely. I agree with your point that it may confuse people, since the admin may have configured it properly on the RM side, and it will be annoying to see the message every time such a command is run on the client side. But I think it is still important to let the client know about this. Of course we can add it to the RM web UI, but users may still not check it -- not all users will check the cluster metrics UI :). So I think we can drop the logging in the RM admin CLI part and change the RMAdmin PB responses in a separate task, which will return the actual RMNodeLabelsManager being used on the RM side. Then we can log the WARN properly. Do you have any other ideas? Thanks, Wangda > Should print WARN log in both RM/RMAdminCLI side when > MemoryRMNodeLabelsManager is enabled > -- > > Key: YARN-2800 > URL: https://issues.apache.org/jira/browse/YARN-2800 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch > > > Even though we have documented this, but it will be better to explicitly > print a message in both RM/RMAdminCLI side to explicitly say that the node > label being added will be lost across RM restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
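The failure mode Wangda describes can be sketched as follows. This is a minimal, hypothetical illustration (class and method names are not the actual CapacityScheduler code): queue configuration references a label that a memory-backed node labels manager no longer knows after an RM restart, so initialization throws and the RM never becomes active.

```java
import java.io.IOException;
import java.util.Set;

// Illustrative sketch only, not the real YARN classes: during scheduler
// initialization, every label a queue is configured with must already be
// known to the node labels manager, otherwise init aborts.
class LabelCheckSketch {
    static void validateQueueLabels(Set<String> accessibleNodeLabels,
                                    Set<String> clusterNodeLabels) throws IOException {
        for (String label : accessibleNodeLabels) {
            if (!clusterNodeLabels.contains(label)) {
                // This surfaces as a serviceInit failure of the RM's active
                // services, so the RM fails to start entirely.
                throw new IOException("NodeLabelManager doesn't include label = " + label);
            }
        }
    }
}
```

With a memory-backed store the cluster label set is empty after restart, so any queue that names a label trips this check.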
[jira] [Commented] (YARN-2795) Resource Manager fails startup with HDFS label storage and secure cluster
[ https://issues.apache.org/jira/browse/YARN-2795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195153#comment-14195153 ] Vinod Kumar Vavilapalli commented on YARN-2795: --- Tx for the update, Wangda. Checking this in. > Resource Manager fails startup with HDFS label storage and secure cluster > - > > Key: YARN-2795 > URL: https://issues.apache.org/jira/browse/YARN-2795 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Phil D'Amore >Assignee: Wangda Tan > Attachments: YARN-2795-20141101-1.patch, YARN-2795-20141102-1.patch, > YARN-2795-20141102-2.patch > > > When node labels are in use, and yarn.node-labels.fs-store.root-dir is set to > a hdfs:// path, and the cluster is using kerberos, the RM fails to start > while trying to unmarshal the label store. The following error/stack trace > is observed: > {code} > 2014-10-31 11:55:53,807 INFO service.AbstractService > (AbstractService.java:noteFailure(272)) - Service o > rg.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in state > INITED; cause: java.io.IOExcepti > on: Failed on local exception: java.io.IOException: > javax.security.sasl.SaslException: GSS initiate faile > d [Caused by GSSException: No valid credentials provided (Mechanism level: > Failed to find any Kerberos tg > t)]; Host Details : local host is: "host.running.rm/10.0.0.34"; destination > hos > t is: "host.running.nn":8020; > java.io.IOException: Failed on local exception: java.io.IOException: > javax.security.sasl.SaslException: G > SS initiate failed [Caused by GSSException: No valid credentials provided > (Mechanism level: Failed to fin > d any Kerberos tgt)]; Host Details : local host is: > "host.running.rm/10.0.0.34" > ; destination host is: "host.running.nn":8020; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764) > at org.apache.hadoop.ipc.Client.call(Client.java:1472) > at org.apache.hadoop.ipc.Client.call(Client.java:1399) > at > 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) > at com.sun.proxy.$Proxy14.mkdirs(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProt > ocolTranslatorPB.java:539) > at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187 > ) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy15.mkdirs(Unknown Source) > at > org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2731) > at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2702) > at > org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:870) > at > org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:866) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:866) > at > org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:859) > at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1817) > at > org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.init(FileSystemNodeLabelsStore.java:87) > at > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:206) > at > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceInit(CommonNodeLabelsManager.java:199) > at > org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.serviceInit(RMNodeLabelsManager.java:62) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > 
org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:547) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:986) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:245) > at > org.apache.hadoop.
[jira] [Updated] (YARN-2505) Support get/add/remove/change labels in RM REST API
[ https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2505: -- Attachment: YARN-2505.14.patch TestFairScheduler passes on my box - and the change should not have any impact on it anyway - reuploading patch to trigger another go > Support get/add/remove/change labels in RM REST API > --- > > Key: YARN-2505 > URL: https://issues.apache.org/jira/browse/YARN-2505 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Craig Welch > Attachments: YARN-2505.1.patch, YARN-2505.11.patch, > YARN-2505.12.patch, YARN-2505.13.patch, YARN-2505.14.patch, > YARN-2505.3.patch, YARN-2505.4.patch, YARN-2505.5.patch, YARN-2505.6.patch, > YARN-2505.7.patch, YARN-2505.8.patch, YARN-2505.9.patch, YARN-2505.9.patch, > YARN-2505.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2800) Should print WARN log in both RM/RMAdminCLI side when MemoryRMNodeLabelsManager is enabled
[ https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195130#comment-14195130 ] Tsuyoshi OZAWA commented on YARN-2800: -- [~leftnoteasy], thanks for your comments. {quote} + + "this message is based on the yarn-site.xml settings " + + "in the machine you run \"yarn rmadmin ...\", if you " + + "already edited the field in yarn-site.xml of the node " + + "running RM, please ignore this message."; {quote} I think printing the message based on client-side configuration can confuse the user - it can be different from the RM-side configuration. Not every user has a copy of the RM-side configuration, and some users don't know its contents. {quote} But if user configured mem-based node labels manager, user may add labels to queue configurations, when RM will be failed to launch (specifically, CS cannot initialize) if a queue use a label but not existed in node labels manager {quote} Let me clarify this case - do you mean RM will fail to allocate containers on labeled nodes after RM restart since RM uses MemoryRMNodeLabelsManager and forget the mapping of node-to-labels? In this case, I think we should raise a warning to the submitter of YARN apps, like "application cannot be submitted for now since no node has the required label", after restart. It's more straightforward because users can notice the misconfiguration of labels. So I think the better way is to log the warning once at startup and add the information to the Web UI for consistency of the information. What do you think?
> Should print WARN log in both RM/RMAdminCLI side when > MemoryRMNodeLabelsManager is enabled > -- > > Key: YARN-2800 > URL: https://issues.apache.org/jira/browse/YARN-2800 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch > > > Even though we have documented this, but it will be better to explicitly > print a message in both RM/RMAdminCLI side to explicitly say that the node > label being added will be lost across RM restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2786) Create yarn cluster CLI to enable list node labels collection
[ https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195131#comment-14195131 ] Craig Welch commented on YARN-2786: --- The "node" command isn't a good fit for this aspect of node-labels, as it is not an operation or query on nodes as such, but on the set of node labels recognized by the cluster. If we don't want to tie it to the resource manager (not sure we can't, but it sounds as though we want to keep it distinct) then we need something new. I actually preferred the original "node-labels" command, but "cluster" is ok if we believe that other things will come along in the future which fit this definition (and I could see that happen). Code items: bin/yarn prints cluster informations - information is singular and plural, you can drop the s ClusterCLI.java listLables should be listNodeLabels (we've gone to that everywhere b/c there will likely be other kinds of labels, we should stay consistent, especially as "cluster" cmd name has lost any notion of "nodelabelness") //Make it protected to make unit test can change it Can't we use the visible for test annotation? It looks like the test is still using the node-labels command instead of cluster, did something go wrong with the patch (maybe forgot to restage)? Can you make sure the unit test + patch code are consistent and the tests pass? > Create yarn cluster CLI to enable list node labels collection > - > > Key: YARN-2786 > URL: https://issues.apache.org/jira/browse/YARN-2786 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, > YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch > > > With YARN-2778, we can list node labels on existing RM nodes. 
But it is not > enough, we should be able to: > 1) list node labels collection > The command should start with "yarn cluster ...", in the future, we can add > more functionality to the "yarnClusterCLI" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
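One of Craig's review points above suggests Guava's @VisibleForTesting annotation instead of widening a member to protected just so tests can change it. A minimal sketch of that idea follows (class and method names are hypothetical, not the actual ClusterCLI patch; a local stand-in annotation is declared so the snippet carries no Guava dependency — the real annotation is com.google.common.annotations.VisibleForTesting):

```java
import java.util.Set;
import java.util.TreeSet;

// Local stand-in for Guava's @VisibleForTesting: it changes nothing at
// runtime, it only documents why the member is more visible than it needs
// to be for production code.
@interface VisibleForTesting {}

class ClusterCLISketch {
    // Kept package-private rather than protected: tests in the same package
    // can still reach it, and the annotation records the intent.
    @VisibleForTesting
    static String listNodeLabels(Set<String> labels) {
        // TreeSet gives deterministic, sorted output for display and tests.
        return "Node Labels: " + String.join(",", new TreeSet<>(labels));
    }
}
```

The annotation is purely documentation; the real access-control win is that the member stays package-private instead of becoming part of the class's protected surface.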
[jira] [Updated] (YARN-2794) Fix log msgs about distributing system-credentials
[ https://issues.apache.org/jira/browse/YARN-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2794: -- Attachment: YARN-2794.patch > Fix log msgs about distributing system-credentials > --- > > Key: YARN-2794 > URL: https://issues.apache.org/jira/browse/YARN-2794 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-2794.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2794) Fix log msgs about distributing system-credentials
[ https://issues.apache.org/jira/browse/YARN-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195128#comment-14195128 ] Jian He commented on YARN-2794: --- Straightforward patch to change the logs to debug level. bq. NMContext.systemCredentials will have concurrency issues Done. > Fix log msgs about distributing system-credentials > --- > > Key: YARN-2794 > URL: https://issues.apache.org/jira/browse/YARN-2794 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-2794.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
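The kind of change described (demoting a chatty per-heartbeat message to debug level) usually pairs with a level guard so the message string is only built when debug logging is enabled. A hedged sketch of the idiom, with hypothetical names — the real NodeManager code uses the commons-logging API; java.util.logging stands in here to keep the snippet dependency-free (FINE plays the role of debug):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class CredsLogSketch {
    private static final Logger LOG = Logger.getLogger(CredsLogSketch.class.getName());

    // Returns the message that was logged, or null when debug is disabled,
    // purely so the guard's behavior is easy to observe.
    static String logSystemCredentials(String appId, int numTokens) {
        if (!LOG.isLoggable(Level.FINE)) { // guard: skip string building entirely
            return null;
        }
        String msg = "Retrieved credentials from RM for app " + appId
            + ": " + numTokens + " token(s)";
        LOG.fine(msg);
        return msg;
    }
}
```

The guard matters on hot paths like node heartbeats: without it, the concatenation cost is paid on every call even when the logger discards the message.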
[jira] [Commented] (YARN-2788) yarn logs -applicationId on 2.6.0 should support logs written by 2.4.0
[ https://issues.apache.org/jira/browse/YARN-2788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195115#comment-14195115 ] Hudson commented on YARN-2788: -- FAILURE: Integrated in Hadoop-trunk-Commit #6427 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6427/]) YARN-2788. Fixed backwards compatiblity issues with log-aggregation feature that were caused when adding log-upload-time via YARN-2703. Contributed by Xuan Gong. (vinodkv: rev 58e9f24e0f06efede21085b7ffe36af042fa7b38) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestLogsCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestAggregatedLogFormat.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogCLIHelpers.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/log/AggregatedLogsBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java > yarn logs -applicationId on 2.6.0 should support logs written by 2.4.0 > -- > > Key: YARN-2788 > URL: https://issues.apache.org/jira/browse/YARN-2788 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 2.6.0 >Reporter: Gopal V >Assignee: Xuan Gong >Priority: Blocker > Fix For: 2.6.0 > > Attachments: YARN-2788.1.1.patch, YARN-2788.1.patch, > YARN-2788.2.patch, YARN-2788.3.patch, YARN-2788.4.patch, YARN-2788.5.patch > > > Log format version needs to be upped between 2.4.0 and 2.6.0 > {code} > at > 
java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) > at java.lang.Long.parseLong(Long.java:589) > at java.lang.Long.parseLong(Long.java:631) > at > org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$ContainerLogsReader.nextLog(AggregatedLogFormat.java:765) > at > org.apache.hadoop.yarn.webapp.log.AggregatedLogsBlock.readContainerLogs(AggregatedLogsBlock.java:197) > at > org.apache.hadoop.yarn.webapp.log.AggregatedLogsBlock.render(AggregatedLogsBlock.java:166) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:67) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:77) > at org.apache.hadoop.yarn.webapp.View.render(View.java:235) > at > org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49) > at > org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117) > at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:845) > at > org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71) > at > org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82) > at > org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212) > at > org.apache.hadoop.mapreduce.v2.hs.webapp.HsController.logs(HsController.java:178) > ... 40 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
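The NumberFormatException above comes from a 2.6.0 reader unconditionally parsing a field that 2.4.0 writers never emitted. A hedged sketch of the backwards-compatible direction (illustrative only, not the actual AggregatedLogFormat code): treat the newly added upload-time field as optional when reading older streams.

```java
// Illustrative sketch: if the metadata field is not a number, assume the log
// was written by an older version that recorded no upload time, instead of
// letting NumberFormatException propagate out of the reader.
class LogVersionSketch {
    static long parseUploadTime(String field) {
        try {
            return Long.parseLong(field);
        } catch (NumberFormatException e) {
            return -1L; // pre-2.6.0 log: no upload time recorded
        }
    }
}
```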
[jira] [Commented] (YARN-2703) Add logUploadedTime into LogValue for better display
[ https://issues.apache.org/jira/browse/YARN-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195114#comment-14195114 ] Hudson commented on YARN-2703: -- FAILURE: Integrated in Hadoop-trunk-Commit #6427 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6427/]) YARN-2788. Fixed backwards compatiblity issues with log-aggregation feature that were caused when adding log-upload-time via YARN-2703. Contributed by Xuan Gong. (vinodkv: rev 58e9f24e0f06efede21085b7ffe36af042fa7b38) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestLogsCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestAggregatedLogFormat.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogCLIHelpers.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/log/AggregatedLogsBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java > Add logUploadedTime into LogValue for better display > > > Key: YARN-2703 > URL: https://issues.apache.org/jira/browse/YARN-2703 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Fix For: 2.6.0 > > Attachments: YARN-2703.1.patch, YARN-2703.2.patch, YARN-2703.3.patch, > YARN-2703.4.patch > > > Right now, the container can upload its logs multiple times. Sometimes, > containers write different logs into the same log file. 
After the log > aggregation, when we query those logs, it will show: > LogType: stderr > LogContext: > LogType: stdout > LogContext: > LogType: stderr > LogContext: > LogType: stdout > LogContext: > The same files could be displayed multiple times. But we can not figure out > which logs come first. We could add extra loguploadedTime to let users have > better understanding on the logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2505) Support get/add/remove/change labels in RM REST API
[ https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195101#comment-14195101 ] Hadoop QA commented on YARN-2505: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12679010/YARN-2505.13.patch against trunk revision 67f13b5. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5703//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5703//console This message is automatically generated. 
> Support get/add/remove/change labels in RM REST API > --- > > Key: YARN-2505 > URL: https://issues.apache.org/jira/browse/YARN-2505 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Craig Welch > Attachments: YARN-2505.1.patch, YARN-2505.11.patch, > YARN-2505.12.patch, YARN-2505.13.patch, YARN-2505.3.patch, YARN-2505.4.patch, > YARN-2505.5.patch, YARN-2505.6.patch, YARN-2505.7.patch, YARN-2505.8.patch, > YARN-2505.9.patch, YARN-2505.9.patch, YARN-2505.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2788) yarn logs -applicationId on 2.6.0 should support logs written by 2.4.0
[ https://issues.apache.org/jira/browse/YARN-2788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195098#comment-14195098 ] Vinod Kumar Vavilapalli commented on YARN-2788: --- Looks good, +1. Checking this in. > yarn logs -applicationId on 2.6.0 should support logs written by 2.4.0 > -- > > Key: YARN-2788 > URL: https://issues.apache.org/jira/browse/YARN-2788 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 2.6.0 >Reporter: Gopal V >Assignee: Xuan Gong >Priority: Blocker > Attachments: YARN-2788.1.1.patch, YARN-2788.1.patch, > YARN-2788.2.patch, YARN-2788.3.patch, YARN-2788.4.patch, YARN-2788.5.patch > > > Log format version needs to be upped between 2.4.0 and 2.6.0 > {code} > at > java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) > at java.lang.Long.parseLong(Long.java:589) > at java.lang.Long.parseLong(Long.java:631) > at > org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$ContainerLogsReader.nextLog(AggregatedLogFormat.java:765) > at > org.apache.hadoop.yarn.webapp.log.AggregatedLogsBlock.readContainerLogs(AggregatedLogsBlock.java:197) > at > org.apache.hadoop.yarn.webapp.log.AggregatedLogsBlock.render(AggregatedLogsBlock.java:166) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:67) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:77) > at org.apache.hadoop.yarn.webapp.View.render(View.java:235) > at > org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49) > at > org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117) > at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:845) > at > org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71) > at > org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82) > at > org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212) > at > 
org.apache.hadoop.mapreduce.v2.hs.webapp.HsController.logs(HsController.java:178) > ... 40 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2795) Resource Manager fails startup with HDFS label storage and secure cluster
[ https://issues.apache.org/jira/browse/YARN-2795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195099#comment-14195099 ] Wangda Tan commented on YARN-2795: -- Just tried to test this in a security-enabled cluster. Without this patch, the RM failed to start because we don't log in before accessing HDFS. With this patch, the RM starts successfully with labels stored on HDFS. I also tried submitting an MR job after startup, and it completed successfully as well. > Resource Manager fails startup with HDFS label storage and secure cluster > - > > Key: YARN-2795 > URL: https://issues.apache.org/jira/browse/YARN-2795 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Phil D'Amore >Assignee: Wangda Tan > Attachments: YARN-2795-20141101-1.patch, YARN-2795-20141102-1.patch, > YARN-2795-20141102-2.patch > > > When node labels are in use, and yarn.node-labels.fs-store.root-dir is set to > a hdfs:// path, and the cluster is using kerberos, the RM fails to start > while trying to unmarshal the label store. 
The following error/stack trace > is observed: > {code} > 2014-10-31 11:55:53,807 INFO service.AbstractService > (AbstractService.java:noteFailure(272)) - Service o > rg.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in state > INITED; cause: java.io.IOExcepti > on: Failed on local exception: java.io.IOException: > javax.security.sasl.SaslException: GSS initiate faile > d [Caused by GSSException: No valid credentials provided (Mechanism level: > Failed to find any Kerberos tg > t)]; Host Details : local host is: "host.running.rm/10.0.0.34"; destination > hos > t is: "host.running.nn":8020; > java.io.IOException: Failed on local exception: java.io.IOException: > javax.security.sasl.SaslException: G > SS initiate failed [Caused by GSSException: No valid credentials provided > (Mechanism level: Failed to fin > d any Kerberos tgt)]; Host Details : local host is: > "host.running.rm/10.0.0.34" > ; destination host is: "host.running.nn":8020; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764) > at org.apache.hadoop.ipc.Client.call(Client.java:1472) > at org.apache.hadoop.ipc.Client.call(Client.java:1399) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) > at com.sun.proxy.$Proxy14.mkdirs(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProt > ocolTranslatorPB.java:539) > at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187 > ) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy15.mkdirs(Unknown Source) > at > org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2731) > at 
org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2702) > at > org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:870) > at > org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:866) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:866) > at > org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:859) > at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1817) > at > org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.init(FileSystemNodeLabelsStore.java:87) > at > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:206) > at > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceInit(CommonNodeLabelsManager.java:199) > at > org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.serviceInit(RMNodeLabelsManager.java:62) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:547) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.reso
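The bug boils down to an initialization-ordering problem: the label store touched secure HDFS before the RM had performed its Kerberos login, producing the "GSS initiate failed" error above. A minimal ordering sketch (method names are hypothetical stand-ins, not the actual ResourceManager code; the real login goes through SecurityUtil/UserGroupInformation with the RM keytab and principal):

```java
class InitOrderSketch {
    static boolean loggedIn = false;

    // Stands in for logging in with the RM's keytab before any secure access.
    static void loginWithKeytab() {
        loggedIn = true;
    }

    // Stands in for the label store init, which does fs.mkdirs(...) on HDFS;
    // in a kerberized cluster this fails without prior login.
    static void initLabelStore() {
        if (!loggedIn) {
            throw new IllegalStateException(
                "GSS initiate failed: no valid Kerberos credentials");
        }
    }

    public static void main(String[] args) {
        loginWithKeytab();  // the fix: log in first
        initLabelStore();
    }
}
```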
[jira] [Commented] (YARN-2690) Make ReservationSystem and its dependent classes independent of Scheduler type
[ https://issues.apache.org/jira/browse/YARN-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195069#comment-14195069 ] Hadoop QA commented on YARN-2690: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12679005/YARN-2690.004.patch against trunk revision 67f13b5. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1267 javac compiler warnings (more than the trunk's current 1266 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5702//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5702//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5702//console This message is automatically generated. 
> Make ReservationSystem and its dependent classes independent of Scheduler > type > > > Key: YARN-2690 > URL: https://issues.apache.org/jira/browse/YARN-2690 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-2690.001.patch, YARN-2690.002.patch, > YARN-2690.002.patch, YARN-2690.003.patch, YARN-2690.004.patch > > > A lot of common reservation classes depend on CapacityScheduler and > specifically its configuration. This jira is to make them ready for other > Schedulers by abstracting out the configuration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2798) YarnClient doesn't need to translate Kerberos name of timeline DT renewer
[ https://issues.apache.org/jira/browse/YARN-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195062#comment-14195062 ] Hudson commented on YARN-2798: -- FAILURE: Integrated in Hadoop-trunk-Commit #6426 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6426/]) YARN-2798. Fixed YarnClient to populate the renewer correctly for Timeline delegation tokens. Contributed by Zhijie Shen. (vinodkv: rev 71fbb474f531f60c5d908cf724f18f90dfd5fa9f) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/security/TestYARNTokenIdentifier.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java > YarnClient doesn't need to translate Kerberos name of timeline DT renewer > - > > Key: YARN-2798 > URL: https://issues.apache.org/jira/browse/YARN-2798 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Reporter: Arpit Gupta >Assignee: Zhijie Shen >Priority: Blocker > Attachments: YARN-2798.1.patch, YARN-2798.2.patch > > > Now YarnClient will automatically get a timeline DT when submitting an app in > a secure mode. It will try to parse the yarn-site.xml/core-site.xml to get > the RM daemon operating system user. However, the RM principal and > auth_to_local may not be properly presented to the client, and the client > cannot translate the principal to the daemon user properly. On the other > hand, AbstractDelegationTokenIdentifier will do this translation when create > the token. However, since the client has already translated the full > principal into a short user name (which may not be correct), the server can > no longer apply the translation any more, where RM principal and > auth_to_local are always correct. 
[jira] [Commented] (YARN-2730) DefaultContainerExecutor runs only one localizer at a time
[ https://issues.apache.org/jira/browse/YARN-2730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195052#comment-14195052 ] Hudson commented on YARN-2730: -- FAILURE: Integrated in Hadoop-trunk-Commit #6425 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6425/]) YARN-2730. DefaultContainerExecutor runs only one localizer at a time. Contributed by Siqi Li (jlowe: rev 6157ace5475fff8d2513fd3cd99134b532b0b406) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java * hadoop-yarn-project/CHANGES.txt > DefaultContainerExecutor runs only one localizer at a time > -- > > Key: YARN-2730 > URL: https://issues.apache.org/jira/browse/YARN-2730 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: Siqi Li >Assignee: Siqi Li >Priority: Critical > Attachments: YARN-2730.v1.patch, YARN-2730.v2.patch, > YARN-2730.v3.patch > > > We are seeing that when one of the localizerRunner stuck, the rest of the > localizerRunners are blocked. We should remove the synchronized modifier. > The synchronized modifier appears to have been added by > https://issues.apache.org/jira/browse/MAPREDUCE-3537 > It could be removed if Localizer doesn't depend on current directory -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2505) Support get/add/remove/change labels in RM REST API
[ https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195048#comment-14195048 ] Hadoop QA commented on YARN-2505: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12678993/YARN-2505.12.patch against trunk revision 67f13b5. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5701//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5701//console This message is automatically generated. 
> Support get/add/remove/change labels in RM REST API > --- > > Key: YARN-2505 > URL: https://issues.apache.org/jira/browse/YARN-2505 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Craig Welch > Attachments: YARN-2505.1.patch, YARN-2505.11.patch, > YARN-2505.12.patch, YARN-2505.13.patch, YARN-2505.3.patch, YARN-2505.4.patch, > YARN-2505.5.patch, YARN-2505.6.patch, YARN-2505.7.patch, YARN-2505.8.patch, > YARN-2505.9.patch, YARN-2505.9.patch, YARN-2505.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2730) DefaultContainerExecutor runs only one localizer at a time
[ https://issues.apache.org/jira/browse/YARN-2730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-2730: - Summary: DefaultContainerExecutor runs only one localizer at a time (was: Only one localizer can run on a NodeManager at a time) +1 for the latest patch, committing this. > DefaultContainerExecutor runs only one localizer at a time > -- > > Key: YARN-2730 > URL: https://issues.apache.org/jira/browse/YARN-2730 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: Siqi Li >Assignee: Siqi Li >Priority: Critical > Attachments: YARN-2730.v1.patch, YARN-2730.v2.patch, > YARN-2730.v3.patch > > > We are seeing that when one of the localizerRunner stuck, the rest of the > localizerRunners are blocked. We should remove the synchronized modifier. > The synchronized modifier appears to have been added by > https://issues.apache.org/jira/browse/MAPREDUCE-3537 > It could be removed if Localizer doesn't depend on current directory -- This message was sent by Atlassian JIRA (v6.3.4#6332)
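The effect described in YARN-2730 can be reproduced outside Hadoop. The sketch below (hypothetical names, not the actual DefaultContainerExecutor code) runs four "localizers" against a `synchronized` static method and then against the same method without the modifier, tracking the peak number running at once. With the class lock held, the peak is pinned at 1; without it, the runners overlap:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class LocalizerLockDemo {
    static final AtomicInteger running = new AtomicInteger();
    static final AtomicInteger peak = new AtomicInteger();
    static int serializedPeak;
    static int concurrentPeak;

    // Serialized variant: a synchronized static method takes the class
    // lock, so only one "localizer" can be inside it at a time.
    static synchronized void localizeSerialized() throws InterruptedException {
        track();
    }

    // Concurrent variant: without the modifier, runners overlap freely.
    static void localizeConcurrent() throws InterruptedException {
        track();
    }

    static void track() throws InterruptedException {
        peak.accumulateAndGet(running.incrementAndGet(), Math::max);
        Thread.sleep(100); // stand-in for the actual localization work
        running.decrementAndGet();
    }

    static int run(boolean serialized) throws InterruptedException {
        peak.set(0);
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 4; i++) {
            pool.submit(() -> {
                try {
                    if (serialized) localizeSerialized(); else localizeConcurrent();
                } catch (InterruptedException ignored) { }
            });
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        serializedPeak = run(true);
        concurrentPeak = run(false);
        System.out.println("peak with synchronized: " + serializedPeak
                + ", without: " + concurrentPeak);
    }
}
```

Removing the modifier is only safe when the method body does not rely on shared mutable state, which is why the discussion above notes the dependence on the current working directory.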
[jira] [Commented] (YARN-2798) YarnClient doesn't need to translate Kerberos name of timeline DT renewer
[ https://issues.apache.org/jira/browse/YARN-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195038#comment-14195038 ] Vinod Kumar Vavilapalli commented on YARN-2798: --- Looks good now, +1. Checking this in. > YarnClient doesn't need to translate Kerberos name of timeline DT renewer > - > > Key: YARN-2798 > URL: https://issues.apache.org/jira/browse/YARN-2798 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Reporter: Arpit Gupta >Assignee: Zhijie Shen >Priority: Blocker > Attachments: YARN-2798.1.patch, YARN-2798.2.patch > > > Now YarnClient will automatically get a timeline DT when submitting an app in > a secure mode. It will try to parse the yarn-site.xml/core-site.xml to get > the RM daemon operating system user. However, the RM principal and > auth_to_local may not be properly presented to the client, and the client > cannot translate the principal to the daemon user properly. On the other > hand, AbstractDelegationTokenIdentifier will do this translation when create > the token. However, since the client has already translated the full > principal into a short user name (which may not be correct), the server can > no longer apply the translation any more, where RM principal and > auth_to_local are always correct. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (YARN-2798) YarnClient doesn't need to translate Kerberos name of timeline DT renewer
[ https://issues.apache.org/jira/browse/YARN-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194982#comment-14194982 ] Zhijie Shen edited comment on YARN-2798 at 11/3/14 8:03 PM: I don't have a quick setup for RM HA and secure cluster, but the mapping rule is applied every where in this cluster, I think it should work fine. In fact, this issue is not HA related problem. However, in general, if we want the DT renew to work across RMs, we have to run these RMs as the same operating user name. Otherwise, if DT renewer is set to yarn of RM1, and RM2 is run by yarn'. RM2 can no longer renew the DT. This is not applied just to timeline DT, but all the DTs that we assign RM to renew. Correct me if I'm wrong. was (Author: zjshen): I don't have a quick setup for RM HA and secure cluster, but the mapping rule is applied every where in this cluster, I think it should work fine. > YarnClient doesn't need to translate Kerberos name of timeline DT renewer > - > > Key: YARN-2798 > URL: https://issues.apache.org/jira/browse/YARN-2798 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Reporter: Arpit Gupta >Assignee: Zhijie Shen >Priority: Blocker > Attachments: YARN-2798.1.patch, YARN-2798.2.patch > > > Now YarnClient will automatically get a timeline DT when submitting an app in > a secure mode. It will try to parse the yarn-site.xml/core-site.xml to get > the RM daemon operating system user. However, the RM principal and > auth_to_local may not be properly presented to the client, and the client > cannot translate the principal to the daemon user properly. On the other > hand, AbstractDelegationTokenIdentifier will do this translation when create > the token. However, since the client has already translated the full > principal into a short user name (which may not be correct), the server can > no longer apply the translation any more, where RM principal and > auth_to_local are always correct. 
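The core of the YARN-2798 argument is that shortening a Kerberos principal is lossy. A deliberately naive sketch (hypothetical, NOT Hadoop's KerberosName/auth_to_local logic) shows why: once the client strips the instance and realm, the server can no longer apply its own, correct auth_to_local rules, so the fix keeps the full principal and lets the server translate.

```java
public class RenewerNameDemo {
    // Naive client-side shortening: drop everything after the first
    // '/' (instance) or '@' (realm). Whether this matches the server's
    // auth_to_local rules is pure luck, which is the bug's root cause.
    static String naiveShortName(String principal) {
        int cut = principal.indexOf('/');
        if (cut < 0) cut = principal.indexOf('@');
        return cut < 0 ? principal : principal.substring(0, cut);
    }

    public static void main(String[] args) {
        // Hypothetical RM principal, not taken from any real cluster.
        String full = "rm/rm-host.example.com@EXAMPLE.COM";
        System.out.println(full + " -> " + naiveShortName(full));
    }
}
```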
[jira] [Updated] (YARN-2738) Add FairReservationSystem for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-2738: Attachment: YARN-2738.002.patch Thanks [~kasha] for the review. 1. Removed TODO and opened YARN-2773. 2. Fixed. [~subru], do you see any issues where not having per-queue configuration settings would make it difficult for some scenarios? I can see that the max and avg for CapacityOverTimePolicy might be the first thing that people may need to configure per queue. [~kasha] I would prefer either no queue configuration (other than the element that marks a queue for reservations) or all of it, instead of a partial set. Would you agree? > Add FairReservationSystem for FairScheduler > --- > > Key: YARN-2738 > URL: https://issues.apache.org/jira/browse/YARN-2738 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-2738.001.patch, YARN-2738.002.patch > > > Need to create a FairReservationSystem that will implement ReservationSystem > for FairScheduler -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2505) Support get/add/remove/change labels in RM REST API
[ https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2505: -- Attachment: YARN-2505.13.patch Ok, implemented the changes we came up with (switching to /add /remove for cluster node label posts, changing post node-to-labels to post node-to-labels/replace) [~xgong] [~leftnoteasy] have a look pls. > Support get/add/remove/change labels in RM REST API > --- > > Key: YARN-2505 > URL: https://issues.apache.org/jira/browse/YARN-2505 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Craig Welch > Attachments: YARN-2505.1.patch, YARN-2505.11.patch, > YARN-2505.12.patch, YARN-2505.13.patch, YARN-2505.3.patch, YARN-2505.4.patch, > YARN-2505.5.patch, YARN-2505.6.patch, YARN-2505.7.patch, YARN-2505.8.patch, > YARN-2505.9.patch, YARN-2505.9.patch, YARN-2505.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2798) YarnClient doesn't need to translate Kerberos name of timeline DT renewer
[ https://issues.apache.org/jira/browse/YARN-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194982#comment-14194982 ] Zhijie Shen commented on YARN-2798: --- I don't have a quick setup for RM HA and a secure cluster, but the mapping rule is applied everywhere in this cluster, so I think it should work fine. > YarnClient doesn't need to translate Kerberos name of timeline DT renewer > - > > Key: YARN-2798 > URL: https://issues.apache.org/jira/browse/YARN-2798 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Reporter: Arpit Gupta >Assignee: Zhijie Shen >Priority: Blocker > Attachments: YARN-2798.1.patch, YARN-2798.2.patch > > > Now YarnClient will automatically get a timeline DT when submitting an app in > a secure mode. It will try to parse the yarn-site.xml/core-site.xml to get > the RM daemon operating system user. However, the RM principal and > auth_to_local may not be properly presented to the client, and the client > cannot translate the principal to the daemon user properly. On the other > hand, AbstractDelegationTokenIdentifier will do this translation when create > the token. However, since the client has already translated the full > principal into a short user name (which may not be correct), the server can > no longer apply the translation any more, where RM principal and > auth_to_local are always correct. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2690) Make ReservationSystem and its dependent classes independent of Scheduler type
[ https://issues.apache.org/jira/browse/YARN-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-2690: Attachment: YARN-2690.004.patch Fixed the javadoc warning because of typo Re 1. The information returned is specific to Scheduler queues so named it that way. Also i am introducing another Reservation Configuration class which needs to be distinguished from this. Fixed 2. and 3. > Make ReservationSystem and its dependent classes independent of Scheduler > type > > > Key: YARN-2690 > URL: https://issues.apache.org/jira/browse/YARN-2690 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-2690.001.patch, YARN-2690.002.patch, > YARN-2690.002.patch, YARN-2690.003.patch, YARN-2690.004.patch > > > A lot of common reservation classes depend on CapacityScheduler and > specifically its configuration. This jira is to make them ready for other > Schedulers by abstracting out the configuration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1922) Process group remains alive after container process is killed externally
[ https://issues.apache.org/jira/browse/YARN-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194966#comment-14194966 ] Hadoop QA commented on YARN-1922: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12678987/YARN-1922.6.patch against trunk revision 67f13b5. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5700//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5700//console This message is automatically generated. 
> Process group remains alive after container process is killed externally > > > Key: YARN-1922 > URL: https://issues.apache.org/jira/browse/YARN-1922 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.4.0 > Environment: CentOS 6.4 >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi > Attachments: YARN-1922.1.patch, YARN-1922.2.patch, YARN-1922.3.patch, > YARN-1922.4.patch, YARN-1922.5.patch, YARN-1922.6.patch > > > If the main container process is killed externally, ContainerLaunch does not > kill the rest of the process group. Before sending the event that results in > the ContainerLaunch.containerCleanup method being called, ContainerLaunch > sets the "completed" flag to true. Then when cleaning up, it doesn't try to > read the pid file if the completed flag is true. If it read the pid file, it > would proceed to send the container a kill signal. In the case of the > DefaultContainerExecutor, this would kill the process group. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2505) Support get/add/remove/change labels in RM REST API
[ https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194963#comment-14194963 ] Wangda Tan commented on YARN-2505: -- Offline discussed with [~cwelch], some suggestions: - We should have replaceLabelsOnNode as we have it in RMAdminCLI - We should have removeFromClusterNodeLabels as we have addToClusterNodeLabels - Suggest to make replaceLabelsOnNode URL as : .../node-to-labels/replace and using POST, in the future we can have ../node-to-labels/remove(add) - Suggest to make remove/add To/From ClusterNodeLabels as : .../node-labels/add(remove) and using POST, to make it consistent with replace/add/removeLabelsOnNode APIs. Thanks, Wangda > Support get/add/remove/change labels in RM REST API > --- > > Key: YARN-2505 > URL: https://issues.apache.org/jira/browse/YARN-2505 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Craig Welch > Attachments: YARN-2505.1.patch, YARN-2505.11.patch, > YARN-2505.12.patch, YARN-2505.3.patch, YARN-2505.4.patch, YARN-2505.5.patch, > YARN-2505.6.patch, YARN-2505.7.patch, YARN-2505.8.patch, YARN-2505.9.patch, > YARN-2505.9.patch, YARN-2505.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
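The endpoint layout Wangda proposes above can be summarized concretely. The sketch below lists the proposed POST paths; note this is a proposal under review, not a released API, and the RM host/port are hypothetical:

```java
public class NodeLabelEndpoints {
    // Hypothetical RM web address; only the path suffixes come from the
    // proposal in the comment above.
    static final String BASE = "http://rm-host:8088/ws/v1/cluster";

    static String[] proposedPosts() {
        return new String[] {
            BASE + "/node-labels/add",        // addToClusterNodeLabels
            BASE + "/node-labels/remove",     // removeFromClusterNodeLabels
            BASE + "/node-to-labels/replace", // replaceLabelsOnNode
        };
    }

    public static void main(String[] args) {
        for (String url : proposedPosts()) {
            System.out.println("POST " + url);
        }
    }
}
```

Keeping add/remove/replace as distinct POST sub-paths leaves room for the future `node-to-labels/add` and `node-to-labels/remove` variants the comment mentions.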
[jira] [Commented] (YARN-2730) Only one localizer can run on a NodeManager at a time
[ https://issues.apache.org/jira/browse/YARN-2730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194947#comment-14194947 ] Hadoop QA commented on YARN-2730: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12678985/YARN-2730.v3.patch against trunk revision 67f13b5. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5699//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5699//console This message is automatically generated. 
> Only one localizer can run on a NodeManager at a time > - > > Key: YARN-2730 > URL: https://issues.apache.org/jira/browse/YARN-2730 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: Siqi Li >Assignee: Siqi Li >Priority: Critical > Attachments: YARN-2730.v1.patch, YARN-2730.v2.patch, > YARN-2730.v3.patch > > > We are seeing that when one of the localizerRunner stuck, the rest of the > localizerRunners are blocked. We should remove the synchronized modifier. > The synchronized modifier appears to have been added by > https://issues.apache.org/jira/browse/MAPREDUCE-3537 > It could be removed if Localizer doesn't depend on current directory -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2798) YarnClient doesn't need to translate Kerberos name of timeline DT renewer
[ https://issues.apache.org/jira/browse/YARN-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194945#comment-14194945 ] Jian He commented on YARN-2798: --- Can you also check if it works for RM HA where two RMs sit on different host? I checked, it should work. as long as two RMs use the same mapping rule. > YarnClient doesn't need to translate Kerberos name of timeline DT renewer > - > > Key: YARN-2798 > URL: https://issues.apache.org/jira/browse/YARN-2798 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Reporter: Arpit Gupta >Assignee: Zhijie Shen >Priority: Blocker > Attachments: YARN-2798.1.patch, YARN-2798.2.patch > > > Now YarnClient will automatically get a timeline DT when submitting an app in > a secure mode. It will try to parse the yarn-site.xml/core-site.xml to get > the RM daemon operating system user. However, the RM principal and > auth_to_local may not be properly presented to the client, and the client > cannot translate the principal to the daemon user properly. On the other > hand, AbstractDelegationTokenIdentifier will do this translation when create > the token. However, since the client has already translated the full > principal into a short user name (which may not be correct), the server can > no longer apply the translation any more, where RM principal and > auth_to_local are always correct. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2505) Support get/add/remove/change labels in RM REST API
[ https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2505: -- Attachment: YARN-2505.12.patch At [~leftnoteasy] 's recommendation, switch the bulk operation for node-to-labels to a "replace" instead of "add", as this is what we plan to do from the cli. [~xgong] can you have a look? [~vinodkv] as well? > Support get/add/remove/change labels in RM REST API > --- > > Key: YARN-2505 > URL: https://issues.apache.org/jira/browse/YARN-2505 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Craig Welch > Attachments: YARN-2505.1.patch, YARN-2505.11.patch, > YARN-2505.12.patch, YARN-2505.3.patch, YARN-2505.4.patch, YARN-2505.5.patch, > YARN-2505.6.patch, YARN-2505.7.patch, YARN-2505.8.patch, YARN-2505.9.patch, > YARN-2505.9.patch, YARN-2505.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2735) diskUtilizationPercentageCutoff and diskUtilizationSpaceCutoff are initialized twice in DirectoryCollection
[ https://issues.apache.org/jira/browse/YARN-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194893#comment-14194893 ] Anubhav Dhoot commented on YARN-2735: - This looks like a trivial patch that should be okay without tests. LGTM > diskUtilizationPercentageCutoff and diskUtilizationSpaceCutoff are > initialized twice in DirectoryCollection > --- > > Key: YARN-2735 > URL: https://issues.apache.org/jira/browse/YARN-2735 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Minor > Attachments: YARN-2735.000.patch > > > diskUtilizationPercentageCutoff and diskUtilizationSpaceCutoff are > initialized twice in DirectoryCollection -- This message was sent by Atlassian JIRA (v6.3.4#6332)
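The anti-pattern YARN-2735 fixes is easy to illustrate. This is a simplified stand-in, not the actual DirectoryCollection code: the field is assigned at its declaration and again in the constructor, so the declaration-site write is dead code that the patch removes.

```java
public class DoubleInitDemo {
    static class DirCollection {
        // Redundant first write: immediately overwritten below.
        private float diskUtilizationPercentageCutoff = 100.0F;

        DirCollection(float cutoff) {
            // The write that actually takes effect.
            this.diskUtilizationPercentageCutoff = cutoff;
        }

        float getCutoff() { return diskUtilizationPercentageCutoff; }
    }

    public static void main(String[] args) {
        System.out.println(new DirCollection(90.0F).getCutoff());
    }
}
```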
[jira] [Updated] (YARN-1922) Process group remains alive after container process is killed externally
[ https://issues.apache.org/jira/browse/YARN-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billie Rinaldi updated YARN-1922: - Attachment: YARN-1922.6.patch Attaching a new patch. Instead of using do/while (!completed.get()), this patch simply uses while(true), so that it always loops until the pid file appears or the maxKillWaitTime elapses. [~vinodkv], does this address your concerns? > Process group remains alive after container process is killed externally > > > Key: YARN-1922 > URL: https://issues.apache.org/jira/browse/YARN-1922 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.4.0 > Environment: CentOS 6.4 >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi > Attachments: YARN-1922.1.patch, YARN-1922.2.patch, YARN-1922.3.patch, > YARN-1922.4.patch, YARN-1922.5.patch, YARN-1922.6.patch > > > If the main container process is killed externally, ContainerLaunch does not > kill the rest of the process group. Before sending the event that results in > the ContainerLaunch.containerCleanup method being called, ContainerLaunch > sets the "completed" flag to true. Then when cleaning up, it doesn't try to > read the pid file if the completed flag is true. If it read the pid file, it > would proceed to send the container a kill signal. In the case of the > DefaultContainerExecutor, this would kill the process group. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
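The loop Billie describes in the .6 patch can be sketched as follows (hypothetical names, not the actual ContainerLaunch code): poll for the pid file until it appears or the maximum wait elapses, without gating the loop on the `completed` flag, so an externally killed container still gets its process group signaled.

```java
import java.util.function.Supplier;

public class PidWaitDemo {
    // readPidFile stands in for reading the container's pid file; it
    // returns null until the file exists.
    static String waitForPid(Supplier<String> readPidFile,
                             long maxKillWaitMs, long pollMs) {
        long deadline = System.currentTimeMillis() + maxKillWaitMs;
        while (true) { // deliberately NOT "while (!completed.get())"
            String pid = readPidFile.get();
            if (pid != null) {
                return pid; // caller can now signal the process group
            }
            if (System.currentTimeMillis() >= deadline) {
                return null; // gave up: no pid file within the max wait
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(waitForPid(() -> "1234", 1000, 10)); // pid present
        System.out.println(waitForPid(() -> null, 50, 10));     // times out
    }
}
```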