[jira] [Commented] (YARN-2529) Generic history service RPC interface doesn't work when service authorization is enabled
[ https://issues.apache.org/jira/browse/YARN-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131147#comment-14131147 ] Hadoop QA commented on YARN-2529: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668266/YARN-2529.2.patch against trunk revision 5633da2. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4912//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4912//console This message is automatically generated. Generic history service RPC interface doesn't work when service authorization is enabled Key: YARN-2529 URL: https://issues.apache.org/jira/browse/YARN-2529 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-2529.1.patch, YARN-2529.2.patch Here's the problem shown in the log: {code} 14/09/10 10:42:44 INFO ipc.Server: Connection from 10.22.2.109:55439 for protocol org.apache.hadoop.yarn.api.ApplicationHistoryProtocolPB is unauthorized for user zshen (auth:SIMPLE) 14/09/10 10:42:44 INFO ipc.Server: Socket Reader #1 for port 10200: readAndProcess from client 10.22.2.109 threw exception [org.apache.hadoop.security.authorize.AuthorizationException: Protocol interface org.apache.hadoop.yarn.api.ApplicationHistoryProtocolPB is not known.] {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
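The log above means the RPC layer has no service-authorization policy registered for ApplicationHistoryProtocolPB, so every connection is rejected once hadoop.security.authorization is enabled. A minimal sketch of the kind of change that addresses this, assuming a policy key named security.applicationhistory.protocol.acl and a provider class name invented for illustration (the attached patch may structure this differently):
{code}
import org.apache.hadoop.security.authorize.PolicyProvider;
import org.apache.hadoop.security.authorize.Service;
import org.apache.hadoop.yarn.api.ApplicationHistoryProtocolPB;

// Hypothetical provider that makes ApplicationHistoryProtocolPB "known" to the
// service-authorization layer; the ACL key and class name are assumptions.
public class TimelineServerPolicyProvider extends PolicyProvider {
  @Override
  public Service[] getServices() {
    return new Service[] {
        new Service("security.applicationhistory.protocol.acl",
            ApplicationHistoryProtocolPB.class)
    };
  }
}
{code}
The history server's RPC server would then have its service ACLs refreshed with such a provider (Server#refreshServiceAcl) when authorization is enabled.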
[jira] [Updated] (YARN-2542) yarn application -status appId throws NPE when retrieving the app from the timelineserver
[ https://issues.apache.org/jira/browse/YARN-2542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2542: -- Attachment: YARN-2542.1.patch Upload a patch to fix the bug. yarn application -status appId throws NPE when retrieving the app from the timelineserver - Key: YARN-2542 URL: https://issues.apache.org/jira/browse/YARN-2542 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-2542.1.patch yarn application -status appId throws NPE when retrieving the app from the timelineserver. It's broken by YARN-415. When app is finished, there's no usageReport. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
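Because a finished application fetched from the timeline server carries no ApplicationResourceUsageReport, any code that dereferences the usage report unconditionally will hit the NPE described here. A minimal sketch of the needed guard, with an invented helper name (this is not the literal code from YARN-2542.1.patch):
{code}
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.ApplicationResourceUsageReport;

// Hypothetical helper: tolerate a missing usage report instead of assuming it
// is always present when the app report comes from the timeline server.
static String formatAggregateUsage(ApplicationReport report) {
  ApplicationResourceUsageReport usage = report.getApplicationResourceUsageReport();
  if (usage == null) {
    return "N/A";
  }
  return usage.getMemorySeconds() + " MB-seconds, "
      + usage.getVcoreSeconds() + " vcore-seconds";
}
{code}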
[jira] [Updated] (YARN-2542) yarn application -status appId throws NPE when retrieving the app from the timelineserver
[ https://issues.apache.org/jira/browse/YARN-2542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2542: -- Attachment: (was: YARN-2542.2.patch) yarn application -status appId throws NPE when retrieving the app from the timelineserver - Key: YARN-2542 URL: https://issues.apache.org/jira/browse/YARN-2542 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-2542.1.patch yarn application -status appId throws NPE when retrieving the app from the timelineserver. It's broken by YARN-415. When app is finished, there's no usageReport. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2542) yarn application -status appId throws NPE when retrieving the app from the timelineserver
[ https://issues.apache.org/jira/browse/YARN-2542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2542: -- Attachment: YARN-2542.2.patch Upload a new patch: 1. Fix 80 char line limit. 2. Update the comment. The long term fix is to record the attempt's resource usage as well. Will file a ticket to trace the issue. yarn application -status appId throws NPE when retrieving the app from the timelineserver - Key: YARN-2542 URL: https://issues.apache.org/jira/browse/YARN-2542 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-2542.1.patch yarn application -status appId throws NPE when retrieving the app from the timelineserver. It's broken by YARN-415. When app is finished, there's no usageReport. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2542) yarn application -status appId throws NPE when retrieving the app from the timelineserver
[ https://issues.apache.org/jira/browse/YARN-2542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2542: -- Attachment: YARN-2542.2.patch yarn application -status appId throws NPE when retrieving the app from the timelineserver - Key: YARN-2542 URL: https://issues.apache.org/jira/browse/YARN-2542 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-2542.1.patch, YARN-2542.2.patch yarn application -status appId throws NPE when retrieving the app from the timelineserver. It's broken by YARN-415. When app is finished, there's no usageReport. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2542) yarn application -status appId throws NPE when retrieving the app from the timelineserver
[ https://issues.apache.org/jira/browse/YARN-2542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131188#comment-14131188 ] Hadoop QA commented on YARN-2542: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668280/YARN-2542.1.patch against trunk revision 469ea3d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4913//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4913//console This message is automatically generated. yarn application -status appId throws NPE when retrieving the app from the timelineserver - Key: YARN-2542 URL: https://issues.apache.org/jira/browse/YARN-2542 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-2542.1.patch, YARN-2542.2.patch yarn application -status appId throws NPE when retrieving the app from the timelineserver. It's broken by YARN-415. When app is finished, there's no usageReport. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2543) Resource usage should be published to the timeline server as well
Zhijie Shen created YARN-2543: - Summary: Resource usage should be published to the timeline server as well Key: YARN-2543 URL: https://issues.apache.org/jira/browse/YARN-2543 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen RM will include the resource usage in the app report, but generic history service doesn't, because RM doesn't publish this data to the timeline server -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2542) yarn application -status appId throws NPE when retrieving the app from the timelineserver
[ https://issues.apache.org/jira/browse/YARN-2542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131219#comment-14131219 ] Hadoop QA commented on YARN-2542: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668289/YARN-2542.2.patch against trunk revision 469ea3d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4914//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4914//console This message is automatically generated. yarn application -status appId throws NPE when retrieving the app from the timelineserver - Key: YARN-2542 URL: https://issues.apache.org/jira/browse/YARN-2542 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-2542.1.patch, YARN-2542.2.patch yarn application -status appId throws NPE when retrieving the app from the timelineserver. It's broken by YARN-415. When app is finished, there's no usageReport. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2452) TestRMApplicationHistoryWriter is failed for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2452: Attachment: YARN-2452.002.patch TestRMApplicationHistoryWriter is failed for FairScheduler -- Key: YARN-2452 URL: https://issues.apache.org/jira/browse/YARN-2452 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2452.000.patch, YARN-2452.001.patch, YARN-2452.002.patch TestRMApplicationHistoryWriter is failed for FairScheduler. The failure is the following: T E S T S --- Running org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 69.311 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter testRMWritingMassiveHistory(org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter) Time elapsed: 66.261 sec FAILURE! java.lang.AssertionError: expected:1 but was:200 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter.testRMWritingMassiveHistory(TestRMApplicationHistoryWriter.java:430) at org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter.testRMWritingMassiveHistory(TestRMApplicationHistoryWriter.java:391) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2452) TestRMApplicationHistoryWriter is failed for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131259#comment-14131259 ] zhihai xu commented on YARN-2452: - I uploaded a new patch YARN-2452.002.patch which uses FairSchedulerConfiguration.ASSIGN_MULTIPLE and makes FairSchedulerConfiguration.ASSIGN_MULTIPLE public. Please review it. Thanks. TestRMApplicationHistoryWriter is failed for FairScheduler -- Key: YARN-2452 URL: https://issues.apache.org/jira/browse/YARN-2452 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2452.000.patch, YARN-2452.001.patch, YARN-2452.002.patch TestRMApplicationHistoryWriter is failed for FairScheduler. The failure is the following: T E S T S --- Running org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 69.311 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter testRMWritingMassiveHistory(org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter) Time elapsed: 66.261 sec FAILURE! java.lang.AssertionError: expected:1 but was:200 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter.testRMWritingMassiveHistory(TestRMApplicationHistoryWriter.java:430) at org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter.testRMWritingMassiveHistory(TestRMApplicationHistoryWriter.java:391) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
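For context, FairSchedulerConfiguration.ASSIGN_MULTIPLE is the key that lets the FairScheduler assign more than one container per node heartbeat, which is presumably why the massive-history test behaves differently under FairScheduler with the default of a single assignment per heartbeat. A hedged sketch of how a test could set it once the constant is public, as the comment above proposes (the actual patch may wire this differently):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairSchedulerConfiguration;

// Sketch: allow multiple container assignments per heartbeat in the test setup.
static Configuration createFairSchedulerTestConf() {
  Configuration conf = new Configuration();
  conf.setBoolean(FairSchedulerConfiguration.ASSIGN_MULTIPLE, true);
  return conf;
}
{code}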
[jira] [Commented] (YARN-2452) TestRMApplicationHistoryWriter is failed for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131317#comment-14131317 ] Hadoop QA commented on YARN-2452: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668301/YARN-2452.002.patch against trunk revision 469ea3d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4915//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4915//console This message is automatically generated. TestRMApplicationHistoryWriter is failed for FairScheduler -- Key: YARN-2452 URL: https://issues.apache.org/jira/browse/YARN-2452 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2452.000.patch, YARN-2452.001.patch, YARN-2452.002.patch TestRMApplicationHistoryWriter is failed for FairScheduler. The failure is the following: T E S T S --- Running org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 69.311 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter testRMWritingMassiveHistory(org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter) Time elapsed: 66.261 sec FAILURE! java.lang.AssertionError: expected:1 but was:200 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter.testRMWritingMassiveHistory(TestRMApplicationHistoryWriter.java:430) at org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter.testRMWritingMassiveHistory(TestRMApplicationHistoryWriter.java:391) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2544) [YARN-796] Common server side PB changes (not include user API PB changes)
Wangda Tan created YARN-2544: Summary: [YARN-796] Common server side PB changes (not include user API PB changes) Key: YARN-2544 URL: https://issues.apache.org/jira/browse/YARN-2544 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wangda Tan Assignee: Wangda Tan -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2545) RMApp should transit to FAILED when AM calls finishApplicationMaster with FAILED
Hong Zhiguo created YARN-2545: - Summary: RMApp should transit to FAILED when AM calls finishApplicationMaster with FAILED Key: YARN-2545 URL: https://issues.apache.org/jira/browse/YARN-2545 Project: Hadoop YARN Issue Type: Bug Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor If AM calls finishApplicationMaster with getFinalApplicationStatus()==FAILED, and then exits, the corresponding RMApp and RMAppAttempt transit to state FINISHED. I think this is wrong and confusing. On RM WebUI, this application is displayed as State=FINISHED, FinalStatus=FAILED, and is counted as Apps Completed, not as Apps Failed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
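For reference, the scenario described is an AM that unregisters with a FAILED final status; a minimal sketch of that call path through the client library (the diagnostics text and the absent tracking URL are illustrative):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

// Sketch of an AM that reports failure when unregistering; per the report the
// corresponding RMApp currently ends up FINISHED with FinalStatus=FAILED.
static void finishAsFailed() throws Exception {
  AMRMClient<ContainerRequest> amRMClient = AMRMClient.createAMRMClient();
  amRMClient.init(new Configuration());
  amRMClient.start();
  // ... registerApplicationMaster(...), request containers, run the app ...
  amRMClient.unregisterApplicationMaster(
      FinalApplicationStatus.FAILED, "application-level failure", null);
  amRMClient.stop();
}
{code}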
[jira] [Commented] (YARN-2538) Add logs when RM send new AMRMToken to ApplicationMaster
[ https://issues.apache.org/jira/browse/YARN-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131404#comment-14131404 ] Hudson commented on YARN-2538: -- FAILURE: Integrated in Hadoop-Yarn-trunk #678 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/678/]) YARN-2538. Added logs when RM sends roll-overed AMRMToken to AM. Contributed by Xuan Gong. (zjshen: rev 469ea3dcef6e427d02fd08b859b2789cc25189f9) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * hadoop-yarn-project/CHANGES.txt Add logs when RM send new AMRMToken to ApplicationMaster Key: YARN-2538 URL: https://issues.apache.org/jira/browse/YARN-2538 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.6.0 Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.6.0 Attachments: YARN-2538.1.patch, YARN-2538.1.patch This is for testing/debugging purpose -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2541) Fix ResourceManagerRest.apt.vm syntax error
[ https://issues.apache.org/jira/browse/YARN-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131399#comment-14131399 ] Hudson commented on YARN-2541: -- FAILURE: Integrated in Hadoop-Yarn-trunk #678 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/678/]) YARN-2541. Fixed ResourceManagerRest.apt.vm table syntax error. Contributed by Jian He (jianhe: rev 5633da2a018efcfac03cc1dd65af79bce2f1a11b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm * hadoop-yarn-project/CHANGES.txt Fix ResourceManagerRest.apt.vm syntax error --- Key: YARN-2541 URL: https://issues.apache.org/jira/browse/YARN-2541 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Fix For: 2.6.0 Attachments: YARN-2541.1.patch the incorrect table syntax somehow causes hadoop-yarn-site intermittent build failure as in https://jira.codehaus.org/browse/DOXIA-453 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2033) Merging generic-history into the Timeline Store
[ https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131403#comment-14131403 ] Hudson commented on YARN-2033: -- FAILURE: Integrated in Hadoop-Yarn-trunk #678 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/678/]) YARN-2033. Merging generic-history into the Timeline Store (Contributed by Zhijie Shen) (junping_du: rev 6b8b1608e64e300e4e1d23c60476febaca29ca38) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ContainerFinishedEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/ContainerBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppAttemptBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/AppAttemptMetricsConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/AppInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryClientService.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/MemoryTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ContainerMetricsConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppsBlock.java *
[jira] [Commented] (YARN-2534) FairScheduler: Potential integer overflow calculating totalMaxShare
[ https://issues.apache.org/jira/browse/YARN-2534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131394#comment-14131394 ] Hudson commented on YARN-2534: -- FAILURE: Integrated in Hadoop-Yarn-trunk #678 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/678/]) YARN-2534. FairScheduler: Potential integer overflow calculating totalMaxShare. (Zhihai Xu via kasha) (kasha: rev c11ada5ea6d17321626e5a9a4152ff857d03aee2) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/ComputeFairShares.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java FairScheduler: Potential integer overflow calculating totalMaxShare --- Key: YARN-2534 URL: https://issues.apache.org/jira/browse/YARN-2534 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.6.0 Attachments: YARN-2534.000.patch FairScheduler: totalMaxShare is not calculated correctly in computeSharesInternal for some cases. If the sum of MAX share of all Schedulables is more than Integer.MAX_VALUE ,but each individual MAX share is not equal to Integer.MAX_VALUE. then totalMaxShare will be a negative value, which will cause all fairShare are wrongly calculated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
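The overflow described happens because the per-Schedulable max shares are summed into an int; a hedged illustration of a saturating sum (not the literal change made in ComputeFairShares):
{code}
// Accumulate max shares in a long and clamp at Integer.MAX_VALUE, so a large
// total can never wrap around to a negative int.
static int totalMaxShare(int[] maxShares) {
  long total = 0;
  for (int share : maxShares) {
    total += share;
    if (total > Integer.MAX_VALUE) {
      return Integer.MAX_VALUE;
    }
  }
  return (int) total;
}
{code}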
[jira] [Created] (YARN-2546) REST API for application creation/submission is using strings for numeric boolean values
Doug Haigh created YARN-2546: Summary: REST API for application creation/submission is using strings for numeric boolean values Key: YARN-2546 URL: https://issues.apache.org/jira/browse/YARN-2546 Project: Hadoop YARN Issue Type: Bug Components: api Affects Versions: 2.5.1 Reporter: Doug Haigh When YARN responds with or accepts JSON, numbers & booleans are represented as strings, which can cause parsing problems. Resource values look like { "application-id": "application_1404198295326_0001", "maximum-resource-capability": { "memory": "8192", "vCores": "32" } } Instead of { "application-id": "application_1404198295326_0001", "maximum-resource-capability": { "memory": 8192, "vCores": 32 } } When I POST to start a job, numeric values are represented as strings: "local-resources": { "entry": [ { "key": "AppMaster.jar", "value": { "resource": "hdfs://hdfs-namenode:9000/user/testuser/DistributedShell/demo-app/AppMaster.jar", "type": "FILE", "visibility": "APPLICATION", "size": "43004", "timestamp": "1405452071209" } } ] }, Instead of "local-resources": { "entry": [ { "key": "AppMaster.jar", "value": { "resource": "hdfs://hdfs-namenode:9000/user/testuser/DistributedShell/demo-app/AppMaster.jar", "type": "FILE", "visibility": "APPLICATION", "size": 43004, "timestamp": 1405452071209 } } ] }, Similarly, Boolean values are also represented as strings: "keep-containers-across-application-attempts": "false" Instead of "keep-containers-across-application-attempts": false -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-415) Capture aggregate memory allocation at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131522#comment-14131522 ] Eric Payne commented on YARN-415: - I would like to express my thanks to [~aklochkov], [~jianhe], [~leftnoteasy], [~kkambatl], [~sandyr], and [~jlowe]. It was a team effort, and I appreciate all of the great help you have given on this feature. Capture aggregate memory allocation at the app-level for chargeback --- Key: YARN-415 URL: https://issues.apache.org/jira/browse/YARN-415 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 2.5.0 Reporter: Kendall Thrapp Assignee: Eric Payne Fix For: 2.6.0 Attachments: YARN-415--n10.patch, YARN-415--n2.patch, YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, YARN-415.201406262136.txt, YARN-415.201407042037.txt, YARN-415.201407071542.txt, YARN-415.201407171553.txt, YARN-415.201407172144.txt, YARN-415.201407232237.txt, YARN-415.201407242148.txt, YARN-415.201407281816.txt, YARN-415.201408062232.txt, YARN-415.201408080204.txt, YARN-415.201408092006.txt, YARN-415.201408132109.txt, YARN-415.201408150030.txt, YARN-415.201408181938.txt, YARN-415.201408181938.txt, YARN-415.201408212033.txt, YARN-415.201409040036.txt, YARN-415.201409092204.txt, YARN-415.201409102216.txt, YARN-415.patch For the purpose of chargeback, I'd like to be able to compute the cost of an application in terms of cluster resource usage. To start out, I'd like to get the memory utilization of an application. The unit should be MB-seconds or something similar and, from a chargeback perspective, the memory amount should be the memory reserved for the application, as even if the app didn't use all that memory, no one else was able to use it. (reserved ram for container 1 * lifetime of container 1) + (reserved ram for container 2 * lifetime of container 2) + ... + (reserved ram for container n * lifetime of container n) It'd be nice to have this at the app level instead of the job level because: 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear on the job history server). 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). This new metric should be available both through the RM UI and RM Web Services REST API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
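The chargeback metric described above is simply a sum over containers of reserved memory times lifetime; a small illustrative sketch of that aggregation (ContainerUsage is an invented example type, not a YARN record class):
{code}
// Illustrative only: MB-seconds as defined in the description, summed over an
// application's containers.
class ContainerUsage {
  final long reservedMB;       // memory reserved for the container, in MB
  final long lifetimeSeconds;  // how long the container was allocated

  ContainerUsage(long reservedMB, long lifetimeSeconds) {
    this.reservedMB = reservedMB;
    this.lifetimeSeconds = lifetimeSeconds;
  }
}

static long memorySeconds(Iterable<ContainerUsage> containers) {
  long total = 0;
  for (ContainerUsage c : containers) {
    total += c.reservedMB * c.lifetimeSeconds;
  }
  return total;
}
{code}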
[jira] [Commented] (YARN-2534) FairScheduler: Potential integer overflow calculating totalMaxShare
[ https://issues.apache.org/jira/browse/YARN-2534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131531#comment-14131531 ] Hudson commented on YARN-2534: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1894 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1894/]) YARN-2534. FairScheduler: Potential integer overflow calculating totalMaxShare. (Zhihai Xu via kasha) (kasha: rev c11ada5ea6d17321626e5a9a4152ff857d03aee2) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/ComputeFairShares.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java FairScheduler: Potential integer overflow calculating totalMaxShare --- Key: YARN-2534 URL: https://issues.apache.org/jira/browse/YARN-2534 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.6.0 Attachments: YARN-2534.000.patch FairScheduler: totalMaxShare is not calculated correctly in computeSharesInternal for some cases. If the sum of MAX share of all Schedulables is more than Integer.MAX_VALUE ,but each individual MAX share is not equal to Integer.MAX_VALUE. then totalMaxShare will be a negative value, which will cause all fairShare are wrongly calculated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2033) Merging generic-history into the Timeline Store
[ https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131540#comment-14131540 ] Hudson commented on YARN-2033: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1894 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1894/]) YARN-2033. Merging generic-history into the Timeline Store (Contributed by Zhijie Shen) (junping_du: rev 6b8b1608e64e300e4e1d23c60476febaca29ca38) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ContainerFinishedEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAHSClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ApplicationMetricsConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationAttemptReport.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ContainerMetricsConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/AppAttemptFinishedEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ContainerCreatedEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryClientService.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppAttemptBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/view/HtmlBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/ContainerBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationFinishedEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/MemoryTimelineStore.java *
[jira] [Commented] (YARN-2541) Fix ResourceManagerRest.apt.vm syntax error
[ https://issues.apache.org/jira/browse/YARN-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131536#comment-14131536 ] Hudson commented on YARN-2541: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1894 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1894/]) YARN-2541. Fixed ResourceManagerRest.apt.vm table syntax error. Contributed by Jian He (jianhe: rev 5633da2a018efcfac03cc1dd65af79bce2f1a11b) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm Fix ResourceManagerRest.apt.vm syntax error --- Key: YARN-2541 URL: https://issues.apache.org/jira/browse/YARN-2541 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Fix For: 2.6.0 Attachments: YARN-2541.1.patch the incorrect table syntax somehow causes hadoop-yarn-site intermittent build failure as in https://jira.codehaus.org/browse/DOXIA-453 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2538) Add logs when RM send new AMRMToken to ApplicationMaster
[ https://issues.apache.org/jira/browse/YARN-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131541#comment-14131541 ] Hudson commented on YARN-2538: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1894 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1894/]) YARN-2538. Added logs when RM sends roll-overed AMRMToken to AM. Contributed by Xuan Gong. (zjshen: rev 469ea3dcef6e427d02fd08b859b2789cc25189f9) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java Add logs when RM send new AMRMToken to ApplicationMaster Key: YARN-2538 URL: https://issues.apache.org/jira/browse/YARN-2538 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.6.0 Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.6.0 Attachments: YARN-2538.1.patch, YARN-2538.1.patch This is for testing/debugging purpose -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2033) Merging generic-history into the Timeline Store
[ https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131572#comment-14131572 ] Hudson commented on YARN-2033: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1869 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1869/]) YARN-2033. Merging generic-history into the Timeline Store (Contributed by Zhijie Shen) (junping_du: rev 6b8b1608e64e300e4e1d23c60476febaca29ca38) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/AppAttemptFinishedEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/AppAttemptRegisteredEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryClientService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/AppInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/AppAttemptMetricsConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationCreatedEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/TestRMContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/AppAttemptInfo.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationAttemptReport.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ApplicationMetricsConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ContainerCreatedEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ahs/RMApplicationHistoryWriter.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ContainerMetricsConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/ApplicationContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml *
[jira] [Commented] (YARN-2541) Fix ResourceManagerRest.apt.vm syntax error
[ https://issues.apache.org/jira/browse/YARN-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131568#comment-14131568 ] Hudson commented on YARN-2541: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1869 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1869/]) YARN-2541. Fixed ResourceManagerRest.apt.vm table syntax error. Contributed by Jian He (jianhe: rev 5633da2a018efcfac03cc1dd65af79bce2f1a11b) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm Fix ResourceManagerRest.apt.vm syntax error --- Key: YARN-2541 URL: https://issues.apache.org/jira/browse/YARN-2541 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Fix For: 2.6.0 Attachments: YARN-2541.1.patch the incorrect table syntax somehow causes hadoop-yarn-site intermittent build failure as in https://jira.codehaus.org/browse/DOXIA-453 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2534) FairScheduler: Potential integer overflow calculating totalMaxShare
[ https://issues.apache.org/jira/browse/YARN-2534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131563#comment-14131563 ] Hudson commented on YARN-2534: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1869 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1869/]) YARN-2534. FairScheduler: Potential integer overflow calculating totalMaxShare. (Zhihai Xu via kasha) (kasha: rev c11ada5ea6d17321626e5a9a4152ff857d03aee2) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/ComputeFairShares.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java FairScheduler: Potential integer overflow calculating totalMaxShare --- Key: YARN-2534 URL: https://issues.apache.org/jira/browse/YARN-2534 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.6.0 Attachments: YARN-2534.000.patch FairScheduler: totalMaxShare is not calculated correctly in computeSharesInternal for some cases. If the sum of MAX share of all Schedulables is more than Integer.MAX_VALUE ,but each individual MAX share is not equal to Integer.MAX_VALUE. then totalMaxShare will be a negative value, which will cause all fairShare are wrongly calculated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2484) FileSystemRMStateStore#readFile/writeFile should close FSData(In|Out)putStream in final block
[ https://issues.apache.org/jira/browse/YARN-2484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131577#comment-14131577 ] Jason Lowe commented on YARN-2484: -- +1 lgtm. Committing this. FileSystemRMStateStore#readFile/writeFile should close FSData(In|Out)putStream in final block - Key: YARN-2484 URL: https://issues.apache.org/jira/browse/YARN-2484 Project: Hadoop YARN Issue Type: Bug Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Priority: Trivial Attachments: YARN-2484.1.patch, YARN-2484.2.patch File descriptors can leak if exceptions are thrown in these methods.
{code}
private byte[] readFile(Path inputPath, long len) throws Exception {
  FSDataInputStream fsIn = fs.open(inputPath);
  // state data will not be that long
  byte[] data = new byte[(int)len];
  fsIn.readFully(data);
  fsIn.close();
  return data;
}
{code}
{code}
private void writeFile(Path outputPath, byte[] data) throws Exception {
  Path tempPath = new Path(outputPath.getParent(), outputPath.getName() + ".tmp");
  FSDataOutputStream fsOut = null;
  // This file will be overwritten when app/attempt finishes for saving the
  // final status.
  fsOut = fs.create(tempPath, true);
  fsOut.write(data);
  fsOut.close();
  fs.rename(tempPath, outputPath);
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
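For reference, a hedged sketch of the readFile variant with the stream closed in a finally clause, reusing the names from the snippet above (the committed patch may instead use try-with-resources or IOUtils):
{code}
private byte[] readFile(Path inputPath, long len) throws Exception {
  FSDataInputStream fsIn = fs.open(inputPath);
  try {
    // state data will not be that long
    byte[] data = new byte[(int)len];
    fsIn.readFully(data);
    return data;
  } finally {
    // close even when readFully throws, so the descriptor is not leaked
    fsIn.close();
  }
}
{code}
The writeFile variant is analogous: write inside the try block, close the FSDataOutputStream in the finally clause, and only then rename the temp file.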
[jira] [Created] (YARN-2547) Cross Origin Filter throws UnsupportedOperationException upon destroy
Jonathan Eagles created YARN-2547: - Summary: Cross Origin Filter throws UnsupportedOperationException upon destroy Key: YARN-2547 URL: https://issues.apache.org/jira/browse/YARN-2547 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131581#comment-14131581 ] Krisztian Horvath commented on YARN-1964: - Will containers be able to communicate with each other, e.g with slider I can run HBase inside containers. Create Docker analog of the LinuxContainerExecutor in YARN -- Key: YARN-1964 URL: https://issues.apache.org/jira/browse/YARN-1964 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.2.0 Reporter: Arun C Murthy Assignee: Abin Shahab Attachments: yarn-1964-branch-2.2.0-docker.patch, yarn-1964-branch-2.2.0-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch Docker (https://www.docker.io/) is, increasingly, a very popular container technology. In context of YARN, the support for Docker will provide a very elegant solution to allow applications to *package* their software into a Docker container (entire Linux file system incl. custom versions of perl, python etc.) and use it as a blueprint to launch all their YARN containers with requisite software environment. This provides both consistency (all YARN containers will have the same software environment) and isolation (no interference with whatever is installed on the physical machine). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2547) Cross Origin Filter throws UnsupportedOperationException upon destroy
[ https://issues.apache.org/jira/browse/YARN-2547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai reassigned YARN-2547: --- Assignee: Mit Desai Cross Origin Filter throws UnsupportedOperationException upon destroy - Key: YARN-2547 URL: https://issues.apache.org/jira/browse/YARN-2547 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Mit Desai -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2494) [YARN-796] Node label manager API and storage implementations
[ https://issues.apache.org/jira/browse/YARN-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2494: - Attachment: YARN-2494.patch [YARN-796] Node label manager API and storage implementations - Key: YARN-2494 URL: https://issues.apache.org/jira/browse/YARN-2494 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2494.patch, YARN-2494.patch, YARN-2494.patch This JIRA includes APIs and storage implementations of node label manager, NodeLabelManager is an abstract class used to manage labels of nodes in the cluster, it has APIs to query/modify - Nodes according to given label - Labels according to given hostname - Add/remove labels - Set labels of nodes in the cluster - Persist/recover changes of labels/labels-on-nodes to/from storage And it has two implementations to store modifications - Memory based storage: It will not persist changes, so all labels will be lost when RM restart - FileSystem based storage: It will persist/recover to/from FileSystem (like HDFS), and all labels and labels-on-nodes will be recovered upon RM restart -- This message was sent by Atlassian JIRA (v6.3.4#6332)
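The capabilities listed in this description can be pictured with a small hypothetical interface; the method names and signatures below are invented for illustration and are not the NodeLabelManager API from the attached patch:
{code}
import java.io.IOException;
import java.util.Collection;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch only: one method per capability listed in the JIRA
// description, with invented names and signatures.
interface NodeLabelStoreSketch {
  Set<String> getNodesByLabel(String label);
  Set<String> getLabelsByHost(String hostname);
  void addLabels(Collection<String> labels) throws IOException;
  void removeLabels(Collection<String> labels) throws IOException;
  void setLabelsOnNodes(Map<String, Set<String>> hostToLabels) throws IOException;
  void recover() throws IOException;  // no-op for the memory store, replay for the FileSystem store
}
{code}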
[jira] [Updated] (YARN-2498) [YARN-796] Respect labels in preemption policy of capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2498: - Attachment: YARN-2498.patch [YARN-796] Respect labels in preemption policy of capacity scheduler Key: YARN-2498 URL: https://issues.apache.org/jira/browse/YARN-2498 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2498.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels
[ https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2496: - Attachment: YARN-2496.patch [YARN-796] Changes for capacity scheduler to support allocate resource respect labels - Key: YARN-2496 URL: https://issues.apache.org/jira/browse/YARN-2496 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, YARN-2496.patch This JIRA Includes: - Add/parse labels option to {{capacity-scheduler.xml}} similar to other options of queue like capacity/maximum-capacity, etc. - Include a default-label-expression option in queue config, if an app doesn't specify label-expression, default-label-expression of queue will be used. - Check if labels can be accessed by the queue when submit an app with labels-expression to queue or update ResourceRequest with label-expression - Check labels on NM when trying to allocate ResourceRequest on the NM with label-expression - Respect labels when calculate headroom/user-limit -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2502) [YARN-796] Changes in distributed shell to support specify labels
[ https://issues.apache.org/jira/browse/YARN-2502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2502: - Attachment: YARN-2502.patch [YARN-796] Changes in distributed shell to support specify labels - Key: YARN-2502 URL: https://issues.apache.org/jira/browse/YARN-2502 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2502.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2503) [YARN-796] Changes in RM Web UI to better show labels to end users
[ https://issues.apache.org/jira/browse/YARN-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2503: - Attachment: YARN-2503.patch [YARN-796] Changes in RM Web UI to better show labels to end users -- Key: YARN-2503 URL: https://issues.apache.org/jira/browse/YARN-2503 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2503.patch Include but not limited to: - Show labels of nodes in RM/nodes page - Show labels of queue in RM/scheduler page - Warn user/admin if capacity of queue cannot be guaranteed according to mis config of labels. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2502) [YARN-796] Changes in distributed shell to support specify labels
[ https://issues.apache.org/jira/browse/YARN-2502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2502: - Attachment: (was: YARN-2502.patch) [YARN-796] Changes in distributed shell to support specify labels - Key: YARN-2502 URL: https://issues.apache.org/jira/browse/YARN-2502 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2502.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels
[ https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131616#comment-14131616 ] Hadoop QA commented on YARN-2496: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668345/YARN-2496.patch against trunk revision 78b0483. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 10 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4918//console This message is automatically generated. [YARN-796] Changes for capacity scheduler to support allocate resource respect labels - Key: YARN-2496 URL: https://issues.apache.org/jira/browse/YARN-2496 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, YARN-2496.patch This JIRA includes: - Add/parse a labels option in {{capacity-scheduler.xml}}, similar to other queue options like capacity/maximum-capacity, etc. - Include a default-label-expression option in the queue config; if an app doesn't specify a label-expression, the queue's default-label-expression will be used. - Check whether labels can be accessed by the queue when submitting an app with a label-expression to the queue or updating a ResourceRequest with a label-expression - Check labels on the NM when trying to allocate a ResourceRequest on the NM with a label-expression - Respect labels when calculating headroom/user-limit -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2494) [YARN-796] Node label manager API and storage implementations
[ https://issues.apache.org/jira/browse/YARN-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131615#comment-14131615 ] Hadoop QA commented on YARN-2494: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668344/YARN-2494.patch against trunk revision 78b0483. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4916//console This message is automatically generated. [YARN-796] Node label manager API and storage implementations - Key: YARN-2494 URL: https://issues.apache.org/jira/browse/YARN-2494 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2494.patch, YARN-2494.patch, YARN-2494.patch This JIRA includes the APIs and storage implementations of the node label manager. NodeLabelManager is an abstract class used to manage labels of nodes in the cluster; it has APIs to query/modify - Nodes according to a given label - Labels according to a given hostname - Add/remove labels - Set labels of nodes in the cluster - Persist/recover changes of labels/labels-on-nodes to/from storage It has two implementations to store modifications - Memory based storage: it will not persist changes, so all labels will be lost when the RM restarts - FileSystem based storage: it will persist/recover to/from a FileSystem (like HDFS), and all labels and labels-on-nodes will be recovered upon RM restart -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131626#comment-14131626 ] Wangda Tan commented on YARN-796: - Split and updated all existing patches for YARN-796 against latest trunk, patch dependencies:
{code}
YARN-2493;YARN-2544
   |         \
YARN-2494   YARN-2501;YARN-2502
   |
YARN-2500
   |
YARN-2596
  /    |    \
YARN-2598  YARN-2504  YARN-2505
   |
YARN-2503
{code}
Please kindly review. Thanks, Wangda Allow for (admin) labels on nodes and resource-requests --- Key: YARN-796 URL: https://issues.apache.org/jira/browse/YARN-796 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.4.1 Reporter: Arun C Murthy Assignee: Wangda Tan Attachments: LabelBasedScheduling.pdf, Node-labels-Requirements-Design-doc-V1.pdf, Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf, YARN-796.node-label.consolidate.1.patch, YARN-796.node-label.consolidate.2.patch, YARN-796.node-label.consolidate.3.patch, YARN-796.node-label.consolidate.4.patch, YARN-796.node-label.demo.patch.1, YARN-796.patch, YARN-796.patch4 It will be useful for admins to specify labels for nodes. Examples of labels are OS, processor architecture etc. We should expose these labels and allow applications to specify labels on resource-requests. Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2501) [YARN-796] Changes in AMRMClient to support labels
[ https://issues.apache.org/jira/browse/YARN-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131630#comment-14131630 ] Hadoop QA commented on YARN-2501: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668348/YARN-2501.patch against trunk revision 78b0483. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4922//console This message is automatically generated. [YARN-796] Changes in AMRMClient to support labels -- Key: YARN-2501 URL: https://issues.apache.org/jira/browse/YARN-2501 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2501.patch Changes in AMRMClient to support labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2503) [YARN-796] Changes in RM Web UI to better show labels to end users
[ https://issues.apache.org/jira/browse/YARN-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131629#comment-14131629 ] Hadoop QA commented on YARN-2503: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668350/YARN-2503.patch against trunk revision 78b0483. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4919//console This message is automatically generated. [YARN-796] Changes in RM Web UI to better show labels to end users -- Key: YARN-2503 URL: https://issues.apache.org/jira/browse/YARN-2503 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2503.patch Includes, but is not limited to: - Show labels of nodes on the RM/nodes page - Show labels of queues on the RM/scheduler page - Warn the user/admin if the capacity of a queue cannot be guaranteed due to misconfiguration of labels. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2500) [YARN-796] Miscellaneous changes in ResourceManager to support labels
[ https://issues.apache.org/jira/browse/YARN-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131634#comment-14131634 ] Hadoop QA commented on YARN-2500: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668347/YARN-2500.patch against trunk revision 78b0483. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 10 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4923//console This message is automatically generated. [YARN-796] Miscellaneous changes in ResourceManager to support labels - Key: YARN-2500 URL: https://issues.apache.org/jira/browse/YARN-2500 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2500.patch, YARN-2500.patch, YARN-2500.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2504) [YARN-796] Support get/add/remove/change labels in RM admin CLI
[ https://issues.apache.org/jira/browse/YARN-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131637#comment-14131637 ] Hadoop QA commented on YARN-2504: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668354/YARN-2504.patch against trunk revision 78b0483. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4924//console This message is automatically generated. [YARN-796] Support get/add/remove/change labels in RM admin CLI Key: YARN-2504 URL: https://issues.apache.org/jira/browse/YARN-2504 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2504.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-796: Attachment: YARN-796.node-label.consolidate.5.patch Allow for (admin) labels on nodes and resource-requests --- Key: YARN-796 URL: https://issues.apache.org/jira/browse/YARN-796 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.4.1 Reporter: Arun C Murthy Assignee: Wangda Tan Attachments: LabelBasedScheduling.pdf, Node-labels-Requirements-Design-doc-V1.pdf, Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf, YARN-796.node-label.consolidate.1.patch, YARN-796.node-label.consolidate.2.patch, YARN-796.node-label.consolidate.3.patch, YARN-796.node-label.consolidate.4.patch, YARN-796.node-label.consolidate.5.patch, YARN-796.node-label.demo.patch.1, YARN-796.patch, YARN-796.patch4 It will be useful for admins to specify labels for nodes. Examples of labels are OS, processor architecture etc. We should expose these labels and allow applications to specify labels on resource-requests. Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131640#comment-14131640 ] Abin Shahab commented on YARN-1964: --- We have decided to create an umbrella issue to cover the integration between YARN and Docker (YARN-2466). This task (YARN-1964) has the following scope: 1) Launch Docker containers from YARN with net=host mode. This will allow the container to take on the host's network, and therefore the YARN administrators will not need to set up special networking for Docker. 2) Allow users to provide Docker images through the job configuration. 3) Setup and user guides. The rest (secure Hadoop, advanced networking) will be handled in other issues under YARN-2466. Please add your feedback to this plan on this JIRA. Thanks! Abin Create Docker analog of the LinuxContainerExecutor in YARN -- Key: YARN-1964 URL: https://issues.apache.org/jira/browse/YARN-1964 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.2.0 Reporter: Arun C Murthy Assignee: Abin Shahab Attachments: yarn-1964-branch-2.2.0-docker.patch, yarn-1964-branch-2.2.0-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch Docker (https://www.docker.io/) is, increasingly, a very popular container technology. In the context of YARN, the support for Docker will provide a very elegant solution to allow applications to *package* their software into a Docker container (entire Linux file system incl. custom versions of perl, python etc.) and use it as a blueprint to launch all their YARN containers with the requisite software environment. This provides both consistency (all YARN containers will have the same software environment) and isolation (no interference with whatever is installed on the physical machine). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2544) [YARN-796] Common server side PB changes (not include user API PB changes)
[ https://issues.apache.org/jira/browse/YARN-2544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131652#comment-14131652 ] Hadoop QA commented on YARN-2544: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668355/YARN-2544.patch against trunk revision 78b0483. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color}. The applied patch generated 1329 javac compiler warnings (more than the trunk's current 1301 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common: org.apache.hadoop.yarn.api.TestPBImplRecords The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common: org.apache.hadoop.yarn.webapp.view.TestHtmlBlock {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4921//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/4921//artifact/trunk/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4921//console This message is automatically generated. [YARN-796] Common server side PB changes (not include user API PB changes) -- Key: YARN-2544 URL: https://issues.apache.org/jira/browse/YARN-2544 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2544.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2502) [YARN-796] Changes in distributed shell to support specify labels
[ https://issues.apache.org/jira/browse/YARN-2502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131653#comment-14131653 ] Hadoop QA commented on YARN-2502: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668353/YARN-2502.patch against trunk revision 78b0483. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color}. The applied patch generated 1329 javac compiler warnings (more than the trunk's current 1301 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common: org.apache.hadoop.yarn.api.TestPBImplRecords The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common: org.apache.hadoop.yarn.webapp.view.TestHtmlBlock {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4920//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/4920//artifact/trunk/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4920//console This message is automatically generated. [YARN-796] Changes in distributed shell to support specify labels - Key: YARN-2502 URL: https://issues.apache.org/jira/browse/YARN-2502 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2502.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2534) FairScheduler: Potential integer overflow calculating totalMaxShare
[ https://issues.apache.org/jira/browse/YARN-2534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131673#comment-14131673 ] zhihai xu commented on YARN-2534: - [~kasha], thanks for reviewing and committing the patch. FairScheduler: Potential integer overflow calculating totalMaxShare --- Key: YARN-2534 URL: https://issues.apache.org/jira/browse/YARN-2534 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.6.0 Attachments: YARN-2534.000.patch FairScheduler: totalMaxShare is not calculated correctly in computeSharesInternal for some cases. If the sum of the MAX shares of all Schedulables is more than Integer.MAX_VALUE, but each individual MAX share is not equal to Integer.MAX_VALUE, then totalMaxShare will be a negative value, which will cause all fairShare values to be calculated wrongly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
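The overflow described above is easy to reproduce in isolation. The sketch below, assuming two large (but individually valid) max shares, shows the naive int sum wrapping negative and a saturating accumulation that avoids it; it illustrates the class of problem, not the exact fix committed for this JIRA.
{code}
// Demonstrates the totalMaxShare overflow pattern described in YARN-2534 and a
// saturating accumulation that keeps the total non-negative. Values are made up.
public class MaxShareOverflowDemo {
  public static void main(String[] args) {
    int[] maxShares = {1_500_000_000, 1_500_000_000}; // each below Integer.MAX_VALUE

    // Naive sum overflows int and goes negative, corrupting fair-share math.
    int naiveTotal = 0;
    for (int share : maxShares) {
      naiveTotal += share;
    }
    System.out.println("naive total: " + naiveTotal); // prints a negative number

    // Safer accumulation: saturate at Integer.MAX_VALUE instead of wrapping.
    int safeTotal = 0;
    for (int share : maxShares) {
      long candidate = (long) safeTotal + share;
      safeTotal = candidate > Integer.MAX_VALUE ? Integer.MAX_VALUE : (int) candidate;
    }
    System.out.println("saturated total: " + safeTotal); // Integer.MAX_VALUE
  }
}
{code}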
[jira] [Updated] (YARN-2547) Cross Origin Filter throws UnsupportedOperationException upon destroy
[ https://issues.apache.org/jira/browse/YARN-2547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-2547: Attachment: YARN-2547.patch Attaching the patch. Cross Origin Filter throws UnsupportedOperationException upon destroy - Key: YARN-2547 URL: https://issues.apache.org/jira/browse/YARN-2547 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Mit Desai Attachments: YARN-2547.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2547) Cross Origin Filter throws UnsupportedOperationException upon destroy
[ https://issues.apache.org/jira/browse/YARN-2547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131682#comment-14131682 ] Mit Desai commented on YARN-2547: - Refining the patch. Will update shortly Cross Origin Filter throws UnsupportedOperationException upon destroy - Key: YARN-2547 URL: https://issues.apache.org/jira/browse/YARN-2547 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Mit Desai Attachments: YARN-2547.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2547) Cross Origin Filter throws UnsupportedOperationException upon destroy
[ https://issues.apache.org/jira/browse/YARN-2547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-2547: Attachment: YARN-2547.patch Updated the patch. Cross Origin Filter throws UnsupportedOperationException upon destroy - Key: YARN-2547 URL: https://issues.apache.org/jira/browse/YARN-2547 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Mit Desai Attachments: YARN-2547.patch, YARN-2547.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1710) Admission Control: agents to allocate reservation
[ https://issues.apache.org/jira/browse/YARN-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131700#comment-14131700 ] Chris Douglas commented on YARN-1710: - bq. I am not memoizing findEarliestTime, as it would only save one invocation (the others are on diff sets, or updated version of the same set) I'm confused. There are three invocations:
{code}
if (findEarliestTime(allocations.keySet()) > earliestStart) {
  allocations.put(new ReservationInterval(earliestStart,
      findEarliestTime(allocations.keySet())), ZERO_RES);
}
ReservationAllocation capReservation =
    new InMemoryReservationAllocation(reservationId, contract, user,
        plan.getQueueName(), findEarliestTime(allocations.keySet()),
        findLatestTime(allocations.keySet()), allocations,
        plan.getResourceCalculator(), plan.getMinimumAllocation());
{code}
Isn't the earliest time either the earliest in the set, or the interval this just added? Admission Control: agents to allocate reservation - Key: YARN-1710 URL: https://issues.apache.org/jira/browse/YARN-1710 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Carlo Curino Attachments: YARN-1710.1.patch, YARN-1710.2.patch, YARN-1710.patch This JIRA tracks the algorithms used to allocate a user ReservationRequest coming in from the new reservation API (YARN-1708), in the inventory subsystem (YARN-1709) maintaining the current plan for the cluster. The focus of these agents is to quickly find a solution for the set of constraints provided by the user, and the physical constraints of the plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
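A self-contained sketch of the memoization being discussed: compute the earliest key once, reuse it for the optional zero-resource padding, and derive the overall earliest time with a min instead of re-scanning. The TreeMap and string values below are stand-ins for the plan's allocation map; this is not the YARN-1710 code itself.
{code}
import java.util.NavigableMap;
import java.util.TreeMap;

public class EarliestTimeMemoDemo {
  public static void main(String[] args) {
    // Stand-in for the allocations map, keyed by interval start time.
    NavigableMap<Long, String> allocations = new TreeMap<>();
    allocations.put(100L, "res-A");
    allocations.put(200L, "res-B");

    long earliestStart = 50L;

    // Compute the earliest key once instead of re-scanning the set three times.
    long earliestInMap = allocations.firstKey();
    if (earliestInMap > earliestStart) {
      // Pad the front of the plan with a zero-resource interval.
      allocations.put(earliestStart, "zero-res");
    }
    // After the optional insertion, the overall earliest time is simply
    // min(earliestInMap, earliestStart); no further scan is needed.
    long overallEarliest = Math.min(earliestInMap, earliestStart);
    System.out.println("earliest = " + overallEarliest); // 50
  }
}
{code}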
[jira] [Commented] (YARN-2547) Cross Origin Filter throws UnsupportedOperationException upon destroy
[ https://issues.apache.org/jira/browse/YARN-2547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131711#comment-14131711 ] Jonathan Eagles commented on YARN-2547: --- [~mitdesai], thanks for the quick fix posted. Main changes look good. A couple of minor things related to the test code: instead of testing for this one exception being thrown, which is an implementation detail, what we really want to test is that restart works: init - destroy - init. That way the test conveys the functionality of the filter we are trying to ensure. Cross Origin Filter throws UnsupportedOperationException upon destroy - Key: YARN-2547 URL: https://issues.apache.org/jira/browse/YARN-2547 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Mit Desai Attachments: YARN-2547.patch, YARN-2547.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
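A generic sketch of the init - destroy - init cycle being suggested as the thing to test, assuming a filter that keeps a mutable collection it clears on destroy. It illustrates the fixed-size-list pitfall that produces UnsupportedOperationException and the restart behavior worth asserting; it is not the actual CrossOriginFilter or its test.
{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RestartCycleDemo {
  private List<String> allowedMethods;

  public void init() {
    // A mutable copy is safe to clear later; Arrays.asList alone returns a
    // fixed-size list on which clear() throws UnsupportedOperationException.
    allowedMethods = new ArrayList<>(Arrays.asList("GET", "POST", "HEAD"));
  }

  public void destroy() {
    if (allowedMethods != null) {
      allowedMethods.clear();
      allowedMethods = null;
    }
  }

  public static void main(String[] args) {
    RestartCycleDemo filter = new RestartCycleDemo();
    filter.init();
    filter.destroy();
    filter.init(); // the full restart cycle is the behavior worth testing
    System.out.println("init - destroy - init completed");
  }
}
{code}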
[jira] [Commented] (YARN-2475) ReservationSystem: replan upon capacity reduction
[ https://issues.apache.org/jira/browse/YARN-2475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131713#comment-14131713 ] Chris Douglas commented on YARN-2475: - +1, other than a couple very minor nits: * the new cstr accepting {{Clock}} can be package-private, with the no-arg cstr calling {{this(new UTCClock());}} (comment unnecessary, or replace with {{@VisibleForTesting}}) * The unit test could have a more descriptive name than {{test()}}, declare {{PlanningException}} in its throws clause instead of calling {{Assert::fail()}} on catching it, and not declare {{InterruptedException}} which it no longer throws Just a minor clarification: as this iterates over each instant of the plan, are others allowed to modify it? ReservationSystem: replan upon capacity reduction - Key: YARN-2475 URL: https://issues.apache.org/jira/browse/YARN-2475 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Carlo Curino Attachments: YARN-2475.patch, YARN-2475.patch In the context of YARN-1051, if capacity of the cluster drops significantly upon machine failures we need to trigger a reorganization of the planned reservations. As reservations are absolute it is possible that they will not all fit, and some need to be rejected a-posteriori. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
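The constructor arrangement suggested in the first bullet can be sketched as below. Clock and UTCClock are defined locally here so the snippet is self-contained, and the policy class name is a hypothetical stand-in for the class under review.
{code}
interface Clock {
  long getTime();
}

class UTCClock implements Clock {
  @Override
  public long getTime() {
    return System.currentTimeMillis();
  }
}

public class ReplanningPolicySketch {
  private final Clock clock;

  // Public no-arg constructor used by production code.
  public ReplanningPolicySketch() {
    this(new UTCClock());
  }

  // Package-private constructor so tests can inject a deterministic clock.
  ReplanningPolicySketch(Clock clock) {
    this.clock = clock;
  }

  public long now() {
    return clock.getTime();
  }

  public static void main(String[] args) {
    System.out.println(new ReplanningPolicySketch().now());
  }
}
{code}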
[jira] [Commented] (YARN-2547) Cross Origin Filter throws UnsupportedOperationException upon destroy
[ https://issues.apache.org/jira/browse/YARN-2547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131719#comment-14131719 ] Hadoop QA commented on YARN-2547: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668374/YARN-2547.patch against trunk revision 78b0483. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4925//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4925//console This message is automatically generated. Cross Origin Filter throws UnsupportedOperationException upon destroy - Key: YARN-2547 URL: https://issues.apache.org/jira/browse/YARN-2547 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Mit Desai Attachments: YARN-2547.patch, YARN-2547.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1709) Admission Control: Reservation subsystem
[ https://issues.apache.org/jira/browse/YARN-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131727#comment-14131727 ] Chris Douglas commented on YARN-1709: - Thanks for the updates. Just a few minor tweaks, then I'm +1 * In checking the preconditions: {code} if (!readWriteLock.isWriteLockedByCurrentThread()) { return; } {code} The intent was to {{assert}} and crash, so tests against this code can detect violations if the code is modified. When assertions are disabled, the check is elided * Instead of two cstr that assign all the final fields, the no-arg should call the other * Instead of explicitly throwing {{ClassCastException}}, this should just attempt the cast. The cause is implicit, and doesn't require a custom error string Admission Control: Reservation subsystem Key: YARN-1709 URL: https://issues.apache.org/jira/browse/YARN-1709 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Subramaniam Krishnan Attachments: YARN-1709.patch, YARN-1709.patch, YARN-1709.patch, YARN-1709.patch, YARN-1709.patch, YARN-1709.patch This JIRA is about the key data structure used to track resources over time to enable YARN-1051. The Reservation subsystem is conceptually a plan of how the scheduler will allocate resources over-time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
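The first bullet's intent (assert and crash rather than silently return) looks roughly like the sketch below; the class and method names are illustrative stand-ins for the plan data structure, not the patch's code. With assertions enabled (java -ea), a caller that mutates the plan without holding the write lock fails fast; with assertions disabled the check is elided, as noted above.
{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class PlanAssertDemo {
  private final ReentrantReadWriteLock readWriteLock = new ReentrantReadWriteLock();

  private void incrementAllocation() {
    // Crash under -ea instead of silently returning, so tests catch callers
    // that modify internal state without holding the write lock.
    assert readWriteLock.isWriteLockedByCurrentThread();
    // ... mutate internal accounting here ...
  }

  public void addReservation() {
    readWriteLock.writeLock().lock();
    try {
      incrementAllocation();
    } finally {
      readWriteLock.writeLock().unlock();
    }
  }

  public static void main(String[] args) {
    new PlanAssertDemo().addReservation();
    System.out.println("mutation performed under the write lock");
  }
}
{code}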
[jira] [Commented] (YARN-2314) ContainerManagementProtocolProxy can create thousands of threads for a large cluster
[ https://issues.apache.org/jira/browse/YARN-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131732#comment-14131732 ] Lohit Vijayarenu commented on YARN-2314: We hit the same problem on one of our large clusters with more than 2.5K nodes. As a workaround we ended up increasing the AM container size to 6G; with a vmem-pmem ratio of 2:1 we give away 12G of virtual memory for the AM container. From an initial look at this, there is no way to turn this behavior off via config, other than patching the code, right? ContainerManagementProtocolProxy can create thousands of threads for a large cluster Key: YARN-2314 URL: https://issues.apache.org/jira/browse/YARN-2314 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.1.0-beta Reporter: Jason Lowe Priority: Critical Attachments: nmproxycachefix.prototype.patch ContainerManagementProtocolProxy has a cache of NM proxies, and the size of this cache is configurable. However the cache can grow far beyond the configured size when running on a large cluster and blow AM address/container limits. More details in the first comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2540) Fair Scheduler : queue filters not working on scheduler page in RM UI
[ https://issues.apache.org/jira/browse/YARN-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashwin Shankar updated YARN-2540: - Attachment: YARN-2540-v1.txt Fair Scheduler : queue filters not working on scheduler page in RM UI - Key: YARN-2540 URL: https://issues.apache.org/jira/browse/YARN-2540 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.5.0, 2.5.1 Reporter: Ashwin Shankar Assignee: Ashwin Shankar Attachments: YARN-2540-v1.txt Steps to reproduce : 1. Run an app in default queue. 2. While the app is running, go to the scheduler page on RM UI. 3. You would see the app in the apptable at the bottom. 4. Now click on default queue to filter the apptable on root.default. 5. App disappears from apptable although it is running on default queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2542) yarn application -status appId throws NPE when retrieving the app from the timelineserver
[ https://issues.apache.org/jira/browse/YARN-2542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2542: -- Attachment: YARN-2542.3.patch Patch looks good, just added N/A string in case appUsage doesn't exist yarn application -status appId throws NPE when retrieving the app from the timelineserver - Key: YARN-2542 URL: https://issues.apache.org/jira/browse/YARN-2542 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-2542.1.patch, YARN-2542.2.patch, YARN-2542.3.patch yarn application -status appId throws NPE when retrieving the app from the timelineserver. It's broken by YARN-415. When app is finished, there's no usageReport. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
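A tiny, self-contained sketch of that kind of null guard: if the usage report is missing for a finished app, print "N/A" instead of dereferencing it. The UsageReport type and field here are stand-ins, not the patch's actual classes.
{code}
public class UsageReportGuardDemo {
  // Stand-in for the resource usage report that may be absent for finished apps.
  static class UsageReport {
    long memorySeconds() { return 1024; }
  }

  static String formatMemorySeconds(UsageReport usage) {
    // Print "N/A" instead of dereferencing a missing report (the NPE source).
    return usage == null ? "N/A" : Long.toString(usage.memorySeconds());
  }

  public static void main(String[] args) {
    System.out.println(formatMemorySeconds(null));              // N/A
    System.out.println(formatMemorySeconds(new UsageReport())); // 1024
  }
}
{code}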
[jira] [Updated] (YARN-1710) Admission Control: agents to allocate reservation
[ https://issues.apache.org/jira/browse/YARN-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-1710: --- Attachment: YARN-1710.3.patch Admission Control: agents to allocate reservation - Key: YARN-1710 URL: https://issues.apache.org/jira/browse/YARN-1710 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Carlo Curino Attachments: YARN-1710.1.patch, YARN-1710.2.patch, YARN-1710.3.patch, YARN-1710.patch This JIRA tracks the algorithms used to allocate a user ReservationRequest coming in from the new reservation API (YARN-1708), in the inventory subsystem (YARN-1709) maintaining the current plan for the cluster. The focus of these agents is to quickly find a solution for the set of constraints provided by the user, and the physical constraints of the plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1710) Admission Control: agents to allocate reservation
[ https://issues.apache.org/jira/browse/YARN-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131772#comment-14131772 ] Carlo Curino commented on YARN-1710: I understand what you meant (after our brief chat). I addressed it in the version I just uploaded (v3). Admission Control: agents to allocate reservation - Key: YARN-1710 URL: https://issues.apache.org/jira/browse/YARN-1710 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Carlo Curino Attachments: YARN-1710.1.patch, YARN-1710.2.patch, YARN-1710.3.patch, YARN-1710.patch This JIRA tracks the algorithms used to allocate a user ReservationRequest coming in from the new reservation API (YARN-1708), in the inventory subsystem (YARN-1709) maintaining the current plan for the cluster. The focus of these agents is to quickly find a solution for the set of constraints provided by the user, and the physical constraints of the plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2540) Fair Scheduler : queue filters not working on scheduler page in RM UI
[ https://issues.apache.org/jira/browse/YARN-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131773#comment-14131773 ] Ashwin Shankar commented on YARN-2540: -- The attached patch fixes this issue. The problem was that in FairScheduler, queue names are represented as fully qualified names (root.blah.blah), while the filtering logic in FairSchedulerPage.java filters based on a substring of the queue name. Fair Scheduler : queue filters not working on scheduler page in RM UI - Key: YARN-2540 URL: https://issues.apache.org/jira/browse/YARN-2540 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.5.0, 2.5.1 Reporter: Ashwin Shankar Assignee: Ashwin Shankar Attachments: YARN-2540-v1.txt Steps to reproduce : 1. Run an app in default queue. 2. While the app is running, go to the scheduler page on RM UI. 3. You would see the app in the apptable at the bottom. 4. Now click on default queue to filter the apptable on root.default. 5. App disappears from apptable although it is running on default queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
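The mismatch described above can be demonstrated in isolation: an anchored regex built from only the leaf queue name never matches a column holding the fully qualified name. The snippet below is a stand-alone illustration of that logic (and one possible way of matching on the full name); it is not the code in FairSchedulerPage.java.
{code}
import java.util.regex.Pattern;

public class QueueFilterDemo {
  public static void main(String[] args) {
    // FairScheduler reports the fully qualified queue name in the app table.
    String queueColumn = "root.default";
    String clicked = "root.default";

    // Filtering on the anchored leaf name ("^default$") misses the column value.
    String leafOnly = "^" + clicked.substring(clicked.lastIndexOf('.') + 1) + "$";
    System.out.println(Pattern.compile(leafOnly).matcher(queueColumn).find());  // false

    // Matching against the fully qualified name (one possible fix) works.
    String fullMatch = "^" + Pattern.quote(clicked) + "$";
    System.out.println(Pattern.compile(fullMatch).matcher(queueColumn).find()); // true
  }
}
{code}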
[jira] [Commented] (YARN-2104) Scheduler queue filter failed to work because index of queue column changed
[ https://issues.apache.org/jira/browse/YARN-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131781#comment-14131781 ] Ashwin Shankar commented on YARN-2104: -- Created YARN-2540 and posted a patch to fix this issue in the fair scheduler. Scheduler queue filter failed to work because index of queue column changed --- Key: YARN-2104 URL: https://issues.apache.org/jira/browse/YARN-2104 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.5.0 Attachments: YARN-2104.patch YARN-563 added
{code}
+ th(".type", "Application Type").
{code}
to the application table, which moves the queue column's index from 3 to 4. But on the scheduler page, the queue column index is hard-coded to 3 when filtering applications by queue name:
{code}
if (q == 'root') q = '';
else q = '^' + q.substr(q.lastIndexOf('.') + 1) + '$';
$('#apps').dataTable().fnFilter(q, 3, true);
{code}
So the queue filter will not work for the application page. Reproduce steps (thanks to Bo Yang for pointing this out):
{code}
1) In default setup, there's a default queue under root queue
2) Run an arbitrary application, you can find it in "Applications" page
3) Click "Default" queue in scheduler page
4) Click "Applications", no application will show here
5) Click "Root" queue in scheduler page
6) Click "Applications", application will show again
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1710) Admission Control: agents to allocate reservation
[ https://issues.apache.org/jira/browse/YARN-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131797#comment-14131797 ] Hadoop QA commented on YARN-1710: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668391/YARN-1710.3.patch against trunk revision 78b0483. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4926//console This message is automatically generated. Admission Control: agents to allocate reservation - Key: YARN-1710 URL: https://issues.apache.org/jira/browse/YARN-1710 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Carlo Curino Attachments: YARN-1710.1.patch, YARN-1710.2.patch, YARN-1710.3.patch, YARN-1710.patch This JIRA tracks the algorithms used to allocate a user ReservationRequest coming in from the new reservation API (YARN-1708), in the inventory subsystem (YARN-1709) maintaining the current plan for the cluster. The focus of these agents is to quickly find a solution for the set of constraints provided by the user, and the physical constraints of the plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2032) Implement a scalable, available TimelineStore using HBase
[ https://issues.apache.org/jira/browse/YARN-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-2032: Attachment: (was: YARN-2032-091114.patch) Implement a scalable, available TimelineStore using HBase - Key: YARN-2032 URL: https://issues.apache.org/jira/browse/YARN-2032 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Li Lu Attachments: YARN-2032-branch-2-1.patch, YARN-2032-branch2-2.patch As discussed on YARN-1530, we should pursue implementing a scalable, available Timeline store using HBase. One goal is to reuse most of the code from the levelDB Based store - YARN-1635. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2032) Implement a scalable, available TimelineStore using HBase
[ https://issues.apache.org/jira/browse/YARN-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-2032: Attachment: YARN-2032-091114.patch Reapplied my patch to latest trunk branch locally several times, could not reproduce the javac failure. Re-upload a patch to see if this is a persistent failure. Implement a scalable, available TimelineStore using HBase - Key: YARN-2032 URL: https://issues.apache.org/jira/browse/YARN-2032 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Li Lu Attachments: YARN-2032-091114.patch, YARN-2032-branch-2-1.patch, YARN-2032-branch2-2.patch As discussed on YARN-1530, we should pursue implementing a scalable, available Timeline store using HBase. One goal is to reuse most of the code from the levelDB Based store - YARN-1635. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2539) FairScheduler: Update the default value for maxAMShare
[ https://issues.apache.org/jira/browse/YARN-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131818#comment-14131818 ] Ashwin Shankar commented on YARN-2539: -- Sounds good, thanks. FairScheduler: Update the default value for maxAMShare -- Key: YARN-2539 URL: https://issues.apache.org/jira/browse/YARN-2539 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Priority: Minor Attachments: YARN-2539-1.patch Currently, the maxAMShare per queue is -1 in default, which disables the AM share constraint. Change to 0.5f would be good. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2032) Implement a scalable, available TimelineStore using HBase
[ https://issues.apache.org/jira/browse/YARN-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-2032: Attachment: (was: YARN-2032-091114.patch) Implement a scalable, available TimelineStore using HBase - Key: YARN-2032 URL: https://issues.apache.org/jira/browse/YARN-2032 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Li Lu Attachments: YARN-2032-branch-2-1.patch, YARN-2032-branch2-2.patch As discussed on YARN-1530, we should pursue implementing a scalable, available Timeline store using HBase. One goal is to reuse most of the code from the levelDB Based store - YARN-1635. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2542) yarn application -status appId throws NPE when retrieving the app from the timelineserver
[ https://issues.apache.org/jira/browse/YARN-2542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131830#comment-14131830 ] Hadoop QA commented on YARN-2542: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668390/YARN-2542.3.patch against trunk revision 78b0483. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.client.cli.TestYarnCLI {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4927//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4927//console This message is automatically generated. yarn application -status appId throws NPE when retrieving the app from the timelineserver - Key: YARN-2542 URL: https://issues.apache.org/jira/browse/YARN-2542 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-2542.1.patch, YARN-2542.2.patch, YARN-2542.3.patch yarn application -status appId throws NPE when retrieving the app from the timelineserver. It's broken by YARN-415. When app is finished, there's no usageReport. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2032) Implement a scalable, available TimelineStore using HBase
[ https://issues.apache.org/jira/browse/YARN-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-2032: Attachment: YARN-2032-091114.patch Implement a scalable, available TimelineStore using HBase - Key: YARN-2032 URL: https://issues.apache.org/jira/browse/YARN-2032 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Li Lu Attachments: YARN-2032-091114.patch, YARN-2032-branch-2-1.patch, YARN-2032-branch2-2.patch As discussed on YARN-1530, we should pursue implementing a scalable, available Timeline store using HBase. One goal is to reuse most of the code from the levelDB Based store - YARN-1635. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2456) Possible livelock in CapacityScheduler when RM is recovering apps
[ https://issues.apache.org/jira/browse/YARN-2456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2456: -- Attachment: YARN-2456.2.patch patch rebased Possible livelock in CapacityScheduler when RM is recovering apps - Key: YARN-2456 URL: https://issues.apache.org/jira/browse/YARN-2456 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-2456.1.patch, YARN-2456.2.patch Consider this scenario: 1. RM is configured with a single queue and only one application can be active at a time. 2. Submit App1 which uses up the queue's whole capacity 3. Submit App2 which remains pending. 4. Restart RM. 5. App2 is recovered before App1, so App2 is added to the activeApplications list. Now App1 remains pending (because of max-active-app limit) 6. All containers of App1 are now recovered when NM registers, and use up the whole queue capacity again. 7. Since the queue is full, App2 cannot proceed to allocate AM container. 8. In the meanwhile, App1 cannot proceed to become active because of the max-active-app limit -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2547) Cross Origin Filter throws UnsupportedOperationException upon destroy
[ https://issues.apache.org/jira/browse/YARN-2547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-2547: Attachment: YARN-2547.patch Thanks for the feedback [~jeagles]. Uploading new patch with modified test Cross Origin Filter throws UnsupportedOperationException upon destroy - Key: YARN-2547 URL: https://issues.apache.org/jira/browse/YARN-2547 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Mit Desai Attachments: YARN-2547.patch, YARN-2547.patch, YARN-2547.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1372) Ensure all completed containers are reported to the AMs across RM restart
[ https://issues.apache.org/jira/browse/YARN-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-1372: Attachment: YARN-1372.005.patch Fixed unit test failure Ensure all completed containers are reported to the AMs across RM restart - Key: YARN-1372 URL: https://issues.apache.org/jira/browse/YARN-1372 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Anubhav Dhoot Attachments: YARN-1372.001.patch, YARN-1372.001.patch, YARN-1372.002_NMHandlesCompletedApp.patch, YARN-1372.002_RMHandlesCompletedApp.patch, YARN-1372.002_RMHandlesCompletedApp.patch, YARN-1372.003.patch, YARN-1372.004.patch, YARN-1372.005.patch, YARN-1372.prelim.patch, YARN-1372.prelim2.patch Currently the NM informs the RM about completed containers and then removes those containers from the RM notification list. The RM passes on that completed container information to the AM and the AM pulls this data. If the RM dies before the AM pulls this data then the AM may not be able to get this information again. To fix this, NM should maintain a separate list of such completed container notifications sent to the RM. After the AM has pulled the containers from the RM then the RM will inform the NM about it and the NM can remove the completed container from the new list. Upon re-register with the RM (after RM restart) the NM should send the entire list of completed containers to the RM along with any other containers that completed while the RM was dead. This ensures that the RM can inform the AM's about all completed containers. Some container completions may be reported more than once since the AM may have pulled the container but the RM may die before notifying the NM about the pull. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
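The NM-side bookkeeping described in this issue can be sketched with a small tracker: completed containers stay in a pending set until the RM confirms the AM has pulled them, and the full set is replayed on re-registration after an RM restart. This is a hypothetical illustration of the protocol, not the patch's implementation (which deals with real ContainerStatus objects and the NM/RM state machines).
{code}
import java.util.HashSet;
import java.util.Set;

public class CompletedContainerTracker {
  // Completed containers not yet acknowledged as pulled by the AM.
  private final Set<String> pendingAck = new HashSet<>();

  // Called when a container finishes on this NM.
  public synchronized void containerCompleted(String containerId) {
    pendingAck.add(containerId);
  }

  // Reported to the RM on every heartbeat, and in full on re-registration
  // after an RM restart, so no completion is ever lost.
  public synchronized Set<String> containersToReport() {
    return new HashSet<>(pendingAck);
  }

  // Called when the RM signals that the AM has pulled these completions;
  // a completion may be reported more than once if this ack is lost.
  public synchronized void ackFromRM(Set<String> pulled) {
    pendingAck.removeAll(pulled);
  }

  public static void main(String[] args) {
    CompletedContainerTracker tracker = new CompletedContainerTracker();
    tracker.containerCompleted("container_01_000002");
    System.out.println(tracker.containersToReport()); // reported until acked
    tracker.ackFromRM(tracker.containersToReport());
    System.out.println(tracker.containersToReport()); // empty after the RM ack
  }
}
{code}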
[jira] [Commented] (YARN-2547) Cross Origin Filter throws UnsupportedOperationException upon destroy
[ https://issues.apache.org/jira/browse/YARN-2547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131848#comment-14131848 ] Jonathan Eagles commented on YARN-2547: --- +1 pending QA comment. Thanks, Mit. Cross Origin Filter throws UnsupportedOperationException upon destroy - Key: YARN-2547 URL: https://issues.apache.org/jira/browse/YARN-2547 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Mit Desai Attachments: YARN-2547.patch, YARN-2547.patch, YARN-2547.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2456) Possible livelock in CapacityScheduler when RM is recovering apps
[ https://issues.apache.org/jira/browse/YARN-2456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131850#comment-14131850 ] Xuan Gong commented on YARN-2456: - +1 LGTM. Will commit this after Jenkins give +1 Possible livelock in CapacityScheduler when RM is recovering apps - Key: YARN-2456 URL: https://issues.apache.org/jira/browse/YARN-2456 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-2456.1.patch, YARN-2456.2.patch Consider this scenario: 1. RM is configured with a single queue and only one application can be active at a time. 2. Submit App1 which uses up the queue's whole capacity 3. Submit App2 which remains pending. 4. Restart RM. 5. App2 is recovered before App1, so App2 is added to the activeApplications list. Now App1 remains pending (because of max-active-app limit) 6. All containers of App1 are now recovered when NM registers, and use up the whole queue capacity again. 7. Since the queue is full, App2 cannot proceed to allocate AM container. 8. In the meanwhile, App1 cannot proceed to become active because of the max-active-app limit -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1372) Ensure all completed containers are reported to the AMs across RM restart
[ https://issues.apache.org/jira/browse/YARN-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131865#comment-14131865 ] Hadoop QA commented on YARN-1372: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668409/YARN-1372.005.patch against trunk revision 3122daa. {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4931//console This message is automatically generated. Ensure all completed containers are reported to the AMs across RM restart - Key: YARN-1372 URL: https://issues.apache.org/jira/browse/YARN-1372 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Anubhav Dhoot Attachments: YARN-1372.001.patch, YARN-1372.001.patch, YARN-1372.002_NMHandlesCompletedApp.patch, YARN-1372.002_RMHandlesCompletedApp.patch, YARN-1372.002_RMHandlesCompletedApp.patch, YARN-1372.003.patch, YARN-1372.004.patch, YARN-1372.005.patch, YARN-1372.prelim.patch, YARN-1372.prelim2.patch Currently the NM informs the RM about completed containers and then removes those containers from the RM notification list. The RM passes on that completed container information to the AM and the AM pulls this data. If the RM dies before the AM pulls this data then the AM may not be able to get this information again. To fix this, NM should maintain a separate list of such completed container notifications sent to the RM. After the AM has pulled the containers from the RM then the RM will inform the NM about it and the NM can remove the completed container from the new list. Upon re-register with the RM (after RM restart) the NM should send the entire list of completed containers to the RM along with any other containers that completed while the RM was dead. This ensures that the RM can inform the AM's about all completed containers. Some container completions may be reported more than once since the AM may have pulled the container but the RM may die before notifying the NM about the pull. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2547) Cross Origin Filter throws UnsupportedOperationException upon destroy
[ https://issues.apache.org/jira/browse/YARN-2547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131876#comment-14131876 ] Hadoop QA commented on YARN-2547: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668407/YARN-2547.patch against trunk revision 3122daa. {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4932//console This message is automatically generated. Cross Origin Filter throws UnsupportedOperationException upon destroy - Key: YARN-2547 URL: https://issues.apache.org/jira/browse/YARN-2547 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Mit Desai Attachments: YARN-2547.patch, YARN-2547.patch, YARN-2547.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2540) Fair Scheduler : queue filters not working on scheduler page in RM UI
[ https://issues.apache.org/jira/browse/YARN-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131878#comment-14131878 ] Hadoop QA commented on YARN-2540: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668385/YARN-2540-v1.txt against trunk revision 78b0483. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4928//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4928//console This message is automatically generated. Fair Scheduler : queue filters not working on scheduler page in RM UI - Key: YARN-2540 URL: https://issues.apache.org/jira/browse/YARN-2540 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.5.0, 2.5.1 Reporter: Ashwin Shankar Assignee: Ashwin Shankar Attachments: YARN-2540-v1.txt Steps to reproduce : 1. Run an app in default queue. 2. While the app is running, go to the scheduler page on RM UI. 3. You would see the app in the apptable at the bottom. 4. Now click on default queue to filter the apptable on root.default. 5. App disappears from apptable although it is running on default queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2032) Implement a scalable, available TimelineStore using HBase
[ https://issues.apache.org/jira/browse/YARN-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131881#comment-14131881 ] Hadoop QA commented on YARN-2032: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668402/YARN-2032-091114.patch against trunk revision 3122daa. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 3 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice: org.apache.hadoop.yarn.server.timeline.TestHBaseTimelineStoreUtil {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4930//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/4930//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-applicationhistoryservice.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4930//console This message is automatically generated. Implement a scalable, available TimelineStore using HBase - Key: YARN-2032 URL: https://issues.apache.org/jira/browse/YARN-2032 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Li Lu Attachments: YARN-2032-091114.patch, YARN-2032-branch-2-1.patch, YARN-2032-branch2-2.patch As discussed on YARN-1530, we should pursue implementing a scalable, available Timeline store using HBase. One goal is to reuse most of the code from the levelDB Based store - YARN-1635. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1372) Ensure all completed containers are reported to the AMs across RM restart
[ https://issues.apache.org/jira/browse/YARN-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-1372: Attachment: YARN-1372.005.patch Rebased patch Ensure all completed containers are reported to the AMs across RM restart - Key: YARN-1372 URL: https://issues.apache.org/jira/browse/YARN-1372 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Anubhav Dhoot Attachments: YARN-1372.001.patch, YARN-1372.001.patch, YARN-1372.002_NMHandlesCompletedApp.patch, YARN-1372.002_RMHandlesCompletedApp.patch, YARN-1372.002_RMHandlesCompletedApp.patch, YARN-1372.003.patch, YARN-1372.004.patch, YARN-1372.005.patch, YARN-1372.005.patch, YARN-1372.prelim.patch, YARN-1372.prelim2.patch Currently the NM informs the RM about completed containers and then removes those containers from the RM notification list. The RM passes on that completed container information to the AM and the AM pulls this data. If the RM dies before the AM pulls this data then the AM may not be able to get this information again. To fix this, NM should maintain a separate list of such completed container notifications sent to the RM. After the AM has pulled the containers from the RM then the RM will inform the NM about it and the NM can remove the completed container from the new list. Upon re-register with the RM (after RM restart) the NM should send the entire list of completed containers to the RM along with any other containers that completed while the RM was dead. This ensures that the RM can inform the AM's about all completed containers. Some container completions may be reported more than once since the AM may have pulled the container but the RM may die before notifying the NM about the pull. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2540) Fair Scheduler : queue filters not working on scheduler page in RM UI
[ https://issues.apache.org/jira/browse/YARN-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131886#comment-14131886 ] Ashwin Shankar commented on YARN-2540: -- Didn't add unit tests since it was a cosmetic UI change. I verified the patch manually by running apps in multiple queues in a 2-level queue hierarchy and checked whether clicking on parent/leaf queues resulted in the right filter being set. Fair Scheduler : queue filters not working on scheduler page in RM UI - Key: YARN-2540 URL: https://issues.apache.org/jira/browse/YARN-2540 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.5.0, 2.5.1 Reporter: Ashwin Shankar Assignee: Ashwin Shankar Attachments: YARN-2540-v1.txt Steps to reproduce : 1. Run an app in default queue. 2. While the app is running, go to the scheduler page on RM UI. 3. You would see the app in the apptable at the bottom. 4. Now click on default queue to filter the apptable on root.default. 5. App disappears from apptable although it is running on default queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2468) Log handling for LRS
[ https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131892#comment-14131892 ] Xuan Gong commented on YARN-2468: - Did more investigation and had offline discussions. It turns out this is a really hard problem, so we decided to solve it step by step. For the first step, we will stick to the original proposal: change the log layout to create a directory (named after the node id of the NM), and under this directory, every time AppLogAggregatorImpl starts to upload container logs, it will create a file (named node_id + timestamp). This method will increase the number of log files, but it will work fine for a small cluster. For the next step, we need to find a better way to handle the logs more efficiently. We would like to aggregate all containers' logs (those containers belong to the same NM) into a single file, so that the total number of log files is bounded. But we need to find a more scalable way, other than TFile, to do it. Will open a separate ticket for this. Log handling for LRS Key: YARN-2468 URL: https://issues.apache.org/jira/browse/YARN-2468 Project: Hadoop YARN Issue Type: Sub-task Components: log-aggregation, nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2468.1.patch Currently, when application is finished, NM will start to do the log aggregation. But for Long running service applications, this is not ideal. The problems we have are: 1) LRS applications are expected to run for a long time (weeks, months). 2) Currently, all the container logs (from one NM) will be written into a single file. The files could become larger and larger. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
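A minimal sketch of the per-upload path naming described in the comment above, assuming the layout is remote-app-log-dir/node-id/node-id_timestamp; the class and method names are hypothetical and this is not the actual AppLogAggregatorImpl code.
{code}
import org.apache.hadoop.fs.Path;

/**
 * Hypothetical helper illustrating the proposed YARN-2468 layout:
 * one directory per NM, one file per upload cycle inside it.
 */
public class LrsLogPathSketch {

  // Assumed layout: <remote-app-log-dir>/<node-id>/<node-id>_<upload-timestamp>
  public static Path uploadFileForThisCycle(Path remoteAppLogDir, String nodeId) {
    Path nodeDir = new Path(remoteAppLogDir, nodeId);    // directory named after the NM's node id
    long uploadTimestamp = System.currentTimeMillis();   // one new file per upload cycle
    return new Path(nodeDir, nodeId + "_" + uploadTimestamp);
  }
}
{code}
Each periodic upload then writes a new, bounded file instead of growing a single ever-larger one, which is why the comment notes that the total number of log files grows instead.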
[jira] [Updated] (YARN-2468) Log handling for LRS
[ https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-2468: Attachment: YARN-2468.2.patch Log handling for LRS Key: YARN-2468 URL: https://issues.apache.org/jira/browse/YARN-2468 Project: Hadoop YARN Issue Type: Sub-task Components: log-aggregation, nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2468.1.patch, YARN-2468.2.patch Currently, when application is finished, NM will start to do the log aggregation. But for Long running service applications, this is not ideal. The problems we have are: 1) LRS applications are expected to run for a long time (weeks, months). 2) Currently, all the container logs (from one NM) will be written into a single file. The files could become larger and larger. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2314) ContainerManagementProtocolProxy can create thousands of threads for a large cluster
[ https://issues.apache.org/jira/browse/YARN-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-2314: - Attachment: disable-cm-proxy-cache.patch Yeah, I don't think there's a good way to fix this short of running a bigger container than necessary or patching the code. Attaching a patch we've been running with recently that disables the CM proxy cache completely and reinstates the fix from MAPREDUCE-. It's not an ideal fix but it effectively restores the behavior to what Hadoop 0.23 did which worked OK for us. ContainerManagementProtocolProxy can create thousands of threads for a large cluster Key: YARN-2314 URL: https://issues.apache.org/jira/browse/YARN-2314 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.1.0-beta Reporter: Jason Lowe Priority: Critical Attachments: disable-cm-proxy-cache.patch, nmproxycachefix.prototype.patch ContainerManagementProtocolProxy has a cache of NM proxies, and the size of this cache is configurable. However the cache can grow far beyond the configured size when running on a large cluster and blow AM address/container limits. More details in the first comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
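For illustration only, here is a strictly size-bounded LRU map of the kind a proxy cache would need in order to stay at its configured size; this is a conceptual sketch, not the actual ContainerManagementProtocolProxy code, which also has to avoid evicting proxies that are still in use.
{code}
import java.util.LinkedHashMap;
import java.util.Map;

/** Conceptual sketch: a cache that can never exceed maxEntries. */
public class BoundedLruCache<K, V> extends LinkedHashMap<K, V> {
  private final int maxEntries;

  public BoundedLruCache(int maxEntries) {
    super(16, 0.75f, true);        // access-order: the eldest entry is the least recently used
    this.maxEntries = maxEntries;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    return size() > maxEntries;    // evict as soon as the configured bound is exceeded
  }
}
{code}
A cache of NM proxies that holds connections and threads open needs a hard bound of this kind (or, as in the attached patch, no cache at all) so the AM's thread count tracks the configured limit rather than the cluster size.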
[jira] [Commented] (YARN-415) Capture aggregate memory allocation at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131908#comment-14131908 ] Sandy Ryza commented on YARN-415: - Awesome to see this go in! Capture aggregate memory allocation at the app-level for chargeback --- Key: YARN-415 URL: https://issues.apache.org/jira/browse/YARN-415 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 2.5.0 Reporter: Kendall Thrapp Assignee: Eric Payne Fix For: 2.6.0 Attachments: YARN-415--n10.patch, YARN-415--n2.patch, YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, YARN-415.201406262136.txt, YARN-415.201407042037.txt, YARN-415.201407071542.txt, YARN-415.201407171553.txt, YARN-415.201407172144.txt, YARN-415.201407232237.txt, YARN-415.201407242148.txt, YARN-415.201407281816.txt, YARN-415.201408062232.txt, YARN-415.201408080204.txt, YARN-415.201408092006.txt, YARN-415.201408132109.txt, YARN-415.201408150030.txt, YARN-415.201408181938.txt, YARN-415.201408181938.txt, YARN-415.201408212033.txt, YARN-415.201409040036.txt, YARN-415.201409092204.txt, YARN-415.201409102216.txt, YARN-415.patch For the purpose of chargeback, I'd like to be able to compute the cost of an application in terms of cluster resource usage. To start out, I'd like to get the memory utilization of an application. The unit should be MB-seconds or something similar and, from a chargeback perspective, the memory amount should be the memory reserved for the application, as even if the app didn't use all that memory, no one else was able to use it. (reserved ram for container 1 * lifetime of container 1) + (reserved ram for container 2 * lifetime of container 2) + ... + (reserved ram for container n * lifetime of container n) It'd be nice to have this at the app level instead of the job level because: 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear on the job history server). 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). This new metric should be available both through the RM UI and RM Web Services REST API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
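The chargeback formula in the description above is a sum of reserved memory times container lifetime; a minimal sketch of that arithmetic (hypothetical helper, not the RM implementation) could look like:
{code}
/** Hypothetical helper: aggregate memory charge in MB-seconds. */
public class MemorySecondsSketch {

  public static long memoryMbSeconds(long[] reservedMb, long[] lifetimeSeconds) {
    long total = 0L;
    for (int i = 0; i < reservedMb.length; i++) {
      // charge for reserved memory, not used memory: nobody else could use it
      total += reservedMb[i] * lifetimeSeconds[i];
    }
    return total;
  }

  public static void main(String[] args) {
    // two containers: 2048 MB for 600 s and 1024 MB for 300 s => 1,536,000 MB-seconds
    System.out.println(memoryMbSeconds(new long[] {2048, 1024}, new long[] {600, 300}));
  }
}
{code}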
[jira] [Commented] (YARN-2314) ContainerManagementProtocolProxy can create thousands of threads for a large cluster
[ https://issues.apache.org/jira/browse/YARN-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131920#comment-14131920 ] Lohit Vijayarenu commented on YARN-2314: Thanks [~jlowe] ContainerManagementProtocolProxy can create thousands of threads for a large cluster Key: YARN-2314 URL: https://issues.apache.org/jira/browse/YARN-2314 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.1.0-beta Reporter: Jason Lowe Priority: Critical Attachments: disable-cm-proxy-cache.patch, nmproxycachefix.prototype.patch ContainerManagementProtocolProxy has a cache of NM proxies, and the size of this cache is configurable. However the cache can grow far beyond the configured size when running on a large cluster and blow AM address/container limits. More details in the first comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2542) yarn application -status appId throws NPE when retrieving the app from the timelineserver
[ https://issues.apache.org/jira/browse/YARN-2542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2542: -- Attachment: YARN-2542.4.patch fixed test failure yarn application -status appId throws NPE when retrieving the app from the timelineserver - Key: YARN-2542 URL: https://issues.apache.org/jira/browse/YARN-2542 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-2542.1.patch, YARN-2542.2.patch, YARN-2542.3.patch, YARN-2542.4.patch yarn application -status appId throws NPE when retrieving the app from the timelineserver. It's broken by YARN-415. When app is finished, there's no usageReport. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2513) Host framework UIs in YARN for use with the ATS
[ https://issues.apache.org/jira/browse/YARN-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-2513: -- Attachment: YARN-2513-v1.patch Host framework UIs in YARN for use with the ATS --- Key: YARN-2513 URL: https://issues.apache.org/jira/browse/YARN-2513 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-2513-v1.patch Allow for pluggable UIs as described by TEZ-8. YARN can provide the infrastructure to host JavaScript and possibly Java UIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2456) Possible livelock in CapacityScheduler when RM is recovering apps
[ https://issues.apache.org/jira/browse/YARN-2456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131943#comment-14131943 ] Hadoop QA commented on YARN-2456: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668403/YARN-2456.2.patch against trunk revision 3122daa. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4929//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4929//console This message is automatically generated. Possible livelock in CapacityScheduler when RM is recovering apps - Key: YARN-2456 URL: https://issues.apache.org/jira/browse/YARN-2456 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-2456.1.patch, YARN-2456.2.patch Consider this scenario: 1. RM is configured with a single queue and only one application can be active at a time. 2. Submit App1 which uses up the queue's whole capacity 3. Submit App2 which remains pending. 4. Restart RM. 5. App2 is recovered before App1, so App2 is added to the activeApplications list. Now App1 remains pending (because of max-active-app limit) 6. All containers of App1 are now recovered when NM registers, and use up the whole queue capacity again. 7. Since the queue is full, App2 cannot proceed to allocate AM container. 8. In the meanwhile, App1 cannot proceed to become active because of the max-active-app limit -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2513) Host framework UIs in YARN for use with the ATS
[ https://issues.apache.org/jira/browse/YARN-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131946#comment-14131946 ] Jonathan Eagles commented on YARN-2513: --- [~vinodkv], [~hitesh], [~zjshen], I have posted a patch that simply allows the timeline server to host a generic UI that still passes through the web filters of Hadoop. Please give some feedback.
{code}
<property>
  <name>yarn.timeline-service.ui-names</name>
  <value>tez</value>
</property>
<property>
  <name>yarn.timeline-service.ui-on-disk-path.tez</name>
  <value>/Users/jeagles/hadoop/tez-ui</value>
</property>
<property>
  <name>yarn.timeline-service.ui-web-path.tez</name>
  <value>/tez-ui-v1.0</value>
</property>
{code}
Host framework UIs in YARN for use with the ATS --- Key: YARN-2513 URL: https://issues.apache.org/jira/browse/YARN-2513 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-2513-v1.patch Allow for pluggable UIs as described by TEZ-8. YARN can provide the infrastructure to host JavaScript and possibly Java UIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-611) Add an AM retry count reset window to YARN RM
[ https://issues.apache.org/jira/browse/YARN-611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131960#comment-14131960 ] Zhijie Shen commented on YARN-611: -- ControlledClock should be marked \@LimitedPrivate\{mapreduce, yarn\}? Add an AM retry count reset window to YARN RM - Key: YARN-611 URL: https://issues.apache.org/jira/browse/YARN-611 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.3-alpha Reporter: Chris Riccomini Assignee: Xuan Gong Attachments: YARN-611.1.patch, YARN-611.2.patch, YARN-611.3.patch, YARN-611.4.patch, YARN-611.4.rebase.patch, YARN-611.5.patch, YARN-611.6.patch, YARN-611.7.patch, YARN-611.8.patch, YARN-611.9.patch, YARN-611.9.rebase.patch YARN currently has the following config: yarn.resourcemanager.am.max-retries This config defaults to 2, and defines how many times to retry a failed AM before failing the whole YARN job. YARN counts an AM as failed if the node that it was running on dies (the NM will timeout, which counts as a failure for the AM), or if the AM dies. This configuration is insufficient for long running (or infinitely running) YARN jobs, since the machine (or NM) that the AM is running on will eventually need to be restarted (or the machine/NM will fail). In such an event, the AM has not done anything wrong, but this is counted as a failure by the RM. Since the retry count for the AM is never reset, eventually, at some point, the number of machine/NM failures will result in the AM failure count going above the configured value for yarn.resourcemanager.am.max-retries. Once this happens, the RM will mark the job as failed, and shut it down. This behavior is not ideal. I propose that we add a second configuration: yarn.resourcemanager.am.retry-count-window-ms This configuration would define a window of time that would define when an AM is well behaved, and it's safe to reset its failure count back to zero. Every time an AM fails the RmAppImpl would check the last time that the AM failed. If the last failure was less than retry-count-window-ms ago, and the new failure count is max-retries, then the job should fail. If the AM has never failed, the retry count is max-retries, or if the last failure was OUTSIDE the retry-count-window-ms, then the job should be restarted. Additionally, if the last failure was outside the retry-count-window-ms, then the failure count should be set back to 0. This would give developers a way to have well-behaved AMs run forever, while still failing mis-behaving AMs after a short period of time. I think the work to be done here is to change the RmAppImpl to actually look at app.attempts, and see if there have been more than max-retries failures in the last retry-count-window-ms milliseconds. If there have, then the job should fail, if not, then the job should go forward. Additionally, we might also need to add an endTime in either RMAppAttemptImpl or RMAppFailedAttemptEvent, so that the RmAppImpl can check the time of the failure. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
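Reading the proposal above, the intended check is that only failures inside the retry-count window count toward max-retries, while older failures are effectively reset. A minimal sketch of that check (hypothetical helper, not the RMAppImpl implementation) might be:
{code}
import java.util.List;

/** Hypothetical sketch of the yarn.resourcemanager.am.retry-count-window-ms check. */
public class AmRetryWindowSketch {

  /**
   * @param failureEndTimes end times (ms) of previous failed attempts
   * @param now             current time (ms)
   * @param windowMs        retry-count-window-ms (a non-positive value is assumed to mean "no window")
   * @param maxRetries      yarn.resourcemanager.am.max-retries
   * @return true if the app should be marked failed
   */
  public static boolean shouldFailApp(List<Long> failureEndTimes, long now,
                                      long windowMs, int maxRetries) {
    int failuresInWindow = 0;
    for (long endTime : failureEndTimes) {
      if (windowMs <= 0 || now - endTime <= windowMs) {  // only recent failures count
        failuresInWindow++;
      }
    }
    return failuresInWindow >= maxRetries;               // otherwise restart the AM
  }
}
{code}
With a check of this shape, a long-running AM whose occasional failures are spread further apart than the window never accumulates enough recent failures to be marked failed, which is the behavior the issue asks for.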
[jira] [Commented] (YARN-2540) Fair Scheduler : queue filters not working on scheduler page in RM UI
[ https://issues.apache.org/jira/browse/YARN-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131962#comment-14131962 ] Wei Yan commented on YARN-2540: --- Verified the patch. Ran an app in queue root.wei.yan, and the patch works well. Fair Scheduler : queue filters not working on scheduler page in RM UI - Key: YARN-2540 URL: https://issues.apache.org/jira/browse/YARN-2540 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.5.0, 2.5.1 Reporter: Ashwin Shankar Assignee: Ashwin Shankar Attachments: YARN-2540-v1.txt Steps to reproduce : 1. Run an app in default queue. 2. While the app is running, go to the scheduler page on RM UI. 3. You would see the app in the apptable at the bottom. 4. Now click on default queue to filter the apptable on root.default. 5. App disappears from apptable although it is running on default queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1372) Ensure all completed containers are reported to the AMs across RM restart
[ https://issues.apache.org/jira/browse/YARN-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131986#comment-14131986 ] Hadoop QA commented on YARN-1372: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668415/YARN-1372.005.patch against trunk revision 3122daa. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4933//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4933//console This message is automatically generated. Ensure all completed containers are reported to the AMs across RM restart - Key: YARN-1372 URL: https://issues.apache.org/jira/browse/YARN-1372 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Anubhav Dhoot Attachments: YARN-1372.001.patch, YARN-1372.001.patch, YARN-1372.002_NMHandlesCompletedApp.patch, YARN-1372.002_RMHandlesCompletedApp.patch, YARN-1372.002_RMHandlesCompletedApp.patch, YARN-1372.003.patch, YARN-1372.004.patch, YARN-1372.005.patch, YARN-1372.005.patch, YARN-1372.prelim.patch, YARN-1372.prelim2.patch Currently the NM informs the RM about completed containers and then removes those containers from the RM notification list. The RM passes on that completed container information to the AM and the AM pulls this data. If the RM dies before the AM pulls this data then the AM may not be able to get this information again. To fix this, NM should maintain a separate list of such completed container notifications sent to the RM. After the AM has pulled the containers from the RM then the RM will inform the NM about it and the NM can remove the completed container from the new list. Upon re-register with the RM (after RM restart) the NM should send the entire list of completed containers to the RM along with any other containers that completed while the RM was dead. This ensures that the RM can inform the AM's about all completed containers. Some container completions may be reported more than once since the AM may have pulled the container but the RM may die before notifying the NM about the pull. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-611) Add an AM retry count reset window to YARN RM
[ https://issues.apache.org/jira/browse/YARN-611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-611: --- Attachment: YARN-611.10.patch Add an AM retry count reset window to YARN RM - Key: YARN-611 URL: https://issues.apache.org/jira/browse/YARN-611 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.3-alpha Reporter: Chris Riccomini Assignee: Xuan Gong Attachments: YARN-611.1.patch, YARN-611.10.patch, YARN-611.2.patch, YARN-611.3.patch, YARN-611.4.patch, YARN-611.4.rebase.patch, YARN-611.5.patch, YARN-611.6.patch, YARN-611.7.patch, YARN-611.8.patch, YARN-611.9.patch, YARN-611.9.rebase.patch YARN currently has the following config: yarn.resourcemanager.am.max-retries This config defaults to 2, and defines how many times to retry a failed AM before failing the whole YARN job. YARN counts an AM as failed if the node that it was running on dies (the NM will timeout, which counts as a failure for the AM), or if the AM dies. This configuration is insufficient for long running (or infinitely running) YARN jobs, since the machine (or NM) that the AM is running on will eventually need to be restarted (or the machine/NM will fail). In such an event, the AM has not done anything wrong, but this is counted as a failure by the RM. Since the retry count for the AM is never reset, eventually, at some point, the number of machine/NM failures will result in the AM failure count going above the configured value for yarn.resourcemanager.am.max-retries. Once this happens, the RM will mark the job as failed, and shut it down. This behavior is not ideal. I propose that we add a second configuration: yarn.resourcemanager.am.retry-count-window-ms This configuration would define a window of time that would define when an AM is well behaved, and it's safe to reset its failure count back to zero. Every time an AM fails the RmAppImpl would check the last time that the AM failed. If the last failure was less than retry-count-window-ms ago, and the new failure count is max-retries, then the job should fail. If the AM has never failed, the retry count is max-retries, or if the last failure was OUTSIDE the retry-count-window-ms, then the job should be restarted. Additionally, if the last failure was outside the retry-count-window-ms, then the failure count should be set back to 0. This would give developers a way to have well-behaved AMs run forever, while still failing mis-behaving AMs after a short period of time. I think the work to be done here is to change the RmAppImpl to actually look at app.attempts, and see if there have been more than max-retries failures in the last retry-count-window-ms milliseconds. If there have, then the job should fail, if not, then the job should go forward. Additionally, we might also need to add an endTime in either RMAppAttemptImpl or RMAppFailedAttemptEvent, so that the RmAppImpl can check the time of the failure. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-611) Add an AM retry count reset window to YARN RM
[ https://issues.apache.org/jira/browse/YARN-611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131992#comment-14131992 ] Xuan Gong commented on YARN-611: bq. ControlledClock should actually be in a test module. Moved into the test module. bq. ControlledClock should be marked @LimitedPrivate{mapreduce, yarn}? Moved into the test module. So, not need to add those. Add an AM retry count reset window to YARN RM - Key: YARN-611 URL: https://issues.apache.org/jira/browse/YARN-611 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.3-alpha Reporter: Chris Riccomini Assignee: Xuan Gong Attachments: YARN-611.1.patch, YARN-611.10.patch, YARN-611.2.patch, YARN-611.3.patch, YARN-611.4.patch, YARN-611.4.rebase.patch, YARN-611.5.patch, YARN-611.6.patch, YARN-611.7.patch, YARN-611.8.patch, YARN-611.9.patch, YARN-611.9.rebase.patch YARN currently has the following config: yarn.resourcemanager.am.max-retries This config defaults to 2, and defines how many times to retry a failed AM before failing the whole YARN job. YARN counts an AM as failed if the node that it was running on dies (the NM will timeout, which counts as a failure for the AM), or if the AM dies. This configuration is insufficient for long running (or infinitely running) YARN jobs, since the machine (or NM) that the AM is running on will eventually need to be restarted (or the machine/NM will fail). In such an event, the AM has not done anything wrong, but this is counted as a failure by the RM. Since the retry count for the AM is never reset, eventually, at some point, the number of machine/NM failures will result in the AM failure count going above the configured value for yarn.resourcemanager.am.max-retries. Once this happens, the RM will mark the job as failed, and shut it down. This behavior is not ideal. I propose that we add a second configuration: yarn.resourcemanager.am.retry-count-window-ms This configuration would define a window of time that would define when an AM is well behaved, and it's safe to reset its failure count back to zero. Every time an AM fails the RmAppImpl would check the last time that the AM failed. If the last failure was less than retry-count-window-ms ago, and the new failure count is max-retries, then the job should fail. If the AM has never failed, the retry count is max-retries, or if the last failure was OUTSIDE the retry-count-window-ms, then the job should be restarted. Additionally, if the last failure was outside the retry-count-window-ms, then the failure count should be set back to 0. This would give developers a way to have well-behaved AMs run forever, while still failing mis-behaving AMs after a short period of time. I think the work to be done here is to change the RmAppImpl to actually look at app.attempts, and see if there have been more than max-retries failures in the last retry-count-window-ms milliseconds. If there have, then the job should fail, if not, then the job should go forward. Additionally, we might also need to add an endTime in either RMAppAttemptImpl or RMAppFailedAttemptEvent, so that the RmAppImpl can check the time of the failure. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2542) yarn application -status appId throws NPE when retrieving the app from the timelineserver
[ https://issues.apache.org/jira/browse/YARN-2542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131998#comment-14131998 ] Hadoop QA commented on YARN-2542: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668428/YARN-2542.4.patch against trunk revision 3122daa. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4935//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4935//console This message is automatically generated. yarn application -status appId throws NPE when retrieving the app from the timelineserver - Key: YARN-2542 URL: https://issues.apache.org/jira/browse/YARN-2542 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-2542.1.patch, YARN-2542.2.patch, YARN-2542.3.patch, YARN-2542.4.patch yarn application -status appId throws NPE when retrieving the app from the timelineserver. It's broken by YARN-415. When app is finished, there's no usageReport. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2468) Log handling for LRS
[ https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14132012#comment-14132012 ] Hadoop QA commented on YARN-2468: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668421/YARN-2468.2.patch against trunk revision 3122daa. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 4 warning messages. See https://builds.apache.org/job/PreCommit-YARN-Build/4934//artifact/trunk/patchprocess/diffJavadocWarnings.txt for details. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4934//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/4934//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4934//console This message is automatically generated. Log handling for LRS Key: YARN-2468 URL: https://issues.apache.org/jira/browse/YARN-2468 Project: Hadoop YARN Issue Type: Sub-task Components: log-aggregation, nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2468.1.patch, YARN-2468.2.patch Currently, when application is finished, NM will start to do the log aggregation. But for Long running service applications, this is not ideal. The problems we have are: 1) LRS applications are expected to run for a long time (weeks, months). 2) Currently, all the container logs (from one NM) will be written into a single file. The files could become larger and larger. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-611) Add an AM retry count reset window to YARN RM
[ https://issues.apache.org/jira/browse/YARN-611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14132015#comment-14132015 ] Hadoop QA commented on YARN-611: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668436/YARN-611.10.patch against trunk revision 3122daa. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 8 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4936//console This message is automatically generated. Add an AM retry count reset window to YARN RM - Key: YARN-611 URL: https://issues.apache.org/jira/browse/YARN-611 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.3-alpha Reporter: Chris Riccomini Assignee: Xuan Gong Attachments: YARN-611.1.patch, YARN-611.10.patch, YARN-611.2.patch, YARN-611.3.patch, YARN-611.4.patch, YARN-611.4.rebase.patch, YARN-611.5.patch, YARN-611.6.patch, YARN-611.7.patch, YARN-611.8.patch, YARN-611.9.patch, YARN-611.9.rebase.patch YARN currently has the following config: yarn.resourcemanager.am.max-retries This config defaults to 2, and defines how many times to retry a failed AM before failing the whole YARN job. YARN counts an AM as failed if the node that it was running on dies (the NM will timeout, which counts as a failure for the AM), or if the AM dies. This configuration is insufficient for long running (or infinitely running) YARN jobs, since the machine (or NM) that the AM is running on will eventually need to be restarted (or the machine/NM will fail). In such an event, the AM has not done anything wrong, but this is counted as a failure by the RM. Since the retry count for the AM is never reset, eventually, at some point, the number of machine/NM failures will result in the AM failure count going above the configured value for yarn.resourcemanager.am.max-retries. Once this happens, the RM will mark the job as failed, and shut it down. This behavior is not ideal. I propose that we add a second configuration: yarn.resourcemanager.am.retry-count-window-ms This configuration would define a window of time that would define when an AM is well behaved, and it's safe to reset its failure count back to zero. Every time an AM fails the RmAppImpl would check the last time that the AM failed. If the last failure was less than retry-count-window-ms ago, and the new failure count is max-retries, then the job should fail. If the AM has never failed, the retry count is max-retries, or if the last failure was OUTSIDE the retry-count-window-ms, then the job should be restarted. Additionally, if the last failure was outside the retry-count-window-ms, then the failure count should be set back to 0. This would give developers a way to have well-behaved AMs run forever, while still failing mis-behaving AMs after a short period of time. I think the work to be done here is to change the RmAppImpl to actually look at app.attempts, and see if there have been more than max-retries failures in the last retry-count-window-ms milliseconds. If there have, then the job should fail, if not, then the job should go forward. Additionally, we might also need to add an endTime in either RMAppAttemptImpl or RMAppFailedAttemptEvent, so that the RmAppImpl can check the time of the failure. Thoughts? 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2548) Find a more scalable way to handle logs for long running service
Xuan Gong created YARN-2548: --- Summary: Find a more scalable way to handle logs for long running service Key: YARN-2548 URL: https://issues.apache.org/jira/browse/YARN-2548 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong After YARN-2468, the container logs will be aggregated separately based on time. This will increase the total number of log files. It is fine for a small cluster, but for a larger cluster it will make the too-many-files problem even worse. We need to find a more scalable way to handle those logs. Aggregating all container logs into a single file is an option, but we need a different format, other than TFile (which does not support append), to do it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2032) Implement a scalable, available TimelineStore using HBase
[ https://issues.apache.org/jira/browse/YARN-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14132031#comment-14132031 ] stack commented on YARN-2032: - What do you need of HBase, lads? Our next release undoes our dependency on HTTPServer (the coming 1.0; a 0.99.0 developer release is imminent). If you want us to change our sync method call, np, just say; now would be a good time to do it before 1.0 goes out. We are also well-practiced at poking around with reflection, looking for whatever the method that does HDFS sync'ing is called (smile). Implement a scalable, available TimelineStore using HBase - Key: YARN-2032 URL: https://issues.apache.org/jira/browse/YARN-2032 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Li Lu Attachments: YARN-2032-091114.patch, YARN-2032-branch-2-1.patch, YARN-2032-branch2-2.patch As discussed on YARN-1530, we should pursue implementing a scalable, available Timeline store using HBase. One goal is to reuse most of the code from the levelDB Based store - YARN-1635. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
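The reflection trick stack mentions amounts to probing the concrete output-stream class for whichever flush/sync method the running Hadoop version exposes; a rough sketch (hypothetical, not HBase's actual code, with candidate method names assumed) is:
{code}
import java.io.OutputStream;
import java.lang.reflect.Method;

/** Hypothetical sketch of probing for an HDFS sync/flush method by name. */
public class SyncMethodProbe {

  public static Method findSyncMethod(OutputStream out) {
    // candidate names are assumptions: newer Hadoop exposes hflush(), older releases sync()
    for (String name : new String[] {"hflush", "sync"}) {
      try {
        return out.getClass().getMethod(name);   // no-arg public method on the stream class
      } catch (NoSuchMethodException ignored) {
        // fall through and try the next candidate
      }
    }
    return null;  // caller falls back to a plain flush() if nothing better is found
  }
}
{code}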