[jira] [Commented] (YARN-2312) Marking ContainerId#getId as deprecated
[ https://issues.apache.org/jira/browse/YARN-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163120#comment-14163120 ] Tsuyoshi OZAWA commented on YARN-2312: -- The test failures are not related - the TestFairScheduler failure is reported in YARN-2252 and the TestPipeApplication failure is reported in YARN-6115. [~jianhe], [~jlowe], could you review the latest patch? > Marking ContainerId#getId as deprecated > --- > > Key: YARN-2312 > URL: https://issues.apache.org/jira/browse/YARN-2312 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Tsuyoshi OZAWA >Assignee: Tsuyoshi OZAWA > Attachments: YARN-2312-wip.patch, YARN-2312.1.patch, > YARN-2312.2-2.patch, YARN-2312.2-3.patch, YARN-2312.2.patch, > YARN-2312.4.patch, YARN-2312.5.patch, YARN-2312.6.patch > > > After YARN-2229, {{ContainerId#getId}} only returns a partial value of the container id: the > sequence number without the epoch. We should > mark {{ContainerId#getId}} as deprecated and use > {{ContainerId#getContainerId}} instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
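For readers following the change above, here is a minimal usage sketch of the intended migration, assuming only the {{getContainerId()}} accessor introduced by YARN-2229 (illustrative, not the patch itself):
{code}
import org.apache.hadoop.yarn.api.records.ContainerId;

public class ContainerIdMigrationSketch {

  // Preferred: the 64-bit id added by YARN-2229, which also encodes the epoch,
  // so it stays unique across RM restarts.
  static long fullId(ContainerId id) {
    return id.getContainerId();
  }

  // To be deprecated by this patch: only the sequence number, which can repeat
  // once the epoch is bumped.
  static int sequenceNumberOnly(ContainerId id) {
    return id.getId();
  }
}
{code}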
[jira] [Commented] (YARN-2252) Intermittent failure of TestFairScheduler.testContinuousScheduling
[ https://issues.apache.org/jira/browse/YARN-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163118#comment-14163118 ] Tsuyoshi OZAWA commented on YARN-2252: -- Hi, this test failure was found on trunk - you can find it [here|https://issues.apache.org/jira/browse/YARN-2312?focusedCommentId=14161902&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14161902]. The log is as follows: {code} Tests run: 79, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.698 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler testContinuousScheduling(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler) Time elapsed: 0.582 sec <<< FAILURE! java.lang.AssertionError: expected:<2> but was:<1> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testContinuousScheduling(TestFairScheduler.java:3372) {code} Can I reopen this issue? > Intermittent failure of TestFairScheduler.testContinuousScheduling > -- > > Key: YARN-2252 > URL: https://issues.apache.org/jira/browse/YARN-2252 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: trunk-win >Reporter: Ratandeep Ratti > Labels: hadoop2, scheduler, yarn > Fix For: 2.6.0 > > Attachments: YARN-2252-1.patch, yarn-2252-2.patch > > > This test-case is failing sporadically on my machine. I think I have a > plausible explanation for this. > It seems that when the Scheduler is being asked for resources, the resource > requests that are being constructed have no preference for the hosts (nodes). > The two mock hosts constructed both have a memory of 8192 MB. > The containers (resources) being requested each require 1024 MB of memory, > hence a single node can satisfy both resource requests for the > application. > At the end of the test-case it is asserted that the containers > (resource requests) run on different nodes, but since we haven't > specified any node preferences when requesting the resources, the > scheduler (at times) places both containers (requests) on the same node. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
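For context on the race described in the issue, a hedged sketch of the kind of request the test effectively makes - no host preference, so the scheduler is free to place both 1024 MB containers on the same 8192 MB node (illustrative only, not the actual test code):
{code}
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

public class AnyHostRequestSketch {

  static ResourceRequest twoContainersAnywhere() {
    // ResourceRequest.ANY ("*") expresses no node/rack preference, so both
    // containers may legally end up on a single node.
    return ResourceRequest.newInstance(
        Priority.newInstance(1),        // request priority
        ResourceRequest.ANY,            // any host
        Resource.newInstance(1024, 1),  // 1024 MB, 1 vcore per container
        2);                             // number of containers
  }
}
{code}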
[jira] [Commented] (YARN-2582) Log related CLI and Web UI changes for LRS
[ https://issues.apache.org/jira/browse/YARN-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163116#comment-14163116 ] Zhijie Shen commented on YARN-2582: --- Thanks for the patch, Xuan! Here're some comments and thoughts. 1. This may not be sufficient. For example, "tmp_\[timestamp\].log" will become a false positive case here. We need to check whether the file name ends with TMP_FILE_SUFFIX or not. {code} && !fileName.contains(LogAggregationUtils.TMP_FILE_SUFFIX)) { {code} {code} if (!thisNodeFile.getPath().getName() .contains(LogAggregationUtils.TMP_FILE_SUFFIX)) { {code} 2. I'm thinking about what the better way would be to handle the case where fetching one log file fails but the others succeed. Thoughts? {code} AggregatedLogFormat.LogReader reader = null; try { reader = new AggregatedLogFormat.LogReader(getConf(), thisNodeFile.getPath()); if (dumpAContainerLogs(containerId, reader, System.out) > -1) { foundContainerLogs = true; } } finally { if (reader != null) { reader.close(); } } {code} 3. In fact, in the case of either a single container or all containers, when the container log is not found, it is possible that 1) the container log hasn't been aggregated yet or 2) log aggregation is not enabled. In the former case there is also a third possibility: the user is providing an invalid nodeId. It would be good to unify the warning messages. {code} if (!foundContainerLogs) { System.out.println("Logs for container " + containerId + " are not present in this log-file."); return -1; } {code} {code} if (! foundAnyLogs) { System.out.println("Logs not available at " + remoteAppLogDir.toString()); System.out .println("Log aggregation has not completed or is not enabled."); return -1; } {code} 4. It seems that we don't have unit tests to cover the successful path of dumping the container log(s) currently. I'm not sure if it is going to be a big addition. If it is, let's handle it separately. There are two general issues: 1. Currently we don't have a web page in the RM web app to show the aggregated logs. On the other hand, an LRS is supposed to always be in the running state, such that the timeline server will not present it. Therefore, we still don't have a web entry point to view the aggregated logs. 2. For an LRS, the latest logs are not yet aggregated into HDFS; they are still in the local dir of the NM. When a user wants to check the logs of an LRS, it doesn't make sense that he can only view yesterday's logs because today's are still not uploaded to HDFS. It would be good to have a combined view of both the latest logs in the NM's local dir and the aggregated logs on HDFS. Thoughts? > Log related CLI and Web UI changes for LRS > -- > > Key: YARN-2582 > URL: https://issues.apache.org/jira/browse/YARN-2582 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-2582.1.patch > > > After YARN-2468, we have changed the log layout to support log aggregation for > Long Running Services. The log CLI and related Web UI should be modified > accordingly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
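Regarding comment 1 above, a minimal sketch of the suggested suffix check, assuming the {{LogAggregationUtils.TMP_FILE_SUFFIX}} constant quoted in the comment (not the actual patch):
{code}
import org.apache.hadoop.yarn.logaggregation.LogAggregationUtils;

public class TmpLogFileCheckSketch {

  // endsWith() avoids the false positive for names like "tmp_[timestamp].log"
  // that merely contain the suffix somewhere other than the end.
  static boolean isCompletedAggregatedLog(String fileName) {
    return !fileName.endsWith(LogAggregationUtils.TMP_FILE_SUFFIX);
  }
}
{code}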
[jira] [Commented] (YARN-2180) In-memory backing store for cache manager
[ https://issues.apache.org/jira/browse/YARN-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163088#comment-14163088 ] Hadoop QA commented on YARN-2180: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673521/YARN-2180-trunk-v7.patch against trunk revision 1efd9c9. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5321//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5321//console This message is automatically generated. > In-memory backing store for cache manager > - > > Key: YARN-2180 > URL: https://issues.apache.org/jira/browse/YARN-2180 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chris Trezzo >Assignee: Chris Trezzo > Attachments: YARN-2180-trunk-v1.patch, YARN-2180-trunk-v2.patch, > YARN-2180-trunk-v3.patch, YARN-2180-trunk-v4.patch, YARN-2180-trunk-v5.patch, > YARN-2180-trunk-v6.patch, YARN-2180-trunk-v7.patch > > > Implement an in-memory backing store for the cache manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2655) AllocatedGB/AvailableGB in nodemanager JMX showing only integer values
[ https://issues.apache.org/jira/browse/YARN-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishan Shetty updated YARN-2655: Description: AllocatedGB/AvailableGB in nodemanager JMX showing only integer values Screenshot attached was:AllocatedGB/AvailableGB in nodemanager JMX showing only integer values > AllocatedGB/AvailableGB in nodemanager JMX showing only integer values > -- > > Key: YARN-2655 > URL: https://issues.apache.org/jira/browse/YARN-2655 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.4.1 >Reporter: Nishan Shetty >Priority: Minor > Attachments: screenshot-1.png, screenshot-2.png > > > AllocatedGB/AvailableGB in nodemanager JMX showing only integer values > Screenshot attached -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2655) AllocatedGB/AvailableGB in nodemanager JMX showing only integer values
[ https://issues.apache.org/jira/browse/YARN-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishan Shetty updated YARN-2655: Attachment: screenshot-2.png > AllocatedGB/AvailableGB in nodemanager JMX showing only integer values > -- > > Key: YARN-2655 > URL: https://issues.apache.org/jira/browse/YARN-2655 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.4.1 >Reporter: Nishan Shetty >Priority: Minor > Attachments: screenshot-1.png, screenshot-2.png > > > AllocatedGB/AvailableGB in nodemanager JMX showing only integer values -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2655) AllocatedGB/AvailableGB in nodemanager JMX showing only integer values
[ https://issues.apache.org/jira/browse/YARN-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishan Shetty updated YARN-2655: Attachment: screenshot-1.png > AllocatedGB/AvailableGB in nodemanager JMX showing only integer values > -- > > Key: YARN-2655 > URL: https://issues.apache.org/jira/browse/YARN-2655 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.4.1 >Reporter: Nishan Shetty >Priority: Minor > Attachments: screenshot-1.png > > > AllocatedGB/AvailableGB in nodemanager JMX showing only integer values -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2655) AllocatedGB/AvailableGB in nodemanager JMX showing only integer values
Nishan Shetty created YARN-2655: --- Summary: AllocatedGB/AvailableGB in nodemanager JMX showing only integer values Key: YARN-2655 URL: https://issues.apache.org/jira/browse/YARN-2655 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.4.1 Reporter: Nishan Shetty Priority: Minor AllocatedGB/AvailableGB in nodemanager JMX showing only integer values -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2579) Both RM's state is Active , but 1 RM is not really active.
[ https://issues.apache.org/jira/browse/YARN-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163057#comment-14163057 ] Hadoop QA commented on YARN-2579: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673338/YARN-2579.patch against trunk revision 1efd9c9. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5320//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5320//console This message is automatically generated. > Both RM's state is Active , but 1 RM is not really active. > -- > > Key: YARN-2579 > URL: https://issues.apache.org/jira/browse/YARN-2579 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.1 >Reporter: Rohith >Assignee: Rohith > Attachments: YARN-2579.patch > > > I encountered a situation where both RMs' web pages were accessible and > their state displayed as Active, but one of the RMs' ActiveServices was > stopped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2180) In-memory backing store for cache manager
[ https://issues.apache.org/jira/browse/YARN-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-2180: --- Attachment: YARN-2180-trunk-v7.patch [~kasha] Attached v7. Changes: 1. Moved AppChecker parameter to SCM instead of store and changed variable names. 2. Moved createAppCheckerService method to a static method in SharedCacheManager. Originally I moved it to the shared cache util class, but that is in yarn-server-common which required moving AppChecker as well. I only see AppChecker being used in the context of the SCM so I kept it where it was. Let me know if you would still like me to move it to common and make the create method part of the util class. 3. Renamed SharedCacheStructureUtil to SharedCacheUtil. 4. Filed YARN-2654 subtask to revisit config names. > In-memory backing store for cache manager > - > > Key: YARN-2180 > URL: https://issues.apache.org/jira/browse/YARN-2180 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chris Trezzo >Assignee: Chris Trezzo > Attachments: YARN-2180-trunk-v1.patch, YARN-2180-trunk-v2.patch, > YARN-2180-trunk-v3.patch, YARN-2180-trunk-v4.patch, YARN-2180-trunk-v5.patch, > YARN-2180-trunk-v6.patch, YARN-2180-trunk-v7.patch > > > Implement an in-memory backing store for the cache manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2654) Revisit all shared cache config parameters to ensure quality names
Chris Trezzo created YARN-2654: -- Summary: Revisit all shared cache config parameters to ensure quality names Key: YARN-2654 URL: https://issues.apache.org/jira/browse/YARN-2654 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Priority: Blocker Revisit all the shared cache config parameters in YarnConfiguration and yarn-default.xml to ensure quality names. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade
[ https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162917#comment-14162917 ] Junping Du commented on YARN-2331: -- Thanks [~jlowe] for the patch. One thing I want to confirm here: after this patch, if we set "yarn.nodemanager.recovery.enabled" to true but set "yarn.nodemanager.recovery.supervised" to false, containers will still keep running if we kill the NM daemon with "kill -9", but stopping it through "yarn-daemon.sh stop nodemanager" will kill the running containers. Is that correct? > Distinguish shutdown during supervision vs. shutdown for rolling upgrade > > > Key: YARN-2331 > URL: https://issues.apache.org/jira/browse/YARN-2331 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Jason Lowe > Attachments: YARN-2331.patch, YARN-2331v2.patch > > > When the NM is shutting down with restart support enabled there are scenarios > we'd like to distinguish and behave accordingly: > # The NM is running under supervision. In that case containers should be > preserved so the automatic restart can recover them. > # The NM is not running under supervision and a rolling upgrade is not being > performed. In that case the shutdown should kill all containers since it is > unlikely the NM will be restarted in a timely manner to recover them. > # The NM is not running under supervision and a rolling upgrade is being > performed. In that case the shutdown should not kill all containers since a > restart is imminent due to the rolling upgrade and the containers will be > recovered. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
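For reference, the configuration combination being asked about, as a minimal sketch; the property names are taken from the comment above, and whether dedicated constants exist for them depends on the patch:
{code}
import org.apache.hadoop.conf.Configuration;

public class NmRecoveryConfSketch {

  static Configuration recoveryOnUnsupervised() {
    Configuration conf = new Configuration();
    // keep NM state so containers can be recovered after an unexpected death (e.g. kill -9)
    conf.setBoolean("yarn.nodemanager.recovery.enabled", true);
    // NM is not running under supervision, so a graceful stop kills running containers
    conf.setBoolean("yarn.nodemanager.recovery.supervised", false);
    return conf;
  }
}
{code}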
[jira] [Commented] (YARN-913) Add a way to register long-lived services in a YARN cluster
[ https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162908#comment-14162908 ] Hadoop QA commented on YARN-913: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673478/YARN-913-021.patch against trunk revision 9b8a35a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 37 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1268 javac compiler warnings (more than the trunk's current 1267 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5319//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5319//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5319//console This message is automatically generated. > Add a way to register long-lived services in a YARN cluster > --- > > Key: YARN-913 > URL: https://issues.apache.org/jira/browse/YARN-913 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, resourcemanager >Affects Versions: 2.5.0, 2.4.1 >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, > 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, > YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, > YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, > YARN-913-007.patch, YARN-913-008.patch, YARN-913-009.patch, > YARN-913-010.patch, YARN-913-011.patch, YARN-913-012.patch, > YARN-913-013.patch, YARN-913-014.patch, YARN-913-015.patch, > YARN-913-016.patch, YARN-913-017.patch, YARN-913-018.patch, > YARN-913-019.patch, YARN-913-020.patch, YARN-913-021.patch, yarnregistry.pdf, > yarnregistry.pdf, yarnregistry.pdf, yarnregistry.tla > > > In a YARN cluster you can't predict where services will come up -or on what > ports. The services need to work those things out as they come up and then > publish them somewhere. > Applications need to be able to find the service instance they are to bond to > -and not any others in the cluster. > Some kind of service registry -in the RM, in ZK, could do this. If the RM > held the write access to the ZK nodes, it would be more secure than having > apps register with ZK themselves. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM fail over
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162875#comment-14162875 ] Karthik Kambatla commented on YARN-1879: Thanks for the updates, Tsuyoshi. Sorry to vacillate on this JIRA. Chatted with Jian offline about marking the methods in question Idempotent vs AtMostOnce. I believe we agreed on "Methods that identify and ignore duplicate requests as duplicate should be AtMostOnce, and those that repeat the method without any adverse side-effects should be Idempotent. Retries on RM restart/failover are the same for Idempotent and AtMostOnce methods." Following that, registerApplicationMaster should be Idempotent, allocate and finishApplicationMaster should be AtMostOnce. Given all the methods handle duplicate requests, the retry-cache is not necessary but could be an optimization we can pursue/investigate on another JIRA. Review comments on the patch: # Nit: Not added in this patch, can we rename TestApplicationMasterServiceProtocolOnHA#initiate to initialize()? # IIUC, the tests (ProtocolHATestBase) induce a failover while processing one of these requests. We should also probably add a test that makes duplicate requests to the same/different RM and verify the behavior is as expected. Correct me if existing tests already do this. > Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM > fail over > > > Key: YARN-1879 > URL: https://issues.apache.org/jira/browse/YARN-1879 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Tsuyoshi OZAWA >Priority: Critical > Attachments: YARN-1879.1.patch, YARN-1879.1.patch, > YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, > YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, > YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.19.patch, > YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.20.patch, > YARN-1879.21.patch, YARN-1879.22.patch, YARN-1879.23.patch, > YARN-1879.23.patch, YARN-1879.3.patch, YARN-1879.4.patch, YARN-1879.5.patch, > YARN-1879.6.patch, YARN-1879.7.patch, YARN-1879.8.patch, YARN-1879.9.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
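A hedged sketch of the annotation scheme agreed on above, using Hadoop's retry annotations; this is illustrative only - the real method signatures live in {{ApplicationMasterProtocol}} in hadoop-yarn-api:
{code}
import java.io.IOException;

import org.apache.hadoop.io.retry.AtMostOnce;
import org.apache.hadoop.io.retry.Idempotent;
import org.apache.hadoop.yarn.api.protocolrecords.*;
import org.apache.hadoop.yarn.exceptions.YarnException;

public interface AmRmProtocolAnnotationSketch {

  // Repeating the call has no adverse side effects.
  @Idempotent
  RegisterApplicationMasterResponse registerApplicationMaster(
      RegisterApplicationMasterRequest request) throws YarnException, IOException;

  // Duplicate requests are identified and ignored as duplicates.
  @AtMostOnce
  AllocateResponse allocate(AllocateRequest request)
      throws YarnException, IOException;

  @AtMostOnce
  FinishApplicationMasterResponse finishApplicationMaster(
      FinishApplicationMasterRequest request) throws YarnException, IOException;
}
{code}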
[jira] [Commented] (YARN-2629) Make distributed shell use the domain-based timeline ACLs
[ https://issues.apache.org/jira/browse/YARN-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162865#comment-14162865 ] Hadoop QA commented on YARN-2629: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673479/YARN-2629.4.patch against trunk revision 9b8a35a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5318//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5318//console This message is automatically generated. > Make distributed shell use the domain-based timeline ACLs > - > > Key: YARN-2629 > URL: https://issues.apache.org/jira/browse/YARN-2629 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: YARN-2629.1.patch, YARN-2629.2.patch, YARN-2629.3.patch, > YARN-2629.4.patch > > > For demonstration the usage of this feature (YARN-2102), it's good to make > the distributed shell create the domain, and post its timeline entities into > this private space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-913) Add a way to register long-lived services in a YARN cluster
[ https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162862#comment-14162862 ] Hadoop QA commented on YARN-913: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673460/YARN-913-020.patch against trunk revision 9b8a35a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 37 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1268 javac compiler warnings (more than the trunk's current 1267 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The test build failed in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5317//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5317//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5317//console This message is automatically generated. > Add a way to register long-lived services in a YARN cluster > --- > > Key: YARN-913 > URL: https://issues.apache.org/jira/browse/YARN-913 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, resourcemanager >Affects Versions: 2.5.0, 2.4.1 >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, > 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, > YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, > YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, > YARN-913-007.patch, YARN-913-008.patch, YARN-913-009.patch, > YARN-913-010.patch, YARN-913-011.patch, YARN-913-012.patch, > YARN-913-013.patch, YARN-913-014.patch, YARN-913-015.patch, > YARN-913-016.patch, YARN-913-017.patch, YARN-913-018.patch, > YARN-913-019.patch, YARN-913-020.patch, YARN-913-021.patch, yarnregistry.pdf, > yarnregistry.pdf, yarnregistry.pdf, yarnregistry.tla > > > In a YARN cluster you can't predict where services will come up -or on what > ports. The services need to work those things out as they come up and then > publish them somewhere. > Applications need to be able to find the service instance they are to bond to > -and not any others in the cluster. > Some kind of service registry -in the RM, in ZK, could do this. If the RM > held the write access to the ZK nodes, it would be more secure than having > apps register with ZK themselves. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2653) Support an AM liveness mechanism that works for both sync/async AM-RM clients
sidharta seethana created YARN-2653: --- Summary: Support an AM liveness mechanism that works for both sync/async AM-RM clients Key: YARN-2653 URL: https://issues.apache.org/jira/browse/YARN-2653 Project: Hadoop YARN Issue Type: Improvement Components: client, resourcemanager Reporter: sidharta seethana Priority: Minor Currently, the “heartbeat” mechanism is only supported in the async client ( AMRMClientAsyncImpl ). The reason for this seems to be that liveness monitoring is currently implemented based on periodic (possibly empty) allocation requests - which may return container allocation responses for earlier allocation requests. This mechanism only works for the async client, where allocation responses can be queued, and is not applicable to the synchronous/blocking client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2566) IOException happen in startLocalizer of DefaultContainerExecutor due to not enough disk space for the first localDir.
[ https://issues.apache.org/jira/browse/YARN-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162835#comment-14162835 ] Varun Vasudev commented on YARN-2566: - Unfortunately the defaults still allow use of the full disk. We should probably just change the default to use a maximum of 90% of the disk space. > IOException happen in startLocalizer of DefaultContainerExecutor due to not > enough disk space for the first localDir. > - > > Key: YARN-2566 > URL: https://issues.apache.org/jira/browse/YARN-2566 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2566.000.patch, YARN-2566.001.patch, > YARN-2566.002.patch, YARN-2566.003.patch > > > startLocalizer in DefaultContainerExecutor will only use the first localDir > to copy the token file. If the copy fails for the first localDir due to not > enough disk space, the localization fails even > though there is plenty of disk space in the other localDirs. We see the following error > for this case: > {code} > 2014-09-13 23:33:25,171 WARN > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Unable to > create app directory > /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 > java.io.IOException: mkdir of > /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 failed > at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1062) > at > org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:157) > at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:721) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:717) > at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) > at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:717) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createDir(DefaultContainerExecutor.java:426) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createAppDirs(DefaultContainerExecutor.java:522) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:94) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:987) > 2014-09-13 23:33:25,185 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Localizer failed > java.io.FileNotFoundException: File > file:/hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 > does not exist > at > org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:511) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:724) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:501) > at > org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:111) > at > org.apache.hadoop.fs.DelegateToFileSystem.createInternal(DelegateToFileSystem.java:76) > at > org.apache.hadoop.fs.ChecksumFs$ChecksumFSOutputSummer.(ChecksumFs.java:344) > at org.apache.hadoop.fs.ChecksumFs.createInternal(ChecksumFs.java:390) > at > org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:577) > at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:677) > at 
org.apache.hadoop.fs.FileContext$3.next(FileContext.java:673) > at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) > at org.apache.hadoop.fs.FileContext.create(FileContext.java:673) > at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:2021) > at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:1963) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:102) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:987) > 2014-09-13 23:33:25,186 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: > Container container_1410663092546_0004_01_01 transitioned from > LOCALIZING to LOCALIZATION_FAILED > 2014-09-13 23:33:25,187 WARN > org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=cloudera > OPERATION=Container Finished - Failed TARGET=Cont
[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM fail over
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162823#comment-14162823 ] Jian He commented on YARN-1879: --- Thanks Tsuyoshi! One more thing: when failover happens on unregister, if RM1 has already saved the app's final state into the state-store and the AM then tries to unregister with RM2, the AM will get ApplicationDoesNotExistInCacheException. We should probably move the following check in the finishApplicationMaster call upfront so that ApplicationDoesNotExistInCacheException is not thrown for already-completed apps. {code} if (rmApp.isAppFinalStateStored()) { return FinishApplicationMasterResponse.newInstance(true); } {code} [~kasha], do you have more comments? > Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM > fail over > > > Key: YARN-1879 > URL: https://issues.apache.org/jira/browse/YARN-1879 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Tsuyoshi OZAWA >Priority: Critical > Attachments: YARN-1879.1.patch, YARN-1879.1.patch, > YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, > YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, > YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.19.patch, > YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.20.patch, > YARN-1879.21.patch, YARN-1879.22.patch, YARN-1879.23.patch, > YARN-1879.23.patch, YARN-1879.3.patch, YARN-1879.4.patch, YARN-1879.5.patch, > YARN-1879.6.patch, YARN-1879.7.patch, YARN-1879.8.patch, YARN-1879.9.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2517) Implement TimelineClientAsync
[ https://issues.apache.org/jira/browse/YARN-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162820#comment-14162820 ] Zhijie Shen commented on YARN-2517: --- bq. Clearly, async calls need call-back handlers just for errors. As of today, there are no APIs that really need to send back results (not error) asynchronously. I realize that another requirement of TimelineClient is to wrap the read APIs to facilitate Java app developers. These APIs are going to return results. > Implement TimelineClientAsync > - > > Key: YARN-2517 > URL: https://issues.apache.org/jira/browse/YARN-2517 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Tsuyoshi OZAWA > Attachments: YARN-2517.1.patch > > > In some scenarios, we'd like to put timeline entities in another thread so as not to > block the current one. > It's good to have a TimelineClientAsync like AMRMClientAsync and > NMClientAsync. It can buffer entities, put them in a separate thread, and > have callbacks to handle the responses. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
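To make the idea concrete, a minimal sketch of such an async wrapper: entities are queued onto a worker thread and the response or error is delivered through a callback. The class and callback names here are hypothetical; only {{TimelineClient#putEntities}} is an existing API.
{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;
import org.apache.hadoop.yarn.api.records.timeline.TimelinePutResponse;
import org.apache.hadoop.yarn.client.api.TimelineClient;

public class TimelineClientAsyncSketch {

  public interface PutCallback {
    void onResponse(TimelinePutResponse response);
    void onError(Exception e);
  }

  private final TimelineClient client;
  private final ExecutorService executor = Executors.newSingleThreadExecutor();

  public TimelineClientAsyncSketch(TimelineClient client) {
    this.client = client;
  }

  public void putEntitiesAsync(final PutCallback callback,
      final TimelineEntity... entities) {
    executor.submit(new Runnable() {
      @Override
      public void run() {
        try {
          // the blocking put happens off the caller's thread
          callback.onResponse(client.putEntities(entities));
        } catch (Exception e) {
          // errors are reported through the callback instead of to the caller
          callback.onError(e);
        }
      }
    });
  }
}
{code}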
[jira] [Commented] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users
[ https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162816#comment-14162816 ] Zhijie Shen commented on YARN-2423: --- Update the title and description, as it's good to do the work for domain APIs too. > TimelineClient should wrap all GET APIs to facilitate Java users > > > Key: YARN-2423 > URL: https://issues.apache.org/jira/browse/YARN-2423 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen > > TimelineClient provides the Java method to put timeline entities. It's also > good to wrap over all GET APIs (both entity and domain), and deserialize the > json response into Java POJO objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users
[ https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2423: -- Description: TimelineClient provides the Java method to put timeline entities. It's also good to wrap over all GET APIs (both entity and domain), and deserialize the json response into Java POJO objects. (was: TimelineClient provides the Java method to put timeline entities. It's also good to wrap over three GET APIs, and deserialize the json response into Java POJO objects.) Summary: TimelineClient should wrap all GET APIs to facilitate Java users (was: TimelineClient should wrap the three GET APIs to facilitate Java users) > TimelineClient should wrap all GET APIs to facilitate Java users > > > Key: YARN-2423 > URL: https://issues.apache.org/jira/browse/YARN-2423 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen > > TimelineClient provides the Java method to put timeline entities. It's also > good to wrap over all GET APIs (both entity and domain), and deserialize the > json response into Java POJO objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2629) Make distributed shell use the domain-based timeline ACLs
[ https://issues.apache.org/jira/browse/YARN-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162812#comment-14162812 ] Li Lu commented on YARN-2629: - OK, new patch LGTM. Maybe some committers would like to check it again and commit it? > Make distributed shell use the domain-based timeline ACLs > - > > Key: YARN-2629 > URL: https://issues.apache.org/jira/browse/YARN-2629 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: YARN-2629.1.patch, YARN-2629.2.patch, YARN-2629.3.patch, > YARN-2629.4.patch > > > For demonstration the usage of this feature (YARN-2102), it's good to make > the distributed shell create the domain, and post its timeline entities into > this private space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-913) Add a way to register long-lived services in a YARN cluster
[ https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-913: Attachment: YARN-913-021.patch the -021 patch is the -020 patch with all whitespace stripped, so that patch -p0 will also apply it cleanly > Add a way to register long-lived services in a YARN cluster > --- > > Key: YARN-913 > URL: https://issues.apache.org/jira/browse/YARN-913 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, resourcemanager >Affects Versions: 2.5.0, 2.4.1 >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, > 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, > YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, > YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, > YARN-913-007.patch, YARN-913-008.patch, YARN-913-009.patch, > YARN-913-010.patch, YARN-913-011.patch, YARN-913-012.patch, > YARN-913-013.patch, YARN-913-014.patch, YARN-913-015.patch, > YARN-913-016.patch, YARN-913-017.patch, YARN-913-018.patch, > YARN-913-019.patch, YARN-913-020.patch, YARN-913-021.patch, yarnregistry.pdf, > yarnregistry.pdf, yarnregistry.pdf, yarnregistry.tla > > > In a YARN cluster you can't predict where services will come up -or on what > ports. The services need to work those things out as they come up and then > publish them somewhere. > Applications need to be able to find the service instance they are to bond to > -and not any others in the cluster. > Some kind of service registry -in the RM, in ZK, could do this. If the RM > held the write access to the ZK nodes, it would be more secure than having > apps register with ZK themselves. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2629) Make distributed shell use the domain-based timeline ACLs
[ https://issues.apache.org/jira/browse/YARN-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2629: -- Attachment: YARN-2629.4.patch > Make distributed shell use the domain-based timeline ACLs > - > > Key: YARN-2629 > URL: https://issues.apache.org/jira/browse/YARN-2629 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: YARN-2629.1.patch, YARN-2629.2.patch, YARN-2629.3.patch, > YARN-2629.4.patch > > > For demonstration the usage of this feature (YARN-2102), it's good to make > the distributed shell create the domain, and post its timeline entities into > this private space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2629) Make distributed shell use the domain-based timeline ACLs
[ https://issues.apache.org/jira/browse/YARN-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162804#comment-14162804 ] Zhijie Shen commented on YARN-2629: --- bq. 1. I noticed that you've changed a few method signatures, and removed the "throws IOException, YarnException" clause from the signatures. Meanwhile, in the method I noticed that you're catching all types of exceptions in one catch statement. I agree that this change saves some repeated lines of try-catch statements in various call sites, but is it OK to catch all exceptions internally? Previously, the exception was handled at the caller of publish(), but I moved it into publish() to simplify the change, such that there's no need to throw the exception any more. Usually, it's good practice to handle each potential exception explicitly. Here, since I intentionally don't want to do such fine-grained handling, I chose to log whatever the exception is. bq. 2. Maybe we want to add one more test with no arguments on domain/view_acl/modify_acl? Timeline domain is a newly added feature, and maybe we'd like to make sure it would not break any existing distributed shell usages (esp. in some scripts)? Added one. bq. This variable name looks unclear. Maybe we want to rename it so that people can easily connect it with timeline domains? It's explained in the usage of the option and has been commented: {code} // Flag to indicate whether to create the domain of the given ID private boolean toCreate = false; {code} but let me elaborate on this var in the code. > Make distributed shell use the domain-based timeline ACLs > - > > Key: YARN-2629 > URL: https://issues.apache.org/jira/browse/YARN-2629 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: YARN-2629.1.patch, YARN-2629.2.patch, YARN-2629.3.patch > > > To demonstrate the usage of this feature (YARN-2102), it's good to make > the distributed shell create the domain, and post its timeline entities into > this private space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
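For clarity, the error-handling shape described above - publish() logs any failure itself so its callers need no try/catch - as a minimal sketch with illustrative names (not the exact patch):
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;
import org.apache.hadoop.yarn.client.api.TimelineClient;

public class TimelinePublishSketch {

  private static final Log LOG = LogFactory.getLog(TimelinePublishSketch.class);
  private final TimelineClient timelineClient;

  public TimelinePublishSketch(TimelineClient timelineClient) {
    this.timelineClient = timelineClient;
  }

  // Intentionally coarse-grained: a timeline failure should not fail the shell.
  void publish(TimelineEntity entity) {
    try {
      timelineClient.putEntities(entity);
    } catch (Exception e) {
      LOG.error("Failed to publish timeline entity " + entity.getEntityId(), e);
    }
  }
}
{code}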
[jira] [Updated] (YARN-2641) improve node decommission latency in RM.
[ https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2641: Description: improve node decommission latency in RM. Currently the node decommission only happened after RM received nodeHeartbeat from the Node Manager. The node heartbeat interval is configurable. The default value is 1 second. It will be better to do the decommission during RM Refresh(NodesListManager) instead of nodeHeartbeat(ResourceTrackerService). This can become a much more serious issue: after the RM is refreshed (refreshNodes), if the NM to be decommissioned is killed before it sends a heartbeat to the RM, the RMNode will never be decommissioned in the RM. The RMNode will only expire in the RM after "yarn.nm.liveness-monitor.expiry-interval-ms" (default value 10 minutes). was: improve node decommission latency in RM. Currently the node decommission only happened after RM received nodeHeartbeat from the Node Manager. The node heartbeat interval is configurable. The default value is 1 second. It will be better to do the decommission during RM Refresh(NodesListManager) instead of nodeHeartbeat(ResourceTrackerService). > improve node decommission latency in RM. > > > Key: YARN-2641 > URL: https://issues.apache.org/jira/browse/YARN-2641 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2641.000.patch, YARN-2641.001.patch > > > improve node decommission latency in RM. > Currently the node decommission only happened after RM received nodeHeartbeat > from the Node Manager. The node heartbeat interval is configurable. The > default value is 1 second. > It will be better to do the decommission during RM Refresh(NodesListManager) > instead of nodeHeartbeat(ResourceTrackerService). > This can become a much more serious issue: > after the RM is refreshed (refreshNodes), if the NM to be decommissioned is > killed before it sends a heartbeat to the RM, the RMNode will never be > decommissioned in the RM. The RMNode will only expire in the RM after > "yarn.nm.liveness-monitor.expiry-interval-ms" (default value 10 minutes). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2641) improve node decommission latency in RM.
[ https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162764#comment-14162764 ] Hadoop QA commented on YARN-2641: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673446/YARN-2641.002.patch against trunk revision 9b8a35a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5316//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5316//console This message is automatically generated. > improve node decommission latency in RM. > > > Key: YARN-2641 > URL: https://issues.apache.org/jira/browse/YARN-2641 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2641.000.patch, YARN-2641.001.patch > > > improve node decommission latency in RM. > Currently the node decommission only happened after RM received nodeHeartbeat > from the Node Manager. The node heartbeat interval is configurable. The > default value is 1 second. > It will be better to do the decommission during RM Refresh(NodesListManager) > instead of nodeHeartbeat(ResourceTrackerService). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2641) improve node decommission latency in RM.
[ https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162740#comment-14162740 ] zhihai xu commented on YARN-2641: - Hi [~djp], thanks for reviewing the patch. I removed the following RMNode decommission in nodeHeartbeat(ResourceTrackerService.java). {code} this.rmContext.getDispatcher().getEventHandler().handle( new RMNodeEvent(nodeId, RMNodeEventType.DECOMMISSION)); {code} I added RMNode decommission in refreshNodes(NodesListManager.java). Do you still see the decommission happening after the heartbeat back to the NM in the patch? I didn't have a unit test in my first patch (YARN-2641.000.patch). In my second patch (YARN-2641.001.patch), I changed the unit test in TestResourceTrackerService to verify that RMNodeEventType.DECOMMISSION is sent in {code}rm.getNodesListManager().refreshNodes(conf);{code} instead of {code}nodeHeartbeat = nm1.nodeHeartbeat(true); {code} > improve node decommission latency in RM. > > > Key: YARN-2641 > URL: https://issues.apache.org/jira/browse/YARN-2641 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2641.000.patch, YARN-2641.001.patch > > > improve node decommission latency in RM. > Currently the node decommission only happened after RM received nodeHeartbeat > from the Node Manager. The node heartbeat interval is configurable. The > default value is 1 second. > It will be better to do the decommission during RM Refresh(NodesListManager) > instead of nodeHeartbeat(ResourceTrackerService). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
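Based on the snippets quoted in the comment above, a hedged sketch of where the patch moves the decommission - NodesListManager dispatches DECOMMISSION during refreshNodes instead of waiting for the node's next heartbeat. The method shape is illustrative, not the exact patch:
{code}
// Hypothetical method-level sketch inside NodesListManager (not the actual patch)
public void refreshNodes(Configuration yarnConf) throws IOException, YarnException {
  // ... existing logic that re-reads the include/exclude host lists ...

  // New: decommission excluded nodes immediately instead of waiting for their
  // next heartbeat in ResourceTrackerService, removing the heartbeat-interval latency.
  for (NodeId nodeId : this.rmContext.getRMNodes().keySet()) {
    if (!isValidNode(nodeId.getHost())) {
      this.rmContext.getDispatcher().getEventHandler().handle(
          new RMNodeEvent(nodeId, RMNodeEventType.DECOMMISSION));
    }
  }
}
{code}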
[jira] [Updated] (YARN-913) Add a way to register long-lived services in a YARN cluster
[ https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-913: Attachment: YARN-913-020.patch patch -020; deleted unused operation/assignment in {{RegistryUtils.statChildren()}} > Add a way to register long-lived services in a YARN cluster > --- > > Key: YARN-913 > URL: https://issues.apache.org/jira/browse/YARN-913 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, resourcemanager >Affects Versions: 2.5.0, 2.4.1 >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, > 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, > YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, > YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, > YARN-913-007.patch, YARN-913-008.patch, YARN-913-009.patch, > YARN-913-010.patch, YARN-913-011.patch, YARN-913-012.patch, > YARN-913-013.patch, YARN-913-014.patch, YARN-913-015.patch, > YARN-913-016.patch, YARN-913-017.patch, YARN-913-018.patch, > YARN-913-019.patch, YARN-913-020.patch, yarnregistry.pdf, yarnregistry.pdf, > yarnregistry.pdf, yarnregistry.tla > > > In a YARN cluster you can't predict where services will come up -or on what > ports. The services need to work those things out as they come up and then > publish them somewhere. > Applications need to be able to find the service instance they are to bond to > -and not any others in the cluster. > Some kind of service registry -in the RM, in ZK, could do this. If the RM > held the write access to the ZK nodes, it would be more secure than having > apps register with ZK themselves. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2641) improve node decommission latency in RM.
[ https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2641: Attachment: (was: YARN-2641.002.patch) > improve node decommission latency in RM. > > > Key: YARN-2641 > URL: https://issues.apache.org/jira/browse/YARN-2641 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2641.000.patch, YARN-2641.001.patch > > > improve node decommission latency in RM. > Currently the node decommission only happened after RM received nodeHeartbeat > from the Node Manager. The node heartbeat interval is configurable. The > default value is 1 second. > It will be better to do the decommission during RM Refresh(NodesListManager) > instead of nodeHeartbeat(ResourceTrackerService). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-913) Add a way to register long-lived services in a YARN cluster
[ https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162683#comment-14162683 ] Hadoop QA commented on YARN-913: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673424/YARN-913-019.patch against trunk revision 30d56fd. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 37 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1268 javac compiler warnings (more than the trunk's current 1267 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5315//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5315//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-registry.html Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5315//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5315//console This message is automatically generated. > Add a way to register long-lived services in a YARN cluster > --- > > Key: YARN-913 > URL: https://issues.apache.org/jira/browse/YARN-913 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, resourcemanager >Affects Versions: 2.5.0, 2.4.1 >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, > 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, > YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, > YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, > YARN-913-007.patch, YARN-913-008.patch, YARN-913-009.patch, > YARN-913-010.patch, YARN-913-011.patch, YARN-913-012.patch, > YARN-913-013.patch, YARN-913-014.patch, YARN-913-015.patch, > YARN-913-016.patch, YARN-913-017.patch, YARN-913-018.patch, > YARN-913-019.patch, yarnregistry.pdf, yarnregistry.pdf, yarnregistry.pdf, > yarnregistry.tla > > > In a YARN cluster you can't predict where services will come up -or on what > ports. The services need to work those things out as they come up and then > publish them somewhere. > Applications need to be able to find the service instance they are to bond to > -and not any others in the cluster. > Some kind of service registry -in the RM, in ZK, could do this. If the RM > held the write access to the ZK nodes, it would be more secure than having > apps register with ZK themselves. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2629) Make distributed shell use the domain-based timeline ACLs
[ https://issues.apache.org/jira/browse/YARN-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162665#comment-14162665 ] Li Lu commented on YARN-2629: - Hi [~zjshen], I reviewed this patch (YARN-2629.3.patch), and it generally LGTM. However, I do have a few quick comments: 1. I noticed that you've changed a few method signatures, and removed the "throws IOException, YarnException" clause from the signatures. Meanwhile, in the method I noticed that you're catching all types of exceptions in one catch statement. I agree that this change saves some repeated lines of try-catch statements in various call sites, but is it OK to catch all exceptions internally? (I think so, but would like to verify that.) 2. Maybe we want to add one more test with no arguments on domain/view_acl/modify_acl? Timeline domain is a newly added feature, and maybe we'd like to make sure it would not break any existing distributed shell usages (esp. in some scripts)? 3. In Client.java: {code} private boolean toCreate = false; {code} This variable name looks unclear. Maybe we want to rename it so that people can easily connect it with timeline domains? > Make distributed shell use the domain-based timeline ACLs > - > > Key: YARN-2629 > URL: https://issues.apache.org/jira/browse/YARN-2629 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: YARN-2629.1.patch, YARN-2629.2.patch, YARN-2629.3.patch > > > To demonstrate the usage of this feature (YARN-2102), it's good to make > the distributed shell create the domain, and post its timeline entities into > this private space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
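A rough illustration of the exception-handling pattern being discussed above; the class and method names below are hypothetical and not taken from the YARN-2629 patch. The signatures drop the checked exceptions and a single internal catch replaces the per-call-site try/catch blocks:
{code}
import java.io.IOException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;
import org.apache.hadoop.yarn.client.api.TimelineClient;
import org.apache.hadoop.yarn.exceptions.YarnException;

// Hypothetical sketch of the pattern under review, not the actual patch code.
public class TimelinePublishSketch {
  private static final Log LOG = LogFactory.getLog(TimelinePublishSketch.class);

  // Before: each call site must handle both checked exceptions itself.
  static void publishEntity(TimelineClient client, TimelineEntity entity)
      throws IOException, YarnException {
    client.putEntities(entity);
  }

  // After: failures are caught internally so call sites stay simple,
  // at the cost of one broad catch of all exception types.
  static void publishEntitySafely(TimelineClient client, TimelineEntity entity) {
    try {
      client.putEntities(entity);
    } catch (Exception e) {
      LOG.error("Failed to publish timeline entity", e);
    }
  }
}
{code}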
[jira] [Updated] (YARN-2641) improve node decommission latency in RM.
[ https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2641: Attachment: YARN-2641.002.patch > improve node decommission latency in RM. > > > Key: YARN-2641 > URL: https://issues.apache.org/jira/browse/YARN-2641 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2641.000.patch, YARN-2641.001.patch, > YARN-2641.002.patch > > > improve node decommission latency in RM. > Currently the node decommission only happens after the RM receives a nodeHeartbeat > from the Node Manager. The node heartbeat interval is configurable. The > default value is 1 second. > It would be better to do the decommission during the RM refresh (NodesListManager) > instead of in nodeHeartbeat (ResourceTrackerService). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
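A minimal sketch of the proposed direction, assuming the decommission event is fired from NodesListManager when the exclude list is refreshed; the identifiers below are illustrative and this is not the attached patch:
{code}
// Sketch of a helper that could live in NodesListManager, which already holds
// rmContext and the isValidNode() check against the include/exclude lists.
private void decommissionExcludedNodes() {
  for (RMNode node : rmContext.getRMNodes().values()) {
    if (!isValidNode(node.getHostName())) {  // host now appears in the exclude list
      // Fire the decommission transition directly instead of waiting for the
      // node's next heartbeat to reach ResourceTrackerService.
      rmContext.getDispatcher().getEventHandler().handle(
          new RMNodeEvent(node.getNodeID(), RMNodeEventType.DECOMMISSION));
    }
  }
}
{code}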
[jira] [Commented] (YARN-2320) Removing old application history store after we store the history data to timeline store
[ https://issues.apache.org/jira/browse/YARN-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162640#comment-14162640 ] Hadoop QA commented on YARN-2320: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673422/YARN-2320.3.patch against trunk revision 30d56fd. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 24 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5314//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5314//console This message is automatically generated. > Removing old application history store after we store the history data to > timeline store > > > Key: YARN-2320 > URL: https://issues.apache.org/jira/browse/YARN-2320 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: YARN-2320.1.patch, YARN-2320.2.patch, YARN-2320.3.patch > > > After YARN-2033, we should deprecate application history store set. There's > no need to maintain two sets of store interfaces. In addition, we should > conclude the outstanding jira's under YARN-321 about the application history > store. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2483) TestAMRestart#testShouldNotCountFailureToMaxAttemptRetry fails due to incorrect AppAttempt state
[ https://issues.apache.org/jira/browse/YARN-2483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162530#comment-14162530 ] Jian He commented on YARN-2483: --- YARN-2649 may solve this > TestAMRestart#testShouldNotCountFailureToMaxAttemptRetry fails due to > incorrect AppAttempt state > > > Key: YARN-2483 > URL: https://issues.apache.org/jira/browse/YARN-2483 > Project: Hadoop YARN > Issue Type: Test >Reporter: Ted Yu > > From https://builds.apache.org/job/Hadoop-Yarn-trunk/665/console : > {code} > testShouldNotCountFailureToMaxAttemptRetry(org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart) > Time elapsed: 49.686 sec <<< FAILURE! > java.lang.AssertionError: AppAttempt state is not correct (timedout) > expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at > org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:84) > at > org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:417) > at > org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAM(MockRM.java:582) > at > org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAndRegisterAM(MockRM.java:589) > at > org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForNewAMToLaunchAndRegister(MockRM.java:182) > at > org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart.testShouldNotCountFailureToMaxAttemptRetry(TestAMRestart.java:402) > {code} > TestApplicationMasterLauncher#testallocateBeforeAMRegistration fails with > similar cause. > These tests failed in build #664 as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-913) Add a way to register long-lived services in a YARN cluster
[ https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-913: Attachment: YARN-913-019.patch This is a patch which addresses Sanjay's review comments, with the key changes being: * {{ServiceRecord}} supports arbitrary string attributes; the yarn ID and persistence are simply two of these. * moved zk-related classes under the {{/impl/zk}} package * TLA+ spec in sync with the new design I also * moved the packaging from {{org.apache.hadoop.yarn.registry}} to {{org.apache.hadoop.registry}}. That would reduce the impact of any promotion of the registry into hadoop-common if ever desired. * moved the registry/site documentation under yarn-site and hooked it up with the existing site docs. * added the TLA toolbox generated artifacts (pdf and a toolbox dir) to {{/.gitignore}} > Add a way to register long-lived services in a YARN cluster > --- > > Key: YARN-913 > URL: https://issues.apache.org/jira/browse/YARN-913 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, resourcemanager >Affects Versions: 2.5.0, 2.4.1 >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, > 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, > YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, > YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, > YARN-913-007.patch, YARN-913-008.patch, YARN-913-009.patch, > YARN-913-010.patch, YARN-913-011.patch, YARN-913-012.patch, > YARN-913-013.patch, YARN-913-014.patch, YARN-913-015.patch, > YARN-913-016.patch, YARN-913-017.patch, YARN-913-018.patch, > YARN-913-019.patch, yarnregistry.pdf, yarnregistry.pdf, yarnregistry.pdf, > yarnregistry.tla > > > In a YARN cluster you can't predict where services will come up -or on what > ports. The services need to work those things out as they come up and then > publish them somewhere. > Applications need to be able to find the service instance they are to bond to > -and not any others in the cluster. > Some kind of service registry -in the RM, in ZK, could do this. If the RM > held the write access to the ZK nodes, it would be more secure than having > apps register with ZK themselves. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
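To make the first bullet concrete, a hedged sketch of what an attribute-based {{ServiceRecord}} could look like in client code; the accessor names are assumptions for illustration rather than the patch's exact API:
{code}
// Hedged sketch: the record is a bag of string attributes, and the YARN
// identity and persistence policy are ordinary entries, not special fields.
// set()/get() are assumed accessor names; applicationId is a stand-in value.
ServiceRecord record = new ServiceRecord();
record.set("yarn:id", applicationId.toString());  // the yarn ID attribute
record.set("yarn:persistence", "application");    // the persistence attribute
record.set("custom.owner", "hbase");              // arbitrary user-defined attribute
String owner = record.get("custom.owner", "unknown");
{code}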
[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running
[ https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162510#comment-14162510 ] Hudson commented on YARN-1857: -- FAILURE: Integrated in Hadoop-trunk-Commit #6206 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6206/]) YARN-1857. CapacityScheduler headroom doesn't account for other AM's running. Contributed by Chen He and Craig Welch (jianhe: rev 30d56fdbb40d06c4e267d6c314c8c767a7adc6a3) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java > CapacityScheduler headroom doesn't account for other AM's running > - > > Key: YARN-1857 > URL: https://issues.apache.org/jira/browse/YARN-1857 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Affects Versions: 2.3.0 >Reporter: Thomas Graves >Assignee: Chen He >Priority: Critical > Fix For: 2.6.0 > > Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.3.patch, > YARN-1857.4.patch, YARN-1857.5.patch, YARN-1857.6.patch, YARN-1857.7.patch, > YARN-1857.patch, YARN-1857.patch, YARN-1857.patch > > > Its possible to get an application to hang forever (or a long time) in a > cluster with multiple users. The reason why is that the headroom sent to the > application is based on the user limit but it doesn't account for other > Application masters using space in that queue. So the headroom (user limit - > user consumed) can be > 0 even though the cluster is 100% full because the > other space is being used by application masters from other users. > For instance if you have a cluster with 1 queue, user limit is 100%, you have > multiple users submitting applications. One very large application by user 1 > starts up, runs most of its maps and starts running reducers. other users try > to start applications and get their application masters started but not > tasks. The very large application then gets to the point where it has > consumed the rest of the cluster resources with all reduces. But at this > point it needs to still finish a few maps. The headroom being sent to this > application is only based on the user limit (which is 100% of the cluster > capacity) its using lets say 95% of the cluster for reduces and then other 5% > is being used by other users running application masters. The MRAppMaster > thinks it still has 5% so it doesn't know that it should kill a reduce in > order to run a map. > This can happen in other scenarios also. Generally in a large cluster with > multiple queues this shouldn't cause a hang forever but it could cause the > application to take much longer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2320) Removing old application history store after we store the history data to timeline store
[ https://issues.apache.org/jira/browse/YARN-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2320: -- Attachment: YARN-2320.3.patch Update against the latest trunk > Removing old application history store after we store the history data to > timeline store > > > Key: YARN-2320 > URL: https://issues.apache.org/jira/browse/YARN-2320 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: YARN-2320.1.patch, YARN-2320.2.patch, YARN-2320.3.patch > > > After YARN-2033, we should deprecate application history store set. There's > no need to maintain two sets of store interfaces. In addition, we should > conclude the outstanding jira's under YARN-321 about the application history > store. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running
[ https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162497#comment-14162497 ] Chen He commented on YARN-1857: --- Sorry, my bad, looks like YARN-2400 is checked in. Anyway, it is not related to this patch. > CapacityScheduler headroom doesn't account for other AM's running > - > > Key: YARN-1857 > URL: https://issues.apache.org/jira/browse/YARN-1857 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Affects Versions: 2.3.0 >Reporter: Thomas Graves >Assignee: Chen He >Priority: Critical > Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.3.patch, > YARN-1857.4.patch, YARN-1857.5.patch, YARN-1857.6.patch, YARN-1857.7.patch, > YARN-1857.patch, YARN-1857.patch, YARN-1857.patch > > > Its possible to get an application to hang forever (or a long time) in a > cluster with multiple users. The reason why is that the headroom sent to the > application is based on the user limit but it doesn't account for other > Application masters using space in that queue. So the headroom (user limit - > user consumed) can be > 0 even though the cluster is 100% full because the > other space is being used by application masters from other users. > For instance if you have a cluster with 1 queue, user limit is 100%, you have > multiple users submitting applications. One very large application by user 1 > starts up, runs most of its maps and starts running reducers. other users try > to start applications and get their application masters started but not > tasks. The very large application then gets to the point where it has > consumed the rest of the cluster resources with all reduces. But at this > point it needs to still finish a few maps. The headroom being sent to this > application is only based on the user limit (which is 100% of the cluster > capacity) its using lets say 95% of the cluster for reduces and then other 5% > is being used by other users running application masters. The MRAppMaster > thinks it still has 5% so it doesn't know that it should kill a reduce in > order to run a map. > This can happen in other scenarios also. Generally in a large cluster with > multiple queues this shouldn't cause a hang forever but it could cause the > application to take much longer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running
[ https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162496#comment-14162496 ] Chen He commented on YARN-1857: --- Unit test failure is because of YARN-2400 > CapacityScheduler headroom doesn't account for other AM's running > - > > Key: YARN-1857 > URL: https://issues.apache.org/jira/browse/YARN-1857 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Affects Versions: 2.3.0 >Reporter: Thomas Graves >Assignee: Chen He >Priority: Critical > Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.3.patch, > YARN-1857.4.patch, YARN-1857.5.patch, YARN-1857.6.patch, YARN-1857.7.patch, > YARN-1857.patch, YARN-1857.patch, YARN-1857.patch > > > Its possible to get an application to hang forever (or a long time) in a > cluster with multiple users. The reason why is that the headroom sent to the > application is based on the user limit but it doesn't account for other > Application masters using space in that queue. So the headroom (user limit - > user consumed) can be > 0 even though the cluster is 100% full because the > other space is being used by application masters from other users. > For instance if you have a cluster with 1 queue, user limit is 100%, you have > multiple users submitting applications. One very large application by user 1 > starts up, runs most of its maps and starts running reducers. other users try > to start applications and get their application masters started but not > tasks. The very large application then gets to the point where it has > consumed the rest of the cluster resources with all reduces. But at this > point it needs to still finish a few maps. The headroom being sent to this > application is only based on the user limit (which is 100% of the cluster > capacity) its using lets say 95% of the cluster for reduces and then other 5% > is being used by other users running application masters. The MRAppMaster > thinks it still has 5% so it doesn't know that it should kill a reduce in > order to run a map. > This can happen in other scenarios also. Generally in a large cluster with > multiple queues this shouldn't cause a hang forever but it could cause the > application to take much longer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-913) Add a way to register long-lived services in a YARN cluster
[ https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-913: Attachment: yarnregistry.pdf TLA+ specification in sync with (forthcoming -019 patch) # uses consistent naming # service record specified as allowing arbitrary {{String |-> String}} attributes; {{yarn:id}} and {{yarn:persistence}} are merely two of these > Add a way to register long-lived services in a YARN cluster > --- > > Key: YARN-913 > URL: https://issues.apache.org/jira/browse/YARN-913 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, resourcemanager >Affects Versions: 2.5.0, 2.4.1 >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, > 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, > YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, > YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, > YARN-913-007.patch, YARN-913-008.patch, YARN-913-009.patch, > YARN-913-010.patch, YARN-913-011.patch, YARN-913-012.patch, > YARN-913-013.patch, YARN-913-014.patch, YARN-913-015.patch, > YARN-913-016.patch, YARN-913-017.patch, YARN-913-018.patch, yarnregistry.pdf, > yarnregistry.pdf, yarnregistry.pdf, yarnregistry.tla > > > In a YARN cluster you can't predict where services will come up -or on what > ports. The services need to work those things out as they come up and then > publish them somewhere. > Applications need to be able to find the service instance they are to bond to > -and not any others in the cluster. > Some kind of service registry -in the RM, in ZK, could do this. If the RM > held the write access to the ZK nodes, it would be more secure than having > apps register with ZK themselves. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running
[ https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162487#comment-14162487 ] Jian He commented on YARN-1857: --- Craig, thanks for updating. looks good, +1 > CapacityScheduler headroom doesn't account for other AM's running > - > > Key: YARN-1857 > URL: https://issues.apache.org/jira/browse/YARN-1857 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Affects Versions: 2.3.0 >Reporter: Thomas Graves >Assignee: Chen He >Priority: Critical > Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.3.patch, > YARN-1857.4.patch, YARN-1857.5.patch, YARN-1857.6.patch, YARN-1857.7.patch, > YARN-1857.patch, YARN-1857.patch, YARN-1857.patch > > > Its possible to get an application to hang forever (or a long time) in a > cluster with multiple users. The reason why is that the headroom sent to the > application is based on the user limit but it doesn't account for other > Application masters using space in that queue. So the headroom (user limit - > user consumed) can be > 0 even though the cluster is 100% full because the > other space is being used by application masters from other users. > For instance if you have a cluster with 1 queue, user limit is 100%, you have > multiple users submitting applications. One very large application by user 1 > starts up, runs most of its maps and starts running reducers. other users try > to start applications and get their application masters started but not > tasks. The very large application then gets to the point where it has > consumed the rest of the cluster resources with all reduces. But at this > point it needs to still finish a few maps. The headroom being sent to this > application is only based on the user limit (which is 100% of the cluster > capacity) its using lets say 95% of the cluster for reduces and then other 5% > is being used by other users running application masters. The MRAppMaster > thinks it still has 5% so it doesn't know that it should kill a reduce in > order to run a map. > This can happen in other scenarios also. Generally in a large cluster with > multiple queues this shouldn't cause a hang forever but it could cause the > application to take much longer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2652) add hadoop-yarn-registry package under hadoop-yarn
Steve Loughran created YARN-2652: Summary: add hadoop-yarn-registry package under hadoop-yarn Key: YARN-2652 URL: https://issues.apache.org/jira/browse/YARN-2652 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: Steve Loughran Assignee: Steve Loughran Commit the core {{hadoop-yarn-registry}} module, including * changes to POMs to include in the build * additions to the yarn site This patch excludes * RM integration (YARN-2571) * Distributed Shell integration and tests (YARN-2646) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running
[ https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162461#comment-14162461 ] Hadoop QA commented on YARN-1857: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673410/YARN-1857.7.patch against trunk revision 9196db9. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5311//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5311//console This message is automatically generated. > CapacityScheduler headroom doesn't account for other AM's running > - > > Key: YARN-1857 > URL: https://issues.apache.org/jira/browse/YARN-1857 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Affects Versions: 2.3.0 >Reporter: Thomas Graves >Assignee: Chen He >Priority: Critical > Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.3.patch, > YARN-1857.4.patch, YARN-1857.5.patch, YARN-1857.6.patch, YARN-1857.7.patch, > YARN-1857.patch, YARN-1857.patch, YARN-1857.patch > > > Its possible to get an application to hang forever (or a long time) in a > cluster with multiple users. The reason why is that the headroom sent to the > application is based on the user limit but it doesn't account for other > Application masters using space in that queue. So the headroom (user limit - > user consumed) can be > 0 even though the cluster is 100% full because the > other space is being used by application masters from other users. > For instance if you have a cluster with 1 queue, user limit is 100%, you have > multiple users submitting applications. One very large application by user 1 > starts up, runs most of its maps and starts running reducers. other users try > to start applications and get their application masters started but not > tasks. The very large application then gets to the point where it has > consumed the rest of the cluster resources with all reduces. But at this > point it needs to still finish a few maps. The headroom being sent to this > application is only based on the user limit (which is 100% of the cluster > capacity) its using lets say 95% of the cluster for reduces and then other 5% > is being used by other users running application masters. The MRAppMaster > thinks it still has 5% so it doesn't know that it should kill a reduce in > order to run a map. > This can happen in other scenarios also. 
Generally in a large cluster with > multiple queues this shouldn't cause a hang forever but it could cause the > application to take much longer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2414) RM web UI: app page will crash if app is failed before any attempt has been created
[ https://issues.apache.org/jira/browse/YARN-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162419#comment-14162419 ] Hadoop QA commented on YARN-2414: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673402/YARN-2414.patch against trunk revision 2e789eb. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5310//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5310//console This message is automatically generated. > RM web UI: app page will crash if app is failed before any attempt has been > created > --- > > Key: YARN-2414 > URL: https://issues.apache.org/jira/browse/YARN-2414 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Reporter: Zhijie Shen >Assignee: Wangda Tan > Attachments: YARN-2414.patch > > > {code} > 2014-08-12 16:45:13,573 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error > handling URI: /cluster/app/application_1407887030038_0001 > java.lang.reflect.InvocationTargetException > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) > at > com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) > at > com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) > at > com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) > at > com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:84) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) > at > com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163) > at > com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58) > at > 
com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118) > at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:460) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1191) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) > at > org.mortbay.jett
[jira] [Commented] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS
[ https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162416#comment-14162416 ] Hadoop QA commented on YARN-2583: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673409/YARN-2583.4.patch against trunk revision 9196db9. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5313//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5313//console This message is automatically generated. > Modify the LogDeletionService to support Log aggregation for LRS > > > Key: YARN-2583 > URL: https://issues.apache.org/jira/browse/YARN-2583 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-2583.1.patch, YARN-2583.2.patch, > YARN-2583.3.1.patch, YARN-2583.3.patch, YARN-2583.4.patch > > > Currently, AggregatedLogDeletionService will delete old logs from HDFS. It > will check the cut-off-time, if all logs for this application is older than > this cut-off-time. The app-log-dir from HDFS will be deleted. This will not > work for LRS. We expect a LRS application can keep running for a long time. > Two different scenarios: > 1) If we configured the rollingIntervalSeconds, the new log file will be > always uploaded to HDFS. The number of log files for this application will > become larger and larger. And there is no log files will be deleted. > 2) If we did not configure the rollingIntervalSeconds, the log file can only > be uploaded to HDFS after the application is finished. It is very possible > that the logs are uploaded after the cut-off-time. It will cause problem > because at that time the app-log-dir for this application in HDFS has been > deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
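For orientation, a hedged sketch of per-file deletion for a long-running-service application, assuming the change checks individual aggregated log files against the cut-off instead of removing the whole app-log-dir; this is an illustration of the idea, not the YARN-2583 patch code:
{code}
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LrsLogDeletionSketch {
  // Delete only the rolled-up log files older than the cut-off, keeping the
  // app-log-dir itself so files uploaded later still have a home.
  static void deleteOldLogFiles(FileSystem fs, Path appLogDir, long cutoffMillis)
      throws IOException {
    for (FileStatus logFile : fs.listStatus(appLogDir)) {
      if (logFile.getModificationTime() < cutoffMillis) {
        fs.delete(logFile.getPath(), false);
      }
    }
  }
}
{code}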
[jira] [Commented] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade
[ https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162414#comment-14162414 ] Hadoop QA commented on YARN-2331: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673407/YARN-2331v2.patch against trunk revision 9196db9. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5312//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5312//console This message is automatically generated. > Distinguish shutdown during supervision vs. shutdown for rolling upgrade > > > Key: YARN-2331 > URL: https://issues.apache.org/jira/browse/YARN-2331 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Jason Lowe > Attachments: YARN-2331.patch, YARN-2331v2.patch > > > When the NM is shutting down with restart support enabled there are scenarios > we'd like to distinguish and behave accordingly: > # The NM is running under supervision. In that case containers should be > preserved so the automatic restart can recover them. > # The NM is not running under supervision and a rolling upgrade is not being > performed. In that case the shutdown should kill all containers since it is > unlikely the NM will be restarted in a timely manner to recover them. > # The NM is not running under supervision and a rolling upgrade is being > performed. In that case the shutdown should not kill all containers since a > restart is imminent due to the rolling upgrade and the containers will be > recovered. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
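The three scenarios in the description reduce to a small predicate in the NM shutdown path; the sketch below is an assumption about the shape of that check (the supervision key name and the cleanup hook are illustrative), not the attached patch:
{code}
// Illustrative only; identifiers are assumptions rather than patch code.
boolean recoveryEnabled =
    conf.getBoolean("yarn.nodemanager.recovery.enabled", false);
boolean underSupervision =
    conf.getBoolean("yarn.nodemanager.recovery.supervised", false);  // assumed key
boolean rollingUpgradeInProgress = isRollingUpgrade();  // hypothetical helper

// Preserve containers only when a timely NM restart is expected to recover them.
boolean keepContainers =
    recoveryEnabled && (underSupervision || rollingUpgradeInProgress);
if (!keepContainers) {
  killAllContainersOnShutdown();  // hypothetical cleanup hook
}
{code}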
[jira] [Commented] (YARN-2414) RM web UI: app page will crash if app is failed before any attempt has been created
[ https://issues.apache.org/jira/browse/YARN-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162366#comment-14162366 ] Jason Lowe commented on YARN-2414: -- Thanks, Wangda! Looks good overall. Nit: appMerics should be appMetrics. Have you done any manual testing with this patch? Seems like it would be straightforward to mock up an injector with a mock context containing an app with no attempts. With that the unit test can verify render doesn't throw. Thinking something along the lines of WebAppTests.testPage/testBlock. > RM web UI: app page will crash if app is failed before any attempt has been > created > --- > > Key: YARN-2414 > URL: https://issues.apache.org/jira/browse/YARN-2414 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Reporter: Zhijie Shen >Assignee: Wangda Tan > Attachments: YARN-2414.patch > > > {code} > 2014-08-12 16:45:13,573 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error > handling URI: /cluster/app/application_1407887030038_0001 > java.lang.reflect.InvocationTargetException > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) > at > com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) > at > com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) > at > com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) > at > com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:84) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) > at > com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163) > at > com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58) > at > com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118) > at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:460) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1191) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at 
org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) > at > org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) > at > org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) > at > org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) > at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) > at > org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) > at > org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) > at org.mortbay.jetty.Server.handle(Server.java:326) > at > org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) > at > org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) > at org.mortbay.jetty.Ht
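A hedged sketch of the kind of test being suggested, assuming Mockito static imports and that the block under test is the RM web app's AppBlock; the specifics are assumptions, not the eventual committed test:
{code}
// Mock an application that failed before any attempt was created.
RMApp app = mock(RMApp.class);
when(app.getApplicationId()).thenReturn(ApplicationId.newInstance(1234L, 1));
when(app.getCurrentAppAttempt()).thenReturn(null);

ConcurrentMap<ApplicationId, RMApp> apps =
    new ConcurrentHashMap<ApplicationId, RMApp>();
apps.put(app.getApplicationId(), app);
RMContext rmContext = mock(RMContext.class);
when(rmContext.getRMApps()).thenReturn(apps);

// Rendering the app page block should not throw despite the missing attempt.
WebAppTests.testBlock(AppBlock.class, RMContext.class, rmContext);
{code}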
[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running
[ https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162357#comment-14162357 ] Chen He commented on YARN-1857: --- Hi [~cwelch], hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/CapacityScheduler.apt.vm has the detail information about these parameters. > CapacityScheduler headroom doesn't account for other AM's running > - > > Key: YARN-1857 > URL: https://issues.apache.org/jira/browse/YARN-1857 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Affects Versions: 2.3.0 >Reporter: Thomas Graves >Assignee: Chen He >Priority: Critical > Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.3.patch, > YARN-1857.4.patch, YARN-1857.5.patch, YARN-1857.6.patch, YARN-1857.7.patch, > YARN-1857.patch, YARN-1857.patch, YARN-1857.patch > > > Its possible to get an application to hang forever (or a long time) in a > cluster with multiple users. The reason why is that the headroom sent to the > application is based on the user limit but it doesn't account for other > Application masters using space in that queue. So the headroom (user limit - > user consumed) can be > 0 even though the cluster is 100% full because the > other space is being used by application masters from other users. > For instance if you have a cluster with 1 queue, user limit is 100%, you have > multiple users submitting applications. One very large application by user 1 > starts up, runs most of its maps and starts running reducers. other users try > to start applications and get their application masters started but not > tasks. The very large application then gets to the point where it has > consumed the rest of the cluster resources with all reduces. But at this > point it needs to still finish a few maps. The headroom being sent to this > application is only based on the user limit (which is 100% of the cluster > capacity) its using lets say 95% of the cluster for reduces and then other 5% > is being used by other users running application masters. The MRAppMaster > thinks it still has 5% so it doesn't know that it should kill a reduce in > order to run a map. > This can happen in other scenarios also. Generally in a large cluster with > multiple queues this shouldn't cause a hang forever but it could cause the > application to take much longer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running
[ https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-1857: -- Attachment: YARN-1857.7.patch I see it now - uploading newer patch with simplified final headroom calculation > CapacityScheduler headroom doesn't account for other AM's running > - > > Key: YARN-1857 > URL: https://issues.apache.org/jira/browse/YARN-1857 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Affects Versions: 2.3.0 >Reporter: Thomas Graves >Assignee: Chen He >Priority: Critical > Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.3.patch, > YARN-1857.4.patch, YARN-1857.5.patch, YARN-1857.6.patch, YARN-1857.7.patch, > YARN-1857.patch, YARN-1857.patch, YARN-1857.patch > > > Its possible to get an application to hang forever (or a long time) in a > cluster with multiple users. The reason why is that the headroom sent to the > application is based on the user limit but it doesn't account for other > Application masters using space in that queue. So the headroom (user limit - > user consumed) can be > 0 even though the cluster is 100% full because the > other space is being used by application masters from other users. > For instance if you have a cluster with 1 queue, user limit is 100%, you have > multiple users submitting applications. One very large application by user 1 > starts up, runs most of its maps and starts running reducers. other users try > to start applications and get their application masters started but not > tasks. The very large application then gets to the point where it has > consumed the rest of the cluster resources with all reduces. But at this > point it needs to still finish a few maps. The headroom being sent to this > application is only based on the user limit (which is 100% of the cluster > capacity) its using lets say 95% of the cluster for reduces and then other 5% > is being used by other users running application masters. The MRAppMaster > thinks it still has 5% so it doesn't know that it should kill a reduce in > order to run a map. > This can happen in other scenarios also. Generally in a large cluster with > multiple queues this shouldn't cause a hang forever but it could cause the > application to take much longer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
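For context, the headroom change under review amounts to capping the user-limit-based headroom by what is actually left in the queue, so space consumed by other users' application masters is no longer reported as available. The sketch below is an approximation of that calculation using the existing Resources helpers, not the exact patch code:
{code}
// Approximate sketch of the idea, not YARN-1857.7.patch itself.
// userLimit     : the per-user cap for this queue
// queueMaxCap   : the queue's absolute maximum capacity
// userConsumed  : resources this user has already consumed
// queueUsed     : resources used by everyone in the queue, AMs included
Resource headroom = Resources.min(resourceCalculator, clusterResource,
    Resources.subtract(
        Resources.min(resourceCalculator, clusterResource, userLimit, queueMaxCap),
        userConsumed),
    Resources.subtract(queueMaxCap, queueUsed));
{code}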
[jira] [Updated] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS
[ https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-2583: Attachment: YARN-2583.4.patch > Modify the LogDeletionService to support Log aggregation for LRS > > > Key: YARN-2583 > URL: https://issues.apache.org/jira/browse/YARN-2583 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-2583.1.patch, YARN-2583.2.patch, > YARN-2583.3.1.patch, YARN-2583.3.patch, YARN-2583.4.patch > > > Currently, AggregatedLogDeletionService will delete old logs from HDFS. It > will check the cut-off-time, if all logs for this application is older than > this cut-off-time. The app-log-dir from HDFS will be deleted. This will not > work for LRS. We expect a LRS application can keep running for a long time. > Two different scenarios: > 1) If we configured the rollingIntervalSeconds, the new log file will be > always uploaded to HDFS. The number of log files for this application will > become larger and larger. And there is no log files will be deleted. > 2) If we did not configure the rollingIntervalSeconds, the log file can only > be uploaded to HDFS after the application is finished. It is very possible > that the logs are uploaded after the cut-off-time. It will cause problem > because at that time the app-log-dir for this application in HDFS has been > deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS
[ https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162324#comment-14162324 ] Xuan Gong commented on YARN-2583: - Upload a new patch to address all the comments > Modify the LogDeletionService to support Log aggregation for LRS > > > Key: YARN-2583 > URL: https://issues.apache.org/jira/browse/YARN-2583 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-2583.1.patch, YARN-2583.2.patch, > YARN-2583.3.1.patch, YARN-2583.3.patch, YARN-2583.4.patch > > > Currently, AggregatedLogDeletionService will delete old logs from HDFS. It > will check the cut-off-time, if all logs for this application is older than > this cut-off-time. The app-log-dir from HDFS will be deleted. This will not > work for LRS. We expect a LRS application can keep running for a long time. > Two different scenarios: > 1) If we configured the rollingIntervalSeconds, the new log file will be > always uploaded to HDFS. The number of log files for this application will > become larger and larger. And there is no log files will be deleted. > 2) If we did not configure the rollingIntervalSeconds, the log file can only > be uploaded to HDFS after the application is finished. It is very possible > that the logs are uploaded after the cut-off-time. It will cause problem > because at that time the app-log-dir for this application in HDFS has been > deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade
[ https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-2331: - Attachment: YARN-2331v2.patch Updated patch to fix the unit tests. > Distinguish shutdown during supervision vs. shutdown for rolling upgrade > > > Key: YARN-2331 > URL: https://issues.apache.org/jira/browse/YARN-2331 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Jason Lowe > Attachments: YARN-2331.patch, YARN-2331v2.patch > > > When the NM is shutting down with restart support enabled there are scenarios > we'd like to distinguish and behave accordingly: > # The NM is running under supervision. In that case containers should be > preserved so the automatic restart can recover them. > # The NM is not running under supervision and a rolling upgrade is not being > performed. In that case the shutdown should kill all containers since it is > unlikely the NM will be restarted in a timely manner to recover them. > # The NM is not running under supervision and a rolling upgrade is being > performed. In that case the shutdown should not kill all containers since a > restart is imminent due to the rolling upgrade and the containers will be > recovered. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162316#comment-14162316 ] Hadoop QA commented on YARN-796: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673374/YARN-796.node-label.consolidate.14.patch against trunk revision 2e789eb. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 42 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 3 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.mapred.pipes.TestPipeApplication org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5307//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5307//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5307//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5307//console This message is automatically generated. 
> Allow for (admin) labels on nodes and resource-requests > --- > > Key: YARN-796 > URL: https://issues.apache.org/jira/browse/YARN-796 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.4.1 >Reporter: Arun C Murthy >Assignee: Wangda Tan > Attachments: LabelBasedScheduling.pdf, > Node-labels-Requirements-Design-doc-V1.pdf, > Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf, > YARN-796.node-label.consolidate.1.patch, > YARN-796.node-label.consolidate.10.patch, > YARN-796.node-label.consolidate.11.patch, > YARN-796.node-label.consolidate.12.patch, > YARN-796.node-label.consolidate.13.patch, > YARN-796.node-label.consolidate.14.patch, > YARN-796.node-label.consolidate.2.patch, > YARN-796.node-label.consolidate.3.patch, > YARN-796.node-label.consolidate.4.patch, > YARN-796.node-label.consolidate.5.patch, > YARN-796.node-label.consolidate.6.patch, > YARN-796.node-label.consolidate.7.patch, > YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, > YARN-796.patch, YARN-796.patch4 > > > It will be useful for admins to specify labels for nodes. Examples of labels > are OS, processor architecture etc. > We should expose these labels and allow applications to specify labels on > resource-requests. > Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
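As a small illustration of what the feature enables on the request side, a hedged sketch of a label-constrained resource request; the node-label setter is the API shape this work introduces, so treat the exact call as an assumption here:
{code}
// Ask for one 1 GB / 1 vcore container, restricted to nodes labelled "gpu".
ResourceRequest req = ResourceRequest.newInstance(
    Priority.newInstance(1),          // request priority
    ResourceRequest.ANY,              // no host or rack preference
    Resource.newInstance(1024, 1),    // capability: memory MB, vcores
    1);                               // number of containers
req.setNodeLabelExpression("gpu");    // admin-defined label (assumed setter)
{code}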
[jira] [Updated] (YARN-2414) RM web UI: app page will crash if app is failed before any attempt has been created
[ https://issues.apache.org/jira/browse/YARN-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2414: - Attachment: YARN-2414.patch Attached a simple fix for this issue. Please kindly view. Thanks, > RM web UI: app page will crash if app is failed before any attempt has been > created > --- > > Key: YARN-2414 > URL: https://issues.apache.org/jira/browse/YARN-2414 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Reporter: Zhijie Shen >Assignee: Wangda Tan > Attachments: YARN-2414.patch > > > {code} > 2014-08-12 16:45:13,573 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error > handling URI: /cluster/app/application_1407887030038_0001 > java.lang.reflect.InvocationTargetException > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) > at > com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) > at > com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) > at > com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) > at > com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:84) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) > at > com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163) > at > com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58) > at > com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118) > at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:460) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1191) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) > at > org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) > at > 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) > at > org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) > at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) > at > org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) > at > org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) > at org.mortbay.jetty.Server.handle(Server.java:326) > at > org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) > at > org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) > at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) > at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) > at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) > at > org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410) > at > org.mortbay.thread.QueuedThreadPool$Po
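The trace above is the webapp dispatcher surfacing an exception thrown while rendering the app page for an application that never got an attempt. A minimal null-guard sketch of the idea behind such a fix, with class and method names invented for illustration rather than taken from the attached patch:
{code}
// Hypothetical sketch: skip attempt details when the app failed before any attempt was created.
public class AppPageNullGuardExample {
  static String renderAttemptSection(String currentAttemptId) {
    if (currentAttemptId == null) {
      // Nothing to dereference; render a friendly message instead of letting
      // the page block throw and crash the dispatcher.
      return "Application has no attempt yet.";
    }
    return "Details for attempt " + currentAttemptId;
  }

  public static void main(String[] args) {
    System.out.println(renderAttemptSection(null));
    System.out.println(renderAttemptSection("appattempt_1407887030038_0001_000001"));
  }
}
{code}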
[jira] [Commented] (YARN-2633) TestContainerLauncherImpl sometimes fails
[ https://issues.apache.org/jira/browse/YARN-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162266#comment-14162266 ] Hadoop QA commented on YARN-2633: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673394/YARN-2633.patch against trunk revision 2e789eb. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5309//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5309//console This message is automatically generated. > TestContainerLauncherImpl sometimes fails > - > > Key: YARN-2633 > URL: https://issues.apache.org/jira/browse/YARN-2633 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: YARN-2633.patch, YARN-2633.patch > > > {noformat} > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.lang.NoSuchMethodException: > org.apache.hadoop.yarn.api.ContainerManagementProtocol$$EnhancerByMockitoWithCGLIB$$25708415.close() > at java.lang.Class.getMethod(Class.java:1665) > at > org.apache.hadoop.yarn.factories.impl.pb.RpcClientFactoryPBImpl.stopClient(RpcClientFactoryPBImpl.java:90) > at > org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.stopProxy(HadoopYarnProtoRPC.java:54) > at > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.mayBeCloseProxy(ContainerManagementProtocolProxy.java:79) > at > org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.kill(ContainerLauncherImpl.java:225) > at > org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.shutdownAllContainers(ContainerLauncherImpl.java:320) > at > org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.serviceStop(ContainerLauncherImpl.java:331) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) > at > org.apache.hadoop.mapreduce.v2.app.launcher.TestContainerLauncherImpl.testMyShutdown(TestContainerLauncherImpl.java:315) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
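The NoSuchMethodException in the description comes from RpcClientFactoryPBImpl reflectively looking up close() on a proxy that is really a Mockito mock of ContainerManagementProtocol, which exposes no close() method. One common way to survive that reflective lookup is to give the mock an extra Closeable interface; a hedged sketch of that approach (not necessarily what the attached patch does):
{code}
import java.io.Closeable;

import org.apache.hadoop.yarn.api.ContainerManagementProtocol;

import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.withSettings;

public class CloseableMockExample {
  // Give the mock an extra Closeable interface so a reflective
  // getMethod("close") on the proxy class succeeds when the RPC layer stops it.
  static ContainerManagementProtocol newMockProxy() {
    return mock(ContainerManagementProtocol.class,
        withSettings().extraInterfaces(Closeable.class));
  }
}
{code}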
[jira] [Updated] (YARN-314) Schedulers should allow resource requests of different sizes at the same priority and location
[ https://issues.apache.org/jira/browse/YARN-314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-314: -- Attachment: yarn-314-prelim.patch Attaching a preliminary patch for any early feedback. > Schedulers should allow resource requests of different sizes at the same > priority and location > -- > > Key: YARN-314 > URL: https://issues.apache.org/jira/browse/YARN-314 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Affects Versions: 2.0.2-alpha >Reporter: Sandy Ryza >Assignee: Karthik Kambatla > Fix For: 2.6.0 > > Attachments: yarn-314-prelim.patch > > > Currently, resource requests for the same container and locality are expected > to all be the same size. > While it doesn't look like it's needed for apps currently, and can be > circumvented by specifying different priorities if absolutely necessary, it > seems to me that the ability to request containers with different resource > requirements at the same priority level should be there for the future and > for completeness' sake. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
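For context, the restriction being relaxed is that every outstanding request at a given priority and location must carry the same capability. A hedged sketch of what mixed-size requests from an AM would look like using the public record factories; whether the preliminary patch accepts exactly this shape is an assumption:
{code}
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

public class MixedSizeRequestExample {
  public static void main(String[] args) {
    Priority prio = Priority.newInstance(1);
    // Two requests at the same priority and location (ANY) but with different
    // container sizes, which schedulers currently expect to be uniform.
    ResourceRequest small = ResourceRequest.newInstance(
        prio, ResourceRequest.ANY, Resource.newInstance(1024, 1), 5);
    ResourceRequest large = ResourceRequest.newInstance(
        prio, ResourceRequest.ANY, Resource.newInstance(4096, 4), 2);
    System.out.println(small);
    System.out.println(large);
  }
}
{code}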
[jira] [Commented] (YARN-2566) IOException happen in startLocalizer of DefaultContainerExecutor due to not enough disk space for the first localDir.
[ https://issues.apache.org/jira/browse/YARN-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162216#comment-14162216 ] Vinod Kumar Vavilapalli commented on YARN-2566: --- I am surprised YARN-1781 didn't take care of this /cc [~vvasudev]. IAC, instead of creating yet another algorithm, this should plug into the good-local-dirs added via YARN-1781. It can simply take in the one (first or randomized) of the good-local-dirs instead of the first in the overall list. That should address this issue. > IOException happen in startLocalizer of DefaultContainerExecutor due to not > enough disk space for the first localDir. > - > > Key: YARN-2566 > URL: https://issues.apache.org/jira/browse/YARN-2566 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2566.000.patch, YARN-2566.001.patch, > YARN-2566.002.patch, YARN-2566.003.patch > > > startLocalizer in DefaultContainerExecutor will only use the first localDir > to copy the token file, if the copy is failed for first localDir due to not > enough disk space in the first localDir, the localization will be failed even > there are plenty of disk space in other localDirs. We see the following error > for this case: > {code} > 2014-09-13 23:33:25,171 WARN > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Unable to > create app directory > /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 > java.io.IOException: mkdir of > /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 failed > at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1062) > at > org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:157) > at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:721) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:717) > at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) > at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:717) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createDir(DefaultContainerExecutor.java:426) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createAppDirs(DefaultContainerExecutor.java:522) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:94) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:987) > 2014-09-13 23:33:25,185 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Localizer failed > java.io.FileNotFoundException: File > file:/hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 > does not exist > at > org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:511) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:724) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:501) > at > org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:111) > at > org.apache.hadoop.fs.DelegateToFileSystem.createInternal(DelegateToFileSystem.java:76) > at > org.apache.hadoop.fs.ChecksumFs$ChecksumFSOutputSummer.(ChecksumFs.java:344) > at 
org.apache.hadoop.fs.ChecksumFs.createInternal(ChecksumFs.java:390) > at > org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:577) > at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:677) > at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:673) > at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) > at org.apache.hadoop.fs.FileContext.create(FileContext.java:673) > at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:2021) > at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:1963) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:102) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:987) > 2014-09-13 23:33:25,186 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: > Container container_1410663092546_0004_01_01
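A hedged sketch of the selection idea in the comment above: choose one of the known-good local dirs (randomly, or simply the first) instead of always using the first entry of the full list. The helper and its inputs are illustrative assumptions, not the actual DefaultContainerExecutor or DirectoryCollection API:
{code}
import java.util.List;
import java.util.Random;

public class GoodLocalDirPicker {
  private static final Random RANDOM = new Random();

  // Hypothetical: goodLocalDirs is the list of dirs that passed the
  // health/disk-space checks added by YARN-1781.
  static String pickLocalDir(List<String> goodLocalDirs) {
    if (goodLocalDirs == null || goodLocalDirs.isEmpty()) {
      throw new IllegalStateException("No good local dirs available");
    }
    // Randomizing spreads token/app-dir creation across healthy disks;
    // returning get(0) would reproduce the original "first dir" behavior.
    return goodLocalDirs.get(RANDOM.nextInt(goodLocalDirs.size()));
  }
}
{code}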
[jira] [Updated] (YARN-2566) IOException happen in startLocalizer of DefaultContainerExecutor due to not enough disk space for the first localDir.
[ https://issues.apache.org/jira/browse/YARN-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2566: -- Issue Type: Sub-task (was: Bug) Parent: YARN-91 > IOException happen in startLocalizer of DefaultContainerExecutor due to not > enough disk space for the first localDir. > - > > Key: YARN-2566 > URL: https://issues.apache.org/jira/browse/YARN-2566 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2566.000.patch, YARN-2566.001.patch, > YARN-2566.002.patch, YARN-2566.003.patch > > > startLocalizer in DefaultContainerExecutor will only use the first localDir > to copy the token file, if the copy is failed for first localDir due to not > enough disk space in the first localDir, the localization will be failed even > there are plenty of disk space in other localDirs. We see the following error > for this case: > {code} > 2014-09-13 23:33:25,171 WARN > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Unable to > create app directory > /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 > java.io.IOException: mkdir of > /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 failed > at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1062) > at > org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:157) > at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:721) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:717) > at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) > at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:717) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createDir(DefaultContainerExecutor.java:426) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createAppDirs(DefaultContainerExecutor.java:522) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:94) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:987) > 2014-09-13 23:33:25,185 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Localizer failed > java.io.FileNotFoundException: File > file:/hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 > does not exist > at > org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:511) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:724) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:501) > at > org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:111) > at > org.apache.hadoop.fs.DelegateToFileSystem.createInternal(DelegateToFileSystem.java:76) > at > org.apache.hadoop.fs.ChecksumFs$ChecksumFSOutputSummer.(ChecksumFs.java:344) > at org.apache.hadoop.fs.ChecksumFs.createInternal(ChecksumFs.java:390) > at > org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:577) > at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:677) > at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:673) > at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) > at 
org.apache.hadoop.fs.FileContext.create(FileContext.java:673) > at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:2021) > at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:1963) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:102) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:987) > 2014-09-13 23:33:25,186 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: > Container container_1410663092546_0004_01_01 transitioned from > LOCALIZING to LOCALIZATION_FAILED > 2014-09-13 23:33:25,187 WARN > org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=cloudera > OPERATION=Container Finished - Failed TARGET=ContainerImpl > RESULT=FAILURE DESCRIPTION=Container failed with state: LOCALIZATION_FAILED > APPID=applica
[jira] [Updated] (YARN-2633) TestContainerLauncherImpl sometimes fails
[ https://issues.apache.org/jira/browse/YARN-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-2633: Attachment: YARN-2633.patch > TestContainerLauncherImpl sometimes fails > - > > Key: YARN-2633 > URL: https://issues.apache.org/jira/browse/YARN-2633 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: YARN-2633.patch, YARN-2633.patch > > > {noformat} > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.lang.NoSuchMethodException: > org.apache.hadoop.yarn.api.ContainerManagementProtocol$$EnhancerByMockitoWithCGLIB$$25708415.close() > at java.lang.Class.getMethod(Class.java:1665) > at > org.apache.hadoop.yarn.factories.impl.pb.RpcClientFactoryPBImpl.stopClient(RpcClientFactoryPBImpl.java:90) > at > org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.stopProxy(HadoopYarnProtoRPC.java:54) > at > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.mayBeCloseProxy(ContainerManagementProtocolProxy.java:79) > at > org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.kill(ContainerLauncherImpl.java:225) > at > org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.shutdownAllContainers(ContainerLauncherImpl.java:320) > at > org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.serviceStop(ContainerLauncherImpl.java:331) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) > at > org.apache.hadoop.mapreduce.v2.app.launcher.TestContainerLauncherImpl.testMyShutdown(TestContainerLauncherImpl.java:315) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running
[ https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162170#comment-14162170 ] Craig Welch commented on YARN-1857: --- This is an interesting question - that logic predates this change, and I wondered if there were cases when userLimit could somehow be > queueMaxCap, and as I look at the code, surprisingly, I believe so. Userlimit is calculated based on absolute queue values whereas, at least since [YARN-2008], queueMaxCap takes into account actual useage in other queues. So, it is entirely possible for userLimit to be > queueMaxCap due to how they are calculated, at least post [YARN-2008]. I'm not sure if pre-2008 that was possible as well, it may have been, there is a bit to how that was calculated even before that change - in any event, it is the case now. So, as it happens, I don't believe we can do the simplification. > CapacityScheduler headroom doesn't account for other AM's running > - > > Key: YARN-1857 > URL: https://issues.apache.org/jira/browse/YARN-1857 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Affects Versions: 2.3.0 >Reporter: Thomas Graves >Assignee: Chen He >Priority: Critical > Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.3.patch, > YARN-1857.4.patch, YARN-1857.5.patch, YARN-1857.6.patch, YARN-1857.patch, > YARN-1857.patch, YARN-1857.patch > > > Its possible to get an application to hang forever (or a long time) in a > cluster with multiple users. The reason why is that the headroom sent to the > application is based on the user limit but it doesn't account for other > Application masters using space in that queue. So the headroom (user limit - > user consumed) can be > 0 even though the cluster is 100% full because the > other space is being used by application masters from other users. > For instance if you have a cluster with 1 queue, user limit is 100%, you have > multiple users submitting applications. One very large application by user 1 > starts up, runs most of its maps and starts running reducers. other users try > to start applications and get their application masters started but not > tasks. The very large application then gets to the point where it has > consumed the rest of the cluster resources with all reduces. But at this > point it needs to still finish a few maps. The headroom being sent to this > application is only based on the user limit (which is 100% of the cluster > capacity) its using lets say 95% of the cluster for reduces and then other 5% > is being used by other users running application masters. The MRAppMaster > thinks it still has 5% so it doesn't know that it should kill a reduce in > order to run a map. > This can happen in other scenarios also. Generally in a large cluster with > multiple queues this shouldn't cause a hang forever but it could cause the > application to take much longer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
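To make the point concrete: because userLimit can exceed queueMaxCap, the headroom computation has to keep the min against queueMaxCap rather than be simplified away. A toy numeric sketch with made-up memory values:
{code}
public class HeadroomSketch {
  public static void main(String[] args) {
    // Illustrative numbers only: user limit is derived from absolute queue
    // capacity, while queueMaxCap reflects what other queues are actually using.
    long userLimitMb    = 100_000;  // per-user cap from queue configuration
    long queueMaxCapMb  = 60_000;   // usage elsewhere currently caps the queue here
    long userConsumedMb = 55_000;

    long headroomMb = Math.min(userLimitMb, queueMaxCapMb) - userConsumedMb;
    // Dropping the min() would report 45,000 MB of headroom instead of 5,000 MB.
    System.out.println("headroom = " + headroomMb + " MB");
  }
}
{code}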
[jira] [Commented] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS
[ https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162157#comment-14162157 ] Xuan Gong commented on YARN-2583: - Here is the proposal: * Add private configuration for number of logs we can save in NM side. We will delete old logs if the num of logs is larger than this configured value. This is a temporary solution. The configuration will be deleted once we find a more scalable method(will be tracked by YARN-2548) to only write a single log file per LRS. * jhs contacts RM to check whether app is still running or not. If this app is still running, we need to keep the app dir, but remove the old logs. * Remove per-app LogRollingInterval completely and then have NM wake up every so often and upload log files. In this ticket, we can spin off LogRollingInterval from AppLogAggregatorImpl. YARN-2651 will be used to track the changes for other places. * Enforce the minimal log rolling interval. (3600 seconds will be used as minimal value) > Modify the LogDeletionService to support Log aggregation for LRS > > > Key: YARN-2583 > URL: https://issues.apache.org/jira/browse/YARN-2583 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-2583.1.patch, YARN-2583.2.patch, > YARN-2583.3.1.patch, YARN-2583.3.patch > > > Currently, AggregatedLogDeletionService will delete old logs from HDFS. It > will check the cut-off-time, if all logs for this application is older than > this cut-off-time. The app-log-dir from HDFS will be deleted. This will not > work for LRS. We expect a LRS application can keep running for a long time. > Two different scenarios: > 1) If we configured the rollingIntervalSeconds, the new log file will be > always uploaded to HDFS. The number of log files for this application will > become larger and larger. And there is no log files will be deleted. > 2) If we did not configure the rollingIntervalSeconds, the log file can only > be uploaded to HDFS after the application is finished. It is very possible > that the logs are uploaded after the cut-off-time. It will cause problem > because at that time the app-log-dir for this application in HDFS has been > deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
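A hedged sketch of the first bullet, keeping only the newest N rolled log files on the NM side and deleting older ones; the retention count and file layout are illustrative assumptions, not the patch's configuration:
{code}
import java.io.File;
import java.util.Arrays;
import java.util.Comparator;

public class OldLogTrimmer {
  // Hypothetical: keep the newest 'maxLogs' files in 'logDir', delete the rest.
  static void trimOldLogs(File logDir, int maxLogs) {
    File[] files = logDir.listFiles(File::isFile);
    if (files == null || files.length <= maxLogs) {
      return;
    }
    // Newest first by modification time; everything past maxLogs is deleted.
    Arrays.sort(files, Comparator.comparingLong(File::lastModified).reversed());
    for (int i = maxLogs; i < files.length; i++) {
      files[i].delete();
    }
  }
}
{code}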
[jira] [Created] (YARN-2651) Spin off the LogRollingInterval from LogAggregationContext
Xuan Gong created YARN-2651: --- Summary: Spin off the LogRollingInterval from LogAggregationContext Key: YARN-2651 URL: https://issues.apache.org/jira/browse/YARN-2651 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS
[ https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162143#comment-14162143 ] Vinod Kumar Vavilapalli commented on YARN-2583: --- [~xgong], can you please post a short summary of the final proposal here? > Modify the LogDeletionService to support Log aggregation for LRS > > > Key: YARN-2583 > URL: https://issues.apache.org/jira/browse/YARN-2583 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-2583.1.patch, YARN-2583.2.patch, > YARN-2583.3.1.patch, YARN-2583.3.patch > > > Currently, AggregatedLogDeletionService will delete old logs from HDFS. It > will check the cut-off-time, if all logs for this application is older than > this cut-off-time. The app-log-dir from HDFS will be deleted. This will not > work for LRS. We expect a LRS application can keep running for a long time. > Two different scenarios: > 1) If we configured the rollingIntervalSeconds, the new log file will be > always uploaded to HDFS. The number of log files for this application will > become larger and larger. And there is no log files will be deleted. > 2) If we did not configure the rollingIntervalSeconds, the log file can only > be uploaded to HDFS after the application is finished. It is very possible > that the logs are uploaded after the cut-off-time. It will cause problem > because at that time the app-log-dir for this application in HDFS has been > deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade
[ https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162117#comment-14162117 ] Hadoop QA commented on YARN-2331: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673377/YARN-2331.patch against trunk revision 2e789eb. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManagerRecovery org.apache.hadoop.yarn.server.nodemanager.TestContainerManagerWithLCE {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5308//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5308//console This message is automatically generated. > Distinguish shutdown during supervision vs. shutdown for rolling upgrade > > > Key: YARN-2331 > URL: https://issues.apache.org/jira/browse/YARN-2331 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Jason Lowe > Attachments: YARN-2331.patch > > > When the NM is shutting down with restart support enabled there are scenarios > we'd like to distinguish and behave accordingly: > # The NM is running under supervision. In that case containers should be > preserved so the automatic restart can recover them. > # The NM is not running under supervision and a rolling upgrade is not being > performed. In that case the shutdown should kill all containers since it is > unlikely the NM will be restarted in a timely manner to recover them. > # The NM is not running under supervision and a rolling upgrade is being > performed. In that case the shutdown should not kill all containers since a > restart is imminent due to the rolling upgrade and the containers will be > recovered. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2650) TestRMRestart#testRMRestartGetApplicationList sometimes fails in trunk
[ https://issues.apache.org/jira/browse/YARN-2650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated YARN-2650: - Attachment: TestRMRestart.tar.gz Test output > TestRMRestart#testRMRestartGetApplicationList sometimes fails in trunk > -- > > Key: YARN-2650 > URL: https://issues.apache.org/jira/browse/YARN-2650 > Project: Hadoop YARN > Issue Type: Test >Reporter: Ted Yu >Priority: Minor > Attachments: TestRMRestart.tar.gz > > > I got the following failure running on Linux: > {code} > TestRMRestart.testRMRestartGetApplicationList:952 > rMAppManager.logApplicationSummary( > isA(org.apache.hadoop.yarn.api.records.ApplicationId) > ); > Wanted 3 times: > -> at > org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartGetApplicationList(TestRMRestart.java:952) > But was 2 times: > -> at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:64) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
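The failure shape ("Wanted 3 times ... But was 2 times") suggests the third logApplicationSummary() call is delivered asynchronously after the verification runs. One general Mockito technique for that is verifying with a timeout; a hedged sketch against a hypothetical stand-in interface, since whether this is the right fix for this particular test is an open question:
{code}
import static org.mockito.Mockito.any;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.timeout;
import static org.mockito.Mockito.verify;

public class AsyncVerifyExample {
  // Hypothetical stand-in for the summary logger; the real call is
  // RMAppManager#logApplicationSummary driven through the async dispatcher.
  interface SummaryLogger {
    void logApplicationSummary(Object appId);
  }

  public static void main(String[] args) {
    SummaryLogger logger = mock(SummaryLogger.class);
    logger.logApplicationSummary("app_1");
    logger.logApplicationSummary("app_2");
    logger.logApplicationSummary("app_3");
    // Waiting up to 10 seconds for the expected three calls tolerates
    // asynchronous delivery instead of failing when only two have arrived.
    verify(logger, timeout(10000).times(3)).logApplicationSummary(any());
  }
}
{code}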
[jira] [Commented] (YARN-1915) ClientToAMTokenMasterKey should be provided to AM at launch time
[ https://issues.apache.org/jira/browse/YARN-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162091#comment-14162091 ] Jason Lowe commented on YARN-1915: -- Test failure is unrelated, see YARN-2483. > ClientToAMTokenMasterKey should be provided to AM at launch time > > > Key: YARN-1915 > URL: https://issues.apache.org/jira/browse/YARN-1915 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.2.0 >Reporter: Hitesh Shah >Assignee: Jason Lowe >Priority: Critical > Attachments: YARN-1915.patch, YARN-1915v2.patch, YARN-1915v3.patch > > > Currently, the AM receives the key as part of registration. This introduces a > race where a client can connect to the AM when the AM has not received the > key. > Current Flow: > 1) AM needs to start the client listening service in order to get host:port > and send it to the RM as part of registration > 2) RM gets the port info in register() and transitions the app to RUNNING. > Responds back with client secret to AM. > 3) User asks RM for client token. Gets it and pings the AM. AM hasn't > received client secret from RM and so RPC itself rejects the request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2650) TestRMRestart#testRMRestartGetApplicationList sometimes fails in trunk
Ted Yu created YARN-2650: Summary: TestRMRestart#testRMRestartGetApplicationList sometimes fails in trunk Key: YARN-2650 URL: https://issues.apache.org/jira/browse/YARN-2650 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Priority: Minor I got the following failure running on Linux: {code} TestRMRestart.testRMRestartGetApplicationList:952 rMAppManager.logApplicationSummary( isA(org.apache.hadoop.yarn.api.records.ApplicationId) ); Wanted 3 times: -> at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartGetApplicationList(TestRMRestart.java:952) But was 2 times: -> at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:64) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1915) ClientToAMTokenMasterKey should be provided to AM at launch time
[ https://issues.apache.org/jira/browse/YARN-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162076#comment-14162076 ] Hadoop QA commented on YARN-1915: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673366/YARN-1915v3.patch against trunk revision 2e789eb. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5306//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5306//console This message is automatically generated. > ClientToAMTokenMasterKey should be provided to AM at launch time > > > Key: YARN-1915 > URL: https://issues.apache.org/jira/browse/YARN-1915 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.2.0 >Reporter: Hitesh Shah >Assignee: Jason Lowe >Priority: Critical > Attachments: YARN-1915.patch, YARN-1915v2.patch, YARN-1915v3.patch > > > Currently, the AM receives the key as part of registration. This introduces a > race where a client can connect to the AM when the AM has not received the > key. > Current Flow: > 1) AM needs to start the client listening service in order to get host:port > and send it to the RM as part of registration > 2) RM gets the port info in register() and transitions the app to RUNNING. > Responds back with client secret to AM. > 3) User asks RM for client token. Gets it and pings the AM. AM hasn't > received client secret from RM and so RPC itself rejects the request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade
[ https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-2331: - Attachment: YARN-2331.patch In the interest of getting something done for this in time for 2.6, here's a patch that adds a conf to tell the NM whether or not it's supervised. If supervised then it is expected to be quickly restarted on shutdown and will not kill containers. If unsupervised then it will kill containers on shutdown since it does not expect to be restarted in a timely manner. > Distinguish shutdown during supervision vs. shutdown for rolling upgrade > > > Key: YARN-2331 > URL: https://issues.apache.org/jira/browse/YARN-2331 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Jason Lowe > Attachments: YARN-2331.patch > > > When the NM is shutting down with restart support enabled there are scenarios > we'd like to distinguish and behave accordingly: > # The NM is running under supervision. In that case containers should be > preserved so the automatic restart can recover them. > # The NM is not running under supervision and a rolling upgrade is not being > performed. In that case the shutdown should kill all containers since it is > unlikely the NM will be restarted in a timely manner to recover them. > # The NM is not running under supervision and a rolling upgrade is being > performed. In that case the shutdown should not kill all containers since a > restart is imminent due to the rolling upgrade and the containers will be > recovered. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
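A hedged sketch of how the shutdown path could branch on such a flag; the configuration key name and helper are assumptions for illustration, not necessarily what the attached patch defines:
{code}
import org.apache.hadoop.conf.Configuration;

public class SupervisedShutdownSketch {
  // Hypothetical key name; the real one is whatever the patch defines.
  static final String NM_SUPERVISED_KEY = "yarn.nodemanager.recovery.supervised";

  static boolean shouldKillContainersOnShutdown(Configuration conf) {
    boolean supervised = conf.getBoolean(NM_SUPERVISED_KEY, false);
    // A supervised NM expects a quick restart that will recover containers,
    // so it leaves them running; an unsupervised NM kills them on shutdown.
    return !supervised;
  }
}
{code}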
[jira] [Updated] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-796: Attachment: YARN-796.node-label.consolidate.14.patch > Allow for (admin) labels on nodes and resource-requests > --- > > Key: YARN-796 > URL: https://issues.apache.org/jira/browse/YARN-796 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.4.1 >Reporter: Arun C Murthy >Assignee: Wangda Tan > Attachments: LabelBasedScheduling.pdf, > Node-labels-Requirements-Design-doc-V1.pdf, > Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf, > YARN-796.node-label.consolidate.1.patch, > YARN-796.node-label.consolidate.10.patch, > YARN-796.node-label.consolidate.11.patch, > YARN-796.node-label.consolidate.12.patch, > YARN-796.node-label.consolidate.13.patch, > YARN-796.node-label.consolidate.14.patch, > YARN-796.node-label.consolidate.2.patch, > YARN-796.node-label.consolidate.3.patch, > YARN-796.node-label.consolidate.4.patch, > YARN-796.node-label.consolidate.5.patch, > YARN-796.node-label.consolidate.6.patch, > YARN-796.node-label.consolidate.7.patch, > YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, > YARN-796.patch, YARN-796.patch4 > > > It will be useful for admins to specify labels for nodes. Examples of labels > are OS, processor architecture etc. > We should expose these labels and allow applications to specify labels on > resource-requests. > Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1915) ClientToAMTokenMasterKey should be provided to AM at launch time
[ https://issues.apache.org/jira/browse/YARN-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-1915: - Attachment: YARN-1915v3.patch Refreshed patch to latest trunk. [~vinodkv] could you comment? I fully agree with Hitesh that the current patch is a stop-gap at best. However there's some confusion as to how the client token master key should be sent to the RM (e.g.: via container credentials, via the current method, etc.). The original env variable approach apparently is problematic on Windows per YARN-610. If we won't have time to develop the best fix for 2.6 then I'd like to see something like this patch put in to improve things in the interim. > ClientToAMTokenMasterKey should be provided to AM at launch time > > > Key: YARN-1915 > URL: https://issues.apache.org/jira/browse/YARN-1915 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.2.0 >Reporter: Hitesh Shah >Assignee: Jason Lowe >Priority: Critical > Attachments: YARN-1915.patch, YARN-1915v2.patch, YARN-1915v3.patch > > > Currently, the AM receives the key as part of registration. This introduces a > race where a client can connect to the AM when the AM has not received the > key. > Current Flow: > 1) AM needs to start the client listening service in order to get host:port > and send it to the RM as part of registration > 2) RM gets the port info in register() and transitions the app to RUNNING. > Responds back with client secret to AM. > 3) User asks RM for client token. Gets it and pings the AM. AM hasn't > received client secret from RM and so RPC itself rejects the request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2615) ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended fields
[ https://issues.apache.org/jira/browse/YARN-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161969#comment-14161969 ] Hudson commented on YARN-2615: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1919 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1919/]) YARN-2615. Changed ClientToAMTokenIdentifier/RM(Timeline)DelegationTokenIdentifier to use protobuf as payload. Contributed by Junping Du (jianhe: rev ea26cc0b4ac02b8af686dfda80f540e5aa70c358) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/proto/test_client_tokens.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/ClientToAMTokenIdentifierForTest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/YARNDelegationTokenIdentifier.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/proto/server/yarn_security_token.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/AMRMTokenIdentifier.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMTokens.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/RMDelegationTokenIdentifier.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/ContainerTokenIdentifier.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestClientToAMTokens.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/security/TestYARNTokenIdentifier.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/TimelineDelegationTokenIdentifier.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/NMTokenIdentifier.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/ClientToAMTokenIdentifier.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/RMDelegationTokenIdentifierForTest.java * hadoop-yarn-project/CHANGES.txt > ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended > fields > > > Key: YARN-2615 > URL: https://issues.apache.org/jira/browse/YARN-2615 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Junping Du >Assignee: Junping Du >Priority: Blocker > Fix For: 2.6.0 > > Attachments: YARN-2615-v2.patch, YARN-2615-v3.patch, > YARN-2615-v4.patch, YARN-2615-v5.patch, YARN-2615.patch > > > As three TokenIdentifiers get updated in YARN-668, ClientToAMTokenIdentifier > and DelegationTokenIdentifier 
should also be updated in the same way to allow > fields get extended in future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1051) YARN Admission Control/Planner: enhancing the resource allocation model with time.
[ https://issues.apache.org/jira/browse/YARN-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161971#comment-14161971 ] Hudson commented on YARN-1051: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1919 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1919/]) Move YARN-1051 to 2.6 (cdouglas: rev 8380ca37237a21638e1bcad0dd0e4c7e9f1a1786) * hadoop-yarn-project/CHANGES.txt > YARN Admission Control/Planner: enhancing the resource allocation model with > time. > -- > > Key: YARN-1051 > URL: https://issues.apache.org/jira/browse/YARN-1051 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler, resourcemanager, scheduler >Reporter: Carlo Curino >Assignee: Carlo Curino > Fix For: 2.6.0 > > Attachments: YARN-1051-design.pdf, YARN-1051.1.patch, > YARN-1051.patch, curino_MSR-TR-2013-108.pdf, socc14-paper15.pdf, > techreport.pdf > > > In this umbrella JIRA we propose to extend the YARN RM to handle time > explicitly, allowing users to "reserve" capacity over time. This is an > important step towards SLAs, long-running services, workflows, and helps for > gang scheduling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2644) Recalculate headroom more frequently to keep it accurate
[ https://issues.apache.org/jira/browse/YARN-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161970#comment-14161970 ] Hudson commented on YARN-2644: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1919 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1919/]) YARN-2644. Fixed CapacityScheduler to return up-to-date headroom when AM allocates. Contributed by Craig Welch (jianhe: rev 519e5a7dd2bd540105434ec3c8939b68f6c024f8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityHeadroomProvider.java * hadoop-yarn-project/CHANGES.txt > Recalculate headroom more frequently to keep it accurate > > > Key: YARN-2644 > URL: https://issues.apache.org/jira/browse/YARN-2644 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Craig Welch >Assignee: Craig Welch > Fix For: 2.6.0 > > Attachments: YARN-2644.11.patch, YARN-2644.14.patch, > YARN-2644.15.patch, YARN-2644.15.patch > > > See parent (1198) for more detail - this specifically covers calculating the > headroom more frequently, to cover the cases where changes have occurred > which impact headroom but which are not reflected due to an application not > being updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2644) Recalculate headroom more frequently to keep it accurate
[ https://issues.apache.org/jira/browse/YARN-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161913#comment-14161913 ] Hudson commented on YARN-2644: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1894 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1894/]) YARN-2644. Fixed CapacityScheduler to return up-to-date headroom when AM allocates. Contributed by Craig Welch (jianhe: rev 519e5a7dd2bd540105434ec3c8939b68f6c024f8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityHeadroomProvider.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java > Recalculate headroom more frequently to keep it accurate > > > Key: YARN-2644 > URL: https://issues.apache.org/jira/browse/YARN-2644 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Craig Welch >Assignee: Craig Welch > Fix For: 2.6.0 > > Attachments: YARN-2644.11.patch, YARN-2644.14.patch, > YARN-2644.15.patch, YARN-2644.15.patch > > > See parent (1198) for more detail - this specifically covers calculating the > headroom more frequently, to cover the cases where changes have occurred > which impact headroom but which are not reflected due to an application not > being updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1051) YARN Admission Control/Planner: enhancing the resource allocation model with time.
[ https://issues.apache.org/jira/browse/YARN-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161914#comment-14161914 ] Hudson commented on YARN-1051: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1894 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1894/]) Move YARN-1051 to 2.6 (cdouglas: rev 8380ca37237a21638e1bcad0dd0e4c7e9f1a1786) * hadoop-yarn-project/CHANGES.txt > YARN Admission Control/Planner: enhancing the resource allocation model with > time. > -- > > Key: YARN-1051 > URL: https://issues.apache.org/jira/browse/YARN-1051 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler, resourcemanager, scheduler >Reporter: Carlo Curino >Assignee: Carlo Curino > Fix For: 2.6.0 > > Attachments: YARN-1051-design.pdf, YARN-1051.1.patch, > YARN-1051.patch, curino_MSR-TR-2013-108.pdf, socc14-paper15.pdf, > techreport.pdf > > > In this umbrella JIRA we propose to extend the YARN RM to handle time > explicitly, allowing users to "reserve" capacity over time. This is an > important step towards SLAs, long-running services, workflows, and helps for > gang scheduling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2615) ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended fields
[ https://issues.apache.org/jira/browse/YARN-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161912#comment-14161912 ] Hudson commented on YARN-2615: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1894 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1894/]) YARN-2615. Changed ClientToAMTokenIdentifier/RM(Timeline)DelegationTokenIdentifier to use protobuf as payload. Contributed by Junping Du (jianhe: rev ea26cc0b4ac02b8af686dfda80f540e5aa70c358) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestClientToAMTokens.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/NMTokenIdentifier.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/ClientToAMTokenIdentifierForTest.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/TimelineDelegationTokenIdentifier.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/AMRMTokenIdentifier.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/RMDelegationTokenIdentifier.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/ClientToAMTokenIdentifier.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/RMDelegationTokenIdentifierForTest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/ContainerTokenIdentifier.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/security/TestYARNTokenIdentifier.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMTokens.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/proto/server/yarn_security_token.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/YARNDelegationTokenIdentifier.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/proto/test_client_tokens.proto > ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended > fields > > > Key: YARN-2615 > URL: https://issues.apache.org/jira/browse/YARN-2615 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Junping Du >Assignee: Junping Du >Priority: Blocker > Fix For: 2.6.0 > > Attachments: YARN-2615-v2.patch, YARN-2615-v3.patch, > YARN-2615-v4.patch, YARN-2615-v5.patch, YARN-2615.patch > > > As three TokenIdentifiers get updated in YARN-668, ClientToAMTokenIdentifier > and DelegationTokenIdentifier should 
also be updated in the same way to allow > fields get extended in future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2312) Marking ContainerId#getId as deprecated
[ https://issues.apache.org/jira/browse/YARN-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161902#comment-14161902 ] Hadoop QA commented on YARN-2312: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673331/YARN-2312.6.patch against trunk revision 0fb2735. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 15 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.mapred.pipes.TestPipeApplication org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5304//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5304//console This message is automatically generated. > Marking ContainerId#getId as deprecated > --- > > Key: YARN-2312 > URL: https://issues.apache.org/jira/browse/YARN-2312 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Tsuyoshi OZAWA >Assignee: Tsuyoshi OZAWA > Attachments: YARN-2312-wip.patch, YARN-2312.1.patch, > YARN-2312.2-2.patch, YARN-2312.2-3.patch, YARN-2312.2.patch, > YARN-2312.4.patch, YARN-2312.5.patch, YARN-2312.6.patch > > > {{ContainerId#getId}} will only return partial value of containerId, only > sequence number of container id without epoch, after YARN-2229. We should > mark {{ContainerId#getId}} as deprecated and use > {{ContainerId#getContainerId}} instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
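As a usage illustration of the quoted description, callers migrate from the int-valued getId() to the long-valued getContainerId(); the comment about epoch bits reflects the YARN-2229 encoding and is an interpretation, not text from the patch:
{code}
import org.apache.hadoop.yarn.api.records.ContainerId;

public class ContainerIdMigrationSketch {
  @SuppressWarnings("deprecation")
  static void show(ContainerId cid) {
    long fullId = cid.getContainerId(); // epoch plus sequence number, post YARN-2229
    int seqOnly = cid.getId();          // to be deprecated: sequence number only
    System.out.println(fullId + " vs " + seqOnly);
  }
}
{code}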
[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM fail over
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161826#comment-14161826 ] Hadoop QA commented on YARN-1879: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673336/YARN-1879.23.patch against trunk revision 0fb2735. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5305//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5305//console This message is automatically generated. > Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM > fail over > > > Key: YARN-1879 > URL: https://issues.apache.org/jira/browse/YARN-1879 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Tsuyoshi OZAWA >Priority: Critical > Attachments: YARN-1879.1.patch, YARN-1879.1.patch, > YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, > YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, > YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.19.patch, > YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.20.patch, > YARN-1879.21.patch, YARN-1879.22.patch, YARN-1879.23.patch, > YARN-1879.23.patch, YARN-1879.3.patch, YARN-1879.4.patch, YARN-1879.5.patch, > YARN-1879.6.patch, YARN-1879.7.patch, YARN-1879.8.patch, YARN-1879.9.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2644) Recalculate headroom more frequently to keep it accurate
[ https://issues.apache.org/jira/browse/YARN-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161793#comment-14161793 ] Hudson commented on YARN-2644: -- FAILURE: Integrated in Hadoop-Yarn-trunk #704 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/704/]) YARN-2644. Fixed CapacityScheduler to return up-to-date headroom when AM allocates. Contributed by Craig Welch (jianhe: rev 519e5a7dd2bd540105434ec3c8939b68f6c024f8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityHeadroomProvider.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java > Recalculate headroom more frequently to keep it accurate > > > Key: YARN-2644 > URL: https://issues.apache.org/jira/browse/YARN-2644 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Craig Welch >Assignee: Craig Welch > Fix For: 2.6.0 > > Attachments: YARN-2644.11.patch, YARN-2644.14.patch, > YARN-2644.15.patch, YARN-2644.15.patch > > > See parent (1198) for more detail - this specifically covers calculating the > headroom more frequently, to cover the cases where changes have occurred > which impact headroom but which are not reflected due to an application not > being updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
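For readers following the headroom change above: the key idea is that an application's headroom is computed at read time from current queue state instead of being cached when the application was last updated. The following is a minimal, hypothetical Java sketch of that pattern; the class and field names are illustrative and do not reflect the actual CapacityScheduler or CapacityHeadroomProvider code.
{code}
// Hypothetical sketch of the "headroom provider" pattern: headroom is derived
// from up-to-date queue limit/usage each time it is read, rather than being a
// stale snapshot stored alongside the application.
import java.util.function.LongSupplier;

public class HeadroomProviderSketch {
    private final LongSupplier queueLimitMb; // current queue limit, read lazily
    private final LongSupplier queueUsedMb;  // current queue usage, read lazily

    public HeadroomProviderSketch(LongSupplier queueLimitMb, LongSupplier queueUsedMb) {
        this.queueLimitMb = queueLimitMb;
        this.queueUsedMb = queueUsedMb;
    }

    /** Recomputed on every call, so AM allocate responses see fresh values. */
    public long getHeadroomMb() {
        return Math.max(0L, queueLimitMb.getAsLong() - queueUsedMb.getAsLong());
    }

    public static void main(String[] args) {
        long[] used = {6144L};
        HeadroomProviderSketch provider =
                new HeadroomProviderSketch(() -> 8192L, () -> used[0]);
        System.out.println(provider.getHeadroomMb()); // 2048
        used[0] = 8000L;                              // queue usage changed elsewhere
        System.out.println(provider.getHeadroomMb()); // 192, with no extra bookkeeping
    }
}
{code}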
[jira] [Commented] (YARN-1051) YARN Admission Control/Planner: enhancing the resource allocation model with time.
[ https://issues.apache.org/jira/browse/YARN-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161794#comment-14161794 ] Hudson commented on YARN-1051: -- FAILURE: Integrated in Hadoop-Yarn-trunk #704 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/704/]) Move YARN-1051 to 2.6 (cdouglas: rev 8380ca37237a21638e1bcad0dd0e4c7e9f1a1786) * hadoop-yarn-project/CHANGES.txt > YARN Admission Control/Planner: enhancing the resource allocation model with > time. > -- > > Key: YARN-1051 > URL: https://issues.apache.org/jira/browse/YARN-1051 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler, resourcemanager, scheduler >Reporter: Carlo Curino >Assignee: Carlo Curino > Fix For: 2.6.0 > > Attachments: YARN-1051-design.pdf, YARN-1051.1.patch, > YARN-1051.patch, curino_MSR-TR-2013-108.pdf, socc14-paper15.pdf, > techreport.pdf > > > In this umbrella JIRA we propose to extend the YARN RM to handle time > explicitly, allowing users to "reserve" capacity over time. This is an > important step towards SLAs, long-running services, workflows, and helps for > gang scheduling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2615) ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended fields
[ https://issues.apache.org/jira/browse/YARN-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161792#comment-14161792 ] Hudson commented on YARN-2615: -- FAILURE: Integrated in Hadoop-Yarn-trunk #704 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/704/]) YARN-2615. Changed ClientToAMTokenIdentifier/RM(Timeline)DelegationTokenIdentifier to use protobuf as payload. Contributed by Junping Du (jianhe: rev ea26cc0b4ac02b8af686dfda80f540e5aa70c358) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/proto/server/yarn_security_token.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/NMTokenIdentifier.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/RMDelegationTokenIdentifierForTest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/RMDelegationTokenIdentifier.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/ContainerTokenIdentifier.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/proto/test_client_tokens.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/YARNDelegationTokenIdentifier.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/AMRMTokenIdentifier.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/ClientToAMTokenIdentifierForTest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/ClientToAMTokenIdentifier.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestClientToAMTokens.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/TimelineDelegationTokenIdentifier.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/security/TestYARNTokenIdentifier.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMTokens.java > ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended > fields > > > Key: YARN-2615 > URL: https://issues.apache.org/jira/browse/YARN-2615 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Junping Du >Assignee: Junping Du >Priority: Blocker > Fix For: 2.6.0 > > Attachments: YARN-2615-v2.patch, YARN-2615-v3.patch, > YARN-2615-v4.patch, YARN-2615-v5.patch, YARN-2615.patch > > > As three TokenIdentifiers get updated in YARN-668, ClientToAMTokenIdentifier > and DelegationTokenIdentifier should also 
be updated in the same way to allow > fields to be extended in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
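As background for the YARN-2615 change above: serializing token identifier contents as a protobuf message, rather than a fixed sequence of Writable fields, lets new fields be added later without breaking existing readers. The sketch below illustrates only that extensibility property; it uses a plain key/value encoding as a stand-in because the actual protobuf message definitions from the patch are not reproduced here.
{code}
// Illustration of an extensible, self-describing payload: readers iterate over
// whatever fields are present, so appending a new field does not break them.
// Purely a stand-in for the protobuf payload, not the actual YARN token code.
import java.io.*;
import java.util.*;

public class ExtensiblePayloadSketch {
    static byte[] encode(Map<String, String> fields) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeInt(fields.size());
        for (Map.Entry<String, String> e : fields.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeUTF(e.getValue());
        }
        return bos.toByteArray();
    }

    static Map<String, String> decode(byte[] bytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        int n = in.readInt();
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) {
            fields.put(in.readUTF(), in.readUTF()); // unknown keys are simply carried along
        }
        return fields;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> id = new LinkedHashMap<>();
        id.put("clientName", "client-1");
        id.put("applicationAttemptId", "appattempt_1_0001_000001");
        id.put("newField", "added later without breaking old readers");
        System.out.println(decode(encode(id)));
    }
}
{code}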
[jira] [Commented] (YARN-2579) Both RM's state is Active , but 1 RM is not really active.
[ https://issues.apache.org/jira/browse/YARN-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161781#comment-14161781 ] Rohith commented on YARN-2579: -- Verified manually for the tests below with the help of Eclipse debug points. 1. Call transitionToStandby from the admin service obtaining the RM lock, while RMFatalEventDispatcher waits for the RM lock to transitionToStandby (this issue's scenario). 2. Call transitionToStandby from RMFatalEventDispatcher obtaining the RM lock, while the admin service waits for the RM lock to transitionToStandby. Please review the patch. > Both RM's state is Active , but 1 RM is not really active. > -- > > Key: YARN-2579 > URL: https://issues.apache.org/jira/browse/YARN-2579 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.1 >Reporter: Rohith >Assignee: Rohith > Attachments: YARN-2579.patch > > > I encountered a situation where both RMs' web pages were accessible and > their state displayed as Active, but one RM's ActiveServices were > stopped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
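To make the scenario in the comment above concrete: two independent paths (the admin service and the fatal-event dispatcher) can both attempt to move the RM to standby, so the transition needs to be guarded by a single lock and to be idempotent. The following is a simplified, hypothetical sketch of that guard; it is not the actual ResourceManager code.
{code}
// Hypothetical sketch: whichever caller acquires the lock second finds the RM
// already in STANDBY and returns, instead of stopping active services twice
// or leaving them half-stopped while still reporting Active.
public class RmStateSketch {
    private enum HaState { ACTIVE, STANDBY }
    private HaState state = HaState.ACTIVE;

    public synchronized void transitionToStandby() {
        if (state == HaState.STANDBY) {
            return;                      // already transitioned by the other caller
        }
        stopActiveServices();
        state = HaState.STANDBY;
    }

    private void stopActiveServices() {
        System.out.println("stopping active services");
    }

    public static void main(String[] args) {
        RmStateSketch rm = new RmStateSketch();
        rm.transitionToStandby();        // e.g. admin service
        rm.transitionToStandby();        // e.g. fatal-event dispatcher: no-op
    }
}
{code}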
[jira] [Commented] (YARN-2579) Both RM's state is Active , but 1 RM is not really active.
[ https://issues.apache.org/jira/browse/YARN-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161774#comment-14161774 ] Rohith commented on YARN-2579: -- Considering the 1st approach feasible, I attached a patch. I am still thinking about how to write tests for this. > Both RM's state is Active , but 1 RM is not really active. > -- > > Key: YARN-2579 > URL: https://issues.apache.org/jira/browse/YARN-2579 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.1 >Reporter: Rohith >Assignee: Rohith > Attachments: YARN-2579.patch > > > I encountered a situation where both RMs' web pages were accessible and > their state displayed as Active, but one RM's ActiveServices were > stopped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2579) Both RM's state is Active , but 1 RM is not really active.
[ https://issues.apache.org/jira/browse/YARN-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-2579: - Attachment: YARN-2579.patch > Both RM's state is Active , but 1 RM is not really active. > -- > > Key: YARN-2579 > URL: https://issues.apache.org/jira/browse/YARN-2579 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.1 >Reporter: Rohith >Assignee: Rohith > Attachments: YARN-2579.patch > > > I encountered a situation where both RMs' web pages were accessible and > their state displayed as Active, but one RM's ActiveServices were > stopped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM fail over
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1879: - Attachment: YARN-1879.23.patch The failure of TestClientToAMTokens looks unrelated - it passed in my local run. Let me submit the same patch again. > Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM > fail over > > > Key: YARN-1879 > URL: https://issues.apache.org/jira/browse/YARN-1879 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Tsuyoshi OZAWA >Priority: Critical > Attachments: YARN-1879.1.patch, YARN-1879.1.patch, > YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, > YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, > YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.19.patch, > YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.20.patch, > YARN-1879.21.patch, YARN-1879.22.patch, YARN-1879.23.patch, > YARN-1879.23.patch, YARN-1879.3.patch, YARN-1879.4.patch, YARN-1879.5.patch, > YARN-1879.6.patch, YARN-1879.7.patch, YARN-1879.8.patch, YARN-1879.9.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2312) Marking ContainerId#getId as deprecated
[ https://issues.apache.org/jira/browse/YARN-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2312: - Attachment: YARN-2312.6.patch Thanks Jason and Jian for review. Updated: * Removed unnecessary change in TestTaskAttemptListenerImpl.java - sorry for this, this change was included wrongly. * Defined 0xffL as CONTAINER_ID_BITMASK and exposed it. > Marking ContainerId#getId as deprecated > --- > > Key: YARN-2312 > URL: https://issues.apache.org/jira/browse/YARN-2312 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Tsuyoshi OZAWA >Assignee: Tsuyoshi OZAWA > Attachments: YARN-2312-wip.patch, YARN-2312.1.patch, > YARN-2312.2-2.patch, YARN-2312.2-3.patch, YARN-2312.2.patch, > YARN-2312.4.patch, YARN-2312.5.patch, YARN-2312.6.patch > > > {{ContainerId#getId}} will only return partial value of containerId, only > sequence number of container id without epoch, after YARN-2229. We should > mark {{ContainerId#getId}} as deprecated and use > {{ContainerId#getContainerId}} instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
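For context on the bitmask mentioned in the update above: after YARN-2229 the 64-bit value from getContainerId() carries the epoch in its upper bits and the legacy per-attempt sequence number in its lower bits, so a mask can recover the old getId()-style value. The sketch below assumes a 40-bit split purely for illustration; the exact constant name and width used in the patch may differ.
{code}
// Minimal sketch (not the actual YARN implementation): a bitmask over the
// 64-bit container id separates the epoch (upper bits) from the legacy
// sequence number (lower bits). The 40-bit mask is an assumption for
// illustration only.
public class ContainerIdMaskSketch {
    // Hypothetical mask covering the lower 40 bits of the container id.
    static final long CONTAINER_ID_BITMASK = 0xffffffffffL;

    static long sequenceNumber(long fullContainerId) {
        return fullContainerId & CONTAINER_ID_BITMASK; // old getId()-style value
    }

    static long epoch(long fullContainerId) {
        return fullContainerId >>> 40;                  // remaining upper bits
    }

    public static void main(String[] args) {
        long fullId = (2L << 40) | 7L; // epoch 2, sequence number 7
        System.out.println("sequence=" + sequenceNumber(fullId) + " epoch=" + epoch(fullId));
    }
}
{code}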
[jira] [Commented] (YARN-1458) FairScheduler: Zero weight can lead to livelock
[ https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161724#comment-14161724 ] Tsuyoshi OZAWA commented on YARN-1458: -- Sure, thanks for reply :-) > FairScheduler: Zero weight can lead to livelock > --- > > Key: YARN-1458 > URL: https://issues.apache.org/jira/browse/YARN-1458 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.2.0 > Environment: Centos 2.6.18-238.19.1.el5 X86_64 > hadoop2.2.0 >Reporter: qingwu.fu >Assignee: zhihai xu > Labels: patch > Attachments: YARN-1458.001.patch, YARN-1458.002.patch, > YARN-1458.003.patch, YARN-1458.004.patch, YARN-1458.006.patch, > YARN-1458.alternative0.patch, YARN-1458.alternative1.patch, > YARN-1458.alternative2.patch, YARN-1458.patch, yarn-1458-5.patch, > yarn-1458-7.patch, yarn-1458-8.patch > > Original Estimate: 408h > Remaining Estimate: 408h > > The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when > clients submit lots jobs, it is not easy to reapear. We run the test cluster > for days to reapear it. The output of jstack command on resourcemanager pid: > {code} > "ResourceManager Event Processor" prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 > waiting for monitor entry [0x43aa9000] >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671) > - waiting to lock <0x00070026b6e0> (a > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440) > at java.lang.Thread.run(Thread.java:744) > …… > "FairSchedulerUpdateThread" daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 > runnable [0x433a2000] >java.lang.Thread.State: RUNNABLE > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545) > - locked <0x00070026b6e0> (a > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282) > - locked <0x00070026b6e0> (a > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) > at > 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255) > at java.lang.Thread.run(Thread.java:744) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
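A brief illustration of the YARN-1458 root cause discussed above: fair-share computation searches for a ratio at which the weighted shares consume the cluster, and a zero weight means increasing the ratio never increases the computed usage, so the search cannot make progress while the update thread holds the scheduler lock. The loop below is a deliberately simplified sketch with made-up numbers, not the real ComputeFairShares algorithm.
{code}
// Simplified sketch of why a zero weight can make the share computation spin:
// each schedulable's share grows as weight * ratio, so with all-zero weights
// no ratio ever reaches the cluster size.
public class ZeroWeightSketch {
    static long usedWithRatio(double[] weights, double ratio) {
        long total = 0;
        for (double w : weights) {
            total += (long) (w * ratio); // a zero weight contributes nothing at any ratio
        }
        return total;
    }

    public static void main(String[] args) {
        double[] allZero = {0.0, 0.0};
        long clusterMb = 8192;
        double ratio = 1.0;
        int iterations = 0;
        // With all-zero weights this condition never becomes false; the bound
        // below exists only so the sketch terminates.
        while (usedWithRatio(allZero, ratio) < clusterMb && iterations < 50) {
            ratio *= 2;
            iterations++;
        }
        System.out.println("gave up after " + iterations + " doublings");
    }
}
{code}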
[jira] [Commented] (YARN-2641) improve node decommission latency in RM.
[ https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161664#comment-14161664 ] Hadoop QA commented on YARN-2641: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673311/YARN-2641.001.patch against trunk revision 0fb2735. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5303//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5303//console This message is automatically generated. > improve node decommission latency in RM. > > > Key: YARN-2641 > URL: https://issues.apache.org/jira/browse/YARN-2641 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2641.000.patch, YARN-2641.001.patch > > > improve node decommission latency in RM. > Currently the node decommission only happened after RM received nodeHeartbeat > from the Node Manager. The node heartbeat interval is configurable. The > default value is 1 second. > It will be better to do the decommission during RM Refresh(NodesListManager) > instead of nodeHeartbeat(ResourceTrackerService). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2641) improve node decommission latency in RM.
[ https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161660#comment-14161660 ] Junping Du commented on YARN-2641: -- The idea here sounds interesting. However, the decommission still happens after the heartbeat back to the NM. Am I missing something here? > improve node decommission latency in RM. > > > Key: YARN-2641 > URL: https://issues.apache.org/jira/browse/YARN-2641 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2641.000.patch, YARN-2641.001.patch > > > improve node decommission latency in RM. > Currently the node decommission only happened after RM received nodeHeartbeat > from the Node Manager. The node heartbeat interval is configurable. The > default value is 1 second. > It will be better to do the decommission during RM Refresh(NodesListManager) > instead of nodeHeartbeat(ResourceTrackerService). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
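To illustrate the latency question raised above: with the heartbeat-driven path, a newly excluded node is only marked decommissioned when its next heartbeat arrives (up to one heartbeat interval later), whereas handling the exclude list directly during refreshNodes removes that wait. The class below is a hypothetical sketch of the two paths; the names do not correspond to the actual NodesListManager or ResourceTrackerService code.
{code}
// Hypothetical sketch contrasting heartbeat-driven and refresh-driven
// decommission. Hosts and data structures are illustrative only.
import java.util.*;

public class DecommissionSketch {
    private final Set<String> excludedHosts = new HashSet<>();
    private final Set<String> decommissioned = new HashSet<>();
    private final Set<String> runningNodes = new HashSet<>(Arrays.asList("host1", "host2"));

    /** Heartbeat-driven path: the node is only removed when it next checks in. */
    void onHeartbeat(String host) {
        if (excludedHosts.contains(host)) {
            decommissioned.add(host);
            runningNodes.remove(host);
        }
    }

    /** Refresh-driven path: excluded nodes are removed as soon as the list is reloaded. */
    void refreshNodes(Collection<String> newExcludeList) {
        excludedHosts.clear();
        excludedHosts.addAll(newExcludeList);
        for (String host : new ArrayList<>(runningNodes)) {
            if (excludedHosts.contains(host)) {
                decommissioned.add(host);
                runningNodes.remove(host);
            }
        }
    }

    public static void main(String[] args) {
        DecommissionSketch rm = new DecommissionSketch();
        rm.refreshNodes(Arrays.asList("host2"));
        System.out.println("decommissioned without waiting for a heartbeat: " + rm.decommissioned);
    }
}
{code}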
[jira] [Commented] (YARN-2641) improve node decommission latency in RM.
[ https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161658#comment-14161658 ] Hadoop QA commented on YARN-2641: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673309/YARN-2641.000.patch against trunk revision 0fb2735. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5302//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5302//console This message is automatically generated. > improve node decommission latency in RM. > > > Key: YARN-2641 > URL: https://issues.apache.org/jira/browse/YARN-2641 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2641.000.patch, YARN-2641.001.patch > > > improve node decommission latency in RM. > Currently the node decommission only happened after RM received nodeHeartbeat > from the Node Manager. The node heartbeat interval is configurable. The > default value is 1 second. > It will be better to do the decommission during RM Refresh(NodesListManager) > instead of nodeHeartbeat(ResourceTrackerService). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2641) improve node decommission latency in RM.
[ https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2641: Attachment: YARN-2641.001.patch > improve node decommission latency in RM. > > > Key: YARN-2641 > URL: https://issues.apache.org/jira/browse/YARN-2641 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2641.000.patch, YARN-2641.001.patch > > > improve node decommission latency in RM. > Currently the node decommission only happened after RM received nodeHeartbeat > from the Node Manager. The node heartbeat interval is configurable. The > default value is 1 second. > It will be better to do the decommission during RM Refresh(NodesListManager) > instead of nodeHeartbeat(ResourceTrackerService). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2641) improve node decommission latency in RM.
[ https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2641: Attachment: (was: YARN-2641.001.patch) > improve node decommission latency in RM. > > > Key: YARN-2641 > URL: https://issues.apache.org/jira/browse/YARN-2641 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2641.000.patch > > > improve node decommission latency in RM. > Currently the node decommission only happened after RM received nodeHeartbeat > from the Node Manager. The node heartbeat interval is configurable. The > default value is 1 second. > It will be better to do the decommission during RM Refresh(NodesListManager) > instead of nodeHeartbeat(ResourceTrackerService). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2641) improve node decommission latency in RM.
[ https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2641: Attachment: YARN-2641.001.patch > improve node decommission latency in RM. > > > Key: YARN-2641 > URL: https://issues.apache.org/jira/browse/YARN-2641 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2641.000.patch > > > improve node decommission latency in RM. > Currently the node decommission only happened after RM received nodeHeartbeat > from the Node Manager. The node heartbeat interval is configurable. The > default value is 1 second. > It will be better to do the decommission during RM Refresh(NodesListManager) > instead of nodeHeartbeat(ResourceTrackerService). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2641) improve node decommission latency in RM.
[ https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2641: Attachment: (was: YARN-2641.001.patch) > improve node decommission latency in RM. > > > Key: YARN-2641 > URL: https://issues.apache.org/jira/browse/YARN-2641 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2641.000.patch > > > improve node decommission latency in RM. > Currently the node decommission only happened after RM received nodeHeartbeat > from the Node Manager. The node heartbeat interval is configurable. The > default value is 1 second. > It will be better to do the decommission during RM Refresh(NodesListManager) > instead of nodeHeartbeat(ResourceTrackerService). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2641) improve node decommission latency in RM.
[ https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2641: Attachment: YARN-2641.001.patch > improve node decommission latency in RM. > > > Key: YARN-2641 > URL: https://issues.apache.org/jira/browse/YARN-2641 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2641.000.patch > > > improve node decommission latency in RM. > Currently the node decommission only happened after RM received nodeHeartbeat > from the Node Manager. The node heartbeat interval is configurable. The > default value is 1 second. > It will be better to do the decommission during RM Refresh(NodesListManager) > instead of nodeHeartbeat(ResourceTrackerService). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2641) improve node decommission latency in RM.
[ https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2641: Attachment: YARN-2641.000.patch > improve node decommission latency in RM. > > > Key: YARN-2641 > URL: https://issues.apache.org/jira/browse/YARN-2641 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2641.000.patch > > > improve node decommission latency in RM. > Currently the node decommission only happened after RM received nodeHeartbeat > from the Node Manager. The node heartbeat interval is configurable. The > default value is 1 second. > It will be better to do the decommission during RM Refresh(NodesListManager) > instead of nodeHeartbeat(ResourceTrackerService). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2641) improve node decommission latency in RM.
[ https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2641: Attachment: (was: YARN-2641.000.patch) > improve node decommission latency in RM. > > > Key: YARN-2641 > URL: https://issues.apache.org/jira/browse/YARN-2641 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > > improve node decommission latency in RM. > Currently the node decommission only happened after RM received nodeHeartbeat > from the Node Manager. The node heartbeat interval is configurable. The > default value is 1 second. > It will be better to do the decommission during RM Refresh(NodesListManager) > instead of nodeHeartbeat(ResourceTrackerService). -- This message was sent by Atlassian JIRA (v6.3.4#6332)