[jira] [Created] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart
Bikas Saha created YARN-1366: Summary: ApplicationMasterService should Resync with the AM upon allocate call after restart Key: YARN-1366 URL: https://issues.apache.org/jira/browse/YARN-1366 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha The ApplicationMasterService currently sends a resync response to which the AM responds by shutting down. The AM behavior is expected to change to resyncing with the RM instead. Resync means resetting the allocate RPC sequence number to 0, after which the AM should send its entire outstanding request to the RM. Note that if the AM is making its first allocate call to the RM then things should proceed as normal without needing a resync. The RM will return all containers that have completed since the RM last synced with the AM. Some container completions may be reported more than once. -- This message was sent by Atlassian JIRA (v6.1#6144)
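To make the intended AM-side behavior concrete, here is a minimal sketch of an allocate loop that honors the resync command, assuming the 2.x ApplicationMasterProtocol/AMCommand records; the bookkeeping fields (lastResponseId, outstandingAsks, pendingReleases) are illustrative, not actual AMRMClient internals.
{code:java}
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.yarn.api.ApplicationMasterProtocol;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateRequest;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.AMCommand;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

public class ResyncingAllocator {
  private int lastResponseId = 0;
  private final List<ResourceRequest> outstandingAsks =
      new ArrayList<ResourceRequest>();
  private final List<ContainerId> pendingReleases =
      new ArrayList<ContainerId>();

  public AllocateResponse allocate(ApplicationMasterProtocol rm,
      float progress) throws Exception {
    AllocateRequest request = AllocateRequest.newInstance(
        lastResponseId, progress, outstandingAsks, pendingReleases, null);
    AllocateResponse response = rm.allocate(request);
    if (response.getAMCommand() == AMCommand.AM_RESYNC) {
      // Resync: reset the RPC sequence number to 0 and resend the entire
      // outstanding request instead of shutting down.
      lastResponseId = 0;
      request = AllocateRequest.newInstance(
          lastResponseId, progress, outstandingAsks, pendingReleases, null);
      response = rm.allocate(request);
    }
    lastResponseId = response.getResponseId();
    // Completed containers may be reported more than once across a resync,
    // so callers must tolerate duplicates in getCompletedContainersStatuses().
    return response;
  }
}
{code}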
[jira] [Created] (YARN-1367) After restart NM should resync with the RM without killing containers
Bikas Saha created YARN-1367: Summary: After restart NM should resync with the RM without killing containers Key: YARN-1367 URL: https://issues.apache.org/jira/browse/YARN-1367 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha After RM restart, the RM sends a resync response to NMs that heartbeat to it. Upon receiving the resync response, the NM kills all containers and re-registers with the RM. The NM should be changed to not kill the containers and instead inform the RM about all currently running containers, including their allocations etc. After re-registering, the NM should send all pending container completions to the RM as usual. -- This message was sent by Atlassian JIRA (v6.1#6144)
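A hedged sketch of the proposed NM-side change follows. The NodeStatusUpdater-style fields, the container-status list on the re-registration request, and the newInstance signature are all assumptions about where this would land, not existing API.
{code:java}
// Sketch only: getRunningContainerStatuses() and setContainerStatuses()
// are assumed methods, named here for illustration.
void onResyncFromRM() throws Exception {
  // Old behavior: kill all containers, then re-register.
  // Proposed behavior: keep containers running and report them, with
  // their allocations, when re-registering.
  List<ContainerStatus> running =
      containerManager.getRunningContainerStatuses();
  RegisterNodeManagerRequest request = RegisterNodeManagerRequest
      .newInstance(nodeId, httpPort, totalResource);
  request.setContainerStatuses(running); // assumed new field for resync
  resourceTracker.registerNodeManager(request);
  // Pending container completions then flow to the RM through the normal
  // heartbeat path, as before.
}
{code}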
[jira] [Created] (YARN-1368) RM should populate running container allocation information from NM resync
Bikas Saha created YARN-1368: Summary: RM should populate running container allocation information from NM resync Key: YARN-1368 URL: https://issues.apache.org/jira/browse/YARN-1368 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha YARN-1367 adds support for the NM to tell the RM about all currently running containers upon registration. The RM needs to send this information to the schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover the current allocation state of the cluster. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1368) RM should populate running container allocation information from NM resync
[ https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-1368: - Issue Type: Sub-task (was: Bug) Parent: YARN-556 RM should populate running container allocation information from NM resync -- Key: YARN-1368 URL: https://issues.apache.org/jira/browse/YARN-1368 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha YARN-1367 adds support for the NM to tell the RM about all currently running containers upon registration. The RM needs to send this information to the schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover the current allocation state of the cluster. -- This message was sent by Atlassian JIRA (v6.1#6144)
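One way to picture YARN-1368 is as an extension of the existing NODE_ADDED scheduler event; the containerReports field below is the assumed addition, since the current event carries only the RMNode.
{code:java}
public class NodeAddedSchedulerEvent extends SchedulerEvent {
  private final RMNode rmNode;
  // Assumed addition: containers the NM reported as running when it
  // re-registered after an RM restart.
  private final List<ContainerStatus> containerReports;

  public NodeAddedSchedulerEvent(RMNode rmNode,
      List<ContainerStatus> containerReports) {
    super(SchedulerEventType.NODE_ADDED);
    this.rmNode = rmNode;
    this.containerReports = containerReports;
  }

  public RMNode getAddedRMNode() {
    return rmNode;
  }

  public List<ContainerStatus> getContainerReports() {
    return containerReports;
  }
}
{code}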
[jira] [Commented] (YARN-1306) Clean up hadoop-sls sample-conf according to YARN-1228
[ https://issues.apache.org/jira/browse/YARN-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13808799#comment-13808799 ] Hudson commented on YARN-1306: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4671 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4671/]) YARN-1306. Clean up hadoop-sls sample-conf according to YARN-1228 (Wei Yan via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1536982) * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/capacity-scheduler.xml * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/fair-scheduler-allocation.xml * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/fair-scheduler.xml * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/yarn-site.xml * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt Clean up hadoop-sls sample-conf according to YARN-1228 -- Key: YARN-1306 URL: https://issues.apache.org/jira/browse/YARN-1306 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-1306.patch Move fair scheduler allocations configuration to fair-scheduler.xml, and move all scheduler stuffs to yarn-site.xml -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1370) Fair scheduler to re-populate container allocation state
Bikas Saha created YARN-1370: Summary: Fair scheduler to re-populate container allocation state Key: YARN-1370 URL: https://issues.apache.org/jira/browse/YARN-1370 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha YARN-1367 and YARN-1368 enable the NM to tell the RM about currently running containers and the RM will pass this information to the schedulers along with the node information. The schedulers are currently already informed about previously running apps when the app data is recovered from the store. The scheduler is expected to be able to repopulate its allocation state from the above 2 sources of information. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1228) Clean up Fair Scheduler configuration loading
[ https://issues.apache.org/jira/browse/YARN-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13808800#comment-13808800 ] Hudson commented on YARN-1228: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4671 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4671/]) YARN-1306. Clean up hadoop-sls sample-conf according to YARN-1228 (Wei Yan via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1536982) * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/capacity-scheduler.xml * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/fair-scheduler-allocation.xml * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/fair-scheduler.xml * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/yarn-site.xml * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt Clean up Fair Scheduler configuration loading - Key: YARN-1228 URL: https://issues.apache.org/jira/browse/YARN-1228 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.1.1-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.2.0 Attachments: YARN-1228-1.patch, YARN-1228-2.patch, YARN-1228.patch Currently the Fair Scheduler is configured in two ways * An allocations file that has a different format than the standard Hadoop configuration file, which makes it easier to specify hierarchical objects like queues and their properties. * With properties like yarn.scheduler.fair.max.assign that are specified in the standard Hadoop configuration format. The standard and default way of configuring it is to use fair-scheduler.xml as the allocations file and to put the yarn.scheduler properties in yarn-site.xml. It is also possible to specify a different file as the allocations file, and to place the yarn.scheduler properties in fair-scheduler.xml, which will be interpreted as in the standard Hadoop configuration format. This flexibility is both confusing and unnecessary. Additionally, the allocation file is loaded as fair-scheduler.xml from the classpath if it is not specified, but is loaded as a File if it is. This causes two problems 1. We see different behavior when not setting the yarn.scheduler.fair.allocation.file, and setting it to fair-scheduler.xml, which is its default. 2. Classloaders may choose to cache resources, which can break the reload logic when yarn.scheduler.fair.allocation.file is not specified. We should never allow the yarn.scheduler properties to go into fair-scheduler.xml. And we should always load the allocations file as a file, not as a resource on the classpath. To preserve existing behavior and allow loading files from the classpath, we can look for files on the classpath, but strip of their scheme and interpret them as Files. -- This message was sent by Atlassian JIRA (v6.1#6144)
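A minimal sketch of the loading rule proposed in the last paragraph of the description, keyed off the yarn.scheduler.fair.allocation.file property named above; the method is illustrative rather than the actual FairScheduler code.
{code:java}
File getAllocationFile(Configuration conf) {
  String allocFilePath = conf.get("yarn.scheduler.fair.allocation.file",
      "fair-scheduler.xml");
  File allocFile = new File(allocFilePath);
  if (!allocFile.isAbsolute()) {
    // Preserve existing behavior by looking the name up on the classpath,
    // but strip off the scheme and interpret the result as a plain File so
    // reload logic sees filesystem timestamps, not a cached classloader
    // resource.
    URL url = Thread.currentThread().getContextClassLoader()
        .getResource(allocFilePath);
    if (url == null) {
      throw new RuntimeException(allocFilePath
          + " not found on the classpath");
    }
    if (!"file".equalsIgnoreCase(url.getProtocol())) {
      throw new RuntimeException("Allocation file " + url
          + " found on the classpath is not on the local filesystem.");
    }
    allocFile = new File(url.getPath());
  }
  return allocFile;
}
{code}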
[jira] [Created] (YARN-1369) Capacity scheduler to re-populate container allocation information
Bikas Saha created YARN-1369: Summary: Capacity scheduler to re-populate container allocation information Key: YARN-1369 URL: https://issues.apache.org/jira/browse/YARN-1369 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha YARN-1367 and YARN-1368 enable the NM to tell the RM about currently running containers and the RM will pass this information to the schedulers along with the node information. The schedulers are currently already informed about previously running apps when the app data is recovered from the store. The scheduler is expected to be able to repopulate its allocation state from the above 2 sources of information. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1371) FIFO scheduler to re-populate container allocation state
Bikas Saha created YARN-1371: Summary: FIFO scheduler to re-populate container allocation state Key: YARN-1371 URL: https://issues.apache.org/jira/browse/YARN-1371 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha YARN-1367 and YARN-1368 enable the NM to tell the RM about currently running containers and the RM will pass this information to the schedulers along with the node information. The schedulers are currently already informed about previously running apps when the app data is recovered from the store. The scheduler is expected to be able to repopulate its allocation state from the above 2 sources of information. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1369) Capacity scheduler to re-populate container allocation state
[ https://issues.apache.org/jira/browse/YARN-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-1369: - Summary: Capacity scheduler to re-populate container allocation state (was: Capacity scheduler to re-populate container allocation information) Capacity scheduler to re-populate container allocation state Key: YARN-1369 URL: https://issues.apache.org/jira/browse/YARN-1369 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha YARN-1367 and YARN-1368 enable the NM to tell the RM about currently running containers and the RM will pass this information to the schedulers along with the node information. The schedulers are currently already informed about previously running apps when the app data is recovered from the store. The scheduler is expected to be able to repopulate its allocation state from the above 2 sources of information. -- This message was sent by Atlassian JIRA (v6.1#6144)
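For the three scheduler sub-tasks (YARN-1369, YARN-1370, YARN-1371), the recovery step might look roughly like the sketch below; recoverContainer and the attempt lookup are illustrative names, and each scheduler's actual patch would differ.
{code:java}
// Hedged sketch: rebuild allocation state from containers the NM reported
// at re-registration (delivered with the NODE_ADDED event per YARN-1368).
void recoverContainersOnNode(List<ContainerStatus> containerReports,
    RMNode node) {
  for (ContainerStatus report : containerReports) {
    ApplicationAttemptId attemptId =
        report.getContainerId().getApplicationAttemptId();
    SchedulerApplicationAttempt attempt = getApplicationAttempt(attemptId);
    if (attempt == null) {
      // The app already finished or was never recovered; nothing to do.
      continue;
    }
    // Charge the container's resources back to the node and the app's
    // queue, restoring the pre-restart allocation state.
    attempt.recoverContainer(node, report);
  }
}
{code}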
[jira] [Updated] (YARN-1306) Clean up hadoop-sls sample-conf according to YARN-1228
[ https://issues.apache.org/jira/browse/YARN-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1306: - Hadoop Flags: Reviewed Clean up hadoop-sls sample-conf according to YARN-1228 -- Key: YARN-1306 URL: https://issues.apache.org/jira/browse/YARN-1306 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan Fix For: 2.3.0 Attachments: YARN-1306.patch Move fair scheduler allocations configuration to fair-scheduler.xml, and move all scheduler stuffs to yarn-site.xml -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1372) Ensure all completed containers are reported to the AMs across RM restart
Bikas Saha created YARN-1372: Summary: Ensure all completed containers are reported to the AMs across RM restart Key: YARN-1372 URL: https://issues.apache.org/jira/browse/YARN-1372 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Currently the NM informs the RM about completed containers and then removes those containers from the RM notification list. The RM passes on that completed container information to the AM and the AM pulls this data. If the RM dies before the AM pulls this data then the AM may not be able to get this information again. To fix this, the NM should maintain a separate list of completed container notifications sent to the RM. After the AM has pulled the containers from the RM, the RM will inform the NM and the NM can remove the completed containers from this new list. Upon re-registering with the RM (after RM restart), the NM should send the entire list of completed containers to the RM along with any other containers that completed while the RM was dead. This ensures that the RM can inform the AMs about all completed containers. Some container completions may be reported more than once, since the AM may have pulled the container but the RM may die before notifying the NM about the pull. -- This message was sent by Atlassian JIRA (v6.1#6144)
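A sketch of the NM-side bookkeeping this describes; both sets and all method names are illustrative. Completions move from "to report" to "reported but not yet acknowledged" and are dropped only once the RM confirms the AM has pulled them.
{code:java}
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

import org.apache.hadoop.yarn.api.records.ContainerId;

public class CompletedContainerTracker {
  private final Set<ContainerId> toReport = new LinkedHashSet<ContainerId>();
  private final Set<ContainerId> reportedNotAcked =
      new LinkedHashSet<ContainerId>();

  public synchronized void containerCompleted(ContainerId id) {
    toReport.add(id);
  }

  // Completions sent on a normal heartbeat are kept until acknowledged.
  public synchronized List<ContainerId> containersForHeartbeat() {
    List<ContainerId> batch = new ArrayList<ContainerId>(toReport);
    reportedNotAcked.addAll(toReport);
    toReport.clear();
    return batch;
  }

  // Called when the RM reports that the AM has pulled these completions.
  public synchronized void ackFromRM(List<ContainerId> acked) {
    reportedNotAcked.removeAll(acked);
  }

  // On re-registration after an RM restart, resend everything not yet
  // acknowledged; duplicates are possible and the RM/AM must tolerate them.
  public synchronized List<ContainerId> containersForReRegistration() {
    List<ContainerId> all = new ArrayList<ContainerId>(reportedNotAcked);
    all.addAll(toReport);
    return all;
  }
}
{code}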
[jira] [Created] (YARN-1373) Transition RMApp and RMAppAttempt state to RUNNING after restart for recovered running apps
Bikas Saha created YARN-1373: Summary: Transition RMApp and RMAppAttempt state to RUNNING after restart for recovered running apps Key: YARN-1373 URL: https://issues.apache.org/jira/browse/YARN-1373 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Currently the RM moves recovered app attempts to a terminal recovered state and starts a new attempt. Instead, it will have to transition the last attempt to a running state such that it can proceed as normal once the running attempt has resynced with the ApplicationMasterService (YARN-1365 and YARN-1366). If the RM had started the application container before dying then the AM would be up and trying to contact the RM. The RM may have died before launching the container. For this case, the RM should wait for the AM liveness period and issue a kill for the stored master container. It should transition this attempt to some RECOVER_ERROR state and proceed to start a new attempt. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-556) RM Restart phase 2 - Work preserving restart
[ https://issues.apache.org/jira/browse/YARN-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13808815#comment-13808815 ] Bikas Saha commented on YARN-556: - Added some coarse grained tasks based on the attached proposal. More tasks may be added as details get dissected. RM Restart phase 2 - Work preserving restart Key: YARN-556 URL: https://issues.apache.org/jira/browse/YARN-556 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Reporter: Bikas Saha Assignee: Bikas Saha Attachments: Work Preserving RM Restart.pdf YARN-128 covered storing the state needed for the RM to recover critical information. This umbrella jira will track changes needed to recover the running state of the cluster so that work can be preserved across RM restarts. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1068) Add admin support for HA operations
[ https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13808965#comment-13808965 ] Hudson commented on YARN-1068: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #378 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/378/]) YARN-1068. Add admin support for HA operations (Karthik Kambatla via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1536888) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HAAdmin.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineTextInputFormat.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/HAUtil.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMAdminCLI.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMHAServiceTarget.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMHAProtocolService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMServerUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/authorize/RMPolicyProvider.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java Add admin support for HA operations --- Key: YARN-1068 URL: https://issues.apache.org/jira/browse/YARN-1068 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Fix For: 2.3.0 Attachments: yarn-1068-10.patch, yarn-1068-11.patch, yarn-1068-12.patch, yarn-1068-13.patch, yarn-1068-14.patch, yarn-1068-1.patch, yarn-1068-2.patch, yarn-1068-3.patch, yarn-1068-4.patch, yarn-1068-5.patch, yarn-1068-6.patch, yarn-1068-7.patch, yarn-1068-8.patch, yarn-1068-9.patch, YARN-1068.Karthik.patch, yarn-1068-prelim.patch Support HA admin operations to facilitate transitioning the RM to Active and Standby states. 
-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1228) Clean up Fair Scheduler configuration loading
[ https://issues.apache.org/jira/browse/YARN-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13808962#comment-13808962 ] Hudson commented on YARN-1228: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #378 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/378/]) YARN-1306. Clean up hadoop-sls sample-conf according to YARN-1228 (Wei Yan via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1536982) * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/capacity-scheduler.xml * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/fair-scheduler-allocation.xml * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/fair-scheduler.xml * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/yarn-site.xml * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt Clean up Fair Scheduler configuration loading - Key: YARN-1228 URL: https://issues.apache.org/jira/browse/YARN-1228 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.1.1-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.2.0 Attachments: YARN-1228-1.patch, YARN-1228-2.patch, YARN-1228.patch Currently the Fair Scheduler is configured in two ways * An allocations file that has a different format than the standard Hadoop configuration file, which makes it easier to specify hierarchical objects like queues and their properties. * With properties like yarn.scheduler.fair.max.assign that are specified in the standard Hadoop configuration format. The standard and default way of configuring it is to use fair-scheduler.xml as the allocations file and to put the yarn.scheduler properties in yarn-site.xml. It is also possible to specify a different file as the allocations file, and to place the yarn.scheduler properties in fair-scheduler.xml, which will be interpreted as in the standard Hadoop configuration format. This flexibility is both confusing and unnecessary. Additionally, the allocation file is loaded as fair-scheduler.xml from the classpath if it is not specified, but is loaded as a File if it is. This causes two problems 1. We see different behavior when not setting the yarn.scheduler.fair.allocation.file, and setting it to fair-scheduler.xml, which is its default. 2. Classloaders may choose to cache resources, which can break the reload logic when yarn.scheduler.fair.allocation.file is not specified. We should never allow the yarn.scheduler properties to go into fair-scheduler.xml. And we should always load the allocations file as a file, not as a resource on the classpath. To preserve existing behavior and allow loading files from the classpath, we can look for files on the classpath, but strip of their scheme and interpret them as Files. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1306) Clean up hadoop-sls sample-conf according to YARN-1228
[ https://issues.apache.org/jira/browse/YARN-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13808994#comment-13808994 ] Hudson commented on YARN-1306: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1568 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1568/]) YARN-1306. Clean up hadoop-sls sample-conf according to YARN-1228 (Wei Yan via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1536982) * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/capacity-scheduler.xml * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/fair-scheduler-allocation.xml * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/fair-scheduler.xml * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/yarn-site.xml * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt Clean up hadoop-sls sample-conf according to YARN-1228 -- Key: YARN-1306 URL: https://issues.apache.org/jira/browse/YARN-1306 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan Fix For: 2.3.0 Attachments: YARN-1306.patch Move fair scheduler allocations configuration to fair-scheduler.xml, and move all scheduler stuffs to yarn-site.xml -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1228) Clean up Fair Scheduler configuration loading
[ https://issues.apache.org/jira/browse/YARN-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13808995#comment-13808995 ] Hudson commented on YARN-1228: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1568 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1568/]) YARN-1306. Clean up hadoop-sls sample-conf according to YARN-1228 (Wei Yan via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1536982) * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/capacity-scheduler.xml * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/fair-scheduler-allocation.xml * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/fair-scheduler.xml * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/yarn-site.xml * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt Clean up Fair Scheduler configuration loading - Key: YARN-1228 URL: https://issues.apache.org/jira/browse/YARN-1228 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.1.1-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.2.0 Attachments: YARN-1228-1.patch, YARN-1228-2.patch, YARN-1228.patch Currently the Fair Scheduler is configured in two ways * An allocations file that has a different format than the standard Hadoop configuration file, which makes it easier to specify hierarchical objects like queues and their properties. * With properties like yarn.scheduler.fair.max.assign that are specified in the standard Hadoop configuration format. The standard and default way of configuring it is to use fair-scheduler.xml as the allocations file and to put the yarn.scheduler properties in yarn-site.xml. It is also possible to specify a different file as the allocations file, and to place the yarn.scheduler properties in fair-scheduler.xml, which will be interpreted as in the standard Hadoop configuration format. This flexibility is both confusing and unnecessary. Additionally, the allocation file is loaded as fair-scheduler.xml from the classpath if it is not specified, but is loaded as a File if it is. This causes two problems 1. We see different behavior when not setting the yarn.scheduler.fair.allocation.file, and setting it to fair-scheduler.xml, which is its default. 2. Classloaders may choose to cache resources, which can break the reload logic when yarn.scheduler.fair.allocation.file is not specified. We should never allow the yarn.scheduler properties to go into fair-scheduler.xml. And we should always load the allocations file as a file, not as a resource on the classpath. To preserve existing behavior and allow loading files from the classpath, we can look for files on the classpath, but strip of their scheme and interpret them as Files. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1374) Resource Manager fails to start due to ConcurrentModificationException
Devaraj K created YARN-1374: --- Summary: Resource Manager fails to start due to ConcurrentModificationException Key: YARN-1374 URL: https://issues.apache.org/jira/browse/YARN-1374 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.3.0 Reporter: Devaraj K Priority: Blocker Resource Manager is failing to start with the below ConcurrentModificationException. {code:xml} 2013-10-30 20:22:42,371 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list 2013-10-30 20:22:42,376 INFO org.apache.hadoop.service.AbstractService: Service ResourceManager failed in state INITED; cause: java.util.ConcurrentModificationException java.util.ConcurrentModificationException at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372) at java.util.AbstractList$Itr.next(AbstractList.java:343) at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944) 2013-10-30 20:22:42,378 INFO org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: Transitioning to standby 2013-10-30 20:22:42,378 INFO org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: Transitioned to standby 2013-10-30 20:22:42,378 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager java.util.ConcurrentModificationException at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372) at java.util.AbstractList$Itr.next(AbstractList.java:343) at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944) 2013-10-30 20:22:42,379 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG: / SHUTDOWN_MSG: Shutting down ResourceManager at HOST-10-18-40-24/10.18.40.24 / {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1374) Resource Manager fails to start due to ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809024#comment-13809024 ] Devaraj K commented on YARN-1374: - It is occurring when the scheduler monitor is enabled using 'yarn.resourcemanager.scheduler.monitor.enable' configuration. SchedulingMonitor service is getting added to RM services during RM services init which is causing ConcurrentModificationException. SchedulingMonitor service needs to be added to RMActiveServices instead of adding to RM service to avoid this. Resource Manager fails to start due to ConcurrentModificationException -- Key: YARN-1374 URL: https://issues.apache.org/jira/browse/YARN-1374 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.3.0 Reporter: Devaraj K Priority: Blocker Resource Manager is failing to start with the below ConcurrentModificationException. {code:xml} 2013-10-30 20:22:42,371 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list 2013-10-30 20:22:42,376 INFO org.apache.hadoop.service.AbstractService: Service ResourceManager failed in state INITED; cause: java.util.ConcurrentModificationException java.util.ConcurrentModificationException at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372) at java.util.AbstractList$Itr.next(AbstractList.java:343) at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944) 2013-10-30 20:22:42,378 INFO org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: Transitioning to standby 2013-10-30 20:22:42,378 INFO org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: Transitioned to standby 2013-10-30 20:22:42,378 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager java.util.ConcurrentModificationException at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372) at java.util.AbstractList$Itr.next(AbstractList.java:343) at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944) 2013-10-30 20:22:42,379 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG: / SHUTDOWN_MSG: Shutting down ResourceManager at HOST-10-18-40-24/10.18.40.24 / {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
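Devaraj's suggestion, sketched with the wiring assumed (the createPolicies() helper and the SchedulingMonitor constructor arguments are illustrative): create the monitors inside RMActiveServices rather than adding them to the top-level ResourceManager while its own child list is being iterated.
{code:java}
// Inside RMActiveServices.serviceInit(), per the suggestion above.
// YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS corresponds to
// 'yarn.resourcemanager.scheduler.monitor.enable'.
if (conf.getBoolean(YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS, false)) {
  for (SchedulingEditPolicy policy : createPolicies(conf)) {
    // Adding the monitor here, not to the ResourceManager itself, keeps
    // the parent's service list stable during its serviceInit() iteration.
    addService(new SchedulingMonitor(policy));
  }
}
{code}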
[jira] [Created] (YARN-1375) RM logs get filled with scheduler monitor logs when we enable scheduler monitoring
Devaraj K created YARN-1375: --- Summary: RM logs get filled with scheduler monitor logs when we enable scheduler monitoring Key: YARN-1375 URL: https://issues.apache.org/jira/browse/YARN-1375 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.3.0 Reporter: Devaraj K When we enable the scheduler monitor, it fills the RM logs with the same queue states periodically. We should log only when the state differs from the previous one, instead of repeating the same message. {code:xml} 2013-10-30 23:30:08,464 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156008464, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0 2013-10-30 23:30:11,464 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156011464, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0 2013-10-30 23:30:14,465 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156014465, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0 2013-10-30 23:30:17,466 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156017466, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0 2013-10-30 23:30:20,466 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156020466, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0 2013-10-30 23:30:23,467 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156023467, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0 2013-10-30 23:30:26,468 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156026467, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0 2013-10-30 23:30:29,468 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156029468, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0 2013-10-30 23:30:32,469 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156032469, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0 {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
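The fix suggested in the description could be as small as remembering the last state logged; a sketch, with all names illustrative:
{code:java}
// Inside the preemption policy (illustrative): log the QUEUESTATE line
// only when the state part, i.e. everything after the timestamp, differs
// from what was last logged.
private String lastLoggedQueueState;

void logQueueState(long now, String queueStateCsv) {
  if (!queueStateCsv.equals(lastLoggedQueueState)) {
    LOG.info("QUEUESTATE: " + now + ", " + queueStateCsv);
    lastLoggedQueueState = queueStateCsv;
  }
}
{code}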
[jira] [Commented] (YARN-1306) Clean up hadoop-sls sample-conf according to YARN-1228
[ https://issues.apache.org/jira/browse/YARN-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809053#comment-13809053 ] Hudson commented on YARN-1306: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1594 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1594/]) YARN-1306. Clean up hadoop-sls sample-conf according to YARN-1228 (Wei Yan via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1536982) * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/capacity-scheduler.xml * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/fair-scheduler-allocation.xml * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/fair-scheduler.xml * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/yarn-site.xml * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt Clean up hadoop-sls sample-conf according to YARN-1228 -- Key: YARN-1306 URL: https://issues.apache.org/jira/browse/YARN-1306 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan Fix For: 2.3.0 Attachments: YARN-1306.patch Move fair scheduler allocations configuration to fair-scheduler.xml, and move all scheduler stuffs to yarn-site.xml -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1228) Clean up Fair Scheduler configuration loading
[ https://issues.apache.org/jira/browse/YARN-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809054#comment-13809054 ] Hudson commented on YARN-1228: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1594 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1594/]) YARN-1306. Clean up hadoop-sls sample-conf according to YARN-1228 (Wei Yan via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1536982) * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/capacity-scheduler.xml * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/fair-scheduler-allocation.xml * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/fair-scheduler.xml * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/yarn-site.xml * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt Clean up Fair Scheduler configuration loading - Key: YARN-1228 URL: https://issues.apache.org/jira/browse/YARN-1228 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.1.1-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.2.0 Attachments: YARN-1228-1.patch, YARN-1228-2.patch, YARN-1228.patch Currently the Fair Scheduler is configured in two ways * An allocations file that has a different format than the standard Hadoop configuration file, which makes it easier to specify hierarchical objects like queues and their properties. * With properties like yarn.scheduler.fair.max.assign that are specified in the standard Hadoop configuration format. The standard and default way of configuring it is to use fair-scheduler.xml as the allocations file and to put the yarn.scheduler properties in yarn-site.xml. It is also possible to specify a different file as the allocations file, and to place the yarn.scheduler properties in fair-scheduler.xml, which will be interpreted as in the standard Hadoop configuration format. This flexibility is both confusing and unnecessary. Additionally, the allocation file is loaded as fair-scheduler.xml from the classpath if it is not specified, but is loaded as a File if it is. This causes two problems 1. We see different behavior when not setting the yarn.scheduler.fair.allocation.file, and setting it to fair-scheduler.xml, which is its default. 2. Classloaders may choose to cache resources, which can break the reload logic when yarn.scheduler.fair.allocation.file is not specified. We should never allow the yarn.scheduler properties to go into fair-scheduler.xml. And we should always load the allocations file as a file, not as a resource on the classpath. To preserve existing behavior and allow loading files from the classpath, we can look for files on the classpath, but strip of their scheme and interpret them as Files. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1374) Resource Manager fails to start due to ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809085#comment-13809085 ] Steve Loughran commented on YARN-1374: -- If this exists, it's my code. I'd put the list into an unmodifiable list to stop concurrency problems, because I knew the risk of adding children while the list was being iterated existed. It looks like that isn't enough - we need to take a snapshot of the list and then iterate through it. -steve Resource Manager fails to start due to ConcurrentModificationException -- Key: YARN-1374 URL: https://issues.apache.org/jira/browse/YARN-1374 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.3.0 Reporter: Devaraj K Priority: Blocker Resource Manager is failing to start with the below ConcurrentModificationException. {code:xml} 2013-10-30 20:22:42,371 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list 2013-10-30 20:22:42,376 INFO org.apache.hadoop.service.AbstractService: Service ResourceManager failed in state INITED; cause: java.util.ConcurrentModificationException java.util.ConcurrentModificationException at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372) at java.util.AbstractList$Itr.next(AbstractList.java:343) at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944) 2013-10-30 20:22:42,378 INFO org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: Transitioning to standby 2013-10-30 20:22:42,378 INFO org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: Transitioned to standby 2013-10-30 20:22:42,378 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager java.util.ConcurrentModificationException at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372) at java.util.AbstractList$Itr.next(AbstractList.java:343) at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944) 2013-10-30 20:22:42,379 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG: / SHUTDOWN_MSG: Shutting down ResourceManager at HOST-10-18-40-24/10.18.40.24 / {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
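The snapshot Steve describes might look like this inside CompositeService, with the field name assumed; it illustrates only the defensive-iteration idea, independent of the RMActiveServices change suggested earlier in the thread.
{code:java}
// CompositeService.serviceInit(), sketched with a snapshot copy so that a
// child which registers more services during init cannot invalidate the
// iterator. 'serviceList' is the assumed name of the child-service field.
protected void serviceInit(Configuration conf) throws Exception {
  List<Service> snapshot;
  synchronized (serviceList) {
    snapshot = new ArrayList<Service>(serviceList);
  }
  for (Service service : snapshot) {
    service.init(conf);
  }
  super.serviceInit(conf);
}
{code}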
[jira] [Updated] (YARN-1031) JQuery UI components reference external css in branch-23
[ https://issues.apache.org/jira/browse/YARN-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-1031: - Attachment: YARN-1031-2-branch-0.23.patch Updating previous patch to include the corresponding images JQuery UI components reference external css in branch-23 Key: YARN-1031 URL: https://issues.apache.org/jira/browse/YARN-1031 Project: Hadoop YARN Issue Type: Bug Affects Versions: 0.23.9 Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-1031-2-branch-0.23.patch, YARN-1031-branch-0.23.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1031) JQuery UI components reference external css in branch-23
[ https://issues.apache.org/jira/browse/YARN-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809171#comment-13809171 ] Hadoop QA commented on YARN-1031: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12611075/YARN-1031-2-branch-0.23.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2315//console This message is automatically generated. JQuery UI components reference external css in branch-23 Key: YARN-1031 URL: https://issues.apache.org/jira/browse/YARN-1031 Project: Hadoop YARN Issue Type: Bug Affects Versions: 0.23.9 Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-1031-2-branch-0.23.patch, YARN-1031-branch-0.23.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1318) Promote AdminService to an Always-On service and merge in RMHAProtocolService
[ https://issues.apache.org/jira/browse/YARN-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1318: --- Attachment: yarn-1318-2.patch I am able to compile locally. Resubmitting the same patch. Promote AdminService to an Always-On service and merge in RMHAProtocolService - Key: YARN-1318 URL: https://issues.apache.org/jira/browse/YARN-1318 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Blocker Labels: ha Attachments: yarn-1318-0.patch, yarn-1318-1.patch, yarn-1318-2.patch, yarn-1318-2.patch Per discussion in YARN-1068, we want AdminService to handle HA-admin operations in addition to the regular non-HA admin operations. To facilitate this, we need to make AdminService an Always-On service. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1031) JQuery UI components reference external css in branch-23
[ https://issues.apache.org/jira/browse/YARN-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-1031: - Attachment: YARN-1031-3-branch-0.23.patch Oops, generated the diff with git diff --binary instead of git format-patch. Uploading a new patch. Patch needs to be applied with git apply, and Jenkins doesn't know how to deal with patches against anything but trunk. I tested this patch locally and the icons are once again working correctly on the YARN scheduler page and in tables. JQuery UI components reference external css in branch-23 Key: YARN-1031 URL: https://issues.apache.org/jira/browse/YARN-1031 Project: Hadoop YARN Issue Type: Bug Affects Versions: 0.23.9 Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-1031-2-branch-0.23.patch, YARN-1031-3-branch-0.23.patch, YARN-1031-branch-0.23.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1318) Promote AdminService to an Always-On service and merge in RMHAProtocolService
[ https://issues.apache.org/jira/browse/YARN-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809244#comment-13809244 ] Hadoop QA commented on YARN-1318: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12611085/yarn-1318-2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2316//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2316//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2316//console This message is automatically generated. Promote AdminService to an Always-On service and merge in RMHAProtocolService - Key: YARN-1318 URL: https://issues.apache.org/jira/browse/YARN-1318 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Blocker Labels: ha Attachments: yarn-1318-0.patch, yarn-1318-1.patch, yarn-1318-2.patch, yarn-1318-2.patch Per discussion in YARN-1068, we want AdminService to handle HA-admin operations in addition to the regular non-HA admin operations. To facilitate this, we need to move AdminService an Always-On service. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1031) JQuery UI components reference external css in branch-23
[ https://issues.apache.org/jira/browse/YARN-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809249#comment-13809249 ] Hadoop QA commented on YARN-1031: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12611097/YARN-1031-3-branch-0.23.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2317//console This message is automatically generated. JQuery UI components reference external css in branch-23 Key: YARN-1031 URL: https://issues.apache.org/jira/browse/YARN-1031 Project: Hadoop YARN Issue Type: Bug Affects Versions: 0.23.9 Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-1031-2-branch-0.23.patch, YARN-1031-3-branch-0.23.patch, YARN-1031-branch-0.23.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1031) JQuery UI components reference external css in branch-23
[ https://issues.apache.org/jira/browse/YARN-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809307#comment-13809307 ] Jonathan Eagles commented on YARN-1031: --- +1. Verified Jason's changes. Blocked access to ajax.googleapis.com via /etc/hosts before and after the change to visually inspect. Programmatically scanned network activity via firebug to verify the new jquery-ui.css and icons are downloaded locally with no GETs to ajax.googleapis.com. JQuery UI components reference external css in branch-23 Key: YARN-1031 URL: https://issues.apache.org/jira/browse/YARN-1031 Project: Hadoop YARN Issue Type: Bug Affects Versions: 0.23.9 Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-1031-2-branch-0.23.patch, YARN-1031-3-branch-0.23.patch, YARN-1031-branch-0.23.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1279) Expose a client API to allow clients to figure if log aggregation is complete
[ https://issues.apache.org/jira/browse/YARN-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809344#comment-13809344 ] Xuan Gong commented on YARN-1279: - The basic idea is that the NM notifies the RM about the log aggregation status of all its containers through the node heartbeat. When the RMNode gets the log aggregation status from the nodeUpdateEvent, it forwards the status to the related RMApp. After that, the client can get the log aggregation status by calling the related API. Expose a client API to allow clients to figure if log aggregation is complete - Key: YARN-1279 URL: https://issues.apache.org/jira/browse/YARN-1279 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0 Reporter: Arun C Murthy Assignee: Xuan Gong Expose a client API to allow clients to figure if log aggregation is complete -- This message was sent by Atlassian JIRA (v6.1#6144)
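To illustrate the flow described in the comment above, a hedged sketch; the status enum, the heartbeat field, and the RMApp method are all assumptions, since none of this API exists yet.
{code:java}
// All names below are assumed for illustration only.
enum LogAggregationStatus { NOT_STARTED, RUNNING, SUCCEEDED, FAILED }

// NM side: piggyback per-application status on the node heartbeat.
void fillNodeStatus(NodeStatus nodeStatus,
    Map<ApplicationId, LogAggregationStatus> perAppStatus) {
  nodeStatus.setLogAggregationStatuses(perAppStatus); // assumed new field
}

// RM side: on a node update, the RMNode forwards each status to the owning
// RMApp, which the client-facing API can then query.
void onNodeUpdate(NodeId nodeId,
    Map<ApplicationId, LogAggregationStatus> perAppStatus) {
  for (Map.Entry<ApplicationId, LogAggregationStatus> e
      : perAppStatus.entrySet()) {
    RMApp app = rmContext.getRMApps().get(e.getKey());
    if (app != null) {
      app.updateLogAggregationStatus(nodeId, e.getValue()); // assumed method
    }
  }
}
{code}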
[jira] [Commented] (YARN-1279) Expose a client API to allow clients to figure if log aggregation is complete
[ https://issues.apache.org/jira/browse/YARN-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809354#comment-13809354 ] Xuan Gong commented on YARN-1279: - Will split the work into two parts. This ticket is used to track the work on RM side. It will include all the changes after RMNode receives the STATUS_UPDATE event, changes on NodeStatus and related PB changes. Expose a client API to allow clients to figure if log aggregation is complete - Key: YARN-1279 URL: https://issues.apache.org/jira/browse/YARN-1279 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0 Reporter: Arun C Murthy Assignee: Xuan Gong Expose a client API to allow clients to figure if log aggregation is complete -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1376) NM need to notify the log aggregation status to RM through Node heartbeat
Xuan Gong created YARN-1376: --- Summary: NM need to notify the log aggregation status to RM through Node heartbeat Key: YARN-1376 URL: https://issues.apache.org/jira/browse/YARN-1376 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Expose a client API to allow clients to figure if log aggregation is complete. The ticket is used to track the changes on NM side -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809382#comment-13809382 ] Rohith Sharma K S commented on YARN-1366: - Hi Bikas, I have gone through the pdf attached to YARN-556 and understand the overall idea behind this subtask. I have some doubts, please clarify: 1. "Resync means resetting the allocate RPC sequence number to 0, after which the AM should send its entire outstanding request to the RM" - I understood this as: reset lastResponseID to 0 and do not clear ask, release, blacklistAdditions and blacklistRemovals. Is that correct? 2. During RM restart, the RM gets a new AMRMTokenSecretManager, so the passwords will differ. Is this handled on the RM side during recovery for each individual application? Otherwise the heartbeat to the restarted RM will fail with an authentication error: password does not match. ApplicationMasterService should Resync with the AM upon allocate call after restart --- Key: YARN-1366 URL: https://issues.apache.org/jira/browse/YARN-1366 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha The ApplicationMasterService currently sends a resync response to which the AM responds by shutting down. The AM behavior is expected to change to resyncing with the RM instead. Resync means resetting the allocate RPC sequence number to 0, after which the AM should send its entire outstanding request to the RM. Note that if the AM is making its first allocate call to the RM then things should proceed as normal without needing a resync. The RM will return all containers that have completed since the RM last synced with the AM. Some container completions may be reported more than once. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1321) NMTokenCache is a singleton, prevents multiple AMs running in a single JVM to work correctly
[ https://issues.apache.org/jira/browse/YARN-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809416#comment-13809416 ] Hadoop QA commented on YARN-1321: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/1264/YARN-1321.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2318//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2318//console This message is automatically generated. NMTokenCache is a singleton, prevents multiple AMs running in a single JVM to work correctly Key: YARN-1321 URL: https://issues.apache.org/jira/browse/YARN-1321 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.2.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Blocker Attachments: YARN-1321-20131029.txt, YARN-1321.patch, YARN-1321.patch, YARN-1321.patch, YARN-1321.patch, YARN-1321.patch, YARN-1321.patch NMTokenCache is a singleton. Because of this, if running multiple AMs in a single JVM NMTokens for the same node from different AMs step on each other and starting containers fail due to mismatch tokens. The error observed in the client side is something like: {code} ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:llama (auth:PROXY) via llama (auth:SIMPLE) cause:org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container. NMToken for application attempt : appattempt_1382038445650_0002_01 was used for starting container with container token issued for application attempt : appattempt_1382038445650_0001_01 {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
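The direction the patch under test takes, sketched as client usage: each AM in the shared JVM gets its own NMTokenCache instead of the process-wide singleton. The setNMTokenCache setters are what this JIRA proposes to add, so treat them as pending rather than released API.
{code:java}
NMTokenCache nmTokenCache = new NMTokenCache();

AMRMClient<AMRMClient.ContainerRequest> amrmClient =
    AMRMClient.createAMRMClient();
amrmClient.setNMTokenCache(nmTokenCache); // proposed per-instance cache
amrmClient.init(conf);
amrmClient.start();

NMClient nmClient = NMClient.createNMClient();
// Both clients must share the same cache so that NMTokens received on
// allocate are visible when starting containers.
nmClient.setNMTokenCache(nmTokenCache);
nmClient.init(conf);
nmClient.start();
{code}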
[jira] [Assigned] (YARN-1375) RM logs get filled with scheduler monitor logs when we enable scheduler monitoring
[ https://issues.apache.org/jira/browse/YARN-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent reassigned YARN-1375: -- Assignee: haosdent RM logs get filled with scheduler monitor logs when we enable scheduler monitoring -- Key: YARN-1375 URL: https://issues.apache.org/jira/browse/YARN-1375 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.3.0 Reporter: Devaraj K Assignee: haosdent When we enable scheduler monitor, it is filling the RM logs with the same queue states periodically. We can log only when any difference with the previous state instead of logging the same message. {code:xml} 2013-10-30 23:30:08,464 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156008464, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0 2013-10-30 23:30:11,464 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156011464, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0 2013-10-30 23:30:14,465 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156014465, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0 2013-10-30 23:30:17,466 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156017466, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0 2013-10-30 23:30:20,466 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156020466, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0 2013-10-30 23:30:23,467 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156023467, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0 2013-10-30 23:30:26,468 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156026467, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0 2013-10-30 23:30:29,468 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156029468, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0 2013-10-30 23:30:32,469 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1383156032469, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0 {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
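A minimal sketch of the improvement suggested above, assuming nothing about the policy's internals: remember the last queue-state snapshot that was logged and emit a new INFO line only when the state actually changes. The class here is illustrative, not the ProportionalCapacityPreemptionPolicy code; note the comparison must exclude the leading timestamp, which changes every round.
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

class QueueStateLogger {
  private static final Log LOG = LogFactory.getLog(QueueStateLogger.class);
  private String lastLogged;

  // queueState is the comma-separated snapshot WITHOUT the timestamp field,
  // so two identical cluster states compare equal across monitoring rounds.
  void maybeLog(String queueState) {
    if (!queueState.equals(lastLogged)) {
      LOG.info("QUEUESTATE: " + queueState);
      lastLogged = queueState;
    }
  }
}
{code}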
[jira] [Updated] (YARN-1318) Promote AdminService to an Always-On service and merge in RMHAProtocolService
[ https://issues.apache.org/jira/browse/YARN-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1318: --- Attachment: yarn-1318-3.patch New patch to fix the findbugs warning: use get/set methods to access haState. Promote AdminService to an Always-On service and merge in RMHAProtocolService - Key: YARN-1318 URL: https://issues.apache.org/jira/browse/YARN-1318 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Blocker Labels: ha Attachments: yarn-1318-0.patch, yarn-1318-1.patch, yarn-1318-2.patch, yarn-1318-2.patch, yarn-1318-3.patch Per discussion in YARN-1068, we want AdminService to handle HA-admin operations in addition to the regular non-HA admin operations. To facilitate this, we need to move AdminService an Always-On service. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1343) NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs
[ https://issues.apache.org/jira/browse/YARN-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809565#comment-13809565 ] Bikas Saha commented on YARN-1343: -- It looks like in the reconnect with different capacity case we will end up sending 2 NODE_USABLE events for the same node. {code} } rmNode.context.getRMNodes().put(newNode.getNodeID(), newNode); rmNode.context.getDispatcher().getEventHandler().handle( new RMNodeEvent(newNode.getNodeID(), RMNodeEventType.STARTED)); // === First instance when this triggers the ADD_NODE_Transition } rmNode.context.getDispatcher().getEventHandler().handle( new NodesListManagerEvent( NodesListManagerEventType.NODE_USABLE, rmNode)); // === Second instance {code} So we could probably move the second instance to the first if-stmt where it also sends the NodeAddedSchedulerEvent. That would handle the case of the same node coming back while the STARTED event in the else stmt will cover the case of a different node with the same node name coming back (same as a new node being added). {code} if (rmNode.getTotalCapability().equals(newNode.getTotalCapability()) && rmNode.getHttpPort() == newNode.getHttpPort()) { // Reset heartbeat ID since node just restarted. rmNode.getLastNodeHeartBeatResponse().setResponseId(0); if (rmNode.getState() != NodeState.UNHEALTHY) { // Only add new node if old state is not UNHEALTHY rmNode.context.getDispatcher().getEventHandler().handle( new NodeAddedSchedulerEvent(rmNode)); } } {code} I modified the patch testcase to try out reconnect with different capability and the above issue showed up. NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs - Key: YARN-1343 URL: https://issues.apache.org/jira/browse/YARN-1343 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.2.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Critical Fix For: 2.2.1 Attachments: YARN-1343.patch, YARN-1343.patch If a NodeManager joins the cluster or gets restarted, running AMs never receive the node update indicating the Node is running. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1318) Promote AdminService to an Always-On service and merge in RMHAProtocolService
[ https://issues.apache.org/jira/browse/YARN-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809592#comment-13809592 ] Hadoop QA commented on YARN-1318: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12611154/yarn-1318-3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2319//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2319//console This message is automatically generated. Promote AdminService to an Always-On service and merge in RMHAProtocolService - Key: YARN-1318 URL: https://issues.apache.org/jira/browse/YARN-1318 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Blocker Labels: ha Attachments: yarn-1318-0.patch, yarn-1318-1.patch, yarn-1318-2.patch, yarn-1318-2.patch, yarn-1318-3.patch Per discussion in YARN-1068, we want AdminService to handle HA-admin operations in addition to the regular non-HA admin operations. To facilitate this, we need to move AdminService an Always-On service. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1357) TestContainerLaunch.testContainerEnvVariables fails on Windows
[ https://issues.apache.org/jira/browse/YARN-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated YARN-1357: Component/s: nodemanager Target Version/s: 3.0.0, 2.2.1 Affects Version/s: 2.2.0 Hadoop Flags: Reviewed +1 for the patch. I'll commit this. TestContainerLaunch.testContainerEnvVariables fails on Windows -- Key: YARN-1357 URL: https://issues.apache.org/jira/browse/YARN-1357 Project: Hadoop YARN Issue Type: Test Components: nodemanager Affects Versions: 3.0.0, 2.2.0 Reporter: Chuan Liu Assignee: Chuan Liu Priority: Minor Attachments: YARN-1357.patch This test fails on Windows due to incorrect use of batch script command. Error messages are as follows. {noformat} junit.framework.AssertionFailedError: expected:java.nio.HeapByteBuffer[pos=0 lim=19 cap=19] but was:java.nio.HeapByteBuffer[pos=0 lim=19 cap=19] at junit.framework.Assert.fail(Assert.java:50) at junit.framework.Assert.failNotEquals(Assert.java:287) at junit.framework.Assert.assertEquals(Assert.java:67) at junit.framework.Assert.assertEquals(Assert.java:74) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testContainerEnvVariables(TestContainerLaunch.java:508) {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1359) AMRMToken should not be sent to Container other than AM.
[ https://issues.apache.org/jira/browse/YARN-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809610#comment-13809610 ] Omkar Vinit Joshi commented on YARN-1359: - Today the node manager doesn't do this filtering of tokens. Proposal: let the node manager filter out the AMRMToken from the tokens while launching any container other than the AM. Thereby we only (truly) allow the AM container to talk to the RM on the AMRM protocol. Enhancements: today the node manager doesn't know which container is the AM container, and there are a lot of problems because of this, so we first need a way to inform the node manager that a container is the AM. Since the node manager learns everything about a new container from the container token, it would be better to add an isAM flag inside the token. Thoughts? (Note: we are anyway not encouraging users to talk to the RM from multiple containers sharing the same AMRMToken.) AMRMToken should not be sent to Container other than AM. Key: YARN-1359 URL: https://issues.apache.org/jira/browse/YARN-1359 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi -- This message was sent by Atlassian JIRA (v6.1#6144)
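A hedged sketch of what the proposed filtering could look like on the NM side; this is not existing YARN code, and the surrounding class and method are illustrative. AMRMTokenIdentifier.KIND_NAME is the real token-kind constant; the idea is simply to copy the container's credentials while dropping any token of that kind before launching a non-AM container.
{code}
import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.security.token.TokenIdentifier;
import org.apache.hadoop.yarn.security.AMRMTokenIdentifier;

class ContainerTokenFilter {
  // Returns a copy of the credentials with every AMRMToken removed, for use
  // when launching containers that are not the AM.
  static Credentials filterForNonAmContainer(Credentials original) {
    Credentials filtered = new Credentials();
    for (Token<? extends TokenIdentifier> token : original.getAllTokens()) {
      if (!AMRMTokenIdentifier.KIND_NAME.equals(token.getKind())) {
        filtered.addToken(new Text(token.getService()), token);
      }
    }
    return filtered;
  }
}
{code}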
[jira] [Commented] (YARN-1357) TestContainerLaunch.testContainerEnvVariables fails on Windows
[ https://issues.apache.org/jira/browse/YARN-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809615#comment-13809615 ] Hudson commented on YARN-1357: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4675 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4675/]) YARN-1357. TestContainerLaunch.testContainerEnvVariables fails on Windows. Contributed by Chuan Liu. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1537293) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerLaunch.java TestContainerLaunch.testContainerEnvVariables fails on Windows -- Key: YARN-1357 URL: https://issues.apache.org/jira/browse/YARN-1357 Project: Hadoop YARN Issue Type: Test Components: nodemanager Affects Versions: 3.0.0, 2.2.0 Reporter: Chuan Liu Assignee: Chuan Liu Priority: Minor Fix For: 3.0.0, 2.2.1 Attachments: YARN-1357.patch This test fails on Windows due to incorrect use of batch script command. Error messages are as follows. {noformat} junit.framework.AssertionFailedError: expected:java.nio.HeapByteBuffer[pos=0 lim=19 cap=19] but was:java.nio.HeapByteBuffer[pos=0 lim=19 cap=19] at junit.framework.Assert.fail(Assert.java:50) at junit.framework.Assert.failNotEquals(Assert.java:287) at junit.framework.Assert.assertEquals(Assert.java:67) at junit.framework.Assert.assertEquals(Assert.java:74) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testContainerEnvVariables(TestContainerLaunch.java:508) {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1377) Log aggregation via node manager should expose a way to cancel aggregation at application or container level
Omkar Vinit Joshi created YARN-1377: --- Summary: Log aggregation via node manager should expose a way to cancel aggregation at application or container level Key: YARN-1377 URL: https://issues.apache.org/jira/browse/YARN-1377 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Today when an application finishes it starts aggregating all the logs, but that may slow down the whole process significantly... there can be situations where certain containers overwrote the logs, say in multiple GBs... in these scenarios we need a way to cancel log aggregation for certain containers. This can be at the per-application level or at the per-container level. Thoughts? -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1377) Log aggregation via node manager should expose a way to cancel aggregation at application or container level
[ https://issues.apache.org/jira/browse/YARN-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-1377: Assignee: Xuan Gong Log aggregation via node manager should expose a way to cancel aggregation at application or container level --- Key: YARN-1377 URL: https://issues.apache.org/jira/browse/YARN-1377 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Xuan Gong Today when an application finishes it starts aggregating all the logs, but that may slow down the whole process significantly... there can be situations where certain containers overwrote the logs, say in multiple GBs... in these scenarios we need a way to cancel log aggregation for certain containers. This can be at the per-application level or at the per-container level. Thoughts? -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1377) Log aggregation via node manager should expose a way to cancel log aggregation at application or container level
[ https://issues.apache.org/jira/browse/YARN-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-1377: Summary: Log aggregation via node manager should expose a way to cancel log aggregation at application or container level (was: Log aggregation via node manager should expose a way to cancel aggregation at application or container level) Log aggregation via node manager should expose a way to cancel log aggregation at application or container level --- Key: YARN-1377 URL: https://issues.apache.org/jira/browse/YARN-1377 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Xuan Gong Today when an application finishes it starts aggregating all the logs, but that may slow down the whole process significantly... there can be situations where certain containers overwrote the logs, say in multiple GBs... in these scenarios we need a way to cancel log aggregation for certain containers. This can be at the per-application level or at the per-container level. Thoughts? -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1358) TestYarnCLI fails on Windows due to line endings
[ https://issues.apache.org/jira/browse/YARN-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated YARN-1358: Component/s: client Target Version/s: 3.0.0, 2.2.1 Affects Version/s: 2.2.0 Hadoop Flags: Reviewed +1 for the patch. I'll commit this. TestYarnCLI fails on Windows due to line endings Key: YARN-1358 URL: https://issues.apache.org/jira/browse/YARN-1358 Project: Hadoop YARN Issue Type: Test Components: client Affects Versions: 3.0.0, 2.2.0 Reporter: Chuan Liu Assignee: Chuan Liu Priority: Minor Attachments: YARN-1358.2.patch, YARN-1358.patch The unit test fails on Windows because incorrect line endings were used when comparing the command line output. Error messages are as follows. {noformat} junit.framework.ComparisonFailure: expected:...argument for options[] usage: application ... but was:...argument for options[ ] usage: application ... at junit.framework.Assert.assertEquals(Assert.java:85) at junit.framework.Assert.assertEquals(Assert.java:91) at org.apache.hadoop.yarn.client.cli.TestYarnCLI.testMissingArguments(TestYarnCLI.java:878) {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144)
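The standard fix for this class of failure is to normalize line separators on both sides of the assertion instead of hard-coding "\n". The committed patch lives in TestYarnCLI.java; the helper below is only a sketch of the idea.
{code}
import static org.junit.Assert.assertEquals;

class LineEndingSafeAssert {
  // Compare strings after mapping Windows CRLF line endings to LF, so the
  // same assertion passes on both Windows and Unix.
  static void assertEqualsIgnoringLineEndings(String expected, String actual) {
    assertEquals(normalize(expected), normalize(actual));
  }

  private static String normalize(String s) {
    return s.replace("\r\n", "\n");
  }
}
{code}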
[jira] [Commented] (YARN-1358) TestYarnCLI fails on Windows due to line endings
[ https://issues.apache.org/jira/browse/YARN-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809651#comment-13809651 ] Hudson commented on YARN-1358: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4676 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4676/]) YARN-1358. TestYarnCLI fails on Windows due to line endings. Contributed by Chuan Liu. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1537305) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java TestYarnCLI fails on Windows due to line endings Key: YARN-1358 URL: https://issues.apache.org/jira/browse/YARN-1358 Project: Hadoop YARN Issue Type: Test Components: client Affects Versions: 3.0.0, 2.2.0 Reporter: Chuan Liu Assignee: Chuan Liu Priority: Minor Fix For: 3.0.0, 2.2.1 Attachments: YARN-1358.2.patch, YARN-1358.patch The unit test fails on Windows because incorrect line endings were used when comparing the command line output. Error messages are as follows. {noformat} junit.framework.ComparisonFailure: expected:...argument for options[] usage: application ... but was:...argument for options[ ] usage: application ... at junit.framework.Assert.assertEquals(Assert.java:85) at junit.framework.Assert.assertEquals(Assert.java:91) at org.apache.hadoop.yarn.client.cli.TestYarnCLI.testMissingArguments(TestYarnCLI.java:878) {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1378) Implement a RMStateStore cleaner for deleting application/attempt info
Jian He created YARN-1378: - Summary: Implement a RMStateStore cleaner for deleting application/attempt info Key: YARN-1378 URL: https://issues.apache.org/jira/browse/YARN-1378 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Now that we are storing the final state of application/attempt instead of removing application/attempt info on application/attempt completion(YARN-891), we need a separate RMStateStore cleaner for cleaning the application/attempt state. -- This message was sent by Atlassian JIRA (v6.1#6144)
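Since the store now keeps the final application/attempt state, a cleaner could be as simple as a scheduled task that deletes entries older than a retention window. The sketch below is purely hypothetical (none of these names exist in YARN) and stands in for whatever delete operation the concrete RMStateStore implementation exposes.
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class StateStoreCleaner {
  private final Map<String, Long> finishTimeByAppId = new ConcurrentHashMap<>();
  private final long retentionMs;
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  StateStoreCleaner(long retentionMs, long periodMs) {
    this.retentionMs = retentionMs;
    scheduler.scheduleWithFixedDelay(this::cleanOnce, periodMs, periodMs,
        TimeUnit.MILLISECONDS);
  }

  // Record the completion time when the app reaches its final state.
  void recordFinished(String appId) {
    finishTimeByAppId.put(appId, System.currentTimeMillis());
  }

  // Drop state for applications that finished before the retention cutoff;
  // a real implementation would invoke the store's delete operation here.
  private void cleanOnce() {
    long cutoff = System.currentTimeMillis() - retentionMs;
    finishTimeByAppId.entrySet().removeIf(e -> e.getValue() < cutoff);
  }
}
{code}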
[jira] [Updated] (YARN-1123) [YARN-321] Adding ContainerReport and Protobuf implementation
[ https://issues.apache.org/jira/browse/YARN-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-1123: Attachment: YARN-1123-3.patch Attaching the patch with the latest rebase; changed Container status to Container state and added the container exit status. Thanks, Mayank [YARN-321] Adding ContainerReport and Protobuf implementation - Key: YARN-1123 URL: https://issues.apache.org/jira/browse/YARN-1123 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Mayank Bansal Attachments: YARN-1123-1.patch, YARN-1123-2.patch, YARN-1123-3.patch Like YARN-978, we need some client-oriented class to expose the container history info. Neither Container nor RMContainer is the right one. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1374) Resource Manager fails to start due to ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809664#comment-13809664 ] Karthik Kambatla commented on YARN-1374: I see the issue. Will upload a patch shortly to add the SchedulingMonitor to RMActiveServices. Resource Manager fails to start due to ConcurrentModificationException -- Key: YARN-1374 URL: https://issues.apache.org/jira/browse/YARN-1374 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.3.0 Reporter: Devaraj K Assignee: Karthik Kambatla Priority: Blocker Resource Manager is failing to start with the below ConcurrentModificationException. {code:xml} 2013-10-30 20:22:42,371 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list 2013-10-30 20:22:42,376 INFO org.apache.hadoop.service.AbstractService: Service ResourceManager failed in state INITED; cause: java.util.ConcurrentModificationException java.util.ConcurrentModificationException at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372) at java.util.AbstractList$Itr.next(AbstractList.java:343) at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944) 2013-10-30 20:22:42,378 INFO org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: Transitioning to standby 2013-10-30 20:22:42,378 INFO org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: Transitioned to standby 2013-10-30 20:22:42,378 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager java.util.ConcurrentModificationException at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372) at java.util.AbstractList$Itr.next(AbstractList.java:343) at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944) 2013-10-30 20:22:42,379 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG: / SHUTDOWN_MSG: Shutting down ResourceManager at HOST-10-18-40-24/10.18.40.24 / {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
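The stack trace pins the failure inside CompositeService.serviceInit, which iterates an unmodifiable view of the service list; registering the SchedulingMonitor while that iteration is running mutates the backing list and triggers the exception. The self-contained demo below reproduces the mechanism (moving the monitor into RMActiveServices makes the registration happen before iteration starts).
{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class CmeDemo {
  public static void main(String[] args) {
    List<String> services = new ArrayList<>();
    services.add("Dispatcher");
    services.add("Scheduler");
    // Iterating an unmodifiable *view* while adding to the backing list:
    for (String s : Collections.unmodifiableList(services)) {
      if (s.equals("Scheduler")) {
        services.add("SchedulingMonitor"); // mutation during iteration
      }
    } // throws java.util.ConcurrentModificationException, as in the RM log
  }
}
{code}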
[jira] [Assigned] (YARN-1374) Resource Manager fails to start due to ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla reassigned YARN-1374: -- Assignee: Karthik Kambatla Resource Manager fails to start due to ConcurrentModificationException -- Key: YARN-1374 URL: https://issues.apache.org/jira/browse/YARN-1374 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.3.0 Reporter: Devaraj K Assignee: Karthik Kambatla Priority: Blocker Resource Manager is failing to start with the below ConcurrentModificationException. {code:xml} 2013-10-30 20:22:42,371 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list 2013-10-30 20:22:42,376 INFO org.apache.hadoop.service.AbstractService: Service ResourceManager failed in state INITED; cause: java.util.ConcurrentModificationException java.util.ConcurrentModificationException at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372) at java.util.AbstractList$Itr.next(AbstractList.java:343) at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944) 2013-10-30 20:22:42,378 INFO org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: Transitioning to standby 2013-10-30 20:22:42,378 INFO org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: Transitioned to standby 2013-10-30 20:22:42,378 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager java.util.ConcurrentModificationException at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372) at java.util.AbstractList$Itr.next(AbstractList.java:343) at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944) 2013-10-30 20:22:42,379 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG: / SHUTDOWN_MSG: Shutting down ResourceManager at HOST-10-18-40-24/10.18.40.24 / {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1379) [YARN-321] AHS protocols need to be in yarn proto package name after YARN-1170
Vinod Kumar Vavilapalli created YARN-1379: - Summary: [YARN-321] AHS protocols need to be in yarn proto package name after YARN-1170 Key: YARN-1379 URL: https://issues.apache.org/jira/browse/YARN-1379 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Found this while merging YARN-321 to the latest branch-2. Without this, compilation fails. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1343) NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs
[ https://issues.apache.org/jira/browse/YARN-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated YARN-1343: - Attachment: YARN-1343.patch [~bikassaha], thanks for the review and catching the double dispatching. Uploading a patch with the changes you suggested and also adding a test to verify the NODE_USABLE event is dispatched when a reconnect happens and the node has different capabilities. NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs - Key: YARN-1343 URL: https://issues.apache.org/jira/browse/YARN-1343 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.2.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Critical Fix For: 2.2.1 Attachments: YARN-1343.patch, YARN-1343.patch, YARN-1343.patch If a NodeManager joins the cluster or gets restarted, running AMs never receive the node update indicating the Node is running. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1343) NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs
[ https://issues.apache.org/jira/browse/YARN-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809676#comment-13809676 ] Hadoop QA commented on YARN-1343: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12611185/YARN-1343.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2321//console This message is automatically generated. NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs - Key: YARN-1343 URL: https://issues.apache.org/jira/browse/YARN-1343 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.2.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Critical Fix For: 2.2.1 Attachments: YARN-1343.patch, YARN-1343.patch, YARN-1343.patch If a NodeManager joins the cluster or gets restarted, running AMs never receive the node update indicating the Node is running. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1123) [YARN-321] Adding ContainerReport and Protobuf implementation
[ https://issues.apache.org/jira/browse/YARN-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809679#comment-13809679 ] Hadoop QA commented on YARN-1123: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12611179/YARN-1123-3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2320//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2320//console This message is automatically generated. [YARN-321] Adding ContainerReport and Protobuf implementation - Key: YARN-1123 URL: https://issues.apache.org/jira/browse/YARN-1123 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Mayank Bansal Attachments: YARN-1123-1.patch, YARN-1123-2.patch, YARN-1123-3.patch Like YARN-978, we need some client-oriented class to expose the container history info. Neither Container nor RMContainer is the right one. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1290) Let continuous scheduling achieve more balanced task assignment
[ https://issues.apache.org/jira/browse/YARN-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809678#comment-13809678 ] Sandy Ryza commented on YARN-1290: -- [~ywskycn], the current patch no longer applies. Would you mind rebasing? Let continuous scheduling achieve more balanced task assignment --- Key: YARN-1290 URL: https://issues.apache.org/jira/browse/YARN-1290 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Attachments: main.pdf, YARN-1290.patch, YARN-1290.patch, YARN-1290.patch Currently, in continuous scheduling (YARN-1010), in each round, the thread iterates over pre-ordered nodes and assigns tasks. This mechanism may overload the first several nodes, while the latter nodes have no tasks. We should sort all nodes according to available resource. In each round, always assign tasks to nodes with larger capacity, which can balance the load distribution among all nodes. -- This message was sent by Atlassian JIRA (v6.1#6144)
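The balancing idea in the description reduces to a sort before each continuous-scheduling round. The sketch below uses illustrative types rather than the FairScheduler internals: order nodes by available resource, descending, so the least-loaded nodes are offered tasks first.
{code}
import java.util.Comparator;
import java.util.List;

class NodeInfo {
  final String host;
  final long availableMb;  // available memory as a stand-in for Resource

  NodeInfo(String host, long availableMb) {
    this.host = host;
    this.availableMb = availableMb;
  }
}

class ContinuousSchedulingOrder {
  // Sort so that nodes with the most available resource come first.
  static void sortByAvailableDescending(List<NodeInfo> nodes) {
    nodes.sort(
        Comparator.comparingLong((NodeInfo n) -> n.availableMb).reversed());
  }
}
{code}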
[jira] [Commented] (YARN-1290) Let continuous scheduling achieve more balanced task assignment
[ https://issues.apache.org/jira/browse/YARN-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809682#comment-13809682 ] Wei Yan commented on YARN-1290: --- [~sandyr]. I'll fix it. Let continuous scheduling achieve more balanced task assignment --- Key: YARN-1290 URL: https://issues.apache.org/jira/browse/YARN-1290 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Attachments: main.pdf, YARN-1290.patch, YARN-1290.patch, YARN-1290.patch Currently, in continuous scheduling (YARN-1010), in each round, the thread iterates over pre-ordered nodes and assigns tasks. This mechanism may overload the first several nodes, while the latter nodes have no tasks. We should sort all nodes according to available resource. In each round, always assign tasks to nodes with larger capacity, which can balance the load distribution among all nodes. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1379) [YARN-321] AHS protocols need to be in yarn proto package name after YARN-1170
[ https://issues.apache.org/jira/browse/YARN-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1379: -- Attachment: YARN-1379.txt Simple patch that adds package names. Compilation passes after this. [YARN-321] AHS protocols need to be in yarn proto package name after YARN-1170 -- Key: YARN-1379 URL: https://issues.apache.org/jira/browse/YARN-1379 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Attachments: YARN-1379.txt Found this while merging YARN-321 to the latest branch-2. Without this, compilation fails. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1343) NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs
[ https://issues.apache.org/jira/browse/YARN-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809719#comment-13809719 ] Hadoop QA commented on YARN-1343: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12611185/YARN-1343.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2322//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2322//console This message is automatically generated. NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs - Key: YARN-1343 URL: https://issues.apache.org/jira/browse/YARN-1343 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.2.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Critical Fix For: 2.2.1 Attachments: YARN-1343.patch, YARN-1343.patch, YARN-1343.patch If a NodeManager joins the cluster or gets restarted, running AMs never receive the node update indicating the Node is running. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1321) NMTokenCache is a singleton, prevents multiple AMs running in a single JVM to work correctly
[ https://issues.apache.org/jira/browse/YARN-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809723#comment-13809723 ] Hudson commented on YARN-1321: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4678 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4678/]) YARN-1321. Changed NMTokenCache to support both singleton and an instance usage. Contributed by Alejandro Abdelnur. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1537334) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/AMRMClient.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/NMClient.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/NMTokenCache.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/AMRMClientImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/ContainerManagementProtocolProxy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/NMClientImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClient.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestNMClient.java NMTokenCache is a singleton, prevents multiple AMs running in a single JVM to work correctly Key: YARN-1321 URL: https://issues.apache.org/jira/browse/YARN-1321 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.2.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Blocker Fix For: 2.2.1 Attachments: YARN-1321-20131029.txt, YARN-1321.patch, YARN-1321.patch, YARN-1321.patch, YARN-1321.patch, YARN-1321.patch, YARN-1321.patch NMTokenCache is a singleton. Because of this, if running multiple AMs in a single JVM NMTokens for the same node from different AMs step on each other and starting containers fail due to mismatch tokens. The error observed in the client side is something like: {code} ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:llama (auth:PROXY) via llama (auth:SIMPLE) cause:org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container. NMToken for application attempt : appattempt_1382038445650_0002_01 was used for starting container with container token issued for application attempt : appattempt_1382038445650_0001_01 {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1379) [YARN-321] AHS protocols need to be in yarn proto package name after YARN-1170
[ https://issues.apache.org/jira/browse/YARN-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809752#comment-13809752 ] Zhijie Shen commented on YARN-1379: --- +1. Verified it locally. The branch got compiled after the patch was applied. [YARN-321] AHS protocols need to be in yarn proto package name after YARN-1170 -- Key: YARN-1379 URL: https://issues.apache.org/jira/browse/YARN-1379 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Attachments: YARN-1379.txt Found this while merging YARN-321 to the latest branch-2. Without this, compilation fails. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1374) Resource Manager fails to start due to ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1374: --- Attachment: yarn-1374-1.patch Here is a patch that moves creating the monitor policies to RMActiveServices. Resource Manager fails to start due to ConcurrentModificationException -- Key: YARN-1374 URL: https://issues.apache.org/jira/browse/YARN-1374 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.3.0 Reporter: Devaraj K Assignee: Karthik Kambatla Priority: Blocker Attachments: yarn-1374-1.patch Resource Manager is failing to start with the below ConcurrentModificationException. {code:xml} 2013-10-30 20:22:42,371 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list 2013-10-30 20:22:42,376 INFO org.apache.hadoop.service.AbstractService: Service ResourceManager failed in state INITED; cause: java.util.ConcurrentModificationException java.util.ConcurrentModificationException at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372) at java.util.AbstractList$Itr.next(AbstractList.java:343) at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944) 2013-10-30 20:22:42,378 INFO org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: Transitioning to standby 2013-10-30 20:22:42,378 INFO org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: Transitioned to standby 2013-10-30 20:22:42,378 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager java.util.ConcurrentModificationException at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372) at java.util.AbstractList$Itr.next(AbstractList.java:343) at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944) 2013-10-30 20:22:42,379 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG: / SHUTDOWN_MSG: Shutting down ResourceManager at HOST-10-18-40-24/10.18.40.24 / {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1343) NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs
[ https://issues.apache.org/jira/browse/YARN-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809753#comment-13809753 ] Bikas Saha commented on YARN-1343: -- Can you please double check that testReconnectWithDifferentCapacity is actually resulting in reconnection? The test alters the existing node's capacity and thus I would expect the equality check in ReconnectTransition to consider the node the same as before. We probably need to create a new node with the same name and a different capacity. Maybe stepping through in the debugger will show what's really happening. If the reconnect-with-different-capability code is getting executed then I would expect the mock RM context to have to mock the getRMNodes() method and a listener to be added for RMNodeEvents. Or else the test will have exceptions in the output. NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs - Key: YARN-1343 URL: https://issues.apache.org/jira/browse/YARN-1343 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.2.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Critical Fix For: 2.2.1 Attachments: YARN-1343.patch, YARN-1343.patch, YARN-1343.patch If a NodeManager joins the cluster or gets restarted, running AMs never receive the node update indicating the Node is running. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1374) Resource Manager fails to start due to ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1374: --- Attachment: yarn-1374-1.patch Forgot to add license headers. Resource Manager fails to start due to ConcurrentModificationException -- Key: YARN-1374 URL: https://issues.apache.org/jira/browse/YARN-1374 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.3.0 Reporter: Devaraj K Assignee: Karthik Kambatla Priority: Blocker Attachments: yarn-1374-1.patch, yarn-1374-1.patch Resource Manager is failing to start with the below ConcurrentModificationException. {code:xml} 2013-10-30 20:22:42,371 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list 2013-10-30 20:22:42,376 INFO org.apache.hadoop.service.AbstractService: Service ResourceManager failed in state INITED; cause: java.util.ConcurrentModificationException java.util.ConcurrentModificationException at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372) at java.util.AbstractList$Itr.next(AbstractList.java:343) at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944) 2013-10-30 20:22:42,378 INFO org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: Transitioning to standby 2013-10-30 20:22:42,378 INFO org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: Transitioned to standby 2013-10-30 20:22:42,378 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager java.util.ConcurrentModificationException at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372) at java.util.AbstractList$Itr.next(AbstractList.java:343) at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944) 2013-10-30 20:22:42,379 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG: / SHUTDOWN_MSG: Shutting down ResourceManager at HOST-10-18-40-24/10.18.40.24 / {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809755#comment-13809755 ] Bikas Saha commented on YARN-1366: -- Yes, the lastResponseId needs to be reset to 0 and all the client-side data like asks, blacklists etc. should be sent in full to the RM. The AMRMToken for the attempt is saved and restored, so the existing attempt will be able to reconnect to the restarted RM. This currently works. ApplicationMasterService should Resync with the AM upon allocate call after restart --- Key: YARN-1366 URL: https://issues.apache.org/jira/browse/YARN-1366 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha The ApplicationMasterService currently sends a resync response to which the AM responds by shutting down. The AM behavior is expected to change to resyncing with the RM. Resync means resetting the allocate RPC sequence number to 0 and the AM should send its entire outstanding request to the RM. Note that if the AM is making its first allocate call to the RM then things should proceed like normal without needing a resync. The RM will return all containers that have completed since the RM last synced with the AM. Some container completions may be reported more than once. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1379) [YARN-321] AHS protocols need to be in yarn proto package name after YARN-1170
[ https://issues.apache.org/jira/browse/YARN-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809763#comment-13809763 ] Mayank Bansal commented on YARN-1379: - +1 , verified. Thanks, Mayank [YARN-321] AHS protocols need to be in yarn proto package name after YARN-1170 -- Key: YARN-1379 URL: https://issues.apache.org/jira/browse/YARN-1379 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Attachments: YARN-1379.txt Found this while merging YARN-321 to the latest branch-2. Without this, compilation fails. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-311) Dynamic node resource configuration: core scheduler changes
[ https://issues.apache.org/jira/browse/YARN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-311: Attachment: YARN-311-v12.patch In the v12 patch, fixed a tiny issue by adding a volatile tag to ResourceOption in o.a.h.y.sls.nodemanager.NodeInfo (consumed by NMSimulator), after discussing with Luke offline. Dynamic node resource configuration: core scheduler changes --- Key: YARN-311 URL: https://issues.apache.org/jira/browse/YARN-311 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, scheduler Reporter: Junping Du Assignee: Junping Du Attachments: YARN-311-v10.patch, YARN-311-v11.patch, YARN-311-v12.patch, YARN-311-v1.patch, YARN-311-v2.patch, YARN-311-v3.patch, YARN-311-v4.patch, YARN-311-v4.patch, YARN-311-v5.patch, YARN-311-v6.1.patch, YARN-311-v6.2.patch, YARN-311-v6.patch, YARN-311-v7.patch, YARN-311-v8.patch, YARN-311-v9.patch As the first step, we go for resource change on RM side and expose admin APIs (admin protocol, CLI, REST and JMX API) later. In this jira, we will only contain changes in scheduler. The flow to update node's resource and awareness in resource scheduling is: 1. Resource update is through admin API to RM and take effect on RMNodeImpl. 2. When next NM heartbeat for updating status comes, the RMNode's resource change will be aware and the delta resource is added to schedulerNode's availableResource before actual scheduling happens. 3. Scheduler do resource allocation according to new availableResource in SchedulerNode. For more design details, please refer proposal and discussions in parent JIRA: YARN-291. -- This message was sent by Atlassian JIRA (v6.1#6144)
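Step 2 of the flow above can be illustrated with a small sketch (simplified stand-ins for Resource and SchedulerNode, not the actual patch): when the next heartbeat arrives after an admin update, the delta between the new and old totals is folded into the scheduler node's available resource before scheduling runs.
{code}
class SchedNode {
  long totalMb;      // node's total memory known to the scheduler
  long availableMb;  // headroom used by the next scheduling pass

  // Applied on the next NM heartbeat after RMNodeImpl picked up new totals.
  void applyResourceUpdate(long newTotalMb) {
    long deltaMb = newTotalMb - totalMb;  // negative when the node shrinks
    totalMb = newTotalMb;
    availableMb += deltaMb;
  }
}
{code}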
[jira] [Resolved] (YARN-1379) [YARN-321] AHS protocols need to be in yarn proto package name after YARN-1170
[ https://issues.apache.org/jira/browse/YARN-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-1379. --- Resolution: Fixed Fix Version/s: YARN-321 Hadoop Flags: Reviewed Tx for the quick verification! I just committed this to branch YARN-321. [YARN-321] AHS protocols need to be in yarn proto package name after YARN-1170 -- Key: YARN-1379 URL: https://issues.apache.org/jira/browse/YARN-1379 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Fix For: YARN-321 Attachments: YARN-1379.txt Found this while merging YARN-321 to the latest branch-2. Without this, compilation fails. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-311) Dynamic node resource configuration: core scheduler changes
[ https://issues.apache.org/jira/browse/YARN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809781#comment-13809781 ] Hadoop QA commented on YARN-311: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12611210/YARN-311-v12.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2324//console This message is automatically generated. Dynamic node resource configuration: core scheduler changes --- Key: YARN-311 URL: https://issues.apache.org/jira/browse/YARN-311 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, scheduler Reporter: Junping Du Assignee: Junping Du Attachments: YARN-311-v10.patch, YARN-311-v11.patch, YARN-311-v12.patch, YARN-311-v1.patch, YARN-311-v2.patch, YARN-311-v3.patch, YARN-311-v4.patch, YARN-311-v4.patch, YARN-311-v5.patch, YARN-311-v6.1.patch, YARN-311-v6.2.patch, YARN-311-v6.patch, YARN-311-v7.patch, YARN-311-v8.patch, YARN-311-v9.patch As the first step, we go for resource change on RM side and expose admin APIs (admin protocol, CLI, REST and JMX API) later. In this jira, we will only contain changes in scheduler. The flow to update node's resource and awareness in resource scheduling is: 1. Resource update is through admin API to RM and take effect on RMNodeImpl. 2. When next NM heartbeat for updating status comes, the RMNode's resource change will be aware and the delta resource is added to schedulerNode's availableResource before actual scheduling happens. 3. Scheduler do resource allocation according to new availableResource in SchedulerNode. For more design details, please refer proposal and discussions in parent JIRA: YARN-291. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1374) Resource Manager fails to start due to ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809782#comment-13809782 ] Hadoop QA commented on YARN-1374: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12611209/yarn-1374-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2323//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2323//console This message is automatically generated. Resource Manager fails to start due to ConcurrentModificationException -- Key: YARN-1374 URL: https://issues.apache.org/jira/browse/YARN-1374 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.3.0 Reporter: Devaraj K Assignee: Karthik Kambatla Priority: Blocker Attachments: yarn-1374-1.patch, yarn-1374-1.patch Resource Manager is failing to start with the below ConcurrentModificationException. 
{code}
2013-10-30 20:22:42,371 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list
2013-10-30 20:22:42,376 INFO org.apache.hadoop.service.AbstractService: Service ResourceManager failed in state INITED; cause: java.util.ConcurrentModificationException
java.util.ConcurrentModificationException
    at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
    at java.util.AbstractList$Itr.next(AbstractList.java:343)
    at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
    at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
2013-10-30 20:22:42,378 INFO org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: Transitioning to standby
2013-10-30 20:22:42,378 INFO org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: Transitioned to standby
2013-10-30 20:22:42,378 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
java.util.ConcurrentModificationException
    at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
    at java.util.AbstractList$Itr.next(AbstractList.java:343)
    at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
    at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
2013-10-30 20:22:42,379 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down ResourceManager at HOST-10-18-40-24/10.18.40.24
************************************************************/
{code}
-- This message was sent by Atlassian JIRA (v6.1#6144)
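For context on the trace above, here is a self-contained sketch of the failure mode (not the attached patch): Collections.unmodifiableList only wraps the live backing list, so a mutation during iteration still trips the iterator's comodification check, exactly as in the CompositeService.serviceInit frame; iterating a defensive copy is immune.
{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class CmeRepro {
  public static void main(String[] args) {
    List<String> services = new ArrayList<String>();
    services.add("dispatcher");
    services.add("scheduler");

    // The unmodifiable wrapper still iterates the live backing list, so a
    // concurrent add during iteration throws the same
    // ConcurrentModificationException seen in CompositeService.serviceInit.
    try {
      for (String s : Collections.unmodifiableList(services)) {
        services.add("added-during-init-" + s); // mutation mid-iteration
      }
    } catch (java.util.ConcurrentModificationException e) {
      System.out.println("reproduced: " + e);
    }

    // Iterating a defensive copy is immune to later mutation.
    for (String s : new ArrayList<String>(services)) {
      services.add("safe-" + s);
    }
    System.out.println("copy-based iteration completed");
  }
}
{code}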
[jira] [Updated] (YARN-1123) [YARN-321] Adding ContainerReport and Protobuf implementation
[ https://issues.apache.org/jira/browse/YARN-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-1123: Attachment: YARN-1123-4.patch Adding toString optimization. Thanks, Mayank [YARN-321] Adding ContainerReport and Protobuf implementation - Key: YARN-1123 URL: https://issues.apache.org/jira/browse/YARN-1123 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Mayank Bansal Attachments: YARN-1123-1.patch, YARN-1123-2.patch, YARN-1123-3.patch, YARN-1123-4.patch Like YARN-978, we need some client-oriented class to expose the container history info. Neither Container nor RMContainer is the right one. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1374) Resource Manager fails to start due to ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809793#comment-13809793 ] Karthik Kambatla commented on YARN-1374: The test fails without the fix and passes with it. Resource Manager fails to start due to ConcurrentModificationException -- Key: YARN-1374 URL: https://issues.apache.org/jira/browse/YARN-1374 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.3.0 Reporter: Devaraj K Assignee: Karthik Kambatla Priority: Blocker Attachments: yarn-1374-1.patch, yarn-1374-1.patch Resource Manager is failing to start with the below ConcurrentModificationException.
{code}
2013-10-30 20:22:42,371 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list
2013-10-30 20:22:42,376 INFO org.apache.hadoop.service.AbstractService: Service ResourceManager failed in state INITED; cause: java.util.ConcurrentModificationException
java.util.ConcurrentModificationException
    at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
    at java.util.AbstractList$Itr.next(AbstractList.java:343)
    at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
    at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
2013-10-30 20:22:42,378 INFO org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: Transitioning to standby
2013-10-30 20:22:42,378 INFO org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: Transitioned to standby
2013-10-30 20:22:42,378 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
java.util.ConcurrentModificationException
    at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
    at java.util.AbstractList$Itr.next(AbstractList.java:343)
    at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
    at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
2013-10-30 20:22:42,379 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down ResourceManager at HOST-10-18-40-24/10.18.40.24
************************************************************/
{code}
-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1343) NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs
[ https://issues.apache.org/jira/browse/YARN-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated YARN-1343: - Attachment: YARN-1343.patch The TestRMNodeTransitions tests only verify that the expected follow-up events for the {{NodesListManager}} are dispatched. To test that the reconnect is happening with different capabilities we need to add a test for {{ResourceTrackerService.registerNodeManager()}}. Uploading a patch that tests RECONNECTED event dispatching with the same and with different capabilities. NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs - Key: YARN-1343 URL: https://issues.apache.org/jira/browse/YARN-1343 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.2.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Critical Fix For: 2.2.1 Attachments: YARN-1343.patch, YARN-1343.patch, YARN-1343.patch, YARN-1343.patch If a NodeManager joins the cluster or gets restarted, running AMs never receive the node update indicating the Node is running. -- This message was sent by Atlassian JIRA (v6.1#6144)
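A rough, hypothetical sketch of the event-capture idea behind such a test; RegistrationService here is a stand-in for ResourceTrackerService.registerNodeManager(), not the real class or the attached patch. It also checks the node-count point raised later in this thread: a reconnect should replace the existing entry, not add a second one.
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ReconnectDispatchSketch {
  enum EventType { STARTED, RECONNECTED }

  // Stand-in for registerNodeManager(): same putIfAbsent pattern, with
  // dispatched events recorded so a test can assert on them.
  static class RegistrationService {
    final ConcurrentMap<String, String> nodes =
        new ConcurrentHashMap<String, String>();
    final List<EventType> dispatched = new ArrayList<EventType>();

    void register(String nodeId, String capability) {
      String oldNode = nodes.putIfAbsent(nodeId, capability);
      if (oldNode == null) {
        dispatched.add(EventType.STARTED);
      } else {
        nodes.put(nodeId, capability); // reconnect replaces the entry
        dispatched.add(EventType.RECONNECTED);
      }
    }
  }

  public static void main(String[] args) {
    RegistrationService svc = new RegistrationService();
    svc.register("host1:1234", "8G");  // first registration
    svc.register("host1:1234", "16G"); // same node, different capability

    if (svc.dispatched.get(0) != EventType.STARTED
        || svc.dispatched.get(1) != EventType.RECONNECTED) {
      throw new AssertionError("unexpected events: " + svc.dispatched);
    }
    if (svc.nodes.size() != 1) { // reconnect replaced, didn't duplicate
      throw new AssertionError("node list grew on reconnect");
    }
    System.out.println("events: " + svc.dispatched);
  }
}
{code}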
[jira] [Updated] (YARN-978) [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation
[ https://issues.apache.org/jira/browse/YARN-978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-978: --- Attachment: YARN-978.9.patch After YARN-947 I had to make changes and remove some code from this patch. Also added a toString optimization. Thanks, Mayank [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation -- Key: YARN-978 URL: https://issues.apache.org/jira/browse/YARN-978 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Fix For: YARN-321 Attachments: YARN-978-1.patch, YARN-978.2.patch, YARN-978.3.patch, YARN-978.4.patch, YARN-978.5.patch, YARN-978.6.patch, YARN-978.7.patch, YARN-978.8.patch, YARN-978.9.patch We don't have an ApplicationAttemptReport and its Protobuf implementation. Adding that. Thanks, Mayank -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-311) Dynamic node resource configuration: core scheduler changes
[ https://issues.apache.org/jira/browse/YARN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-311: Attachment: YARN-311-v12b.patch Dynamic node resource configuration: core scheduler changes --- Key: YARN-311 URL: https://issues.apache.org/jira/browse/YARN-311 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, scheduler Reporter: Junping Du Assignee: Junping Du Attachments: YARN-311-v10.patch, YARN-311-v11.patch, YARN-311-v12b.patch, YARN-311-v12.patch, YARN-311-v1.patch, YARN-311-v2.patch, YARN-311-v3.patch, YARN-311-v4.patch, YARN-311-v4.patch, YARN-311-v5.patch, YARN-311-v6.1.patch, YARN-311-v6.2.patch, YARN-311-v6.patch, YARN-311-v7.patch, YARN-311-v8.patch, YARN-311-v9.patch As the first step, we go for resource changes on the RM side and will expose admin APIs (admin protocol, CLI, REST and JMX API) later. This JIRA will contain only the scheduler changes. The flow for updating a node's resource and making resource scheduling aware of it is: 1. The resource update goes through the admin API to the RM and takes effect on RMNodeImpl. 2. When the next NM status heartbeat comes, the RMNode's resource change is detected and the delta resource is added to the SchedulerNode's availableResource before actual scheduling happens. 3. The scheduler does resource allocation according to the new availableResource in the SchedulerNode. For more design details, please refer to the proposal and discussions in the parent JIRA: YARN-291. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-311) Dynamic node resource configuration: core scheduler changes
[ https://issues.apache.org/jira/browse/YARN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809797#comment-13809797 ] Junping Du commented on YARN-311: - The log doesn't show an actual build failure (the patch builds fine locally), so the Jenkins failure above is unrelated to the patch and looks like an accident. Renaming the patch to v12b (exactly the same content) and submitting it again. Dynamic node resource configuration: core scheduler changes --- Key: YARN-311 URL: https://issues.apache.org/jira/browse/YARN-311 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, scheduler Reporter: Junping Du Assignee: Junping Du Attachments: YARN-311-v10.patch, YARN-311-v11.patch, YARN-311-v12b.patch, YARN-311-v12.patch, YARN-311-v1.patch, YARN-311-v2.patch, YARN-311-v3.patch, YARN-311-v4.patch, YARN-311-v4.patch, YARN-311-v5.patch, YARN-311-v6.1.patch, YARN-311-v6.2.patch, YARN-311-v6.patch, YARN-311-v7.patch, YARN-311-v8.patch, YARN-311-v9.patch As the first step, we go for resource changes on the RM side and will expose admin APIs (admin protocol, CLI, REST and JMX API) later. This JIRA will contain only the scheduler changes. The flow for updating a node's resource and making resource scheduling aware of it is: 1. The resource update goes through the admin API to the RM and takes effect on RMNodeImpl. 2. When the next NM status heartbeat comes, the RMNode's resource change is detected and the delta resource is added to the SchedulerNode's availableResource before actual scheduling happens. 3. The scheduler does resource allocation according to the new availableResource in the SchedulerNode. For more design details, please refer to the proposal and discussions in the parent JIRA: YARN-291. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1320) Custom log4j properties in Distributed shell does not work properly.
[ https://issues.apache.org/jira/browse/YARN-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1320: -- Summary: Custom log4j properties in Distributed shell does not work properly. (was: Custom log4j properties does not work properly.) Custom log4j properties in Distributed shell does not work properly. Key: YARN-1320 URL: https://issues.apache.org/jira/browse/YARN-1320 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.2.1 Attachments: YARN-1320.1.patch, YARN-1320.2.patch, YARN-1320.3.patch Distributed shell cannot pick up custom log4j properties (specified with -log_properties). It always uses default log4j properties. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-978) [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation
[ https://issues.apache.org/jira/browse/YARN-978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809801#comment-13809801 ] Hadoop QA commented on YARN-978: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12611217/YARN-978.9.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2326//console This message is automatically generated. [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation -- Key: YARN-978 URL: https://issues.apache.org/jira/browse/YARN-978 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Fix For: YARN-321 Attachments: YARN-978-1.patch, YARN-978.2.patch, YARN-978.3.patch, YARN-978.4.patch, YARN-978.5.patch, YARN-978.6.patch, YARN-978.7.patch, YARN-978.8.patch, YARN-978.9.patch We don't have an ApplicationAttemptReport and its Protobuf implementation. Adding that. Thanks, Mayank -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1123) [YARN-321] Adding ContainerReport and Protobuf implementation
[ https://issues.apache.org/jira/browse/YARN-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809803#comment-13809803 ] Hadoop QA commented on YARN-1123: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12611214/YARN-1123-4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2327//console This message is automatically generated. [YARN-321] Adding ContainerReport and Protobuf implementation - Key: YARN-1123 URL: https://issues.apache.org/jira/browse/YARN-1123 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Mayank Bansal Attachments: YARN-1123-1.patch, YARN-1123-2.patch, YARN-1123-3.patch, YARN-1123-4.patch Like YARN-978, we need some client-oriented class to expose the container history info. Neither Container nor RMContainer is the right one. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1320) Custom log4j properties in Distributed shell does not work properly.
[ https://issues.apache.org/jira/browse/YARN-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809808#comment-13809808 ] Vinod Kumar Vavilapalli commented on YARN-1320: --- Patch looks good to me. Can you update what tests you've done? Also, maybe we can write a test, by making use of log-aggregation? :) Custom log4j properties in Distributed shell does not work properly. Key: YARN-1320 URL: https://issues.apache.org/jira/browse/YARN-1320 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.2.1 Attachments: YARN-1320.1.patch, YARN-1320.2.patch, YARN-1320.3.patch Distributed shell cannot pick up custom log4j properties (specified with -log_properties). It always uses default log4j properties. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1343) NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs
[ https://issues.apache.org/jira/browse/YARN-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809814#comment-13809814 ] Bikas Saha commented on YARN-1343: -- lgtm. In the new testReconnect() we should check that the number of RMNodes in rmContext.getRMNodes() is still 1, e.g. that the second node actually replaced the previous node (the desired behavior) as opposed to both getting into the list. NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs - Key: YARN-1343 URL: https://issues.apache.org/jira/browse/YARN-1343 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.2.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Critical Fix For: 2.2.1 Attachments: YARN-1343.patch, YARN-1343.patch, YARN-1343.patch, YARN-1343.patch If a NodeManager joins the cluster or gets restarted, running AMs never receive the node update indicating the Node is running. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1343) NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs
[ https://issues.apache.org/jira/browse/YARN-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809815#comment-13809815 ] Bikas Saha commented on YARN-1343: -- +1 for committing. Thanks! NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs - Key: YARN-1343 URL: https://issues.apache.org/jira/browse/YARN-1343 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.2.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Critical Fix For: 2.2.1 Attachments: YARN-1343.patch, YARN-1343.patch, YARN-1343.patch, YARN-1343.patch If a NodeManager joins the cluster or gets restarted, running AMs never receive the node update indicating the Node is running. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-311) Dynamic node resource configuration: core scheduler changes
[ https://issues.apache.org/jira/browse/YARN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809820#comment-13809820 ] Hadoop QA commented on YARN-311: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12611218/YARN-311-v12b.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2325//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2325//console This message is automatically generated. Dynamic node resource configuration: core scheduler changes --- Key: YARN-311 URL: https://issues.apache.org/jira/browse/YARN-311 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, scheduler Reporter: Junping Du Assignee: Junping Du Attachments: YARN-311-v10.patch, YARN-311-v11.patch, YARN-311-v12b.patch, YARN-311-v12.patch, YARN-311-v1.patch, YARN-311-v2.patch, YARN-311-v3.patch, YARN-311-v4.patch, YARN-311-v4.patch, YARN-311-v5.patch, YARN-311-v6.1.patch, YARN-311-v6.2.patch, YARN-311-v6.patch, YARN-311-v7.patch, YARN-311-v8.patch, YARN-311-v9.patch As the first step, we go for resource changes on the RM side and will expose admin APIs (admin protocol, CLI, REST and JMX API) later. This JIRA will contain only the scheduler changes. The flow for updating a node's resource and making resource scheduling aware of it is: 1. The resource update goes through the admin API to the RM and takes effect on RMNodeImpl. 2. When the next NM status heartbeat comes, the RMNode's resource change is detected and the delta resource is added to the SchedulerNode's availableResource before actual scheduling happens. 3. The scheduler does resource allocation according to the new availableResource in the SchedulerNode. For more design details, please refer to the proposal and discussions in the parent JIRA: YARN-291. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1343) NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs
[ https://issues.apache.org/jira/browse/YARN-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809821#comment-13809821 ] Alejandro Abdelnur commented on YARN-1343: -- [~bikassaha], on your ask about checking the node count: I don't think it's necessary. If a reconnect is triggered, it means the node was already found, in {{ResourceTrackerService.registerNodeManager()}}:
{code}
RMNode oldNode = this.rmContext.getRMNodes().putIfAbsent(nodeId, rmNode);
if (oldNode == null) {
  this.rmContext.getDispatcher().getEventHandler().handle(
      new RMNodeEvent(nodeId, RMNodeEventType.STARTED));
} else {
  LOG.info("Reconnect from the node at: " + host);
  this.nmLivelinessMonitor.unregister(nodeId);
  this.rmContext.getDispatcher().getEventHandler().handle(
      new RMNodeReconnectEvent(nodeId, rmNode));
}
{code}
thx NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs - Key: YARN-1343 URL: https://issues.apache.org/jira/browse/YARN-1343 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.2.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Critical Fix For: 2.2.1 Attachments: YARN-1343.patch, YARN-1343.patch, YARN-1343.patch, YARN-1343.patch If a NodeManager joins the cluster or gets restarted, running AMs never receive the node update indicating the Node is running. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-891) Store completed application information in RM state store
[ https://issues.apache.org/jira/browse/YARN-891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-891: - Attachment: YARN-891.8.patch Thanks Vinod/Bikas for the reviews. - The new patch fixes the above comments. - Made a new change in RMAppManager.recover() that recovers the application synchronously; otherwise a client could see the application as not yet recovered, because ClientRMService has already started at that point. Store completed application information in RM state store - Key: YARN-891 URL: https://issues.apache.org/jira/browse/YARN-891 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-891.1.patch, YARN-891.2.patch, YARN-891.3.patch, YARN-891.4.patch, YARN-891.5.patch, YARN-891.6.patch, YARN-891.7.patch, YARN-891.7.patch, YARN-891.8.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch Store completed application/attempt info in RMStateStore when the application/attempt completes. This solves some problems, like finished applications getting lost after RM restart, and some other races like YARN-1195 -- This message was sent by Atlassian JIRA (v6.1#6144)
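A toy sketch of the ordering argument, illustrative only and not the RMAppManager code: because recovery runs to completion on the startup thread, the client-facing service cannot answer a query before the recovered applications are in place. All names below are hypothetical.
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class RecoveryOrderSketch {
  static final Map<String, String> recoveredApps =
      new ConcurrentHashMap<String, String>();
  static volatile boolean clientServiceStarted = false;

  // Synchronous recovery: finishes on the startup thread before the
  // client service is marked started, so no query can race with it.
  static void recoverApplications() {
    recoveredApps.put("application_0001", "FINISHED"); // hypothetical id
  }

  static String getApplicationReport(String appId) {
    if (!clientServiceStarted) {
      throw new IllegalStateException("client service not started");
    }
    String state = recoveredApps.get(appId);
    if (state == null) {
      throw new IllegalStateException("app not found: " + appId);
    }
    return state;
  }

  public static void main(String[] args) {
    recoverApplications();        // step 1: recover state, synchronously
    clientServiceStarted = true;  // step 2: only now serve clients
    System.out.println(getApplicationReport("application_0001"));
  }
}
{code}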
[jira] [Updated] (YARN-1290) Let continuous scheduling achieve more balanced task assignment
[ https://issues.apache.org/jira/browse/YARN-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-1290: -- Attachment: YARN-1290.patch Let continuous scheduling achieve more balanced task assignment --- Key: YARN-1290 URL: https://issues.apache.org/jira/browse/YARN-1290 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Attachments: main.pdf, YARN-1290.patch, YARN-1290.patch, YARN-1290.patch, YARN-1290.patch Currently, in continuous scheduling (YARN-1010), in each round, the thread iterates over pre-ordered nodes and assigns tasks. This mechanism may overload the first several nodes, while later nodes receive no tasks. We should sort all nodes according to their available resources. In each round, always assign tasks to the nodes with more available capacity first, which balances the load distribution among all nodes. -- This message was sent by Atlassian JIRA (v6.1#6144)
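A hedged sketch of the re-sorting idea in the description; NodeInfo is an illustrative stand-in, not the FairScheduler's actual node type. Re-sorting by available resources each round is O(n log n), cheap relative to heartbeat processing, and keeps the head of the list pointed at the least-loaded nodes.
{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class BalancedAssignSketch {
  static class NodeInfo {
    final String host;
    final long availableMB;
    NodeInfo(String host, long availableMB) {
      this.host = host;
      this.availableMB = availableMB;
    }
  }

  public static void main(String[] args) {
    List<NodeInfo> nodes = new ArrayList<NodeInfo>();
    nodes.add(new NodeInfo("node-a", 2048));
    nodes.add(new NodeInfo("node-b", 8192));
    nodes.add(new NodeInfo("node-c", 4096));

    // Re-sort every scheduling round, so nodes that just received
    // containers (less headroom) fall toward the back of the list.
    Collections.sort(nodes, new Comparator<NodeInfo>() {
      public int compare(NodeInfo a, NodeInfo b) {
        return Long.compare(b.availableMB, a.availableMB); // descending
      }
    });

    for (NodeInfo n : nodes) {
      System.out.println("offer next container to " + n.host
          + " (" + n.availableMB + " MB free)");
    }
  }
}
{code}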
[jira] [Commented] (YARN-891) Store completed application information in RM state store
[ https://issues.apache.org/jira/browse/YARN-891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809840#comment-13809840 ] Hadoop QA commented on YARN-891: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12611222/YARN-891.8.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.TestRMAppAttemptTransitions {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2328//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2328//console This message is automatically generated. Store completed application information in RM state store - Key: YARN-891 URL: https://issues.apache.org/jira/browse/YARN-891 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-891.1.patch, YARN-891.2.patch, YARN-891.3.patch, YARN-891.4.patch, YARN-891.5.patch, YARN-891.6.patch, YARN-891.7.patch, YARN-891.7.patch, YARN-891.8.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch Store completed application/attempt info in RMStateStore when the application/attempt completes. This solves some problems, like finished applications getting lost after RM restart, and some other races like YARN-1195 -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-891) Store completed application information in RM state store
[ https://issues.apache.org/jira/browse/YARN-891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-891: - Attachment: YARN-891.9.patch Fixed the test failure. Store completed application information in RM state store - Key: YARN-891 URL: https://issues.apache.org/jira/browse/YARN-891 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-891.1.patch, YARN-891.2.patch, YARN-891.3.patch, YARN-891.4.patch, YARN-891.5.patch, YARN-891.6.patch, YARN-891.7.patch, YARN-891.7.patch, YARN-891.8.patch, YARN-891.9.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch Store completed application/attempt info in RMStateStore when the application/attempt completes. This solves some problems, like finished applications getting lost after RM restart, and some other races like YARN-1195 -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-891) Store completed application information in RM state store
[ https://issues.apache.org/jira/browse/YARN-891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809849#comment-13809849 ] Hadoop QA commented on YARN-891: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12611232/YARN-891.9.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2330//console This message is automatically generated. Store completed application information in RM state store - Key: YARN-891 URL: https://issues.apache.org/jira/browse/YARN-891 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-891.1.patch, YARN-891.2.patch, YARN-891.3.patch, YARN-891.4.patch, YARN-891.5.patch, YARN-891.6.patch, YARN-891.7.patch, YARN-891.7.patch, YARN-891.8.patch, YARN-891.9.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch Store completed application/attempt info in RMStateStore when the application/attempt completes. This solves some problems, like finished applications getting lost after RM restart, and some other races like YARN-1195 -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1290) Let continuous scheduling achieve more balanced task assignment
[ https://issues.apache.org/jira/browse/YARN-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809851#comment-13809851 ] Hadoop QA commented on YARN-1290: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12611228/YARN-1290.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2329//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2329//console This message is automatically generated. Let continuous scheduling achieve more balanced task assignment --- Key: YARN-1290 URL: https://issues.apache.org/jira/browse/YARN-1290 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Attachments: main.pdf, YARN-1290.patch, YARN-1290.patch, YARN-1290.patch, YARN-1290.patch Currently, in continuous scheduling (YARN-1010), in each round, the thread iterates over pre-ordered nodes and assigns tasks. This mechanism may overload the first several nodes, while later nodes receive no tasks. We should sort all nodes according to their available resources. In each round, always assign tasks to the nodes with more available capacity first, which balances the load distribution among all nodes. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Assigned] (YARN-998) Persistent resource change during NM restart
[ https://issues.apache.org/jira/browse/YARN-998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du reassigned YARN-998: --- Assignee: Junping Du Persistent resource change during NM restart Key: YARN-998 URL: https://issues.apache.org/jira/browse/YARN-998 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, scheduler Reporter: Junping Du Assignee: Junping Du When the NM is restarted, whether planned or after a failure, the previous dynamic resource setting should be kept for consistency. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1374) Resource Manager fails to start due to ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809874#comment-13809874 ] Chris Douglas commented on YARN-1374: - +1 lgtm Resource Manager fails to start due to ConcurrentModificationException -- Key: YARN-1374 URL: https://issues.apache.org/jira/browse/YARN-1374 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.3.0 Reporter: Devaraj K Assignee: Karthik Kambatla Priority: Blocker Attachments: yarn-1374-1.patch, yarn-1374-1.patch Resource Manager is failing to start with the below ConcurrentModificationException.
{code}
2013-10-30 20:22:42,371 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list
2013-10-30 20:22:42,376 INFO org.apache.hadoop.service.AbstractService: Service ResourceManager failed in state INITED; cause: java.util.ConcurrentModificationException
java.util.ConcurrentModificationException
    at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
    at java.util.AbstractList$Itr.next(AbstractList.java:343)
    at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
    at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
2013-10-30 20:22:42,378 INFO org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: Transitioning to standby
2013-10-30 20:22:42,378 INFO org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: Transitioned to standby
2013-10-30 20:22:42,378 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
java.util.ConcurrentModificationException
    at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
    at java.util.AbstractList$Itr.next(AbstractList.java:343)
    at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
    at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
2013-10-30 20:22:42,379 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down ResourceManager at HOST-10-18-40-24/10.18.40.24
************************************************************/
{code}
-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Resolved] (YARN-1343) NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs
[ https://issues.apache.org/jira/browse/YARN-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur resolved YARN-1343. -- Resolution: Fixed Hadoop Flags: Reviewed NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs - Key: YARN-1343 URL: https://issues.apache.org/jira/browse/YARN-1343 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.2.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Critical Fix For: 2.2.1 Attachments: YARN-1343.patch, YARN-1343.patch, YARN-1343.patch, YARN-1343.patch If a NodeManager joins the cluster or gets restarted, running AMs never receive the node update indicating the Node is running. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1343) NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs
[ https://issues.apache.org/jira/browse/YARN-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809883#comment-13809883 ] Hudson commented on YARN-1343: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4680 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4680/]) YARN-1343. NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs. (tucu) (tucu: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1537368) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/NodesListManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestNMReconnect.java NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs - Key: YARN-1343 URL: https://issues.apache.org/jira/browse/YARN-1343 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.2.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Critical Fix For: 2.2.1 Attachments: YARN-1343.patch, YARN-1343.patch, YARN-1343.patch, YARN-1343.patch If a NodeManager joins the cluster or gets restarted, running AMs never receive the node update indicating the Node is running. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1324) NodeManager potentially causes unnecessary operations on all its disks
[ https://issues.apache.org/jira/browse/YARN-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809885#comment-13809885 ] Chris Douglas commented on YARN-1324: - bq. When does MR use multiple disks in the same task/container? Isn't the map output written to a single indexed partition file? Spills are spread across all volumes, but merged into a single file at the end. Would randomizing the order of disks be a reasonable short-term workaround for (1)? Future changes could weight/elide directories based on other criteria, but that's a simple change. So would changing the random selection to bias its search order using a hash of the task id (instead of disk usage when creating the spill), so the ShuffleHandler could search fewer directories on average. I agree with Vinod, it would be hard to prevent the search altogether... bq. Requiring apps to specify the number of disks for a container is also a viable solution and can be done in a back-compatible manner by changing MR to specify multiple disks and leaving the default to 1 for apps that don't care. This makes sense as a hint, but some users might interpret it as a constraint and be confused when a NM schedules them on a node that reports fewer local dirs (due to failure, or heterogeneous config). NodeManager potentially causes unnecessary operations on all its disks -- Key: YARN-1324 URL: https://issues.apache.org/jira/browse/YARN-1324 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Bikas Saha Currently, for every container, the NM creates a directory on every disk and expects the container task to choose one of them and load-balance the use of the disks across all containers. 1) This may have worked fine in the MR world, where MR tasks would randomly choose dirs, but in general we cannot expect every app/task writer to understand these nuances and randomly pick disks. So we could end up overloading the first disk if most people decide to use the first disk. 2) This forces a number of NM operations to scan every disk (thus randomizing that disk) to locate the dir which the task has actually chosen to use for its files. That makes all these operations expensive for the NM as well as disruptive for users of disks that did not have the real task working dirs. I propose that the NM should decide up front which disk it assigns to a task. It could choose to do so randomly or weighted-randomly by looking at space and load on each disk, so it could do a better job of load balancing. Then it would associate the chosen working directory with the container context so that subsequent operations on the NM can directly seek to the correct location instead of having to seek on every disk. -- This message was sent by Atlassian JIRA (v6.1#6144)
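A small sketch of the hash-biased search order floated above, purely illustrative and not the NM's actual API: the writer and the ShuffleHandler derive the same starting disk from the task id, so the reader's first probe usually hits and the remaining dirs are only a fallback.
{code}
import java.util.Arrays;
import java.util.List;

public class DiskPickSketch {
  // Rotate the configured local dirs so the search starts at a
  // deterministic, id-dependent offset; both writer and reader use this.
  static List<String> searchOrder(List<String> localDirs, String taskId) {
    int start = (taskId.hashCode() & 0x7fffffff) % localDirs.size();
    String[] out = new String[localDirs.size()];
    for (int i = 0; i < localDirs.size(); i++) {
      out[i] = localDirs.get((start + i) % localDirs.size());
    }
    return Arrays.asList(out);
  }

  public static void main(String[] args) {
    List<String> dirs = Arrays.asList("/disk1", "/disk2", "/disk3", "/disk4");
    String task = "attempt_1383200000000_0001_m_000007_0"; // hypothetical id
    // Writer and reader compute the identical order, so the first probe
    // usually hits; later entries are only consulted on failed disks.
    System.out.println(searchOrder(dirs, task));
  }
}
{code}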
[jira] [Updated] (YARN-891) Store completed application information in RM state store
[ https://issues.apache.org/jira/browse/YARN-891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-891: - Attachment: YARN-891.10.patch Resubmitting the patch. Store completed application information in RM state store - Key: YARN-891 URL: https://issues.apache.org/jira/browse/YARN-891 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-891.10.patch, YARN-891.1.patch, YARN-891.2.patch, YARN-891.3.patch, YARN-891.4.patch, YARN-891.5.patch, YARN-891.6.patch, YARN-891.7.patch, YARN-891.7.patch, YARN-891.8.patch, YARN-891.9.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch Store completed application/attempt info in RMStateStore when the application/attempt completes. This solves some problems, like finished applications getting lost after RM restart, and some other races like YARN-1195 -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-891) Store completed application information in RM state store
[ https://issues.apache.org/jira/browse/YARN-891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809934#comment-13809934 ] Hadoop QA commented on YARN-891: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12611248/YARN-891.10.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2331//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2331//console This message is automatically generated. Store completed application information in RM state store - Key: YARN-891 URL: https://issues.apache.org/jira/browse/YARN-891 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-891.10.patch, YARN-891.1.patch, YARN-891.2.patch, YARN-891.3.patch, YARN-891.4.patch, YARN-891.5.patch, YARN-891.6.patch, YARN-891.7.patch, YARN-891.7.patch, YARN-891.8.patch, YARN-891.9.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch Store completed application/attempt info in RMStateStore when the application/attempt completes. This solves some problems, like finished applications getting lost after RM restart, and some other races like YARN-1195 -- This message was sent by Atlassian JIRA (v6.1#6144)