[jira] [Created] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart

2013-10-30 Thread Bikas Saha (JIRA)
Bikas Saha created YARN-1366:


 Summary: ApplicationMasterService should Resync with the AM upon 
allocate call after restart
 Key: YARN-1366
 URL: https://issues.apache.org/jira/browse/YARN-1366
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha


The ApplicationMasterService currently sends a resync response, to which the AM 
responds by shutting down. The AM behavior is expected to change so that it 
resyncs with the RM instead. Resync means resetting the allocate RPC sequence 
number to 0, after which the AM should send its entire outstanding request to 
the RM. Note that if the AM is making its first allocate call to the RM then 
things should proceed as normal without needing a resync. The RM will return 
all containers that have completed since the RM last synced with the AM. Some 
container completions may be reported more than once.
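
A minimal AM-side sketch of that resync behavior, with invented class and 
method names (this is not the actual YARN client API): on resync the allocate 
sequence number is reset to 0 and the full outstanding request is resent 
instead of shutting down.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical AM-side allocator: on resync, reset the allocate RPC sequence
// number to 0 and resend the entire outstanding request instead of shutting
// down. All names are illustrative only.
public class ResyncingAllocator {

  private int lastResponseId = 0;                 // allocate RPC sequence number
  private final List<String> outstandingAsks = new ArrayList<>();
  private final List<String> outstandingReleases = new ArrayList<>();

  /** Called when the RM signals a resync after restart. */
  public void onResync() {
    lastResponseId = 0;                           // restart the sequence at 0
    // Do NOT clear the outstanding asks/releases: the restarted RM has no
    // memory of them, so the next allocate call must carry them all again.
    sendAllocate(outstandingAsks, outstandingReleases);
  }

  private void sendAllocate(List<String> asks, List<String> releases) {
    // Placeholder for the real allocate RPC. Completed containers returned by
    // the RM after a resync may include duplicates and must be tolerated.
    lastResponseId++;
  }
}
{code}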



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1367) After restart NM should resync with the RM without killing containers

2013-10-30 Thread Bikas Saha (JIRA)
Bikas Saha created YARN-1367:


 Summary: After restart NM should resync with the RM without 
killing containers
 Key: YARN-1367
 URL: https://issues.apache.org/jira/browse/YARN-1367
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha


After RM restart, the RM sends a resync response to NMs that heartbeat to it. 
Upon receiving the resync response, the NM kills all containers and 
re-registers with the RM. The NM should be changed to not kill its containers 
and instead inform the RM about all currently running containers, including 
their allocations etc. After the re-register, the NM should send all pending 
container completions to the RM as usual.
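
A rough NM-side sketch of that behavior, using simplified stand-in types (not 
the actual NodeManager classes):

{code:java}
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins: on resync the NM re-registers with its full set of
// running containers instead of killing them, then resumes sending pending
// completion notifications as usual.
class RunningContainerReport {
  final String containerId;
  final int memoryMb;
  final int vcores;

  RunningContainerReport(String containerId, int memoryMb, int vcores) {
    this.containerId = containerId;
    this.memoryMb = memoryMb;
    this.vcores = vcores;
  }
}

interface ResourceManagerStub {
  void reRegister(String nodeId, List<RunningContainerReport> liveContainers);
  void reportCompleted(List<String> completedContainerIds);
}

class NodeManagerResyncSketch {
  void onResync(ResourceManagerStub rm, String nodeId,
                List<RunningContainerReport> live, List<String> pendingCompleted) {
    // 1. Re-register, carrying every currently running container and its
    //    allocation so the RM can rebuild its view of this node.
    rm.reRegister(nodeId, new ArrayList<>(live));
    // 2. Resume normal operation: send any pending container completions.
    rm.reportCompleted(new ArrayList<>(pendingCompleted));
  }
}
{code}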



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1368) RM should populate running container allocation information from NM resync

2013-10-30 Thread Bikas Saha (JIRA)
Bikas Saha created YARN-1368:


 Summary: RM should populate running container allocation 
information from NM resync
 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Bikas Saha


YARN-1367 adds support for the NM to tell the RM about all currently running 
containers upon registration. The RM needs to send this information to the 
schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
the current allocation state of the cluster.
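
As an illustration only (the event and interface names below are invented, not 
the real scheduler event API), the node-added event handed to the scheduler 
could carry the containers the NM reported at registration:

{code:java}
import java.util.List;

// Illustrative only: a node-added scheduler event that also carries the
// containers the NM reported as running when it (re)registered, so the
// scheduler can rebuild its allocation state for that node.
class NodeAddedSchedulerEventSketch {
  final String nodeId;
  final List<String> runningContainerIds;   // reported by the NM on resync

  NodeAddedSchedulerEventSketch(String nodeId, List<String> runningContainerIds) {
    this.nodeId = nodeId;
    this.runningContainerIds = runningContainerIds;
  }
}

interface SchedulerSketch {
  // The RM dispatches this along with NODE_ADDED; the scheduler re-creates
  // its bookkeeping for each reported container (see YARN-1369/1370/1371).
  void handleNodeAdded(NodeAddedSchedulerEventSketch event);
}
{code}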



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1368) RM should populate running container allocation information from NM resync

2013-10-30 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated YARN-1368:
-

Issue Type: Sub-task  (was: Bug)
Parent: YARN-556

 RM should populate running container allocation information from NM resync
 --

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha

 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1306) Clean up hadoop-sls sample-conf according to YARN-1228

2013-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808799#comment-13808799
 ] 

Hudson commented on YARN-1306:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4671 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4671/])
YARN-1306. Clean up hadoop-sls sample-conf according to YARN-1228 (Wei Yan via 
Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536982)
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/capacity-scheduler.xml
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/fair-scheduler-allocation.xml
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/fair-scheduler.xml
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/yarn-site.xml
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


 Clean up hadoop-sls sample-conf according to YARN-1228
 --

 Key: YARN-1306
 URL: https://issues.apache.org/jira/browse/YARN-1306
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan
 Attachments: YARN-1306.patch


 Move fair scheduler allocations configuration to fair-scheduler.xml, and move 
 all scheduler stuff to yarn-site.xml



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1370) Fair scheduler to re-populate container allocation state

2013-10-30 Thread Bikas Saha (JIRA)
Bikas Saha created YARN-1370:


 Summary: Fair scheduler to re-populate container allocation state
 Key: YARN-1370
 URL: https://issues.apache.org/jira/browse/YARN-1370
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha


YARN-1367 and YARN-1368 enable the NM to tell the RM about currently running 
containers and the RM will pass this information to the schedulers along with 
the node information. The schedulers are currently already informed about 
previously running apps when the app data is recovered from the store. The 
scheduler is expected to be able to repopulate its allocation state from the 
above 2 sources of information.
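
A rough sketch of what such re-population could look like, assuming the apps 
were already recovered from the store and the node reports its running 
containers grouped by application (types simplified for illustration):

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified placeholders only. The apps were already recovered from the
// state store; this step re-attaches the node's reported running containers
// to those apps instead of starting from an empty cluster view.
class AllocationRecoverySketch {
  // appId -> containers currently charged to that app by this scheduler
  private final Map<String, List<String>> liveContainersByApp = new HashMap<>();

  void recoverNode(String nodeId, Map<String, List<String>> reportedByApp) {
    for (Map.Entry<String, List<String>> e : reportedByApp.entrySet()) {
      liveContainersByApp
          .computeIfAbsent(e.getKey(), k -> new ArrayList<>())
          .addAll(e.getValue());
      // Queue and node resource accounting would be updated here as well.
    }
  }
}
{code}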



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1228) Clean up Fair Scheduler configuration loading

2013-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808800#comment-13808800
 ] 

Hudson commented on YARN-1228:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4671 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4671/])
YARN-1306. Clean up hadoop-sls sample-conf according to YARN-1228 (Wei Yan via 
Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536982)
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/capacity-scheduler.xml
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/fair-scheduler-allocation.xml
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/fair-scheduler.xml
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/yarn-site.xml
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


 Clean up Fair Scheduler configuration loading
 -

 Key: YARN-1228
 URL: https://issues.apache.org/jira/browse/YARN-1228
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.1.1-beta
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 2.2.0

 Attachments: YARN-1228-1.patch, YARN-1228-2.patch, YARN-1228.patch


 Currently the Fair Scheduler is configured in two ways
 * An allocations file that has a different format than the standard Hadoop 
 configuration file, which makes it easier to specify hierarchical objects 
 like queues and their properties. 
 * With properties like yarn.scheduler.fair.max.assign that are specified in 
 the standard Hadoop configuration format.
 The standard and default way of configuring it is to use fair-scheduler.xml 
 as the allocations file and to put the yarn.scheduler properties in 
 yarn-site.xml.
 It is also possible to specify a different file as the allocations file, and 
 to place the yarn.scheduler properties in fair-scheduler.xml, which will be 
 interpreted as in the standard Hadoop configuration format.  This flexibility 
 is both confusing and unnecessary.
 Additionally, the allocation file is loaded as fair-scheduler.xml from the 
 classpath if it is not specified, but is loaded as a File if it is.  This 
 causes two problems
 1. We see different behavior when not setting the 
 yarn.scheduler.fair.allocation.file, and setting it to fair-scheduler.xml, 
 which is its default.
 2. Classloaders may choose to cache resources, which can break the reload 
 logic when yarn.scheduler.fair.allocation.file is not specified.
 We should never allow the yarn.scheduler properties to go into 
 fair-scheduler.xml.  And we should always load the allocations file as a 
 file, not as a resource on the classpath.  To preserve existing behavior and 
 allow loading files from the classpath, we can look for files on the 
 classpath, but strip off their scheme and interpret them as Files.
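
A hedged sketch of that loading rule, illustrative only (the real FairScheduler 
reload code may differ): always resolve the configured allocation file to a 
java.io.File, falling back to the classpath but stripping off the URL scheme.

{code:java}
import java.io.File;
import java.net.URL;

// Always end up with a java.io.File, even when the configured allocation file
// is only found on the classpath, so reloading does not depend on classloader
// resource caching.
class AllocationFileLocatorSketch {

  static File locate(String configuredPath) {
    File direct = new File(configuredPath);
    if (direct.exists()) {
      return direct;                 // value interpreted as a plain file path
    }
    // Fall back to the classpath, but strip off the URL scheme and interpret
    // the result as a File.
    URL onClasspath =
        Thread.currentThread().getContextClassLoader().getResource(configuredPath);
    if (onClasspath != null && "file".equals(onClasspath.getProtocol())) {
      return new File(onClasspath.getPath());
    }
    return null;                     // not found; caller decides how to fail
  }
}
{code}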



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1369) Capacity scheduler to re-populate container allocation information

2013-10-30 Thread Bikas Saha (JIRA)
Bikas Saha created YARN-1369:


 Summary: Capacity scheduler to re-populate container allocation 
information
 Key: YARN-1369
 URL: https://issues.apache.org/jira/browse/YARN-1369
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha


YARN-1367 and YARN-1368 enable the NM to tell the RM about currently running 
containers and the RM will pass this information to the schedulers along with 
the node information. The schedulers are currently already informed about 
previously running apps when the app data is recovered from the store. The 
scheduler is expected to be able to repopulate its allocation state from the 
above 2 sources of information.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1371) FIFO scheduler to re-populate container allocation state

2013-10-30 Thread Bikas Saha (JIRA)
Bikas Saha created YARN-1371:


 Summary: FIFO scheduler to re-populate container allocation state
 Key: YARN-1371
 URL: https://issues.apache.org/jira/browse/YARN-1371
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha


YARN-1367 and YARN-1368 enable the NM to tell the RM about currently running 
containers and the RM will pass this information to the schedulers along with 
the node information. The schedulers are currently already informed about 
previously running apps when the app data is recovered from the store. The 
scheduler is expected to be able to repopulate its allocation state from the 
above 2 sources of information.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1369) Capacity scheduler to re-populate container allocation state

2013-10-30 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated YARN-1369:
-

Summary: Capacity scheduler to re-populate container allocation state  
(was: Capacity scheduler to re-populate container allocation information)

 Capacity scheduler to re-populate container allocation state
 

 Key: YARN-1369
 URL: https://issues.apache.org/jira/browse/YARN-1369
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha

 YARN-1367 and YARN-1368 enable the NM to tell the RM about currently running 
 containers and the RM will pass this information to the schedulers along with 
 the node information. The schedulers are currently already informed about 
 previously running apps when the app data is recovered from the store. The 
 scheduler is expected to be able to repopulate its allocation state from the 
 above 2 sources of information.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1306) Clean up hadoop-sls sample-conf according to YARN-1228

2013-10-30 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-1306:
-

Hadoop Flags: Reviewed

 Clean up hadoop-sls sample-conf according to YARN-1228
 --

 Key: YARN-1306
 URL: https://issues.apache.org/jira/browse/YARN-1306
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan
 Fix For: 2.3.0

 Attachments: YARN-1306.patch


 Move fair scheduler allocations configuration to fair-scheduler.xml, and move 
 all scheduler stuff to yarn-site.xml



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1372) Ensure all completed containers are reported to the AMs across RM restart

2013-10-30 Thread Bikas Saha (JIRA)
Bikas Saha created YARN-1372:


 Summary: Ensure all completed containers are reported to the AMs 
across RM restart
 Key: YARN-1372
 URL: https://issues.apache.org/jira/browse/YARN-1372
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha


Currently the NM informs the RM about completed containers and then removes 
those containers from the RM notification list. The RM passes that completed 
container information on to the AM and the AM pulls this data. If the RM dies 
before the AM pulls this data, then the AM may not be able to get this 
information again. To fix this, the NM should maintain a separate list of the 
completed container notifications it has sent to the RM. After the AM has 
pulled the containers from the RM, the RM will inform the NM about it and the 
NM can remove those containers from the new list. Upon re-registering with the 
RM (after RM restart), the NM should send the entire list of completed 
containers to the RM along with any other containers that completed while the 
RM was dead. This ensures that the RM can inform the AMs about all completed 
containers. Some container completions may be reported more than once, since 
the AM may have pulled a container but the RM may die before notifying the NM 
about the pull.
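
A minimal sketch of the proposed NM bookkeeping, with simplified types (not 
the actual NodeManager classes):

{code:java}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// The NM keeps completed-container notifications until the RM confirms the AM
// has pulled them, and replays the whole list when it re-registers after an
// RM restart. Duplicate reports to the AM are therefore possible and expected.
class CompletedContainerTracker {
  private final Set<String> pendingAck = new HashSet<>();

  /** Container finished locally: remember it until the RM acks the AM pull. */
  synchronized void onContainerCompleted(String containerId) {
    pendingAck.add(containerId);
  }

  /** RM says the AM has pulled these completions: safe to forget them. */
  synchronized void onRmAck(List<String> pulledContainerIds) {
    pendingAck.removeAll(pulledContainerIds);
  }

  /** On re-register after RM restart, resend everything not yet acked. */
  synchronized List<String> completionsToResend() {
    return new ArrayList<>(pendingAck);
  }
}
{code}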



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1373) Transition RMApp and RMAppAttempt state to RUNNING after restart for recovered running apps

2013-10-30 Thread Bikas Saha (JIRA)
Bikas Saha created YARN-1373:


 Summary: Transition RMApp and RMAppAttempt state to RUNNING after 
restart for recovered running apps
 Key: YARN-1373
 URL: https://issues.apache.org/jira/browse/YARN-1373
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha


Currently the RM moves recovered app attempts to a terminal recovered state 
and starts a new attempt. Instead, it will have to transition the last attempt 
to a running state so that it can proceed as normal once the running attempt 
has resynced with the ApplicationMasterService (YARN-1365 and YARN-1366). If 
the RM had started the application container before dying, then the AM would 
be up and trying to contact the RM. The RM may also have died before launching 
the container. In that case, the RM should wait for the AM liveness period, 
issue a kill for the stored master container, transition the attempt to some 
RECOVER_ERROR state, and proceed to start a new attempt.
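
A hedged sketch of that recovery decision, with invented names (the enum 
values and "RECOVER_ERROR" here are placeholders, not existing RMAppAttempt 
states):

{code:java}
// Placeholder outcome of recovering a previously running attempt.
enum RecoveredAttemptOutcome { RESUME_AS_RUNNING, RECOVER_ERROR_NEW_ATTEMPT }

class AttemptRecoveryPolicySketch {
  /**
   * amResyncedWithinLivenessPeriod: whether the recovered attempt's AM
   * contacted the restarted RM before the AM liveness period expired.
   */
  RecoveredAttemptOutcome decide(boolean amResyncedWithinLivenessPeriod) {
    if (amResyncedWithinLivenessPeriod) {
      // The master container was launched before the restart; keep the
      // attempt and let it proceed as a normal RUNNING attempt after resync.
      return RecoveredAttemptOutcome.RESUME_AS_RUNNING;
    }
    // The RM likely died before launching the container: kill the stored
    // master container, mark the attempt RECOVER_ERROR, start a new attempt.
    return RecoveredAttemptOutcome.RECOVER_ERROR_NEW_ATTEMPT;
  }
}
{code}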



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-556) RM Restart phase 2 - Work preserving restart

2013-10-30 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808815#comment-13808815
 ] 

Bikas Saha commented on YARN-556:
-

Added some coarse grained tasks based on the attached proposal. More tasks may 
be added as details get dissected.

 RM Restart phase 2 - Work preserving restart
 

 Key: YARN-556
 URL: https://issues.apache.org/jira/browse/YARN-556
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Bikas Saha
 Attachments: Work Preserving RM Restart.pdf


 YARN-128 covered storing the state needed for the RM to recover critical 
 information. This umbrella jira will track changes needed to recover the 
 running state of the cluster so that work can be preserved across RM restarts.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1068) Add admin support for HA operations

2013-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808965#comment-13808965
 ] 

Hudson commented on YARN-1068:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #378 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/378/])
YARN-1068. Add admin support for HA operations (Karthik Kambatla via bikas) 
(bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536888)
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HAAdmin.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineTextInputFormat.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/HAUtil.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMAdminCLI.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMHAServiceTarget.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMHAProtocolService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMServerUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/authorize/RMPolicyProvider.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java


 Add admin support for HA operations
 ---

 Key: YARN-1068
 URL: https://issues.apache.org/jira/browse/YARN-1068
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
  Labels: ha
 Fix For: 2.3.0

 Attachments: yarn-1068-10.patch, yarn-1068-11.patch, 
 yarn-1068-12.patch, yarn-1068-13.patch, yarn-1068-14.patch, 
 yarn-1068-1.patch, yarn-1068-2.patch, yarn-1068-3.patch, yarn-1068-4.patch, 
 yarn-1068-5.patch, yarn-1068-6.patch, yarn-1068-7.patch, yarn-1068-8.patch, 
 yarn-1068-9.patch, YARN-1068.Karthik.patch, yarn-1068-prelim.patch


 Support HA admin operations to facilitate transitioning the RM to Active and 
 Standby states.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1228) Clean up Fair Scheduler configuration loading

2013-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808962#comment-13808962
 ] 

Hudson commented on YARN-1228:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #378 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/378/])
YARN-1306. Clean up hadoop-sls sample-conf according to YARN-1228 (Wei Yan via 
Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536982)
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/capacity-scheduler.xml
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/fair-scheduler-allocation.xml
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/fair-scheduler.xml
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/yarn-site.xml
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


 Clean up Fair Scheduler configuration loading
 -

 Key: YARN-1228
 URL: https://issues.apache.org/jira/browse/YARN-1228
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.1.1-beta
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 2.2.0

 Attachments: YARN-1228-1.patch, YARN-1228-2.patch, YARN-1228.patch


 Currently the Fair Scheduler is configured in two ways
 * An allocations file that has a different format than the standard Hadoop 
 configuration file, which makes it easier to specify hierarchical objects 
 like queues and their properties. 
 * With properties like yarn.scheduler.fair.max.assign that are specified in 
 the standard Hadoop configuration format.
 The standard and default way of configuring it is to use fair-scheduler.xml 
 as the allocations file and to put the yarn.scheduler properties in 
 yarn-site.xml.
 It is also possible to specify a different file as the allocations file, and 
 to place the yarn.scheduler properties in fair-scheduler.xml, which will be 
 interpreted as in the standard Hadoop configuration format.  This flexibility 
 is both confusing and unnecessary.
 Additionally, the allocation file is loaded as fair-scheduler.xml from the 
 classpath if it is not specified, but is loaded as a File if it is.  This 
 causes two problems
 1. We see different behavior when not setting the 
 yarn.scheduler.fair.allocation.file, and setting it to fair-scheduler.xml, 
 which is its default.
 2. Classloaders may choose to cache resources, which can break the reload 
 logic when yarn.scheduler.fair.allocation.file is not specified.
 We should never allow the yarn.scheduler properties to go into 
 fair-scheduler.xml.  And we should always load the allocations file as a 
 file, not as a resource on the classpath.  To preserve existing behavior and 
 allow loading files from the classpath, we can look for files on the 
 classpath, but strip off their scheme and interpret them as Files.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1306) Clean up hadoop-sls sample-conf according to YARN-1228

2013-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808994#comment-13808994
 ] 

Hudson commented on YARN-1306:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1568 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1568/])
YARN-1306. Clean up hadoop-sls sample-conf according to YARN-1228 (Wei Yan via 
Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536982)
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/capacity-scheduler.xml
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/fair-scheduler-allocation.xml
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/fair-scheduler.xml
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/yarn-site.xml
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


 Clean up hadoop-sls sample-conf according to YARN-1228
 --

 Key: YARN-1306
 URL: https://issues.apache.org/jira/browse/YARN-1306
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan
 Fix For: 2.3.0

 Attachments: YARN-1306.patch


 Move fair scheduler allocations configuration to fair-scheduler.xml, and move 
 all scheduler stuff to yarn-site.xml



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1228) Clean up Fair Scheduler configuration loading

2013-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808995#comment-13808995
 ] 

Hudson commented on YARN-1228:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1568 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1568/])
YARN-1306. Clean up hadoop-sls sample-conf according to YARN-1228 (Wei Yan via 
Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536982)
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/capacity-scheduler.xml
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/fair-scheduler-allocation.xml
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/fair-scheduler.xml
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/yarn-site.xml
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


 Clean up Fair Scheduler configuration loading
 -

 Key: YARN-1228
 URL: https://issues.apache.org/jira/browse/YARN-1228
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.1.1-beta
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 2.2.0

 Attachments: YARN-1228-1.patch, YARN-1228-2.patch, YARN-1228.patch


 Currently the Fair Scheduler is configured in two ways
 * An allocations file that has a different format than the standard Hadoop 
 configuration file, which makes it easier to specify hierarchical objects 
 like queues and their properties. 
 * With properties like yarn.scheduler.fair.max.assign that are specified in 
 the standard Hadoop configuration format.
 The standard and default way of configuring it is to use fair-scheduler.xml 
 as the allocations file and to put the yarn.scheduler properties in 
 yarn-site.xml.
 It is also possible to specify a different file as the allocations file, and 
 to place the yarn.scheduler properties in fair-scheduler.xml, which will be 
 interpreted as in the standard Hadoop configuration format.  This flexibility 
 is both confusing and unnecessary.
 Additionally, the allocation file is loaded as fair-scheduler.xml from the 
 classpath if it is not specified, but is loaded as a File if it is.  This 
 causes two problems
 1. We see different behavior when not setting the 
 yarn.scheduler.fair.allocation.file, and setting it to fair-scheduler.xml, 
 which is its default.
 2. Classloaders may choose to cache resources, which can break the reload 
 logic when yarn.scheduler.fair.allocation.file is not specified.
 We should never allow the yarn.scheduler properties to go into 
 fair-scheduler.xml.  And we should always load the allocations file as a 
 file, not as a resource on the classpath.  To preserve existing behavior and 
 allow loading files from the classpath, we can look for files on the 
 classpath, but strip off their scheme and interpret them as Files.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1374) Resource Manager fails to start due to ConcurrentModificationException

2013-10-30 Thread Devaraj K (JIRA)
Devaraj K created YARN-1374:
---

 Summary: Resource Manager fails to start due to 
ConcurrentModificationException
 Key: YARN-1374
 URL: https://issues.apache.org/jira/browse/YARN-1374
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Devaraj K
Priority: Blocker


Resource Manager is failing to start with the below 
ConcurrentModificationException.

{code:xml}
2013-10-30 20:22:42,371 INFO org.apache.hadoop.util.HostsFileReader: Refreshing 
hosts (include/exclude) list
2013-10-30 20:22:42,376 INFO org.apache.hadoop.service.AbstractService: Service 
ResourceManager failed in state INITED; cause: 
java.util.ConcurrentModificationException
java.util.ConcurrentModificationException
at 
java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
at java.util.AbstractList$Itr.next(AbstractList.java:343)
at 
java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
at 
org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
2013-10-30 20:22:42,378 INFO 
org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: 
Transitioning to standby
2013-10-30 20:22:42,378 INFO 
org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: Transitioned 
to standby
2013-10-30 20:22:42,378 FATAL 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
ResourceManager
java.util.ConcurrentModificationException
at 
java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
at java.util.AbstractList$Itr.next(AbstractList.java:343)
at 
java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
at 
org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
2013-10-30 20:22:42,379 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG: 
/
SHUTDOWN_MSG: Shutting down ResourceManager at HOST-10-18-40-24/10.18.40.24
/
{code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1374) Resource Manager fails to start due to ConcurrentModificationException

2013-10-30 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809024#comment-13809024
 ] 

Devaraj K commented on YARN-1374:
-

It occurs when the scheduler monitor is enabled via the 
'yarn.resourcemanager.scheduler.monitor.enable' configuration.

The SchedulingMonitor service is added to the RM's services while the RM's 
service list is being initialized, which causes the 
ConcurrentModificationException. To avoid this, the SchedulingMonitor service 
needs to be added to RMActiveServices instead of to the RM service.

 Resource Manager fails to start due to ConcurrentModificationException
 --

 Key: YARN-1374
 URL: https://issues.apache.org/jira/browse/YARN-1374
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Devaraj K
Priority: Blocker

 Resource Manager is failing to start with the below 
 ConcurrentModificationException.
 {code:xml}
 2013-10-30 20:22:42,371 INFO org.apache.hadoop.util.HostsFileReader: 
 Refreshing hosts (include/exclude) list
 2013-10-30 20:22:42,376 INFO org.apache.hadoop.service.AbstractService: 
 Service ResourceManager failed in state INITED; cause: 
 java.util.ConcurrentModificationException
 java.util.ConcurrentModificationException
   at 
 java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
   at java.util.AbstractList$Itr.next(AbstractList.java:343)
   at 
 java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
   at 
 org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
 2013-10-30 20:22:42,378 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: 
 Transitioning to standby
 2013-10-30 20:22:42,378 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: 
 Transitioned to standby
 2013-10-30 20:22:42,378 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
 ResourceManager
 java.util.ConcurrentModificationException
   at 
 java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
   at java.util.AbstractList$Itr.next(AbstractList.java:343)
   at 
 java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
   at 
 org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
 2013-10-30 20:22:42,379 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG: 
 /
 SHUTDOWN_MSG: Shutting down ResourceManager at HOST-10-18-40-24/10.18.40.24
 /
 {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)



[jira] [Created] (YARN-1375) RM logs get filled with scheduler monitor logs when we enable scheduler monitoring

2013-10-30 Thread Devaraj K (JIRA)
Devaraj K created YARN-1375:
---

 Summary: RM logs get filled with scheduler monitor logs when we 
enable scheduler monitoring
 Key: YARN-1375
 URL: https://issues.apache.org/jira/browse/YARN-1375
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Devaraj K



When the scheduler monitor is enabled, it periodically fills the RM logs with 
the same queue state. We could log only when the state differs from the 
previously logged one instead of repeating the same message (see the sketch 
after the log excerpt below).

{code:xml}
2013-10-30 23:30:08,464 INFO 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
  QUEUESTATE: 1383156008464, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
2013-10-30 23:30:11,464 INFO 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
  QUEUESTATE: 1383156011464, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
2013-10-30 23:30:14,465 INFO 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
  QUEUESTATE: 1383156014465, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
2013-10-30 23:30:17,466 INFO 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
  QUEUESTATE: 1383156017466, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
2013-10-30 23:30:20,466 INFO 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
  QUEUESTATE: 1383156020466, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
2013-10-30 23:30:23,467 INFO 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
  QUEUESTATE: 1383156023467, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
2013-10-30 23:30:26,468 INFO 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
  QUEUESTATE: 1383156026467, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
2013-10-30 23:30:29,468 INFO 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
  QUEUESTATE: 1383156029468, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
2013-10-30 23:30:32,469 INFO 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
  QUEUESTATE: 1383156032469, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
{code}
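
A minimal sketch of that change-detection idea, assuming the policy can render 
its queue state as a single string like the QUEUESTATE lines above (names 
invented for illustration):

{code:java}
// Skip identical consecutive states so the RM log is not flooded.
class QueueStateLoggerSketch {
  private String lastLoggedState;

  /** state should exclude the leading timestamp so repeats compare equal. */
  void maybeLog(long nowMillis, String state) {
    if (!state.equals(lastLoggedState)) {
      // Stand-in for the INFO log call in the preemption policy.
      System.out.println("QUEUESTATE: " + nowMillis + ", " + state);
      lastLoggedState = state;
    }
  }
}
{code}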




--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1306) Clean up hadoop-sls sample-conf according to YARN-1228

2013-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809053#comment-13809053
 ] 

Hudson commented on YARN-1306:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1594 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1594/])
YARN-1306. Clean up hadoop-sls sample-conf according to YARN-1228 (Wei Yan via 
Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536982)
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/capacity-scheduler.xml
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/fair-scheduler-allocation.xml
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/fair-scheduler.xml
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/yarn-site.xml
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


 Clean up hadoop-sls sample-conf according to YARN-1228
 --

 Key: YARN-1306
 URL: https://issues.apache.org/jira/browse/YARN-1306
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan
 Fix For: 2.3.0

 Attachments: YARN-1306.patch


 Move fair scheduler allocations configuration to fair-scheduler.xml, and move 
 all scheduler stuff to yarn-site.xml



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1228) Clean up Fair Scheduler configuration loading

2013-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809054#comment-13809054
 ] 

Hudson commented on YARN-1228:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1594 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1594/])
YARN-1306. Clean up hadoop-sls sample-conf according to YARN-1228 (Wei Yan via 
Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536982)
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/capacity-scheduler.xml
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/fair-scheduler-allocation.xml
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/fair-scheduler.xml
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/yarn-site.xml
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


 Clean up Fair Scheduler configuration loading
 -

 Key: YARN-1228
 URL: https://issues.apache.org/jira/browse/YARN-1228
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.1.1-beta
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 2.2.0

 Attachments: YARN-1228-1.patch, YARN-1228-2.patch, YARN-1228.patch


 Currently the Fair Scheduler is configured in two ways
 * An allocations file that has a different format than the standard Hadoop 
 configuration file, which makes it easier to specify hierarchical objects 
 like queues and their properties. 
 * With properties like yarn.scheduler.fair.max.assign that are specified in 
 the standard Hadoop configuration format.
 The standard and default way of configuring it is to use fair-scheduler.xml 
 as the allocations file and to put the yarn.scheduler properties in 
 yarn-site.xml.
 It is also possible to specify a different file as the allocations file, and 
 to place the yarn.scheduler properties in fair-scheduler.xml, which will be 
 interpreted as in the standard Hadoop configuration format.  This flexibility 
 is both confusing and unnecessary.
 Additionally, the allocation file is loaded as fair-scheduler.xml from the 
 classpath if it is not specified, but is loaded as a File if it is.  This 
 causes two problems
 1. We see different behavior when not setting the 
 yarn.scheduler.fair.allocation.file, and setting it to fair-scheduler.xml, 
 which is its default.
 2. Classloaders may choose to cache resources, which can break the reload 
 logic when yarn.scheduler.fair.allocation.file is not specified.
 We should never allow the yarn.scheduler properties to go into 
 fair-scheduler.xml.  And we should always load the allocations file as a 
 file, not as a resource on the classpath.  To preserve existing behavior and 
 allow loading files from the classpath, we can look for files on the 
 classpath, but strip off their scheme and interpret them as Files.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1374) Resource Manager fails to start due to ConcurrentModificationException

2013-10-30 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809085#comment-13809085
 ] 

Steve Loughran commented on YARN-1374:
--

If this exists, it's my code. I'd put the list into an unmodifiable list to 
stop concurrency problems, because I knew the risk of adding children while 
adding children existed.

It looks like that isn't enough: we need to take a snapshot of the list and 
then iterate through it.

-steve
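
A minimal sketch of the snapshot approach, using simplified stand-in types 
rather than the real CompositeService:

{code:java}
import java.util.ArrayList;
import java.util.List;

// Iterate over a copy of the child list so that a child adding further
// services during init does not trigger ConcurrentModificationException.
class CompositeServiceSketch {
  private final List<Runnable> services = new ArrayList<>();

  void addService(Runnable s) {
    synchronized (services) {
      services.add(s);
    }
  }

  void initAll() {
    List<Runnable> snapshot;
    synchronized (services) {
      snapshot = new ArrayList<>(services);  // snapshot, not the live list
    }
    for (Runnable s : snapshot) {
      s.run();                               // a child may call addService() here safely
    }
  }
}
{code}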

 Resource Manager fails to start due to ConcurrentModificationException
 --

 Key: YARN-1374
 URL: https://issues.apache.org/jira/browse/YARN-1374
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Devaraj K
Priority: Blocker

 Resource Manager is failing to start with the below 
 ConcurrentModificationException.
 {code:xml}
 2013-10-30 20:22:42,371 INFO org.apache.hadoop.util.HostsFileReader: 
 Refreshing hosts (include/exclude) list
 2013-10-30 20:22:42,376 INFO org.apache.hadoop.service.AbstractService: 
 Service ResourceManager failed in state INITED; cause: 
 java.util.ConcurrentModificationException
 java.util.ConcurrentModificationException
   at 
 java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
   at java.util.AbstractList$Itr.next(AbstractList.java:343)
   at 
 java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
   at 
 org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
 2013-10-30 20:22:42,378 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: 
 Transitioning to standby
 2013-10-30 20:22:42,378 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: 
 Transitioned to standby
 2013-10-30 20:22:42,378 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
 ResourceManager
 java.util.ConcurrentModificationException
   at 
 java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
   at java.util.AbstractList$Itr.next(AbstractList.java:343)
   at 
 java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
   at 
 org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
 2013-10-30 20:22:42,379 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG: 
 /
 SHUTDOWN_MSG: Shutting down ResourceManager at HOST-10-18-40-24/10.18.40.24
 /
 {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1031) JQuery UI components reference external css in branch-23

2013-10-30 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-1031:
-

Attachment: YARN-1031-2-branch-0.23.patch

Updating previous patch to include the corresponding images

 JQuery UI components reference external css in branch-23
 

 Key: YARN-1031
 URL: https://issues.apache.org/jira/browse/YARN-1031
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 0.23.9
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: YARN-1031-2-branch-0.23.patch, 
 YARN-1031-branch-0.23.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1031) JQuery UI components reference external css in branch-23

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809171#comment-13809171
 ] 

Hadoop QA commented on YARN-1031:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12611075/YARN-1031-2-branch-0.23.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2315//console

This message is automatically generated.

 JQuery UI components reference external css in branch-23
 

 Key: YARN-1031
 URL: https://issues.apache.org/jira/browse/YARN-1031
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 0.23.9
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: YARN-1031-2-branch-0.23.patch, 
 YARN-1031-branch-0.23.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1318) Promote AdminService to an Always-On service and merge in RMHAProtocolService

2013-10-30 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1318:
---

Attachment: yarn-1318-2.patch

I am able to compile locally. Resubmitting the same patch.

 Promote AdminService to an Always-On service and merge in RMHAProtocolService
 -

 Key: YARN-1318
 URL: https://issues.apache.org/jira/browse/YARN-1318
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
  Labels: ha
 Attachments: yarn-1318-0.patch, yarn-1318-1.patch, yarn-1318-2.patch, 
 yarn-1318-2.patch


 Per discussion in YARN-1068, we want AdminService to handle HA-admin 
 operations in addition to the regular non-HA admin operations. To facilitate 
 this, we need to make AdminService an Always-On service. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1031) JQuery UI components reference external css in branch-23

2013-10-30 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-1031:
-

Attachment: YARN-1031-3-branch-0.23.patch

Oops, generated the diff with git diff --binary instead of git format-patch.  
Uploading a new patch.

Patch needs to be applied with git apply, and Jenkins doesn't know how to deal 
with patches against anything but trunk.

I tested this patch locally and the icons are once again working correctly on 
the YARN scheduler page and in tables.

 JQuery UI components reference external css in branch-23
 

 Key: YARN-1031
 URL: https://issues.apache.org/jira/browse/YARN-1031
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 0.23.9
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: YARN-1031-2-branch-0.23.patch, 
 YARN-1031-3-branch-0.23.patch, YARN-1031-branch-0.23.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1318) Promote AdminService to an Always-On service and merge in RMHAProtocolService

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809244#comment-13809244
 ] 

Hadoop QA commented on YARN-1318:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611085/yarn-1318-2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2316//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/2316//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2316//console

This message is automatically generated.

 Promote AdminService to an Always-On service and merge in RMHAProtocolService
 -

 Key: YARN-1318
 URL: https://issues.apache.org/jira/browse/YARN-1318
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
  Labels: ha
 Attachments: yarn-1318-0.patch, yarn-1318-1.patch, yarn-1318-2.patch, 
 yarn-1318-2.patch


 Per discussion in YARN-1068, we want AdminService to handle HA-admin 
 operations in addition to the regular non-HA admin operations. To facilitate 
 this, we need to make AdminService an Always-On service. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1031) JQuery UI components reference external css in branch-23

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809249#comment-13809249
 ] 

Hadoop QA commented on YARN-1031:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12611097/YARN-1031-3-branch-0.23.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2317//console

This message is automatically generated.

 JQuery UI components reference external css in branch-23
 

 Key: YARN-1031
 URL: https://issues.apache.org/jira/browse/YARN-1031
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 0.23.9
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: YARN-1031-2-branch-0.23.patch, 
 YARN-1031-3-branch-0.23.patch, YARN-1031-branch-0.23.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1031) JQuery UI components reference external css in branch-23

2013-10-30 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809307#comment-13809307
 ] 

Jonathan Eagles commented on YARN-1031:
---

+1. Verified Jason's changes. Blocked access to ajax.googleapis.com via 
/etc/hosts before and after the change to visually inspect. Programmatically 
scanned network activity via Firebug to verify that the new jquery-ui.css and 
icons are downloaded locally with no GETs to ajax.googleapis.com.

 JQuery UI components reference external css in branch-23
 

 Key: YARN-1031
 URL: https://issues.apache.org/jira/browse/YARN-1031
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 0.23.9
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: YARN-1031-2-branch-0.23.patch, 
 YARN-1031-3-branch-0.23.patch, YARN-1031-branch-0.23.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1279) Expose a client API to allow clients to figure if log aggregation is complete

2013-10-30 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809344#comment-13809344
 ] 

Xuan Gong commented on YARN-1279:
-

The basic idea is that the NM notifies the RM about the log aggregation status 
of all its containers through the node heartbeat. When the RMNode gets the log 
aggregation status from the nodeUpdateEvent, it forwards the status to the 
related RMApp. After that, the client can get the log aggregation status by 
calling the related API.
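
A hedged sketch of that flow with invented types (none of these are the real 
YARN classes or protobuf records):

{code:java}
import java.util.Map;

// The NM piggybacks a per-application log aggregation status on its heartbeat;
// the RM-side node object forwards it so the app (and ultimately a
// client-facing API) can report it.
enum LogAggregationStatusSketch { RUNNING, SUCCEEDED, FAILED }

class NodeHeartbeatSketch {
  final String nodeId;
  final Map<String, LogAggregationStatusSketch> statusPerApp;

  NodeHeartbeatSketch(String nodeId,
                      Map<String, LogAggregationStatusSketch> statusPerApp) {
    this.nodeId = nodeId;
    this.statusPerApp = statusPerApp;
  }
}

class RmNodeSketch {
  void onHeartbeat(NodeHeartbeatSketch heartbeat,
                   Map<String, LogAggregationStatusSketch> appStatusStore) {
    // Forward each application's reported status so clients can query it.
    appStatusStore.putAll(heartbeat.statusPerApp);
  }
}
{code}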

 Expose a client API to allow clients to figure if log aggregation is complete
 -

 Key: YARN-1279
 URL: https://issues.apache.org/jira/browse/YARN-1279
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0
Reporter: Arun C Murthy
Assignee: Xuan Gong

 Expose a client API to allow clients to figure if log aggregation is complete



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1279) Expose a client API to allow clients to figure if log aggregation is complete

2013-10-30 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809354#comment-13809354
 ] 

Xuan Gong commented on YARN-1279:
-

Will split the work into two parts. This ticket is used to track the work on 
the RM side: all the changes after RMNode receives the STATUS_UPDATE event, 
the changes on NodeStatus, and the related PB changes.

 Expose a client API to allow clients to figure if log aggregation is complete
 -

 Key: YARN-1279
 URL: https://issues.apache.org/jira/browse/YARN-1279
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0
Reporter: Arun C Murthy
Assignee: Xuan Gong

 Expose a client API to allow clients to figure if log aggregation is complete



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1376) NM need to notify the log aggregation status to RM through Node heartbeat

2013-10-30 Thread Xuan Gong (JIRA)
Xuan Gong created YARN-1376:
---

 Summary: NM need to notify the log aggregation status to RM 
through Node heartbeat
 Key: YARN-1376
 URL: https://issues.apache.org/jira/browse/YARN-1376
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong


Expose a client API to allow clients to figure out if log aggregation is 
complete. This ticket is used to track the changes on the NM side.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart

2013-10-30 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809382#comment-13809382
 ] 

Rohith Sharma K S commented on YARN-1366:
-

Hi Bikas,
 I have gone through the pdf file attached to YARN-556 and understood the 
overall idea behind this subtask.
I have some doubts, please clarify:

1. Resync means resetting the allocate RPC sequence number to 0 and the AM 
should send its entire outstanding request to the RM
 I understood this as: reset lastResponseID to 0 and do not clear ask, 
 release, blacklistAdditions and blacklistRemovals. Is that correct?

2. During RM restart, the RM gets a new AMRMTokenSecretManager. At this point 
the passwords will differ. Is this handled on the RM side during recovery for 
each individual application? Otherwise the impact is that a heartbeat to the 
restarted RM fails with an authentication error: password does not match.


 ApplicationMasterService should Resync with the AM upon allocate call after 
 restart
 ---

 Key: YARN-1366
 URL: https://issues.apache.org/jira/browse/YARN-1366
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha

 The ApplicationMasterService currently sends a resync response to which the 
 AM responds by shutting down. The AM behavior is expected to change to 
 calling resyncing with the RM. Resync means resetting the allocate RPC 
 sequence number to 0 and the AM should send its entire outstanding request to 
 the RM. Note that if the AM is making its first allocate call to the RM then 
 things should proceed like normal without needing a resync. The RM will 
 return all containers that have completed since the RM last synced with the 
 AM. Some container completions may be reported more than once.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1321) NMTokenCache is a singleton, prevents multiple AMs running in a single JVM to work correctly

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809416#comment-13809416
 ] 

Hadoop QA commented on YARN-1321:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/1264/YARN-1321.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2318//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2318//console

This message is automatically generated.

 NMTokenCache is a singleton, prevents multiple AMs running in a single JVM to 
 work correctly
 

 Key: YARN-1321
 URL: https://issues.apache.org/jira/browse/YARN-1321
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.2.0
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
Priority: Blocker
 Attachments: YARN-1321-20131029.txt, YARN-1321.patch, 
 YARN-1321.patch, YARN-1321.patch, YARN-1321.patch, YARN-1321.patch, 
 YARN-1321.patch


 NMTokenCache is a singleton. Because of this, if running multiple AMs in a 
 single JVM NMTokens for the same node from different AMs step on each other 
 and starting containers fail due to mismatch tokens.
 The error observed in the client side is something like:
 {code}
 ERROR org.apache.hadoop.security.UserGroupInformation: 
 PriviledgedActionException as:llama (auth:PROXY) via llama (auth:SIMPLE) 
 cause:org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request 
 to start container. 
 NMToken for application attempt : appattempt_1382038445650_0002_01 was 
 used for starting container with container token issued for application 
 attempt : appattempt_1382038445650_0001_01
 {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Assigned] (YARN-1375) RM logs get filled with scheduler monitor logs when we enable scheduler monitoring

2013-10-30 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent reassigned YARN-1375:
--

Assignee: haosdent

 RM logs get filled with scheduler monitor logs when we enable scheduler 
 monitoring
 --

 Key: YARN-1375
 URL: https://issues.apache.org/jira/browse/YARN-1375
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Devaraj K
Assignee: haosdent

 When we enable the scheduler monitor, it fills the RM logs with the same queue 
 state periodically. We should log only when there is a difference from the 
 previous state instead of repeating the same message (a rough sketch follows 
 the log sample below). 
 {code:xml}
 2013-10-30 23:30:08,464 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
   QUEUESTATE: 1383156008464, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
 2013-10-30 23:30:11,464 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
   QUEUESTATE: 1383156011464, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
 2013-10-30 23:30:14,465 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
   QUEUESTATE: 1383156014465, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
 2013-10-30 23:30:17,466 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
   QUEUESTATE: 1383156017466, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
 2013-10-30 23:30:20,466 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
   QUEUESTATE: 1383156020466, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
 2013-10-30 23:30:23,467 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
   QUEUESTATE: 1383156023467, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
 2013-10-30 23:30:26,468 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
   QUEUESTATE: 1383156026467, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
 2013-10-30 23:30:29,468 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
   QUEUESTATE: 1383156029468, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
 2013-10-30 23:30:32,469 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
   QUEUESTATE: 1383156032469, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
 {code}
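 A rough sketch of the "log only on change" idea, with illustrative names rather 
 than the actual policy fields; the timestamp is excluded from the comparison, 
 otherwise every line would look new:
 {code}
 import org.apache.commons.logging.Log;

 // Illustrative helper only, not part of ProportionalCapacityPreemptionPolicy.
 final class QueueStateLogger {
   private String lastSnapshot;

   void maybeLog(Log log, long timestamp, String queueSnapshot) {
     // Log only when the queue portion changed; a new timestamp alone is not a change.
     if (!queueSnapshot.equals(lastSnapshot)) {
       log.info("  QUEUESTATE: " + timestamp + ", " + queueSnapshot);
       lastSnapshot = queueSnapshot;
     }
   }
 }
 {code}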



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1318) Promote AdminService to an Always-On service and merge in RMHAProtocolService

2013-10-30 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1318:
---

Attachment: yarn-1318-3.patch

New patch to fix the findbugs warning: use get/set methods to access haState.

 Promote AdminService to an Always-On service and merge in RMHAProtocolService
 -

 Key: YARN-1318
 URL: https://issues.apache.org/jira/browse/YARN-1318
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
  Labels: ha
 Attachments: yarn-1318-0.patch, yarn-1318-1.patch, yarn-1318-2.patch, 
 yarn-1318-2.patch, yarn-1318-3.patch


 Per discussion in YARN-1068, we want AdminService to handle HA-admin 
 operations in addition to the regular non-HA admin operations. To facilitate 
 this, we need to move AdminService an Always-On service. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1343) NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs

2013-10-30 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809565#comment-13809565
 ] 

Bikas Saha commented on YARN-1343:
--

It looks like in the reconnect with different capacity case we will end up 
sending 2 NODE_USABLE events for the same node.
{code}
}
rmNode.context.getRMNodes().put(newNode.getNodeID(), newNode);
rmNode.context.getDispatcher().getEventHandler().handle(
    new RMNodeEvent(newNode.getNodeID(), RMNodeEventType.STARTED));
    // === First instance, when this triggers the ADD_NODE transition
  }
  rmNode.context.getDispatcher().getEventHandler().handle(
      new NodesListManagerEvent(
          NodesListManagerEventType.NODE_USABLE, rmNode)); // === Second instance
{code}

So we could probably move the second instance to the first if-stmt where it 
also sends the NodeAddedSchedulerEvent. That would handle the case of the same 
node coming back while the STARTED event in the else stmt will cover the case 
of a different node with the same node name coming back (same as a new node 
being added).
{code}
if (rmNode.getTotalCapability().equals(newNode.getTotalCapability())
    && rmNode.getHttpPort() == newNode.getHttpPort()) {
  // Reset heartbeat ID since node just restarted.
  rmNode.getLastNodeHeartBeatResponse().setResponseId(0);
  if (rmNode.getState() != NodeState.UNHEALTHY) {
    // Only add new node if old state is not UNHEALTHY
    rmNode.context.getDispatcher().getEventHandler().handle(
        new NodeAddedSchedulerEvent(rmNode));
  }
}
{code}
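
To make the suggestion concrete, a rough sketch of the restructured transition 
(simplified, not the actual patch; the exact placement of NODE_USABLE relative to 
the UNHEALTHY check is open for discussion):
{code}
// Sketch only: NODE_USABLE is dispatched in the same-capability branch, while the
// STARTED event in the else branch covers a node coming back with changed capability.
if (rmNode.getTotalCapability().equals(newNode.getTotalCapability())
    && rmNode.getHttpPort() == newNode.getHttpPort()) {
  // Reset heartbeat ID since node just restarted.
  rmNode.getLastNodeHeartBeatResponse().setResponseId(0);
  if (rmNode.getState() != NodeState.UNHEALTHY) {
    rmNode.context.getDispatcher().getEventHandler().handle(
        new NodeAddedSchedulerEvent(rmNode));
    rmNode.context.getDispatcher().getEventHandler().handle(
        new NodesListManagerEvent(
            NodesListManagerEventType.NODE_USABLE, rmNode));
  }
} else {
  rmNode.context.getRMNodes().put(newNode.getNodeID(), newNode);
  rmNode.context.getDispatcher().getEventHandler().handle(
      new RMNodeEvent(newNode.getNodeID(), RMNodeEventType.STARTED));
}
{code}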

I modified the patch testcase to try out reconnect with different capability 
and the above issue showed up.

 NodeManagers additions/restarts are not reported as node updates in 
 AllocateResponse responses to AMs
 -

 Key: YARN-1343
 URL: https://issues.apache.org/jira/browse/YARN-1343
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
Priority: Critical
 Fix For: 2.2.1

 Attachments: YARN-1343.patch, YARN-1343.patch


 If a NodeManager joins the cluster or gets restarted, running AMs never 
 receive the node update indicating the Node is running.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1318) Promote AdminService to an Always-On service and merge in RMHAProtocolService

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809592#comment-13809592
 ] 

Hadoop QA commented on YARN-1318:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611154/yarn-1318-3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2319//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2319//console

This message is automatically generated.

 Promote AdminService to an Always-On service and merge in RMHAProtocolService
 -

 Key: YARN-1318
 URL: https://issues.apache.org/jira/browse/YARN-1318
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
  Labels: ha
 Attachments: yarn-1318-0.patch, yarn-1318-1.patch, yarn-1318-2.patch, 
 yarn-1318-2.patch, yarn-1318-3.patch


 Per discussion in YARN-1068, we want AdminService to handle HA-admin 
 operations in addition to the regular non-HA admin operations. To facilitate 
 this, we need to move AdminService an Always-On service. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1357) TestContainerLaunch.testContainerEnvVariables fails on Windows

2013-10-30 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated YARN-1357:


  Component/s: nodemanager
 Target Version/s: 3.0.0, 2.2.1
Affects Version/s: 2.2.0
 Hadoop Flags: Reviewed

+1 for the patch.  I'll commit this.

 TestContainerLaunch.testContainerEnvVariables fails on Windows
 --

 Key: YARN-1357
 URL: https://issues.apache.org/jira/browse/YARN-1357
 Project: Hadoop YARN
  Issue Type: Test
  Components: nodemanager
Affects Versions: 3.0.0, 2.2.0
Reporter: Chuan Liu
Assignee: Chuan Liu
Priority: Minor
 Attachments: YARN-1357.patch


 This test fails on Windows due to incorrect use of batch script command. 
 Error messages are as follows.
 {noformat}
 junit.framework.AssertionFailedError: expected:java.nio.HeapByteBuffer[pos=0 
 lim=19 cap=19] but was:java.nio.HeapByteBuffer[pos=0 lim=19 cap=19]
   at junit.framework.Assert.fail(Assert.java:50)
   at junit.framework.Assert.failNotEquals(Assert.java:287)
   at junit.framework.Assert.assertEquals(Assert.java:67)
   at junit.framework.Assert.assertEquals(Assert.java:74)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testContainerEnvVariables(TestContainerLaunch.java:508)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1359) AMRMToken should not be sent to Container other than AM.

2013-10-30 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809610#comment-13809610
 ] 

Omkar Vinit Joshi commented on YARN-1359:
-

Today the node manager doesn't do this filtering of tokens.

Proposal:
Let the node manager filter out the AMRMToken from the tokens while launching any 
container other than the AM. That way we only (truly) allow the AM container to 
talk to the RM on the AMRM protocol.

Enhancements: today the node manager doesn't know which container is the AM 
container, and that causes a lot of problems. So we first need a way to inform the 
node manager that a container is the AM. Since the node manager learns everything 
about a new container from the container token, it would be better to add an isAM 
flag inside the token. Thoughts?
(Note: we are anyway not encouraging users to talk to the RM from multiple 
containers that share the same AMRMToken.)
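
A rough sketch of the filtering; the isAM flag is the proposed addition and does not 
exist in the container token today, and this assumes Credentials#getAllTokens exposes 
a mutable view and that "YARN_AM_RM_TOKEN" is the AMRMToken kind:
{code}
import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.Credentials;

// Sketch only: drop the AMRMToken from the credentials of any non-AM container.
final class AMRMTokenFilter {
  // Assumed to match AMRMTokenIdentifier.KIND_NAME.
  private static final Text AMRM_TOKEN_KIND = new Text("YARN_AM_RM_TOKEN");

  static void filter(Credentials containerCredentials, boolean isAMContainer) {
    if (isAMContainer) {
      return; // only the AM container keeps the token to talk to the RM
    }
    // Assumes getAllTokens() returns a mutable view of the token map.
    containerCredentials.getAllTokens()
        .removeIf(token -> AMRM_TOKEN_KIND.equals(token.getKind()));
  }
}
{code}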


 AMRMToken should not be sent to Container other than AM.
 

 Key: YARN-1359
 URL: https://issues.apache.org/jira/browse/YARN-1359
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi





--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1357) TestContainerLaunch.testContainerEnvVariables fails on Windows

2013-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809615#comment-13809615
 ] 

Hudson commented on YARN-1357:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4675 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4675/])
YARN-1357. TestContainerLaunch.testContainerEnvVariables fails on Windows. 
Contributed by Chuan Liu. (cnauroth: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1537293)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerLaunch.java


 TestContainerLaunch.testContainerEnvVariables fails on Windows
 --

 Key: YARN-1357
 URL: https://issues.apache.org/jira/browse/YARN-1357
 Project: Hadoop YARN
  Issue Type: Test
  Components: nodemanager
Affects Versions: 3.0.0, 2.2.0
Reporter: Chuan Liu
Assignee: Chuan Liu
Priority: Minor
 Fix For: 3.0.0, 2.2.1

 Attachments: YARN-1357.patch


 This test fails on Windows due to incorrect use of batch script command. 
 Error messages are as follows.
 {noformat}
 junit.framework.AssertionFailedError: expected:java.nio.HeapByteBuffer[pos=0 
 lim=19 cap=19] but was:java.nio.HeapByteBuffer[pos=0 lim=19 cap=19]
   at junit.framework.Assert.fail(Assert.java:50)
   at junit.framework.Assert.failNotEquals(Assert.java:287)
   at junit.framework.Assert.assertEquals(Assert.java:67)
   at junit.framework.Assert.assertEquals(Assert.java:74)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testContainerEnvVariables(TestContainerLaunch.java:508)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1377) Log aggregation via node manager should expose expose a way to cancel aggregation at application or container level

2013-10-30 Thread Omkar Vinit Joshi (JIRA)
Omkar Vinit Joshi created YARN-1377:
---

 Summary: Log aggregation via node manager should expose expose a 
way to cancel aggregation at application or container level
 Key: YARN-1377
 URL: https://issues.apache.org/jira/browse/YARN-1377
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi


Today when an application finishes, the NM starts aggregating all of its logs, but 
that may slow down the whole process significantly. There can be situations where 
certain containers overwrote their logs, say into multiple GBs. In these scenarios 
we need a way to cancel log aggregation for certain containers, either at the 
per-application level or at the per-container level.
Thoughts?
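
A rough sketch of what such a knob could look like; both methods are hypothetical 
and not part of any existing NM interface:
{code}
// Hypothetical control surface only; nothing like this exists in the NM today.
interface LogAggregationControl {
  /** Skip log aggregation for every container of the given application. */
  void cancelLogAggregation(String applicationId);

  /** Skip log aggregation for a single container, e.g. one with multi-GB logs. */
  void cancelLogAggregation(String applicationId, String containerId);
}
{code}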



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1377) Log aggregation via node manager should expose expose a way to cancel aggregation at application or container level

2013-10-30 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-1377:


Assignee: Xuan Gong

 Log aggregation via node manager should expose expose a way to cancel 
 aggregation at application or container level
 ---

 Key: YARN-1377
 URL: https://issues.apache.org/jira/browse/YARN-1377
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Xuan Gong

 Today when application finishes it starts aggregating all the logs but that 
 may slow down the whole process significantly...
 there can be situations where certain containers overwrote the logs .. say in 
 multiple GBsin these scenarios we need a way to cancel log aggregation 
 for certain containers. These can be at per application level or at per 
 container level.
 thoughts?



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1377) Log aggregation via node manager should expose expose a way to cancel log aggregation at application or container level

2013-10-30 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-1377:


Summary: Log aggregation via node manager should expose expose a way to 
cancel log aggregation at application or container level  (was: Log aggregation 
via node manager should expose expose a way to cancel aggregation at 
application or container level)

 Log aggregation via node manager should expose expose a way to cancel log 
 aggregation at application or container level
 ---

 Key: YARN-1377
 URL: https://issues.apache.org/jira/browse/YARN-1377
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Xuan Gong

 Today when application finishes it starts aggregating all the logs but that 
 may slow down the whole process significantly...
 there can be situations where certain containers overwrote the logs .. say in 
 multiple GBsin these scenarios we need a way to cancel log aggregation 
 for certain containers. These can be at per application level or at per 
 container level.
 thoughts?



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1358) TestYarnCLI fails on Windows due to line endings

2013-10-30 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated YARN-1358:


  Component/s: client
 Target Version/s: 3.0.0, 2.2.1
Affects Version/s: 2.2.0
 Hadoop Flags: Reviewed

+1 for the patch.  I'll commit this.

 TestYarnCLI fails on Windows due to line endings
 

 Key: YARN-1358
 URL: https://issues.apache.org/jira/browse/YARN-1358
 Project: Hadoop YARN
  Issue Type: Test
  Components: client
Affects Versions: 3.0.0, 2.2.0
Reporter: Chuan Liu
Assignee: Chuan Liu
Priority: Minor
 Attachments: YARN-1358.2.patch, YARN-1358.patch


 The unit test fails on Windows due to incorrect line endings was used for 
 comparing the output from command line output. Error messages are as follows.
 {noformat}
 junit.framework.ComparisonFailure: expected:...argument for options[]
 usage: application
 ... but was:...argument for options[
 ]
 usage: application
 ...
   at junit.framework.Assert.assertEquals(Assert.java:85)
   at junit.framework.Assert.assertEquals(Assert.java:91)
   at 
 org.apache.hadoop.yarn.client.cli.TestYarnCLI.testMissingArguments(TestYarnCLI.java:878)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1358) TestYarnCLI fails on Windows due to line endings

2013-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809651#comment-13809651
 ] 

Hudson commented on YARN-1358:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4676 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4676/])
YARN-1358. TestYarnCLI fails on Windows due to line endings. Contributed by 
Chuan Liu. (cnauroth: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1537305)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java


 TestYarnCLI fails on Windows due to line endings
 

 Key: YARN-1358
 URL: https://issues.apache.org/jira/browse/YARN-1358
 Project: Hadoop YARN
  Issue Type: Test
  Components: client
Affects Versions: 3.0.0, 2.2.0
Reporter: Chuan Liu
Assignee: Chuan Liu
Priority: Minor
 Fix For: 3.0.0, 2.2.1

 Attachments: YARN-1358.2.patch, YARN-1358.patch


 The unit test fails on Windows due to incorrect line endings was used for 
 comparing the output from command line output. Error messages are as follows.
 {noformat}
 junit.framework.ComparisonFailure: expected:...argument for options[]
 usage: application
 ... but was:...argument for options[
 ]
 usage: application
 ...
   at junit.framework.Assert.assertEquals(Assert.java:85)
   at junit.framework.Assert.assertEquals(Assert.java:91)
   at 
 org.apache.hadoop.yarn.client.cli.TestYarnCLI.testMissingArguments(TestYarnCLI.java:878)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1378) Implement a RMStateStore cleaner for deleting application/attempt info

2013-10-30 Thread Jian He (JIRA)
Jian He created YARN-1378:
-

 Summary: Implement a RMStateStore cleaner for deleting 
application/attempt info
 Key: YARN-1378
 URL: https://issues.apache.org/jira/browse/YARN-1378
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Jian He


Now that we are storing the final state of applications/attempts instead of 
removing the application/attempt info on completion (YARN-891), we need a separate 
RMStateStore cleaner for cleaning up the old application/attempt state.
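
A rough sketch of one possible shape for the cleaner, assuming a simple scheduled 
task; the deletion callback stands in for whatever store-specific removal ends up 
being implemented:
{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch only: periodically invokes a store-specific deletion of completed
// application/attempt entries that are past their retention window.
final class RMStateStoreCleaner {
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  void start(long intervalSecs, Runnable removeExpiredEntries) {
    scheduler.scheduleWithFixedDelay(
        removeExpiredEntries, intervalSecs, intervalSecs, TimeUnit.SECONDS);
  }

  void stop() {
    scheduler.shutdownNow();
  }
}
{code}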



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1123) [YARN-321] Adding ContainerReport and Protobuf implementation

2013-10-30 Thread Mayank Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal updated YARN-1123:


Attachment: YARN-1123-3.patch

Attaching the patch with the latest rebase; it changes the container status to the 
container state and adds the container exit status.

Thanks,
Mayank

 [YARN-321] Adding ContainerReport and Protobuf implementation
 -

 Key: YARN-1123
 URL: https://issues.apache.org/jira/browse/YARN-1123
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Mayank Bansal
 Attachments: YARN-1123-1.patch, YARN-1123-2.patch, YARN-1123-3.patch


 Like YARN-978, we need some client-oriented class to expose the container 
 history info. Neither Container nor RMContainer is the right one.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1374) Resource Manager fails to start due to ConcurrentModificationException

2013-10-30 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809664#comment-13809664
 ] 

Karthik Kambatla commented on YARN-1374:


I see the issue. Will upload a patch shortly to add the SchedulingMonitor to 
RMActiveServices.

 Resource Manager fails to start due to ConcurrentModificationException
 --

 Key: YARN-1374
 URL: https://issues.apache.org/jira/browse/YARN-1374
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Devaraj K
Assignee: Karthik Kambatla
Priority: Blocker

 Resource Manager is failing to start with the below 
 ConcurrentModificationException.
 {code:xml}
 2013-10-30 20:22:42,371 INFO org.apache.hadoop.util.HostsFileReader: 
 Refreshing hosts (include/exclude) list
 2013-10-30 20:22:42,376 INFO org.apache.hadoop.service.AbstractService: 
 Service ResourceManager failed in state INITED; cause: 
 java.util.ConcurrentModificationException
 java.util.ConcurrentModificationException
   at 
 java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
   at java.util.AbstractList$Itr.next(AbstractList.java:343)
   at 
 java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
   at 
 org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
 2013-10-30 20:22:42,378 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: 
 Transitioning to standby
 2013-10-30 20:22:42,378 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: 
 Transitioned to standby
 2013-10-30 20:22:42,378 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
 ResourceManager
 java.util.ConcurrentModificationException
   at 
 java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
   at java.util.AbstractList$Itr.next(AbstractList.java:343)
   at 
 java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
   at 
 org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
 2013-10-30 20:22:42,379 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG: 
 /
 SHUTDOWN_MSG: Shutting down ResourceManager at HOST-10-18-40-24/10.18.40.24
 /
 {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Assigned] (YARN-1374) Resource Manager fails to start due to ConcurrentModificationException

2013-10-30 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla reassigned YARN-1374:
--

Assignee: Karthik Kambatla

 Resource Manager fails to start due to ConcurrentModificationException
 --

 Key: YARN-1374
 URL: https://issues.apache.org/jira/browse/YARN-1374
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Devaraj K
Assignee: Karthik Kambatla
Priority: Blocker

 Resource Manager is failing to start with the below 
 ConcurrentModificationException.
 {code:xml}
 2013-10-30 20:22:42,371 INFO org.apache.hadoop.util.HostsFileReader: 
 Refreshing hosts (include/exclude) list
 2013-10-30 20:22:42,376 INFO org.apache.hadoop.service.AbstractService: 
 Service ResourceManager failed in state INITED; cause: 
 java.util.ConcurrentModificationException
 java.util.ConcurrentModificationException
   at 
 java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
   at java.util.AbstractList$Itr.next(AbstractList.java:343)
   at 
 java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
   at 
 org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
 2013-10-30 20:22:42,378 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: 
 Transitioning to standby
 2013-10-30 20:22:42,378 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: 
 Transitioned to standby
 2013-10-30 20:22:42,378 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
 ResourceManager
 java.util.ConcurrentModificationException
   at 
 java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
   at java.util.AbstractList$Itr.next(AbstractList.java:343)
   at 
 java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
   at 
 org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
 2013-10-30 20:22:42,379 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG: 
 /
 SHUTDOWN_MSG: Shutting down ResourceManager at HOST-10-18-40-24/10.18.40.24
 /
 {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1379) [YARN-321] AHS protocols need to be in yarn proto package name after YARN-1170

2013-10-30 Thread Vinod Kumar Vavilapalli (JIRA)
Vinod Kumar Vavilapalli created YARN-1379:
-

 Summary: [YARN-321] AHS protocols need to be in yarn proto package 
name after YARN-1170
 Key: YARN-1379
 URL: https://issues.apache.org/jira/browse/YARN-1379
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli


Found this while merging YARN-321 to the latest branch-2. Without this, 
compilation fails.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1343) NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs

2013-10-30 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated YARN-1343:
-

Attachment: YARN-1343.patch

[~bikassaha], thanks for the review and catching the double dispatching. 
Uploading a patch with the changes you suggested and also adding a test to 
verify the NODE_USABLE event is dispatched when a reconnect happens and the 
node has different capabilities.

 NodeManagers additions/restarts are not reported as node updates in 
 AllocateResponse responses to AMs
 -

 Key: YARN-1343
 URL: https://issues.apache.org/jira/browse/YARN-1343
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
Priority: Critical
 Fix For: 2.2.1

 Attachments: YARN-1343.patch, YARN-1343.patch, YARN-1343.patch


 If a NodeManager joins the cluster or gets restarted, running AMs never 
 receive the node update indicating the Node is running.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1343) NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809676#comment-13809676
 ] 

Hadoop QA commented on YARN-1343:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611185/YARN-1343.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2321//console

This message is automatically generated.

 NodeManagers additions/restarts are not reported as node updates in 
 AllocateResponse responses to AMs
 -

 Key: YARN-1343
 URL: https://issues.apache.org/jira/browse/YARN-1343
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
Priority: Critical
 Fix For: 2.2.1

 Attachments: YARN-1343.patch, YARN-1343.patch, YARN-1343.patch


 If a NodeManager joins the cluster or gets restarted, running AMs never 
 receive the node update indicating the Node is running.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1123) [YARN-321] Adding ContainerReport and Protobuf implementation

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809679#comment-13809679
 ] 

Hadoop QA commented on YARN-1123:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611179/YARN-1123-3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2320//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2320//console

This message is automatically generated.

 [YARN-321] Adding ContainerReport and Protobuf implementation
 -

 Key: YARN-1123
 URL: https://issues.apache.org/jira/browse/YARN-1123
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Mayank Bansal
 Attachments: YARN-1123-1.patch, YARN-1123-2.patch, YARN-1123-3.patch


 Like YARN-978, we need some client-oriented class to expose the container 
 history info. Neither Container nor RMContainer is the right one.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1290) Let continuous scheduling achieve more balanced task assignment

2013-10-30 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809678#comment-13809678
 ] 

Sandy Ryza commented on YARN-1290:
--

[~ywskycn], the current patch no longer applies.  Would you mind rebasing?

 Let continuous scheduling achieve more balanced task assignment
 ---

 Key: YARN-1290
 URL: https://issues.apache.org/jira/browse/YARN-1290
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wei Yan
Assignee: Wei Yan
 Attachments: main.pdf, YARN-1290.patch, YARN-1290.patch, 
 YARN-1290.patch


 Currently, in continuous scheduling (YARN-1010), the thread iterates over 
 pre-ordered nodes in each round and assigns tasks. This mechanism may overload 
 the first several nodes, while the later nodes get no tasks.
 We should sort all nodes according to their available resources. In each round, 
 always assign tasks to the nodes with the largest available capacity, which 
 balances the load distribution among all nodes.
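 A rough sketch of the ordering change, with an illustrative node type rather than 
 the FairScheduler classes:
 {code}
 import java.util.Comparator;
 import java.util.List;

 // Sketch only: order nodes so each continuous-scheduling round starts with the
 // node that has the most available resource.
 final class BalancedNodeOrdering {
   interface NodeInfo {
     long getAvailableMemory();
   }

   static void sortByAvailableMemoryDescending(List<NodeInfo> nodes) {
     Comparator<NodeInfo> byAvailable =
         Comparator.comparingLong(NodeInfo::getAvailableMemory);
     nodes.sort(byAvailable.reversed());
   }
 }
 {code}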



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1290) Let continuous scheduling achieve more balanced task assignment

2013-10-30 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809682#comment-13809682
 ] 

Wei Yan commented on YARN-1290:
---

[~sandyr]. I'll fix it.

 Let continuous scheduling achieve more balanced task assignment
 ---

 Key: YARN-1290
 URL: https://issues.apache.org/jira/browse/YARN-1290
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wei Yan
Assignee: Wei Yan
 Attachments: main.pdf, YARN-1290.patch, YARN-1290.patch, 
 YARN-1290.patch


 Currently, in continuous scheduling (YARN-1010), in each round, the thread 
 iterates over pre-ordered nodes and assigns tasks. This mechanism may 
 overload the first several nodes, while the latter nodes have no tasks.
 We should sort all nodes according to available resource. In each round, 
 always assign tasks to nodes with larger capacity, which can balance the load 
 distribution among all nodes.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1379) [YARN-321] AHS protocols need to be in yarn proto package name after YARN-1170

2013-10-30 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1379:
--

Attachment: YARN-1379.txt

Simple patch that adds package names.

Compilation passes after this.

 [YARN-321] AHS protocols need to be in yarn proto package name after YARN-1170
 --

 Key: YARN-1379
 URL: https://issues.apache.org/jira/browse/YARN-1379
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Attachments: YARN-1379.txt


 Found this while merging YARN-321 to the latest branch-2. Without this, 
 compilation fails.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1343) NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809719#comment-13809719
 ] 

Hadoop QA commented on YARN-1343:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611185/YARN-1343.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2322//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2322//console

This message is automatically generated.

 NodeManagers additions/restarts are not reported as node updates in 
 AllocateResponse responses to AMs
 -

 Key: YARN-1343
 URL: https://issues.apache.org/jira/browse/YARN-1343
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
Priority: Critical
 Fix For: 2.2.1

 Attachments: YARN-1343.patch, YARN-1343.patch, YARN-1343.patch


 If a NodeManager joins the cluster or gets restarted, running AMs never 
 receive the node update indicating the Node is running.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1321) NMTokenCache is a singleton, prevents multiple AMs running in a single JVM to work correctly

2013-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809723#comment-13809723
 ] 

Hudson commented on YARN-1321:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4678 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4678/])
YARN-1321. Changed NMTokenCache to support both singleton and an instance 
usage. Contributed by Alejandro Abdelnur. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1537334)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/AMRMClient.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/NMClient.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/NMTokenCache.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/AMRMClientImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/ContainerManagementProtocolProxy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/NMClientImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClient.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestNMClient.java


 NMTokenCache is a singleton, prevents multiple AMs running in a single JVM to 
 work correctly
 

 Key: YARN-1321
 URL: https://issues.apache.org/jira/browse/YARN-1321
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.2.0
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
Priority: Blocker
 Fix For: 2.2.1

 Attachments: YARN-1321-20131029.txt, YARN-1321.patch, 
 YARN-1321.patch, YARN-1321.patch, YARN-1321.patch, YARN-1321.patch, 
 YARN-1321.patch


 NMTokenCache is a singleton. Because of this, if running multiple AMs in a 
 single JVM NMTokens for the same node from different AMs step on each other 
 and starting containers fail due to mismatch tokens.
 The error observed in the client side is something like:
 {code}
 ERROR org.apache.hadoop.security.UserGroupInformation: 
 PriviledgedActionException as:llama (auth:PROXY) via llama (auth:SIMPLE) 
 cause:org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request 
 to start container. 
 NMToken for application attempt : appattempt_1382038445650_0002_01 was 
 used for starting container with container token issued for application 
 attempt : appattempt_1382038445650_0001_01
 {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1379) [YARN-321] AHS protocols need to be in yarn proto package name after YARN-1170

2013-10-30 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809752#comment-13809752
 ] 

Zhijie Shen commented on YARN-1379:
---

+1. Verified it locally. The branch compiles after the patch is applied.

 [YARN-321] AHS protocols need to be in yarn proto package name after YARN-1170
 --

 Key: YARN-1379
 URL: https://issues.apache.org/jira/browse/YARN-1379
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Attachments: YARN-1379.txt


 Found this while merging YARN-321 to the latest branch-2. Without this, 
 compilation fails.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1374) Resource Manager fails to start due to ConcurrentModificationException

2013-10-30 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1374:
---

Attachment: yarn-1374-1.patch

Here is a patch that moves creating the monitor policies to RMActiveServices.

 Resource Manager fails to start due to ConcurrentModificationException
 --

 Key: YARN-1374
 URL: https://issues.apache.org/jira/browse/YARN-1374
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Devaraj K
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-1374-1.patch


 Resource Manager is failing to start with the below 
 ConcurrentModificationException.
 {code:xml}
 2013-10-30 20:22:42,371 INFO org.apache.hadoop.util.HostsFileReader: 
 Refreshing hosts (include/exclude) list
 2013-10-30 20:22:42,376 INFO org.apache.hadoop.service.AbstractService: 
 Service ResourceManager failed in state INITED; cause: 
 java.util.ConcurrentModificationException
 java.util.ConcurrentModificationException
   at 
 java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
   at java.util.AbstractList$Itr.next(AbstractList.java:343)
   at 
 java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
   at 
 org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
 2013-10-30 20:22:42,378 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: 
 Transitioning to standby
 2013-10-30 20:22:42,378 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: 
 Transitioned to standby
 2013-10-30 20:22:42,378 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
 ResourceManager
 java.util.ConcurrentModificationException
   at 
 java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
   at java.util.AbstractList$Itr.next(AbstractList.java:343)
   at 
 java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
   at 
 org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
 2013-10-30 20:22:42,379 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG: 
 /
 SHUTDOWN_MSG: Shutting down ResourceManager at HOST-10-18-40-24/10.18.40.24
 /
 {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1343) NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs

2013-10-30 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809753#comment-13809753
 ] 

Bikas Saha commented on YARN-1343:
--

Can you please double check that testReconnectWithDifferentCapacity is actually 
resulting in a reconnection? The test alters the existing node's capacity, so I 
would expect the equality check in ReconnectTransition to consider the node the 
same as before. We probably need to create a new node with the same name and a 
different capacity. Maybe stepping through in the debugger will show what's 
really happening.
If the reconnect-with-different-capability code is getting executed, then I would 
expect the mock RM context to have to mock the getRMNodes() method and a listener 
to be added for RMNodeEvents. Otherwise the test will have exceptions in the output.

 NodeManagers additions/restarts are not reported as node updates in 
 AllocateResponse responses to AMs
 -

 Key: YARN-1343
 URL: https://issues.apache.org/jira/browse/YARN-1343
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
Priority: Critical
 Fix For: 2.2.1

 Attachments: YARN-1343.patch, YARN-1343.patch, YARN-1343.patch


 If a NodeManager joins the cluster or gets restarted, running AMs never 
 receive the node update indicating the Node is running.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1374) Resource Manager fails to start due to ConcurrentModificationException

2013-10-30 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1374:
---

Attachment: yarn-1374-1.patch

Forgot to add license headers.

 Resource Manager fails to start due to ConcurrentModificationException
 --

 Key: YARN-1374
 URL: https://issues.apache.org/jira/browse/YARN-1374
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Devaraj K
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-1374-1.patch, yarn-1374-1.patch


 Resource Manager is failing to start with the below 
 ConcurrentModificationException.
 {code:xml}
 2013-10-30 20:22:42,371 INFO org.apache.hadoop.util.HostsFileReader: 
 Refreshing hosts (include/exclude) list
 2013-10-30 20:22:42,376 INFO org.apache.hadoop.service.AbstractService: 
 Service ResourceManager failed in state INITED; cause: 
 java.util.ConcurrentModificationException
 java.util.ConcurrentModificationException
   at 
 java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
   at java.util.AbstractList$Itr.next(AbstractList.java:343)
   at 
 java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
   at 
 org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
 2013-10-30 20:22:42,378 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: 
 Transitioning to standby
 2013-10-30 20:22:42,378 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: 
 Transitioned to standby
 2013-10-30 20:22:42,378 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
 ResourceManager
 java.util.ConcurrentModificationException
   at 
 java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
   at java.util.AbstractList$Itr.next(AbstractList.java:343)
   at 
 java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
   at 
 org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
 2013-10-30 20:22:42,379 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG: 
 /
 SHUTDOWN_MSG: Shutting down ResourceManager at HOST-10-18-40-24/10.18.40.24
 /
 {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart

2013-10-30 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809755#comment-13809755
 ] 

Bikas Saha commented on YARN-1366:
--

Yes. The lastResponseId needs to be reset to 0 and all the client-side data (asks, 
blacklists, etc.) should be sent in full to the RM.

The AMRMToken for the attempt is saved and restored, so the existing attempt will 
be able to reconnect to the restarted RM. This currently works.
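
To make that concrete, a rough sketch of the AM-side behavior; all names are 
illustrative, not the AMRMClient API:
{code}
// Sketch only: on resync, reset the sequence number and re-send the full
// outstanding state instead of a delta.
final class ResyncSketch {
  interface Outstanding {
    java.util.List<?> asks();
    java.util.List<?> releases();
    java.util.List<?> blacklistAdditions();
    java.util.List<?> blacklistRemovals();
  }

  interface RMProxy {
    void allocate(int responseId, java.util.List<?> asks, java.util.List<?> releases,
        java.util.List<?> blacklistAdditions, java.util.List<?> blacklistRemovals);
  }

  static void onResync(Outstanding outstanding, RMProxy rm) {
    int lastResponseId = 0;                   // restart the allocate RPC sequence
    rm.allocate(lastResponseId,
        outstanding.asks(),                   // entire outstanding ask, not cleared
        outstanding.releases(),
        outstanding.blacklistAdditions(),
        outstanding.blacklistRemovals());
  }
}
{code}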

 ApplicationMasterService should Resync with the AM upon allocate call after 
 restart
 ---

 Key: YARN-1366
 URL: https://issues.apache.org/jira/browse/YARN-1366
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha

 The ApplicationMasterService currently sends a resync response to which the 
 AM responds by shutting down. The AM behavior is expected to change to 
 calling resyncing with the RM. Resync means resetting the allocate RPC 
 sequence number to 0 and the AM should send its entire outstanding request to 
 the RM. Note that if the AM is making its first allocate call to the RM then 
 things should proceed like normal without needing a resync. The RM will 
 return all containers that have completed since the RM last synced with the 
 AM. Some container completions may be reported more than once.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1379) [YARN-321] AHS protocols need to be in yarn proto package name after YARN-1170

2013-10-30 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809763#comment-13809763
 ] 

Mayank Bansal commented on YARN-1379:
-

+1, verified.

Thanks,
Mayank

 [YARN-321] AHS protocols need to be in yarn proto package name after YARN-1170
 --

 Key: YARN-1379
 URL: https://issues.apache.org/jira/browse/YARN-1379
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Attachments: YARN-1379.txt


 Found this while merging YARN-321 to the latest branch-2. Without this, 
 compilation fails.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-311) Dynamic node resource configuration: core scheduler changes

2013-10-30 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-311:


Attachment: YARN-311-v12.patch

The v12 patch fixes a tiny issue by adding the volatile keyword to the ResourceOption field in o.a.h.y.sls.nodemanager.NodeInfo (consumed by NMSimulator), after discussing with Luke offline.

 Dynamic node resource configuration: core scheduler changes
 ---

 Key: YARN-311
 URL: https://issues.apache.org/jira/browse/YARN-311
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, scheduler
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-311-v10.patch, YARN-311-v11.patch, 
 YARN-311-v12.patch, YARN-311-v1.patch, YARN-311-v2.patch, YARN-311-v3.patch, 
 YARN-311-v4.patch, YARN-311-v4.patch, YARN-311-v5.patch, YARN-311-v6.1.patch, 
 YARN-311-v6.2.patch, YARN-311-v6.patch, YARN-311-v7.patch, YARN-311-v8.patch, 
 YARN-311-v9.patch


 As the first step, we go for resource change on RM side and expose admin APIs 
 (admin protocol, CLI, REST and JMX API) later. In this jira, we will only 
 contain changes in scheduler. 
 The flow to update node's resource and awareness in resource scheduling is: 
 1. Resource update is through admin API to RM and take effect on RMNodeImpl.
 2. When next NM heartbeat for updating status comes, the RMNode's resource 
 change will be aware and the delta resource is added to schedulerNode's 
 availableResource before actual scheduling happens.
 3. Scheduler do resource allocation according to new availableResource in 
 SchedulerNode.
 For more design details, please refer proposal and discussions in parent 
 JIRA: YARN-291.
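
For illustration only, a rough sketch of the delta step (2) above, using the generic Resource/Resources helpers; the method and variable names are made up and this is not the patch's actual code:

{code}
// Hedged sketch of the delta-resource idea only; the real YARN-311 change lives
// in the RM/scheduler code and looks different.
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public class DeltaResourceSketch {
  // oldTotal/newTotal are the node's total resource before and after the admin
  // update; available is the scheduler node's current available resource.
  public static Resource applyNodeResize(Resource oldTotal, Resource newTotal,
      Resource available) {
    // delta = newTotal - oldTotal (negative when the node shrinks)
    Resource delta = Resources.subtract(newTotal, oldTotal);
    // Fold the delta into the available resource before the next scheduling
    // pass, mirroring step 2 of the flow described above.
    Resources.addTo(available, delta);
    return available;
  }
}
{code}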



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Resolved] (YARN-1379) [YARN-321] AHS protocols need to be in yarn proto package name after YARN-1170

2013-10-30 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved YARN-1379.
---

   Resolution: Fixed
Fix Version/s: YARN-321
 Hadoop Flags: Reviewed

Tx for the quick verification! I just committed this to branch YARN-321.

 [YARN-321] AHS protocols need to be in yarn proto package name after YARN-1170
 --

 Key: YARN-1379
 URL: https://issues.apache.org/jira/browse/YARN-1379
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Fix For: YARN-321

 Attachments: YARN-1379.txt


 Found this while merging YARN-321 to the latest branch-2. Without this, 
 compilation fails.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-311) Dynamic node resource configuration: core scheduler changes

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809781#comment-13809781
 ] 

Hadoop QA commented on YARN-311:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611210/YARN-311-v12.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2324//console

This message is automatically generated.

 Dynamic node resource configuration: core scheduler changes
 ---

 Key: YARN-311
 URL: https://issues.apache.org/jira/browse/YARN-311
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, scheduler
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-311-v10.patch, YARN-311-v11.patch, 
 YARN-311-v12.patch, YARN-311-v1.patch, YARN-311-v2.patch, YARN-311-v3.patch, 
 YARN-311-v4.patch, YARN-311-v4.patch, YARN-311-v5.patch, YARN-311-v6.1.patch, 
 YARN-311-v6.2.patch, YARN-311-v6.patch, YARN-311-v7.patch, YARN-311-v8.patch, 
 YARN-311-v9.patch


 As the first step, we go for resource change on RM side and expose admin APIs 
 (admin protocol, CLI, REST and JMX API) later. In this jira, we will only 
 contain changes in scheduler. 
 The flow to update node's resource and awareness in resource scheduling is: 
 1. Resource update is through admin API to RM and take effect on RMNodeImpl.
 2. When next NM heartbeat for updating status comes, the RMNode's resource 
 change will be aware and the delta resource is added to schedulerNode's 
 availableResource before actual scheduling happens.
 3. Scheduler do resource allocation according to new availableResource in 
 SchedulerNode.
 For more design details, please refer proposal and discussions in parent 
 JIRA: YARN-291.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1374) Resource Manager fails to start due to ConcurrentModificationException

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809782#comment-13809782
 ] 

Hadoop QA commented on YARN-1374:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611209/yarn-1374-1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2323//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2323//console

This message is automatically generated.

 Resource Manager fails to start due to ConcurrentModificationException
 --

 Key: YARN-1374
 URL: https://issues.apache.org/jira/browse/YARN-1374
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Devaraj K
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-1374-1.patch, yarn-1374-1.patch


 Resource Manager is failing to start with the below 
 ConcurrentModificationException.
 {code:xml}
 2013-10-30 20:22:42,371 INFO org.apache.hadoop.util.HostsFileReader: 
 Refreshing hosts (include/exclude) list
 2013-10-30 20:22:42,376 INFO org.apache.hadoop.service.AbstractService: 
 Service ResourceManager failed in state INITED; cause: 
 java.util.ConcurrentModificationException
 java.util.ConcurrentModificationException
   at 
 java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
   at java.util.AbstractList$Itr.next(AbstractList.java:343)
   at 
 java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
   at 
 org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
 2013-10-30 20:22:42,378 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: 
 Transitioning to standby
 2013-10-30 20:22:42,378 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: 
 Transitioned to standby
 2013-10-30 20:22:42,378 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
 ResourceManager
 java.util.ConcurrentModificationException
   at 
 java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
   at java.util.AbstractList$Itr.next(AbstractList.java:343)
   at 
 java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
   at 
 org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
 2013-10-30 20:22:42,379 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG: 
 /************************************************************
 SHUTDOWN_MSG: Shutting down ResourceManager at HOST-10-18-40-24/10.18.40.24
 ************************************************************/
 {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1123) [YARN-321] Adding ContainerReport and Protobuf implementation

2013-10-30 Thread Mayank Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal updated YARN-1123:


Attachment: YARN-1123-4.patch

Adding toString optimization.

Thanks,
Mayank

 [YARN-321] Adding ContainerReport and Protobuf implementation
 -

 Key: YARN-1123
 URL: https://issues.apache.org/jira/browse/YARN-1123
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Mayank Bansal
 Attachments: YARN-1123-1.patch, YARN-1123-2.patch, YARN-1123-3.patch, 
 YARN-1123-4.patch


 Like YARN-978, we need some client-oriented class to expose the container 
 history info. Neither Container nor RMContainer is the right one.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1374) Resource Manager fails to start due to ConcurrentModificationException

2013-10-30 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809793#comment-13809793
 ] 

Karthik Kambatla commented on YARN-1374:


The test fails without the fix and passes with it. 

 Resource Manager fails to start due to ConcurrentModificationException
 --

 Key: YARN-1374
 URL: https://issues.apache.org/jira/browse/YARN-1374
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Devaraj K
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-1374-1.patch, yarn-1374-1.patch


 Resource Manager is failing to start with the below 
 ConcurrentModificationException.
 {code:xml}
 2013-10-30 20:22:42,371 INFO org.apache.hadoop.util.HostsFileReader: 
 Refreshing hosts (include/exclude) list
 2013-10-30 20:22:42,376 INFO org.apache.hadoop.service.AbstractService: 
 Service ResourceManager failed in state INITED; cause: 
 java.util.ConcurrentModificationException
 java.util.ConcurrentModificationException
   at 
 java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
   at java.util.AbstractList$Itr.next(AbstractList.java:343)
   at 
 java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
   at 
 org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
 2013-10-30 20:22:42,378 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: 
 Transitioning to standby
 2013-10-30 20:22:42,378 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: 
 Transitioned to standby
 2013-10-30 20:22:42,378 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
 ResourceManager
 java.util.ConcurrentModificationException
   at 
 java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
   at java.util.AbstractList$Itr.next(AbstractList.java:343)
   at 
 java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
   at 
 org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
 2013-10-30 20:22:42,379 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG: 
 /************************************************************
 SHUTDOWN_MSG: Shutting down ResourceManager at HOST-10-18-40-24/10.18.40.24
 ************************************************************/
 {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1343) NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs

2013-10-30 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated YARN-1343:
-

Attachment: YARN-1343.patch

The TestRMNodeTransitions tests only verify that the expected follow-up events for the {{NodeListManager}} are dispatched.

To test that the reconnect happens with different capabilities, we need to add a test for {{ResourceTrackerService.registerNodeManager()}}.

Uploading a patch that tests RECONNECTED event dispatching with the same and with different capabilities.

 NodeManagers additions/restarts are not reported as node updates in 
 AllocateResponse responses to AMs
 -

 Key: YARN-1343
 URL: https://issues.apache.org/jira/browse/YARN-1343
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
Priority: Critical
 Fix For: 2.2.1

 Attachments: YARN-1343.patch, YARN-1343.patch, YARN-1343.patch, 
 YARN-1343.patch


 If a NodeManager joins the cluster or gets restarted, running AMs never 
 receive the node update indicating the Node is running.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-978) [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation

2013-10-30 Thread Mayank Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal updated YARN-978:
---

Attachment: YARN-978.9.patch

After YARN-947, I had to make changes and remove some code from this patch.
Also added the toString optimization.

Thanks,
Mayank

 [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation
 --

 Key: YARN-978
 URL: https://issues.apache.org/jira/browse/YARN-978
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Fix For: YARN-321

 Attachments: YARN-978-1.patch, YARN-978.2.patch, YARN-978.3.patch, 
 YARN-978.4.patch, YARN-978.5.patch, YARN-978.6.patch, YARN-978.7.patch, 
 YARN-978.8.patch, YARN-978.9.patch


 We dont have ApplicationAttemptReport and Protobuf implementation.
 Adding that.
 Thanks,
 Mayank



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-311) Dynamic node resource configuration: core scheduler changes

2013-10-30 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-311:


Attachment: YARN-311-v12b.patch

 Dynamic node resource configuration: core scheduler changes
 ---

 Key: YARN-311
 URL: https://issues.apache.org/jira/browse/YARN-311
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, scheduler
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-311-v10.patch, YARN-311-v11.patch, 
 YARN-311-v12b.patch, YARN-311-v12.patch, YARN-311-v1.patch, 
 YARN-311-v2.patch, YARN-311-v3.patch, YARN-311-v4.patch, YARN-311-v4.patch, 
 YARN-311-v5.patch, YARN-311-v6.1.patch, YARN-311-v6.2.patch, 
 YARN-311-v6.patch, YARN-311-v7.patch, YARN-311-v8.patch, YARN-311-v9.patch


 As the first step, we go for resource change on RM side and expose admin APIs 
 (admin protocol, CLI, REST and JMX API) later. In this jira, we will only 
 contain changes in scheduler. 
 The flow to update node's resource and awareness in resource scheduling is: 
 1. Resource update is through admin API to RM and take effect on RMNodeImpl.
 2. When next NM heartbeat for updating status comes, the RMNode's resource 
 change will be aware and the delta resource is added to schedulerNode's 
 availableResource before actual scheduling happens.
 3. Scheduler do resource allocation according to new availableResource in 
 SchedulerNode.
 For more design details, please refer proposal and discussions in parent 
 JIRA: YARN-291.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-311) Dynamic node resource configuration: core scheduler changes

2013-10-30 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809797#comment-13809797
 ] 

Junping Du commented on YARN-311:
-

The log doesn't show an actual build failure (the patch builds fine locally), so the Jenkins failure above is not related to the patch but is an accident. Renaming the patch to v12b (contents are exactly the same) and submitting it again.

 Dynamic node resource configuration: core scheduler changes
 ---

 Key: YARN-311
 URL: https://issues.apache.org/jira/browse/YARN-311
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, scheduler
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-311-v10.patch, YARN-311-v11.patch, 
 YARN-311-v12b.patch, YARN-311-v12.patch, YARN-311-v1.patch, 
 YARN-311-v2.patch, YARN-311-v3.patch, YARN-311-v4.patch, YARN-311-v4.patch, 
 YARN-311-v5.patch, YARN-311-v6.1.patch, YARN-311-v6.2.patch, 
 YARN-311-v6.patch, YARN-311-v7.patch, YARN-311-v8.patch, YARN-311-v9.patch


 As the first step, we go for resource change on RM side and expose admin APIs 
 (admin protocol, CLI, REST and JMX API) later. In this jira, we will only 
 contain changes in scheduler. 
 The flow to update node's resource and awareness in resource scheduling is: 
 1. Resource update is through admin API to RM and take effect on RMNodeImpl.
 2. When next NM heartbeat for updating status comes, the RMNode's resource 
 change will be aware and the delta resource is added to schedulerNode's 
 availableResource before actual scheduling happens.
 3. Scheduler do resource allocation according to new availableResource in 
 SchedulerNode.
 For more design details, please refer proposal and discussions in parent 
 JIRA: YARN-291.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1320) Custom log4j properties in Distributed shell does not work properly.

2013-10-30 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1320:
--

Summary: Custom log4j properties in Distributed shell does not work 
properly.  (was: Custom log4j properties does not work properly.)

 Custom log4j properties in Distributed shell does not work properly.
 

 Key: YARN-1320
 URL: https://issues.apache.org/jira/browse/YARN-1320
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
 Fix For: 2.2.1

 Attachments: YARN-1320.1.patch, YARN-1320.2.patch, YARN-1320.3.patch


 Distributed shell cannot pick up custom log4j properties (specified with 
 -log_properties). It always uses default log4j properties.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-978) [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809801#comment-13809801
 ] 

Hadoop QA commented on YARN-978:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611217/YARN-978.9.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2326//console

This message is automatically generated.

 [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation
 --

 Key: YARN-978
 URL: https://issues.apache.org/jira/browse/YARN-978
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Fix For: YARN-321

 Attachments: YARN-978-1.patch, YARN-978.2.patch, YARN-978.3.patch, 
 YARN-978.4.patch, YARN-978.5.patch, YARN-978.6.patch, YARN-978.7.patch, 
 YARN-978.8.patch, YARN-978.9.patch


 We dont have ApplicationAttemptReport and Protobuf implementation.
 Adding that.
 Thanks,
 Mayank



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1123) [YARN-321] Adding ContainerReport and Protobuf implementation

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809803#comment-13809803
 ] 

Hadoop QA commented on YARN-1123:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611214/YARN-1123-4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2327//console

This message is automatically generated.

 [YARN-321] Adding ContainerReport and Protobuf implementation
 -

 Key: YARN-1123
 URL: https://issues.apache.org/jira/browse/YARN-1123
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Mayank Bansal
 Attachments: YARN-1123-1.patch, YARN-1123-2.patch, YARN-1123-3.patch, 
 YARN-1123-4.patch


 Like YARN-978, we need some client-oriented class to expose the container 
 history info. Neither Container nor RMContainer is the right one.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1320) Custom log4j properties in Distributed shell does not work properly.

2013-10-30 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809808#comment-13809808
 ] 

Vinod Kumar Vavilapalli commented on YARN-1320:
---

Patch looks good to me. Can you update what tests you've done?

Also, maybe we can write a test? By making use of log-aggregation :)

 Custom log4j properties in Distributed shell does not work properly.
 

 Key: YARN-1320
 URL: https://issues.apache.org/jira/browse/YARN-1320
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
 Fix For: 2.2.1

 Attachments: YARN-1320.1.patch, YARN-1320.2.patch, YARN-1320.3.patch


 Distributed shell cannot pick up custom log4j properties (specified with 
 -log_properties). It always uses default log4j properties.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1343) NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs

2013-10-30 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809814#comment-13809814
 ] 

Bikas Saha commented on YARN-1343:
--

lgtm.
In the new testReconnect() we should check that the number of RMNodes in rmContext.getRMNodes() is still 1, i.e. that the second node actually replaced the previous node (the desired behavior) as opposed to both ending up in the list.
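
Roughly what such a check could look like, assuming the MockRM test helper from the resourcemanager test module (a hedged sketch, not the committed test):

{code}
// Sketch only: re-register the same node id and assert that it replaced the old
// RMNode entry rather than adding a second one.
import org.junit.Assert;
import org.junit.Test;
import org.apache.hadoop.yarn.server.resourcemanager.MockRM;

public class TestReconnectSketch {
  @Test
  public void testReconnectReplacesNode() throws Exception {
    MockRM rm = new MockRM();
    rm.start();
    try {
      rm.registerNode("host1:1234", 4096);  // first registration, 4GB
      rm.registerNode("host1:1234", 8192);  // reconnect with a larger capability
      // Desired behavior: the RM still tracks exactly one node.
      Assert.assertEquals(1, rm.getRMContext().getRMNodes().size());
    } finally {
      rm.stop();
    }
  }
}
{code}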

 NodeManagers additions/restarts are not reported as node updates in 
 AllocateResponse responses to AMs
 -

 Key: YARN-1343
 URL: https://issues.apache.org/jira/browse/YARN-1343
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
Priority: Critical
 Fix For: 2.2.1

 Attachments: YARN-1343.patch, YARN-1343.patch, YARN-1343.patch, 
 YARN-1343.patch


 If a NodeManager joins the cluster or gets restarted, running AMs never 
 receive the node update indicating the Node is running.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1343) NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs

2013-10-30 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809815#comment-13809815
 ] 

Bikas Saha commented on YARN-1343:
--

+1 for committing. Thanks!

 NodeManagers additions/restarts are not reported as node updates in 
 AllocateResponse responses to AMs
 -

 Key: YARN-1343
 URL: https://issues.apache.org/jira/browse/YARN-1343
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
Priority: Critical
 Fix For: 2.2.1

 Attachments: YARN-1343.patch, YARN-1343.patch, YARN-1343.patch, 
 YARN-1343.patch


 If a NodeManager joins the cluster or gets restarted, running AMs never 
 receive the node update indicating the Node is running.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-311) Dynamic node resource configuration: core scheduler changes

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809820#comment-13809820
 ] 

Hadoop QA commented on YARN-311:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611218/YARN-311-v12b.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2325//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2325//console

This message is automatically generated.

 Dynamic node resource configuration: core scheduler changes
 ---

 Key: YARN-311
 URL: https://issues.apache.org/jira/browse/YARN-311
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, scheduler
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-311-v10.patch, YARN-311-v11.patch, 
 YARN-311-v12b.patch, YARN-311-v12.patch, YARN-311-v1.patch, 
 YARN-311-v2.patch, YARN-311-v3.patch, YARN-311-v4.patch, YARN-311-v4.patch, 
 YARN-311-v5.patch, YARN-311-v6.1.patch, YARN-311-v6.2.patch, 
 YARN-311-v6.patch, YARN-311-v7.patch, YARN-311-v8.patch, YARN-311-v9.patch


 As the first step, we go for resource change on RM side and expose admin APIs 
 (admin protocol, CLI, REST and JMX API) later. In this jira, we will only 
 contain changes in scheduler. 
 The flow to update node's resource and awareness in resource scheduling is: 
 1. Resource update is through admin API to RM and take effect on RMNodeImpl.
 2. When next NM heartbeat for updating status comes, the RMNode's resource 
 change will be aware and the delta resource is added to schedulerNode's 
 availableResource before actual scheduling happens.
 3. Scheduler do resource allocation according to new availableResource in 
 SchedulerNode.
 For more design details, please refer proposal and discussions in parent 
 JIRA: YARN-291.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1343) NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs

2013-10-30 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809821#comment-13809821
 ] 

Alejandro Abdelnur commented on YARN-1343:
--

[~bikassaha], on your ask about checking the node count, I don't think it is necessary; if a reconnect is triggered, it means the node was already found, per {{ResourceTrackerService.registerNodeManager()}}:

{code}
RMNode oldNode = this.rmContext.getRMNodes().putIfAbsent(nodeId, rmNode);
if (oldNode == null) {
  this.rmContext.getDispatcher().getEventHandler().handle(
      new RMNodeEvent(nodeId, RMNodeEventType.STARTED));
} else {
  LOG.info("Reconnect from the node at: " + host);
  this.nmLivelinessMonitor.unregister(nodeId);
  this.rmContext.getDispatcher().getEventHandler().handle(
      new RMNodeReconnectEvent(nodeId, rmNode));
}
{code}

thx

 NodeManagers additions/restarts are not reported as node updates in 
 AllocateResponse responses to AMs
 -

 Key: YARN-1343
 URL: https://issues.apache.org/jira/browse/YARN-1343
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
Priority: Critical
 Fix For: 2.2.1

 Attachments: YARN-1343.patch, YARN-1343.patch, YARN-1343.patch, 
 YARN-1343.patch


 If a NodeManager joins the cluster or gets restarted, running AMs never 
 receive the node update indicating the Node is running.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-891) Store completed application information in RM state store

2013-10-30 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-891:
-

Attachment: YARN-891.8.patch

Thanks Vinod/Bikas for the reviews.

- The new patch addresses the above comments.
- Made a new change in RMAppManager.recover() that recovers the application synchronously, as otherwise the client could see the application as not yet recovered because ClientRMService has already started by that time.
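
To illustrate the ordering being enforced, here is a simplified sketch with made-up service and method names (the real logic lives in RMAppManager and the RM service lifecycle and looks different):

{code}
// Hedged illustration only: recover everything from the store before the
// client-facing service starts answering queries, so a client can never observe
// a stored application that has not been recovered yet.
public class RecoveryOrderingSketch {
  interface StateStore { java.util.List<String> loadApplicationIds(); }
  interface AppManager { void recoverApplication(String appId); }
  interface ClientService { void start(); }

  public static void startSketch(StateStore store, AppManager appManager,
      ClientService clientService) {
    for (String appId : store.loadApplicationIds()) {
      appManager.recoverApplication(appId);   // synchronous recovery
    }
    clientService.start();                    // only now serve client queries
  }
}
{code}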

 Store completed application information in RM state store
 -

 Key: YARN-891
 URL: https://issues.apache.org/jira/browse/YARN-891
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-891.1.patch, YARN-891.2.patch, YARN-891.3.patch, 
 YARN-891.4.patch, YARN-891.5.patch, YARN-891.6.patch, YARN-891.7.patch, 
 YARN-891.7.patch, YARN-891.8.patch, YARN-891.patch, YARN-891.patch, 
 YARN-891.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch


 Store completed application/attempt info in RMStateStore when 
 application/attempt completes. This solves some problems like finished 
 application get lost after RM restart and some other races like YARN-1195



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1290) Let continuous scheduling achieve more balanced task assignment

2013-10-30 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-1290:
--

Attachment: YARN-1290.patch

 Let continuous scheduling achieve more balanced task assignment
 ---

 Key: YARN-1290
 URL: https://issues.apache.org/jira/browse/YARN-1290
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wei Yan
Assignee: Wei Yan
 Attachments: main.pdf, YARN-1290.patch, YARN-1290.patch, 
 YARN-1290.patch, YARN-1290.patch


 Currently, in continuous scheduling (YARN-1010), in each round, the thread 
 iterates over pre-ordered nodes and assigns tasks. This mechanism may 
 overload the first several nodes, while the latter nodes have no tasks.
 We should sort all nodes according to available resource. In each round, 
 always assign tasks to nodes with larger capacity, which can balance the load 
 distribution among all nodes.
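
For illustration, a small sketch of the ordering idea (NodeInfo is a made-up stand-in, not the FairScheduler's actual node class): iterate the nodes with the most available resource first.

{code}
// Hedged sketch: order nodes by available memory, largest first. A real
// comparator would also account for vcores (e.g. via a ResourceCalculator).
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import org.apache.hadoop.yarn.api.records.Resource;

public class BalancedOrderSketch {
  interface NodeInfo { Resource getAvailableResource(); }

  static final Comparator<NodeInfo> MOST_AVAILABLE_FIRST =
      new Comparator<NodeInfo>() {
        @Override
        public int compare(NodeInfo a, NodeInfo b) {
          int aMem = a.getAvailableResource().getMemory();
          int bMem = b.getAvailableResource().getMemory();
          if (aMem == bMem) {
            return 0;
          }
          return aMem > bMem ? -1 : 1;  // larger available memory sorts first
        }
      };

  static void orderForScheduling(List<NodeInfo> nodes) {
    Collections.sort(nodes, MOST_AVAILABLE_FIRST);
  }
}
{code}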



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-891) Store completed application information in RM state store

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809840#comment-13809840
 ] 

Hadoop QA commented on YARN-891:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611222/YARN-891.8.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.TestRMAppAttemptTransitions

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2328//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2328//console

This message is automatically generated.

 Store completed application information in RM state store
 -

 Key: YARN-891
 URL: https://issues.apache.org/jira/browse/YARN-891
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-891.1.patch, YARN-891.2.patch, YARN-891.3.patch, 
 YARN-891.4.patch, YARN-891.5.patch, YARN-891.6.patch, YARN-891.7.patch, 
 YARN-891.7.patch, YARN-891.8.patch, YARN-891.patch, YARN-891.patch, 
 YARN-891.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch


 Store completed application/attempt info in RMStateStore when 
 application/attempt completes. This solves some problems like finished 
 application get lost after RM restart and some other races like YARN-1195



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-891) Store completed application information in RM state store

2013-10-30 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-891:
-

Attachment: YARN-891.9.patch

Fixed the test failure

 Store completed application information in RM state store
 -

 Key: YARN-891
 URL: https://issues.apache.org/jira/browse/YARN-891
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-891.1.patch, YARN-891.2.patch, YARN-891.3.patch, 
 YARN-891.4.patch, YARN-891.5.patch, YARN-891.6.patch, YARN-891.7.patch, 
 YARN-891.7.patch, YARN-891.8.patch, YARN-891.9.patch, YARN-891.patch, 
 YARN-891.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch


 Store completed application/attempt info in RMStateStore when 
 application/attempt completes. This solves some problems like finished 
 application get lost after RM restart and some other races like YARN-1195



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-891) Store completed application information in RM state store

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809849#comment-13809849
 ] 

Hadoop QA commented on YARN-891:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611232/YARN-891.9.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2330//console

This message is automatically generated.

 Store completed application information in RM state store
 -

 Key: YARN-891
 URL: https://issues.apache.org/jira/browse/YARN-891
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-891.1.patch, YARN-891.2.patch, YARN-891.3.patch, 
 YARN-891.4.patch, YARN-891.5.patch, YARN-891.6.patch, YARN-891.7.patch, 
 YARN-891.7.patch, YARN-891.8.patch, YARN-891.9.patch, YARN-891.patch, 
 YARN-891.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch


 Store completed application/attempt info in RMStateStore when 
 application/attempt completes. This solves some problems like finished 
 application get lost after RM restart and some other races like YARN-1195



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1290) Let continuous scheduling achieve more balanced task assignment

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809851#comment-13809851
 ] 

Hadoop QA commented on YARN-1290:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611228/YARN-1290.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2329//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2329//console

This message is automatically generated.

 Let continuous scheduling achieve more balanced task assignment
 ---

 Key: YARN-1290
 URL: https://issues.apache.org/jira/browse/YARN-1290
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wei Yan
Assignee: Wei Yan
 Attachments: main.pdf, YARN-1290.patch, YARN-1290.patch, 
 YARN-1290.patch, YARN-1290.patch


 Currently, in continuous scheduling (YARN-1010), in each round, the thread 
 iterates over pre-ordered nodes and assigns tasks. This mechanism may 
 overload the first several nodes, while the latter nodes have no tasks.
 We should sort all nodes according to available resource. In each round, 
 always assign tasks to nodes with larger capacity, which can balance the load 
 distribution among all nodes.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Assigned] (YARN-998) Persistent resource change during NM restart

2013-10-30 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du reassigned YARN-998:
---

Assignee: Junping Du

 Persistent resource change during NM restart
 

 Key: YARN-998
 URL: https://issues.apache.org/jira/browse/YARN-998
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, scheduler
Reporter: Junping Du
Assignee: Junping Du

 When NM is restarted by plan or from a failure, previous dynamic resource 
 setting should be kept for consistency.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1374) Resource Manager fails to start due to ConcurrentModificationException

2013-10-30 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809874#comment-13809874
 ] 

Chris Douglas commented on YARN-1374:
-

+1 lgtm

 Resource Manager fails to start due to ConcurrentModificationException
 --

 Key: YARN-1374
 URL: https://issues.apache.org/jira/browse/YARN-1374
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Devaraj K
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-1374-1.patch, yarn-1374-1.patch


 Resource Manager is failing to start with the below 
 ConcurrentModificationException.
 {code:xml}
 2013-10-30 20:22:42,371 INFO org.apache.hadoop.util.HostsFileReader: 
 Refreshing hosts (include/exclude) list
 2013-10-30 20:22:42,376 INFO org.apache.hadoop.service.AbstractService: 
 Service ResourceManager failed in state INITED; cause: 
 java.util.ConcurrentModificationException
 java.util.ConcurrentModificationException
   at 
 java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
   at java.util.AbstractList$Itr.next(AbstractList.java:343)
   at 
 java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
   at 
 org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
 2013-10-30 20:22:42,378 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: 
 Transitioning to standby
 2013-10-30 20:22:42,378 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: 
 Transitioned to standby
 2013-10-30 20:22:42,378 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
 ResourceManager
 java.util.ConcurrentModificationException
   at 
 java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
   at java.util.AbstractList$Itr.next(AbstractList.java:343)
   at 
 java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
   at 
 org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
 2013-10-30 20:22:42,379 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG: 
 /************************************************************
 SHUTDOWN_MSG: Shutting down ResourceManager at HOST-10-18-40-24/10.18.40.24
 ************************************************************/
 {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Resolved] (YARN-1343) NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs

2013-10-30 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur resolved YARN-1343.
--

  Resolution: Fixed
Hadoop Flags: Reviewed

 NodeManagers additions/restarts are not reported as node updates in 
 AllocateResponse responses to AMs
 -

 Key: YARN-1343
 URL: https://issues.apache.org/jira/browse/YARN-1343
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
Priority: Critical
 Fix For: 2.2.1

 Attachments: YARN-1343.patch, YARN-1343.patch, YARN-1343.patch, 
 YARN-1343.patch


 If a NodeManager joins the cluster or gets restarted, running AMs never 
 receive the node update indicating the Node is running.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1343) NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs

2013-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809883#comment-13809883
 ] 

Hudson commented on YARN-1343:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4680 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4680/])
YARN-1343. NodeManagers additions/restarts are not reported as node updates in 
AllocateResponse responses to AMs. (tucu) (tucu: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1537368)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/NodesListManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestNMReconnect.java


 NodeManagers additions/restarts are not reported as node updates in 
 AllocateResponse responses to AMs
 -

 Key: YARN-1343
 URL: https://issues.apache.org/jira/browse/YARN-1343
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
Priority: Critical
 Fix For: 2.2.1

 Attachments: YARN-1343.patch, YARN-1343.patch, YARN-1343.patch, 
 YARN-1343.patch


 If a NodeManager joins the cluster or gets restarted, running AMs never 
 receive the node update indicating the Node is running.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1324) NodeManager potentially causes unnecessary operations on all its disks

2013-10-30 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809885#comment-13809885
 ] 

Chris Douglas commented on YARN-1324:
-

bq. When does MR use multiple disks in the same task/container? Isnt the map 
output written to a single indexed partition file?

Spills are spread across all volumes, but merged into a single file at the end.

Would randomizing the order of disks be a reasonable short-term workaround for (1)? Future changes could weight or elide directories based on other criteria, but that's a simple change. So would changing the random selection to bias its search order using a hash of the task id (instead of disk usage when creating the spill), so that the ShuffleHandler could search fewer directories on average. I agree with Vinod that it would be hard to prevent the search altogether...

bq. Requiring apps to specify the number of disks for a container is also a 
viable solution and can be done in a back-compatible manner by changing MR to 
specify multiple disks and leaving the default to 1 for apps that dont care.

This makes sense as a hint, but some users might interpret it as a constraint and be confused when they are scheduled on a NM that reports fewer local dirs (due to failure or a heterogeneous config).
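
A tiny sketch of the hash-biased search order idea mentioned above (purely illustrative; the dir-list handling and names here are made up, not NM or ShuffleHandler code):

{code}
// Hedged sketch: derive a deterministic per-task ordering of the local dirs, so
// the dir chosen at write time is the first dir probed at read time.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class TaskDirOrderSketch {
  static List<String> searchOrder(List<String> localDirs, String taskId) {
    List<String> ordered = new ArrayList<String>(localDirs);
    // Seeding the shuffle with the task id makes the permutation reproducible
    // for both the writer and any later reader of that task's output.
    Collections.shuffle(ordered, new Random(taskId.hashCode()));
    return ordered;
  }
}
{code}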

 NodeManager potentially causes unnecessary operations on all its disks
 --

 Key: YARN-1324
 URL: https://issues.apache.org/jira/browse/YARN-1324
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Bikas Saha

 Currently, for every container, the NM creates a directory on every disk and 
 expects the container-task to choose 1 of them and load balance the use of 
 the disks across all containers. 
 1) This may have worked fine in the MR world where MR tasks would randomly 
 choose dirs but in general we cannot expect every app/task writer to 
 understand these nuances and randomly pick disks. So we could end up 
 overloading the first disk if most people decide to use the first disk.
 2) This makes a number of NM operations to scan every disk (thus randomizing 
 that disk) to locate the dir which the task has actually chosen to use for 
 its files. Makes all these operations expensive for the NM as well as 
 disruptive for users of disks that did not have the real task working dirs.
 I propose that NM should up-front decide the disk it is assigning to tasks. 
 It could choose to do so randomly or weighted-randomly by looking at space 
 and load on each disk. So it could do a better job of load balancing. Then, 
 it would associate the chosen working directory with the container context so 
 that subsequent operations on the NM can directly seek to the correct 
 location instead of having to seek on every disk.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-891) Store completed application information in RM state store

2013-10-30 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-891:
-

Attachment: YARN-891.10.patch

resubmit the patch

 Store completed application information in RM state store
 -

 Key: YARN-891
 URL: https://issues.apache.org/jira/browse/YARN-891
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-891.10.patch, YARN-891.1.patch, YARN-891.2.patch, 
 YARN-891.3.patch, YARN-891.4.patch, YARN-891.5.patch, YARN-891.6.patch, 
 YARN-891.7.patch, YARN-891.7.patch, YARN-891.8.patch, YARN-891.9.patch, 
 YARN-891.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch, 
 YARN-891.patch, YARN-891.patch


 Store completed application/attempt info in RMStateStore when 
 application/attempt completes. This solves some problems like finished 
 application get lost after RM restart and some other races like YARN-1195



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-891) Store completed application information in RM state store

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809934#comment-13809934
 ] 

Hadoop QA commented on YARN-891:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611248/YARN-891.10.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2331//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2331//console

This message is automatically generated.

 Store completed application information in RM state store
 -

 Key: YARN-891
 URL: https://issues.apache.org/jira/browse/YARN-891
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-891.10.patch, YARN-891.1.patch, YARN-891.2.patch, 
 YARN-891.3.patch, YARN-891.4.patch, YARN-891.5.patch, YARN-891.6.patch, 
 YARN-891.7.patch, YARN-891.7.patch, YARN-891.8.patch, YARN-891.9.patch, 
 YARN-891.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch, 
 YARN-891.patch, YARN-891.patch


 Store completed application/attempt info in RMStateStore when 
 application/attempt completes. This solves some problems like finished 
 application get lost after RM restart and some other races like YARN-1195



--
This message was sent by Atlassian JIRA
(v6.1#6144)