[jira] [Commented] (YARN-515) Node Manager not getting the master key

2013-03-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618042#comment-13618042
 ] 

Hudson commented on YARN-515:
-

Integrated in Hadoop-Yarn-trunk #170 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/170/])
YARN-515. Node Manager not getting the master key. Contributed by Robert 
Joseph Evans (Revision 1462632)

 Result = FAILURE
jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1462632
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RegisterNodeManagerResponsePBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/api
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/api/protocolrecords
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/api/protocolrecords/TestRegisterNodeManagerResponse.java


 Node Manager not getting the master key
 ---

 Key: YARN-515
 URL: https://issues.apache.org/jira/browse/YARN-515
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.0.4-alpha
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
Priority: Blocker
 Fix For: 2.0.5-beta

 Attachments: YARN-515.txt


 On the latest version of branch-2, I see the following on a secure cluster.
 {noformat}
 2013-03-28 19:21:06,243 [main] INFO 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Security 
 enabled - updating secret keys now
 2013-03-28 19:21:06,243 [main] INFO 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered 
 with ResourceManager as RM:PORT with total resource of memory:12288, vCores:16
 2013-03-28 19:21:06,244 [main] INFO 
 org.apache.hadoop.yarn.service.AbstractService: 
 Service:org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl is 
 started.
 2013-03-28 19:21:06,245 [main] INFO 
 org.apache.hadoop.yarn.service.AbstractService: 
 Service:org.apache.hadoop.yarn.server.nodemanager.NodeManager is started.
 2013-03-28 19:21:07,257 [Node Status Updater] ERROR 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Caught 
 exception in status-updater
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.security.BaseContainerTokenSecretManager.getCurrentKey(BaseContainerTokenSecretManager.java:121)
 at 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$1.run(NodeStatusUpdaterImpl.java:407)
 {noformat}
 The NullPointerException just keeps repeating, and all of the nodes end up 
 being lost. It looks like the NodeManager never gets the secret key when it 
 registers.
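 To make the failure mode concrete, here is a minimal sketch (hypothetical 
 classes and names, not the actual YARN code): the NM-side secret manager is 
 only populated from the RM's registration response, so if the response never 
 round-trips the key, the current key stays null and every later 
 getCurrentKey() call fails just like the log above.
 {code:java}
 import java.util.concurrent.atomic.AtomicReference;

 class MasterKeySketch {
     final int keyId;
     final byte[] keyBytes;
     MasterKeySketch(int keyId, byte[] keyBytes) {
         this.keyId = keyId;
         this.keyBytes = keyBytes;
     }
 }

 class ContainerTokenSecretManagerSketch {
     private final AtomicReference<MasterKeySketch> currentKey = new AtomicReference<>();

     // Called once the NM has registered; the response may or may not carry a key.
     void onRegistration(MasterKeySketch keyFromResponse) {
         if (keyFromResponse == null) {
             // This is the YARN-515 situation: the key never arrives, so
             // currentKey stays null and the status-updater thread fails later.
             // Failing loudly here would surface the bug at registration time.
             throw new IllegalStateException("RM registration response carried no master key");
         }
         currentKey.set(keyFromResponse);
     }

     // Analogue of BaseContainerTokenSecretManager.getCurrentKey().
     MasterKeySketch getCurrentKey() {
         MasterKeySketch key = currentKey.get();
         // Without a null check, dereferencing 'key' here is the
         // NullPointerException seen in the log.
         if (key == null) {
             throw new IllegalStateException("master key not initialized yet");
         }
         return key;
     }
 }
 {code}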

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-460) CS user left in list of active users for the queue even when application finished

2013-03-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618043#comment-13618043
 ] 

Hudson commented on YARN-460:
-

Integrated in Hadoop-Yarn-trunk #170 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/170/])
YARN-460. CS user left in list of active users for the queue even when 
application finished (tgraves) (Revision 1462486)

 Result = FAILURE
tgraves : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1462486
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java


 CS user left in list of active users for the queue even when application 
 finished
 -

 Key: YARN-460
 URL: https://issues.apache.org/jira/browse/YARN-460
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 0.23.7, 2.0.4-alpha
Reporter: Thomas Graves
Assignee: Thomas Graves
Priority: Blocker
 Fix For: 3.0.0, 0.23.7, 2.0.5-beta

 Attachments: YARN-460-branch-0.23.patch, YARN-460-branch-0.23.patch, 
 YARN-460-branch-0.23.patch, YARN-460.patch, YARN-460.patch, YARN-460.patch, 
 YARN-460.patch


 We have seen a user get left in the queue's list of active users even though 
 the application was removed. This can cause everyone else in the queue to get 
 fewer resources when using the minimum-user-limit-percent config.
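 For illustration, a minimal sketch of the bookkeeping this bug concerns 
 (hypothetical names, not the actual CapacityScheduler code), assuming the 
 queue counts active applications per user and splits capacity among active 
 users: a finish path that skips the decrement leaves a ghost user that 
 deflates everyone's share.
 {code:java}
 import java.util.HashMap;
 import java.util.Map;

 class ActiveUsersSketch {
     private final Map<String, Integer> activeAppsPerUser = new HashMap<>();

     synchronized void submitApplication(String user) {
         activeAppsPerUser.merge(user, 1, Integer::sum);
     }

     // If a code path finishes an application without calling this,
     // the user stays "active" forever; that is the bug described above.
     synchronized void finishApplication(String user) {
         activeAppsPerUser.computeIfPresent(user, (u, n) -> n == 1 ? null : n - 1);
     }

     // With minimum-user-limit-percent in effect, each active user is entitled
     // to roughly queueCapacity / activeUsers, so one stale entry permanently
     // lowers the limit for every real user in the queue.
     synchronized int perUserLimit(int queueCapacity) {
         int activeUsers = Math.max(1, activeAppsPerUser.size());
         return queueCapacity / activeUsers;
     }
 }
 {code}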

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-460) CS user left in list of active users for the queue even when application finished

2013-03-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618062#comment-13618062
 ] 

Hudson commented on YARN-460:
-

Integrated in Hadoop-Hdfs-0.23-Build #568 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/568/])
YARN-460. CS user left in list of active users for the queue even when 
application finished (tgraves) (Revision 1462497)

 Result = SUCCESS
tgraves : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1462497
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApp.java
* 
/hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
/hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java


 CS user left in list of active users for the queue even when application 
 finished
 -

 Key: YARN-460
 URL: https://issues.apache.org/jira/browse/YARN-460
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 0.23.7, 2.0.4-alpha
Reporter: Thomas Graves
Assignee: Thomas Graves
Priority: Blocker
 Fix For: 3.0.0, 0.23.7, 2.0.5-beta

 Attachments: YARN-460-branch-0.23.patch, YARN-460-branch-0.23.patch, 
 YARN-460-branch-0.23.patch, YARN-460.patch, YARN-460.patch, YARN-460.patch, 
 YARN-460.patch


 We have seen a user get left in the queue's list of active users even though 
 the application was removed. This can cause everyone else in the queue to get 
 fewer resources when using the minimum-user-limit-percent config.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-460) CS user left in list of active users for the queue even when application finished

2013-03-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618067#comment-13618067
 ] 

Hudson commented on YARN-460:
-

Integrated in Hadoop-Hdfs-trunk #1359 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1359/])
YARN-460. CS user left in list of active users for the queue even when 
application finished (tgraves) (Revision 1462486)

 Result = FAILURE
tgraves : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1462486
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java


 CS user left in list of active users for the queue even when application 
 finished
 -

 Key: YARN-460
 URL: https://issues.apache.org/jira/browse/YARN-460
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 0.23.7, 2.0.4-alpha
Reporter: Thomas Graves
Assignee: Thomas Graves
Priority: Blocker
 Fix For: 3.0.0, 0.23.7, 2.0.5-beta

 Attachments: YARN-460-branch-0.23.patch, YARN-460-branch-0.23.patch, 
 YARN-460-branch-0.23.patch, YARN-460.patch, YARN-460.patch, YARN-460.patch, 
 YARN-460.patch


 We have seen a user get left in the queue's list of active users even though 
 the application was removed. This can cause everyone else in the queue to get 
 fewer resources when using the minimum-user-limit-percent config.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-515) Node Manager not getting the master key

2013-03-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618066#comment-13618066
 ] 

Hudson commented on YARN-515:
-

Integrated in Hadoop-Hdfs-trunk #1359 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1359/])
YARN-515. Node Manager not getting the master key. Contributed by Robert 
Joseph Evans (Revision 1462632)

 Result = FAILURE
jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1462632
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RegisterNodeManagerResponsePBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/api
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/api/protocolrecords
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/api/protocolrecords/TestRegisterNodeManagerResponse.java


 Node Manager not getting the master key
 ---

 Key: YARN-515
 URL: https://issues.apache.org/jira/browse/YARN-515
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.0.4-alpha
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
Priority: Blocker
 Fix For: 2.0.5-beta

 Attachments: YARN-515.txt


 On the latest version of branch-2, I see the following on a secure cluster.
 {noformat}
 2013-03-28 19:21:06,243 [main] INFO 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Security 
 enabled - updating secret keys now
 2013-03-28 19:21:06,243 [main] INFO 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered 
 with ResourceManager as RM:PORT with total resource of memory:12288, vCores:16
 2013-03-28 19:21:06,244 [main] INFO 
 org.apache.hadoop.yarn.service.AbstractService: 
 Service:org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl is 
 started.
 2013-03-28 19:21:06,245 [main] INFO 
 org.apache.hadoop.yarn.service.AbstractService: 
 Service:org.apache.hadoop.yarn.server.nodemanager.NodeManager is started.
 2013-03-28 19:21:07,257 [Node Status Updater] ERROR 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Caught 
 exception in status-updater
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.security.BaseContainerTokenSecretManager.getCurrentKey(BaseContainerTokenSecretManager.java:121)
 at 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$1.run(NodeStatusUpdaterImpl.java:407)
 {noformat}
 The NullPointerException just keeps repeating, and all of the nodes end up 
 being lost. It looks like the NodeManager never gets the secret key when it 
 registers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-460) CS user left in list of active users for the queue even when application finished

2013-03-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618080#comment-13618080
 ] 

Hudson commented on YARN-460:
-

Integrated in Hadoop-Mapreduce-trunk #1387 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1387/])
YARN-460. CS user left in list of active users for the queue even when 
application finished (tgraves) (Revision 1462486)

 Result = FAILURE
tgraves : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1462486
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java


 CS user left in list of active users for the queue even when application 
 finished
 -

 Key: YARN-460
 URL: https://issues.apache.org/jira/browse/YARN-460
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 0.23.7, 2.0.4-alpha
Reporter: Thomas Graves
Assignee: Thomas Graves
Priority: Blocker
 Fix For: 3.0.0, 0.23.7, 2.0.5-beta

 Attachments: YARN-460-branch-0.23.patch, YARN-460-branch-0.23.patch, 
 YARN-460-branch-0.23.patch, YARN-460.patch, YARN-460.patch, YARN-460.patch, 
 YARN-460.patch


 We have seen a user get left in the queue's list of active users even though 
 the application was removed. This can cause everyone else in the queue to get 
 fewer resources when using the minimum-user-limit-percent config.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-515) Node Manager not getting the master key

2013-03-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618079#comment-13618079
 ] 

Hudson commented on YARN-515:
-

Integrated in Hadoop-Mapreduce-trunk #1387 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1387/])
YARN-515. Node Manager not getting the master key. Contributed by Robert 
Joseph Evans (Revision 1462632)

 Result = FAILURE
jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1462632
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RegisterNodeManagerResponsePBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/api
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/api/protocolrecords
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/api/protocolrecords/TestRegisterNodeManagerResponse.java


 Node Manager not getting the master key
 ---

 Key: YARN-515
 URL: https://issues.apache.org/jira/browse/YARN-515
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.0.4-alpha
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
Priority: Blocker
 Fix For: 2.0.5-beta

 Attachments: YARN-515.txt


 On the latest version of branch-2, I see the following on a secure cluster.
 {noformat}
 2013-03-28 19:21:06,243 [main] INFO 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Security 
 enabled - updating secret keys now
 2013-03-28 19:21:06,243 [main] INFO 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered 
 with ResourceManager as RM:PORT with total resource of memory:12288, vCores:16
 2013-03-28 19:21:06,244 [main] INFO 
 org.apache.hadoop.yarn.service.AbstractService: 
 Service:org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl is 
 started.
 2013-03-28 19:21:06,245 [main] INFO 
 org.apache.hadoop.yarn.service.AbstractService: 
 Service:org.apache.hadoop.yarn.server.nodemanager.NodeManager is started.
 2013-03-28 19:21:07,257 [Node Status Updater] ERROR 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Caught 
 exception in status-updater
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.security.BaseContainerTokenSecretManager.getCurrentKey(BaseContainerTokenSecretManager.java:121)
 at 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$1.run(NodeStatusUpdaterImpl.java:407)
 {noformat}
 The NullPointerException just keeps repeating, and all of the nodes end up 
 being lost. It looks like the NodeManager never gets the secret key when it 
 registers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-515) Node Manager not getting the master key

2013-03-30 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618214#comment-13618214
 ] 

Vinod Kumar Vavilapalli commented on YARN-515:
--

[~revans2], [~jlowe], thanks for taking care of this. I'll request people to 
test patches in secure mode from now on.

 Node Manager not getting the master key
 ---

 Key: YARN-515
 URL: https://issues.apache.org/jira/browse/YARN-515
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.0.4-alpha
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
Priority: Blocker
 Fix For: 2.0.5-beta

 Attachments: YARN-515.txt


 On the latest version of branch-2, I see the following on a secure cluster.
 {noformat}
 2013-03-28 19:21:06,243 [main] INFO 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Security 
 enabled - updating secret keys now
 2013-03-28 19:21:06,243 [main] INFO 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered 
 with ResourceManager as RM:PORT with total resource of memory:12288, vCores:16
 2013-03-28 19:21:06,244 [main] INFO 
 org.apache.hadoop.yarn.service.AbstractService: 
 Service:org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl is 
 started.
 2013-03-28 19:21:06,245 [main] INFO 
 org.apache.hadoop.yarn.service.AbstractService: 
 Service:org.apache.hadoop.yarn.server.nodemanager.NodeManager is started.
 2013-03-28 19:21:07,257 [Node Status Updater] ERROR 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Caught 
 exception in status-updater
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.security.BaseContainerTokenSecretManager.getCurrentKey(BaseContainerTokenSecretManager.java:121)
 at 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$1.run(NodeStatusUpdaterImpl.java:407)
 {noformat}
 The NullPointerException just keeps repeating, and all of the nodes end up 
 being lost. It looks like the NodeManager never gets the secret key when it 
 registers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-522) [Umbrella] Better reporting for crashed/Killed AMs

2013-03-30 Thread Vinod Kumar Vavilapalli (JIRA)
Vinod Kumar Vavilapalli created YARN-522:


 Summary: [Umbrella] Better reporting for crashed/Killed AMs
 Key: YARN-522
 URL: https://issues.apache.org/jira/browse/YARN-522
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Vinod Kumar Vavilapalli


Crashing AMs have been a real pain for users since the beginning, and there 
are already a few tickets floating around; filing this to consolidate them.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-499) On container failure, include last n lines of logs in diagnostics

2013-03-30 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-499:
-

Issue Type: Sub-task  (was: Improvement)
Parent: YARN-522

 On container failure, include last n lines of logs in diagnostics
 -

 Key: YARN-499
 URL: https://issues.apache.org/jira/browse/YARN-499
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-499.patch


 When a container fails, the only way to diagnose it is to look at the logs.  
 ContainerStatuses include a diagnostic string that is reported back to the 
 resource manager by the node manager.
 Currently in MR2, I believe whatever is sent to the task's standard out is 
 added to the diagnostics string, but for MR, standard out is redirected to a 
 file called stdout. In MR1, this string was populated with the last few 
 lines of the task's stdout file and got printed to the console, allowing for 
 easy debugging.
 Handling this would help to soothe the infuriating problem of an AM dying for 
 a mysterious reason before setting a tracking URL (MAPREDUCE-3688).
 This could be done in one of two ways.
 * Use tee to send MR's standard out to both the stdout file and standard out. 
 This requires modifying ShellCmdExecutor to roll what it reads in, as we 
 wouldn't want to store the entire task log in NM memory (a rolling-tail 
 sketch follows below).
 * Read the task's log files. This would require standardizing the container 
 log file names or making them configurable. Right now the log files are 
 determined in userland, and all that YARN is aware of is the log directory.
 Does this present any issues I'm not considering? If so, might this only be 
 needed for AMs?
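 To make the rolling idea from the first option concrete, here is a hedged 
 sketch (names are illustrative; this is not YARN's ShellCmdExecutor API): 
 each line is teed through to its normal destination while only the last N 
 lines are retained in a bounded deque, so the NM never holds the whole log.
 {code:java}
 import java.io.BufferedReader;
 import java.io.IOException;
 import java.io.InputStream;
 import java.io.InputStreamReader;
 import java.io.PrintStream;
 import java.nio.charset.StandardCharsets;
 import java.util.ArrayDeque;
 import java.util.Deque;

 class TailingTee {
     private final Deque<String> tail = new ArrayDeque<>();
     private final int maxLines;

     TailingTee(int maxLines) {
         this.maxLines = maxLines;
     }

     // Stream 'in' to 'passthrough' line by line, keeping only the last
     // maxLines lines in memory (the "roll" mentioned in the first bullet).
     void pump(InputStream in, PrintStream passthrough) throws IOException {
         try (BufferedReader reader =
                  new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
             String line;
             while ((line = reader.readLine()) != null) {
                 passthrough.println(line);   // the "tee" half
                 if (tail.size() == maxLines) {
                     tail.removeFirst();      // drop the oldest retained line
                 }
                 tail.addLast(line);
             }
         }
     }

     // What the NM could append to the ContainerStatus diagnostics string.
     String diagnosticsSnippet() {
         return String.join(System.lineSeparator(), tail);
     }
 }
 {code}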

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-523) Container localization failures aren't reported from NM to RM

2013-03-30 Thread Vinod Kumar Vavilapalli (JIRA)
Vinod Kumar Vavilapalli created YARN-523:


 Summary: Container localization failures aren't reported from NM 
to RM
 Key: YARN-523
 URL: https://issues.apache.org/jira/browse/YARN-523
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli


This is mainly a pain with crashing AMs, but once we fix this, containers can 
also benefit; the same fix covers both.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-366) Add a tracing async dispatcher to simplify debugging

2013-03-30 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618248#comment-13618248
 ] 

Sandy Ryza commented on YARN-366:
-

Woah, that's a lot of failing tests. I'm working on a patch that fixes the 
getConfig() / init issue.

bq. Rename yarn.async.dispatcher.tracing to simply yarn.dispatcher. Please also 
document that it can/may impact performance if enabled.
In this case, should it still be a boolean or accept any class name?
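For reference, a hedged sketch of the two config styles in question (the 
boolean key name is from the review comment above; the class-name key and its 
default are made up for illustration):
{code:java}
import org.apache.hadoop.conf.Configuration;

public class DispatcherConfigSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Option 1: a boolean flag that merely toggles the tracing dispatcher.
        boolean tracing = conf.getBoolean("yarn.dispatcher.tracing", false);

        // Option 2: a class-name key ("yarn.dispatcher.class" is hypothetical),
        // which would let users plug in any dispatcher implementation.
        Class<?> dispatcherClass = conf.getClass("yarn.dispatcher.class", Object.class);

        System.out.println("tracing=" + tracing + ", dispatcher=" + dispatcherClass);
    }
}
{code}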

 Add a tracing async dispatcher to simplify debugging
 

 Key: YARN-366
 URL: https://issues.apache.org/jira/browse/YARN-366
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager
Affects Versions: 2.0.2-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-366-1.patch, YARN-366.patch


 Exceptions thrown in YARN/MR code with asynchronous event handling do not 
 contain informative stack traces, as all handle() methods sit directly under 
 the dispatcher thread's loop.
 This makes errors very difficult to debug for those who are not intimately 
 familiar with the code, as it is difficult to see which chain of events 
 caused a particular outcome.
 I propose adding an AsyncDispatcher that instruments events with tracing 
 information.  Whenever an event is dispatched during the handling of another 
 event, the dispatcher would annotate that event with a pointer to its parent. 
  When the dispatcher catches an exception, it could reconstruct a stack 
 trace of the chain of events that led to it, and be able to log something 
 informative.
 This would be an experimental feature, off by default, unless extensive 
 testing showed that it did not have a significant performance impact.
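 To make the idea concrete, here is a minimal sketch (hypothetical classes, 
 not the attached patch): every event dispatched while another event is being 
 handled records a pointer to its parent, and when a handler throws, the 
 dispatcher walks the parent chain and logs the causal sequence.
 {code:java}
 import java.util.ArrayDeque;
 import java.util.Deque;
 import java.util.function.Consumer;

 class TracedEvent {
     final String name;
     final TracedEvent parent;  // null for events injected from outside a handler
     TracedEvent(String name, TracedEvent parent) {
         this.name = name;
         this.parent = parent;
     }
 }

 class TracingDispatcherSketch {
     // The event whose handler is currently running on this dispatcher thread.
     private final ThreadLocal<TracedEvent> current = new ThreadLocal<>();
     private final Deque<TracedEvent> queue = new ArrayDeque<>();

     // Called from anywhere, including handlers; links the new event to the
     // event being handled right now, if any.
     void dispatch(String eventName) {
         queue.addLast(new TracedEvent(eventName, current.get()));
     }

     void handleNext(Consumer<TracedEvent> handler) {
         TracedEvent event = queue.pollFirst();
         if (event == null) {
             return;
         }
         current.set(event);
         try {
             handler.accept(event);
         } catch (RuntimeException e) {
             // Reconstruct the chain of events that led to the failure.
             System.err.println("Event chain leading to failure: " + chainOf(event));
             throw e;
         } finally {
             current.remove();
         }
     }

     private static String chainOf(TracedEvent event) {
         StringBuilder sb = new StringBuilder();
         for (TracedEvent e = event; e != null; e = e.parent) {
             if (sb.length() > 0) {
                 sb.append(" <- ");
             }
             sb.append(e.name);
         }
         return sb.toString();
     }
 }
 {code}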

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-366) Add a tracing async dispatcher to simplify debugging

2013-03-30 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-366:


Attachment: YARN-366-2.patch

 Add a tracing async dispatcher to simplify debugging
 

 Key: YARN-366
 URL: https://issues.apache.org/jira/browse/YARN-366
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager
Affects Versions: 2.0.2-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-366-1.patch, YARN-366-2.patch, YARN-366.patch


 Exceptions thrown in YARN/MR code with asynchronous event handling do not 
 contain informative stack traces, as all handle() methods sit directly under 
 the dispatcher thread's loop.
 This makes errors very difficult to debug for those who are not intimately 
 familiar with the code, as it is difficult to see which chain of events 
 caused a particular outcome.
 I propose adding an AsyncDispatcher that instruments events with tracing 
 information.  Whenever an event is dispatched during the handling of another 
 event, the dispatcher would annotate that event with a pointer to its parent. 
  When the dispatcher catches an exception, it could reconstruct a stack 
 trace of the chain of events that led to it, and be able to log something 
 informative.
 This would be an experimental feature, off by default, unless extensive 
 testing showed that it did not have a significant performance impact.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira