[jira] [Commented] (YARN-515) Node Manager not getting the master key
[ https://issues.apache.org/jira/browse/YARN-515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618042#comment-13618042 ]

Hudson commented on YARN-515:
-----------------------------

Integrated in Hadoop-Yarn-trunk #170 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/170/])
YARN-515. Node Manager not getting the master key. Contributed by Robert Joseph Evans (Revision 1462632)

Result = FAILURE
jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1462632
Files :
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RegisterNodeManagerResponsePBImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/api
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/api/protocolrecords
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/api/protocolrecords/TestRegisterNodeManagerResponse.java

Node Manager not getting the master key
---------------------------------------

Key: YARN-515
URL: https://issues.apache.org/jira/browse/YARN-515
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.0.4-alpha
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
Priority: Blocker
Fix For: 2.0.5-beta
Attachments: YARN-515.txt

On branch-2, the latest version, I see the following on a secure cluster.
{noformat}
2013-03-28 19:21:06,243 [main] INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Security enabled - updating secret keys now
2013-03-28 19:21:06,243 [main] INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as RM:PORT with total resource of memory:12288, vCores:16
2013-03-28 19:21:06,244 [main] INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl is started.
2013-03-28 19:21:06,245 [main] INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.NodeManager is started.
2013-03-28 19:21:07,257 [Node Status Updater] ERROR org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Caught exception in status-updater
java.lang.NullPointerException
        at org.apache.hadoop.yarn.server.security.BaseContainerTokenSecretManager.getCurrentKey(BaseContainerTokenSecretManager.java:121)
        at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$1.run(NodeStatusUpdaterImpl.java:407)
{noformat}

The NullPointerException just keeps repeating, and all of the nodes end up being lost. It looks like the node manager never gets the secret key when it registers.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
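The failure mode reported above (a registration response arriving without a master key, and getCurrentKey() then dereferencing null on every heartbeat) can be reproduced in miniature. This is a hedged sketch with hypothetical stand-in classes (`SecretManagerSketch`, `RegistrationResponse`, `register()`), not the actual BaseContainerTokenSecretManager or RegisterNodeManagerResponsePBImpl code:

```java
// Minimal sketch of the failure mode, NOT the real YARN classes.
// A registration response that omits the master key leaves the secret
// manager's current key null; the next heartbeat then hits an NPE.
public class MasterKeySketch {

    /** Hypothetical stand-in for the RM's registration response. */
    static final class RegistrationResponse {
        final byte[] masterKey; // null when the response never carried a key
        RegistrationResponse(byte[] masterKey) { this.masterKey = masterKey; }
    }

    /** Hypothetical stand-in for the NM-side secret manager. */
    static final class SecretManagerSketch {
        private byte[] currentKey;

        void setMasterKey(byte[] key) { currentKey = key; }

        /** Mirrors getCurrentKey(): dereferences the key without a null check. */
        int currentKeyLength() { return currentKey.length; } // NPEs if never set
    }

    /** Register with the RM; store the key only if the response carried one. */
    static boolean register(SecretManagerSketch mgr, RegistrationResponse resp) {
        if (resp.masterKey == null) {
            return false; // surface the problem once, not on every heartbeat
        }
        mgr.setMasterKey(resp.masterKey);
        return true;
    }

    public static void main(String[] args) {
        SecretManagerSketch mgr = new SecretManagerSketch();
        // Buggy path: response without a key -> registration flagged as incomplete.
        System.out.println("registered: "
            + register(mgr, new RegistrationResponse(null)));
        // Healthy path: response carries the key, heartbeats can read it.
        register(mgr, new RegistrationResponse(new byte[]{1, 2, 3}));
        System.out.println("key length: " + mgr.currentKeyLength());
    }
}
```

Checking for the missing key at registration time, as sketched, turns a repeating heartbeat NPE into a single, diagnosable registration failure.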
[jira] [Commented] (YARN-460) CS user left in list of active users for the queue even when application finished
[ https://issues.apache.org/jira/browse/YARN-460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618043#comment-13618043 ]

Hudson commented on YARN-460:
-----------------------------

Integrated in Hadoop-Yarn-trunk #170 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/170/])
YARN-460. CS user left in list of active users for the queue even when application finished (tgraves) (Revision 1462486)

Result = FAILURE
tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1462486
Files :
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java

CS user left in list of active users for the queue even when application finished
---------------------------------------------------------------------------------

Key: YARN-460
URL: https://issues.apache.org/jira/browse/YARN-460
Project: Hadoop YARN
Issue Type: Bug
Components: capacityscheduler
Affects Versions: 0.23.7, 2.0.4-alpha
Reporter: Thomas Graves
Assignee: Thomas Graves
Priority: Blocker
Fix For: 3.0.0, 0.23.7, 2.0.5-beta
Attachments: YARN-460-branch-0.23.patch, YARN-460-branch-0.23.patch, YARN-460-branch-0.23.patch, YARN-460.patch, YARN-460.patch, YARN-460.patch, YARN-460.patch

We have seen a user get left in the queue's list of active users even though the application was removed. This can cause everyone else in the queue to get fewer resources if using the minimum user limit percent config.
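The impact described above can be illustrated with a deliberately simplified model. The real CapacityScheduler user-limit computation is considerably more involved; `ActiveUsersSketch` and its naive equal split below are hypothetical, shown only to make visible why a user who is never removed from the active set shrinks everyone else's computed share:

```java
import java.util.HashSet;
import java.util.Set;

// Simplified illustration (NOT the real CapacityScheduler math) of why a
// stale entry in the active-user set hurts everyone: each active user's
// share of the queue shrinks as the set grows.
public class ActiveUsersSketch {
    private final Set<String> activeUsers = new HashSet<>();

    void appSubmitted(String user) { activeUsers.add(user); }

    // The bug amounts to this never being called when a user's last
    // application finishes, so the user is counted forever.
    void lastAppFinished(String user) { activeUsers.remove(user); }

    /** Naive equal split of queue memory among active users. */
    int perUserShareMb(int queueMemoryMb) {
        return activeUsers.isEmpty() ? queueMemoryMb
                                     : queueMemoryMb / activeUsers.size();
    }

    public static void main(String[] args) {
        ActiveUsersSketch queue = new ActiveUsersSketch();
        queue.appSubmitted("alice");
        queue.appSubmitted("bob");
        System.out.println(queue.perUserShareMb(12288)); // 6144 each
        // Without this removal, alice would stay capped at 6144 even
        // after bob's application has finished.
        queue.lastAppFinished("bob");
        System.out.println(queue.perUserShareMb(12288)); // 12288
    }
}
```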
[jira] [Commented] (YARN-460) CS user left in list of active users for the queue even when application finished
[ https://issues.apache.org/jira/browse/YARN-460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618062#comment-13618062 ]

Hudson commented on YARN-460:
-----------------------------

Integrated in Hadoop-Hdfs-0.23-Build #568 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/568/])
YARN-460. CS user left in list of active users for the queue even when application finished (tgraves) (Revision 1462497)

Result = SUCCESS
tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1462497
Files :
* /hadoop/common/branches/branch-0.23/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApp.java
* /hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* /hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
[jira] [Commented] (YARN-460) CS user left in list of active users for the queue even when application finished
[ https://issues.apache.org/jira/browse/YARN-460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618067#comment-13618067 ]

Hudson commented on YARN-460:
-----------------------------

Integrated in Hadoop-Hdfs-trunk #1359 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1359/])
YARN-460. CS user left in list of active users for the queue even when application finished (tgraves) (Revision 1462486)

Result = FAILURE
tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1462486
Files :
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
[jira] [Commented] (YARN-515) Node Manager not getting the master key
[ https://issues.apache.org/jira/browse/YARN-515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618066#comment-13618066 ]

Hudson commented on YARN-515:
-----------------------------

Integrated in Hadoop-Hdfs-trunk #1359 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1359/])
YARN-515. Node Manager not getting the master key. Contributed by Robert Joseph Evans (Revision 1462632)

Result = FAILURE
jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1462632
Files :
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RegisterNodeManagerResponsePBImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/api
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/api/protocolrecords
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/api/protocolrecords/TestRegisterNodeManagerResponse.java
[jira] [Commented] (YARN-460) CS user left in list of active users for the queue even when application finished
[ https://issues.apache.org/jira/browse/YARN-460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618080#comment-13618080 ]

Hudson commented on YARN-460:
-----------------------------

Integrated in Hadoop-Mapreduce-trunk #1387 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1387/])
YARN-460. CS user left in list of active users for the queue even when application finished (tgraves) (Revision 1462486)

Result = FAILURE
tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1462486
Files :
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
[jira] [Commented] (YARN-515) Node Manager not getting the master key
[ https://issues.apache.org/jira/browse/YARN-515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618079#comment-13618079 ]

Hudson commented on YARN-515:
-----------------------------

Integrated in Hadoop-Mapreduce-trunk #1387 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1387/])
YARN-515. Node Manager not getting the master key. Contributed by Robert Joseph Evans (Revision 1462632)

Result = FAILURE
jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1462632
Files :
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RegisterNodeManagerResponsePBImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/api
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/api/protocolrecords
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/api/protocolrecords/TestRegisterNodeManagerResponse.java
[jira] [Commented] (YARN-515) Node Manager not getting the master key
[ https://issues.apache.org/jira/browse/YARN-515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618214#comment-13618214 ]

Vinod Kumar Vavilapalli commented on YARN-515:
----------------------------------------------

[~revans2], [~jlowe], thanks for taking care of this. I'll request people to test patches in secure mode from now on.
[jira] [Created] (YARN-522) [Umbrella] Better reporting for crashed/Killed AMs
Vinod Kumar Vavilapalli created YARN-522:
-----------------------------------------

Summary: [Umbrella] Better reporting for crashed/Killed AMs
Key: YARN-522
URL: https://issues.apache.org/jira/browse/YARN-522
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Vinod Kumar Vavilapalli

Crashing AMs have been a real pain for users since the beginning, and there are already a few tickets floating around; filing this to consolidate them.
[jira] [Updated] (YARN-499) On container failure, include last n lines of logs in diagnostics
[ https://issues.apache.org/jira/browse/YARN-499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-499:
-----------------------------------------

Issue Type: Sub-task (was: Improvement)
Parent: YARN-522

On container failure, include last n lines of logs in diagnostics
-----------------------------------------------------------------

Key: YARN-499
URL: https://issues.apache.org/jira/browse/YARN-499
Project: Hadoop YARN
Issue Type: Sub-task
Components: nodemanager
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Attachments: YARN-499.patch

When a container fails, the only way to diagnose it is to look at the logs. ContainerStatuses include a diagnostic string that is reported back to the resource manager by the node manager. Currently in MR2, I believe whatever is sent to the task's standard out is added to the diagnostics string, but for MR standard out is redirected to a file called stdout. In MR1, this string was populated with the last few lines of the task's stdout file and got printed to the console, allowing for easy debugging. Handling this would help to soothe the infuriating problem of an AM dying for a mysterious reason before setting a tracking URL (MAPREDUCE-3688). This could be done in one of two ways.
* Use tee to send MR's standard out to both the stdout file and standard out. This requires modifying ShellCmdExecutor to roll what it reads in, as we wouldn't want to store the entire task log in NM memory.
* Read the task's log files. This would require standardizing or making the container log files configurable. Right now the log files are determined in userland, and all YARN is aware of is the log directory.

Does this present any issues I'm not considering? If so, this might only be needed for AMs?
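The "roll what it reads in" idea from the first option amounts to a bounded tail buffer. The sketch below is one assumption about how such rolling might look; `lastLines` is a hypothetical helper, not ShellCmdExecutor's actual API. It retains only the last n lines of a stream, so memory stays proportional to n regardless of how large the task log grows:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hedged sketch of a rolling tail buffer: keep only the last n lines of a
// stream so the whole task log is never held in NM memory. lastLines() is
// a hypothetical helper, not an existing YARN API.
public class TailBuffer {
    static List<String> lastLines(Reader source, int n) throws IOException {
        Deque<String> tail = new ArrayDeque<>(n);
        try (BufferedReader reader = new BufferedReader(source)) {
            for (String line; (line = reader.readLine()) != null; ) {
                if (tail.size() == n) {
                    tail.removeFirst(); // evict the oldest line
                }
                tail.addLast(line);
            }
        }
        return new ArrayList<>(tail);
    }
}
```

The diagnostics string could then be something like `String.join("\n", lastLines(stdoutReader, n))`, assembled by the NM when the container exits.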
[jira] [Created] (YARN-523) Container localization failures aren't reported from NM to RM
Vinod Kumar Vavilapalli created YARN-523:
-----------------------------------------

Summary: Container localization failures aren't reported from NM to RM
Key: YARN-523
URL: https://issues.apache.org/jira/browse/YARN-523
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli

This is mainly a pain for crashing AMs, but once we fix this, containers can also benefit - the same fix covers both.
[jira] [Commented] (YARN-366) Add a tracing async dispatcher to simplify debugging
[ https://issues.apache.org/jira/browse/YARN-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618248#comment-13618248 ]

Sandy Ryza commented on YARN-366:
---------------------------------

Woah, that's a lot of failing tests. Working on a patch that fixes the getConfig() / init issue.

bq. Rename yarn.async.dispatcher.tracing to simply yarn.dispatcher. Please also document that it can/may impact performance if enabled.

In this case should it still be a boolean, or accept any class name?

Add a tracing async dispatcher to simplify debugging
----------------------------------------------------

Key: YARN-366
URL: https://issues.apache.org/jira/browse/YARN-366
Project: Hadoop YARN
Issue Type: New Feature
Components: nodemanager, resourcemanager
Affects Versions: 2.0.2-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Attachments: YARN-366-1.patch, YARN-366.patch

Exceptions thrown in YARN/MR code with asynchronous event handling do not contain informative stack traces, as all handle() methods sit directly under the dispatcher thread's loop. This makes errors very difficult to debug for those who are not intimately familiar with the code, as it is difficult to see which chain of events caused a particular outcome. I propose adding an AsyncDispatcher that instruments events with tracing information. Whenever an event is dispatched during the handling of another event, the dispatcher would annotate that event with a pointer to its parent. When the dispatcher catches an exception, it could reconstruct a stack trace of the chain of events that led to it and be able to log something informative. This would be an experimental feature, off by default, unless extensive testing showed that it did not have a significant performance impact.
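The parent-pointer scheme proposed in the description can be sketched as follows. `TracedEvent` and `causalChain()` are hypothetical names, not code from the attached patches; the sketch only shows how annotating each dispatched event with the event under handling lets the dispatcher reconstruct the chain when it catches an exception:

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch of the tracing proposal: each event remembers the event
// that was being handled when it was dispatched, so on an exception the
// dispatcher can walk the parents and log the whole causal chain.
// TracedEvent is a hypothetical class, not the patch's AsyncDispatcher code.
public class TracedEvent {
    final String type;
    final TracedEvent parent; // event under handling when this one was fired

    TracedEvent(String type, TracedEvent parent) {
        this.type = type;
        this.parent = parent;
    }

    /** Reconstruct the chain of events that led to this one, oldest first. */
    List<String> causalChain() {
        List<String> chain = new ArrayList<>();
        for (TracedEvent e = this; e != null; e = e.parent) {
            chain.add(0, e.type);
        }
        return chain;
    }

    public static void main(String[] args) {
        TracedEvent appSubmitted = new TracedEvent("APP_SUBMITTED", null);
        TracedEvent attemptStart = new TracedEvent("ATTEMPT_STARTED", appSubmitted);
        TracedEvent launch = new TracedEvent("CONTAINER_LAUNCHED", attemptStart);
        // On an exception in launch's handler, log the whole chain instead
        // of a stack trace that bottoms out in the dispatcher loop:
        System.out.println(launch.causalChain());
        // [APP_SUBMITTED, ATTEMPT_STARTED, CONTAINER_LAUNCHED]
    }
}
```

Keeping only a type string and a parent reference per event keeps the per-event overhead small, which matters given the performance concern raised in the review.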
[jira] [Updated] (YARN-366) Add a tracing async dispatcher to simplify debugging
[ https://issues.apache.org/jira/browse/YARN-366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Ryza updated YARN-366:
----------------------------

Attachment: YARN-366-2.patch