[jira] [Commented] (YARN-499) On container failure, surface logs to client
[ https://issues.apache.org/jira/browse/YARN-499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655616#comment-13655616 ]

Vinod Kumar Vavilapalli commented on YARN-499:
----------------------------------------------

bq. Is there a reason we have avoided pulling the logs directly in YARN as well? If not, should we do this for both the AM and task containers?

I see your activity on MAPREDUCE-4362 and YARN-649. So that answers it? We can definitely do that for AMs too.

bq. The issue I am aiming to solve is the last one you mention of the AM crashing before registering with the RM. A few JIRAs have been filed around this problem with little progress, so I wanted to put forth a concrete proposal.

So, YARN-649/MAPREDUCE-4362 should address this? I think we should do the AM log-pull-on-failure feature in YarnClient itself and make JobClient use it if possible.

On container failure, surface logs to client
--------------------------------------------
Key: YARN-499
URL: https://issues.apache.org/jira/browse/YARN-499
Project: Hadoop YARN
Issue Type: Sub-task
Components: nodemanager
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Attachments: YARN-499.patch

When a container fails, the only way to diagnose it is to look at the logs. ContainerStatuses include a diagnostic string that is reported back to the resource manager by the node manager. Currently in MR2, I believe whatever is sent to the task's standard out is added to the diagnostics string, but for MR, standard out is redirected to a file called stdout. In MR1, this string was populated with the last few lines of the task's stdout file and got printed to the console, allowing for easy debugging. Handling this would help to soothe the infuriating problem of an AM dying for a mysterious reason before setting a tracking URL (MAPREDUCE-3688).

This could be done in one of two ways (see the sketch after this message):
* Use tee to send MR's standard out to both the stdout file and standard out. This requires modifying ShellCmdExecutor to roll what it reads in, as we wouldn't want to store the entire task log in NM memory.
* Read the task's log files. This would require standardizing the container log files or making them configurable. Right now the log files are determined in userland, and all that YARN is aware of is the log directory.

Does this present any issues I'm not considering? If so, this might only be needed for AMs.
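As a rough illustration of the second option, a minimal sketch of tailing the last few KB of a container's stdout file so the NM could append it to the diagnostics string. The 4 KB window, class name, and file path are illustrative assumptions, not YARN APIs:

{code}
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: read the last maxBytes of a log file, e.g.
// <container-log-dir>/stdout, to include in ContainerStatus diagnostics.
public class LogTail {
  static String tail(String path, int maxBytes) throws IOException {
    try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
      long start = Math.max(0, f.length() - maxBytes);
      byte[] buf = new byte[(int) (f.length() - start)];
      f.seek(start);
      f.readFully(buf);
      return new String(buf, StandardCharsets.UTF_8);
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println(tail(args[0], 4096)); // last 4 KB of the given file
  }
}
{code}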
[jira] [Created] (YARN-675) In YarnClient, pull AM logs on AM container failure
Sandy Ryza created YARN-675:
----------------------------

Summary: In YarnClient, pull AM logs on AM container failure
Key: YARN-675
URL: https://issues.apache.org/jira/browse/YARN-675
Project: Hadoop YARN
Issue Type: Improvement
Components: client
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza

Similar to MAPREDUCE-4362, when an AM container fails, it would be helpful to be able to pull its logs so that they can be displayed immediately to the user.
[jira] [Commented] (YARN-499) On container failure, surface logs to client
[ https://issues.apache.org/jira/browse/YARN-499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655619#comment-13655619 ]

Sandy Ryza commented on YARN-499:
---------------------------------

bq. So that answers it?

Yeah, it does. Thanks Vinod.

bq. I think we should do the AM log-pull-on-failure feature in YarnClient itself and make JobClient use it if possible.

Good idea. Just filed YARN-675 for this.

On container failure, surface logs to client
Key: YARN-499
URL: https://issues.apache.org/jira/browse/YARN-499
[jira] [Resolved] (YARN-499) On container failure, surface logs to client
[ https://issues.apache.org/jira/browse/YARN-499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Ryza resolved YARN-499.
-----------------------------
Resolution: Won't Fix

On container failure, surface logs to client
Key: YARN-499
URL: https://issues.apache.org/jira/browse/YARN-499
[jira] [Updated] (YARN-499) On container failure, include logs in diagnostics
[ https://issues.apache.org/jira/browse/YARN-499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Ryza updated YARN-499:
----------------------------
Summary: On container failure, include logs in diagnostics (was: On container failure, surface logs to client)

On container failure, include logs in diagnostics
Key: YARN-499
URL: https://issues.apache.org/jira/browse/YARN-499
[jira] [Updated] (YARN-675) In YarnClient, pull AM logs on AM container failure
[ https://issues.apache.org/jira/browse/YARN-675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Ryza updated YARN-675:
----------------------------
Description: Similar to MAPREDUCE-4362, when an AM container fails, it would be helpful to pull its logs from the NM to the client so that they can be displayed immediately to the user. (was: Similar to MAPREDUCE-4362, when an AM container fails, it would be helpful to be able to pull its logs so that they can be displayed immediately to the user.)

In YarnClient, pull AM logs on AM container failure
Key: YARN-675
URL: https://issues.apache.org/jira/browse/YARN-675
[jira] [Updated] (YARN-675) In YarnClient, pull AM logs on AM container failure
[ https://issues.apache.org/jira/browse/YARN-675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-675:
-----------------------------------------
Issue Type: Sub-task (was: Improvement)
Parent: YARN-522

In YarnClient, pull AM logs on AM container failure
Key: YARN-675
URL: https://issues.apache.org/jira/browse/YARN-675
[jira] [Updated] (YARN-638) Restore RMDelegationTokens after RM Restart
[ https://issues.apache.org/jira/browse/YARN-638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian He updated YARN-638:
-------------------------
Attachment: YARN-638.6.patch

The new patch:
1. adds the real FileSystemStore for recovering RMDelegationTokens.
2. renames logUpdatedMasterKey and logExpireToken in hadoop-common to storeNewMasterKey and removeExpiredToken, and adds a new method removeStoredMasterKey (a sketch of the resulting store lifecycle follows this message).

Restore RMDelegationTokens after RM Restart
Key: YARN-638
URL: https://issues.apache.org/jira/browse/YARN-638
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
Attachments: YARN-638.1.patch, YARN-638.2.patch, YARN-638.3.patch, YARN-638.4.patch, YARN-638.5.patch, YARN-638.6.patch

This was missed in YARN-581. After RM restart, RMDelegationTokens need to be added both to the DelegationTokenRenewer (addressed in YARN-581) and to the delegationTokenSecretManager.
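To make the renamed hooks concrete, a minimal file-backed sketch of the store lifecycle they imply. The real patch persists to the RM state store; the class name, paths, and key layout here are illustrative assumptions, not the YARN-638 implementation:

{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical token state store mirroring the hook names above.
class TokenStateStore {
  private final Path root;

  TokenStateStore(String dir) throws IOException {
    this.root = Files.createDirectories(Paths.get(dir));
  }

  // Called when the secret manager rolls a new master key.
  void storeNewMasterKey(int keyId, byte[] keyBytes) throws IOException {
    Files.write(root.resolve("masterkey-" + keyId), keyBytes);
  }

  // Called when a master key expires and should no longer be recovered.
  void removeStoredMasterKey(int keyId) throws IOException {
    Files.deleteIfExists(root.resolve("masterkey-" + keyId));
  }

  // Called when a delegation token expires or is cancelled.
  void removeExpiredToken(String tokenSeqNo) throws IOException {
    Files.deleteIfExists(root.resolve("token-" + tokenSeqNo));
  }
}
{code}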
[jira] [Updated] (YARN-307) NodeManager should log container launch command.
[ https://issues.apache.org/jira/browse/YARN-307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-307:
-----------------------------------------
Labels: usability (was: )

NodeManager should log container launch command.
Key: YARN-307
URL: https://issues.apache.org/jira/browse/YARN-307
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 2.0.3-alpha
Reporter: Lohit Vijayarenu
Labels: usability

NodeManager's DefaultContainerExecutor seems to log only the path of the default container executor script instead of the contents of the script. It would be good to log the execution command so that one could see what is being launched.
[jira] [Updated] (YARN-615) ContainerLaunchContext.containerTokens should simply be called tokens
[ https://issues.apache.org/jira/browse/YARN-615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-615:
-----------------------------------------
Attachment: YARN-615-20130512.txt

Patch against latest trunk.

ContainerLaunchContext.containerTokens should simply be called tokens
Key: YARN-615
URL: https://issues.apache.org/jira/browse/YARN-615
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
Attachments: YARN-615-20130503.txt, YARN-615-20130512.txt

ContainerToken is the name of the specific token that AMs use to launch containers on NMs, so we should rename CLC.containerTokens to be simply tokens.
[jira] [Commented] (YARN-674) Slow or failing DelegationToken renewals on submission itself make RM unavailable
[ https://issues.apache.org/jira/browse/YARN-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655661#comment-13655661 ]

Bikas Saha commented on YARN-674:
---------------------------------

Looks like we might have to resurrect the remaining changes proposed in the document in YARN-549, namely sending an event to RMAppManager instead of calling RMAppManager.submitApplication() directly, since that method is no longer cheap. Any other alternatives?

Slow or failing DelegationToken renewals on submission itself make RM unavailable
Key: YARN-674
URL: https://issues.apache.org/jira/browse/YARN-674
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli

This was caused by YARN-280. A slow or down NameNode will make it look like the RM is unavailable, as the RM may run out of RPC handlers due to blocked client submissions.
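A minimal sketch of the event-based alternative described above: the RPC handler only enqueues the submission, and a dedicated dispatcher thread performs the potentially slow token renewal. Class and method names are illustrative assumptions, not the YARN-549 design:

{code}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: move the expensive submitApplication work off the
// RPC handler threads so a slow NameNode can't exhaust them.
class AppSubmitDispatcher {
  private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

  // Called from the RPC handler: cheap and non-blocking.
  void submit(String appId) {
    queue.add(appId);
  }

  void start() {
    Thread t = new Thread(() -> {
      while (!Thread.currentThread().isInterrupted()) {
        try {
          renewDelegationTokens(queue.take()); // slow work, off the RPC path
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }
    }, "app-submit-dispatcher");
    t.setDaemon(true);
    t.start();
  }

  private void renewDelegationTokens(String appId) {
    // placeholder for the DelegationTokenRenewer call on submission
  }
}
{code}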
[jira] [Commented] (YARN-615) ContainerLaunchContext.containerTokens should simply be called tokens
[ https://issues.apache.org/jira/browse/YARN-615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655675#comment-13655675 ]

Hadoop QA commented on YARN-615:
--------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12582870/YARN-615-20130512.txt against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/916//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/916//console

ContainerLaunchContext.containerTokens should simply be called tokens
Key: YARN-615
URL: https://issues.apache.org/jira/browse/YARN-615
[jira] [Commented] (YARN-422) Add NM client library
[ https://issues.apache.org/jira/browse/YARN-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655689#comment-13655689 ]

Bikas Saha commented on YARN-422:
---------------------------------

Rename onExceptionRaisedWhenStartingContainer() to onStartContainerError()? What do you say?

The comment should not say 10, since the value can change over time. Is the initial value checked to be less than the MAX specified? (See the sketch after this message.)
{code}
+    // Start with a default core-pool size of 10 and change it dynamically.
+    threadPool = new ThreadPoolExecutor(INITIAL_THREAD_POOL_SIZE,
+        Integer.MAX_VALUE, 1, TimeUnit.HOURS,
{code}

Improve grammar?
{code}
+      // See if we need up the pool size only if haven't reached the
+      // maximum limit yet.
{code}

Should the boolean flag set/get be part of the NMClient interface itself?
{code}
+    if (!(client instanceof NMClientImpl) ||
+        ((NMClientImpl) client).stopAllRunningContainersOnStoppingEnabled())
{code}

Why is TestAMRMClient being removed?

We need to double-check the synchronization/thread safety of this class. Lots of objects and threads. Can you please document the expected locking order?

Add NM client library
Key: YARN-422
URL: https://issues.apache.org/jira/browse/YARN-422
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Zhijie Shen
Attachments: AMNMClient_Defination.txt, AMNMClient_Definition_Updated_With_Tests.txt, proposal_v1.pdf, YARN-422.1.patch, YARN-422.2.patch, YARN-422.3.patch, YARN-422.4.patch, YARN-422.5.patch

Create a simple wrapper over the ContainerManager protocol to hide the details of the protocol implementation.
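For reference, a self-contained sketch of the pool-growth pattern under review above: start the pool at a small core size and grow it on demand, capped by a configured maximum, including the initial-vs-max check Bikas asks about. The doubling policy and names are assumptions, not the patch itself:

{code}
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a dynamically growing thread pool.
class GrowingPool {
  static final int INITIAL_THREAD_POOL_SIZE = 10;

  private final int maxPoolSize;
  private final ThreadPoolExecutor threadPool;

  GrowingPool(int configuredMax) {
    // Guard against a configured max smaller than the initial size.
    this.maxPoolSize = Math.max(configuredMax, INITIAL_THREAD_POOL_SIZE);
    this.threadPool = new ThreadPoolExecutor(
        INITIAL_THREAD_POOL_SIZE, this.maxPoolSize,
        1, TimeUnit.HOURS, new LinkedBlockingQueue<Runnable>());
  }

  synchronized void submit(Runnable task) {
    // Grow the core pool size only while below the maximum limit.
    int core = threadPool.getCorePoolSize();
    if (threadPool.getActiveCount() >= core && core < maxPoolSize) {
      threadPool.setCorePoolSize(Math.min(core * 2, maxPoolSize));
    }
    threadPool.execute(task);
  }
}
{code}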
[jira] [Commented] (YARN-638) Restore RMDelegationTokens after RM Restart
[ https://issues.apache.org/jira/browse/YARN-638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655691#comment-13655691 ]

Hadoop QA commented on YARN-638:
--------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12582869/YARN-638.6.patch against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/915//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/915//console

Restore RMDelegationTokens after RM Restart
Key: YARN-638
URL: https://issues.apache.org/jira/browse/YARN-638
[jira] [Resolved] (YARN-307) NodeManager should log container launch command.
[ https://issues.apache.org/jira/browse/YARN-307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lohit Vijayarenu resolved YARN-307.
-----------------------------------
Resolution: Invalid

Resolving as invalid.

NodeManager should log container launch command.
Key: YARN-307
URL: https://issues.apache.org/jira/browse/YARN-307
[jira] [Updated] (YARN-502) RM crash with NPE on NODE_REMOVED event
[ https://issues.apache.org/jira/browse/YARN-502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-502:
-----------------------------------------
Issue Type: Sub-task (was: Bug)
Parent: YARN-676

RM crash with NPE on NODE_REMOVED event
Key: YARN-502
URL: https://issues.apache.org/jira/browse/YARN-502
Project: Hadoop YARN
Issue Type: Sub-task
Affects Versions: 2.0.3-alpha
Reporter: Lohit Vijayarenu

While running some tests and adding/removing nodes, we saw the RM crash with the below exception. We are testing with the fair scheduler and running hadoop-2.0.3-alpha.
{noformat}
2013-03-22 18:54:27,015 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating Node :55680 as it is now LOST
2013-03-22 18:54:27,015 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: :55680 Node Transitioned from UNHEALTHY to LOST
2013-03-22 18:54:27,015 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_REMOVED to the scheduler
java.lang.NullPointerException
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeNode(FairScheduler.java:619)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:856)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:98)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:375)
	at java.lang.Thread.run(Thread.java:662)
2013-03-22 18:54:27,016 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
2013-03-22 18:54:27,020 INFO org.mortbay.log: Stopped SelectChannelConnector@:50030
{noformat}
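The stack trace above suggests removeNode dereferences state for a node the scheduler no longer tracks. A self-contained sketch of the usual defensive fix, with stand-in types; the actual FairScheduler internals differ:

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: tolerate NODE_REMOVED for an untracked node instead
// of NPE'ing. SchedulerNode and the map stand in for FairScheduler state.
class NodeTracker {
  static class SchedulerNode { /* containers, resources, etc. */ }

  private final Map<String, SchedulerNode> nodes = new ConcurrentHashMap<>();

  void removeNode(String nodeId) {
    SchedulerNode node = nodes.remove(nodeId);
    if (node == null) {
      // The node was never added or was already removed; ignore the event.
      System.out.println("Ignoring NODE_REMOVED for untracked node " + nodeId);
      return;
    }
    // ... release the node's containers and update cluster capacity
  }
}
{code}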
[jira] [Created] (YARN-676) [Umbrella] Daemons crashing because of invalid state transitions
Vinod Kumar Vavilapalli created YARN-676:
-----------------------------------------

Summary: [Umbrella] Daemons crashing because of invalid state transitions
Key: YARN-676
URL: https://issues.apache.org/jira/browse/YARN-676
Project: Hadoop YARN
Issue Type: Task
Reporter: Vinod Kumar Vavilapalli

There are several tickets tracking invalid state transitions which essentially crash the daemons - RM, NM or AM. This is the tracking ticket. We should try to fix as many of them as soon as possible.
[jira] [Updated] (YARN-245) Node Manager gives InvalidStateTransitonException for FINISH_APPLICATION at FINISHED
[ https://issues.apache.org/jira/browse/YARN-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-245:
-----------------------------------------
Issue Type: Sub-task (was: Bug)
Parent: YARN-676

Node Manager gives InvalidStateTransitonException for FINISH_APPLICATION at FINISHED
Key: YARN-245
URL: https://issues.apache.org/jira/browse/YARN-245
Project: Hadoop YARN
Issue Type: Sub-task
Affects Versions: 2.0.2-alpha, 2.0.1-alpha
Reporter: Devaraj K
Assignee: Devaraj K

{code:xml}
2012-11-25 12:56:11,795 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: FINISH_APPLICATION at FINISHED
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
	at java.lang.Thread.run(Thread.java:662)
2012-11-25 12:56:11,796 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1353818859056_0004 transitioned from FINISHED to null
{code}
[jira] [Updated] (YARN-296) Resource Manager throws InvalidStateTransitonException: Invalid event: APP_ACCEPTED at RUNNING for RMAppImpl
[ https://issues.apache.org/jira/browse/YARN-296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-296:
-----------------------------------------
Issue Type: Sub-task (was: Bug)
Parent: YARN-676

Resource Manager throws InvalidStateTransitonException: Invalid event: APP_ACCEPTED at RUNNING for RMAppImpl
Key: YARN-296
URL: https://issues.apache.org/jira/browse/YARN-296
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Affects Versions: 2.0.2-alpha, 2.0.1-alpha
Reporter: Devaraj K

{code:xml}
2012-12-28 11:14:47,671 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APP_ACCEPTED at RUNNING
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:528)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:72)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:405)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:389)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
	at java.lang.Thread.run(Thread.java:662)
{code}
[jira] [Updated] (YARN-346) InvalidStateTransitonException: Invalid event: INIT_CONTAINER at DONE for ContainerImpl in Node Manager
[ https://issues.apache.org/jira/browse/YARN-346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-346:
-----------------------------------------
Issue Type: Sub-task (was: Bug)
Parent: YARN-676

InvalidStateTransitonException: Invalid event: INIT_CONTAINER at DONE for ContainerImpl in Node Manager
Key: YARN-346
URL: https://issues.apache.org/jira/browse/YARN-346
Project: Hadoop YARN
Issue Type: Sub-task
Components: nodemanager
Affects Versions: 2.0.1-alpha, 2.0.0-alpha, 0.23.5
Reporter: Devaraj K
Assignee: Devaraj K
Priority: Critical

{code:xml}
2013-01-16 23:55:52,067 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Can't handle this event at current state: Current: [DONE], eventType: [INIT_CONTAINER]
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: INIT_CONTAINER at DONE
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:819)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:71)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:504)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:497)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
	at java.lang.Thread.run(Thread.java:662)
2013-01-16 23:55:52,067 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1358353581666_1326_01_10 transitioned from DONE to null
{code}
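The three sub-tasks above share a pattern: an event arrives after its state machine has reached a terminal state. One common fix is to register the straggler as a no-op self-transition. A generic, self-contained sketch of that idea, mirroring the style of YARN's StateMachineFactory.addTransition without depending on yarn-common:

{code}
import java.util.HashMap;
import java.util.Map;

// Hypothetical mini state machine; YARN's real StateMachineFactory is more
// elaborate, but the tolerance pattern is the same.
class MiniStateMachine<S, E> {
  private final Map<S, Map<E, S>> table = new HashMap<>();
  private S current;

  MiniStateMachine(S initial) { current = initial; }

  MiniStateMachine<S, E> addTransition(S pre, S post, E event) {
    table.computeIfAbsent(pre, k -> new HashMap<>()).put(event, post);
    return this;
  }

  void handle(E event) {
    Map<E, S> row = table.get(current);
    if (row == null || !row.containsKey(event)) {
      // This is where YARN throws InvalidStateTransitonException.
      throw new IllegalStateException("Invalid event: " + event + " at " + current);
    }
    current = row.get(event);
  }

  public static void main(String[] args) {
    MiniStateMachine<String, String> app = new MiniStateMachine<>("FINISHED");
    // The fix: tolerate the late event as a self-transition instead of crashing.
    app.addTransition("FINISHED", "FINISHED", "FINISH_APPLICATION");
    app.handle("FINISH_APPLICATION"); // no longer throws
  }
}
{code}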
[jira] [Commented] (YARN-530) Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services
[ https://issues.apache.org/jira/browse/YARN-530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655767#comment-13655767 ]

Steve Loughran commented on YARN-530:
-------------------------------------

Will look at this on Monday. Off the top of my head:
# The concept of blockers is to make explicit what things are being depended on (a simple name-string map), so that if a service is explicitly waiting for something to come up (say the DN on the NN), it's visible, rather than just having something appear to hang. Right now we are second-guessing why the JT doesn't come up when HDFS is in safe mode, by polling HDFS and assuming the two states are correlated.
# The {{ServiceOperations}} methods have been in for a while to try to handle the old model; happy to pull them.
# I'd love to mark init/start/stop as final, but one test using Mockito didn't like it. I'll see if I can fix that test.

Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services
Key: YARN-530
URL: https://issues.apache.org/jira/browse/YARN-530
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Steve Loughran
Assignee: Steve Loughran
Attachments: YARN-117changes.pdf, YARN-530-2.patch, YARN-530-3.patch, YARN-530.4.patch, YARN-530.patch

# Extend the YARN {{Service}} interface as discussed in YARN-117
# Implement the changes in {{AbstractService}} and {{FilterService}}.
# Migrate all services in yarn-common to the more robust service model, test.
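A minimal sketch of the blockers map from point 1 above, with hypothetical method names rather than whatever YARN-530 eventually commits:

{code}
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical: a service records what it is currently waiting on, so "JT
// waiting for HDFS to leave safe mode" is observable instead of a silent hang.
abstract class BlockingAwareService {
  private final Map<String, String> blockers =
      Collections.synchronizedMap(new LinkedHashMap<String, String>());

  protected void putBlocker(String name, String details) {
    blockers.put(name, details);
  }

  protected void removeBlocker(String name) {
    blockers.remove(name);
  }

  /** Snapshot of current blockers; empty when the service is unblocked. */
  public Map<String, String> getBlockers() {
    synchronized (blockers) {
      return new LinkedHashMap<String, String>(blockers);
    }
  }
}
{code}

A service entering a wait would call putBlocker("HDFS", "NameNode in safe mode") and remove the entry once the dependency comes up, making the stalled state visible to whatever is monitoring it.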