[jira] [Commented] (YARN-499) On container failure, surface logs to client

2013-05-12 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655616#comment-13655616
 ] 

Vinod Kumar Vavilapalli commented on YARN-499:
--

bq. Is there a reason we have avoided pulling the logs directly in YARN as 
well? If not, should we do this for both the AM and task containers?
I see your activity on MAPREDUCE-4362 and YARN-649. So that answers it? We can 
definitely do that for AMs too.

bq. The issue I am aiming to solve is the last one you mention of the AM 
crashing before registering with the RM. A few JIRAs have been filed around 
this problem with little progress, so I wanted to put forth a concrete proposal.
So, YARN-649/MAPREDUCE-4362 should address this? I think we should do the AM 
log-pull on failure feature in YarnClient itself and make JobClient use it if 
possible.

 On container failure, surface logs to client
 

 Key: YARN-499
 URL: https://issues.apache.org/jira/browse/YARN-499
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-499.patch


 When a container fails, the only way to diagnose it is to look at the logs.  
 ContainerStatuses include a diagnostic string that is reported back to the 
 resource manager by the node manager.
 Currently in MR2, I believe whatever is sent to the task's standard out is 
 added to the diagnostics string, but in MR, standard out is redirected to a 
 file called stdout.  In MR1, this string was populated with the last few 
 lines of the task's stdout file and got printed to the console, allowing for 
 easy debugging.
 Handling this would help to soothe the infuriating problem of an AM dying for 
 a mysterious reason before setting a tracking URL (MAPREDUCE-3688).
 This could be done in one of two ways.
 * Use tee to send MR's standard out to both the stdout file and standard out. 
  This requires modifying ShellCmdExecutor to roll what it reads in (see the 
 sketch below), as we wouldn't want to store the entire task log in NM memory.
 * Read the task's log files.  This would require standardizing or making the 
 container log files configurable.  Right now the log files are determined in 
 userland, and all that YARN is aware of is the log directory.
 Does this present any issues I'm not considering?  If so, might this only be 
 needed for AMs? 
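
For concreteness, a minimal sketch of the rolling buffer the tee option 
implies (all names here are hypothetical, not from the attached patch):

{code}
// Hypothetical sketch: keep only the last maxBytes of a stream in memory so
// the NM can attach a tail of stdout to the diagnostics string without
// buffering the whole task log.
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class TailBuffer {
  private final byte[] ring;
  private long count;  // total bytes consumed so far

  public TailBuffer(int maxBytes) {
    this.ring = new byte[maxBytes];
  }

  /** Drain the stream, remembering only the trailing window. */
  public void consume(InputStream in) throws IOException {
    int b;
    while ((b = in.read()) != -1) {
      ring[(int) (count++ % ring.length)] = (byte) b;
    }
  }

  /** @return the last maxBytes (or fewer) of what was consumed. */
  public String tail() {
    int len = (int) Math.min(count, ring.length);
    byte[] out = new byte[len];
    for (int i = 0; i < len; i++) {
      out[i] = ring[(int) ((count - len + i) % ring.length)];
    }
    return new String(out, StandardCharsets.UTF_8);
  }
}
{code}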

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-675) In YarnClient, pull AM logs on AM container failure

2013-05-12 Thread Sandy Ryza (JIRA)
Sandy Ryza created YARN-675:
---

 Summary: In YarnClient, pull AM logs on AM container failure
 Key: YARN-675
 URL: https://issues.apache.org/jira/browse/YARN-675
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: client
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza


Similar to MAPREDUCE-4362, when an AM container fails, it would be helpful to 
be able to pull its logs so that they can be displayed immediately to the user.
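
As a rough illustration of the intended client-side behavior, a hedged sketch 
(the in-process NM fetch is what this JIRA would add; here the log fetch is 
approximated by shelling out to the existing yarn logs CLI, and package names 
follow later 2.x layouts):

{code}
// Hedged sketch, not the actual YarnClient API addition: on AM failure,
// surface the AM logs to the user's console.
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class AmLogPuller {
  public static void surfaceLogsOnFailure(YarnClient client, ApplicationId appId)
      throws Exception {
    ApplicationReport report = client.getApplicationReport(appId);
    if (report.getYarnApplicationState() == YarnApplicationState.FAILED) {
      // Requires log aggregation; an in-process fetch from the NM is the
      // behavior this JIRA actually proposes.
      new ProcessBuilder("yarn", "logs", "-applicationId", appId.toString())
          .inheritIO().start().waitFor();
    }
  }
}
{code}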



[jira] [Commented] (YARN-499) On container failure, surface logs to client

2013-05-12 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655619#comment-13655619
 ] 

Sandy Ryza commented on YARN-499:
-

bq. So that answers it?
Yeah, it does. Thanks Vinod.

bq. I think we should do the AM log-pull on failure feature in YarnClient 
itself and make JobClient use it if possible.
Good idea.  Just filed YARN-675 for this.



[jira] [Resolved] (YARN-499) On container failure, surface logs to client

2013-05-12 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza resolved YARN-499.
-

Resolution: Won't Fix



[jira] [Updated] (YARN-499) On container failure, include logs in diagnostics

2013-05-12 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-499:


Summary: On container failure, include logs in diagnostics  (was: On 
container failure, surface logs to client)



[jira] [Updated] (YARN-675) In YarnClient, pull AM logs on AM container failure

2013-05-12 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-675:


Description: Similar to MAPREDUCE-4362, when an AM container fails, it 
would be helpful to pull its logs from the NM to the client so that they can be 
displayed immediately to the user.  (was: Similar to MAPREDUCE-4362, when an AM 
container fails, it would be helpful to be able to pull its logs so that they 
can be displayed immediately to the user.)



[jira] [Updated] (YARN-675) In YarnClient, pull AM logs on AM container failure

2013-05-12 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-675:
-

Issue Type: Sub-task  (was: Improvement)
Parent: YARN-522



[jira] [Updated] (YARN-638) Restore RMDelegationTokens after RM Restart

2013-05-12 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-638:
-

Attachment: YARN-638.6.patch

The new patch:
1. adds the real FileSystemStore for recovering RMDelegationTokens.
2. renames logUpdatedMasterKey and logExpireToken in hadoop-common to 
storeNewMasterKey and removeExpiredToken, and adds a new method 
removeStoredMasterKey (illustrative signatures sketched below).
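
To make the renames concrete, a rough sketch of what the hooks might look 
like (signatures are illustrative, not lifted from the patch):

{code}
// Illustrative only: overridable no-op hooks on the delegation token secret
// manager; an RM subclass implements them against the RM state store.
protected void storeNewMasterKey(DelegationKey key) throws IOException {
  // no-op by default; RM subclass persists the newly rolled master key
}

protected void removeStoredMasterKey(DelegationKey key) {
  // no-op by default; called when a master key expires
}

protected void removeExpiredToken(TokenIdent ident) {
  // no-op by default; called when an expired token is purged
}
{code}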

 Restore RMDelegationTokens after RM Restart
 ---

 Key: YARN-638
 URL: https://issues.apache.org/jira/browse/YARN-638
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-638.1.patch, YARN-638.2.patch, YARN-638.3.patch, 
 YARN-638.4.patch, YARN-638.5.patch, YARN-638.6.patch


 This was missed in YARN-581.  After RM restart, RMDelegationTokens need to be 
 added both to the DelegationTokenRenewer (addressed in YARN-581) and to the 
 delegationTokenSecretManager.



[jira] [Updated] (YARN-307) NodeManager should log container launch command.

2013-05-12 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-307:
-

Labels: usability  (was: )

 NodeManager should log container launch command.
 

 Key: YARN-307
 URL: https://issues.apache.org/jira/browse/YARN-307
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.0.3-alpha
Reporter: Lohit Vijayarenu
  Labels: usability

 NodeManager's DefaultContainerExecutor seems to log only the path of the 
 default container executor script instead of the contents of the script.  It 
 would be good to log the execution command so that one could see what is 
 being launched.



[jira] [Updated] (YARN-615) ContainerLaunchContext.containerTokens should simply be called tokens

2013-05-12 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-615:
-

Attachment: YARN-615-20130512.txt

Patch against latest trunk.

 ContainerLaunchContext.containerTokens should simply be called tokens
 -

 Key: YARN-615
 URL: https://issues.apache.org/jira/browse/YARN-615
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Attachments: YARN-615-20130503.txt, YARN-615-20130512.txt


 ContainerToken is the name of the specific token that AMs use to launch 
 containers on NMs, so we should rename CLC.containerTokens to be simply 
 tokens.



[jira] [Commented] (YARN-674) Slow or failing DelegationToken renewals on submission itself make RM unavailable

2013-05-12 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655661#comment-13655661
 ] 

Bikas Saha commented on YARN-674:
-

Looks like we might have to resurrect the remaining changes proposed in the 
document in YARN-549, namely sending an event to RMAppManager instead of 
calling RMAppManager.submitApplication() directly, since that method is no 
longer cheap (rough sketch below). Any other alternatives?
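
Roughly, the event-based alternative would look like this 
(RMAppManagerSubmitEvent is a hypothetical name, for illustration only):

{code}
// Hedged sketch: instead of ClientRMService invoking
// rmAppManager.submitApplication(...) on the RPC handler thread, hand the
// submission to the central async dispatcher and return immediately.
rmContext.getDispatcher().getEventHandler().handle(
    new RMAppManagerSubmitEvent(submissionContext, System.currentTimeMillis()));
{code}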

 Slow or failing DelegationToken renewals on submission itself make RM 
 unavailable
 -

 Key: YARN-674
 URL: https://issues.apache.org/jira/browse/YARN-674
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli

 This was caused by YARN-280. A slow or down NameNode will make it look like 
 the RM is unavailable, as the RM may run out of RPC handlers due to blocked 
 client submissions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-615) ContainerLaunchContext.containerTokens should simply be called tokens

2013-05-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655675#comment-13655675
 ] 

Hadoop QA commented on YARN-615:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12582870/YARN-615-20130512.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/916//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/916//console

This message is automatically generated.



[jira] [Commented] (YARN-422) Add NM client library

2013-05-12 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655689#comment-13655689
 ] 

Bikas Saha commented on YARN-422:
-


Rename onExceptionRaisedWhenStartingContainer() to onStartContainerError()? 
What do you say?

The comment should not say 10 since the value can change with time. Is the 
initial value checked to be less than the MAX specified?
{code}
+// Start with a default core-pool size of 10 and change it dynamically.
+threadPool = new ThreadPoolExecutor(INITIAL_THREAD_POOL_SIZE,
+Integer.MAX_VALUE, 1, TimeUnit.HOURS,
{code}
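
For what it's worth, a hedged sketch of the guarded sizing being asked about 
(maxPoolSize and pending are hypothetical names, not from the patch):

{code}
// Clamp both the initial and the grown core-pool size so ThreadPoolExecutor
// never sees corePoolSize > maximumPoolSize, which would throw
// IllegalArgumentException.
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PoolSizing {
  static ThreadPoolExecutor create(int initialSize, int maxPoolSize) {
    int core = Math.min(initialSize, maxPoolSize);  // the check in question
    return new ThreadPoolExecutor(core, maxPoolSize, 1, TimeUnit.HOURS,
        new LinkedBlockingQueue<Runnable>());
  }

  static void maybeGrow(ThreadPoolExecutor pool, int pending, int maxPoolSize) {
    int wanted = Math.min(pending, maxPoolSize);
    if (wanted > pool.getCorePoolSize()) {
      pool.setCorePoolSize(wanted);  // grow on demand, never past the cap
    }
  }
}
{code}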

Improve grammar?
{code}
+  // See if we need up the pool size only if haven't reached the
+  // maximum limit yet.
{code}

Should the boolean flag set/get be part of NMClient interface itself?
{code}
+if (!(client instanceof NMClientImpl) ||
+((NMClientImpl) client).stopAllRunningContainersOnStoppingEnabled()) 
{code}

Why is TestAMRMClient being removed?

We need to double check the synchronization/thread safety of this class. Lots 
of objects and threads. Can you please document the expected locking order?

 Add NM client library
 -

 Key: YARN-422
 URL: https://issues.apache.org/jira/browse/YARN-422
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Zhijie Shen
 Attachments: AMNMClient_Defination.txt, 
 AMNMClient_Definition_Updated_With_Tests.txt, proposal_v1.pdf, 
 YARN-422.1.patch, YARN-422.2.patch, YARN-422.3.patch, YARN-422.4.patch, 
 YARN-422.5.patch


 Create a simple wrapper over the ContainerManager protocol to hide the 
 details of the protocol implementation.



[jira] [Commented] (YARN-638) Restore RMDelegationTokens after RM Restart

2013-05-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655691#comment-13655691
 ] 

Hadoop QA commented on YARN-638:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12582869/YARN-638.6.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/915//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/915//console

This message is automatically generated.



[jira] [Resolved] (YARN-307) NodeManager should log container launch command.

2013-05-12 Thread Lohit Vijayarenu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lohit Vijayarenu resolved YARN-307.
---

Resolution: Invalid

Resolving as invalid.



[jira] [Updated] (YARN-502) RM crash with NPE on NODE_REMOVED event

2013-05-12 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-502:
-

Issue Type: Sub-task  (was: Bug)
Parent: YARN-676

 RM crash with NPE on NODE_REMOVED event
 ---

 Key: YARN-502
 URL: https://issues.apache.org/jira/browse/YARN-502
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.0.3-alpha
Reporter: Lohit Vijayarenu

 While running some tests and adding/removing nodes, we saw the RM crash with 
 the below exception. We are testing with the fair scheduler and running 
 hadoop-2.0.3-alpha.
 {noformat}
 2013-03-22 18:54:27,015 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
 Node :55680 as it is now LOST
 2013-03-22 18:54:27,015 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: :55680 
 Node Transitioned from UNHEALTHY to LOST
 2013-03-22 18:54:27,015 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type NODE_REMOVED to the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeNode(FairScheduler.java:619)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:856)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:98)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:375)
 at java.lang.Thread.run(Thread.java:662)
 2013-03-22 18:54:27,016 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
 2013-03-22 18:54:27,020 INFO org.mortbay.log: Stopped 
 SelectChannelConnector@:50030
 {noformat}
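
The usual remedy for this pattern is a defensive guard in the scheduler; a 
hedged sketch, not the committed fix:

{code}
// Hypothetical guard in FairScheduler.removeNode(): the NPE suggests the
// node was already removed, e.g. a duplicate NODE_REMOVED for an UNHEALTHY
// node that transitioned to LOST.
private synchronized void removeNode(RMNode rmNode) {
  FSSchedulerNode node = nodes.get(rmNode.getNodeID());
  if (node == null) {
    LOG.info("Ignoring removal of unknown node " + rmNode.getNodeAddress());
    return;
  }
  // ... existing removal logic ...
}
{code}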



[jira] [Created] (YARN-676) [Umbrella] Daemons crashing because of invalid state transitions

2013-05-12 Thread Vinod Kumar Vavilapalli (JIRA)
Vinod Kumar Vavilapalli created YARN-676:


 Summary: [Umbrella] Daemons crashing because of invalid state 
transitions
 Key: YARN-676
 URL: https://issues.apache.org/jira/browse/YARN-676
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Vinod Kumar Vavilapalli


There are several tickets tracking invalid state transitions that essentially 
crash the daemons - RM, NM or AM. This is the tracking ticket.

We should try to fix as many of them as soon as possible.



[jira] [Updated] (YARN-245) Node Manager gives InvalidStateTransitonException for FINISH_APPLICATION at FINISHED

2013-05-12 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-245:
-

Issue Type: Sub-task  (was: Bug)
Parent: YARN-676

 Node Manager gives InvalidStateTransitonException for FINISH_APPLICATION at 
 FINISHED
 

 Key: YARN-245
 URL: https://issues.apache.org/jira/browse/YARN-245
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.0.2-alpha, 2.0.1-alpha
Reporter: Devaraj K
Assignee: Devaraj K

 {code:xml}
 2012-11-25 12:56:11,795 WARN 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
  Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 FINISH_APPLICATION at FINISHED
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
 at java.lang.Thread.run(Thread.java:662)
 2012-11-25 12:56:11,796 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
  Application application_1353818859056_0004 transitioned from FINISHED to null
 {code}
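
The standard remedy for this family of bugs is to register the stray event as 
a legal, often no-op, transition at the terminal state; schematically, 
against the StateMachineFactory API:

{code}
// Hedged sketch: make FINISH_APPLICATION a legal no-op event in FINISHED so
// the state machine stays put instead of throwing
// InvalidStateTransitonException.
.addTransition(ApplicationState.FINISHED, ApplicationState.FINISHED,
    ApplicationEventType.FINISH_APPLICATION)
{code}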



[jira] [Updated] (YARN-296) Resource Manager throws InvalidStateTransitonException: Invalid event: APP_ACCEPTED at RUNNING for RMAppImpl

2013-05-12 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-296:
-

Issue Type: Sub-task  (was: Bug)
Parent: YARN-676

 Resource Manager throws InvalidStateTransitonException: Invalid event: 
 APP_ACCEPTED at RUNNING for RMAppImpl
 

 Key: YARN-296
 URL: https://issues.apache.org/jira/browse/YARN-296
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.0.2-alpha, 2.0.1-alpha
Reporter: Devaraj K

 {code:xml}
 2012-12-28 11:14:47,671 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Can't handle 
 this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 APP_ACCEPTED at RUNNING
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:528)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:72)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:405)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:389)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
   at java.lang.Thread.run(Thread.java:662)
 {code}



[jira] [Updated] (YARN-346) InvalidStateTransitonException: Invalid event: INIT_CONTAINER at DONE for ContainerImpl in Node Manager

2013-05-12 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-346:
-

Issue Type: Sub-task  (was: Bug)
Parent: YARN-676

 InvalidStateTransitonException: Invalid event: INIT_CONTAINER at DONE for 
 ContainerImpl in Node Manager
 ---

 Key: YARN-346
 URL: https://issues.apache.org/jira/browse/YARN-346
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.0.1-alpha, 2.0.0-alpha, 0.23.5
Reporter: Devaraj K
Assignee: Devaraj K
Priority: Critical

 {code:xml}
 2013-01-16 23:55:52,067 WARN 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
  Can't handle this event at current state: Current: [DONE], eventType: 
 [INIT_CONTAINER]
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 INIT_CONTAINER at DONE
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:819)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:71)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:504)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:497)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
   at java.lang.Thread.run(Thread.java:662)
 2013-01-16 23:55:52,067 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
  Container container_1358353581666_1326_01_10 transitioned from DONE to 
 null
 {code}



I'll be out of the office returning 16 May

2013-05-12 Thread Anas Mosaad

I will be out of the office starting  05/12/2013 and will not return until
05/16/2013.

I will be out of office at a customer site with limited or no internet
access. For urgent matters, please contact my manager Mohamed Obide
(mob...@eg.ibm.com)



[jira] [Commented] (YARN-530) Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services

2013-05-12 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655767#comment-13655767
 ] 

Steve Loughran commented on YARN-530:
-

Will look at this on Monday. Off the top of my head:
# The concept of blockers is making explicit what things are depending on (a 
simple name-string map), so that if a service is explicitly waiting for 
something to come up (say DN on NN), it's visible, rather than just having 
something appear to hang (see the sketch below). Right now we are 
second-guessing why the JT doesn't come up when HDFS is in safe mode, by 
polling HDFS and assuming the two states are correlated.

# The {{ServiceOperations}} methods have been in for a while and try to 
handle the old model; happy to pull them.
# I'd love to mark the init/start/stop methods as final, but one test using 
mockito didn't like it. I'll see if I can fix that test.
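
A minimal sketch of the blocker map from point 1 (shown as a standalone 
interface for illustration, not as the actual {{Service}} change):

{code}
// Hedged sketch: a service reports what it is currently waiting on as a
// name -> description map, so a blocked startup is visible rather than
// looking like a hang.
public interface HasBlockers {
  /**
   * @return live view of current blockers, e.g.
   *         "NameNode" -> "waiting for NN to leave safe mode"
   */
  java.util.Map<String, String> getBlockers();
}
{code}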

 Define Service model strictly, implement AbstractService for robust 
 subclassing, migrate yarn-common services
 -

 Key: YARN-530
 URL: https://issues.apache.org/jira/browse/YARN-530
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: YARN-117changes.pdf, YARN-530-2.patch, YARN-530-3.patch, 
 YARN-530.4.patch, YARN-530.patch


 # Extend the YARN {{Service}} interface as discussed in YARN-117
 # Implement the changes in {{AbstractService}} and {{FilterService}}.
 # Migrate all services in yarn-common to the more robust service model, test.
