[jira] [Commented] (YARN-353) Add Zookeeper-based store implementation for RMStateStore

2013-07-02 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13697629#comment-13697629
 ] 

Devaraj K commented on YARN-353:


The patch overall looks good; here are my observations on it.

1. {code:xml}
+  <property>
+    <description>ACL's to be used for ZooKeeper znodes.
+    This may be supplied when using
+    org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore
+    as the value for yarn.resourcemanager.store.class</description>
+    <name>yarn.resourcemanager.zk.rm-state-store.timeout.ms</name>
+    <!--<value>world:anyone:rwcda</value>-->
+  </property>
{code}

Here the configuration name should be yarn.resourcemanager.zk.rm-state-store.acl.
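
For illustration only, the corresponding key in YarnConfiguration might then 
look something like this (a sketch; the constant name is an assumption, not 
taken from the patch):
{code}
// Hypothetical sketch: a key constant following the yarn.resourcemanager.*
// naming convention; the actual constant name in the patch may differ.
public static final String ZK_RM_STATE_STORE_ACL =
    RM_PREFIX + "zk.rm-state-store.acl";
{code}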


2. {code:xml}
+  // protected to mock for testing
+  protected synchronized ZooKeeper getNewZooKeeper() throws Exception {
{code}

Can we also annotate this method with @VisibleForTesting?
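
For example, a minimal sketch (the annotation placement is the point; the body 
shown here is illustrative and assumes zkHostPort/zkSessionTimeout fields):
{code}
import com.google.common.annotations.VisibleForTesting;

// protected to mock for testing
@VisibleForTesting
protected synchronized ZooKeeper getNewZooKeeper() throws Exception {
  // illustrative body only; the real implementation stays as in the patch
  return new ZooKeeper(zkHostPort, zkSessionTimeout, null);
}
{code}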


3. {code:xml}
+  /** HostPort of ZK server for ZKRMStateStore */
+    <description>HostPort of the ZooKeeper server when using 
{code}

In these two places, can we use Host:Port instead of HostPort in the 
comment/description?


4. {code:xml}
+zkHostPort = conf.get(YarnConfiguration.ZK_RM_STATE_STORE_ADDRESS);
{code}


Can we provide a default value for this config, as is done for the other 
properties, e.g.
{code:xml}
+    <!--<value>127.0.0.1:2181</value>-->
{code}

5. {code:xml}
+  public static final String DEFAULT_ZK_RM_STATE_STORE_PARENT_PATH = "";
{code}
Can we give this config a default value instead of leaving it empty, e.g.
{code:xml}
+    <!--<value>/rmstore</value>-->
{code}
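
A small, hedged sketch of how the two defaults from points 4 and 5 could be 
consumed (the parent-path key name is assumed here; only 
ZK_RM_STATE_STORE_ADDRESS and DEFAULT_ZK_RM_STATE_STORE_PARENT_PATH appear in 
the patch excerpts above):
{code}
// Sketch only, not the patch: read both configs with explicit defaults instead
// of an unset address and an empty parent path.
String zkHostPort = conf.get(YarnConfiguration.ZK_RM_STATE_STORE_ADDRESS,
    "127.0.0.1:2181");
String zkParentPath = conf.get(YarnConfiguration.ZK_RM_STATE_STORE_PARENT_PATH,
    "/rmstore");
{code}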

 Add Zookeeper-based store implementation for RMStateStore
 -

 Key: YARN-353
 URL: https://issues.apache.org/jira/browse/YARN-353
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Hitesh Shah
Assignee: Bikas Saha
 Attachments: YARN-353.1.patch, YARN-353.2.patch


 Add store that writes RM state data to ZK

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (YARN-379) yarn [node,application] command print logger info messages

2013-07-02 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash resolved YARN-379.
---

   Resolution: Not A Problem
Fix Version/s: 2.2.0

This issue seems to have been fixed by YARN-530. I do not see the annoying 
message(s) any more.

 yarn [node,application] command print logger info messages
 --

 Key: YARN-379
 URL: https://issues.apache.org/jira/browse/YARN-379
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.0.3-alpha
Reporter: Thomas Graves
Assignee: Ravi Prakash
  Labels: usability
 Fix For: 2.2.0

 Attachments: YARN-379.patch, YARN-379.patch


 Running the yarn node and yarn applications command results in annoying log 
 info messages being printed:
 $ yarn node -list
 13/02/06 02:36:50 INFO service.AbstractService: 
 Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
 13/02/06 02:36:50 INFO service.AbstractService: 
 Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
 Total Nodes:1
  Node-IdNode-State  Node-Http-Address   
 Health-Status(isNodeHealthy)Running-Containers
 foo:8041RUNNING  foo:8042   true  
  0
 13/02/06 02:36:50 INFO service.AbstractService: 
 Service:org.apache.hadoop.yarn.client.YarnClientImpl is stopped.
 $ yarn application
 13/02/06 02:38:47 INFO service.AbstractService: 
 Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
 13/02/06 02:38:47 INFO service.AbstractService: 
 Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
 Invalid Command Usage : 
 usage: application
  -kill arg Kills the application.
  -list   Lists all the Applications from RM.
  -status arg   Prints the status of the application.
 13/02/06 02:38:47 INFO service.AbstractService: 
 Service:org.apache.hadoop.yarn.client.YarnClientImpl is stopped.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (YARN-395) RM should have a way to disable scheduling to a set of nodes

2013-07-02 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha resolved YARN-395.
-

Resolution: Fixed

YARN-750 covers most cases.

 RM should have a way to disable scheduling to a set of nodes
 

 Key: YARN-395
 URL: https://issues.apache.org/jira/browse/YARN-395
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Arun C Murthy

 There should be a way to say schedule to A, B and C but never to D.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-395) RM should have a way to disable scheduling to a set of nodes

2013-07-02 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13698018#comment-13698018
 ] 

Bikas Saha commented on YARN-395:
-

Not exactly. But it should be good enough for now.

 RM should have a way to disable scheduling to a set of nodes
 

 Key: YARN-395
 URL: https://issues.apache.org/jira/browse/YARN-395
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Arun C Murthy

 There should be a way to say schedule to A, B and C but never to D.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-763) AMRMClientAsync should stop heartbeating after receiving shutdown from RM

2013-07-02 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-763:
---

Attachment: YARN-763.2.patch

1. Remove the boolean stop flag.
2. Put the heartbeat thread interrupt logic inside the switch block.
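
Roughly the shape described above, as a hedged sketch (the command and field 
names below are assumptions for illustration, not the actual AMRMClientAsync 
code):
{code}
// Sketch only: on a shutdown command from the RM, interrupt the heartbeat
// thread from inside the switch instead of flipping a shared boolean flag.
switch (response.getAMCommand()) {
  case AM_SHUTDOWN:
    heartbeatThread.interrupt();   // assumed field name
    handler.onShutdownRequest();   // notify the callback handler
    return;
  default:
    break;
}
{code}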

 AMRMClientAsync should stop heartbeating after receiving shutdown from RM
 -

 Key: YARN-763
 URL: https://issues.apache.org/jira/browse/YARN-763
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Xuan Gong
 Attachments: YARN-763.1.patch, YARN-763.2.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-353) Add Zookeeper-based store implementation for RMStateStore

2013-07-02 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-353:
-

Attachment: YARN-353.3.patch

Devaraj, thanks for your review.
New patch: fixed the findbugs warning and addressed the comments.

 Add Zookeeper-based store implementation for RMStateStore
 -

 Key: YARN-353
 URL: https://issues.apache.org/jira/browse/YARN-353
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Hitesh Shah
Assignee: Bikas Saha
 Attachments: YARN-353.1.patch, YARN-353.2.patch, YARN-353.3.patch


 Add store that writes RM state data to ZK

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-763) AMRMClientAsync should stop heartbeating after receiving shutdown from RM

2013-07-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13698071#comment-13698071
 ] 

Hadoop QA commented on YARN-763:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12590473/YARN-763.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client:

  org.apache.hadoop.yarn.client.api.impl.TestNMClient

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1415//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1415//console

This message is automatically generated.

 AMRMClientAsync should stop heartbeating after receiving shutdown from RM
 -

 Key: YARN-763
 URL: https://issues.apache.org/jira/browse/YARN-763
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Xuan Gong
 Attachments: YARN-763.1.patch, YARN-763.2.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-353) Add Zookeeper-based store implementation for RMStateStore

2013-07-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13698099#comment-13698099
 ] 

Hadoop QA commented on YARN-353:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12590474/YARN-353.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1416//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1416//console

This message is automatically generated.

 Add Zookeeper-based store implementation for RMStateStore
 -

 Key: YARN-353
 URL: https://issues.apache.org/jira/browse/YARN-353
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Hitesh Shah
Assignee: Bikas Saha
 Attachments: YARN-353.1.patch, YARN-353.2.patch, YARN-353.3.patch


 Add store that writes RM state data to ZK

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (YARN-894) NodeHealthScriptRunner timeout checking is inaccurate on Windows

2013-07-02 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth reassigned YARN-894:
--

Assignee: Chris Nauroth  (was: Chuan Liu)

 NodeHealthScriptRunner timeout checking is inaccurate on Windows
 

 Key: YARN-894
 URL: https://issues.apache.org/jira/browse/YARN-894
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: Chuan Liu
Assignee: Chris Nauroth
Priority: Minor
 Attachments: ReadProcessStdout.java, wait.cmd, wait.sh, 
 YARN-894-trunk.patch


 In the {{NodeHealthScriptRunner}} method, we set the HealthChecker status based 
 on the Shell execution results. Some statuses are based on the exception thrown 
 during the Shell script execution.
 Currently, we will catch a non-ExitCodeException from ShellCommandExecutor, 
 and if Shell has the timeout status set at the same time, we will also set 
 HealthChecker status to timeout.
 We have following execution sequence in Shell:
 1) In main thread, schedule a delayed timer task that will kill the original 
 process upon timeout.
 2) In main thread, open a buffered reader and feed in the process's standard 
 input stream.
 3) When timeout happens, the timer task will call {{Process#destroy()}}
  to kill the main process.
 On Linux, when the timeout happens and the process is killed, the buffered reader 
 throws an IOException with the message Stream closed in the main thread.
 On Windows, we don't get the IOException. Only -1 is returned from the 
 reader, which indicates the buffer is finished. As a result, the timeout status 
 is not set on Windows, and {{TestNodeHealthService}} fails on Windows because 
 of this.
  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-710) Add to ser/deser methods to RecordFactory

2013-07-02 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13698232#comment-13698232
 ] 

Siddharth Seth commented on YARN-710:
-

In the unit test, the setters on the ApplicationId aren't meant to be used 
(they will end up throwing exceptions - this is replaced by newInstance in 
ApplicationId). I don't think getProto() needs to be changed at all in 
RecordFactoryPBImpl - instead a new getBuilder method should be sufficient. 
Somewhere along the flow, it looks like the default proto ends up being created 
- possibly linked to the getProto changes.
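
For what it's worth, a purely hypothetical sketch of the kind of round trip 
being discussed (the helper names and the PBImpl constructor used here are 
illustrative assumptions, not the proposed RecordFactory API):
{code}
// Hypothetical sketch: serialize by asking the PBImpl for its proto and
// deserialize by parsing the proto back, so getProto() itself stays untouched.
byte[] serialize(ApplicationIdPBImpl appId) {
  return appId.getProto().toByteArray();
}

ApplicationId deserialize(byte[] bytes) throws InvalidProtocolBufferException {
  return new ApplicationIdPBImpl(ApplicationIdProto.parseFrom(bytes));
}
{code}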

 Add to ser/deser methods to RecordFactory
 -

 Key: YARN-710
 URL: https://issues.apache.org/jira/browse/YARN-710
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: api
Affects Versions: 2.0.4-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: YARN-710.patch, YARN-710.patch, YARN-710-wip.patch


 In order to do things like AM failover and checkpointing I need to serialize 
 app IDs, app attempt IDs, containers and/or their IDs, resource requests, etc.
 Because we are wrapping/hiding the PB implementation from the APIs, we are 
 hiding the built in PB ser/deser capabilities.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-814) Difficult to diagnose a failed container launch when error due to invalid environment variable

2013-07-02 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-814:
-

Attachment: YARN-814.4.patch

New patch, accounts for both stdout and stderr.

 Difficult to diagnose a failed container launch when error due to invalid 
 environment variable
 --

 Key: YARN-814
 URL: https://issues.apache.org/jira/browse/YARN-814
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Hitesh Shah
Assignee: Jian He
 Attachments: YARN-814.1.patch, YARN-814.2.patch, YARN-814.3.patch, 
 YARN-814.4.patch, YARN-814.patch


 The container's launch script sets up environment variables, symlinks etc. 
 If there is any failure when setting up the basic context ( before the actual 
 user's process is launched ), nothing is captured by the NM. This makes it 
 impossible to diagnose the reason for the failure. 
 To reproduce, set an env var where the value contains characters that throw 
 syntax errors in bash. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-814) Difficult to diagnose a failed container launch when error due to invalid environment variable

2013-07-02 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13698250#comment-13698250
 ] 

Jian He commented on YARN-814:
--

Ran it on a single node and saw the log messages.

 Difficult to diagnose a failed container launch when error due to invalid 
 environment variable
 --

 Key: YARN-814
 URL: https://issues.apache.org/jira/browse/YARN-814
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Hitesh Shah
Assignee: Jian He
 Attachments: YARN-814.1.patch, YARN-814.2.patch, YARN-814.3.patch, 
 YARN-814.4.patch, YARN-814.patch


 The container's launch script sets up environment variables, symlinks etc. 
 If there is any failure when setting up the basic context ( before the actual 
 user's process is launched ), nothing is captured by the NM. This makes it 
 impossible to diagnose the reason for the failure. 
 To reproduce, set an env var where the value contains characters that throw 
 syntax errors in bash. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-871) Failed to run MR example against latest trunk

2013-07-02 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13698267#comment-13698267
 ] 

Junping Du commented on YARN-871:
-

Hi [~zjshen], given YARN-874 is committed, shall we resolve it?

 Failed to run MR example against latest trunk
 -

 Key: YARN-871
 URL: https://issues.apache.org/jira/browse/YARN-871
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
 Attachments: yarn-zshen-resourcemanager-ZShens-MacBook-Pro.local.log


 Built the latest trunk, deployed a single node cluster and ran examples, such 
 as
 {code}
  hadoop jar 
 hadoop-3.0.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-SNAPSHOT.jar
  teragen 10 out1
 {code}
 The job failed with the following console message:
 {code}
 13/06/21 12:51:25 INFO mapreduce.Job: Running job: job_1371844267731_0001
 13/06/21 12:51:31 INFO mapreduce.Job: Job job_1371844267731_0001 running in 
 uber mode : false
 13/06/21 12:51:31 INFO mapreduce.Job:  map 0% reduce 0%
 13/06/21 12:51:31 INFO mapreduce.Job: Job job_1371844267731_0001 failed with 
 state FAILED due to: Application application_1371844267731_0001 failed 2 
 times due to AM Container for appattempt_1371844267731_0001_02 exited 
 with  exitCode: 127 due to: 
 .Failing this attempt.. Failing the application.
 13/06/21 12:51:31 INFO mapreduce.Job: Counters: 0
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-814) Difficult to diagnose a failed container launch when error due to invalid environment variable

2013-07-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13698280#comment-13698280
 ] 

Hadoop QA commented on YARN-814:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12590515/YARN-814.4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1417//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1417//console

This message is automatically generated.

 Difficult to diagnose a failed container launch when error due to invalid 
 environment variable
 --

 Key: YARN-814
 URL: https://issues.apache.org/jira/browse/YARN-814
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Hitesh Shah
Assignee: Jian He
 Attachments: YARN-814.1.patch, YARN-814.2.patch, YARN-814.3.patch, 
 YARN-814.4.patch, YARN-814.patch


 The container's launch script sets up environment variables, symlinks etc. 
 If there is any failure when setting up the basic context ( before the actual 
 user's process is launched ), nothing is captured by the NM. This makes it 
 impossible to diagnose the reason for the failure. 
 To reproduce, set an env var where the value contains characters that throw 
 syntax errors in bash. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (YARN-871) Failed to run MR example against latest trunk

2013-07-02 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-871.
--

Resolution: Cannot Reproduce

Thanks, [~djp]! Closing it as Cannot Reproduce.

 Failed to run MR example against latest trunk
 -

 Key: YARN-871
 URL: https://issues.apache.org/jira/browse/YARN-871
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
 Attachments: yarn-zshen-resourcemanager-ZShens-MacBook-Pro.local.log


 Built the latest trunk, deployed a single node cluster and ran examples, such 
 as
 {code}
  hadoop jar 
 hadoop-3.0.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-SNAPSHOT.jar
  teragen 10 out1
 {code}
 The job failed with the following console message:
 {code}
 13/06/21 12:51:25 INFO mapreduce.Job: Running job: job_1371844267731_0001
 13/06/21 12:51:31 INFO mapreduce.Job: Job job_1371844267731_0001 running in 
 uber mode : false
 13/06/21 12:51:31 INFO mapreduce.Job:  map 0% reduce 0%
 13/06/21 12:51:31 INFO mapreduce.Job: Job job_1371844267731_0001 failed with 
 state FAILED due to: Application application_1371844267731_0001 failed 2 
 times due to AM Container for appattempt_1371844267731_0001_02 exited 
 with  exitCode: 127 due to: 
 .Failing this attempt.. Failing the application.
 13/06/21 12:51:31 INFO mapreduce.Job: Counters: 0
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-845) RM crash with NPE on NODE_UPDATE

2013-07-02 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13698326#comment-13698326
 ] 

Mayank Bansal commented on YARN-845:


I had an offline discussion with [~arpitgupta] and [~bikassaha].

We are not able to reproduce the issue; however, we can synchronize on the 
application object in assignReservedContainer to make it consistent with the 
other calls.
I am adding more logs so we can find the cause if we hit this crash again.

I am also throwing a YarnRuntimeException if we get this null again.
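
A minimal sketch of the intended synchronization (illustrative only; the actual 
lock object and the assignReservedContainer signature in the patch may differ):
{code}
// Sketch: hold the application's monitor while assigning the reserved
// container, matching the synchronization used on the other scheduling paths.
synchronized (application) {
  assignReservedContainer(application, node, reservedContainer, clusterResource);
}
{code}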

Thanks,
Mayank

 RM crash with NPE on NODE_UPDATE
 

 Key: YARN-845
 URL: https://issues.apache.org/jira/browse/YARN-845
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: Arpit Gupta
Assignee: Mayank Bansal
 Attachments: rm.log, YARN-845-trunk-draft.patch


 the following stack trace is generated in rm
 {code}
 n, service: 68.142.246.147:45454 }, ] resource=memory:1536, vCores:1 
 queue=default: capacity=1.0, absoluteCapacity=1.0, 
 usedResources=memory:44544, vCores:29usedCapacity=0.90625, 
 absoluteUsedCapacity=0.90625, numApps=1, numContainers=29 
 usedCapacity=0.90625 absoluteUsedCapacity=0.90625 used=memory:44544, 
 vCores:29 cluster=memory:49152, vCores:48
 2013-06-17 12:43:53,655 INFO  capacity.ParentQueue 
 (ParentQueue.java:completedContainer(696)) - completedContainer queue=root 
 usedCapacity=0.90625 absoluteUsedCapacity=0.90625 used=memory:44544, 
 vCores:29 cluster=memory:49152, vCores:48
 2013-06-17 12:43:53,656 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(832)) - Application 
 appattempt_1371448527090_0844_01 released container 
 container_1371448527090_0844_01_05 on node: host: hostXX:45454 
 #containers=4 available=2048 used=6144 with event: FINISHED
 2013-06-17 12:43:53,656 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:nodeUpdate(661)) - Trying to fulfill reservation for 
 application application_1371448527090_0844 on node: hostXX:45454
 2013-06-17 12:43:53,656 INFO  fica.FiCaSchedulerApp 
 (FiCaSchedulerApp.java:unreserve(435)) - Application 
 application_1371448527090_0844 unreserved  on node host: hostXX:45454 
 #containers=4 available=2048 used=6144, currently has 4 at priority 20; 
 currentReservation memory:6144, vCores:4
 2013-06-17 12:43:53,656 INFO  scheduler.AppSchedulingInfo 
 (AppSchedulingInfo.java:updateResourceRequests(168)) - checking for 
 deactivate...
 2013-06-17 12:43:53,657 FATAL resourcemanager.ResourceManager 
 (ResourceManager.java:run(422)) - Error in handling event type NODE_UPDATE to 
 the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.unreserve(FiCaSchedulerApp.java:432)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.unreserve(LeafQueue.java:1416)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1346)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1221)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1180)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignReservedContainer(LeafQueue.java:939)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:803)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:665)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:727)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:83)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:413)
 at java.lang.Thread.run(Thread.java:662)
 2013-06-17 12:43:53,659 INFO  resourcemanager.ResourceManager 
 (ResourceManager.java:run(426)) - Exiting, bbye..
 2013-06-17 12:43:53,665 INFO  mortbay.log (Slf4jLog.java:info(67)) - Stopped 
 SelectChannelConnector@hostXX:8088
 2013-06-17 12:43:53,765 ERROR delegation.AbstractDelegationTokenSecretManager 
 (AbstractDelegationTokenSecretManager.java:run(513)) - InterruptedExcpetion 
 recieved for ExpiredTokenRemover thread java.lang.InterruptedException: sleep 
 interrupted
 2013-06-17 12:43:53,766 INFO  impl.MetricsSystemImpl 
 (MetricsSystemImpl.java:stop(200)) - Stopping ResourceManager 

[jira] [Updated] (YARN-845) RM crash with NPE on NODE_UPDATE

2013-07-02 Thread Mayank Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal updated YARN-845:
---

Attachment: YARN-845-trunk-1.patch

Attaching updated patch and rebasing it.

Thanks,
Mayank

 RM crash with NPE on NODE_UPDATE
 

 Key: YARN-845
 URL: https://issues.apache.org/jira/browse/YARN-845
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: Arpit Gupta
Assignee: Mayank Bansal
 Attachments: rm.log, YARN-845-trunk-1.patch, 
 YARN-845-trunk-draft.patch


 the following stack trace is generated in rm
 {code}
 n, service: 68.142.246.147:45454 }, ] resource=memory:1536, vCores:1 
 queue=default: capacity=1.0, absoluteCapacity=1.0, 
 usedResources=memory:44544, vCores:29usedCapacity=0.90625, 
 absoluteUsedCapacity=0.90625, numApps=1, numContainers=29 
 usedCapacity=0.90625 absoluteUsedCapacity=0.90625 used=memory:44544, 
 vCores:29 cluster=memory:49152, vCores:48
 2013-06-17 12:43:53,655 INFO  capacity.ParentQueue 
 (ParentQueue.java:completedContainer(696)) - completedContainer queue=root 
 usedCapacity=0.90625 absoluteUsedCapacity=0.90625 used=memory:44544, 
 vCores:29 cluster=memory:49152, vCores:48
 2013-06-17 12:43:53,656 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(832)) - Application 
 appattempt_1371448527090_0844_01 released container 
 container_1371448527090_0844_01_05 on node: host: hostXX:45454 
 #containers=4 available=2048 used=6144 with event: FINISHED
 2013-06-17 12:43:53,656 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:nodeUpdate(661)) - Trying to fulfill reservation for 
 application application_1371448527090_0844 on node: hostXX:45454
 2013-06-17 12:43:53,656 INFO  fica.FiCaSchedulerApp 
 (FiCaSchedulerApp.java:unreserve(435)) - Application 
 application_1371448527090_0844 unreserved  on node host: hostXX:45454 
 #containers=4 available=2048 used=6144, currently has 4 at priority 20; 
 currentReservation memory:6144, vCores:4
 2013-06-17 12:43:53,656 INFO  scheduler.AppSchedulingInfo 
 (AppSchedulingInfo.java:updateResourceRequests(168)) - checking for 
 deactivate...
 2013-06-17 12:43:53,657 FATAL resourcemanager.ResourceManager 
 (ResourceManager.java:run(422)) - Error in handling event type NODE_UPDATE to 
 the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.unreserve(FiCaSchedulerApp.java:432)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.unreserve(LeafQueue.java:1416)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1346)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1221)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1180)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignReservedContainer(LeafQueue.java:939)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:803)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:665)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:727)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:83)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:413)
 at java.lang.Thread.run(Thread.java:662)
 2013-06-17 12:43:53,659 INFO  resourcemanager.ResourceManager 
 (ResourceManager.java:run(426)) - Exiting, bbye..
 2013-06-17 12:43:53,665 INFO  mortbay.log (Slf4jLog.java:info(67)) - Stopped 
 SelectChannelConnector@hostXX:8088
 2013-06-17 12:43:53,765 ERROR delegation.AbstractDelegationTokenSecretManager 
 (AbstractDelegationTokenSecretManager.java:run(513)) - InterruptedExcpetion 
 recieved for ExpiredTokenRemover thread java.lang.InterruptedException: sleep 
 interrupted
 2013-06-17 12:43:53,766 INFO  impl.MetricsSystemImpl 
 (MetricsSystemImpl.java:stop(200)) - Stopping ResourceManager metrics 
 system...
 2013-06-17 12:43:53,767 INFO  impl.MetricsSystemImpl 
 (MetricsSystemImpl.java:stop(206)) - ResourceManager metrics system stopped.
 2013-06-17 12:43:53,767 INFO  impl.MetricsSystemImpl 
 (MetricsSystemImpl.java:shutdown(572)) - ResourceManager metrics system 
 shutdown complete.
 2013-06-17 

[jira] [Commented] (YARN-245) Node Manager gives InvalidStateTransitonException for FINISH_APPLICATION at FINISHED

2013-07-02 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13698337#comment-13698337
 ] 

Mayank Bansal commented on YARN-245:


I just tried this patch and it does not need rebasing.

Thanks,
Mayank

 Node Manager gives InvalidStateTransitonException for FINISH_APPLICATION at 
 FINISHED
 

 Key: YARN-245
 URL: https://issues.apache.org/jira/browse/YARN-245
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.0.2-alpha, 2.0.1-alpha
Reporter: Devaraj K
Assignee: Mayank Bansal
 Attachments: YARN-245-trunk-1.patch


 {code:xml}
 2012-11-25 12:56:11,795 WARN 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
  Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 FINISH_APPLICATION at FINISHED
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
 at java.lang.Thread.run(Thread.java:662)
 2012-11-25 12:56:11,796 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
  Application application_1353818859056_0004 transitioned from FINISHED to null
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-295) Resource Manager throws InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED for RMAppAttemptImpl

2013-07-02 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13698346#comment-13698346
 ] 

Mayank Bansal commented on YARN-295:


Latest patch does not need any rebasing

Thanks,
Mayank

 Resource Manager throws InvalidStateTransitonException: Invalid event: 
 CONTAINER_FINISHED at ALLOCATED for RMAppAttemptImpl
 ---

 Key: YARN-295
 URL: https://issues.apache.org/jira/browse/YARN-295
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.0.2-alpha, 2.0.1-alpha
Reporter: Devaraj K
Assignee: Mayank Bansal
 Attachments: YARN-295-trunk-1.patch, YARN-295-trunk-2.patch


 {code:xml}
 2012-12-28 14:03:56,956 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 CONTAINER_FINISHED at ALLOCATED
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
   at java.lang.Thread.run(Thread.java:662)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-845) RM crash with NPE on NODE_UPDATE

2013-07-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13698357#comment-13698357
 ] 

Hadoop QA commented on YARN-845:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12590532/YARN-845-trunk-1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1418//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1418//console

This message is automatically generated.

 RM crash with NPE on NODE_UPDATE
 

 Key: YARN-845
 URL: https://issues.apache.org/jira/browse/YARN-845
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: Arpit Gupta
Assignee: Mayank Bansal
 Attachments: rm.log, YARN-845-trunk-1.patch, 
 YARN-845-trunk-draft.patch


 the following stack trace is generated in rm
 {code}
 n, service: 68.142.246.147:45454 }, ] resource=memory:1536, vCores:1 
 queue=default: capacity=1.0, absoluteCapacity=1.0, 
 usedResources=memory:44544, vCores:29usedCapacity=0.90625, 
 absoluteUsedCapacity=0.90625, numApps=1, numContainers=29 
 usedCapacity=0.90625 absoluteUsedCapacity=0.90625 used=memory:44544, 
 vCores:29 cluster=memory:49152, vCores:48
 2013-06-17 12:43:53,655 INFO  capacity.ParentQueue 
 (ParentQueue.java:completedContainer(696)) - completedContainer queue=root 
 usedCapacity=0.90625 absoluteUsedCapacity=0.90625 used=memory:44544, 
 vCores:29 cluster=memory:49152, vCores:48
 2013-06-17 12:43:53,656 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(832)) - Application 
 appattempt_1371448527090_0844_01 released container 
 container_1371448527090_0844_01_05 on node: host: hostXX:45454 
 #containers=4 available=2048 used=6144 with event: FINISHED
 2013-06-17 12:43:53,656 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:nodeUpdate(661)) - Trying to fulfill reservation for 
 application application_1371448527090_0844 on node: hostXX:45454
 2013-06-17 12:43:53,656 INFO  fica.FiCaSchedulerApp 
 (FiCaSchedulerApp.java:unreserve(435)) - Application 
 application_1371448527090_0844 unreserved  on node host: hostXX:45454 
 #containers=4 available=2048 used=6144, currently has 4 at priority 20; 
 currentReservation memory:6144, vCores:4
 2013-06-17 12:43:53,656 INFO  scheduler.AppSchedulingInfo 
 (AppSchedulingInfo.java:updateResourceRequests(168)) - checking for 
 deactivate...
 2013-06-17 12:43:53,657 FATAL resourcemanager.ResourceManager 
 (ResourceManager.java:run(422)) - Error in handling event type NODE_UPDATE to 
 the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.unreserve(FiCaSchedulerApp.java:432)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.unreserve(LeafQueue.java:1416)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1346)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1221)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1180)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignReservedContainer(LeafQueue.java:939)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:803)
   

[jira] [Created] (YARN-895) If NameNode is in safemode when RM restarts, RM should wait instead of crashing.

2013-07-02 Thread Jian He (JIRA)
Jian He created YARN-895:


 Summary: If NameNode is in safemode when RM restarts, RM should 
wait instead of crashing.
 Key: YARN-895
 URL: https://issues.apache.org/jira/browse/YARN-895
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Jian He




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-299) Node Manager throws org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: RESOURCE_FAILED at DONE

2013-07-02 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13698414#comment-13698414
 ] 

Mayank Bansal commented on YARN-299:


This patch does not need rebasing

Thanks,
Mayank

 Node Manager throws 
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 RESOURCE_FAILED at DONE
 ---

 Key: YARN-299
 URL: https://issues.apache.org/jira/browse/YARN-299
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.0.1-alpha, 2.0.0-alpha
Reporter: Devaraj K
Assignee: Mayank Bansal
 Attachments: YARN-299-trunk-1.patch


 {code:xml}
 2012-12-31 10:36:27,844 WARN 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
  Can't handle this event at current state: Current: [DONE], eventType: 
 [RESOURCE_FAILED]
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 RESOURCE_FAILED at DONE
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:819)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:71)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:504)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:497)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
   at java.lang.Thread.run(Thread.java:662)
 2012-12-31 10:36:27,845 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
  Container container_1356792558130_0002_01_01 transitioned from DONE to 
 null
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-649) Make container logs available over HTTP in plain text

2013-07-02 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13698415#comment-13698415
 ] 

Zhijie Shen commented on YARN-649:
--

Read the patch quickly. It looks almost fine to me. One minor question: why 
does getLogs not support XML?

{code}
+  @GET
+  @Path("/containerlogs/{containerid}/{filename}")
+  @Produces({ MediaType.TEXT_PLAIN, MediaType.APPLICATION_JSON })
+  @Evolving
+  public Response getLogs(@PathParam("containerid") String containerIdStr,
+  @PathParam("filename") String filename) {
{code}

Here are some additional thoughts. For long running applications, the log file 
may be big, such that it will take a long time to download it via the RESTful 
API. Consequently, the HTTP connection may time out before a complete log file 
has been downloaded. Maybe it is good to zip the log file before sending it, and 
unzip it after receiving it. Moreover, it could be more advanced to query the 
part of the log which was recorded between timestamp1 and timestamp2. Just 
thinking out loud; not sure it is required right now.
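
Purely as a hypothetical sketch of that last idea (the query parameters below 
are invented for illustration and are not in the patch), a range could be 
layered onto the same endpoint in a backwards-compatible way:
{code}
// Hypothetical sketch: optional range parameters on the existing log endpoint.
@GET
@Path("/containerlogs/{containerid}/{filename}")
@Produces({ MediaType.TEXT_PLAIN })
public Response getLogs(@PathParam("containerid") String containerIdStr,
    @PathParam("filename") String filename,
    @QueryParam("start") @DefaultValue("0") long start,   // assumed parameter
    @QueryParam("end") @DefaultValue("-1") long end) {    // assumed parameter
  // ... stream only the requested slice of the log file ...
  return Response.ok().build();  // placeholder body
}
{code}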

 Make container logs available over HTTP in plain text
 -

 Key: YARN-649
 URL: https://issues.apache.org/jira/browse/YARN-649
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-649-2.patch, YARN-649-3.patch, YARN-649-4.patch, 
 YARN-649.patch, YARN-752-1.patch


 It would be good to make container logs available over the REST API for 
 MAPREDUCE-4362 and so that they can be accessed programmatically in general.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-502) RM crash with NPE on NODE_REMOVED event with FairScheduler

2013-07-02 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13698418#comment-13698418
 ] 

Mayank Bansal commented on YARN-502:


Latest patch does not need rebasing

Thanks,
Mayank

 RM crash with NPE on NODE_REMOVED event with FairScheduler
 --

 Key: YARN-502
 URL: https://issues.apache.org/jira/browse/YARN-502
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.0.3-alpha
Reporter: Lohit Vijayarenu
Assignee: Mayank Bansal
 Attachments: YARN-502-trunk-1.patch, YARN-502-trunk-2.patch


 While running some test and adding/removing nodes, we see RM crashed with the 
 below exception. We are testing with fair scheduler and running 
 hadoop-2.0.3-alpha
 {noformat}
 2013-03-22 18:54:27,015 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
 Node :55680 as it is now LOST
 2013-03-22 18:54:27,015 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: :55680 
 Node Transitioned from UNHEALTHY to LOST
 2013-03-22 18:54:27,015 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type NODE_REMOVED to the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeNode(FairScheduler.java:619)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:856)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:98)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:375)
 at java.lang.Thread.run(Thread.java:662)
 2013-03-22 18:54:27,016 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
 2013-03-22 18:54:27,020 INFO org.mortbay.log: Stopped 
 SelectChannelConnector@:50030
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-649) Make container logs available over HTTP in plain text

2013-07-02 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13698444#comment-13698444
 ] 

Sandy Ryza commented on YARN-649:
-

Thanks for taking a look Zhijie.

bq. why does getLogs not support XML?
Oops, leaving in MediaType.APPLICATION_JSON was a mistake.  My intention was 
actually to have it only support plain text.  Thoughts?

Regarding the zip files and the time-based queries, these seem like useful 
features, but I think they would be better for a separate JIRA, and can be 
added in a backwards-compatible manner with additional request parameters.  My 
goal here was to implement the minimum needed to work on MAPREDUCE-4362 and 
YARN-675.


 Make container logs available over HTTP in plain text
 -

 Key: YARN-649
 URL: https://issues.apache.org/jira/browse/YARN-649
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-649-2.patch, YARN-649-3.patch, YARN-649-4.patch, 
 YARN-649.patch, YARN-752-1.patch


 It would be good to make container logs available over the REST API for 
 MAPREDUCE-4362 and so that they can be accessed programmatically in general.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-7) Add support for DistributedShell to ask for CPUs along with memory

2013-07-02 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-7:
--

Attachment: YARN-7-v2.patch

Sync up patch with latest trunk branch.

 Add support for DistributedShell to ask for CPUs along with memory
 --

 Key: YARN-7
 URL: https://issues.apache.org/jira/browse/YARN-7
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.0.3-alpha
Reporter: Arun C Murthy
Assignee: Junping Du
  Labels: patch
 Attachments: YARN-7.patch, YARN-7-v2.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-896) Roll up for long lived YARN

2013-07-02 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13698500#comment-13698500
 ] 

Robert Joseph Evans commented on YARN-896:
--

During the most recent Hadoop Summit there was a developer meetup where we 
discussed some of these issues.  This is to summarize what was discussed at 
that meeting and to add in a few things that have also been discussed on 
mailing lists and other places.

HDFS delegation tokens have a maximum lifetime. Currently tokens submitted to 
the RM when the app master is launched will be renewed by the RM until the 
application finishes and the logs from the application have finished 
aggregating.  The only token currently used by the YARN framework is the HDFS 
delegation token.  This is used to read files from HDFS as part of the 
distributed cache and to write the aggregated logs out to HDFS.

In order to support relaunching an app master after the maximum lifetime of the 
HDFS delegation token has passed, we either need to allow for tokens that do not 
expire or provide an API to allow the RM to replace the old token with a new 
one.  Because removing the maximum lifetime of a token reduces the security 
of the cluster as a whole I think it would be better to provide an API to 
replace the token with a new one.

If we want to continue supporting log aggregation we also need to provide a way 
for the Node Managers to get the new token too.  It is assumed that each app 
master will also provide an API to get the new token so it can start using it.
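
As a purely hypothetical illustration of the "replace the token" idea (no such 
interface exists today; the names are invented for this sketch):
{code}
import java.io.IOException;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.security.token.TokenIdentifier;

// Hypothetical sketch: the rough shape of an API by which the RM (and, for log
// aggregation, the NMs) could be handed a fresh HDFS delegation token before
// the old one reaches its maximum lifetime.
public interface DelegationTokenReplaceable {
  void replaceDelegationToken(Token<? extends TokenIdentifier> newToken)
      throws IOException;
}
{code}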


Log aggregation is another issue, although not required for long lived 
applications to work.  Logs are aggregated into HDFS when the application 
finishes.  This is not really that useful for applications that are never 
intended to exit.  Ideally the processing of logs by the node manager should be 
pluggable so that clusters and applications can select how and when logs are 
processed/displayed to the end user.  Because many of these systems roll their 
logs to avoid filling up disks we will probably need a protocol of some sort 
for the container to communicate with the Node Manager when logs are ready to 
be processed.

Another issue is to allow containers to outlive the app master that launched 
them and also to allow containers to outlive the node manager that launched 
them.  This is especially critical for the stability of applications during 
rolling upgrades to YARN.

 Roll up for long lived YARN
 ---

 Key: YARN-896
 URL: https://issues.apache.org/jira/browse/YARN-896
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Robert Joseph Evans

 YARN is intended to be general purpose, but it is missing some features to be 
 able to truly support long lived applications and long lived containers.
 This ticket is intended to
  # discuss what is needed to support long lived processes
  # track the resulting JIRA.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-896) Roll up for long lived YARN

2013-07-02 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13698505#comment-13698505
 ] 

Robert Joseph Evans commented on YARN-896:
--

Another issue that has been discussed in the past is the impact that long lived 
processes can have on resource scheduling.  It is possible for a long lived 
process to grab lots of resources and then never release them, even though it 
is holding more resources than it would be allowed to have when the cluster is 
full.  Recent preemption changes should be able to prevent this from happening 
between different queues/pools, but we may need to consider whether more 
control is needed within a queue.

 Roll up for long lived YARN
 ---

 Key: YARN-896
 URL: https://issues.apache.org/jira/browse/YARN-896
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Robert Joseph Evans

 YARN is intended to be general purpose, but it is missing some features to be 
 able to truly support long lived applications and long lived containers.
 This ticket is intended to
  # discuss what is needed to support long lived processes
  # track the resulting JIRA.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-661) NM fails to cleanup local directories for users

2013-07-02 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13698518#comment-13698518
 ] 

Omkar Vinit Joshi commented on YARN-661:


* The first part is fairly straightforward: container-executor.c already had 
some code to do this, so I just modified 
ResourceLocalizationService.cleanUpFilesFromSubDir to trigger it (basically 
swapping subDir with baseDir).
* I am exposing deletion task dependencies to the user via DeletionService. The 
user can now specify a multilevel deletion task DAG, and the deletion service 
will take care of it once all parent (root) deletion tasks have been started by 
the user after the dependencies are defined (see the sketch below).
I tested this locally on a secured cluster, but I will add test cases to verify 
that the DAG actually works.  I am attaching the initial patch and will update 
it with test cases.
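
A rough usage sketch of the dependency idea follows.  The constructor and 
method names here are invented for illustration, and variables like delService, 
user and userCacheDir are assumed to be in scope; the actual API is whatever 
the attached patch defines.

{code}
// Hypothetical sketch: delete the contents of usercache/<user> as the user first,
// then delete the now-empty top-level dir as the NM user once the parent is done.
FileDeletionTask deleteContents =
    new FileDeletionTask(delService, user, null, userCacheSubDirs);   // parent (root) task
FileDeletionTask deleteTopLevelDir =
    new FileDeletionTask(delService, null, userCacheDir, null);       // dependent child task
deleteContents.addFileDeletionTaskDependency(deleteTopLevelDir);      // invented method name
delService.scheduleFileDeletionTask(deleteContents);                  // invented method name
{code}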


 NM fails to cleanup local directories for users
 ---

 Key: YARN-661
 URL: https://issues.apache.org/jira/browse/YARN-661
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.0-beta, 0.23.8
Reporter: Jason Lowe
Assignee: Omkar Vinit Joshi
 Attachments: YARN-661-20130701.patch


 YARN-71 added deletion of local directories on startup, but in practice it 
 fails to delete the directories because of permission problems.  The 
 top-level usercache directory is owned by the user but is in a directory that 
 is not writable by the user.  Therefore the deletion of the user's usercache 
 directory, as the user, fails due to lack of permissions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-661) NM fails to cleanup local directories for users

2013-07-02 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-661:
---

Attachment: YARN-661-20130701.patch

 NM fails to cleanup local directories for users
 ---

 Key: YARN-661
 URL: https://issues.apache.org/jira/browse/YARN-661
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.0-beta, 0.23.8
Reporter: Jason Lowe
Assignee: Omkar Vinit Joshi
 Attachments: YARN-661-20130701.patch


 YARN-71 added deletion of local directories on startup, but in practice it 
 fails to delete the directories because of permission problems.  The 
 top-level usercache directory is owned by the user but is in a directory that 
 is not writable by the user.  Therefore the deletion of the user's usercache 
 directory, as the user, fails due to lack of permissions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-502) RM crash with NPE on NODE_REMOVED event with FairScheduler

2013-07-02 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13698522#comment-13698522
 ] 

Karthik Kambatla commented on YARN-502:
---

Looks good to me. +1

 RM crash with NPE on NODE_REMOVED event with FairScheduler
 --

 Key: YARN-502
 URL: https://issues.apache.org/jira/browse/YARN-502
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.0.3-alpha
Reporter: Lohit Vijayarenu
Assignee: Mayank Bansal
 Attachments: YARN-502-trunk-1.patch, YARN-502-trunk-2.patch


 While running some test and adding/removing nodes, we see RM crashed with the 
 below exception. We are testing with fair scheduler and running 
 hadoop-2.0.3-alpha
 {noformat}
 2013-03-22 18:54:27,015 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
 Node :55680 as it is now LOST
 2013-03-22 18:54:27,015 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: :55680 
 Node Transitioned from UNHEALTHY to LOST
 2013-03-22 18:54:27,015 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type NODE_REMOVED to the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeNode(FairScheduler.java:619)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:856)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:98)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:375)
 at java.lang.Thread.run(Thread.java:662)
 2013-03-22 18:54:27,016 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
 2013-03-22 18:54:27,020 INFO org.mortbay.log: Stopped 
 SelectChannelConnector@:50030
 {noformat}
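
For readers following the stack trace above: the NPE comes out of 
FairScheduler.removeNode() when a NODE_REMOVED event arrives for a node the 
scheduler no longer tracks (here, one that transitioned UNHEALTHY to LOST). 
Below is a minimal sketch of the kind of guard such a fix typically adds; the 
field and variable names are assumptions, so see the attached patches for the 
real change.

{code}
// Hypothetical guard in FairScheduler.removeNode(); nodes and LOG are assumed
// to be existing fields of the scheduler.
private synchronized void removeNode(RMNode rmNode) {
  FSSchedulerNode node = nodes.get(rmNode.getNodeID());
  if (node == null) {
    LOG.info("Ignoring NODE_REMOVED for unknown node " + rmNode.getNodeID());
    return;  // the node was never added or was already removed
  }
  // ... existing removal logic ...
}
{code}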

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira