[jira] [Assigned] (YARN-715) TestDistributedShell and TestUnmanagedAMLauncher are failing

2013-05-23 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen reassigned YARN-715:


Assignee: Zhijie Shen  (was: Vinod Kumar Vavilapalli)

 TestDistributedShell and TestUnmanagedAMLauncher are failing
 

 Key: YARN-715
 URL: https://issues.apache.org/jira/browse/YARN-715
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siddharth Seth
Assignee: Zhijie Shen
 Fix For: 2.0.5-beta

 Attachments: YARN-715-20130522.txt


 Tests are timing out. Looks like this is related to YARN-617.
 {code}
 2013-05-21 17:40:23,693 ERROR [IPC Server handler 0 on 54024] 
 containermanager.ContainerManagerImpl 
 (ContainerManagerImpl.java:authorizeRequest(412)) - Unauthorized request to 
 start container.
 Expected containerId: user Found: container_1369183214008_0001_01_01
 2013-05-21 17:40:23,694 ERROR [IPC Server handler 0 on 54024] 
 security.UserGroupInformation (UserGroupInformation.java:doAs(1492)) - 
 PriviledgedActionException as:user (auth:SIMPLE) cause:org.apache.hado
 Expected containerId: user Found: container_1369183214008_0001_01_01
 2013-05-21 17:40:23,695 INFO  [IPC Server handler 0 on 54024] ipc.Server 
 (Server.java:run(1864)) - IPC Server handler 0 on 54024, call 
 org.apache.hadoop.yarn.api.ContainerManagerPB.startContainer from 10.
 Expected containerId: user Found: container_1369183214008_0001_01_01
 org.apache.hadoop.yarn.exceptions.YarnRemoteException: Unauthorized request 
 to start container.
 Expected containerId: user Found: container_1369183214008_0001_01_01
   at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:43)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.authorizeRequest(ContainerManagerImpl.java:413)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainer(ContainerManagerImpl.java:440)
   at 
 org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagerPBServiceImpl.startContainer(ContainerManagerPBServiceImpl.java:72)
   at 
 org.apache.hadoop.yarn.proto.ContainerManager$ContainerManagerService$2.callBlockingMethod(ContainerManager.java:83)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:527)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (YARN-715) TestDistributedShell and TestUnmanagedAMLauncher are failing

2013-05-23 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen reassigned YARN-715:


Assignee: Vinod Kumar Vavilapalli  (was: Zhijie Shen)

Pressed the wrong button, sorry. Returning it to Vinod.

 TestDistributedShell and TestUnmanagedAMLauncher are failing
 

 Key: YARN-715
 URL: https://issues.apache.org/jira/browse/YARN-715
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siddharth Seth
Assignee: Vinod Kumar Vavilapalli
 Fix For: 2.0.5-beta

 Attachments: YARN-715-20130522.txt


 Tests are timing out. Looks like this is related to YARN-617.
 {code}
 2013-05-21 17:40:23,693 ERROR [IPC Server handler 0 on 54024] 
 containermanager.ContainerManagerImpl 
 (ContainerManagerImpl.java:authorizeRequest(412)) - Unauthorized request to 
 start container.
 Expected containerId: user Found: container_1369183214008_0001_01_01
 2013-05-21 17:40:23,694 ERROR [IPC Server handler 0 on 54024] 
 security.UserGroupInformation (UserGroupInformation.java:doAs(1492)) - 
 PriviledgedActionException as:user (auth:SIMPLE) cause:org.apache.hado
 Expected containerId: user Found: container_1369183214008_0001_01_01
 2013-05-21 17:40:23,695 INFO  [IPC Server handler 0 on 54024] ipc.Server 
 (Server.java:run(1864)) - IPC Server handler 0 on 54024, call 
 org.apache.hadoop.yarn.api.ContainerManagerPB.startContainer from 10.
 Expected containerId: user Found: container_1369183214008_0001_01_01
 org.apache.hadoop.yarn.exceptions.YarnRemoteException: Unauthorized request 
 to start container.
 Expected containerId: user Found: container_1369183214008_0001_01_01
   at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:43)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.authorizeRequest(ContainerManagerImpl.java:413)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainer(ContainerManagerImpl.java:440)
   at 
 org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagerPBServiceImpl.startContainer(ContainerManagerPBServiceImpl.java:72)
   at 
 org.apache.hadoop.yarn.proto.ContainerManager$ContainerManagerService$2.callBlockingMethod(ContainerManager.java:83)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:527)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-422) Add NM client library

2013-05-24 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-422:
-

Attachment: YARN-422.10.patch

Thanks, Vinod! Addressed most of your comments except the following:

bq. Once you do the above. cleanupRunningContainers need not be synchronized.

synchronized is kept for consistency of synchronization on startedContainers. It 
won't be a performance bottleneck, as it is called when NMClient stops.

The biggest change in the newest patch is the use of a state machine.
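
For reference, here is a minimal sketch of how a lifecycle can be modeled with the 
yarn.state StateMachineFactory; the states, events, and class names below are 
illustrative only and are not the ones in the patch.

{code}
import org.apache.hadoop.yarn.state.SingleArcTransition;
import org.apache.hadoop.yarn.state.StateMachine;
import org.apache.hadoop.yarn.state.StateMachineFactory;

public class ContainerLifecycleSketch {
  // Hypothetical states and events, for illustration only.
  enum State { PREP, RUNNING, DONE, FAILED }
  enum EventType { START_SUCCEEDED, START_FAILED, STOP_SUCCEEDED }
  static class Event {
    final EventType type;
    Event(EventType type) { this.type = type; }
  }

  private static final
      StateMachineFactory<ContainerLifecycleSketch, State, EventType, Event> factory =
        new StateMachineFactory<ContainerLifecycleSketch, State, EventType, Event>(State.PREP)
          .addTransition(State.PREP, State.RUNNING, EventType.START_SUCCEEDED)
          .addTransition(State.PREP, State.FAILED, EventType.START_FAILED,
              new SingleArcTransition<ContainerLifecycleSketch, Event>() {
                @Override
                public void transition(ContainerLifecycleSketch c, Event e) {
                  // e.g. report the failure to an error callback here
                }
              })
          .addTransition(State.RUNNING, State.DONE, EventType.STOP_SUCCEEDED)
          .installTopology();

  private final StateMachine<State, EventType, Event> stateMachine = factory.make(this);

  public void handle(Event event) {
    // Throws InvalidStateTransitonException for transitions not declared above.
    stateMachine.doTransition(event.type, event);
  }
}
{code}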



 Add NM client library
 -

 Key: YARN-422
 URL: https://issues.apache.org/jira/browse/YARN-422
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Zhijie Shen
 Attachments: AMNMClient_Defination.txt, 
 AMNMClient_Definition_Updated_With_Tests.txt, proposal_v1.pdf, 
 YARN-422.10.patch, YARN-422.1.patch, YARN-422.2.patch, YARN-422.3.patch, 
 YARN-422.4.patch, YARN-422.5.patch, YARN-422.6.patch, YARN-422.8.patch, 
 YARN-422.9.patch, YARN-422-javadoc-fixes.txt


 Create a simple wrapper over the ContainerManager protocol to hide the 
 details of the protocol implementation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-422) Add NM client library

2013-05-24 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-422:
-

Attachment: YARN-422.11.patch

Fix the findbug problem

 Add NM client library
 -

 Key: YARN-422
 URL: https://issues.apache.org/jira/browse/YARN-422
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Zhijie Shen
 Attachments: AMNMClient_Defination.txt, 
 AMNMClient_Definition_Updated_With_Tests.txt, proposal_v1.pdf, 
 YARN-422.10.patch, YARN-422.11.patch, YARN-422.1.patch, YARN-422.2.patch, 
 YARN-422.3.patch, YARN-422.4.patch, YARN-422.5.patch, YARN-422.6.patch, 
 YARN-422.8.patch, YARN-422.9.patch, YARN-422-javadoc-fixes.txt


 Create a simple wrapper over the ContainerManager protocol to hide the 
 details of the protocol implementation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-422) Add NM client library

2013-05-24 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-422:
-

Attachment: YARN-422.12.patch

Thanks, Vinod! Addressed your comments. One minor thing is that I still don't 
use newInstance to construct the complex records, such as 
ApplicationSubmissionContext and ContainerLaunchContext, which have too many 
parameters to fill.
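
As a small illustration of the alternative (whether or not this is exactly what 
the patch does): a record can be created empty and populated with setters; the 
values below are placeholders.

{code}
import java.util.Collections;
import java.util.HashMap;
import java.util.List;

import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.util.Records;

public class LaunchContextSketch {
  public static ContainerLaunchContext buildContext(List<String> commands) {
    // Create an empty record and fill only the fields of interest, rather
    // than passing a long positional argument list to newInstance.
    ContainerLaunchContext ctx = Records.newRecord(ContainerLaunchContext.class);
    ctx.setCommands(commands);
    ctx.setEnvironment(new HashMap<String, String>());
    ctx.setLocalResources(Collections.<String, LocalResource>emptyMap());
    return ctx;
  }
}
{code}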

 Add NM client library
 -

 Key: YARN-422
 URL: https://issues.apache.org/jira/browse/YARN-422
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Zhijie Shen
 Attachments: AMNMClient_Defination.txt, 
 AMNMClient_Definition_Updated_With_Tests.txt, proposal_v1.pdf, 
 YARN-422.10.patch, YARN-422.11.patch, YARN-422.12.patch, YARN-422.1.patch, 
 YARN-422.2.patch, YARN-422.3.patch, YARN-422.4.patch, YARN-422.5.patch, 
 YARN-422.6.patch, YARN-422.8.patch, YARN-422.9.patch, 
 YARN-422-javadoc-fixes.txt


 Create a simple wrapper over the ContainerManager protocol to hide the 
 details of the protocol implementation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-422) Add NM client library

2013-05-24 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-422:
-

Attachment: YARN-422.13.patch

Fix the findbug

 Add NM client library
 -

 Key: YARN-422
 URL: https://issues.apache.org/jira/browse/YARN-422
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Zhijie Shen
 Attachments: AMNMClient_Defination.txt, 
 AMNMClient_Definition_Updated_With_Tests.txt, proposal_v1.pdf, 
 YARN-422.10.patch, YARN-422.11.patch, YARN-422.12.patch, YARN-422.13.patch, 
 YARN-422.1.patch, YARN-422.2.patch, YARN-422.3.patch, YARN-422.4.patch, 
 YARN-422.5.patch, YARN-422.6.patch, YARN-422.8.patch, YARN-422.9.patch, 
 YARN-422-javadoc-fixes.txt


 Create a simple wrapper over the ContainerManager protocol to hide the 
 details of the protocol implementation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-422) Add NM client library

2013-05-28 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-422:
-

Attachment: YARN-422.14.patch

Addressed Vinod's comments. One notable change is that now all the errors go 
through the error callback functions, and startContainer, stopContainer and 
getContainerStatus of NMClientAsync don't throw exceptions anymore.
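
As a rough illustration of that error-callback pattern (not the actual 
NMClientAsync code or its exact interface), a failure from the underlying 
blocking call can be reported to a handler instead of being rethrown:

{code}
// Hypothetical sketch; the interface and method names are illustrative only.
public class AsyncStarterSketch {
  public interface ErrorHandler {
    void onStartContainerError(String containerId, Throwable t);
  }

  private final ErrorHandler handler;

  public AsyncStarterSketch(ErrorHandler handler) {
    this.handler = handler;
  }

  public void startContainerAsync(final String containerId, final Runnable blockingStart) {
    new Thread(new Runnable() {
      @Override
      public void run() {
        try {
          blockingStart.run();  // the blocking start-container call
        } catch (Throwable t) {
          // Report through the callback instead of propagating the exception.
          handler.onStartContainerError(containerId, t);
        }
      }
    }).start();
  }
}
{code}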

 Add NM client library
 -

 Key: YARN-422
 URL: https://issues.apache.org/jira/browse/YARN-422
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Zhijie Shen
 Attachments: AMNMClient_Defination.txt, 
 AMNMClient_Definition_Updated_With_Tests.txt, proposal_v1.pdf, 
 YARN-422.10.patch, YARN-422.11.patch, YARN-422.12.patch, YARN-422.13.patch, 
 YARN-422.14.patch, YARN-422.1.patch, YARN-422.2.patch, YARN-422.3.patch, 
 YARN-422.4.patch, YARN-422.5.patch, YARN-422.6.patch, YARN-422.8.patch, 
 YARN-422.9.patch, YARN-422-javadoc-fixes.txt


 Create a simple wrapper over the ContainerManager protocol to hide the 
 details of the protocol implementation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-639) Make AM of Distributed Shell Use NMClient

2013-05-29 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-639:
-

Attachment: YARN-639.1.patch

Changed the AM of the distributed shell to use NMClient, and updated the tests.

 Make AM of Distributed Shell Use NMClient
 -

 Key: YARN-639
 URL: https://issues.apache.org/jira/browse/YARN-639
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-639.1.patch


 YARN-422 adds NMClient. AM of Distributed Shell should use it instead of 
 using ContainerManager directly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-730) NMClientAsync needs to remove completed container

2013-05-29 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-730:


 Summary: NMClientAsync needs to remove completed container
 Key: YARN-730
 URL: https://issues.apache.org/jira/browse/YARN-730
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen


NMClientAsync needs to remove completed container

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-730) NMClientAsync needs to remove completed container

2013-05-29 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-730:
-

Attachment: YARN-730.1.patch

If the container has failed or is done, it should be removed from the 
StartedContainer collection.
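
A minimal sketch of the bookkeeping, with hypothetical names (the actual patch 
may differ): keep started containers in a concurrent map and drop the entry once 
the container completes or fails.

{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch; StartedContainer and the map name are illustrative only.
public class StartedContainerTrackerSketch {
  static class StartedContainer { /* container id, node id, state, ... */ }

  private final ConcurrentMap<String, StartedContainer> startedContainers =
      new ConcurrentHashMap<String, StartedContainer>();

  public void onStarted(String containerId, StartedContainer c) {
    startedContainers.put(containerId, c);
  }

  public void onCompletedOrFailed(String containerId) {
    // Remove the entry so the collection does not grow without bound.
    startedContainers.remove(containerId);
  }
}
{code}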

 NMClientAsync needs to remove completed container
 -

 Key: YARN-730
 URL: https://issues.apache.org/jira/browse/YARN-730
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-730.1.patch


 NMClientAsync needs to remove completed container

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-733) TestNMClient fails occasionally

2013-05-31 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-733:
-

Attachment: YARN-733.1.patch

In the patch:
1. Update the tests to wait until the expected container status occurs (see the 
sketch below).
2. In NMClientImpl, add javadoc describing that startContainer/stopContainer 
returning doesn't mean the container has actually been started/stopped; the 
container status may still be in transition.

I have run the test dozens of times, and no failure occurred.
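
A minimal sketch of the waiting approach, reusing the getContainerStatus 
signature quoted in the issue description below; the helper name, timeout, and 
poll interval are arbitrary.

{code}
// Hedged sketch: poll getContainerStatus until the expected state shows up,
// instead of asserting immediately after startContainer/stopContainer returns.
private ContainerStatus waitForContainerState(NMClient nmClient, Container container,
    ContainerState expected, long timeoutMillis) throws Exception {
  long deadline = System.currentTimeMillis() + timeoutMillis;
  ContainerStatus status = nmClient.getContainerStatus(
      container.getId(), container.getNodeId(), container.getContainerToken());
  while (status.getState() != expected && System.currentTimeMillis() < deadline) {
    Thread.sleep(100);
    status = nmClient.getContainerStatus(
        container.getId(), container.getNodeId(), container.getContainerToken());
  }
  return status;
}
{code}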

 TestNMClient fails occasionally
 ---

 Key: YARN-733
 URL: https://issues.apache.org/jira/browse/YARN-733
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-733.1.patch


 The problem happens at:
 {code}
 // getContainerStatus can be called after stopContainer
 try {
   ContainerStatus status = nmClient.getContainerStatus(
   container.getId(), container.getNodeId(),
   container.getContainerToken());
   assertEquals(container.getId(), status.getContainerId());
   assertEquals(ContainerState.RUNNING, status.getState());
   assertTrue("" + i, status.getDiagnostics().contains(
   "Container killed by the ApplicationMaster."));
   assertEquals(-1000, status.getExitStatus());
 } catch (YarnRemoteException e) {
   fail("Exception is not expected");
 }
 {code}
 NMClientImpl#stopContainer returns, but the container hasn't necessarily been 
 stopped yet. ContainerManagerImpl implements stopContainer in an async style, 
 so the container's status is in transition: NMClientImpl#getContainerStatus 
 called immediately after stopContainer will return either the RUNNING status 
 or the COMPLETE one.
 There is a similar problem wrt NMClientImpl#startContainer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (YARN-720) container-log4j.properties should not refer to mapreduce properties

2013-05-31 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen reassigned YARN-720:


Assignee: Zhijie Shen

 container-log4j.properties should not refer to mapreduce properties
 ---

 Key: YARN-720
 URL: https://issues.apache.org/jira/browse/YARN-720
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Zhijie Shen

 This refers to yarn.app.mapreduce.container.log.dir and 
 yarn.app.mapreduce.container.log.filesize. These should either be moved into 
 the MR codebase, or alternatively the parameters should be renamed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-733) TestNMClient fails occasionally

2013-05-31 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-733:
-

Attachment: YARN-733.2.patch

Fixed the javadoc and refactored the test. Thanks to Omkar and Vinod for the review.

 TestNMClient fails occasionally
 ---

 Key: YARN-733
 URL: https://issues.apache.org/jira/browse/YARN-733
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-733.1.patch, YARN-733.2.patch


 The problem happens at:
 {code}
 // getContainerStatus can be called after stopContainer
 try {
   ContainerStatus status = nmClient.getContainerStatus(
   container.getId(), container.getNodeId(),
   container.getContainerToken());
   assertEquals(container.getId(), status.getContainerId());
   assertEquals(ContainerState.RUNNING, status.getState());
   assertTrue("" + i, status.getDiagnostics().contains(
   "Container killed by the ApplicationMaster."));
   assertEquals(-1000, status.getExitStatus());
 } catch (YarnRemoteException e) {
   fail("Exception is not expected");
 }
 {code}
 NMClientImpl#stopContainer returns, but the container hasn't necessarily been 
 stopped yet. ContainerManagerImpl implements stopContainer in an async style, 
 so the container's status is in transition: NMClientImpl#getContainerStatus 
 called immediately after stopContainer will return either the RUNNING status 
 or the COMPLETE one.
 There is a similar problem wrt NMClientImpl#startContainer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-761) TestNMClientAsync fails sometimes

2013-06-04 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-761:
-

Assignee: Zhijie Shen  (was: Vinod Kumar Vavilapalli)

 TestNMClientAsync fails sometimes
 -

 Key: YARN-761
 URL: https://issues.apache.org/jira/browse/YARN-761
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Zhijie Shen
 Attachments: YARN-761.1.patch


 See https://builds.apache.org/job/PreCommit-YARN-Build/1101//testReport/.
 It passed on my machine though.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-761) TestNMClientAsync fails sometimes

2013-06-04 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-761:
-

Attachment: YARN-761.1.patch

It's a race condition, but it is restricted to the test.

It happens when
{code}
Assert.assertEquals("Completed container is not removed", 0,
    asyncClient.containers.size());
{code}
executes before all ContainerEventProcessors complete and the containers are 
removed.

After the following step:
{code}
while (!((TestCallbackHandler1) asyncClient.callbackHandler)
.isStopFailureCallsExecuted()) {
  Thread.sleep(10);
}
{code}
all the callback functions have been executed, but the remaining operations in 
ContainerEventProcessors#run may still not have been executed.
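
One way to tighten the test, sketched with the test's own field names but 
otherwise hypothetical, is to also wait (with a timeout) for the container map 
to drain before asserting:

{code}
// Hedged sketch: give the ContainerEventProcessors time to finish their
// remaining work before checking that the completed containers were removed.
int waitedMillis = 0;
while (asyncClient.containers.size() > 0 && waitedMillis < 1000) {
  Thread.sleep(10);
  waitedMillis += 10;
}
Assert.assertEquals("Completed container is not removed", 0,
    asyncClient.containers.size());
{code}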

 TestNMClientAsync fails sometimes
 -

 Key: YARN-761
 URL: https://issues.apache.org/jira/browse/YARN-761
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Attachments: YARN-761.1.patch


 See https://builds.apache.org/job/PreCommit-YARN-Build/1101//testReport/.
 It passed on my machine though.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-641) Make AMLauncher in RM Use NMClient

2013-06-06 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13677368#comment-13677368
 ] 

Zhijie Shen commented on YARN-641:
--

One blocking issue here is that if AMLauncher wants to refer to NMClient, the 
resourcemanager sub-project needs to add a dependency on yarn-client. On the 
other hand, yarn-client already depends on resourcemanager in the test scope. It 
seems that the cyclic dependency error will be reported no matter what the scope 
of the dependency is. Correct me if I am wrong.

IMHO, the workaround could be refactoring the code: either moving the NMClient 
classes to hadoop-common, or moving the test classes to their related server 
sub-projects and eliminating yarn-client's dependency on the server sub-projects. 
Any suggestions?

 Make AMLauncher in RM Use NMClient
 --

 Key: YARN-641
 URL: https://issues.apache.org/jira/browse/YARN-641
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen

 YARN-422 adds NMClient. RM's AMLauncher is responsible for the interactions 
 with an application's AM container. AMLauncher should also replace the raw 
 ContainerManager proxy with NMClient.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (YARN-731) RPCUtil.unwrapAndThrowException should unwrap remote RuntimeExceptions

2013-06-06 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen reassigned YARN-731:


Assignee: Zhijie Shen

 RPCUtil.unwrapAndThrowException should unwrap remote RuntimeExceptions
 --

 Key: YARN-731
 URL: https://issues.apache.org/jira/browse/YARN-731
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.1.0-beta
Reporter: Siddharth Seth
Assignee: Zhijie Shen

 Will be required for YARN-662. Also, remote NPEs show up incorrectly for some 
 unit tests.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-777) Remove unreferenced objects from proto

2013-06-07 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13678448#comment-13678448
 ] 

Zhijie Shen commented on YARN-777:
--

+1. I checked the code base; StringURLMapProto is unused. It's a straightforward 
change, so no tests are needed.

 Remove unreferenced objects from proto
 --

 Key: YARN-777
 URL: https://issues.apache.org/jira/browse/YARN-777
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-777.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-737) Some Exceptions no longer need to be wrapped by YarnException and can be directly thrown out after YARN-142

2013-06-08 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13678789#comment-13678789
 ] 

Zhijie Shen commented on YARN-737:
--

Searched around the source code. It seems that the following exceptions can be 
unwrapped as well.

TestLocalContainerAllocator
{code}
when(scheduler.allocate(isA(AllocateRequest.class)))
  .thenThrow(RPCUtil.getRemoteException(new IOException("forcefail")));
{code}

ContainerManagerImpl
{code}
try {
  tokenIdentifier = BuilderUtils.newContainerTokenIdentifier(token);
} catch (IOException e) {
  throw RPCUtil.getRemoteException(e);
}
{code}

{code}
  } catch (IOException e) {
throw RPCUtil.getRemoteException(e);
  }
{code}

MiniYARNCluster

{code}
  } catch (YarnException ioe) {
LOG.info("Exception in heartbeat from node "
    + request.getNodeStatus().getNodeId(), ioe);
throw RPCUtil.getRemoteException(ioe);
  }
{code}

{code}
  } catch (YarnException ioe) {
LOG.info("Exception in node registration from "
    + request.getNodeId().toString(), ioe);
throw RPCUtil.getRemoteException(ioe);
  }
{code}
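
For instance, assuming the enclosing method is allowed to declare the IOException 
after YARN-142, the ContainerManagerImpl case above could drop the wrapping 
entirely (a sketch, not the actual change):

{code}
// Before:
//   try {
//     tokenIdentifier = BuilderUtils.newContainerTokenIdentifier(token);
//   } catch (IOException e) {
//     throw RPCUtil.getRemoteException(e);
//   }
// After (sketch): let the IOException propagate directly.
tokenIdentifier = BuilderUtils.newContainerTokenIdentifier(token);
{code}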

 Some Exceptions no longer need to be wrapped by YarnException and can be 
 directly thrown out after YARN-142 
 

 Key: YARN-737
 URL: https://issues.apache.org/jira/browse/YARN-737
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-737.1.patch, YARN-737.2.patch, YARN-737.3.patch, 
 YARN-737.4.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-641) Make AMLauncher in RM Use NMClient

2013-06-10 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-641:
-

Attachment: YARN-641.1.patch

In the patch, ApplicationMasterLauncher is changed to extend NMClientAsync, 
and AMLauncher is changed to make use of the NMClient APIs to start/stop AM 
containers.

A number of tests that previously used the ContainerManager APIs directly have 
been changed to use the NMClient APIs instead.

Last but not least, due to the mvn dependency-check issue, all the tests in 
yarn-client have been moved to server-tests, yarn-client no longer depends on 
the server sub-projects, and server-resourcemanager now depends on yarn-client.

 Make AMLauncher in RM Use NMClient
 --

 Key: YARN-641
 URL: https://issues.apache.org/jira/browse/YARN-641
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-641.1.patch


 YARN-422 adds NMClient. RM's AMLauncher is responsible for the interactions 
 with an application's AM container. AMLauncher should also replace the raw 
 ContainerManager proxy with NMClient.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-731) RPCUtil.unwrapAndThrowException should unwrap remote RuntimeExceptions

2013-06-10 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-731:
-

Attachment: YARN-731.1.patch

Added a block in RPCUtil#unwrapAndThrowException to handle runtime 
exceptions. Corresponding tests are added.
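
A rough sketch of the kind of branch such handling involves (the surrounding 
method and variable names are assumptions, not the actual patch):

{code}
// Hedged sketch: if the cause carried by the ServiceException is a
// RuntimeException (e.g. an NPE), rethrow it as-is instead of wrapping it.
Throwable cause = serviceException.getCause();
if (cause instanceof RuntimeException) {
  throw (RuntimeException) cause;
}
{code}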

 RPCUtil.unwrapAndThrowException should unwrap remote RuntimeExceptions
 --

 Key: YARN-731
 URL: https://issues.apache.org/jira/browse/YARN-731
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.1.0-beta
Reporter: Siddharth Seth
Assignee: Zhijie Shen
 Attachments: YARN-731.1.patch


 Will be required for YARN-662. Also, remote NPEs show up incorrectly for some 
 unit tests.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-731) RPCUtil.unwrapAndThrowException should unwrap remote RuntimeExceptions

2013-06-11 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-731:
-

Attachment: YARN-731.2.patch

Fixed the minor nits. Thanks, Sid!

 RPCUtil.unwrapAndThrowException should unwrap remote RuntimeExceptions
 --

 Key: YARN-731
 URL: https://issues.apache.org/jira/browse/YARN-731
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.1.0-beta
Reporter: Siddharth Seth
Assignee: Zhijie Shen
 Attachments: YARN-731.1.patch, YARN-731.2.patch


 Will be required for YARN-662. Also, remote NPEs show up incorrectly for some 
 unit tests.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-641) Make AMLauncher in RM Use NMClient

2013-06-12 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-641:
-

Attachment: YARN-641.2.patch

Updated the patch to make the use of NMClient configurable.

 Make AMLauncher in RM Use NMClient
 --

 Key: YARN-641
 URL: https://issues.apache.org/jira/browse/YARN-641
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-641.1.patch, YARN-641.2.patch


 YARN-422 adds NMClient. RM's AMLauncher is responsible for the interactions 
 with an application's AM container. AMLauncher should also replace the raw 
 ContainerManager proxy with NMClient.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-641) Make AMLauncher in RM Use NMClient

2013-06-12 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-641:
-

Attachment: YARN-641.3.patch

Fix the test failure.

 Make AMLauncher in RM Use NMClient
 --

 Key: YARN-641
 URL: https://issues.apache.org/jira/browse/YARN-641
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-641.1.patch, YARN-641.2.patch, YARN-641.3.patch


 YARN-422 adds NMClient. RM's AMLauncher is responsible for the interactions 
 with an application's AM container. AMLauncher should also replace the raw 
 ContainerManager proxy with NMClient.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-800) Clicking on an AM link for a running app leads to a HTTP 500

2013-06-12 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681838#comment-13681838
 ] 

Zhijie Shen commented on YARN-800:
--

Did a quick local test and found the link was not broken. It seems that the 
default value is already in yarn-default.xml:
{code}
  <property>
    <description>The hostname of the RM.</description>
    <name>yarn.resourcemanager.hostname</name>
    <value>0.0.0.0</value>
  </property>
{code}

{code}
  <property>
    <description>The address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>${yarn.resourcemanager.hostname}:8088</value>
  </property>
{code}

and YarnConfiguration

{code}
  public static final String RM_WEBAPP_ADDRESS =
      RM_PREFIX + "webapp.address";

  public static final int DEFAULT_RM_WEBAPP_PORT = 8088;
  public static final String DEFAULT_RM_WEBAPP_ADDRESS = "0.0.0.0:" +
      DEFAULT_RM_WEBAPP_PORT;
{code}

Looking into the code, it seems to be related to yarn.web-proxy.address. In 
WebAppProxyServlet,

{code}
  resp.setStatus(client.executeMethod(config, method));
{code}

tries to connect to the proxy host to show the application webpage. If 
yarn.web-proxy.address is not set, the RM will become the proxy, and its address 
will be $\{yarn.resourcemanager.hostname\}:8088 as well.

It may be worth checking the configuration of yarn.web-proxy.address.

 Clicking on an AM link for a running app leads to a HTTP 500
 

 Key: YARN-800
 URL: https://issues.apache.org/jira/browse/YARN-800
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Arpit Gupta
Priority: Critical

 Clicking the AM link tries to open up a page with url like
 http://hostname:8088/proxy/application_1370886527995_0645/
 and this leads to an HTTP 500

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-803) factor out scheduler config validation from the ResourceManager to each scheduler implementation

2013-06-13 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682444#comment-13682444
 ] 

Zhijie Shen commented on YARN-803:
--

It sounds like a good idea to support polymorphic config validation. The patch 
looks almost fine. Here are two minor suggestions:

1. As all schedulers have implemented setConf, how about defining it in 
YarnScheduler as well? That way, any newly added scheduler in the future will 
also be forced to implement the method to validate its config (it probably has 
to do so anyway).

2. setConf sounds a bit confusing, because the method doesn't set the config, 
but validates it. How about renaming it to validateConf? (A rough sketch 
follows below.)
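
A rough sketch of what suggestion 2 could look like; the config keys, defaults, 
and exception type are assumptions, not the patch itself.

{code}
// Hedged sketch: a scheduler-side validation hook, called from the scheduler
// itself rather than from the ResourceManager.
private void validateConf(Configuration conf) {
  int minMem = conf.getInt("yarn.scheduler.minimum-allocation-mb", 1024);
  int maxMem = conf.getInt("yarn.scheduler.maximum-allocation-mb", 8192);
  if (minMem <= 0 || minMem > maxMem) {
    throw new IllegalArgumentException("Invalid scheduler memory allocation"
        + " configuration: minimum = " + minMem + ", maximum = " + maxMem);
  }
}
{code}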

 factor out scheduler config validation from the ResourceManager to each 
 scheduler implementation
 

 Key: YARN-803
 URL: https://issues.apache.org/jira/browse/YARN-803
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.0.4-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: YARN-803.patch


 Per discussion in YARN-789 we should factor out from the ResourceManager 
 class the scheduler config validations.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-803) factor out scheduler config validation from the ResourceManager to each scheduler implementation

2013-06-13 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682652#comment-13682652
 ] 

Zhijie Shen commented on YARN-803:
--

looks good. +1

 factor out scheduler config validation from the ResourceManager to each 
 scheduler implementation
 

 Key: YARN-803
 URL: https://issues.apache.org/jira/browse/YARN-803
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.0.4-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: YARN-803.patch, YARN-803.patch


 Per discussion in YARN-789 we should factor out from the ResourceManager 
 class the scheduler config validations.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-812) Enabling app summary logs causes 'FileNotFound' errors

2013-06-13 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13683087#comment-13683087
 ] 

Zhijie Shen commented on YARN-812:
--

+1 for the patch.

One related question: does hadoop.mapreduce.jobsummary.logger similarly need to 
be set in hadoop-env.sh if we want to enable it? If so, maybe it's good to add 
a similar comment for it as well.

 Enabling app summary logs causes 'FileNotFound' errors
 --

 Key: YARN-812
 URL: https://issues.apache.org/jira/browse/YARN-812
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Ramya Sunil
Assignee: Siddharth Seth
 Attachments: YARN-812.2.txt, YARN-812.txt


 RM app summary logs have been enabled as per the default config:
 {noformat}
 #
 # Yarn ResourceManager Application Summary Log 
 #
 # Set the ResourceManager summary log filename
 yarn.server.resourcemanager.appsummary.log.file=rm-appsummary.log
 # Set the ResourceManager summary log level and appender
 yarn.server.resourcemanager.appsummary.logger=INFO,RMSUMMARY
 # Appender for ResourceManager Application Summary Log
 # Requires the following properties to be set
 #- hadoop.log.dir (Hadoop Log directory)
 #- yarn.server.resourcemanager.appsummary.log.file (resource manager app 
 summary log filename)
 #- yarn.server.resourcemanager.appsummary.logger (resource manager app 
 summary log level and appender)
 log4j.logger.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=${yarn.server.resourcemanager.appsummary.logger}
 log4j.additivity.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=false
 log4j.appender.RMSUMMARY=org.apache.log4j.RollingFileAppender
 log4j.appender.RMSUMMARY.File=${hadoop.log.dir}/${yarn.server.resourcemanager.appsummary.log.file}
 log4j.appender.RMSUMMARY.MaxFileSize=256MB
 log4j.appender.RMSUMMARY.MaxBackupIndex=20
 log4j.appender.RMSUMMARY.layout=org.apache.log4j.PatternLayout
 log4j.appender.RMSUMMARY.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
 {noformat}
 This however, throws errors while running commands as non-superuser:
 {noformat}
 -bash-4.1$ hadoop dfs -ls /
 DEPRECATED: Use of this script to execute hdfs command is deprecated.
 Instead use the hdfs command for it.
 log4j:ERROR setFile(null,true) call failed.
 java.io.FileNotFoundException: /var/log/hadoop/hadoopqa/rm-appsummary.log (No 
 such file or directory)
 at java.io.FileOutputStream.openAppend(Native Method)
 at java.io.FileOutputStream.<init>(FileOutputStream.java:192)
 at java.io.FileOutputStream.<init>(FileOutputStream.java:116)
 at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)
 at 
 org.apache.log4j.RollingFileAppender.setFile(RollingFileAppender.java:207)
 at 
 org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)
 at 
 org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)
 at 
 org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172)
 at 
 org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104)
 at 
 org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:842)
 at 
 org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:768)
 at 
 org.apache.log4j.PropertyConfigurator.parseCatsAndRenderers(PropertyConfigurator.java:672)
 at 
 org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:516)
 at 
 org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:580)
 at 
 org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526)
 at org.apache.log4j.LogManager.<clinit>(LogManager.java:127)
 at org.apache.log4j.Logger.getLogger(Logger.java:104)
 at 
 org.apache.commons.logging.impl.Log4JLogger.getLogger(Log4JLogger.java:289)
 at 
 org.apache.commons.logging.impl.Log4JLogger.<init>(Log4JLogger.java:109)
 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
 Method)
 at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
 at 
 org.apache.commons.logging.impl.LogFactoryImpl.createLogFromClass(LogFactoryImpl.java:1116)
 at 
 org.apache.commons.logging.impl.LogFactoryImpl.discoverLogImplementation(LogFactoryImpl.java:858)
 at 
 org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:604)
 at 
 

[jira] [Commented] (YARN-662) Enforce required parameters for all the protocols

2013-06-14 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13683504#comment-13683504
 ] 

Zhijie Shen commented on YARN-662:
--

Below is the strategy to enforce the required parameters, check the values, 
and set the defaults:

1. As we have no control over the Client and the AM, the protocol objects need to 
be validated (both null checks and value checks) at the ResourceManager and the 
NodeManager when they are received.

2. The protocol objects that are constructed by the ResourceManager and the 
NodeManager need to have their fields validated before they are sent out. After 
YARN-753, the modification could be limited to the factory method in each 
protocol class.

3. Instead of changing the modifier in \*.proto from *optional* to *required*, 
we add custom validation routines in the Java code (a minimal sketch follows 
below). See the suggestion in 
https://developers.google.com/protocol-buffers/docs/proto#simple

4. Default values are added in *.proto. Whenever an optional field is not set in 
Java code, the default value will be picked.
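
A minimal sketch of the server-side null check mentioned in point 3; the helper 
name and the message are illustrative, not the actual code.

{code}
// Hedged sketch of validating a received protocol object at the server side
// instead of marking the proto field "required".
private void validateStartContainerRequest(StartContainerRequest request)
    throws YarnException {
  if (request == null || request.getContainerLaunchContext() == null) {
    throw new YarnException(
        "ContainerLaunchContext must be set in StartContainerRequest");
  }
}
{code}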

 Enforce required parameters for all the protocols
 -

 Key: YARN-662
 URL: https://issues.apache.org/jira/browse/YARN-662
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Zhijie Shen

 All proto fields are marked as options. We need to mark some of them as 
 requried, or enforce these server side. Server side is likely better since 
 that's more flexible (Example deprecating a field type in favour of another - 
 either of the two must be present)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-639) Make AM of Distributed Shell Use NMClient

2013-06-14 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13683781#comment-13683781
 ] 

Zhijie Shen commented on YARN-639:
--

bq. why do we need to do getContainerStatus after successfully starting it? is 
it required?

It's not required for the functioning of the distributed shell. However, since 
the distributed shell somewhat serves as a demo application, I'd like to have 
getContainerStatus there to show the usage of NMClientAsync and its callback 
handlers.
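
For context, the synchronous form of the demo call, using the same 
getContainerStatus signature quoted in YARN-733 earlier in this thread (error 
handling elided; the log message is illustrative):

{code}
// Hedged sketch of the demo usage: fetch and log the status of a container
// that was just started.
ContainerStatus status = nmClient.getContainerStatus(
    container.getId(), container.getNodeId(), container.getContainerToken());
LOG.info("Container " + container.getId() + " is in state " + status.getState());
{code}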

 Make AM of Distributed Shell Use NMClient
 -

 Key: YARN-639
 URL: https://issues.apache.org/jira/browse/YARN-639
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-639.1.patch


 YARN-422 adds NMClient. AM of Distributed Shell should use it instead of 
 using ContainerManager directly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-639) Make AM of Distributed Shell Use NMClient

2013-06-14 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-639:
-

Attachment: YARN-639.2.patch

The Jenkins build seems to have been killed. Resubmitting the same patch to kick 
off the tests again.

 Make AM of Distributed Shell Use NMClient
 -

 Key: YARN-639
 URL: https://issues.apache.org/jira/browse/YARN-639
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-639.1.patch, YARN-639.2.patch, YARN-639.2.patch


 YARN-422 adds NMClient. AM of Distributed Shell should use it instead of 
 using ContainerManager directly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-826) Move Clock/SystemClock to util package

2013-06-15 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-826:


 Summary: Move Clock/SystemClock to util package
 Key: YARN-826
 URL: https://issues.apache.org/jira/browse/YARN-826
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen


Clock/SystemClock should belong to util.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-826) Move Clock/SystemClock to util package

2013-06-15 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-826:
-

Attachment: YARN-826.1.patch

Moved Clock/SystemClock; the patch needs to be rebased once YARN-825 is checked in.

 Move Clock/SystemClock to util package
 --

 Key: YARN-826
 URL: https://issues.apache.org/jira/browse/YARN-826
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-826.1.patch


 Clock/SystemClock should belong to util.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-828) Remove YarnVersionAnnotation

2013-06-16 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-828:


 Summary: Remove YarnVersionAnnotation
 Key: YARN-828
 URL: https://issues.apache.org/jira/browse/YARN-828
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen


YarnVersionAnnotation is not used at all, and the version information can be 
accessed through YarnVersionInfo instead.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-828) Remove YarnVersionAnnotation

2013-06-16 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-828:
-

Attachment: YARN-828.1.patch

The patch removes YarnVersionAnnotation.

 Remove YarnVersionAnnotation
 

 Key: YARN-828
 URL: https://issues.apache.org/jira/browse/YARN-828
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-828.1.patch


 YarnVersionAnnotation is not used at all, and the version information can be 
 accessed through YarnVersionInfo instead.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-829) Rename RMTokenSelector to be RMDelegationTokenSelector

2013-06-16 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-829:


 Summary: Rename RMTokenSelector to be RMDelegationTokenSelector
 Key: YARN-829
 URL: https://issues.apache.org/jira/browse/YARN-829
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen


This way, its name will be consistent with that of 
RMDelegationTokenIdentifier.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-829) Rename RMTokenSelector to be RMDelegationTokenSelector

2013-06-16 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-829:
-

Attachment: YARN-829.1.patch

Renamed RMTokenSelector and updated its references.

 Rename RMTokenSelector to be RMDelegationTokenSelector
 --

 Key: YARN-829
 URL: https://issues.apache.org/jira/browse/YARN-829
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-829.1.patch


 This way, its name will be consistent with that of 
 RMDelegationTokenIdentifier.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-828) Remove YarnVersionAnnotation

2013-06-16 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13684597#comment-13684597
 ] 

Zhijie Shen commented on YARN-828:
--

It's a simple class removal; it should be fine without additional tests.

 Remove YarnVersionAnnotation
 

 Key: YARN-828
 URL: https://issues.apache.org/jira/browse/YARN-828
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-828.1.patch


 YarnVersionAnnotation is not used at all, and the version information can be 
 accessed through YarnVersionInfo instead.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-829) Rename RMTokenSelector to be RMDelegationTokenSelector

2013-06-16 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13684598#comment-13684598
 ] 

Zhijie Shen commented on YARN-829:
--

Need to rebase after YARN-825 is committed.

 Rename RMTokenSelector to be RMDelegationTokenSelector
 --

 Key: YARN-829
 URL: https://issues.apache.org/jira/browse/YARN-829
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-829.1.patch


 This way, its name will be consistent with that of 
 RMDelegationTokenIdentifier.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-830) Refactor yarn.service and yarn.state in hadoop-yarn-common

2013-06-16 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-830:


 Summary: Refactor yarn.service and yarn.state in hadoop-yarn-common
 Key: YARN-830
 URL: https://issues.apache.org/jira/browse/YARN-830
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen


They are two good libraries, which are largely independent and can benefit all 
hadoop sub-projects. Therefore, it is good to move them to hadoop-common. In 
addition, Graph and VisualizeStateMachine, which are in the util package now, 
should be moved to the state package as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-830) Refactor yarn.service and yarn.state in hadoop-yarn-common

2013-06-16 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-830:
-

Attachment: YARN-830.1.patch

In addition to moving the classes, ServiceStateException and 
InvalidStateTransitonException change from extending YarnRuntimeException to 
extending RuntimeException directly.

Again, a rebase is required after YARN-825 is committed.

 Refactor yarn.service and yarn.state in hadoop-yarn-common
 --

 Key: YARN-830
 URL: https://issues.apache.org/jira/browse/YARN-830
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-830.1.patch


 They are two good libraries, which are largely independent and can benefit all 
 hadoop sub-projects. Therefore, it is good to move them to hadoop-common. In 
 addition, Graph and VisualizeStateMachine, which are in the util package now, 
 should be moved to the state package as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-830) Refactor yarn.service and yarn.state in hadoop-yarn-common

2013-06-16 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13684729#comment-13684729
 ] 

Zhijie Shen commented on YARN-830:
--

The test failure is not related, and is filed in MAPREDUCE-5327

 Refactor yarn.service and yarn.state in hadoop-yarn-common
 --

 Key: YARN-830
 URL: https://issues.apache.org/jira/browse/YARN-830
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-830.1.patch


 They are two good libraries, which are largely independent and can benefit all 
 hadoop sub-projects. Therefore, it is good to move them to hadoop-common. In 
 addition, Graph and VisualizeStateMachine, which are in the util package now, 
 should be moved to the state package as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-833) Move Graph and VisualizeStateMachine into yarn.state package

2013-06-16 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-833:


 Summary: Move Graph and VisualizeStateMachine into yarn.state 
package
 Key: YARN-833
 URL: https://issues.apache.org/jira/browse/YARN-833
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen


Graph and VisualizeStateMachine are only used by the state machine; they should 
belong to the state package.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-826) Move Clock/SystemClock to util package

2013-06-16 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-826:
-

Attachment: YARN-826.2.patch

Rebase the patch, as YARN-825 has been checked in

 Move Clock/SystemClock to util package
 --

 Key: YARN-826
 URL: https://issues.apache.org/jira/browse/YARN-826
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-826.1.patch, YARN-826.2.patch


 Clock/SystemClock should belong to util.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-833) Move Graph and VisualizeStateMachine into yarn.state package

2013-06-16 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-833:
-

Attachment: YARN-833.1.patch

Moved Graph and VisualizeStateMachine, and updated pom.xml accordingly.

 Move Graph and VisualizeStateMachine into yarn.state package
 

 Key: YARN-833
 URL: https://issues.apache.org/jira/browse/YARN-833
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-833.1.patch


 Graph and VisualizeStateMachine are only used by the state machine, so they 
 should belong to the state package.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-836) Cleanup Apps ConverterUtils StringHelper Times to avoid duplicate APIs

2013-06-16 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-836:


 Summary: Cleanup Apps  ConverterUtils  StringHelper  Times to 
avoid duplicate APIs
 Key: YARN-836
 URL: https://issues.apache.org/jira/browse/YARN-836
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-837) ClusterInfo.java doesn't seem to belong to org.apache.hadoop.yarn

2013-06-16 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-837:


 Summary: ClusterInfo.java doesn't seem to belong to 
org.apache.hadoop.yarn
 Key: YARN-837
 URL: https://issues.apache.org/jira/browse/YARN-837
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-837) ClusterInfo.java doesn't seem to belong to org.apache.hadoop.yarn

2013-06-16 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-837:
-

Attachment: YARN-837.1.patch

ClusterInfo is used by MR only, so it should be moved to the MR project.

 ClusterInfo.java doesn't seem to belong to org.apache.hadoop.yarn
 -

 Key: YARN-837
 URL: https://issues.apache.org/jira/browse/YARN-837
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-837.1.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-836) Cleanup Apps ConverterUtils StringHelper Times to avoid duplicate APIs

2013-06-17 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13685020#comment-13685020
 ] 

Zhijie Shen commented on YARN-836:
--

Did some investigation. Below is the summary:

1. Methods in Times can be integrated into Time (hadoop-common).

2. join and percent in StringHelper duplicate those in StringUtils 
(hadoop-common). It's better to combine them as well.

3. toAppID in Apps duplicates toApplicationID in ConverterUtils.
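
As a minimal sketch of item 3, assuming the current Apps.toAppID(String) and 
ConverterUtils.toApplicationId(String) shapes and hadoop-yarn-common on the 
classpath, both helpers parse the same application-id string:
{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.util.Apps;
import org.apache.hadoop.yarn.util.ConverterUtils;

// Illustrative only: two public helpers that parse the same string form of an
// application id, which is the kind of duplication this cleanup targets.
public class DuplicateApiSketch {
  public static void main(String[] args) {
    String idStr = "application_1371844267731_0001";
    ApplicationId viaApps = Apps.toAppID(idStr);
    ApplicationId viaConverter = ConverterUtils.toApplicationId(idStr);
    System.out.println(viaApps.equals(viaConverter)); // expected: true
  }
}
{code}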

 Cleanup Apps  ConverterUtils  StringHelper  Times to avoid duplicate APIs
 

 Key: YARN-836
 URL: https://issues.apache.org/jira/browse/YARN-836
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-837) ClusterInfo.java doesn't seem to belong to org.apache.hadoop.yarn

2013-06-17 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-837:
-

Attachment: YARN-837.2.patch

Rebase the patch

 ClusterInfo.java doesn't seem to belong to org.apache.hadoop.yarn
 -

 Key: YARN-837
 URL: https://issues.apache.org/jira/browse/YARN-837
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-837.1.patch, YARN-837.2.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-836) Cleanup Apps ConverterUtils StringHelper Times to avoid duplicate APIs

2013-06-17 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-836:
-

Attachment: YARN-836.1.patch

The patch merges Times and StringHelper into the corresponding classes in 
hadoop-common, and Apps is combined with ConverterUtils.

 Cleanup Apps  ConverterUtils  StringHelper  Times to avoid duplicate APIs
 

 Key: YARN-836
 URL: https://issues.apache.org/jira/browse/YARN-836
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-836.1.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-834) Review/fix annotations for yarn-client module and clearly differentiate *Async apis

2013-06-17 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-834:
-

Attachment: YARN-834.2.patch

Fix TestNMClientAsync and test it locally. Will review the patch afterwards.

 Review/fix annotations for yarn-client module and clearly differentiate 
 *Async apis
 ---

 Key: YARN-834
 URL: https://issues.apache.org/jira/browse/YARN-834
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Arun C Murthy
Assignee: Arun C Murthy
Priority: Blocker
 Fix For: 2.1.0-beta

 Attachments: YARN-834.2.patch, YARN-834.patch, YARN-834.patch


 Review/fix annotations for yarn-client module

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-834) Review/fix annotations for yarn-client module and clearly differentiate *Async apis

2013-06-17 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-834:
-

Attachment: YARN-834.2.patch

Resubmit the patch; the previous one was wrong.

 Review/fix annotations for yarn-client module and clearly differentiate 
 *Async apis
 ---

 Key: YARN-834
 URL: https://issues.apache.org/jira/browse/YARN-834
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Arun C Murthy
Assignee: Arun C Murthy
Priority: Blocker
 Fix For: 2.1.0-beta

 Attachments: YARN-834.2.patch, YARN-834.2.patch, YARN-834.patch, 
 YARN-834.patch


 Review/fix annotations for yarn-client module

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (YARN-834) Review/fix annotations for yarn-client module and clearly differentiate *Async apis

2013-06-17 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen reassigned YARN-834:


Assignee: Zhijie Shen  (was: Arun C Murthy)

 Review/fix annotations for yarn-client module and clearly differentiate 
 *Async apis
 ---

 Key: YARN-834
 URL: https://issues.apache.org/jira/browse/YARN-834
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Arun C Murthy
Assignee: Zhijie Shen
Priority: Blocker
 Fix For: 2.1.0-beta

 Attachments: YARN-834.2.patch, YARN-834.2.patch, YARN-834.patch, 
 YARN-834.patch


 Review/fix annotations for yarn-client module

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-834) Review/fix annotations for yarn-client module and clearly differentiate *Async apis

2013-06-17 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13685926#comment-13685926
 ] 

Zhijie Shen commented on YARN-834:
--

Thanks, [~acmurthy], for the initial patch. I'll continue working on it and 
address [~vinodkv]'s comments.

 Review/fix annotations for yarn-client module and clearly differentiate 
 *Async apis
 ---

 Key: YARN-834
 URL: https://issues.apache.org/jira/browse/YARN-834
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Arun C Murthy
Assignee: Zhijie Shen
Priority: Blocker
 Fix For: 2.1.0-beta

 Attachments: YARN-834.2.patch, YARN-834.2.patch, YARN-834.patch, 
 YARN-834.patch


 Review/fix annotations for yarn-client module

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-834) Review/fix annotations for yarn-client module and clearly differentiate *Async apis

2013-06-17 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-834:
-

Attachment: YARN-834.3.patch

In the patch:

1. ResourceMgrDelegate is changed to implement the YarnClient methods and to 
contain a YarnClient instance inside.

2. Clear the unnecessary import in UnmanagedAMLauncher.

3. Fix the findbugs and javadoc warnings.

4. Add package-info for all the packages in yarn-client.
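
As a side note, a minimal sketch of the delegation described in item 1; the 
Client interface and getReport method below are illustrative stand-ins, not 
the real YarnClient API:
{code}
// Hypothetical shape only: ResourceMgrDelegate-style delegation, where calls
// are forwarded to a wrapped client instance instead of being re-implemented.
interface Client {
  String getReport(String appId);
}

class DelegateSketch implements Client {
  private final Client client; // the wrapped YarnClient-style instance

  DelegateSketch(Client client) {
    this.client = client;
  }

  @Override
  public String getReport(String appId) {
    return client.getReport(appId); // no duplicated logic, just forwarding
  }
}
{code}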

 Review/fix annotations for yarn-client module and clearly differentiate 
 *Async apis
 ---

 Key: YARN-834
 URL: https://issues.apache.org/jira/browse/YARN-834
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Arun C Murthy
Assignee: Zhijie Shen
Priority: Blocker
 Fix For: 2.1.0-beta

 Attachments: YARN-834.2.patch, YARN-834.2.patch, YARN-834.3.patch, 
 YARN-834.patch, YARN-834.patch


 Review/fix annotations for yarn-client module

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-834) Review/fix annotations for yarn-client module and clearly differentiate *Async apis

2013-06-17 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-834:
-

Attachment: YARN-834.4.patch

Fix the test failure

 Review/fix annotations for yarn-client module and clearly differentiate 
 *Async apis
 ---

 Key: YARN-834
 URL: https://issues.apache.org/jira/browse/YARN-834
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Arun C Murthy
Assignee: Zhijie Shen
Priority: Blocker
 Fix For: 2.1.0-beta

 Attachments: YARN-834.2.patch, YARN-834.2.patch, YARN-834.3.patch, 
 YARN-834.4.patch, YARN-834.patch, YARN-834.patch


 Review/fix annotations for yarn-client module

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-871) Failed to run MR example against latest trunk

2013-06-21 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-871:


 Summary: Failed to run MR example against latest trunk
 Key: YARN-871
 URL: https://issues.apache.org/jira/browse/YARN-871
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen


Built the latest trunk, deployed a single node cluster and ran examples, such as

{code}
 hadoop jar 
hadoop-3.0.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-SNAPSHOT.jar
 teragen 10 out1
{code}

The job failed with the following console message:
{code}
13/06/21 12:51:25 INFO mapreduce.Job: Running job: job_1371844267731_0001
13/06/21 12:51:31 INFO mapreduce.Job: Job job_1371844267731_0001 running in 
uber mode : false
13/06/21 12:51:31 INFO mapreduce.Job:  map 0% reduce 0%
13/06/21 12:51:31 INFO mapreduce.Job: Job job_1371844267731_0001 failed with 
state FAILED due to: Application application_1371844267731_0001 failed 2 times 
due to AM Container for appattempt_1371844267731_0001_02 exited with  
exitCode: 127 due to: 
.Failing this attempt.. Failing the application.
13/06/21 12:51:31 INFO mapreduce.Job: Counters: 0
{code}




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-871) Failed to run MR example against latest trunk

2013-06-21 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-871:
-

Attachment: yarn-zshen-resourcemanager-ZShens-MacBook-Pro.local.log

Attach the RM log for diagnosis

 Failed to run MR example against latest trunk
 -

 Key: YARN-871
 URL: https://issues.apache.org/jira/browse/YARN-871
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
 Attachments: yarn-zshen-resourcemanager-ZShens-MacBook-Pro.local.log


 Built the latest trunk, deployed a single node cluster and ran examples, such 
 as
 {code}
  hadoop jar 
 hadoop-3.0.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-SNAPSHOT.jar
  teragen 10 out1
 {code}
 The job failed with the following console message:
 {code}
 13/06/21 12:51:25 INFO mapreduce.Job: Running job: job_1371844267731_0001
 13/06/21 12:51:31 INFO mapreduce.Job: Job job_1371844267731_0001 running in 
 uber mode : false
 13/06/21 12:51:31 INFO mapreduce.Job:  map 0% reduce 0%
 13/06/21 12:51:31 INFO mapreduce.Job: Job job_1371844267731_0001 failed with 
 state FAILED due to: Application application_1371844267731_0001 failed 2 
 times due to AM Container for appattempt_1371844267731_0001_02 exited 
 with  exitCode: 127 due to: 
 .Failing this attempt.. Failing the application.
 13/06/21 12:51:31 INFO mapreduce.Job: Counters: 0
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-872) BlockDecompressorStream#decompress will throw EOFException instead of return -1 when EOF

2013-06-21 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-872:


 Summary: BlockDecompressorStream#decompress will throw 
EOFException instead of return -1 when EOF
 Key: YARN-872
 URL: https://issues.apache.org/jira/browse/YARN-872
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Critical


BlockDecompressorStream#decompress ultimately calls rawReadInt, which throws 
EOFException instead of returning -1 when it encounters the end of a stream. 
Then, decompress will be called by read. However, InputStream#read is supposed 
to return -1 instead of throwing EOFException to indicate the end of a stream. 
This explains why in LineReader,
{code}
      if (bufferPosn >= bufferLength) {
        startPosn = bufferPosn = 0;
        if (prevCharCR)
          ++bytesConsumed; //account for CR from previous read
        bufferLength = in.read(buffer);
        if (bufferLength <= 0)
          break; // EOF
      }
{code}
-1 is checked instead of catching EOFException.

The problem now occurs with SnappyCodec. If an input file is compressed with 
SnappyCodec, it needs to be decompressed through BlockDecompressorStream when 
it is read. Then, if the file is empty, EOFException will be thrown from 
rawReadInt and break LineReader.
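
As a minimal, self-contained sketch of the contract mismatch (this is not the 
BlockDecompressorStream code; DataInputStream#readInt stands in for rawReadInt):
{code}
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

public class EofContractSketch {

  // Returns the next int, or -1 at end of stream, honoring the read() convention.
  static int readIntOrEof(DataInputStream in) throws IOException {
    try {
      return in.readInt();  // throws EOFException on an empty/exhausted stream
    } catch (EOFException eof) {
      return -1;            // what callers such as LineReader expect from read()
    }
  }

  public static void main(String[] args) throws IOException {
    DataInputStream empty =
        new DataInputStream(new ByteArrayInputStream(new byte[0]));
    System.out.println(readIntOrEof(empty)); // prints -1 instead of throwing
  }
}
{code}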

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-874) Tracking YARN/MR test failures after HADOOP-9421

2013-06-21 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13690904#comment-13690904
 ] 

Zhijie Shen commented on YARN-874:
--

Ran the MR example on the latest trunk and saw the following. Not sure whether it is related.

{code}
2013-06-21 16:24:35,946 WARN 
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=zshen
OPERATION=Application Finished - Failed TARGET=RMAppManager RESULT=FAILURE  
DESCRIPTION=App failed with state: FAILED   PERMISSIONS=Application 
application_1371857063893_0001 failed 2 times due to Error launching 
appattempt_1371857063893_0001_02. Got exception: java.io.IOException: 
Failed on local exception: java.io.IOException: java.io.IOException: Server 
asks us to fall back to SIMPLE auth, but this client is configured to only 
allow secure connections.; Host Details : local host is: 
ZShens-MacBook-Pro.local/10.140.1.146; destination host is: localhost:9105; 
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
at org.apache.hadoop.ipc.Client.call(Client.java:1318)
at org.apache.hadoop.ipc.Client.call(Client.java:1266)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy23.startContainer(Unknown Source)
at 
org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainer(ContainerManagementProtocolPBClientImpl.java:110)
at 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:110)
at 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:228)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:680)
Caused by: java.io.IOException: java.io.IOException: Server asks us to fall 
back to SIMPLE auth, but this client is configured to only allow secure 
connections.
at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:589)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1489)
at 
org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:552)
at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:635)
at org.apache.hadoop.ipc.Client$Connection.access$2200(Client.java:258)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1367)
at org.apache.hadoop.ipc.Client.call(Client.java:1285)
... 9 more
Caused by: java.io.IOException: Server asks us to fall back to SIMPLE auth, but 
this client is configured to only allow secure connections.
at 
org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:250)
at 
org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:464)
at org.apache.hadoop.ipc.Client$Connection.access$1500(Client.java:258)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:628)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:625)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1489)
at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:624)
... 12 more
. Failing the application.  APPID=application_1371857063893_0001
{code}

 Tracking YARN/MR test failures after HADOOP-9421
 

 Key: YARN-874
 URL: https://issues.apache.org/jira/browse/YARN-874
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
Priority: Blocker

 HADOOP-9421 seems to have broken some YARN/MR tests. Tracking those..

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-871) Failed to run MR example against latest trunk

2013-06-24 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13692108#comment-13692108
 ] 

Zhijie Shen commented on YARN-871:
--

[~devaraj.k], the posted exception seems to be related to HADOOP-9421 and 
YARN-827. YARN-874 is tracking the issue.

 Failed to run MR example against latest trunk
 -

 Key: YARN-871
 URL: https://issues.apache.org/jira/browse/YARN-871
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
 Attachments: yarn-zshen-resourcemanager-ZShens-MacBook-Pro.local.log


 Built the latest trunk, deployed a single node cluster and ran examples, such 
 as
 {code}
  hadoop jar 
 hadoop-3.0.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-SNAPSHOT.jar
  teragen 10 out1
 {code}
 The job failed with the following console message:
 {code}
 13/06/21 12:51:25 INFO mapreduce.Job: Running job: job_1371844267731_0001
 13/06/21 12:51:31 INFO mapreduce.Job: Job job_1371844267731_0001 running in 
 uber mode : false
 13/06/21 12:51:31 INFO mapreduce.Job:  map 0% reduce 0%
 13/06/21 12:51:31 INFO mapreduce.Job: Job job_1371844267731_0001 failed with 
 state FAILED due to: Application application_1371844267731_0001 failed 2 
 times due to AM Container for appattempt_1371844267731_0001_02 exited 
 with  exitCode: 127 due to: 
 .Failing this attempt.. Failing the application.
 13/06/21 12:51:31 INFO mapreduce.Job: Counters: 0
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-853) maximum-am-resource-percent doesn't work after refreshQueues command

2013-06-25 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13693277#comment-13693277
 ] 

Zhijie Shen commented on YARN-853:
--

The patch looks almost good; just one small nit: in the constructor,
{code}
this.maxAMResourcePerQueuePercent =
    cs.getConfiguration().
        getMaximumApplicationMasterResourcePerQueuePercent(getQueuePath());
{code}
maxAMResourcePerQueuePercent is assigned here and then assigned again later. 
Though it does no harm now, it's still better to avoid setting the variable 
twice.

 maximum-am-resource-percent doesn't work after refreshQueues command
 

 Key: YARN-853
 URL: https://issues.apache.org/jira/browse/YARN-853
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 3.0.0, 2.1.0-beta, 2.0.5-alpha
Reporter: Devaraj K
Assignee: Devaraj K
 Attachments: YARN-853-1.patch, YARN-853.patch


 If we update yarn.scheduler.capacity.maximum-am-resource-percent / 
 yarn.scheduler.capacity.queue-path.maximum-am-resource-percent 
 configuration and then do the refreshNodes, it uses the new config value to 
 calculate Max Active Applications and Max Active Application Per User. If we 
 add new node after issuing  'rmadmin -refreshQueues' command, it uses the old 
 maximum-am-resource-percent config value to calculate Max Active Applications 
 and Max Active Application Per User. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-675) In YarnClient, pull AM logs on AM container failure

2013-07-01 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13697185#comment-13697185
 ] 

Zhijie Shen commented on YARN-675:
--

[~sandyr], would you mind my taking this ticket over? We're trying to push the 
better error reporting tickets to be fixed ASAP. Thanks!

 In YarnClient, pull AM logs on AM container failure
 ---

 Key: YARN-675
 URL: https://issues.apache.org/jira/browse/YARN-675
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza

 Similar to MAPREDUCE-4362, when an AM container fails, it would be helpful to 
 pull its logs from the NM to the client so that they can be displayed 
 immediately to the user.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-675) In YarnClient, pull AM logs on AM container failure

2013-07-01 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13697493#comment-13697493
 ] 

Zhijie Shen commented on YARN-675:
--

Take it over. Thanks!

 In YarnClient, pull AM logs on AM container failure
 ---

 Key: YARN-675
 URL: https://issues.apache.org/jira/browse/YARN-675
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza
Assignee: Zhijie Shen

 Similar to MAPREDUCE-4362, when an AM container fails, it would be helpful to 
 pull its logs from the NM to the client so that they can be displayed 
 immediately to the user.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (YARN-675) In YarnClient, pull AM logs on AM container failure

2013-07-01 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen reassigned YARN-675:


Assignee: Zhijie Shen

 In YarnClient, pull AM logs on AM container failure
 ---

 Key: YARN-675
 URL: https://issues.apache.org/jira/browse/YARN-675
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza
Assignee: Zhijie Shen

 Similar to MAPREDUCE-4362, when an AM container fails, it would be helpful to 
 pull its logs from the NM to the client so that they can be displayed 
 immediately to the user.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (YARN-871) Failed to run MR example against latest trunk

2013-07-02 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-871.
--

Resolution: Cannot Reproduce

Thanks, [~djp]! Closing it as Cannot Reproduce.

 Failed to run MR example against latest trunk
 -

 Key: YARN-871
 URL: https://issues.apache.org/jira/browse/YARN-871
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
 Attachments: yarn-zshen-resourcemanager-ZShens-MacBook-Pro.local.log


 Built the latest trunk, deployed a single node cluster and ran examples, such 
 as
 {code}
  hadoop jar 
 hadoop-3.0.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-SNAPSHOT.jar
  teragen 10 out1
 {code}
 The job failed with the following console message:
 {code}
 13/06/21 12:51:25 INFO mapreduce.Job: Running job: job_1371844267731_0001
 13/06/21 12:51:31 INFO mapreduce.Job: Job job_1371844267731_0001 running in 
 uber mode : false
 13/06/21 12:51:31 INFO mapreduce.Job:  map 0% reduce 0%
 13/06/21 12:51:31 INFO mapreduce.Job: Job job_1371844267731_0001 failed with 
 state FAILED due to: Application application_1371844267731_0001 failed 2 
 times due to AM Container for appattempt_1371844267731_0001_02 exited 
 with  exitCode: 127 due to: 
 .Failing this attempt.. Failing the application.
 13/06/21 12:51:31 INFO mapreduce.Job: Counters: 0
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-649) Make container logs available over HTTP in plain text

2013-07-02 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13698415#comment-13698415
 ] 

Zhijie Shen commented on YARN-649:
--

Read the patch quickly. It looks almost fine to me. One minor question: why 
does getLogs not support XML?

{code}
+  @GET
+  @Path("/containerlogs/{containerid}/{filename}")
+  @Produces({ MediaType.TEXT_PLAIN, MediaType.APPLICATION_JSON })
+  @Evolving
+  public Response getLogs(@PathParam("containerid") String containerIdStr,
+  @PathParam("filename") String filename) {

Here are some additional thoughts. Long-running applications may have big log 
files, so it may take a long time to download a log file via the RESTful API. 
Consequently, the HTTP connection may time out before a complete log file is 
downloaded. Maybe it is good to zip the log file before sending it, and unzip 
it after receiving it. Moreover, it could be more advanced to query the part of 
the log that is recorded between timestamp1 and timestamp2. Just thinking out 
loud; not sure it is required right now.
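
For illustration only, one way the time-range idea could be layered onto the 
quoted endpoint; the start/end query parameters and the class below are 
hypothetical and not part of the patch:
{code}
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.QueryParam;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

// Hypothetical sketch: a time-bounded variant of the container-log endpoint.
@Path("/containerlogs")
public class ContainerLogRangeSketch {

  @GET
  @Path("/{containerid}/{filename}")
  @Produces(MediaType.TEXT_PLAIN)
  public Response getLogRange(@PathParam("containerid") String containerId,
      @PathParam("filename") String filename,
      @QueryParam("start") Long startTimestamp,
      @QueryParam("end") Long endTimestamp) {
    // A real implementation would stream only the log lines whose timestamps
    // fall inside [startTimestamp, endTimestamp]; this stub just echoes them.
    String body = "would return " + filename + " of " + containerId
        + " between " + startTimestamp + " and " + endTimestamp;
    return Response.ok(body).build();
  }
}
{code}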

 Make container logs available over HTTP in plain text
 -

 Key: YARN-649
 URL: https://issues.apache.org/jira/browse/YARN-649
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-649-2.patch, YARN-649-3.patch, YARN-649-4.patch, 
 YARN-649.patch, YARN-752-1.patch


 It would be good to make container logs available over the REST API for 
 MAPREDUCE-4362 and so that they can be accessed programatically in general.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-649) Make container logs available over HTTP in plain text

2013-07-03 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13699270#comment-13699270
 ] 

Zhijie Shen commented on YARN-649:
--

bq. Oops, leaving in MediaType.APPLICATION_JSON was a mistake. My intention was 
actually to have it only support plain text. Thoughts?

For MAPREDUCE-4362 and YARN-675, I think TEXT is enough. However, if it does no 
harm, how about leaving more media type options to users?

bq. My goal here was to implement the minimum needed to work on MAPREDUCE-4362 
and YARN-675.

Agree. Maybe the enhancement can be discussed in YARN-896 later.

 Make container logs available over HTTP in plain text
 -

 Key: YARN-649
 URL: https://issues.apache.org/jira/browse/YARN-649
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-649-2.patch, YARN-649-3.patch, YARN-649-4.patch, 
 YARN-649.patch, YARN-752-1.patch


 It would be good to make container logs available over the REST API for 
 MAPREDUCE-4362 and so that they can be accessed programatically in general.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-675) In YarnClient, pull AM logs on AM container failure

2013-07-03 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13699488#comment-13699488
 ] 

Zhijie Shen commented on YARN-675:
--

Checked YarnClient and found one issue: ContainerId is not directly accessible 
from YarnClient. Correct me if I'm wrong here.

One workaround is to use the RESTful API to request either AppInfo or 
AppAttemptInfo from RMWebServices, which contains the URL of the AM container 
log. Then we can use this URL to pull the log.

Currently, this URL points to a webpage. After YARN-649 gets fixed, I'd like to 
update it to point to the RESTful API for obtaining the container log, because 
IMHO it's enough for the DAO object to just hold the log content, which is 
independent of rendering.

Thoughts, please.

 In YarnClient, pull AM logs on AM container failure
 ---

 Key: YARN-675
 URL: https://issues.apache.org/jira/browse/YARN-675
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza
Assignee: Zhijie Shen

 Similar to MAPREDUCE-4362, when an AM container fails, it would be helpful to 
 pull its logs from the NM to the client so that they can be displayed 
 immediately to the user.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-675) In YarnClient, pull AM logs on AM container failure

2013-07-03 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13699538#comment-13699538
 ] 

Zhijie Shen commented on YARN-675:
--

It's feasible to pass ContainerId through ApplicationReport, but I'm a bit 
conservative about making an API change at this point, especially when the 
ContainerId would be added only for pulling the log. What do you think?

[~sandyr], BTW, the URL of the container log is constructed in 
AppInfo/AppAttemptInfo somewhat differently from what is done in YARN-649.

{code}
String url = join(HttpConfig.getSchemePrefix(),
    masterContainer.getNodeHttpAddress(),
    "/node", "/containerlogs/",
    ConverterUtils.toString(masterContainer.getId()),
    "/", app.getUser());
{code}

The user is part of the URL. If this is adopted, there's no need to get the 
user through request.getRemoteUser().
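
To make the pull step concrete, a minimal client-side sketch using only the 
JDK; the class name and the example URL below are made up for illustration:
{code}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Illustrative only: fetch the AM container log over HTTP once the client has
// the log URL from AppInfo/AppAttemptInfo.
public class PullAmLogSketch {
  public static void main(String[] args) throws IOException {
    String logUrl = args.length > 0 ? args[0]
        : "http://nm-host:8042/node/containerlogs/container_1371844267731_0001_01_000001/user";
    HttpURLConnection conn = (HttpURLConnection) new URL(logUrl).openConnection();
    conn.setRequestMethod("GET");
    try {
      BufferedReader reader = new BufferedReader(
          new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8));
      String line;
      while ((line = reader.readLine()) != null) {
        System.out.println(line); // display the AM log to the user
      }
      reader.close();
    } finally {
      conn.disconnect();
    }
  }
}
{code}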

 In YarnClient, pull AM logs on AM container failure
 ---

 Key: YARN-675
 URL: https://issues.apache.org/jira/browse/YARN-675
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza
Assignee: Zhijie Shen

 Similar to MAPREDUCE-4362, when an AM container fails, it would be helpful to 
 pull its logs from the NM to the client so that they can be displayed 
 immediately to the user.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-502) RM crash with NPE on NODE_REMOVED event with FairScheduler

2013-07-08 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702177#comment-13702177
 ] 

Zhijie Shen commented on YARN-502:
--

+1, the patch looks good to me

 RM crash with NPE on NODE_REMOVED event with FairScheduler
 --

 Key: YARN-502
 URL: https://issues.apache.org/jira/browse/YARN-502
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.0.3-alpha
Reporter: Lohit Vijayarenu
Assignee: Mayank Bansal
 Attachments: YARN-502-trunk-1.patch, YARN-502-trunk-2.patch


 While running some test and adding/removing nodes, we see RM crashed with the 
 below exception. We are testing with fair scheduler and running 
 hadoop-2.0.3-alpha
 {noformat}
 2013-03-22 18:54:27,015 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
 Node :55680 as it is now LOST
 2013-03-22 18:54:27,015 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: :55680 
 Node Transitioned from UNHEALTHY to LOST
 2013-03-22 18:54:27,015 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type NODE_REMOVED to the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeNode(FairScheduler.java:619)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:856)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:98)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:375)
 at java.lang.Thread.run(Thread.java:662)
 2013-03-22 18:54:27,016 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
 2013-03-22 18:54:27,020 INFO org.mortbay.log: Stopped 
 SelectChannelConnector@:50030
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-295) Resource Manager throws InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED for RMAppAttemptImpl

2013-07-08 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702245#comment-13702245
 ] 

Zhijie Shen commented on YARN-295:
--

I agree with moving RMAppAttempt from ALLOCATED to FAILED through 
AMContainerCrashedTransition.

WRT the test, is the following not necessary?

{code}
+launchApplicationAttempt(amContainer);
+runApplicationAttempt(amContainer, host, 8042, oldtrackingurl);
{code}

See testAllocatedToFailed.

 Resource Manager throws InvalidStateTransitonException: Invalid event: 
 CONTAINER_FINISHED at ALLOCATED for RMAppAttemptImpl
 ---

 Key: YARN-295
 URL: https://issues.apache.org/jira/browse/YARN-295
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.0.2-alpha, 2.0.1-alpha
Reporter: Devaraj K
Assignee: Mayank Bansal
 Attachments: YARN-295-trunk-1.patch, YARN-295-trunk-2.patch


 {code:xml}
 2012-12-28 14:03:56,956 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 CONTAINER_FINISHED at ALLOCATED
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
   at java.lang.Thread.run(Thread.java:662)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-296) Resource Manager throws InvalidStateTransitonException: Invalid event: APP_ACCEPTED at RUNNING for RMAppImpl

2013-07-08 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702464#comment-13702464
 ] 

Zhijie Shen commented on YARN-296:
--

The patch should work, but IMHO the essential problem is that APP_ACCEPTED is 
not expected at RUNNING. APP_ACCEPTED is created during the ScheduleTransition 
of an RMAppAttempt, and is consumed when an RMApp moves from SUBMITTED to 
ACCEPTED. Only after the RMApp enters ACCEPTED can it further move to RUNNING 
(similarly for an unmanaged AM). Therefore, APP_ACCEPTED shouldn't be seen when 
the RMApp is at RUNNING.

Moreover, it seems impossible that the APP_ACCEPTED belongs to the last 
RMAppAttempt if the RMApp is retrying, as a retry can only happen after the 
RMApp enters ACCEPTED, by which point the APP_ACCEPTED produced by the last 
RMAppAttempt has already been consumed.

[~devaraj], would you mind posting more context around the 
InvalidStateTransitonException, so that we can dig deeper into the problem?

 Resource Manager throws InvalidStateTransitonException: Invalid event: 
 APP_ACCEPTED at RUNNING for RMAppImpl
 

 Key: YARN-296
 URL: https://issues.apache.org/jira/browse/YARN-296
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.0.2-alpha, 2.0.1-alpha
Reporter: Devaraj K
Assignee: Mayank Bansal
 Attachments: YARN-296-trunk-1.patch, YARN-296-trunk-2.patch


 {code:xml}
 2012-12-28 11:14:47,671 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Can't handle 
 this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 APP_ACCEPTED at RUNNING
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:528)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:72)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:405)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:389)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
   at java.lang.Thread.run(Thread.java:662)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-296) Resource Manager throws InvalidStateTransitonException: Invalid event: APP_ACCEPTED at RUNNING for RMAppImpl

2013-07-09 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702979#comment-13702979
 ] 

Zhijie Shen commented on YARN-296:
--

I'm afraid it is a different case from YARN-295.

In YARN-295, AFAIK, CONTAINER_FINISHED is likely to arrive as early as when an 
RMAppAttempt is at SCHEDULED while its container is at ALLOCATED, where the 
container may send CONTAINER_FINISHED if it is killed or expires. Would you 
please double-check it? If so, please add the missing transitions in YARN-295.

Here, theoretically, APP_ACCEPTED should not exist when an RMApp is at RUNNING, 
since the event has been consumed before the RMAppImpl moves to RUNNING.

 Resource Manager throws InvalidStateTransitonException: Invalid event: 
 APP_ACCEPTED at RUNNING for RMAppImpl
 

 Key: YARN-296
 URL: https://issues.apache.org/jira/browse/YARN-296
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.0.2-alpha, 2.0.1-alpha
Reporter: Devaraj K
Assignee: Mayank Bansal
 Attachments: YARN-296-trunk-1.patch, YARN-296-trunk-2.patch


 {code:xml}
 2012-12-28 11:14:47,671 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Can't handle 
 this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 APP_ACCEPTED at RUNNING
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:528)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:72)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:405)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:389)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
   at java.lang.Thread.run(Thread.java:662)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-873) YARNClient.getApplicationReport(unknownAppId) returns a null report

2013-07-09 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702987#comment-13702987
 ] 

Zhijie Shen commented on YARN-873:
--

bq. Null report can be unclear as to what happened.

IMHO, null is a reasonable indicator of an unknown AppId (e.g., null is 
returned if a Java Map contains no mapping for a key), unless it implies more 
than one case. I checked the code and found that null is returned only when the 
RMApp is not found in rmContext, so it indicates a unique case.

 YARNClient.getApplicationReport(unknownAppId) returns a null report
 ---

 Key: YARN-873
 URL: https://issues.apache.org/jira/browse/YARN-873
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.1.0-beta
Reporter: Bikas Saha
Assignee: Xuan Gong

 How can the client find out that app does not exist?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-906) TestNMClient.testNMClientNoCleanupOnStop fails occasionally

2013-07-09 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-906:


 Summary: TestNMClient.testNMClientNoCleanupOnStop fails 
occasionally
 Key: YARN-906
 URL: https://issues.apache.org/jira/browse/YARN-906
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen


See 
https://builds.apache.org/job/PreCommit-YARN-Build/1435//testReport/org.apache.hadoop.yarn.client.api.impl/TestNMClient/testNMClientNoCleanupOnStop/

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-347) YARN node CLI should also show CPU info as memory info in node status

2013-07-09 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13703018#comment-13703018
 ] 

Zhijie Shen commented on YARN-347:
--

+1, the patch looks good to me, and the test failure should be unrelated. I 
opened YARN-906 to track the test failure, though I couldn't reproduce it 
locally.

 YARN node CLI should also show CPU info as memory info in node status
 -

 Key: YARN-347
 URL: https://issues.apache.org/jira/browse/YARN-347
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: client
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-347.patch, YARN-347-v2.patch


 With YARN-2 checked in, CPU info are taken into consideration in resource 
 scheduling. yarn node -status NodeID should show CPU used and capacity info 
 as memory info.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-649) Make container logs available over HTTP in plain text

2013-07-09 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13703508#comment-13703508
 ] 

Zhijie Shen commented on YARN-649:
--

bq. if support JSON/XML, how will you wrap up the logs in there?

Like other info classes in the dao package, we can wrap the log content 
together with appId, containerId, and user into a so-called ContainerLogInfo 
class.
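
For illustration, one possible shape of such a class, modeled on the 
JAXB-annotated dao style; the field names here are assumptions, not part of 
any patch:
{code}
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlRootElement;

// Hypothetical dao sketch: a serializable wrapper around one container log.
@XmlRootElement(name = "containerLogInfo")
@XmlAccessorType(XmlAccessType.FIELD)
public class ContainerLogInfo {
  private String appId;
  private String containerId;
  private String user;
  private String logContent;

  public ContainerLogInfo() {
    // JAXB requires a no-arg constructor
  }

  public ContainerLogInfo(String appId, String containerId, String user,
      String logContent) {
    this.appId = appId;
    this.containerId = containerId;
    this.user = user;
    this.logContent = logContent;
  }
}
{code}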

bq. Is it worth?

Good question. Not sure about it. More thoughts, please.

 Make container logs available over HTTP in plain text
 -

 Key: YARN-649
 URL: https://issues.apache.org/jira/browse/YARN-649
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-649-2.patch, YARN-649-3.patch, YARN-649-4.patch, 
 YARN-649.patch, YARN-752-1.patch


 It would be good to make container logs available over the REST API for 
 MAPREDUCE-4362 and so that they can be accessed programatically in general.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-853) maximum-am-resource-percent doesn't work after refreshQueues command

2013-07-10 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13704264#comment-13704264
 ] 

Zhijie Shen commented on YARN-853:
--

+1. looks good to me

 maximum-am-resource-percent doesn't work after refreshQueues command
 

 Key: YARN-853
 URL: https://issues.apache.org/jira/browse/YARN-853
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 3.0.0, 2.1.0-beta, 2.0.5-alpha
Reporter: Devaraj K
Assignee: Devaraj K
 Attachments: YARN-853-1.patch, YARN-853-2.patch, YARN-853.patch


 If we update yarn.scheduler.capacity.maximum-am-resource-percent / 
 yarn.scheduler.capacity.queue-path.maximum-am-resource-percent 
 configuration and then do the refreshNodes, it uses the new config value to 
 calculate Max Active Applications and Max Active Application Per User. If we 
 add new node after issuing  'rmadmin -refreshQueues' command, it uses the old 
 maximum-am-resource-percent config value to calculate Max Active Applications 
 and Max Active Application Per User. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-865) RM webservices can't query on application Types

2013-07-10 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13704773#comment-13704773
 ] 

Zhijie Shen commented on YARN-865:
--

The patch looks good to me, but I have two minor comments:

1. In the test, is it better to add a case with an empty type string?
2. In "the application *Type* of the application", should it be lowercase?

 RM webservices can't query on application Types
 ---

 Key: YARN-865
 URL: https://issues.apache.org/jira/browse/YARN-865
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: MR-5337.1.patch, YARN-865.1.patch


 The resource manager web service api to get the list of apps doesn't have a 
 query parameter for appTypes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-661) NM fails to cleanup local directories for users

2013-07-10 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13705285#comment-13705285
 ] 

Zhijie Shen commented on YARN-661:
--

I agree with the idea, and the patch looks generally good. Below are some 
comments.

* Not sure getCancelIfAnyDependentTaskFailedFlag is necessary. If deletion of 
the sub-dir fails, should deletion of the root-dir fail automatically?

* Instead of synchronizing over a HashMap, how about using a ConcurrentHashMap? 
(A small sketch of the pattern follows these comments.)
{code}
+if (!this.deletionTaskDependencyMap.containsKey(parentTask)) {
+  this.deletionTaskDependencyMap.put(parentTask,
+    new ArrayList<FileDeletion>());
+}
+List<FileDeletion> dependentTaskList =
+this.deletionTaskDependencyMap.get(parentTask);
{code}
can be simplified as
{code}
+List<FileDeletion> dependentTaskList =
+this.deletionTaskDependencyMap.putIfAbsent(
+parentTask, new ArrayList<FileDeletion>());
{code}

* WRT the dependency, I'd rather call the task pair predecessor and successor 
instead of parent and child, because it's a DAG, not a tree. Moreover, how 
about defining public void populateFileDeletionTaskDependency(List<FileDeletion>, 
FileDeletion), and having populateFileDeletionTaskDependency(List<FileDeletion>, 
List<FileDeletion>) wrap it? In fact, the former seems to be enough for the 
patch.
{code}
+  public void populateFileDeletionTaskDependency(List<FileDeletion> parentTasks,
+      List<FileDeletion> childDependentTasks) {
{code}

* How about calling it delete as well, just overloading the method?
{code}
+  public void deleteHelper(FileDeletion fileDeletion) {
{code}

* Please use LocalResource.newInstance instead.
{code}
+LocalResource localResource = Records.newRecord(LocalResource.class);
{code}

* There's one findbugs warning to fix.

* In the following method
{code}
+public boolean matches(Object o) {
{code}
How about refactoring the code into the following pattern, which should be 
clearer?
{code}
if (obj1 == null && obj2 != null) {
  return false;
} else if (obj1 != null && obj2 == null) {
  return false;
} else if (obj1 != null && obj2 != null) {
  // your logic
}
{code}
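
For reference, a self-contained sketch of the ConcurrentHashMap idea mentioned 
above; FileDeletion here is a stand-in stub, and note that putIfAbsent returns 
the previously mapped value (or null), so the result needs a small null check:
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch only: per-task dependency lists kept in a ConcurrentHashMap.
public class DependencyMapSketch {
  static class FileDeletion { }  // stand-in for the patch's FileDeletion

  private final ConcurrentMap<FileDeletion, List<FileDeletion>> deletionTaskDependencyMap =
      new ConcurrentHashMap<FileDeletion, List<FileDeletion>>();

  List<FileDeletion> dependentsOf(FileDeletion parentTask) {
    List<FileDeletion> dependents = deletionTaskDependencyMap.get(parentTask);
    if (dependents == null) {
      List<FileDeletion> fresh = new ArrayList<FileDeletion>();
      // putIfAbsent returns the list already mapped, or null if ours was stored.
      List<FileDeletion> existing =
          deletionTaskDependencyMap.putIfAbsent(parentTask, fresh);
      dependents = (existing == null) ? fresh : existing;
    }
    return dependents;
  }

  public static void main(String[] args) {
    DependencyMapSketch sketch = new DependencyMapSketch();
    FileDeletion parent = new FileDeletion();
    sketch.dependentsOf(parent).add(new FileDeletion());
    System.out.println(sketch.dependentsOf(parent).size()); // prints 1
  }
}
{code}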

 NM fails to cleanup local directories for users
 ---

 Key: YARN-661
 URL: https://issues.apache.org/jira/browse/YARN-661
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.0-beta, 0.23.8
Reporter: Jason Lowe
Assignee: Omkar Vinit Joshi
 Attachments: YARN-661-20130701.patch, YARN-661-20130708.patch


 YARN-71 added deletion of local directories on startup, but in practice it 
 fails to delete the directories because of permission problems.  The 
 top-level usercache directory is owned by the user but is in a directory that 
 is not writable by the user.  Therefore the deletion of the user's usercache 
 directory, as the user, fails due to lack of permissions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (YARN-292) ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt

2013-07-11 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen reassigned YARN-292:


Assignee: Zhijie Shen

 ResourceManager throws ArrayIndexOutOfBoundsException while handling 
 CONTAINER_ALLOCATED for application attempt
 

 Key: YARN-292
 URL: https://issues.apache.org/jira/browse/YARN-292
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.0.1-alpha
Reporter: Devaraj K
Assignee: Zhijie Shen

 {code:xml}
 2012-12-26 08:41:15,030 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: 
 Calling allocate on removed or non existant application 
 appattempt_1356385141279_49525_01
 2012-12-26 08:41:15,031 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type CONTAINER_ALLOCATED for applicationAttempt 
 application_1356385141279_49525
 java.lang.ArrayIndexOutOfBoundsException: 0
   at java.util.Arrays$ArrayList.get(Arrays.java:3381)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:655)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:644)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
   at java.lang.Thread.run(Thread.java:662)
  {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-292) ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt

2013-07-11 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706108#comment-13706108
 ] 

Zhijie Shen commented on YARN-292:
--

Will look into this problem

 ResourceManager throws ArrayIndexOutOfBoundsException while handling 
 CONTAINER_ALLOCATED for application attempt
 

 Key: YARN-292
 URL: https://issues.apache.org/jira/browse/YARN-292
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.0.1-alpha
Reporter: Devaraj K
Assignee: Zhijie Shen

 {code:xml}
 2012-12-26 08:41:15,030 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: 
 Calling allocate on removed or non existant application 
 appattempt_1356385141279_49525_01
 2012-12-26 08:41:15,031 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type CONTAINER_ALLOCATED for applicationAttempt 
 application_1356385141279_49525
 java.lang.ArrayIndexOutOfBoundsException: 0
   at java.util.Arrays$ArrayList.get(Arrays.java:3381)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:655)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:644)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
   at java.lang.Thread.run(Thread.java:662)
  {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-865) RM webservices can't query on application Types

2013-07-11 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706126#comment-13706126
 ] 

Zhijie Shen commented on YARN-865:
--

+1 for the latest patch

 RM webservices can't query on application Types
 ---

 Key: YARN-865
 URL: https://issues.apache.org/jira/browse/YARN-865
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: MR-5337.1.patch, YARN-865.1.patch, YARN-865.2.patch, 
 YARN-865.3.patch


 The resource manager web service api to get the list of apps doesn't have a 
 query parameter for appTypes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-292) ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt

2013-07-11 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706432#comment-13706432
 ] 

Zhijie Shen commented on YARN-292:
--

{code}
  // Acquire the AM container from the scheduler.
  Allocation amContainerAllocation = appAttempt.scheduler.allocate(
  appAttempt.applicationAttemptId, EMPTY_CONTAINER_REQUEST_LIST,
  EMPTY_CONTAINER_RELEASE_LIST, null, null);
{code}
The above code will eventually pull the newly allocated containers from 
newlyAllocatedContainers.

Logically, AMContainerAllocatedTransition happens after RMAppAttempt receives 
CONTAINER_ALLOCATED. CONTAINER_ALLOCATED is sent during 
ContainerStartedTransition, when the RMContainer is moving from NEW to ALLOCATED. 
Therefore, pulling from newlyAllocatedContainers happens when the RMContainer is 
at ALLOCATED, whereas the RMContainer is added to newlyAllocatedContainers while 
it is still at NEW. In conclusion, exactly one container is expected in the 
allocation in AMContainerAllocatedTransition.

As hinted by [~nemon], the problem may happen at
{code}
FiCaSchedulerApp application = getApplication(applicationAttemptId);
if (application == null) {
  LOG.error("Calling allocate on removed " +
      "or non existant application " + applicationAttemptId);
  return EMPTY_ALLOCATION;
}
{code}
EMPTY_ALLOCATION has 0 containers. Another observation is that there seems to be 
inconsistent synchronization on accessing the application map.
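
If that is indeed the cause, a defensive guard in AMContainerAllocatedTransition 
would at least avoid indexing into an empty list; a rough, hypothetical sketch 
(not an actual patch):
{code}
// Hypothetical guard; names follow the discussion above, not the real fix.
Allocation amContainerAllocation = appAttempt.scheduler.allocate(
    appAttempt.applicationAttemptId, EMPTY_CONTAINER_REQUEST_LIST,
    EMPTY_CONTAINER_RELEASE_LIST, null, null);
if (amContainerAllocation.getContainers().isEmpty()) {
  // The scheduler returned EMPTY_ALLOCATION (e.g. the attempt was already
  // removed), so there is nothing at index 0; bail out instead of crashing.
  return;
}
Container amContainer = amContainerAllocation.getContainers().get(0);
{code}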

I've just become aware that [~djp] has started working on this problem. Please 
feel free to take it over. Thanks! 

 ResourceManager throws ArrayIndexOutOfBoundsException while handling 
 CONTAINER_ALLOCATED for application attempt
 

 Key: YARN-292
 URL: https://issues.apache.org/jira/browse/YARN-292
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.0.1-alpha
Reporter: Devaraj K
Assignee: Zhijie Shen

 {code:xml}
 2012-12-26 08:41:15,030 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: 
 Calling allocate on removed or non existant application 
 appattempt_1356385141279_49525_01
 2012-12-26 08:41:15,031 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type CONTAINER_ALLOCATED for applicationAttempt 
 application_1356385141279_49525
 java.lang.ArrayIndexOutOfBoundsException: 0
   at java.util.Arrays$ArrayList.get(Arrays.java:3381)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:655)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:644)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
   at java.lang.Thread.run(Thread.java:662)
  {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-321) Generic application history service

2013-07-12 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706673#comment-13706673
 ] 

Zhijie Shen commented on YARN-321:
--

bq. To start with, we will have an implementation with per-app HDFS file.

How about starting with an in-memory implementation, which is the easiest to do 
and is useful for testing?

 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli

 The mapreduce job history server currently needs to be deployed as a trusted 
 server in sync with the mapreduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) (where T is 
 number of type of application, V is number of version of application) trusted 
 servers is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as json (or binary avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (display json 
 as a tree of strings) as well. Specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the json for its specific UI and/or analytics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-321) Generic application history service

2013-07-12 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706686#comment-13706686
 ] 

Zhijie Shen commented on YARN-321:
--

bq. ResoureManager will write per-application data to a (hopefully very) thin 
HistoryStorage layer.

Here's more elaboration on the per-application data. There should be three 
objects to record: RMApp, RMAppAttempt and RMContainer. Below are the properties 
of each object:

Completed Application:
* Application ID
* Application Name
* Application Type
* User
* Queue
* Submit Time
* Start Time
* Finish Time
* Diagnostics Info
* Final Application Status
* Num of Application Attempts

Completed Application Attempt:
* Application Attempt ID
* Application ID
* Host
* RPC Port
* Tracking URL
* Original Tracking URL (not sure it is necessary)
* Diagnostics Info
* Final Application Status
* Master Container ID
* Num of Containers

Completed Container:
* Container ID
* Application Attempt ID
* Final Container Status
* Resource
* Priority
* Node ID
* Log URL

Application has one-to-many relationship with Application Attempt, while 
Application Attempt has one-to-one relationship with Container.
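
To make the proposed records a bit more concrete, here is a rough sketch of what 
a CompletedApplication object could look like (field names and types are 
assumptions, not an agreed API; the other two records would follow the same 
pattern):
{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;

// Hypothetical value object mirroring the "Completed Application" fields above.
public class CompletedApplication {
  private ApplicationId applicationId;
  private String applicationName;
  private String applicationType;
  private String user;
  private String queue;
  private long submitTime;
  private long startTime;
  private long finishTime;
  private String diagnosticsInfo;
  private FinalApplicationStatus finalApplicationStatus;
  private int numApplicationAttempts;

  // Getters/setters (or a builder) omitted in this sketch.
}
{code}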

WRT the concrete information to record, here's more of an idea about the 
interface of HistoryStorage. The following APIs should be useful for RM to 
persist application history and for AHS to query it:
* Iterable<CompletedApplication> getApplications([conditions...])
* CompletedApplication getApplication(ApplicationId)
* Iterable<CompletedApplicationAttempt> getApplicationAttempts(ApplicationId)
* CompletedApplicationAttempt getApplicationAttempt(ApplicationAttemptId)
* CompletedContainer getContainer(ApplicationAttemptId)
* CompletedContainer getContainer(ContainerId)
* void addApplication(CompletedApplication)
* void addApplicationAttempt(CompletedApplicationAttempt)
* void addContainer(CompletedContainer)

In addition, the HistoryStorage APIs may involve a lot of I/O operations, such 
that an API call may take a long time to respond. Therefore, it is likely to be 
good to make the APIs non-blocking.
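
For discussion, a minimal sketch of what such an interface could look like (the 
Completed* record types are the hypothetical ones sketched above; the 
[conditions...] filtering and the non-blocking flavor are left out):
{code}
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ContainerId;

// Hypothetical interface sketch for the proposed HistoryStorage.
public interface HistoryStorage {
  Iterable<CompletedApplication> getApplications();
  CompletedApplication getApplication(ApplicationId appId);
  Iterable<CompletedApplicationAttempt> getApplicationAttempts(ApplicationId appId);
  CompletedApplicationAttempt getApplicationAttempt(ApplicationAttemptId attemptId);
  CompletedContainer getContainer(ContainerId containerId);

  void addApplication(CompletedApplication app);
  void addApplicationAttempt(CompletedApplicationAttempt attempt);
  void addContainer(CompletedContainer container);
}
{code}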

 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli

 The mapreduce job history server currently needs to be deployed as a trusted 
 server in sync with the mapreduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) (where T is 
 number of type of application, V is number of version of application) trusted 
 servers is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as json (or binary avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (display json 
 as a tree of strings) as well. Specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the json for its specific UI and/or analytics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-321) Generic application history service

2013-07-12 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706773#comment-13706773
 ] 

Zhijie Shen commented on YARN-321:
--

bq. Are we moving aggregated log management(i.e deletion after expiry) 
responsibility to AHS?

Sorry for misunderstanding your previous question. IMHO, in the near future, 
we're not moving the aggregated log management but duplicating it, so that both 
AHS and JHS can serve the same aggregated logs. However, AHS and JHS see the 
same logs from different points of view: AHS simply considers them container 
logs, no matter what the application is, while JHS knows they are the MR job 
logs. [~vinodkv], would you please confirm?

 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli

 The mapreduce job history server currently needs to be deployed as a trusted 
 server in sync with the mapreduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) (where T is 
 number of type of application, V is number of version of application) trusted 
 servers is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as json (or binary avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (display json 
 as a tree of strings) as well. Specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the json for its specific UI and/or analytics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.

2013-07-15 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708761#comment-13708761
 ] 

Zhijie Shen commented on YARN-744:
--

The passed-in appAttemptId for an app currently seems to be the same object, 
such that it can be used for synchronized blocks, but I agree with the idea of a 
wrapper, because it is more predictable and self-contained in 
ApplicationMasterService.

BTW, is it convenient to write a test case for concurrent allocation? Like 
TestClientRMService#testConcurrentAppSubmit.
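
To illustrate the wrapper idea (the class name and surrounding usage here are 
hypothetical, not the actual patch):
{code}
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;

// Hypothetical per-attempt lock wrapper. The wrapper object itself is the
// monitor, so correctness does not depend on the identity of the
// appAttemptId instances that clients send in.
class AllocateResponseLock {
  private AllocateResponse lastResponse;

  synchronized AllocateResponse getLastResponse() {
    return lastResponse;
  }

  synchronized void setLastResponse(AllocateResponse response) {
    this.lastResponse = response;
  }
}

// In ApplicationMasterService (sketch):
//   ConcurrentMap<ApplicationAttemptId, AllocateResponseLock> responseMap;
//   AllocateResponseLock lock = responseMap.get(appAttemptId);
//   synchronized (lock) {
//     // validate responseId against lock.getLastResponse(), call the
//     // scheduler, then lock.setLastResponse(newResponse)
//   }
{code}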

 Race condition in ApplicationMasterService.allocate .. It might process same 
 allocate request twice resulting in additional containers getting allocated.
 -

 Key: YARN-744
 URL: https://issues.apache.org/jira/browse/YARN-744
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Omkar Vinit Joshi
 Attachments: MAPREDUCE-3899-branch-0.23.patch, 
 YARN-744-20130711.1.patch, YARN-744.patch


 Looks like the lock taken in this is broken. It takes a lock on lastResponse 
 object and then puts a new lastResponse object into the map. At this point a 
 new thread entering this function will get a new lastResponse object and will 
 be able to take its lock and enter the critical section. Presumably we want 
 to limit one response per app attempt. So the lock could be taken on the 
 ApplicationAttemptId key of the response map object.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (YARN-924) TestNMClient.testNMClientNoCleanupOnStop frequently failing due to timeout

2013-07-15 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-924.
--

Resolution: Duplicate

It is a duplicate of YARN-906.

 TestNMClient.testNMClientNoCleanupOnStop frequently failing due to timeout
 --

 Key: YARN-924
 URL: https://issues.apache.org/jira/browse/YARN-924
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Bikas Saha
Assignee: Zhijie Shen

 Error Message
 test timed out after 18 milliseconds
 Stacktrace
 java.lang.Exception: test timed out after 18 milliseconds
   at 
 org.apache.maven.surefire.report.ConsoleOutputCapture$ForwardingPrintStream.println(ConsoleOutputCapture.java:87)
   at java.lang.Throwable.printStackTrace(Throwable.java:464)
   at java.lang.Throwable.printStackTrace(Throwable.java:451)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:349)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:317)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClientNoCleanupOnStop(TestNMClient.java:182)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.

2013-07-15 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709072#comment-13709072
 ] 

Zhijie Shen commented on YARN-744:
--

bq. locking on appAttemptId in case of allocate / RegisterApplicationMaster 
call won't work. They are coming from client...can't guarantee that they are 
identical in terms grabbing a lock.. thoughts?

I meant that AMRMClient currently uses the same appAttemptId object, but that 
uniqueness is not guaranteed, so I agreed with the self-contained locking via a 
wrapper.

 Race condition in ApplicationMasterService.allocate .. It might process same 
 allocate request twice resulting in additional containers getting allocated.
 -

 Key: YARN-744
 URL: https://issues.apache.org/jira/browse/YARN-744
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Omkar Vinit Joshi
 Attachments: MAPREDUCE-3899-branch-0.23.patch, 
 YARN-744-20130711.1.patch, YARN-744-20130715.1.patch, YARN-744.patch


 Looks like the lock taken in this is broken. It takes a lock on lastResponse 
 object and then puts a new lastResponse object into the map. At this point a 
 new thread entering this function will get a new lastResponse object and will 
 be able to take its lock and enter the critical section. Presumably we want 
 to limit one response per app attempt. So the lock could be taken on the 
 ApplicationAttemptId key of the response map object.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-321) Generic application history service

2013-07-15 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709098#comment-13709098
 ] 

Zhijie Shen commented on YARN-321:
--

bq. Is it per application or only one thread in RM?

I think it should be one thread in RM.

bq. Isn't it be a good idea that as soon as application starts we send the 
information to AHS and let AHS write all the data published by RM for that 
application.

I'm afraid a number of metrics cannot be determined when an application has 
just been started, such as the finish time and the final status.



 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli

 The mapreduce job history server currently needs to be deployed as a trusted 
 server in sync with the mapreduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) (where T is 
 number of type of application, V is number of version of application) trusted 
 servers is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as json (or binary avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (display json 
 as a tree of strings) as well. Specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the json for its specific UI and/or analytics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-906) TestNMClient.testNMClientNoCleanupOnStop fails occasionally

2013-07-15 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709195#comment-13709195
 ] 

Zhijie Shen commented on YARN-906:
--

Did some investigation into this test failure. The test itself seems to have no 
problem. The test timed out because the container state stayed RUNNING after the 
container was stopped, which was not expected.

Looking into the test log: after stopContainer was called, the Container moved 
from LOCALIZED to KILLING, but didn't move any further. In contrast, in my local 
test log of a successful run, the Container moved from LOCALIZED to KILLING, and 
then from KILLING to CONTAINER_CLEANEDUP_AFTER_KILL, during which the major work 
is to clean up the localized container resources (the execution of file deletion 
was observed). However, the failed test log didn't show any file deletion. 
Therefore, I guess something is blocking during container resource cleanup. 
Thoughts?

More investigation is needed to further locate the problem.

 TestNMClient.testNMClientNoCleanupOnStop fails occasionally
 ---

 Key: YARN-906
 URL: https://issues.apache.org/jira/browse/YARN-906
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen

 See 
 https://builds.apache.org/job/PreCommit-YARN-Build/1435//testReport/org.apache.hadoop.yarn.client.api.impl/TestNMClient/testNMClientNoCleanupOnStop/

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-321) Generic application history service

2013-07-16 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-321:
-

Attachment: HistoryStorageDemo.java

bq. However, during the design, it would be nice to outline (at least at a 
high-level) how the plugins can work.

Good suggestion. I think there should be a way to make HistoryStorage extensible 
to store per-framework information. My rough idea is to make HistoryStorage so 
general that storing the RM's basic information is just a special case of doing 
storage. To demonstrate the idea, I've uploaded HistoryStorageDemo.java, which 
sketches the high-level design.

We can define a schema, which can be extended by users to define the exact 
information their applications want to record. There are a bunch of default 
schemas, which are used for the information of RMApp, RMAppAttempt, and 
RMContainer. The default schemas will be loaded when HistoryStorage is 
constructed (or during init() if it's a service), while customized schemas can 
be loaded via configuration or at runtime. The methods of adding/reading a 
tuple/tuples of any schema are exposed, and the APIs that manipulate the basic 
information from RM simply wrap the aforementioned methods.

HistoryStorage owns a map of abstract files, which are the real place to persist 
the history information of a specific schema. We can implement different types 
of this file, such as an InMemoryFile. When a schema is loaded, a file should be 
prepared. The file should expose some basic APIs, such as appending a tuple, 
reading all tuples, and seeking a particular tuple.

Any thoughts?
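
To make the description above a bit more tangible, here is a rough, purely 
illustrative sketch of the schema/file abstraction (all names and signatures are 
made up for discussion, not taken from HistoryStorageDemo.java):
{code}
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the extensible schema/file idea; names are made up.
interface HistoryFile<T> {
  void appendTuple(T tuple) throws IOException;
  Iterable<T> readAllTuples() throws IOException;
}

abstract class ExtensibleHistoryStorage {
  // One backing file per registered schema; an InMemoryFile would be one implementation.
  private final Map<Class<?>, HistoryFile<?>> files = new HashMap<Class<?>, HistoryFile<?>>();

  <T> void registerSchema(Class<T> schema, HistoryFile<T> file) {
    files.put(schema, file);
  }

  @SuppressWarnings("unchecked")
  protected <T> HistoryFile<T> fileFor(Class<T> schema) {
    return (HistoryFile<T>) files.get(schema);
  }
}
{code}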

 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
Assignee: Vinod Kumar Vavilapalli
 Attachments: HistoryStorageDemo.java


 The mapreduce job history server currently needs to be deployed as a trusted 
 server in sync with the mapreduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) (where T is 
 number of type of application, V is number of version of application) trusted 
 servers is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as json (or binary avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (display json 
 as a tree of strings) as well. Specific application/version can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the json for its specific UI and/or analytics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-875) Application can hang if AMRMClientAsync callback thread has exception

2013-07-16 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13710132#comment-13710132
 ] 

Zhijie Shen commented on YARN-875:
--

+1 for onError(Throwable). However, does it count as an incompatible API change?

Should we catch Throwable t? For example, RuntimeException can also break the 
callback thread. 
{code}
+} catch (Exception ex) {
{code}
 
Another question is how AMRMClientAsync wants to handle a RuntimeException 
thrown from the CallbackHandler APIs, though they're not supposed to throw 
exceptions. What NMClientAsync does is wrap each call of the CallbackHandler 
APIs in a try-catch, catching Throwable, logging it, and then ignoring it.
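
For comparison, the NMClientAsync-style guard mentioned above amounts to 
something like the following simplified, hypothetical wrapper around each 
handler call (handler and LOG are assumed fields of the enclosing class):
{code}
// Simplified, hypothetical sketch: wrap each CallbackHandler call so that a
// RuntimeException thrown by user code cannot kill the callback thread.
private void fireContainersAllocated(List<Container> containers) {
  try {
    handler.onContainersAllocated(containers);
  } catch (Throwable t) {
    // Log and ignore, mirroring what NMClientAsync does for its handlers.
    LOG.error("CallbackHandler.onContainersAllocated threw an exception", t);
  }
}
{code}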

 Application can hang if AMRMClientAsync callback thread has exception
 -

 Key: YARN-875
 URL: https://issues.apache.org/jira/browse/YARN-875
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Bikas Saha
Assignee: Xuan Gong
 Attachments: YARN-875.1.patch, YARN-875.1.patch, YARN-875.2.patch


 Currently that thread will die and then never callback. App can hang. 
 Possible solution could be to catch Throwable in the callback and then call 
 client.onError().

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-875) Application can hang if AMRMClientAsync callback thread has exception

2013-07-16 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13710207#comment-13710207
 ] 

Zhijie Shen commented on YARN-875:
--

IMHO, users know best about the implementations of the CallbackHandler APIs. 
They're supposed to handle all the possible exceptions within the scope of the 
callback methods. In contrast, AMRMClientAsync cannot judge how severe a runtime 
exception is. I'm afraid stopping the callback thread is too harsh for a minor 
or fixable exception. In addition, if onError() throws a RuntimeException for 
some reason and the exception is not ignored, onError() will keep handling it 
and throwing new ones in a loop.

Another consideration is that CallbackHandlers of AMRMClientAsync and 
NMClientAsync are good to have consistent behavior.

 Application can hang if AMRMClientAsync callback thread has exception
 -

 Key: YARN-875
 URL: https://issues.apache.org/jira/browse/YARN-875
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Bikas Saha
Assignee: Xuan Gong
 Attachments: YARN-875.1.patch, YARN-875.1.patch, YARN-875.2.patch


 Currently that thread will die and then never callback. App can hang. 
 Possible solution could be to catch Throwable in the callback and then call 
 client.onError().

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-931) Overlapping classes across hadoop-yarn-api and hadoop-yarn-common

2013-07-16 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13710280#comment-13710280
 ] 

Zhijie Shen commented on YARN-931:
--

hm... the packages have been defined in both sub-projects, such that the 
package-info classes in these packages are overlapping.

How about merging the packages by moving the classes from hadoop-yarn-api to 
hadoop-yarn-common, as RecordFactory, RecordFactoryProvider and Records don't 
sound like typical API stuff?

Otherwise, package-info of either hadoop-yarn-api or hadoop-yarn-common needs 
to be removed. Notice that org.apache.hadoop.yarn.api in hadoop-yarn-api has 
package-info, but that in hadoop-yarn-common doesn't.

 Overlapping classes across hadoop-yarn-api and hadoop-yarn-common
 -

 Key: YARN-931
 URL: https://issues.apache.org/jira/browse/YARN-931
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah

 hadoop-yarn-common-3.0.0-SNAPSHOT.jar, hadoop-yarn-api-3.0.0-SNAPSHOT.jar 
 define 3 overlappping classes: 
 [WARNING]   - org.apache.hadoop.yarn.factories.package-info
 [WARNING]   - org.apache.hadoop.yarn.util.package-info
 [WARNING]   - org.apache.hadoop.yarn.factory.providers.package-info

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-930) Bootstrap ApplicationHistoryService module

2013-07-16 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13710351#comment-13710351
 ] 

Zhijie Shen commented on YARN-930:
--

Thanks, Vinod. Overall the patch looks nice. There're some minor comments:

1. Should AHSClientService implement ApplicationHistoryProtocol directly, as 
ClientRMService, ApplicationMasterService and so on do?

2. Shall we have AHSClient and AHSClientAsync in hadoop-yarn-client?

 Bootstrap ApplicationHistoryService module
 --

 Key: YARN-930
 URL: https://issues.apache.org/jira/browse/YARN-930
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: YARN-321
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Attachments: YARN-930-20130716.txt




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-906) TestNMClient.testNMClientNoCleanupOnStop fails occasionally

2013-07-16 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13710475#comment-13710475
 ] 

Zhijie Shen commented on YARN-906:
--

Did some further investigation, and found the Container wasn't even able to 
enter the stage of cleaning up container resources, because 
CONTAINER_KILLED_ON_REQUEST was not received. This event should be emitted in 
ContainerLaunch.call(). However, the execution of this method was not logged (it 
was logged in my local log of a successful test run). Lacking 
CONTAINER_KILLED_ON_REQUEST, the container was stuck at KILLING.

In detail, as mentioned in the previous comment, the container was stopped, such 
that it moved from LOCALIZED to KILLING, KillTransition was executed, and 
CLEANUP_CONTAINER was handled by ContainersLauncher. Here's a piece of the code:
{code}
if (rContainer != null
    && !rContainer.isDone()) {
  // Cancel the future so that it won't be launched
  // if it isn't already.
  rContainer.cancel(false);
}
{code}
It tried to cancel the execution of ContainerLaunch.call(), which was scheduled 
when handling LAUNCH_CONTAINER. If ContainerLaunch.call() has unfortunately not 
started yet, it will be canceled here. Therefore, the following code in 
ContainerLaunch.call() will not be executed.
{code}
if (ret == ExitCode.FORCE_KILLED.getExitCode()
    || ret == ExitCode.TERMINATED.getExitCode()) {
  // If the process was killed, Send container_cleanedup_after_kill and
  // just break out of this method.
  dispatcher.getEventHandler().handle(
      new ContainerExitEvent(containerID,
          ContainerEventType.CONTAINER_KILLED_ON_REQUEST, ret,
          "Container exited with a non-zero exit code " + ret));
  return ret;
}
{code}
The container will then never receive CONTAINER_KILLED_ON_REQUEST to trigger 
the next transition.

I'll work on a patch to fix the problem.
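
One possible direction (a hypothetical sketch, not the actual patch) is for the 
cleanup path to emit the kill confirmation itself when it successfully cancels a 
launch that never ran:
{code}
// Hypothetical sketch inside the CLEANUP_CONTAINER handling: if cancel(false)
// succeeds, ContainerLaunch.call() will never run and so will never send
// CONTAINER_KILLED_ON_REQUEST; send the confirmation from here instead.
if (rContainer != null && !rContainer.isDone()) {
  boolean cancelled = rContainer.cancel(false);
  if (cancelled) {
    dispatcher.getEventHandler().handle(
        new ContainerExitEvent(containerID,
            ContainerEventType.CONTAINER_KILLED_ON_REQUEST,
            ExitCode.TERMINATED.getExitCode(),
            "Container terminated before launch."));
  }
}
{code}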

 TestNMClient.testNMClientNoCleanupOnStop fails occasionally
 ---

 Key: YARN-906
 URL: https://issues.apache.org/jira/browse/YARN-906
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen

 See 
 https://builds.apache.org/job/PreCommit-YARN-Build/1435//testReport/org.apache.hadoop.yarn.client.api.impl/TestNMClient/testNMClientNoCleanupOnStop/

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

