[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED

2014-12-15 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246426#comment-14246426
 ] 

Rohith commented on YARN-2340:
--

Thanks [~nishan] for reporting this issue. I encountered a similar situation while 
testing on trunk code, after which the RM remained in standby.



> NPE thrown when RM restart after queue is STOPPED
> -
>
> Key: YARN-2340
> URL: https://issues.apache.org/jira/browse/YARN-2340
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, scheduler
>Affects Versions: 2.4.1
> Environment: Capacityscheduler with Queue a, b
>Reporter: Nishan Shetty
>Assignee: Rohith
>Priority: Critical
>
> While a job is in progress, change the queue state to STOPPED and then restart the RM. 
> Observe that the standby RM fails to come up as active, throwing the NPE below:
> 2014-07-23 18:43:24,432 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> appattempt_1406116264351_0014_02 State change from NEW to SUBMITTED
> 2014-07-23 18:43:24,433 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type APP_ATTEMPT_ADDED to the scheduler
> java.lang.NullPointerException
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:568)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:916)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:101)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:602)
>  at java.lang.Thread.run(Thread.java:662)
> 2014-07-23 18:43:24,434 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED

2014-12-15 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246442#comment-14246442
 ] 

Rohith commented on YARN-2340:
--

Scenario executed
# Start the YARN cluster and submit a long-running application to the default 
queue. Initially, RM1 is active.
# *Stop the default queue* in both RM1 and RM2 using -refreshQueues. A queue can 
be stopped even while an application is running, but it won't accept new 
application submissions.
# Switch the RMs so that RM2 transitions to active. Here application recovery 
fails since the queue is already stopped. The logs below show the failure; 
*RMAppImpl state is updated as FAILED while the RMAppAttempt remains null*, and 
the RM remains in standby.
{noformat}
2014-12-15 11:01:17,813 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Recovering app: 
application_1418620667348_0001 with 1 attempts and final state = null
2014-12-15 11:01:17,814 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
Recovering attempt: appattempt_1418620667348_0001_01 with final state: null
/.
/
2014-12-15 11:01:17,824 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
Queue root.default is STOPPED. Cannot accept submission of application: 
application_1418620667348_0001
2014-12-15 11:01:17,825 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
 Failed to submit application application_1418620667348_0001 to queue default 
from user rohith
org.apache.hadoop.security.AccessControlException: Queue root.default is 
STOPPED. Cannot accept submission of application: application_1418620667348_0001
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.submitApplication(LeafQueue.java:575)

2014-12-15 11:01:17,939 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
Registering app attempt : appattempt_1418620667348_0001_01
2014-12-15 11:01:17,941 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Updating 
application application_1418620667348_0001 with final state: FAILED
{noformat}
# After restart, the final state in RMApp is FAILED while the RMAppAttempt's final 
state is null, as shown below. The RM cannot recover the applications and fails 
continuously; a minimal sketch of the resulting NPE follows the log.
{noformat}
2014-12-15 11:01:41,493 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Recovering app: 
application_1418620667348_0001 with 1 attempts and final state = FAILED
2014-12-15 11:01:41,494 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
Recovering attempt: appattempt_1418620667348_0001_01 with final state: null
{noformat}
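
To make the failure mode concrete, here is a minimal, self-contained sketch 
(hypothetical class and method names, not the actual scheduler code): an 
application rejected at submit time because its queue is STOPPED never enters the 
scheduler's applications map, so the later attempt-added step dereferences null, 
which matches the NPE in the stack trace above.
{code}
import java.util.HashMap;
import java.util.Map;

// Simplified illustration only; names are hypothetical, not CapacityScheduler's.
public class StoppedQueueRecoverySketch {

  static class SchedulerApplication {
    private final String queueName;
    SchedulerApplication(String queueName) { this.queueName = queueName; }
    String getQueue() { return queueName; }
  }

  private static final Map<String, SchedulerApplication> applications = new HashMap<>();

  static void addApplication(String appId, boolean queueStopped) {
    if (queueStopped) {
      // Mirrors the queue rejecting the submission: nothing is stored in 'applications'.
      System.out.println("Queue is STOPPED. Cannot accept submission of application: " + appId);
      return;
    }
    applications.put(appId, new SchedulerApplication("default"));
  }

  static void addApplicationAttempt(String appId) {
    SchedulerApplication application = applications.get(appId);
    // During recovery this lookup returns null for the rejected application,
    // and the dereference below is where the NullPointerException surfaces.
    System.out.println("Attempt added to queue " + application.getQueue());
  }

  public static void main(String[] args) {
    addApplication("application_1418620667348_0001", true /* queue STOPPED */);
    addApplicationAttempt("application_1418620667348_0001"); // throws NullPointerException
  }
}
{code}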

> NPE thrown when RM restart after queue is STOPPED
> -
>
> Key: YARN-2340
> URL: https://issues.apache.org/jira/browse/YARN-2340
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, scheduler
>Affects Versions: 2.4.1
> Environment: Capacityscheduler with Queue a, b
>Reporter: Nishan Shetty
>Assignee: Rohith
>Priority: Critical
>
> While a job is in progress, change the queue state to STOPPED and then restart the RM. 
> Observe that the standby RM fails to come up as active, throwing the NPE below:
> 2014-07-23 18:43:24,432 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> appattempt_1406116264351_0014_02 State change from NEW to SUBMITTED
> 2014-07-23 18:43:24,433 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type APP_ATTEMPT_ADDED to the scheduler
> java.lang.NullPointerException
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:568)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:916)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:101)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:602)
>  at java.lang.Thread.run(Thread.java:662)
> 2014-07-23 18:43:24,434 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2356) yarn status command for non-existent application/application attempt/container is too verbose

2014-12-15 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246475#comment-14246475
 ] 

Devaraj K commented on YARN-2356:
-

{code:xml}
-1 findbugs. The patch appears to introduce 10 new Findbugs (version 2.0.3) 
warnings.
{code}
These findbugs warnings will be handled by YARN-2940.

{code:xml}
-1 core tests. The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client:
org.apache.hadoop.yarn.client.TestResourceTrackerOnHA
org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA
{code}
These test failures are unrelated to the patch and are being handled by YARN-2782 
and YARN-2783.

> yarn status command for non-existent application/application 
> attempt/container is too verbose 
> --
>
> Key: YARN-2356
> URL: https://issues.apache.org/jira/browse/YARN-2356
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Minor
> Attachments: 0001-YARN-2356.patch, 0002-YARN-2356.patch, 
> 0003-YARN-2356.patch, 0004-YARN-2356.patch, Yarn-2356.1.patch
>
>
> *yarn application -status*, *applicationattempt -status*, or *container 
> status* commands can suppress exceptions such as ApplicationNotFound, 
> ApplicationAttemptNotFound, and ContainerNotFound for non-existent entries in 
> the RM or History Server. 
> For example, the exception below could be suppressed and reported more concisely:
> sunildev@host-a:~/hadoop/hadoop/bin> ./yarn application -status 
> application_1402668848165_0015
> No GC_PROFILE is given. Defaults to medium.
> 14/07/25 16:21:45 INFO client.RMProxy: Connecting to ResourceManager at 
> /10.18.40.77:45022
> Exception in thread "main" 
> org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
> with id 'application_1402668848165_0015' doesn't exist in RM.
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:285)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:607)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2099)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2095)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1626)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2093)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
> at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101)
> at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:166)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
> at $Proxy12.getApplicationReport(Unknown Source)
> at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:291)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.printApplicationReport(ApplicationCLI.java:428)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:153)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at 
> org.apache.hadoo

[jira] [Updated] (YARN-2356) yarn status command for non-existent application/application attempt/container is too verbose

2014-12-15 Thread Devaraj K (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-2356:

Hadoop Flags: Reviewed

+1, latest patch looks good to me

> yarn status command for non-existent application/application 
> attempt/container is too verbose 
> --
>
> Key: YARN-2356
> URL: https://issues.apache.org/jira/browse/YARN-2356
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Minor
> Attachments: 0001-YARN-2356.patch, 0002-YARN-2356.patch, 
> 0003-YARN-2356.patch, 0004-YARN-2356.patch, Yarn-2356.1.patch
>
>
> *yarn application -status*, *applicationattempt -status*, or *container 
> status* commands can suppress exceptions such as ApplicationNotFound, 
> ApplicationAttemptNotFound, and ContainerNotFound for non-existent entries in 
> the RM or History Server. 
> For example, the exception below could be suppressed and reported more concisely:
> sunildev@host-a:~/hadoop/hadoop/bin> ./yarn application -status 
> application_1402668848165_0015
> No GC_PROFILE is given. Defaults to medium.
> 14/07/25 16:21:45 INFO client.RMProxy: Connecting to ResourceManager at 
> /10.18.40.77:45022
> Exception in thread "main" 
> org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
> with id 'application_1402668848165_0015' doesn't exist in RM.
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:285)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:607)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2099)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2095)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1626)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2093)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
> at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101)
> at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:166)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
> at $Proxy12.getApplicationReport(Unknown Source)
> at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:291)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.printApplicationReport(ApplicationCLI.java:428)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:153)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:76)
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException):
>  Application with id 'application_1402668848165_0015' doesn't exist in RM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2356) yarn status command for non-existent application/application attempt/container is too verbose

2014-12-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246491#comment-14246491
 ] 

Hudson commented on YARN-2356:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #6717 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6717/])
YARN-2356. yarn status command for non-existent application/application 
(devaraj: rev fae3e8614f4f9a42904e39c51ca68b0d1e67469f)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java


> yarn status command for non-existent application/application 
> attempt/container is too verbose 
> --
>
> Key: YARN-2356
> URL: https://issues.apache.org/jira/browse/YARN-2356
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: 0001-YARN-2356.patch, 0002-YARN-2356.patch, 
> 0003-YARN-2356.patch, 0004-YARN-2356.patch, Yarn-2356.1.patch
>
>
> *yarn application -status*, *applicationattempt -status*, or *container 
> status* commands can suppress exceptions such as ApplicationNotFound, 
> ApplicationAttemptNotFound, and ContainerNotFound for non-existent entries in 
> the RM or History Server. 
> For example, the exception below could be suppressed and reported more concisely:
> sunildev@host-a:~/hadoop/hadoop/bin> ./yarn application -status 
> application_1402668848165_0015
> No GC_PROFILE is given. Defaults to medium.
> 14/07/25 16:21:45 INFO client.RMProxy: Connecting to ResourceManager at 
> /10.18.40.77:45022
> Exception in thread "main" 
> org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
> with id 'application_1402668848165_0015' doesn't exist in RM.
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:285)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:607)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2099)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2095)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1626)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2093)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
> at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101)
> at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:166)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
> at $Proxy12.getApplicationReport(Unknown Source)
> at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:291)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.printApplicationReport(ApplicationCLI.java:428)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:153)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.

[jira] [Updated] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby

2014-12-15 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-2340:
-
Target Version/s: 2.7.0
 Summary: NPE thrown when RM restart after queue is STOPPED. There 
after RM can not recovery application's and remain in standby  (was: NPE thrown 
when RM restart after queue is STOPPED)

> NPE thrown when RM restart after queue is STOPPED. There after RM can not 
> recovery application's and remain in standby
> --
>
> Key: YARN-2340
> URL: https://issues.apache.org/jira/browse/YARN-2340
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, scheduler
>Affects Versions: 2.4.1
> Environment: Capacityscheduler with Queue a, b
>Reporter: Nishan Shetty
>Assignee: Rohith
>Priority: Critical
>
> While a job is in progress, change the queue state to STOPPED and then restart the RM. 
> Observe that the standby RM fails to come up as active, throwing the NPE below:
> 2014-07-23 18:43:24,432 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> appattempt_1406116264351_0014_02 State change from NEW to SUBMITTED
> 2014-07-23 18:43:24,433 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type APP_ATTEMPT_ADDED to the scheduler
> java.lang.NullPointerException
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:568)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:916)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:101)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:602)
>  at java.lang.Thread.run(Thread.java:662)
> 2014-07-23 18:43:24,434 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2356) yarn status command for non-existent application/application attempt/container is too verbose

2014-12-15 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246557#comment-14246557
 ] 

Sunil G commented on YARN-2356:
---

Thank you [~devaraj.k] for reviewing and committing the patch, and thank you 
[~jianhe] for reviewing it as well.



> yarn status command for non-existent application/application 
> attempt/container is too verbose 
> --
>
> Key: YARN-2356
> URL: https://issues.apache.org/jira/browse/YARN-2356
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: 0001-YARN-2356.patch, 0002-YARN-2356.patch, 
> 0003-YARN-2356.patch, 0004-YARN-2356.patch, Yarn-2356.1.patch
>
>
> *yarn application -status*, *applicationattempt -status*, or *container 
> status* commands can suppress exceptions such as ApplicationNotFound, 
> ApplicationAttemptNotFound, and ContainerNotFound for non-existent entries in 
> the RM or History Server. 
> For example, the exception below could be suppressed and reported more concisely:
> sunildev@host-a:~/hadoop/hadoop/bin> ./yarn application -status 
> application_1402668848165_0015
> No GC_PROFILE is given. Defaults to medium.
> 14/07/25 16:21:45 INFO client.RMProxy: Connecting to ResourceManager at 
> /10.18.40.77:45022
> Exception in thread "main" 
> org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
> with id 'application_1402668848165_0015' doesn't exist in RM.
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:285)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:607)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2099)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2095)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1626)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2093)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
> at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101)
> at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:166)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
> at $Proxy12.getApplicationReport(Unknown Source)
> at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:291)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.printApplicationReport(ApplicationCLI.java:428)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:153)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:76)
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException):
>  Application with id 'application_1402668848165_0015' doesn't exist in RM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. There after RM can not recovery application's and remain in standby

2014-12-15 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246555#comment-14246555
 ] 

Rohith commented on YARN-2340:
--

Some thoughts on fixing this issue; either of the two approaches below could work.
1. Directly invoke a KILL event for the application if it is being submitted to a 
STOPPED queue while recovering applications. The KILL event smoothly transitions 
RMApp/RMAppAttempt to the KILLED state. However, an exception may be thrown while 
killing the master container, since either the NMs have not yet registered with 
the RM or a "Connection Refused" occurs because the NM is down.
{{CS#addApplication}}
{code}
// Submit to the queue
try {
  queue.submitApplication(applicationId, user, queueName);
} catch (AccessControlException ace) {
  LOG.info("Failed to submit application " + applicationId + " to queue "
      + queueName + " from user " + user, ace);
  if (isAppRecovering) {
    LOG.info("Killing the application " + applicationId);
    this.rmContext.getDispatcher().getEventHandler()
        .handle(new RMAppEvent(applicationId, RMAppEventType.KILL));
  } else {
    this.rmContext.getDispatcher().getEventHandler()
        .handle(new RMAppRejectedEvent(applicationId, ace.toString()));
  }
  return;
}
{code}

{{CS#addApplicationAttempt}}
{code}
SchedulerApplication application =
    applications.get(applicationAttemptId.getApplicationId());
if (application == null && isAttemptRecovering) {
  LOG.info("Attempt is recovering from an application where Queue is stopped. "
      + applicationAttemptId);
  return;
}
{code}

2. Introduce a new event type, such as APP_RECOVERY_FAILED or 
APP_SCHEDULER_RECOVERY_FAILED, and trigger it from the scheduler if an app is 
submitted to a stopped queue while recovering. The transitions would look like 
the following (a rough sketch of the dispatch follows the list):
AppAttempt : {{NEW to LAUNCHED}}
App : {{NEW to ACCEPTED}}
App : {{ACCEPTED to FINAL_SAVING}} on event APP_RECOVERY_FAILED or APP_SCHEDULER_RECOVERY_FAILED
AppAttempt : {{LAUNCHED to FINAL_SAVING}}
AppAttempt : {{FINAL_SAVING to FAILED}}
App : {{FINAL_SAVING to FAILED}}
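
A rough sketch of how option 2 could be wired from {{CS#addApplication}}. Note 
that APP_SCHEDULER_RECOVERY_FAILED is only a proposed constant here (it would 
have to be added to RMAppEventType); everything else mirrors the option 1 snippet 
above.
{code}
// Submit to the queue
try {
  queue.submitApplication(applicationId, user, queueName);
} catch (AccessControlException ace) {
  LOG.info("Failed to submit application " + applicationId + " to queue "
      + queueName + " from user " + user, ace);
  if (isAppRecovering) {
    // Proposed: signal scheduler-side recovery failure so RMAppImpl can move
    // ACCEPTED -> FINAL_SAVING -> FAILED instead of leaving the attempt null.
    this.rmContext.getDispatcher().getEventHandler().handle(
        new RMAppEvent(applicationId, RMAppEventType.APP_SCHEDULER_RECOVERY_FAILED));
  } else {
    this.rmContext.getDispatcher().getEventHandler()
        .handle(new RMAppRejectedEvent(applicationId, ace.toString()));
  }
  return;
}
{code}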

Please give your suggestions/thoughts.


> NPE thrown when RM restart after queue is STOPPED. There after RM can not 
> recovery application's and remain in standby
> --
>
> Key: YARN-2340
> URL: https://issues.apache.org/jira/browse/YARN-2340
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, scheduler
>Affects Versions: 2.4.1
> Environment: Capacityscheduler with Queue a, b
>Reporter: Nishan Shetty
>Assignee: Rohith
>Priority: Critical
>
> While a job is in progress, change the queue state to STOPPED and then restart the RM. 
> Observe that the standby RM fails to come up as active, throwing the NPE below:
> 2014-07-23 18:43:24,432 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> appattempt_1406116264351_0014_02 State change from NEW to SUBMITTED
> 2014-07-23 18:43:24,433 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type APP_ATTEMPT_ADDED to the scheduler
> java.lang.NullPointerException
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:568)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:916)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:101)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:602)
>  at java.lang.Thread.run(Thread.java:662)
> 2014-07-23 18:43:24,434 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2356) yarn status command for non-existent application/application attempt/container is too verbose

2014-12-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246581#comment-14246581
 ] 

Hudson commented on YARN-2356:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #42 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/42/])
YARN-2356. yarn status command for non-existent application/application 
(devaraj: rev fae3e8614f4f9a42904e39c51ca68b0d1e67469f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java


> yarn status command for non-existent application/application 
> attempt/container is too verbose 
> --
>
> Key: YARN-2356
> URL: https://issues.apache.org/jira/browse/YARN-2356
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: 0001-YARN-2356.patch, 0002-YARN-2356.patch, 
> 0003-YARN-2356.patch, 0004-YARN-2356.patch, Yarn-2356.1.patch
>
>
> *yarn application -status*, *applicationattempt -status*, or *container 
> status* commands can suppress exceptions such as ApplicationNotFound, 
> ApplicationAttemptNotFound, and ContainerNotFound for non-existent entries in 
> the RM or History Server. 
> For example, the exception below could be suppressed and reported more concisely:
> sunildev@host-a:~/hadoop/hadoop/bin> ./yarn application -status 
> application_1402668848165_0015
> No GC_PROFILE is given. Defaults to medium.
> 14/07/25 16:21:45 INFO client.RMProxy: Connecting to ResourceManager at 
> /10.18.40.77:45022
> Exception in thread "main" 
> org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
> with id 'application_1402668848165_0015' doesn't exist in RM.
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:285)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:607)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2099)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2095)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1626)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2093)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
> at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101)
> at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:166)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
> at $Proxy12.getApplicationReport(Unknown Source)
> at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:291)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.printApplicationReport(ApplicationCLI.java:428)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:153)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.u

[jira] [Commented] (YARN-2356) yarn status command for non-existent application/application attempt/container is too verbose

2014-12-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246583#comment-14246583
 ] 

Hudson commented on YARN-2356:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #776 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/776/])
YARN-2356. yarn status command for non-existent application/application 
(devaraj: rev fae3e8614f4f9a42904e39c51ca68b0d1e67469f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java


> yarn status command for non-existent application/application 
> attempt/container is too verbose 
> --
>
> Key: YARN-2356
> URL: https://issues.apache.org/jira/browse/YARN-2356
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: 0001-YARN-2356.patch, 0002-YARN-2356.patch, 
> 0003-YARN-2356.patch, 0004-YARN-2356.patch, Yarn-2356.1.patch
>
>
> *yarn application -status*, *applicationattempt -status*, or *container 
> status* commands can suppress exceptions such as ApplicationNotFound, 
> ApplicationAttemptNotFound, and ContainerNotFound for non-existent entries in 
> the RM or History Server. 
> For example, the exception below could be suppressed and reported more concisely:
> sunildev@host-a:~/hadoop/hadoop/bin> ./yarn application -status 
> application_1402668848165_0015
> No GC_PROFILE is given. Defaults to medium.
> 14/07/25 16:21:45 INFO client.RMProxy: Connecting to ResourceManager at 
> /10.18.40.77:45022
> Exception in thread "main" 
> org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
> with id 'application_1402668848165_0015' doesn't exist in RM.
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:285)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:607)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2099)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2095)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1626)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2093)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
> at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101)
> at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:166)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
> at $Proxy12.getApplicationReport(Unknown Source)
> at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:291)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.printApplicationReport(ApplicationCLI.java:428)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:153)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRu

[jira] [Updated] (YARN-2949) Add documentation for CGroups

2014-12-15 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-2949:

Attachment: apache-yarn-2949.0.patch

Initial document.

> Add documentation for CGroups
> -
>
> Key: YARN-2949
> URL: https://issues.apache.org/jira/browse/YARN-2949
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: documentation, nodemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: apache-yarn-2949.0.patch
>
>
> A bunch of changes have gone into the NodeManager to allow greater use of 
> CGroups. It would be good to have a single page that documents how to setup 
> CGroups and the controls available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2356) yarn status command for non-existent application/application attempt/container is too verbose

2014-12-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246684#comment-14246684
 ] 

Hudson commented on YARN-2356:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1974 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/])
YARN-2356. yarn status command for non-existent application/application 
(devaraj: rev fae3e8614f4f9a42904e39c51ca68b0d1e67469f)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java


> yarn status command for non-existent application/application 
> attempt/container is too verbose 
> --
>
> Key: YARN-2356
> URL: https://issues.apache.org/jira/browse/YARN-2356
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: 0001-YARN-2356.patch, 0002-YARN-2356.patch, 
> 0003-YARN-2356.patch, 0004-YARN-2356.patch, Yarn-2356.1.patch
>
>
> *yarn application -status*, *applicationattempt -status*, or *container 
> status* commands can suppress exceptions such as ApplicationNotFound, 
> ApplicationAttemptNotFound, and ContainerNotFound for non-existent entries in 
> the RM or History Server. 
> For example, the exception below could be suppressed and reported more concisely:
> sunildev@host-a:~/hadoop/hadoop/bin> ./yarn application -status 
> application_1402668848165_0015
> No GC_PROFILE is given. Defaults to medium.
> 14/07/25 16:21:45 INFO client.RMProxy: Connecting to ResourceManager at 
> /10.18.40.77:45022
> Exception in thread "main" 
> org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
> with id 'application_1402668848165_0015' doesn't exist in RM.
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:285)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:607)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2099)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2095)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1626)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2093)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
> at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101)
> at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:166)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
> at $Proxy12.getApplicationReport(Unknown Source)
> at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:291)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.printApplicationReport(ApplicationCLI.java:428)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:153)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.Tool

[jira] [Commented] (YARN-2356) yarn status command for non-existent application/application attempt/container is too verbose

2014-12-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246686#comment-14246686
 ] 

Hudson commented on YARN-2356:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #39 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/39/])
YARN-2356. yarn status command for non-existent application/application 
(devaraj: rev fae3e8614f4f9a42904e39c51ca68b0d1e67469f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
* hadoop-yarn-project/CHANGES.txt


> yarn status command for non-existent application/application 
> attempt/container is too verbose 
> --
>
> Key: YARN-2356
> URL: https://issues.apache.org/jira/browse/YARN-2356
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: 0001-YARN-2356.patch, 0002-YARN-2356.patch, 
> 0003-YARN-2356.patch, 0004-YARN-2356.patch, Yarn-2356.1.patch
>
>
> *yarn application -status*, *applicationattempt -status*, or *container 
> status* commands can suppress exceptions such as ApplicationNotFound, 
> ApplicationAttemptNotFound, and ContainerNotFound for non-existent entries in 
> the RM or History Server. 
> For example, the exception below could be suppressed and reported more concisely:
> sunildev@host-a:~/hadoop/hadoop/bin> ./yarn application -status 
> application_1402668848165_0015
> No GC_PROFILE is given. Defaults to medium.
> 14/07/25 16:21:45 INFO client.RMProxy: Connecting to ResourceManager at 
> /10.18.40.77:45022
> Exception in thread "main" 
> org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
> with id 'application_1402668848165_0015' doesn't exist in RM.
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:285)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:607)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2099)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2095)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1626)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2093)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
> at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101)
> at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:166)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
> at $Proxy12.getApplicationReport(Unknown Source)
> at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:291)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.printApplicationReport(ApplicationCLI.java:428)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:153)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.u

[jira] [Commented] (YARN-2949) Add documentation for CGroups

2014-12-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246695#comment-14246695
 ] 

Hadoop QA commented on YARN-2949:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12687233/apache-yarn-2949.0.patch
  against trunk revision fae3e86.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+0 tests included{color}.  The patch appears to be a 
documentation patch that doesn't require tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6113//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6113//console

This message is automatically generated.

> Add documentation for CGroups
> -
>
> Key: YARN-2949
> URL: https://issues.apache.org/jira/browse/YARN-2949
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: documentation, nodemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: apache-yarn-2949.0.patch
>
>
> A bunch of changes have gone into the NodeManager to allow greater use of 
> CGroups. It would be good to have a single page that documents how to setup 
> CGroups and the controls available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2356) yarn status command for non-existent application/application attempt/container is too verbose

2014-12-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246727#comment-14246727
 ] 

Hudson commented on YARN-2356:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #43 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/43/])
YARN-2356. yarn status command for non-existent application/application 
(devaraj: rev fae3e8614f4f9a42904e39c51ca68b0d1e67469f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java


> yarn status command for non-existent application/application 
> attempt/container is too verbose 
> --
>
> Key: YARN-2356
> URL: https://issues.apache.org/jira/browse/YARN-2356
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: 0001-YARN-2356.patch, 0002-YARN-2356.patch, 
> 0003-YARN-2356.patch, 0004-YARN-2356.patch, Yarn-2356.1.patch
>
>
> *yarn application -status*, *applicationattempt -status*, or *container 
> status* commands can suppress exceptions such as ApplicationNotFound, 
> ApplicationAttemptNotFound, and ContainerNotFound for non-existent entries in 
> the RM or History Server. 
> For example, the exception below could be suppressed and reported more concisely:
> sunildev@host-a:~/hadoop/hadoop/bin> ./yarn application -status 
> application_1402668848165_0015
> No GC_PROFILE is given. Defaults to medium.
> 14/07/25 16:21:45 INFO client.RMProxy: Connecting to ResourceManager at 
> /10.18.40.77:45022
> Exception in thread "main" 
> org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
> with id 'application_1402668848165_0015' doesn't exist in RM.
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:285)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:607)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2099)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2095)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1626)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2093)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
> at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101)
> at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:166)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
> at $Proxy12.getApplicationReport(Unknown Source)
> at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:291)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.printApplicationReport(ApplicationCLI.java:428)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:153)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apach
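
As a hedged illustration of what "suppressing" the exception could look like on the 
client side, the sketch below catches the not-found exception and prints a one-line 
message instead of the full stack trace. The class and method names are made up for 
illustration; this is not the actual ApplicationCLI change.
{code}
import java.io.IOException;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException;
import org.apache.hadoop.yarn.exceptions.YarnException;

// Illustrative sketch only: report a missing application with a short message
// instead of dumping the whole RPC stack trace to the console.
public class QuietStatusSketch {
  static int printStatus(YarnClient client, ApplicationId appId)
      throws YarnException, IOException {
    try {
      ApplicationReport report = client.getApplicationReport(appId);
      System.out.println(report);
      return 0;
    } catch (ApplicationNotFoundException e) {
      System.err.println("Application with id '" + appId + "' doesn't exist in RM.");
      return -1;
    }
  }
}
{code}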

[jira] [Commented] (YARN-2356) yarn status command for non-existent application/application attempt/container is too verbose

2014-12-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246744#comment-14246744
 ] 

Hudson commented on YARN-2356:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1993 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1993/])
YARN-2356. yarn status command for non-existent application/application 
(devaraj: rev fae3e8614f4f9a42904e39c51ca68b0d1e67469f)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java


> yarn status command for non-existent application/application 
> attempt/container is too verbose 
> --
>
> Key: YARN-2356
> URL: https://issues.apache.org/jira/browse/YARN-2356
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: 0001-YARN-2356.patch, 0002-YARN-2356.patch, 
> 0003-YARN-2356.patch, 0004-YARN-2356.patch, Yarn-2356.1.patch
>
>
> *yarn application -status* or *applicationattempt -status* or *container 
> status* commands can suppress exceptions such as ApplicationNotFound, 
> ApplicationAttemptNotFound and ContainerNotFound for non-existent entries in 
> the RM or History Server. 
> For example, the exception below can be suppressed better:
> sunildev@host-a:~/hadoop/hadoop/bin> ./yarn application -status 
> application_1402668848165_0015
> No GC_PROFILE is given. Defaults to medium.
> 14/07/25 16:21:45 INFO client.RMProxy: Connecting to ResourceManager at 
> /10.18.40.77:45022
> Exception in thread "main" 
> org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
> with id 'application_1402668848165_0015' doesn't exist in RM.
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:285)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:607)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2099)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2095)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1626)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2093)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
> at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101)
> at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:166)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
> at $Proxy12.getApplicationReport(Unknown Source)
> at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:291)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.printApplicationReport(ApplicationCLI.java:428)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:153)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop

[jira] [Commented] (YARN-2959) Fair Scheduler "fifo" option can violate FIFO behavior and cause deadlock among jobs

2014-12-15 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246753#comment-14246753
 ] 

Karthik Kambatla commented on YARN-2959:


Thanks for reporting this, Ashwin. A couple of follow-up questions:
# Is preemption enabled? I would have expected Job B to have containers 
preempted and handed over to Job A.
# If Job A was submitted before Job B, we should probably investigate why Job 
B's AM came up first? Are these MR jobs (or managed AMs)? If they are managed 
AMs and the order of requests was not honored, considering the AppAttempt 
start/register time might not help us much. 

> Fair Scheduler "fifo" option can violate FIFO behavior and cause deadlock 
> among jobs
> 
>
> Key: YARN-2959
> URL: https://issues.apache.org/jira/browse/YARN-2959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Ashwin Shankar
>
> We have a cluster which runs jobs in FIFO order (due to the nature of those 
> jobs) using the Fair Scheduler's "fifo" option.
> Recently we found jobs deadlocked in the cluster; here is what happened:
> There were two jobs, say A and B. A was submitted before B.
> Both were in PENDING state since the cluster was busy.
> When containers freed up, the two pending jobs got their AM containers at 
> about the same time. 
> However, Job B's AM (appattempt1) registered with the RM a little earlier than 
> Job A's, grabbed the containers available at that time, and satisfied a 
> fraction of its requirement. Note that Job B can't make progress until its 
> entire requirement is satisfied.
> Next, Job A's appattempt1 registered with the RM, and since Job A was submitted 
> earlier, the RM stops allocating containers to Job B and starts allocating to 
> Job A, satisfying a fraction of its requirement as well.
> Now together Job A and Job B hold the entire cluster, but neither can progress; 
> they are deadlocked since their resource requests are only partially satisfied.
> Note: the above is an example with 2 jobs, but the deadlock can happen with n 
> jobs J1..Jn if the sequence of AM registration is Jn, J(n-1), .., J1.
>  
> Solution: one proposed solution is to order the fifo queue by appattempt 
> start/register time instead of app submit time.
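
To make the proposed ordering change concrete, here is a minimal sketch of a 
comparator that orders apps by AM attempt start/register time rather than submit 
time. The SchedulableApp type and its fields are hypothetical stand-ins, not the 
FairScheduler's real classes.
{code}
import java.util.Comparator;

// Hypothetical app handle; not the real FairScheduler type.
class SchedulableApp {
  long submitTime;      // when the app was submitted
  long amRegisterTime;  // when the AM attempt registered; Long.MAX_VALUE if not yet
}

// Sketch of the proposed fix: order the "fifo" queue by AM register time so the
// app whose AM came up first keeps receiving containers until it is satisfied,
// instead of being starved by an earlier-submitted app whose AM registered later.
class AmRegisterTimeComparator implements Comparator<SchedulableApp> {
  @Override
  public int compare(SchedulableApp a, SchedulableApp b) {
    int cmp = Long.compare(a.amRegisterTime, b.amRegisterTime);
    return cmp != 0 ? cmp : Long.compare(a.submitTime, b.submitTime);
  }
}
{code}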



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2962) ZKRMStateStore: Limit the number of znodes under a znode

2014-12-15 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created YARN-2962:
--

 Summary: ZKRMStateStore: Limit the number of znodes under a znode
 Key: YARN-2962
 URL: https://issues.apache.org/jira/browse/YARN-2962
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Priority: Critical


We ran into this issue where we were hitting the default ZK server message size 
configs, primarily because the message had too many znodes even though, 
individually, they were all small.
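
One possible direction, sketched here purely for illustration (the bucket count 
and paths are invented, and this is not necessarily what a patch for this JIRA 
will do), is to spread application znodes across a fixed set of intermediate 
parent znodes so that no single getChildren() response has to carry every 
application at once.
{code}
// Sketch: bucket app znodes under intermediate parents so one znode never has
// an unbounded number of children (which is what blows past the ZK message size).
public class ZnodeBucketingSketch {
  private static final int NUM_BUCKETS = 100;  // illustrative value
  private static final String APP_ROOT = "/rmstore/ZKRMStateRoot/RMAppRoot";

  static String bucketPathFor(String appIdStr) {
    int bucket = (appIdStr.hashCode() & Integer.MAX_VALUE) % NUM_BUCKETS;
    return APP_ROOT + "/bucket-" + bucket + "/" + appIdStr;
  }

  public static void main(String[] args) {
    System.out.println(bucketPathFor("application_1418620667348_0001"));
  }
}
{code}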



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2962) ZKRMStateStore: Limit the number of znodes under a znode

2014-12-15 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2962:
---
Component/s: resourcemanager

> ZKRMStateStore: Limit the number of znodes under a znode
> 
>
> Key: YARN-2962
> URL: https://issues.apache.org/jira/browse/YARN-2962
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Priority: Critical
>
> We ran into this issue where we were hitting the default ZK server message 
> size configs, primarily because the message had too many znodes even though, 
> individually, they were all small.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2782) TestResourceTrackerOnHA fails on trunk

2014-12-15 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246765#comment-14246765
 ] 

Steve Loughran commented on YARN-2782:
--

Happens repeatedly on Java 8.

Looks like a race condition at startup; the NM registration code needs a 
longer wait/retry to cope with slow RM startup.
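
A rough sketch of the kind of bounded wait/retry being suggested, written 
generically rather than against the actual NM registration code; the method 
name and timings are illustrative.
{code}
import java.io.IOException;
import java.util.concurrent.Callable;

// Sketch: retry an RPC-backed action for a bounded time instead of failing on
// the first ConnectException while the RM is still starting up.
public class RetryUntilUpSketch {
  static <T> T callWithRetries(Callable<T> action, long timeoutMs, long intervalMs)
      throws Exception {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (true) {
      try {
        return action.call();
      } catch (IOException e) {  // e.g. java.net.ConnectException from the IPC layer
        if (System.currentTimeMillis() > deadline) {
          throw e;
        }
        Thread.sleep(intervalMs);
      }
    }
  }
}
{code}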

> TestResourceTrackerOnHA fails on trunk
> --
>
> Key: YARN-2782
> URL: https://issues.apache.org/jira/browse/YARN-2782
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Zhijie Shen
>
> {code}
> Running org.apache.hadoop.yarn.client.TestResourceTrackerOnHA
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 12.684 sec 
> <<< FAILURE! - in org.apache.hadoop.yarn.client.TestResourceTrackerOnHA
> testResourceTrackerOnHA(org.apache.hadoop.yarn.client.TestResourceTrackerOnHA)
>   Time elapsed: 12.518 sec  <<< ERROR!
> java.net.ConnectException: Call From asf905.gq1.ygridcore.net/67.195.81.149 
> to asf905.gq1.ygridcore.net:28031 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
>   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>   at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
>   at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>   at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
>   at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
>   at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
>   at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
>   at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
>   at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1438)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1399)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
>   at com.sun.proxy.$Proxy87.registerNodeManager(Unknown Source)
>   at 
> org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:68)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101)
>   at com.sun.proxy.$Proxy88.registerNodeManager(Unknown Source)
>   at 
> org.apache.hadoop.yarn.client.TestResourceTrackerOnHA.testResourceTrackerOnHA(TestResourceTrackerOnHA.java:64)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-565) AM UI does not show splits/locality info

2014-12-15 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246858#comment-14246858
 ] 

Vinod Kumar Vavilapalli commented on YARN-565:
--

This is an MR issue, moving it.

> AM UI does not show splits/locality info 
> -
>
> Key: YARN-565
> URL: https://issues.apache.org/jira/browse/YARN-565
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Gopal V
>Priority: Minor
>  Labels: usability
>
> The AM UI currently shows the tasks without indicating the locality or 
> speculation of a task.
> This information is available by reading it out of the logs later, but while 
> tracking a slow/straggler task, this is invaluable in finding separating the 
> locality misses from other data-sensitive slow-downs or skews.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (YARN-2890) MiniMRYarnCluster should turn on timeline service if configured to do so

2014-12-15 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah reopened YARN-2890:
---

> MiniMRYarnCluster should turn on timeline service if configured to do so
> 
>
> Key: YARN-2890
> URL: https://issues.apache.org/jira/browse/YARN-2890
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Mit Desai
>Assignee: Mit Desai
> Fix For: 2.7.0
>
> Attachments: YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, 
> YARN-2890.patch
>
>
> Currently the MiniMRYarnCluster does not consider the configuration value for 
> enabling timeline service before starting. The MiniYarnCluster should only 
> start the timeline service if it is configured to do so.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2890) MiniMRYarnCluster should turn on timeline service if configured to do so

2014-12-15 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246865#comment-14246865
 ] 

Hitesh Shah commented on YARN-2890:
---

[~zjshen] [~mitdesai] This broke compatibility with Hadoop 2.6. It should be 
reverted, as the number of arguments to MiniYARNCluster has been changed. 

> MiniMRYarnCluster should turn on timeline service if configured to do so
> 
>
> Key: YARN-2890
> URL: https://issues.apache.org/jira/browse/YARN-2890
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Mit Desai
>Assignee: Mit Desai
> Fix For: 2.7.0
>
> Attachments: YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, 
> YARN-2890.patch
>
>
> Currently the MiniMRYarnCluster does not consider the configuration value for 
> enabling timeline service before starting. The MiniYarnCluster should only 
> start the timeline service if it is configured to do so.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2890) MiniMRYarnCluster should turn on timeline service if configured to do so

2014-12-15 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246867#comment-14246867
 ] 

Hitesh Shah commented on YARN-2890:
---

Also, can this be ported back all the way to 2.4.x? 

> MiniMRYarnCluster should turn on timeline service if configured to do so
> 
>
> Key: YARN-2890
> URL: https://issues.apache.org/jira/browse/YARN-2890
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Mit Desai
>Assignee: Mit Desai
> Fix For: 2.7.0
>
> Attachments: YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, 
> YARN-2890.patch
>
>
> Currently the MiniMRYarnCluster does not consider the configuration value for 
> enabling timeline service before starting. The MiniYarnCluster should only 
> start the timeline service if it is configured to do so.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)

2014-12-15 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246882#comment-14246882
 ] 

Naganarasimha G R commented on YARN-2495:
-


Hi [~cwelch],
Thanks for taking the time to review.
bq. This can already be accomplished using the admin cli from a script or 
calling a web service from the node
IIUC, the current way to run such a script is to configure the label script 
inside the health-monitor script? Please correct me if I am wrong.
If so, I see the following problems:
* It feels more like a workaround (or a break in the health-check 
functionality) to achieve the same thing.
* As [~aw] pointed out, the developer needs to take care of security issues, 
which might be too much effort.
* The developer has to put in extra effort for security, error handling, 
retries, etc., on top of the actual script, which might be very simple.
* Documenting and conveying the functionality to users would be difficult.
* What if the admin wants both a health script and a node-labels script?


bq. Is this all just to support putting labels into the node-managers 
configuration file and introducing them that way? Do we have a solid need for 
that? It's not needed for the dynamic script case, which is all I've seen 
discussed here from a "requirements" perspective (putting it into the config 
file / adding it to the heartbeat is implementation, I don't see a requirement 
for it as such).
As [~aw] mentioned, the main idea is not to support labels through the 
configuration file, but to run the configured script periodically and update 
the RM directly when the labels change. It can also do the required error 
handling if the RM doesn't accept the labels.

bq. but I don't see any reason to have conditional enablement of the node 
manager side of things as well as a provider specification and I think it just 
adds unnecessary complexity and possible surprise at configuration time
This topic is debatable, but a few points need to be addressed if we want to 
support both central and distributed configuration simultaneously:
# There will always be confusion about which labels take priority: if a user 
updates labels for a node via the CLI and the NM label script does not report 
them, which one wins?
# Do we AND or OR the labels when they are updated by the script, the admin 
CLI, and the REST web service?

AFAIU, [~gp.leftnoteasy] wanted to do this in stages; in the first stage, to 
avoid confusion, it would be better to keep the two modes mutually exclusive: 
either central or distributed configuration is supported. If distributed, CLI 
modifications will be disabled (YARN-2728); if central, labels from the 
heartbeat will be discarded.

+1 for the changes you mentioned in {{NodeHeartbeatRequest}} & 
{{RegisterNodeLabelManagerResponse}}. For {{RegisterNodeManagerRequest}}, yes, 
the interface would be easier to read if both were the same, but as explained 
above, if only one of central/distributed is enabled, then in the distributed 
case the NM needs to send whatever it has anyway, and in the central case the 
RM doesn't care about it. So there was no need for it.

It would be helpful if [~gp.leftnoteasy] & [~vinodkv] can provide their views 
on the requirement-level comments raised by [~cwelch].
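
For illustration only, a rough sketch of what a script-based provider on the NM 
side could look like; the interface and class names are invented here and do 
not correspond to the attached patches.
{code}
import java.util.Collections;
import java.util.Set;

// Hypothetical NM-side abstraction: something (a script, a config file) produces
// the node's current labels, and the NM sends them on register/heartbeat when
// distributed label configuration is enabled.
interface NodeLabelsProvider {
  Set<String> getNodeLabels();
}

class ScriptBasedNodeLabelsProvider implements NodeLabelsProvider {
  private volatile Set<String> lastLabels = Collections.emptySet();

  // Imagined to be invoked periodically by an NM timer thread; running the
  // configured script and parsing its output is elided here.
  void runScriptAndRefresh() {
    Set<String> parsed = Collections.singleton("GPU");  // placeholder result
    lastLabels = parsed;
  }

  @Override
  public Set<String> getNodeLabels() {
    return lastLabels;
  }
}
{code}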

> Allow admin specify labels from each NM (Distributed configuration)
> ---
>
> Key: YARN-2495
> URL: https://issues.apache.org/jira/browse/YARN-2495
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, 
> YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, 
> YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, 
> YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, 
> YARN-2495_20141022.1.patch
>
>
> Target of this JIRA is to allow admin specify labels in each NM, this covers
> - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or 
> using script suggested by [~aw] (YARN-2729) )
> - NM will send labels to RM via ResourceTracker API
> - RM will set labels in NodeLabelManager when NM register/update labels



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2919) Potential race between renew and cancel in DelegationTokenRenwer

2014-12-15 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246892#comment-14246892
 ] 

Naganarasimha G R commented on YARN-2919:
-

Hi [~kasha], can you provide your opinion on the approach I have proposed in 
this sample patch?

> Potential race between renew and cancel in DelegationTokenRenwer 
> -
>
> Key: YARN-2919
> URL: https://issues.apache.org/jira/browse/YARN-2919
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Naganarasimha G R
>Priority: Critical
> Attachments: YARN-2919.20141209-1.patch
>
>
> YARN-2874 fixes a deadlock in DelegationTokenRenewer, but there is still a 
> race because of which a renewal in flight isn't interrupted by a cancel. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2014-12-15 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246919#comment-14246919
 ] 

Varun Saxena commented on YARN-2902:


[~jlowe], kindly review the patch.

> Killing a container that is localizing can orphan resources in the 
> DOWNLOADING state
> 
>
> Key: YARN-2902
> URL: https://issues.apache.org/jira/browse/YARN-2902
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Fix For: 2.7.0
>
> Attachments: YARN-2902.002.patch, YARN-2902.patch
>
>
> If a container is in the process of localizing when it is stopped/killed then 
> resources are left in the DOWNLOADING state.  If no other container comes 
> along and requests these resources they linger around with no reference 
> counts but aren't cleaned up during normal cache cleanup scans since it will 
> never delete resources in the DOWNLOADING state even if their reference count 
> is zero.
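
As a sketch of the gap being described (not the actual LocalResourcesTracker 
code; the names below are made up), the cleanup decision could also consider 
resources that have sat in DOWNLOADING with no references for longer than some 
grace period:
{code}
// Hypothetical resource record; not the NM's real LocalizedResource class.
class CachedResource {
  enum State { DOWNLOADING, LOCALIZED }
  State state;
  int refCount;
  long lastTouchedMs;
}

class CleanupSketch {
  // Sketch: decide whether the cleanup scan may consider a resource for deletion.
  // Besides unreferenced LOCALIZED resources, also allow reclaiming resources
  // stuck in DOWNLOADING with no references for "too long" -- the state left
  // behind when the requesting container is killed mid-localization.
  static boolean eligibleForCleanup(CachedResource r, long nowMs, long downloadingGraceMs) {
    if (r.refCount > 0) {
      return false;
    }
    if (r.state == CachedResource.State.LOCALIZED) {
      return true;
    }
    return nowMs - r.lastTouchedMs > downloadingGraceMs;
  }
}
{code}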



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2962) ZKRMStateStore: Limit the number of znodes under a znode

2014-12-15 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena reassigned YARN-2962:
--

Assignee: Varun Saxena

> ZKRMStateStore: Limit the number of znodes under a znode
> 
>
> Key: YARN-2962
> URL: https://issues.apache.org/jira/browse/YARN-2962
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Varun Saxena
>Priority: Critical
>
> We ran into this issue where we were hitting the default ZK server message 
> size configs, primarily because the message had too many znodes even though, 
> individually, they were all small.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2914) Potential race condition in SharedCacheUploaderMetrics/CleanerMetrics/ClientSCMMetrics#getInstance()

2014-12-15 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246927#comment-14246927
 ] 

Varun Saxena commented on YARN-2914:


[~kasha], kindly review this one.


> Potential race condition in 
> SharedCacheUploaderMetrics/CleanerMetrics/ClientSCMMetrics#getInstance()
> 
>
> Key: YARN-2914
> URL: https://issues.apache.org/jira/browse/YARN-2914
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Ted Yu
>Assignee: Varun Saxena
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: YARN-2914.002.patch, YARN-2914.patch
>
>
> {code}
>   public static ClientSCMMetrics getInstance() {
> ClientSCMMetrics topMetrics = Singleton.INSTANCE.impl;
> if (topMetrics == null) {
>   throw new IllegalStateException(
> {code}
> getInstance() doesn't hold lock on Singleton.this
> This may result in IllegalStateException being thrown prematurely.
> [~ctrezzo] reported that SharedCacheUploaderMetrics has also same kind of 
> race condition.
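
For reference, a minimal sketch of one way to make the init/get pair safe (both 
paths synchronize on the same monitor); the class name is invented and this is 
not the actual ClientSCMMetrics patch.
{code}
// Sketch: initialize() and getInstance() synchronize on the same lock, so a
// reader can never observe the "not initialized" state after another thread
// has completed initialization.
class MetricsSingletonSketch {
  private static MetricsSingletonSketch instance;

  static synchronized void initialize() {
    if (instance == null) {
      instance = new MetricsSingletonSketch();
    }
  }

  static synchronized MetricsSingletonSketch getInstance() {
    if (instance == null) {
      throw new IllegalStateException("Metrics not initialized yet");
    }
    return instance;
  }
}
{code}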



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2919) Potential race between renew and cancel in DelegationTokenRenwer

2014-12-15 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246958#comment-14246958
 ] 

Karthik Kambatla commented on YARN-2919:


Was out most of last week. Just got back, will try and get to this today. 

> Potential race between renew and cancel in DelegationTokenRenwer 
> -
>
> Key: YARN-2919
> URL: https://issues.apache.org/jira/browse/YARN-2919
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Naganarasimha G R
>Priority: Critical
> Attachments: YARN-2919.20141209-1.patch
>
>
> YARN-2874 fixes a deadlock in DelegationTokenRenewer, but there is still a 
> race because of which a renewal in flight isn't interrupted by a cancel. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2914) Potential race condition in SharedCacheUploaderMetrics/CleanerMetrics/ClientSCMMetrics#getInstance()

2014-12-15 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2914:
---
Issue Type: Sub-task  (was: Bug)
Parent: YARN-1492

> Potential race condition in 
> SharedCacheUploaderMetrics/CleanerMetrics/ClientSCMMetrics#getInstance()
> 
>
> Key: YARN-2914
> URL: https://issues.apache.org/jira/browse/YARN-2914
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.6.0
>Reporter: Ted Yu
>Assignee: Varun Saxena
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: YARN-2914.002.patch, YARN-2914.patch
>
>
> {code}
>   public static ClientSCMMetrics getInstance() {
> ClientSCMMetrics topMetrics = Singleton.INSTANCE.impl;
> if (topMetrics == null) {
>   throw new IllegalStateException(
> {code}
> getInstance() doesn't hold lock on Singleton.this
> This may result in IllegalStateException being thrown prematurely.
> [~ctrezzo] reported that SharedCacheUploaderMetrics has also same kind of 
> race condition.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2784) Yarn project module names in POM needs to consistent acros hadoop project

2014-12-15 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-2784:
---
Component/s: (was: scripts)
 test

> Yarn project module names in POM needs to consistent acros hadoop project
> -
>
> Key: YARN-2784
> URL: https://issues.apache.org/jira/browse/YARN-2784
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: test
>Reporter: Rohith
>Assignee: Rohith
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: YARN-2784.patch
>
>
> All YARN and MapReduce pom.xml files have the project name 
> hadoop-mapreduce/hadoop-yarn. This can be made consistent across the Hadoop 
> project builds, like 'Apache Hadoop Yarn ' and 'Apache Hadoop 
> MapReduce ".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2890) MiniMRYarnCluster should turn on timeline service if configured to do so

2014-12-15 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247050#comment-14247050
 ] 

Jian He commented on YARN-2890:
---

The easiest way to resolve this right away is to revert the patch. 
I'm reverting the patch now.

> MiniMRYarnCluster should turn on timeline service if configured to do so
> 
>
> Key: YARN-2890
> URL: https://issues.apache.org/jira/browse/YARN-2890
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Mit Desai
>Assignee: Mit Desai
> Fix For: 2.7.0
>
> Attachments: YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, 
> YARN-2890.patch
>
>
> Currently the MiniMRYarnCluster does not consider the configuration value for 
> enabling timeline service before starting. The MiniYarnCluster should only 
> start the timeline service if it is configured to do so.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2890) MiniMRYarnCluster should turn on timeline service if configured to do so

2014-12-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247067#comment-14247067
 ] 

Hudson commented on YARN-2890:
--

ABORTED: Integrated in Hadoop-trunk-Commit #6720 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6720/])
Revert "YARN-2890. MiniYARNCluster should start the timeline server based on 
the configuration. Contributed by Mit Desai." (jianhe: rev 
a4f2995b9ec8347612b7aeeb5a3a8b7191278790)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/jobhistory/TestJobHistoryEventHandler.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/v2/MiniMRYarnCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestMRTimelineEventHandling.java


> MiniMRYarnCluster should turn on timeline service if configured to do so
> 
>
> Key: YARN-2890
> URL: https://issues.apache.org/jira/browse/YARN-2890
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Mit Desai
>Assignee: Mit Desai
> Fix For: 2.7.0
>
> Attachments: YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, 
> YARN-2890.patch
>
>
> Currently the MiniMRYarnCluster does not consider the configuration value for 
> enabling timeline service before starting. The MiniYarnCluster should only 
> start the timeline service if it is configured to do so.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2618) Avoid over-allocation of disk resources

2014-12-15 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247126#comment-14247126
 ] 

Karthik Kambatla commented on YARN-2618:


Looks good. Ran findbugs locally, and didn't see any new issues. The tests pass 
locally as well.

bq. If the vdisks in the Resource class represents a share of disk operations, 
can we change the name in the Resource class as well to reflect this(from 
vdisks to something else)?
Spoke to Varun offline. I created a sub-task earlier to revisit the config 
names and it is a blocker for the merge. Let us look into all the configs 
together there. 

+1. Committing this. 

> Avoid over-allocation of disk resources
> ---
>
> Key: YARN-2618
> URL: https://issues.apache.org/jira/browse/YARN-2618
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-2618-1.patch, YARN-2618-2.patch, YARN-2618-3.patch, 
> YARN-2618-4.patch, YARN-2618-5.patch
>
>
> Subtask of YARN-2139. 
> This should include
> - Add API support for introducing disk I/O as the 3rd type resource.
> - NM should report this information to the RM
> - RM should consider this to avoid over-allocation



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2914) Potential race condition in SharedCacheUploaderMetrics/CleanerMetrics/ClientSCMMetrics#getInstance()

2014-12-15 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247144#comment-14247144
 ] 

Karthik Kambatla commented on YARN-2914:


+1

> Potential race condition in 
> SharedCacheUploaderMetrics/CleanerMetrics/ClientSCMMetrics#getInstance()
> 
>
> Key: YARN-2914
> URL: https://issues.apache.org/jira/browse/YARN-2914
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.6.0
>Reporter: Ted Yu
>Assignee: Varun Saxena
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: YARN-2914.002.patch, YARN-2914.patch
>
>
> {code}
>   public static ClientSCMMetrics getInstance() {
> ClientSCMMetrics topMetrics = Singleton.INSTANCE.impl;
> if (topMetrics == null) {
>   throw new IllegalStateException(
> {code}
> getInstance() doesn't hold lock on Singleton.this
> This may result in IllegalStateException being thrown prematurely.
> [~ctrezzo] reported that SharedCacheUploaderMetrics has also same kind of 
> race condition.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2914) Potential race condition in Singleton implementation of SharedCacheUploaderMetrics, CleanerMetrics, ClientSCMMetrics

2014-12-15 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2914:
---
Summary: Potential race condition in Singleton implementation of 
SharedCacheUploaderMetrics, CleanerMetrics, ClientSCMMetrics  (was: Potential 
race condition in 
SharedCacheUploaderMetrics/CleanerMetrics/ClientSCMMetrics#getInstance())

> Potential race condition in Singleton implementation of 
> SharedCacheUploaderMetrics, CleanerMetrics, ClientSCMMetrics
> 
>
> Key: YARN-2914
> URL: https://issues.apache.org/jira/browse/YARN-2914
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.6.0
>Reporter: Ted Yu
>Assignee: Varun Saxena
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: YARN-2914.002.patch, YARN-2914.patch
>
>
> {code}
>   public static ClientSCMMetrics getInstance() {
> ClientSCMMetrics topMetrics = Singleton.INSTANCE.impl;
> if (topMetrics == null) {
>   throw new IllegalStateException(
> {code}
> getInstance() doesn't hold lock on Singleton.this
> This may result in IllegalStateException being thrown prematurely.
> [~ctrezzo] reported that SharedCacheUploaderMetrics has also same kind of 
> race condition.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)

2014-12-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247159#comment-14247159
 ] 

Hudson commented on YARN-1492:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6723 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6723/])
YARN-2914. [YARN-1492] Potential race condition in Singleton implementation of 
SharedCacheUploaderMetrics, CleanerMetrics, ClientSCMMetrics. (Varun Saxena via 
kasha) (kasha: rev e597249d361bbe8383fb9b564eacda7c990b781d)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/CleanerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/metrics/SharedCacheUploaderMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/ClientProtocolService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/metrics/TestCleanerMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/metrics/ClientSCMMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/metrics/CleanerMetrics.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheUploaderService.java


> truly shared cache for jars (jobjar/libjar)
> ---
>
> Key: YARN-1492
> URL: https://issues.apache.org/jira/browse/YARN-1492
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.0.4-alpha
>Reporter: Sangjin Lee
>Assignee: Chris Trezzo
>Priority: Critical
> Attachments: YARN-1492-all-trunk-v1.patch, 
> YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, 
> YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, 
> shared_cache_design.pdf, shared_cache_design_v2.pdf, 
> shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, 
> shared_cache_design_v5.pdf, shared_cache_design_v6.pdf
>
>
> Currently there is the distributed cache that enables you to cache jars and 
> files so that attempts from the same job can reuse them. However, sharing is 
> limited with the distributed cache because it is normally on a per-job basis. 
> On a large cluster, sometimes copying of jobjars and libjars becomes so 
> prevalent that it consumes a large portion of the network bandwidth, not to 
> speak of defeating the purpose of "bringing compute to where data is". This 
> is wasteful because in most cases code doesn't change much across many jobs.
> I'd like to propose and discuss feasibility of introducing a truly shared 
> cache so that multiple jobs from multiple users can share and cache jars. 
> This JIRA is to open the discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2914) Potential race condition in Singleton implementation of SharedCacheUploaderMetrics, CleanerMetrics, ClientSCMMetrics

2014-12-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247160#comment-14247160
 ] 

Hudson commented on YARN-2914:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6723 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6723/])
YARN-2914. [YARN-1492] Potential race condition in Singleton implementation of 
SharedCacheUploaderMetrics, CleanerMetrics, ClientSCMMetrics. (Varun Saxena via 
kasha) (kasha: rev e597249d361bbe8383fb9b564eacda7c990b781d)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/CleanerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/metrics/SharedCacheUploaderMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/ClientProtocolService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/metrics/TestCleanerMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/metrics/ClientSCMMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/metrics/CleanerMetrics.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheUploaderService.java


> Potential race condition in Singleton implementation of 
> SharedCacheUploaderMetrics, CleanerMetrics, ClientSCMMetrics
> 
>
> Key: YARN-2914
> URL: https://issues.apache.org/jira/browse/YARN-2914
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.6.0
>Reporter: Ted Yu
>Assignee: Varun Saxena
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: YARN-2914.002.patch, YARN-2914.patch
>
>
> {code}
>   public static ClientSCMMetrics getInstance() {
> ClientSCMMetrics topMetrics = Singleton.INSTANCE.impl;
> if (topMetrics == null) {
>   throw new IllegalStateException(
> {code}
> getInstance() doesn't hold lock on Singleton.this
> This may result in IllegalStateException being thrown prematurely.
> [~ctrezzo] reported that SharedCacheUploaderMetrics has also same kind of 
> race condition.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2944) SCMStore/InMemorySCMStore is not currently compatible with ReflectionUtils#newInstance

2014-12-15 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247244#comment-14247244
 ] 

Karthik Kambatla commented on YARN-2944:


IIUC, we want to make sure that every implementation of SCMStore has a no-arg 
constructor. Is it possible to add a parameterized test case for this? 
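
A sketch of the kind of check being asked for here: walk a list of store 
implementations and assert each has a no-arg constructor so 
ReflectionUtils#newInstance keeps working. The class list and test shape are 
illustrative, not the actual test in the patch.
{code}
import java.util.Arrays;
import java.util.List;

// Sketch: fail fast if any store implementation loses its no-arg constructor.
public class StoreConstructorCheckSketch {
  public static void main(String[] args) throws Exception {
    // Illustrative list; a real test would enumerate every SCMStore implementation.
    List<String> storeClasses = Arrays.asList(
        "org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore");
    for (String name : storeClasses) {
      Class<?> clazz = Class.forName(name);
      clazz.getDeclaredConstructor();  // throws NoSuchMethodException if absent
      System.out.println(name + " has a no-arg constructor");
    }
  }
}
{code}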

> SCMStore/InMemorySCMStore is not currently compatible with 
> ReflectionUtils#newInstance
> --
>
> Key: YARN-2944
> URL: https://issues.apache.org/jira/browse/YARN-2944
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>Priority: Minor
> Attachments: YARN-2944-trunk-v1.patch, YARN-2944-trunk-v2.patch
>
>
> Currently the Shared Cache Manager uses ReflectionUtils#newInstance to create 
> the SCMStore service. Unfortunately the SCMStore class does not have a 
> 0-argument constructor.
> On startup, the SCM fails with the following:
> {noformat}
> 14/12/09 16:10:53 INFO service.AbstractService: Service SharedCacheManager 
> failed in state INITED; cause: java.lang.RuntimeException: 
> java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.()
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.()
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.()
> at java.lang.Class.getConstructor0(Class.java:2763)
> at java.lang.Class.getDeclaredConstructor(Class.java:2021)
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125)
> ... 4 more
> 14/12/09 16:10:53 FATAL sharedcachemanager.SharedCacheManager: Error starting 
> SharedCacheManager
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.()
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.()
> at java.lang.Class.getConstructor0(Class.java:2763)
> at java.lang.Class.getDeclaredConstructor(Class.java:2021)
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125)
> ... 4 more
> {noformat}
> This JIRA is to add a 0-argument constructor to SCMStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2203) Web UI for cache manager

2014-12-15 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247254#comment-14247254
 ] 

Karthik Kambatla commented on YARN-2203:


Looks generally good. We still need to address security; [~ctrezzo] - mind 
adding a qualified TODO to address this as part of the rest of the security 
work? Also, do we want unit tests to ensure the added fields are all present in 
future versions? And, let us annotate the newly added classes @Private.

> Web UI for cache manager
> 
>
> Key: YARN-2203
> URL: https://issues.apache.org/jira/browse/YARN-2203
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, 
> YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch
>
>
> Implement the web server and web ui for the cache manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2919) Potential race between renew and cancel in DelegationTokenRenwer

2014-12-15 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247257#comment-14247257
 ] 

Karthik Kambatla commented on YARN-2919:


I think we want to add the "canceling" state to {{Token}} itself, and update 
the renew code to reject renew requests when "canceling" is true.
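
A minimal sketch of that idea, with invented names (this is not the actual 
DelegationTokenRenewer type): the per-token record carries a canceling flag, and 
a renew attempted after cancellation has started is rejected.
{code}
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical per-token record in the renewer.
class TrackedTokenSketch {
  private final AtomicBoolean canceling = new AtomicBoolean(false);

  // Called from the cancel path before the actual cancel RPC is issued.
  void markCanceling() {
    canceling.set(true);
  }

  // Called from the renew path; a renewal requested after cancellation has
  // begun is rejected, closing the renew/cancel race.
  void renewIfNotCanceling() {
    if (canceling.get()) {
      throw new IllegalStateException("Token is being canceled; refusing to renew");
    }
    // ... issue the renew RPC here ...
  }
}
{code}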

> Potential race between renew and cancel in DelegationTokenRenwer 
> -
>
> Key: YARN-2919
> URL: https://issues.apache.org/jira/browse/YARN-2919
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Naganarasimha G R
>Priority: Critical
> Attachments: YARN-2919.20141209-1.patch
>
>
> YARN-2874 fixes a deadlock in DelegationTokenRenewer, but there is still a 
> race because of which a renewal in flight isn't interrupted by a cancel. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2762) RMAdminCLI node-labels-related args should be trimmed and checked before sending to RM

2014-12-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247258#comment-14247258
 ] 

Hadoop QA commented on YARN-2762:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12686760/YARN-2762.7.patch
  against trunk revision e597249.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 10 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client:

  
org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6114//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6114//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-client.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6114//console

This message is automatically generated.

> RMAdminCLI node-labels-related args should be trimmed and checked before 
> sending to RM
> --
>
> Key: YARN-2762
> URL: https://issues.apache.org/jira/browse/YARN-2762
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Rohith
>Assignee: Rohith
>Priority: Minor
> Attachments: YARN-2762.1.patch, YARN-2762.2.patch, YARN-2762.2.patch, 
> YARN-2762.3.patch, YARN-2762.4.patch, YARN-2762.5.patch, YARN-2762.6.patch, 
> YARN-2762.7.patch, YARN-2762.patch
>
>
> All NodeLabel argument validations are done on the server side. The same can be 
> done in RMAdminCLI so that unnecessary RPC calls can be avoided.
> And for input such as "x,y,,z,", there is no need to add an empty string; it 
> can be skipped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2963) Helper library that allows requesting containers from multiple queues

2014-12-15 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created YARN-2963:
--

 Summary: Helper library that allows requesting containers from 
multiple queues
 Key: YARN-2963
 URL: https://issues.apache.org/jira/browse/YARN-2963
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: client
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla


As proposed on the mailing list (yarn-dev), it would be nice to have a way for 
YARN applications to request containers from multiple queues. 

e.g. Oozie might want to run a single AM for all user jobs and request one 
container per launcher.
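
As a purely hypothetical sketch of the helper's surface (nothing like this 
exists yet; all names are invented), the AM-side API might look roughly like:
{code}
import java.util.Map;

// Invented helper API: one AM asks for containers on whichever queue a given
// piece of work should be charged to, and tracks what is outstanding per queue.
interface MultiQueueAMClientSketch {
  void registerQueue(String queueName);
  void requestContainer(String queueName, int memoryMb, int vcores, int priority);
  Map<String, Integer> getOutstandingRequestsPerQueue();
}
{code}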



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2762) RMAdminCLI node-labels-related args should be trimmed and checked before sending to RM

2014-12-15 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247271#comment-14247271
 ] 

Jian He commented on YARN-2762:
---

[~rohithsharma], could you check if the test failure is related?

> RMAdminCLI node-labels-related args should be trimmed and checked before 
> sending to RM
> --
>
> Key: YARN-2762
> URL: https://issues.apache.org/jira/browse/YARN-2762
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Rohith
>Assignee: Rohith
>Priority: Minor
> Attachments: YARN-2762.1.patch, YARN-2762.2.patch, YARN-2762.2.patch, 
> YARN-2762.3.patch, YARN-2762.4.patch, YARN-2762.5.patch, YARN-2762.6.patch, 
> YARN-2762.7.patch, YARN-2762.patch
>
>
> All NodeLabel argument validations are done on the server side. The same can be 
> done in RMAdminCLI so that unnecessary RPC calls can be avoided.
> And for input such as "x,y,,z,", there is no need to add an empty string; it 
> can be skipped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2738) Add FairReservationSystem for FairScheduler

2014-12-15 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247282#comment-14247282
 ] 

Karthik Kambatla commented on YARN-2738:


Do we see items like reservation-system, reservation-planner, 
reservation-policy being per-queue in the future? If not, I would think yarn-site 
is a better location for them. [~subru], [~curino] - what do you guys think? 

> Add FairReservationSystem for FairScheduler
> ---
>
> Key: YARN-2738
> URL: https://issues.apache.org/jira/browse/YARN-2738
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-2738.001.patch, YARN-2738.002.patch, 
> YARN-2738.003.patch, YARN-2738.004.patch
>
>
> Need to create a FairReservationSystem that will implement ReservationSystem 
> for FairScheduler



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2738) Add FairReservationSystem for FairScheduler

2014-12-15 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247369#comment-14247369
 ] 

Carlo Curino commented on YARN-2738:


They could be. The idea is that different organizations can have different 
portions of their capacity managed via reservations, and the 
policies/agents/etc. could be radically different based on business needs. At 
the moment we only have one or two examples of each, but we have ongoing work 
which might produce a broader set, and I can definitely see others wanting to 
experiment here. I would keep them on a per-queue basis. 

> Add FairReservationSystem for FairScheduler
> ---
>
> Key: YARN-2738
> URL: https://issues.apache.org/jira/browse/YARN-2738
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-2738.001.patch, YARN-2738.002.patch, 
> YARN-2738.003.patch, YARN-2738.004.patch
>
>
> Need to create a FairReservationSystem that will implement ReservationSystem 
> for FairScheduler



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-15 Thread Daryn Sharp (JIRA)
Daryn Sharp created YARN-2964:
-

 Summary: RM prematurely cancels tokens for jobs that submit jobs 
(oozie)
 Key: YARN-2964
 URL: https://issues.apache.org/jira/browse/YARN-2964
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Daryn Sharp
Priority: Critical


The RM used to globally track the unique set of tokens for all apps.  It 
remembered the first job that was submitted with the token.  The first job 
controlled the cancellation of the token.  This prevented completion of 
sub-jobs from canceling tokens used by the main job.

As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
notion of the first/main job.  This results in sub-jobs canceling tokens and 
failing the main job and other sub-jobs.  It also appears to schedule multiple 
redundant renewals.

The issue is not immediately obvious because the RM will cancel tokens ~10 min 
(NM liveness interval) after log aggregation completes.  The result is that an 
oozie job (e.g. pig) that launches many sub-jobs over time will fail if any 
sub-job is launched >10 min after another sub-job completes.  If all other 
sub-jobs complete within that 10 min window, then the issue goes unnoticed.
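
To make the pre-YARN-2704 behavior described above concrete, here is a 
simplified sketch (types and names are stand-ins, not the real 
DelegationTokenRenewer code) of tracking the first submitting app per token and 
letting only that app drive cancellation:
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the old global tracking: remember the first app that brought in each
// token and only cancel the token when that app finishes, so sub-jobs sharing the
// token cannot cancel it out from under the main job.
class SharedTokenTrackerSketch {
  private final Map<String, String> firstSubmitterByToken = new ConcurrentHashMap<>();

  void appSubmitted(String tokenIdentifier, String appId) {
    firstSubmitterByToken.putIfAbsent(tokenIdentifier, appId);
  }

  boolean shouldCancelOnAppFinish(String tokenIdentifier, String finishedAppId) {
    return finishedAppId.equals(firstSubmitterByToken.get(tokenIdentifier));
  }
}
{code}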



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-15 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247400#comment-14247400
 ] 

Daryn Sharp commented on YARN-2964:
---

[~vinodkv], can you take a look at this?

> RM prematurely cancels tokens for jobs that submit jobs (oozie)
> ---
>
> Key: YARN-2964
> URL: https://issues.apache.org/jira/browse/YARN-2964
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Priority: Critical
>
> The RM used to globally track the unique set of tokens for all apps.  It 
> remembered the first job that was submitted with the token.  The first job 
> controlled the cancellation of the token.  This prevented completion of 
> sub-jobs from canceling tokens used by the main job.
> As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
> notion of the first/main job.  This results in sub-jobs canceling tokens and 
> failing the main job and other sub-jobs.  It also appears to schedule 
> multiple redundant renewals.
> The issue is not immediately obvious because the RM will cancel tokens ~10 
> min (NM liveness interval) after log aggregation completes.  The result is 
> that an oozie job, e.g. pig, that launches many sub-jobs over time will fail 
> if any sub-job is launched >10 min after another sub-job completes.  If all 
> other sub-jobs complete within that 10 min window, then the issue goes 
> unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2340) NPE thrown when RM restart after queue is STOPPED. Thereafter RM cannot recover applications and remains in standby

2014-12-15 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247415#comment-14247415
 ] 

Jian He commented on YARN-2340:
---

Today, the semantics of stopping a queue is to let the existing applications 
run to completion. We should retain the same semantics for RM restart as well. 
In this case, I think we need to ignore this exception and continue, because 
the application was accepted before the queue was changed to stopped. A similar 
problem could happen if we change the application ACL and restart the RM while 
an application is running. 
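
A rough sketch of the kind of change being suggested (hypothetical names, not 
an actual patch): bypass the STOPPED-queue rejection when the application is 
being recovered rather than newly submitted.
{code}
// Illustrative only: recovered apps were accepted before the queue was
// stopped, so they are re-added and allowed to run to completion; brand-new
// submissions to a stopped queue are still rejected.
class QueueRecoverySketch {
  interface Queue {
    boolean isStopped();
    void addApplication(String appId);
  }

  void addApplication(String appId, Queue queue, boolean isRecovery) {
    if (!isRecovery && queue.isStopped()) {
      throw new IllegalStateException("Queue is STOPPED, cannot accept " + appId);
    }
    queue.addApplication(appId);
  }
}
{code}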

> NPE thrown when RM restart after queue is STOPPED. Thereafter RM cannot 
> recover applications and remains in standby
> --
>
> Key: YARN-2340
> URL: https://issues.apache.org/jira/browse/YARN-2340
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, scheduler
>Affects Versions: 2.4.1
> Environment: Capacityscheduler with Queue a, b
>Reporter: Nishan Shetty
>Assignee: Rohith
>Priority: Critical
>
> While a job is in progress, make the Queue state STOPPED and then restart the 
> RM. Observe that the standby RM fails to come up as active, throwing the NPE 
> below:
> 2014-07-23 18:43:24,432 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> appattempt_1406116264351_0014_02 State change from NEW to SUBMITTED
> 2014-07-23 18:43:24,433 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type APP_ATTEMPT_ADDED to the scheduler
> java.lang.NullPointerException
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:568)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:916)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:101)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:602)
>  at java.lang.Thread.run(Thread.java:662)
> 2014-07-23 18:43:24,434 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-15 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2964:
---
Priority: Blocker  (was: Critical)

Thanks for reporting this, Daryn. Bumping it to a Blocker. 

> RM prematurely cancels tokens for jobs that submit jobs (oozie)
> ---
>
> Key: YARN-2964
> URL: https://issues.apache.org/jira/browse/YARN-2964
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Priority: Blocker
>
> The RM used to globally track the unique set of tokens for all apps.  It 
> remembered the first job that was submitted with the token.  The first job 
> controlled the cancellation of the token.  This prevented completion of 
> sub-jobs from canceling tokens used by the main job.
> As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
> notion of the first/main job.  This results in sub-jobs canceling tokens and 
> failing the main job and other sub-jobs.  It also appears to schedule 
> multiple redundant renewals.
> The issue is not immediately obvious because the RM will cancel tokens ~10 
> min (NM liveness interval) after log aggregation completes.  The result is 
> that an oozie job, e.g. pig, that launches many sub-jobs over time will fail 
> if any sub-job is launched >10 min after another sub-job completes.  If all 
> other sub-jobs complete within that 10 min window, then the issue goes 
> unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2745) Extend YARN to support multi-resource packing of tasks

2014-12-15 Thread Robert Grandl (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Grandl updated YARN-2745:

Component/s: nodemanager
Description: In this umbrella JIRA we propose an extension to existing 
scheduling techniques, which accounts for all resources used by a task (CPU, 
memory, disk, network) and it is able to achieve three competing objectives: 
fairness, improve cluster utilization and reduces average job completion time.  
(was: In this umbrella JIRA we propose a new pluggable scheduler, which 
accounts for all resources used by a task (CPU, memory, disk, network) and it 
is able to achieve three competing objectives: fairness, improve cluster 
utilization and reduces average job completion time.)
Summary: Extend YARN to support multi-resource packing of tasks  (was: 
YARN new pluggable scheduler which does multi-resource packing)

Summary of main changes
* Update the container allocation logic in the RM scheduler. This change is the 
core. It enables “packing” tasks, preferring jobs with less remaining work, and 
trades off fairness for efficiency (a rough sketch of such a packing score is 
shown below). 

* Expand the AM->RM resource ask to expose tasks’ disk and network resource 
demands to the scheduler. 

* Support for cluster-wide resource tracking: we want per-machine resource 
usage information available at the RM.  
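
For illustration, here is a rough sketch of a packing score in the spirit of 
the attached Tetris paper (assumptions: normalized per-resource demand/free 
vectors and a simple dot-product alignment; this is not the proposed scheduler 
change itself):
{code}
// Illustrative alignment-style packing score over CPU, memory, disk and
// network. Higher score = better fit; -1 means the task does not fit.
class PackingSketch {
  static double alignmentScore(double[] taskDemand, double[] freeOnMachine) {
    double score = 0.0;
    for (int r = 0; r < taskDemand.length; r++) {
      if (taskDemand[r] > freeOnMachine[r]) {
        return -1.0;                                // does not fit
      }
      score += taskDemand[r] * freeOnMachine[r];    // dot-product alignment
    }
    return score;
  }

  static int pickMachine(double[] taskDemand, double[][] freePerMachine) {
    int best = -1;
    double bestScore = -1.0;
    for (int m = 0; m < freePerMachine.length; m++) {
      double s = alignmentScore(taskDemand, freePerMachine[m]);
      if (s > bestScore) {
        bestScore = s;
        best = m;
      }
    }
    return best;                                    // -1 if it fits nowhere
  }
}
{code}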


> Extend YARN to support multi-resource packing of tasks
> --
>
> Key: YARN-2745
> URL: https://issues.apache.org/jira/browse/YARN-2745
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager, scheduler
>Reporter: Robert Grandl
> Attachments: sigcomm_14_tetris_talk.pptx, tetris_paper.pdf
>
>
> In this umbrella JIRA we propose an extension to existing scheduling 
> techniques, which accounts for all resources used by a task (CPU, memory, 
> disk, network) and is able to achieve three competing objectives: fairness, 
> improved cluster utilization, and reduced average job completion time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2965) Enhance Node Managers to monitor and report the resource usage on the machines

2014-12-15 Thread Robert Grandl (JIRA)
Robert Grandl created YARN-2965:
---

 Summary: Enhance Node Managers to monitor and report the resource 
usage on the machines
 Key: YARN-2965
 URL: https://issues.apache.org/jira/browse/YARN-2965
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Robert Grandl


This JIRA is about augmenting Node Managers to monitor the resource usage on 
the machine, aggregate these reports, and expose them to the RM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2965) Enhance Node Managers to monitor and report the resource usage on the machines

2014-12-15 Thread Robert Grandl (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Grandl updated YARN-2965:

Attachment: ddoc_RT.docx

Attached proposed design document. 

> Enhance Node Managers to monitor and report the resource usage on the machines
> --
>
> Key: YARN-2965
> URL: https://issues.apache.org/jira/browse/YARN-2965
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Robert Grandl
> Attachments: ddoc_RT.docx
>
>
> This JIRA is about augmenting Node Managers to monitor the resource usage on 
> the machine, aggregates these reports and exposes them to the RM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2965) Enhance Node Managers to monitor and report the resource usage on the machines

2014-12-15 Thread Robert Grandl (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Grandl updated YARN-2965:

Attachment: (was: ddoc_RT.docx)

> Enhance Node Managers to monitor and report the resource usage on the machines
> --
>
> Key: YARN-2965
> URL: https://issues.apache.org/jira/browse/YARN-2965
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Robert Grandl
>
> This JIRA is about augmenting Node Managers to monitor the resource usage on 
> the machine, aggregates these reports and exposes them to the RM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2965) Enhance Node Managers to monitor and report the resource usage on the machines

2014-12-15 Thread Robert Grandl (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Grandl updated YARN-2965:

Attachment: ddoc_RT.pdf

Proposed design document attached. 

> Enhance Node Managers to monitor and report the resource usage on the machines
> --
>
> Key: YARN-2965
> URL: https://issues.apache.org/jira/browse/YARN-2965
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Robert Grandl
> Attachments: ddoc_RT.pdf
>
>
> This JIRA is about augmenting Node Managers to monitor the resource usage on 
> the machine, aggregates these reports and exposes them to the RM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-15 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247444#comment-14247444
 ] 

Vinod Kumar Vavilapalli commented on YARN-2964:
---

I checked the code; I doubt there is a bug.

bq. The first job controlled the cancellation of the token.
Correct.

bq. This prevented completion of sub-jobs from canceling tokens used by the 
main job.
Only partially true. The more common case to avoid was the completion of the 
launcher job itself canceling tokens to be used by the sub-jobs.

bq. As of YARN-2704, the RM now tracks tokens on a per-app basis. There is no 
notion of the first/main job. This results in sub-jobs canceling tokens and 
failing the main job and other sub-jobs.
AFAIR, this code never had the concept of a first job. An app submits tokens, 
there was a flat list of tokens, and every time an app finishes, the RM will 
check if the CancelTokensWhenComplete flag is set and ignore the cancellation 
for this app if the flag is set. The token then expires after 7 days. This 
continues to be the case even after YARN-2704.

bq. It also appears to schedule multiple redundant renewals.
Specific references?

bq. If all other sub-jobs complete within that 10 min window, then the issue 
goes unnoticed.
I doubt this issue happens at all. Are you seeing it on a cluster, or is it a 
theory? IAC, [~jianhe], can we write a test-case which proves or disproves 
this? 


> RM prematurely cancels tokens for jobs that submit jobs (oozie)
> ---
>
> Key: YARN-2964
> URL: https://issues.apache.org/jira/browse/YARN-2964
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Priority: Blocker
>
> The RM used to globally track the unique set of tokens for all apps.  It 
> remembered the first job that was submitted with the token.  The first job 
> controlled the cancellation of the token.  This prevented completion of 
> sub-jobs from canceling tokens used by the main job.
> As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
> notion of the first/main job.  This results in sub-jobs canceling tokens and 
> failing the main job and other sub-jobs.  It also appears to schedule 
> multiple redundant renewals.
> The issue is not immediately obvious because the RM will cancel tokens ~10 
> min (NM liveness interval) after log aggregation completes.  The result is 
> that an oozie job, e.g. pig, that launches many sub-jobs over time will fail 
> if any sub-job is launched >10 min after another sub-job completes.  If all 
> other sub-jobs complete within that 10 min window, then the issue goes 
> unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2966) Extend ask request to include additional fields

2014-12-15 Thread Robert Grandl (JIRA)
Robert Grandl created YARN-2966:
---

 Summary: Extend ask request to include additional fields
 Key: YARN-2966
 URL: https://issues.apache.org/jira/browse/YARN-2966
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Robert Grandl


This JIRA is about extending the ask request from AM to RM to include 
additional information that describes tasks' resource requirements other than 
CPU and memory.  
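
Purely for illustration, an extended ask might carry disk and network demands 
alongside CPU and memory; the field names below are hypothetical and not the 
actual protocol change:
{code}
// Hypothetical extended resource ask; the real work would extend the existing
// AM->RM resource request, not introduce a standalone class like this.
class ExtendedResourceAsk {
  final int virtualCores;
  final long memoryMB;
  final long diskBandwidthMBps;     // additional per-task disk demand
  final long networkBandwidthMBps;  // additional per-task network demand

  ExtendedResourceAsk(int virtualCores, long memoryMB,
                      long diskBandwidthMBps, long networkBandwidthMBps) {
    this.virtualCores = virtualCores;
    this.memoryMB = memoryMB;
    this.diskBandwidthMBps = diskBandwidthMBps;
    this.networkBandwidthMBps = networkBandwidthMBps;
  }
}
{code}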



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2966) Extend ask request to include additional fields

2014-12-15 Thread Robert Grandl (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Grandl updated YARN-2966:

Attachment: ddoc_expanded_ask.docx

> Extend ask request to include additional fields
> ---
>
> Key: YARN-2966
> URL: https://issues.apache.org/jira/browse/YARN-2966
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager, scheduler
>Reporter: Robert Grandl
> Attachments: ddoc_expanded_ask.docx
>
>
> This JIRA is about extending the ask request from AM to RM to include 
> additional information that describe tasks' resource requirements other than 
> cpu and memory.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2965) Enhance Node Managers to monitor and report the resource usage on the machines

2014-12-15 Thread Robert Grandl (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Grandl updated YARN-2965:

Attachment: ddoc_RT.docx

> Enhance Node Managers to monitor and report the resource usage on the machines
> --
>
> Key: YARN-2965
> URL: https://issues.apache.org/jira/browse/YARN-2965
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Robert Grandl
> Attachments: ddoc_RT.docx
>
>
> This JIRA is about augmenting Node Managers to monitor the resource usage on 
> the machine, aggregates these reports and exposes them to the RM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2965) Enhance Node Managers to monitor and report the resource usage on the machines

2014-12-15 Thread Robert Grandl (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Grandl updated YARN-2965:

Attachment: (was: ddoc_RT.pdf)

> Enhance Node Managers to monitor and report the resource usage on the machines
> --
>
> Key: YARN-2965
> URL: https://issues.apache.org/jira/browse/YARN-2965
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Robert Grandl
> Attachments: ddoc_RT.docx
>
>
> This JIRA is about augmenting Node Managers to monitor the resource usage on 
> the machine, aggregates these reports and exposes them to the RM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2967) New task matching logic at the RM

2014-12-15 Thread Robert Grandl (JIRA)
Robert Grandl created YARN-2967:
---

 Summary: New task matching logic at the RM 
 Key: YARN-2967
 URL: https://issues.apache.org/jira/browse/YARN-2967
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Robert Grandl


This sub-JIRA changes the matching logic at the RM. We expect different 
extensions to both the CS and the FS schedulers. These changes should work 
independent of the other changes. That is, with just CPU and memory in the 
asks, as is the case today, the matching logic should still work. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2967) New task matching logic at the RM

2014-12-15 Thread Robert Grandl (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Grandl updated YARN-2967:

Attachment: ddoc_matching_logic.docx

new matching logic design document attached

> New task matching logic at the RM 
> --
>
> Key: YARN-2967
> URL: https://issues.apache.org/jira/browse/YARN-2967
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager, scheduler
>Reporter: Robert Grandl
> Attachments: ddoc_matching_logic.docx
>
>
> This sub-JIRA changes the matching logic at the RM. We expect different 
> extensions to both the CS and the FS schedulers. These changes should work 
> independent of the other changes. That is, with just CPU and memory in the 
> asks, as is the case today, the matching logic should still work. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2745) Extend YARN to support multi-resource packing of tasks

2014-12-15 Thread Robert Grandl (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Grandl updated YARN-2745:

Attachment: tetris_design_doc.docx

added design document

> Extend YARN to support multi-resource packing of tasks
> --
>
> Key: YARN-2745
> URL: https://issues.apache.org/jira/browse/YARN-2745
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager, scheduler
>Reporter: Robert Grandl
> Attachments: sigcomm_14_tetris_talk.pptx, tetris_design_doc.docx, 
> tetris_paper.pdf
>
>
> In this umbrella JIRA we propose an extension to existing scheduling 
> techniques, which accounts for all resources used by a task (CPU, memory, 
> disk, network) and is able to achieve three competing objectives: fairness, 
> improved cluster utilization, and reduced average job completion time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2965) Enhance Node Managers to monitor and report the resource usage on the machines

2014-12-15 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247469#comment-14247469
 ] 

Karthik Kambatla commented on YARN-2965:


Super useful. The design makes sense. Relaying the utilization information 
through NM-RM heartbeat should be okay. 

bq. potential timing overhead in waiting for RT Client to provide a ResourceObj 
on every heartbeat invocation, which might be expensive if the RT Client 
implementation is slow
The NM heartbeat should just send the latest update it has from the RTClient 
without waiting. 
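
A small sketch of the non-blocking pattern described here (hypothetical names, 
not the NodeManager code): a background refresher publishes the latest 
utilization snapshot and the heartbeat path only reads it, never waiting on the 
probe.
{code}
import java.util.concurrent.atomic.AtomicReference;

class CachedUtilizationSketch {
  static final class Snapshot {
    final double cpuUsage;       // e.g. fraction of cores in use
    final long memoryUsedMB;
    Snapshot(double cpuUsage, long memoryUsedMB) {
      this.cpuUsage = cpuUsage;
      this.memoryUsedMB = memoryUsedMB;
    }
  }

  private final AtomicReference<Snapshot> latest =
      new AtomicReference<>(new Snapshot(0.0, 0L));

  // Invoked periodically by the background "RT client" thread.
  void refresh(double cpuUsage, long memoryUsedMB) {
    latest.set(new Snapshot(cpuUsage, memoryUsedMB));
  }

  // Invoked on every NM->RM heartbeat; returns immediately with the last value.
  Snapshot latestForHeartbeat() {
    return latest.get();
  }
}
{code}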


> Enhance Node Managers to monitor and report the resource usage on the machines
> --
>
> Key: YARN-2965
> URL: https://issues.apache.org/jira/browse/YARN-2965
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Robert Grandl
> Attachments: ddoc_RT.docx
>
>
> This JIRA is about augmenting Node Managers to monitor the resource usage on 
> the machine, aggregates these reports and exposes them to the RM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2965) Enhance Node Managers to monitor and report the resource usage on the machines

2014-12-15 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247474#comment-14247474
 ] 

Karthik Kambatla commented on YARN-2965:


I am assuming there will be a config to turn on/off resource-tracking; for now, 
we need not expose this config to the end-user. In addition to this config, I 
think it might make sense to add a config to turn reporting to the RM on/off; 
people might just want to log this information through metrics. 

Slightly orthogonal - we were thinking of tracking the resource usage per 
container as well. Do you think this work can be extended to collect but not 
report that information as well? 

> Enhance Node Managers to monitor and report the resource usage on the machines
> --
>
> Key: YARN-2965
> URL: https://issues.apache.org/jira/browse/YARN-2965
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Robert Grandl
> Attachments: ddoc_RT.docx
>
>
> This JIRA is about augmenting Node Managers to monitor the resource usage on 
> the machine, aggregates these reports and exposes them to the RM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2965) Enhance Node Managers to monitor and report the resource usage on the machines

2014-12-15 Thread Robert Grandl (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Grandl updated YARN-2965:

Attachment: (was: ddoc_RT.docx)

> Enhance Node Managers to monitor and report the resource usage on the machines
> --
>
> Key: YARN-2965
> URL: https://issues.apache.org/jira/browse/YARN-2965
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Robert Grandl
>
> This JIRA is about augmenting Node Managers to monitor the resource usage on 
> the machine, aggregates these reports and exposes them to the RM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2967) New task matching logic at the RM

2014-12-15 Thread Robert Grandl (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Grandl updated YARN-2967:

Attachment: (was: ddoc_matching_logic.docx)

> New task matching logic at the RM 
> --
>
> Key: YARN-2967
> URL: https://issues.apache.org/jira/browse/YARN-2967
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager, scheduler
>Reporter: Robert Grandl
>
> This sub-JIRA changes the matching logic at the RM. We expect different 
> extensions to both the CS and the FS schedulers. These changes should work 
> independent of the other changes. That is, with just CPU and memory in the 
> asks, as is the case today, the matching logic should still work. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2966) Extend ask request to include additional fields

2014-12-15 Thread Robert Grandl (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Grandl updated YARN-2966:

Attachment: (was: ddoc_expanded_ask.docx)

> Extend ask request to include additional fields
> ---
>
> Key: YARN-2966
> URL: https://issues.apache.org/jira/browse/YARN-2966
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager, scheduler
>Reporter: Robert Grandl
>
> This JIRA is about extending the ask request from AM to RM to include 
> additional information that describe tasks' resource requirements other than 
> cpu and memory.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2745) Extend YARN to support multi-resource packing of tasks

2014-12-15 Thread Robert Grandl (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Grandl updated YARN-2745:

Attachment: (was: tetris_design_doc.docx)

> Extend YARN to support multi-resource packing of tasks
> --
>
> Key: YARN-2745
> URL: https://issues.apache.org/jira/browse/YARN-2745
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager, scheduler
>Reporter: Robert Grandl
> Attachments: sigcomm_14_tetris_talk.pptx, tetris_paper.pdf
>
>
> In this umbrella JIRA we propose an extension to existing scheduling 
> techniques, which accounts for all resources used by a task (CPU, memory, 
> disk, network) and is able to achieve three competing objectives: fairness, 
> improved cluster utilization, and reduced average job completion time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2965) Enhance Node Managers to monitor and report the resource usage on the machines

2014-12-15 Thread Robert Grandl (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Grandl updated YARN-2965:

Attachment: ddoc_RT.docx

> Enhance Node Managers to monitor and report the resource usage on the machines
> --
>
> Key: YARN-2965
> URL: https://issues.apache.org/jira/browse/YARN-2965
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Robert Grandl
> Attachments: ddoc_RT.docx
>
>
> This JIRA is about augmenting Node Managers to monitor the resource usage on 
> the machine, aggregates these reports and exposes them to the RM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2920) CapacityScheduler should be notified when labels on nodes changed

2014-12-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247490#comment-14247490
 ] 

Hadoop QA commented on YARN-2920:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12686715/YARN-2920.6.patch
  against trunk revision a095622.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 27 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-tools/hadoop-sls 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6115//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6115//artifact/patchprocess/newPatchFindbugsWarningshadoop-sls.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6115//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6115//console

This message is automatically generated.

> CapacityScheduler should be notified when labels on nodes changed
> -
>
> Key: YARN-2920
> URL: https://issues.apache.org/jira/browse/YARN-2920
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2920.1.patch, YARN-2920.2.patch, YARN-2920.3.patch, 
> YARN-2920.4.patch, YARN-2920.5.patch, YARN-2920.6.patch
>
>
> Currently, changes to labels on nodes are only handled by 
> RMNodeLabelsManager, but that is not enough when labels on nodes change:
> - The scheduler should be able to take actions on running containers (like 
> kill/preempt/do-nothing).
> - Used / available capacity in the scheduler should be updated for future 
> planning.
> We need to add a new event to pass such updates to the scheduler.
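
As a sketch of the shape such an event could take (standalone, hypothetical 
types; the actual patch wires this into the scheduler's event dispatcher), the 
event simply carries the updated node-to-labels mapping for the scheduler to 
act on:
{code}
import java.util.Map;
import java.util.Set;

// Hypothetical "labels changed" event payload: the scheduler can use it to
// update used/available capacity and decide what to do with running
// containers (kill / preempt / do nothing).
class NodeLabelsUpdateEventSketch {
  private final Map<String, Set<String>> updatedNodeToLabels;

  NodeLabelsUpdateEventSketch(Map<String, Set<String>> updatedNodeToLabels) {
    this.updatedNodeToLabels = updatedNodeToLabels;
  }

  Map<String, Set<String>> getUpdatedNodeToLabels() {
    return updatedNodeToLabels;
  }
}
{code}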



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2745) Extend YARN to support multi-resource packing of tasks

2014-12-15 Thread Robert Grandl (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Grandl updated YARN-2745:

Attachment: tetris_design_doc.docx

> Extend YARN to support multi-resource packing of tasks
> --
>
> Key: YARN-2745
> URL: https://issues.apache.org/jira/browse/YARN-2745
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager, scheduler
>Reporter: Robert Grandl
> Attachments: sigcomm_14_tetris_talk.pptx, tetris_design_doc.docx, 
> tetris_paper.pdf
>
>
> In this umbrella JIRA we propose an extension to existing scheduling 
> techniques, which accounts for all resources used by a task (CPU, memory, 
> disk, network) and is able to achieve three competing objectives: fairness, 
> improved cluster utilization, and reduced average job completion time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2966) Extend ask request to include additional fields

2014-12-15 Thread Robert Grandl (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Grandl updated YARN-2966:

Attachment: ddoc_expanded_ask.docx

> Extend ask request to include additional fields
> ---
>
> Key: YARN-2966
> URL: https://issues.apache.org/jira/browse/YARN-2966
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager, scheduler
>Reporter: Robert Grandl
> Attachments: ddoc_expanded_ask.docx
>
>
> This JIRA is about extending the ask request from AM to RM to include 
> additional information that describe tasks' resource requirements other than 
> cpu and memory.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2967) New task matching logic at the RM

2014-12-15 Thread Robert Grandl (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Grandl updated YARN-2967:

Attachment: ddoc_matching_logic.docx

> New task matching logic at the RM 
> --
>
> Key: YARN-2967
> URL: https://issues.apache.org/jira/browse/YARN-2967
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager, scheduler
>Reporter: Robert Grandl
> Attachments: ddoc_matching_logic.docx
>
>
> This sub-JIRA changes the matching logic at the RM. We expect different 
> extensions to both the CS and the FS schedulers. These changes should work 
> independent of the other changes. That is, with just CPU and memory in the 
> asks, as is the case today, the matching logic should still work. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2745) Extend YARN to support multi-resource packing of tasks

2014-12-15 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247515#comment-14247515
 ] 

Gera Shegalov commented on YARN-2745:
-

Thanks for filing this JIRA, [~rgrandl]! We have a number of use cases where we 
need to schedule by NW bandwidth instead of memory/cores.

> Extend YARN to support multi-resource packing of tasks
> --
>
> Key: YARN-2745
> URL: https://issues.apache.org/jira/browse/YARN-2745
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager, scheduler
>Reporter: Robert Grandl
> Attachments: sigcomm_14_tetris_talk.pptx, tetris_design_doc.docx, 
> tetris_paper.pdf
>
>
> In this umbrella JIRA we propose an extension to existing scheduling 
> techniques, which accounts for all resources used by a task (CPU, memory, 
> disk, network) and is able to achieve three competing objectives: fairness, 
> improved cluster utilization, and reduced average job completion time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2868) Add metric for initial container launch time

2014-12-15 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-2868:
-
Attachment: YARN-2868.002.patch

- Removed protected keyword from QueueMetrics
- Add metric to specific queue as well

> Add metric for initial container launch time
> 
>
> Key: YARN-2868
> URL: https://issues.apache.org/jira/browse/YARN-2868
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>  Labels: metrics, supportability
> Attachments: YARN-2868-01.patch, YARN-2868.002.patch
>
>
> Add a metric to measure the latency between "starting container allocation" 
> and "first container actually allocated".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2952) Incorrect version check in RMStateStore

2014-12-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247573#comment-14247573
 ] 

Hadoop QA commented on YARN-2952:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12687061/0001-YARN-2952.patch
  against trunk revision a095622.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 45 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6116//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6116//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-applicationhistoryservice.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6116//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6116//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6116//console

This message is automatically generated.

> Incorrect version check in RMStateStore
> ---
>
> Key: YARN-2952
> URL: https://issues.apache.org/jira/browse/YARN-2952
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Rohith
> Attachments: 0001-YARN-2952.patch
>
>
> In RMStateStore#checkVersion: if we modify CURRENT_VERSION_INFO to 2.0, 
> it'll still store the version as 1.0, which is incorrect. The same thing might 
> happen to the NM store and the timeline store.
> {code}
> // if there is no version info, treat it as 1.0;
> if (loadedVersion == null) {
>   loadedVersion = Version.newInstance(1, 0);
> }
> if (loadedVersion.isCompatibleTo(getCurrentVersion())) {
>   LOG.info("Storing RM state version info " + getCurrentVersion());
>   storeVersion();
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2952) Incorrect version check in RMStateStore

2014-12-15 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247581#comment-14247581
 ] 

Jian He commented on YARN-2952:
---

looks good, +1

> Incorrect version check in RMStateStore
> ---
>
> Key: YARN-2952
> URL: https://issues.apache.org/jira/browse/YARN-2952
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Rohith
> Attachments: 0001-YARN-2952.patch
>
>
> In RMStateStore#checkVersion: if we modify CURRENT_VERSION_INFO to 2.0, 
> it'll still store the version as 1.0, which is incorrect. The same thing might 
> happen to the NM store and the timeline store.
> {code}
> // if there is no version info, treat it as 1.0;
> if (loadedVersion == null) {
>   loadedVersion = Version.newInstance(1, 0);
> }
> if (loadedVersion.isCompatibleTo(getCurrentVersion())) {
>   LOG.info("Storing RM state version info " + getCurrentVersion());
>   storeVersion();
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2868) Add metric for initial container launch time

2014-12-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247620#comment-14247620
 ] 

Hadoop QA commented on YARN-2868:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12687360/YARN-2868.002.patch
  against trunk revision a095622.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 14 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6117//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6117//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6117//console

This message is automatically generated.

> Add metric for initial container launch time
> 
>
> Key: YARN-2868
> URL: https://issues.apache.org/jira/browse/YARN-2868
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>  Labels: metrics, supportability
> Attachments: YARN-2868-01.patch, YARN-2868.002.patch
>
>
> Add a metric to measure the latency between "starting container allocation" 
> and "first container actually allocated".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2762) RMAdminCLI node-labels-related args should be trimmed and checked before sending to RM

2014-12-15 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247638#comment-14247638
 ] 

Rohith commented on YARN-2762:
--

I have checked the test failures. They are not introduced by this patch; these 
tests fail randomly.

> RMAdminCLI node-labels-related args should be trimmed and checked before 
> sending to RM
> --
>
> Key: YARN-2762
> URL: https://issues.apache.org/jira/browse/YARN-2762
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Rohith
>Assignee: Rohith
>Priority: Minor
> Attachments: YARN-2762.1.patch, YARN-2762.2.patch, YARN-2762.2.patch, 
> YARN-2762.3.patch, YARN-2762.4.patch, YARN-2762.5.patch, YARN-2762.6.patch, 
> YARN-2762.7.patch, YARN-2762.patch
>
>
> All NodeLabel arg validations are done on the server side. The same can be 
> done in RMAdminCLI so that unnecessary RPC calls can be avoided.
> And for input such as "x,y,,z,", there is no need to add an empty string; it 
> can simply be skipped.
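
A minimal sketch of the client-side normalization being proposed (hypothetical 
helper, not the actual RMAdminCLI patch): trim each label and drop the empty 
entries that an input like "x,y,,z," would otherwise produce.
{code}
import java.util.ArrayList;
import java.util.List;

// Hypothetical client-side normalization: trim labels and skip empty entries
// before sending anything to the RM, so bad input never costs an RPC.
class NodeLabelArgsSketch {
  static List<String> parseLabels(String arg) {
    List<String> labels = new ArrayList<>();
    if (arg == null) {
      return labels;
    }
    for (String raw : arg.split(",")) {
      String label = raw.trim();
      if (!label.isEmpty()) {        // "x,y,,z," -> [x, y, z]
        labels.add(label);
      }
    }
    return labels;
  }
}
{code}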



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2946) DeadLocks in RMStateStore<->ZKRMStateStore

2014-12-15 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247672#comment-14247672
 ] 

Rohith commented on YARN-2946:
--

Thanks [~jianhe] and [~varun_saxena] for your suggestions.

[~jianhe], before implementing a state machine for DT key updates on the store, 
I am trying to understand: is there any specific reason why a state machine was 
not implemented? Would a state machine for updating DT keys cause any potential 
issues?

> DeadLocks in RMStateStore<->ZKRMStateStore
> --
>
> Key: YARN-2946
> URL: https://issues.apache.org/jira/browse/YARN-2946
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: Rohith
>Assignee: Rohith
>Priority: Blocker
> Attachments: 0001-YARN-2946.patch, 0002-YARN-2946.patch, 
> RM_BeforeFix_Deadlock_cycle_1.png, RM_BeforeFix_Deadlock_cycle_2.png, 
> TestYARN2946.java
>
>
> Found one deadlock in ZKRMStateStore.
> # In the initial stage zkClient is null because of a zk disconnected event.
> # When ZKRMStateStore#runWithCheck() waits (zkSessionTimeout) for zkClient to 
> re-establish the zookeeper connection, either via a SyncConnected or Expired 
> event, it is highly possible that some other thread can obtain the lock on 
> {{ZKRMStateStore.this}} from state machine transition events. This causes a 
> deadlock in ZKRMStateStore.
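
As a generic illustration of this kind of lock cycle (deliberately simplified, 
not the YARN code), two code paths that take the same two monitors in opposite 
orders can block each other forever:
{code}
// Thread A holds storeLock and waits for zkStoreLock, while thread B holds
// zkStoreLock and waits for storeLock -> with unlucky timing, neither proceeds.
class DeadlockSketch {
  private final Object storeLock = new Object();     // think: RMStateStore
  private final Object zkStoreLock = new Object();   // think: ZKRMStateStore.this

  void dispatcherPath() {                  // e.g. a state-machine transition
    synchronized (storeLock) {
      synchronized (zkStoreLock) {
        // write state to ZooKeeper
      }
    }
  }

  void reconnectPath() {                   // e.g. reacting to a session event
    synchronized (zkStoreLock) {
      synchronized (storeLock) {
        // notify the store that the connection is back
      }
    }
  }

  public static void main(String[] args) {
    DeadlockSketch s = new DeadlockSketch();
    new Thread(s::dispatcherPath).start();
    new Thread(s::reconnectPath).start();
  }
}
{code}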



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2762) RMAdminCLI node-labels-related args should be trimmed and checked before sending to RM

2014-12-15 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247678#comment-14247678
 ] 

Rohith commented on YARN-2762:
--

YARN-2783 and YARN-2710 are the corresponding issues for the test failures.

> RMAdminCLI node-labels-related args should be trimmed and checked before 
> sending to RM
> --
>
> Key: YARN-2762
> URL: https://issues.apache.org/jira/browse/YARN-2762
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Rohith
>Assignee: Rohith
>Priority: Minor
> Attachments: YARN-2762.1.patch, YARN-2762.2.patch, YARN-2762.2.patch, 
> YARN-2762.3.patch, YARN-2762.4.patch, YARN-2762.5.patch, YARN-2762.6.patch, 
> YARN-2762.7.patch, YARN-2762.patch
>
>
> All NodeLabel arg validations are done on the server side. The same can be 
> done in RMAdminCLI so that unnecessary RPC calls can be avoided.
> And for input such as "x,y,,z,", there is no need to add an empty string; it 
> can simply be skipped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2968) yarn shell script has unresolved merge conflicts in checkin

2014-12-15 Thread Prakash Ramachandran (JIRA)
Prakash Ramachandran created YARN-2968:
--

 Summary: yarn shell script has unresolved merge conflicts in 
checkin
 Key: YARN-2968
 URL: https://issues.apache.org/jira/browse/YARN-2968
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Prakash Ramachandran


In branch-2, the yarn command has unresolved merge conflicts:
https://github.com/apache/hadoop/blob/branch-2/hadoop-yarn-project/hadoop-yarn/bin/yarn

commit - aadd0c392be66080f 

see lines 203 and 106



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN

2014-12-15 Thread Abin Shahab (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247798#comment-14247798
 ] 

Abin Shahab commented on YARN-1964:
---

Hi [~airbots], Did this fix your issue?
Abin

> Create Docker analog of the LinuxContainerExecutor in YARN
> --
>
> Key: YARN-1964
> URL: https://issues.apache.org/jira/browse/YARN-1964
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.2.0
>Reporter: Arun C Murthy
>Assignee: Abin Shahab
> Fix For: 2.6.0
>
> Attachments: YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, 
> YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, 
> YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, 
> yarn-1964-branch-2.2.0-docker.patch, yarn-1964-branch-2.2.0-docker.patch, 
> yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, 
> yarn-1964-docker.patch, yarn-1964-docker.patch
>
>
> Docker (https://www.docker.io/) is, increasingly, a very popular container 
> technology.
> In context of YARN, the support for Docker will provide a very elegant 
> solution to allow applications to *package* their software into a Docker 
> container (entire Linux file system incl. custom versions of perl, python 
> etc.) and use it as a blueprint to launch all their YARN containers with 
> requisite software environment. This provides both consistency (all YARN 
> containers will have the same software environment) and isolation (no 
> interference with whatever is installed on the physical machine).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2968) yarn shell script has unresolved merge conflicts in checkin

2014-12-15 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena reassigned YARN-2968:
--

Assignee: Varun Saxena

> yarn shell script has unresolved merge conflicts in checkin
> ---
>
> Key: YARN-2968
> URL: https://issues.apache.org/jira/browse/YARN-2968
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Prakash Ramachandran
>Assignee: Varun Saxena
>
> in the branch-2 yarn command has unresolved merge conflicts
> https://github.com/apache/hadoop/blob/branch-2/hadoop-yarn-project/hadoop-yarn/bin/yarn
> commit - aadd0c392be66080f 
> see line 203, 106



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2966) Extend ask request to include additional fields

2014-12-15 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena reassigned YARN-2966:
--

Assignee: Varun Saxena

> Extend ask request to include additional fields
> ---
>
> Key: YARN-2966
> URL: https://issues.apache.org/jira/browse/YARN-2966
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager, scheduler
>Reporter: Robert Grandl
>Assignee: Varun Saxena
> Attachments: ddoc_expanded_ask.docx
>
>
> This JIRA is about extending the ask request from AM to RM to include 
> additional information that describe tasks' resource requirements other than 
> cpu and memory.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2946) DeadLocks in RMStateStore<->ZKRMStateStore

2014-12-15 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247828#comment-14247828
 ] 

Jian He commented on YARN-2946:
---

It was not there because we need to do the token/key updates synchronously. I 
think with a state machine we can achieve the same result by calling each 
transition synchronously.

> DeadLocks in RMStateStore<->ZKRMStateStore
> --
>
> Key: YARN-2946
> URL: https://issues.apache.org/jira/browse/YARN-2946
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: Rohith
>Assignee: Rohith
>Priority: Blocker
> Attachments: 0001-YARN-2946.patch, 0002-YARN-2946.patch, 
> RM_BeforeFix_Deadlock_cycle_1.png, RM_BeforeFix_Deadlock_cycle_2.png, 
> TestYARN2946.java
>
>
> Found one deadlock in ZKRMStateStore.
> # In the initial stage zkClient is null because of a zk disconnected event.
> # When ZKRMStateStore#runWithCheck() waits (zkSessionTimeout) for zkClient to 
> re-establish the zookeeper connection, either via a SyncConnected or Expired 
> event, it is highly possible that some other thread can obtain the lock on 
> {{ZKRMStateStore.this}} from state machine transition events. This causes a 
> deadlock in ZKRMStateStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2946) DeadLocks in RMStateStore<->ZKRMStateStore

2014-12-15 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247839#comment-14247839
 ] 

Rohith commented on YARN-2946:
--

Thanks [~jianhe] for your confirmation. I will go ahead with the implementation.

> DeadLocks in RMStateStore<->ZKRMStateStore
> --
>
> Key: YARN-2946
> URL: https://issues.apache.org/jira/browse/YARN-2946
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: Rohith
>Assignee: Rohith
>Priority: Blocker
> Attachments: 0001-YARN-2946.patch, 0002-YARN-2946.patch, 
> RM_BeforeFix_Deadlock_cycle_1.png, RM_BeforeFix_Deadlock_cycle_2.png, 
> TestYARN2946.java
>
>
> Found one deadlock in ZKRMStateStore.
> # In the initial stage zkClient is null because of a zk disconnected event.
> # When ZKRMStateStore#runWithCheck() waits (zkSessionTimeout) for zkClient to 
> re-establish the zookeeper connection, either via a SyncConnected or Expired 
> event, it is highly possible that some other thread can obtain the lock on 
> {{ZKRMStateStore.this}} from state machine transition events. This causes a 
> deadlock in ZKRMStateStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2966) Extend ask request to include additional fields

2014-12-15 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247844#comment-14247844
 ] 

Karthik Kambatla commented on YARN-2966:


[~varun_saxena] - I believe [~rgrandl] already has an implementation for all 
three sub-tasks here, as evident from the Tetris paper they published recently 
and the design doc here. 

> Extend ask request to include additional fields
> ---
>
> Key: YARN-2966
> URL: https://issues.apache.org/jira/browse/YARN-2966
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager, scheduler
>Reporter: Robert Grandl
>Assignee: Varun Saxena
> Attachments: ddoc_expanded_ask.docx
>
>
> This JIRA is about extending the ask request from AM to RM to include 
> additional information that describe tasks' resource requirements other than 
> cpu and memory.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2965) Enhance Node Managers to monitor and report the resource usage on the machines

2014-12-15 Thread Peng Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247851#comment-14247851
 ] 

Peng Zhang commented on YARN-2965:
--

{quote}
we were thinking of tracking the resource usage per container as well
{quote}
Is there any issue with tracking this now? I think this is very useful for 
audit and accounting.

And I also wonder whether the monitor in this issue should distinguish YARN 
services from other services (such as HDFS, or Storm not running on YARN)?
IMHO, if the machine's resources are isolated between YARN and non-YARN 
services by cgroups (described in the Cloudera Manager docs as a static 
resource pool), the monitor here should only track each container's resources 
and then aggregate them to the RT Master.

> Enhance Node Managers to monitor and report the resource usage on the machines
> --
>
> Key: YARN-2965
> URL: https://issues.apache.org/jira/browse/YARN-2965
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Robert Grandl
> Attachments: ddoc_RT.docx
>
>
> This JIRA is about augmenting Node Managers to monitor the resource usage on 
> the machine, aggregates these reports and exposes them to the RM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2968) yarn shell script has unresolved merge conflicts in checkin

2014-12-15 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2968:
---
Priority: Blocker  (was: Major)

> yarn shell script has unresolved merge conflicts in checkin
> ---
>
> Key: YARN-2968
> URL: https://issues.apache.org/jira/browse/YARN-2968
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Prakash Ramachandran
>Assignee: Varun Saxena
>Priority: Blocker
>
> in the branch-2 yarn command has unresolved merge conflicts
> https://github.com/apache/hadoop/blob/branch-2/hadoop-yarn-project/hadoop-yarn/bin/yarn
> commit - aadd0c392be66080f 
> see line 203, 106



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2968) yarn shell script has unresolved merge conflicts in checkin

2014-12-15 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247854#comment-14247854
 ] 

Karthik Kambatla commented on YARN-2968:


Pretty bad on my part. Resolved another conflict at commit time, but looks like 
I missed this one. Taking care of it with an addendum patch to the same JIRA. 

> yarn shell script has unresolved merge conflicts in checkin
> ---
>
> Key: YARN-2968
> URL: https://issues.apache.org/jira/browse/YARN-2968
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Prakash Ramachandran
>Assignee: Varun Saxena
>Priority: Blocker
>
> in the branch-2 yarn command has unresolved merge conflicts
> https://github.com/apache/hadoop/blob/branch-2/hadoop-yarn-project/hadoop-yarn/bin/yarn
> commit - aadd0c392be66080f 
> see line 203, 106



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2965) Enhance Node Managers to monitor and report the resource usage on the machines

2014-12-15 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla reassigned YARN-2965:
--

Assignee: Robert Grandl

> Enhance Node Managers to monitor and report the resource usage on the machines
> --
>
> Key: YARN-2965
> URL: https://issues.apache.org/jira/browse/YARN-2965
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Robert Grandl
>Assignee: Robert Grandl
> Attachments: ddoc_RT.docx
>
>
> This JIRA is about augmenting Node Managers to monitor the resource usage on 
> the machine, aggregate these reports, and expose them to the RM. 
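
For the machine-level half of this, a minimal sketch assuming a Linux NodeManager that 
samples /proc; the class name NodeUsageMonitor and the two metrics chosen are illustrative 
assumptions, not the design in the attached ddoc_RT.docx.

{code:java}
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

/**
 * Illustrative machine-level sampler: reads Linux /proc to estimate overall
 * memory use and cumulative CPU jiffies that a NodeManager could report.
 */
public class NodeUsageMonitor {

  /** Returns used memory in kB, derived from /proc/meminfo. */
  public long usedMemoryKb() throws IOException {
    long total = -1, available = -1;
    try (BufferedReader r = new BufferedReader(new FileReader("/proc/meminfo"))) {
      String line;
      while ((line = r.readLine()) != null) {
        if (line.startsWith("MemTotal:")) {
          total = parseKb(line);
        } else if (line.startsWith("MemAvailable:")) {
          // MemAvailable needs a reasonably recent kernel; a real monitor
          // would fall back to MemFree + Buffers + Cached when it is absent.
          available = parseKb(line);
        }
      }
    }
    return (total < 0 || available < 0) ? -1 : total - available;
  }

  /** Returns cumulative busy CPU jiffies from the aggregate "cpu" line of /proc/stat. */
  public long busyCpuJiffies() throws IOException {
    try (BufferedReader r = new BufferedReader(new FileReader("/proc/stat"))) {
      String[] f = r.readLine().trim().split("\\s+"); // "cpu user nice system idle ..."
      long user = Long.parseLong(f[1]);
      long nice = Long.parseLong(f[2]);
      long system = Long.parseLong(f[3]);
      return user + nice + system;
    }
  }

  private static long parseKb(String line) {
    String[] f = line.split("\\s+"); // e.g. "MemTotal:  16384000 kB"
    return Long.parseLong(f[1]);
  }
}
{code}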



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2967) New task matching logic at the RM

2014-12-15 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla reassigned YARN-2967:
--

Assignee: Robert Grandl

> New task matching logic at the RM 
> --
>
> Key: YARN-2967
> URL: https://issues.apache.org/jira/browse/YARN-2967
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager, scheduler
>Reporter: Robert Grandl
>Assignee: Robert Grandl
> Attachments: ddoc_matching_logic.docx
>
>
> This sub-JIRA changes the matching logic at the RM. We expect different 
> extensions to both the CS and the FS schedulers. These changes should work 
> independently of the other changes. That is, with just CPU and memory in the 
> asks, as is the case today, the matching logic should still work. 
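
A rough sketch of matching that checks whatever resource dimensions appear in the ask, 
so a CPU-and-memory-only ask behaves exactly as today. The map-based ask shape and the 
fitsOn helper are invented for illustration and are not the CS/FS scheduler code.

{code:java}
import java.util.HashMap;
import java.util.Map;

/** Illustrative multi-resource matching check; not the CS/FS scheduler code. */
public class MultiResourceMatcher {

  /**
   * Returns true if every resource named in the ask fits within what the node
   * has available. Dimensions absent from the ask (e.g. disk or network when
   * the AM only specifies CPU and memory, as today) are simply not checked,
   * so the logic degrades to the current CPU+memory behaviour.
   */
  public static boolean fitsOn(Map<String, Long> ask, Map<String, Long> available) {
    for (Map.Entry<String, Long> demand : ask.entrySet()) {
      long free = available.getOrDefault(demand.getKey(), 0L);
      if (demand.getValue() > free) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    Map<String, Long> ask = new HashMap<>();
    ask.put("memoryMb", 2048L);
    ask.put("vcores", 2L);           // CPU+memory-only ask, as today

    Map<String, Long> node = new HashMap<>();
    node.put("memoryMb", 8192L);
    node.put("vcores", 4L);
    node.put("diskMBps", 400L);      // extra dimensions are ignored unless asked for

    System.out.println(fitsOn(ask, node)); // prints: true
  }
}
{code}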



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2745) Extend YARN to support multi-resource packing of tasks

2014-12-15 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla reassigned YARN-2745:
--

Assignee: Robert Grandl

> Extend YARN to support multi-resource packing of tasks
> --
>
> Key: YARN-2745
> URL: https://issues.apache.org/jira/browse/YARN-2745
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager, scheduler
>Reporter: Robert Grandl
>Assignee: Robert Grandl
> Attachments: sigcomm_14_tetris_talk.pptx, tetris_design_doc.docx, 
> tetris_paper.pdf
>
>
> In this umbrella JIRA we propose an extension to existing scheduling 
> techniques, which accounts for all resources used by a task (CPU, memory, 
> disk, network) and is able to balance three competing objectives: 
> fairness, improved cluster utilization, and reduced average job completion time.
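
The attached design doc and paper define the actual approach; the sketch below only 
illustrates one common multi-resource packing heuristic, scoring a task on a machine by 
the dot product of the task's demand vector and the machine's free-resource vector and 
preferring higher scores. Names and normalization are assumptions for illustration.

{code:java}
/** Illustrative packing heuristic: align task demand with free machine resources. */
public class PackingScore {

  /**
   * Dot product of the task's demand vector and the machine's remaining
   * capacity, each dimension normalized by the machine's total capacity.
   * Higher scores mean the task "fits the shape" of what is free.
   */
  public static double score(double[] demand, double[] free, double[] capacity) {
    double s = 0.0;
    for (int i = 0; i < demand.length; i++) {
      s += (demand[i] / capacity[i]) * (free[i] / capacity[i]);
    }
    return s;
  }

  public static void main(String[] args) {
    // Dimensions: {cpu, memory, disk, network}
    double[] capacity = {16, 64, 800, 1000};
    double[] free     = {8, 48, 600, 900};
    double[] taskA    = {2, 4, 300, 50};   // disk-heavy task
    double[] taskB    = {4, 16, 10, 50};   // memory-heavy task

    // A packing scheduler would prefer whichever task aligns better with free resources.
    System.out.printf("taskA score = %.3f%n", score(taskA, free, capacity));
    System.out.printf("taskB score = %.3f%n", score(taskB, free, capacity));
  }
}
{code}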



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (YARN-2189) Admin service for cache manager

2014-12-15 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla reopened YARN-2189:


As reported on YARN-2968, it looks like I messed up the cherry-pick to branch-2. 
Reopening to fix that. 

> Admin service for cache manager
> ---
>
> Key: YARN-2189
> URL: https://issues.apache.org/jira/browse/YARN-2189
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Fix For: 2.7.0
>
> Attachments: YARN-2189-trunk-v1.patch, YARN-2189-trunk-v2.patch, 
> YARN-2189-trunk-v3.patch, YARN-2189-trunk-v4.patch, YARN-2189-trunk-v5.patch, 
> YARN-2189-trunk-v6.patch, YARN-2189-trunk-v7.patch
>
>
> Implement the admin service for the shared cache manager. This service is 
> responsible for handling administrative commands such as manually running a 
> cleaner task.
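
As a hedged sketch of what such an admin surface might look like, the snippet below 
exposes a manual cleaner trigger behind a small interface. The names SCMAdminService, 
InProcessSCMAdmin, and runCleanerTask are illustrative assumptions, not the API defined 
in the attached patches.

{code:java}
import java.io.IOException;

/**
 * Illustrative admin-facing interface for a shared cache manager; the real
 * patch defines its own protocol, this is only a shape sketch.
 */
interface SCMAdminService {
  /** Manually trigger one pass of the cache cleaner. */
  void runCleanerTask() throws IOException;
}

/** Minimal in-process implementation used to show how an admin CLI might call it. */
class InProcessSCMAdmin implements SCMAdminService {
  private final Runnable cleaner;

  InProcessSCMAdmin(Runnable cleaner) {
    this.cleaner = cleaner;
  }

  @Override
  public void runCleanerTask() {
    // In a real service this would be an RPC handler with ACL checks.
    cleaner.run();
  }
}

public class SCMAdminExample {
  public static void main(String[] args) throws IOException {
    SCMAdminService admin =
        new InProcessSCMAdmin(() -> System.out.println("cleaner pass started"));
    admin.runCleanerTask();   // analogous to issuing the admin command from a CLI
  }
}
{code}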



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2189) Admin service for cache manager

2014-12-15 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2189:
---
Attachment: yarn-2189-branch2.addendum-1.patch

Addendum patch to branch-2 to fix bin/yarn script. 

> Admin service for cache manager
> ---
>
> Key: YARN-2189
> URL: https://issues.apache.org/jira/browse/YARN-2189
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Fix For: 2.7.0
>
> Attachments: YARN-2189-trunk-v1.patch, YARN-2189-trunk-v2.patch, 
> YARN-2189-trunk-v3.patch, YARN-2189-trunk-v4.patch, YARN-2189-trunk-v5.patch, 
> YARN-2189-trunk-v6.patch, YARN-2189-trunk-v7.patch, 
> yarn-2189-branch2.addendum-1.patch
>
>
> Implement the admin service for the shared cache manager. This service is 
> responsible for handling administrative commands such as manually running a 
> cleaner task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2966) Extend ask request to include additional fields

2014-12-15 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena reassigned YARN-2966:
--

Assignee: Robert Grandl  (was: Varun Saxena)

> Extend ask request to include additional fields
> ---
>
> Key: YARN-2966
> URL: https://issues.apache.org/jira/browse/YARN-2966
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager, scheduler
>Reporter: Robert Grandl
>Assignee: Robert Grandl
> Attachments: ddoc_expanded_ask.docx
>
>
> This JIRA is about extending the ask request from AM to RM to include 
> additional information that describes tasks' resource requirements other than 
> CPU and memory.  
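
Purely as an assumed illustration of the kind of fields an expanded ask could carry; the 
real shape is in the attached ddoc_expanded_ask.docx, and the field names below are 
hypothetical.

{code:java}
/** Hypothetical expanded resource ask carrying dimensions beyond CPU and memory. */
public class ExpandedAsk {
  // Existing dimensions
  public int memoryMb;
  public int vcores;

  // Possible additional dimensions an AM could describe per task
  public long diskReadMBps;     // expected disk read bandwidth
  public long diskWriteMBps;    // expected disk write bandwidth
  public long networkInMbps;    // expected inbound network bandwidth
  public long networkOutMbps;   // expected outbound network bandwidth

  public ExpandedAsk(int memoryMb, int vcores) {
    this.memoryMb = memoryMb;
    this.vcores = vcores;
  }

  public static void main(String[] args) {
    // A task that is light on CPU/memory but network-heavy
    ExpandedAsk ask = new ExpandedAsk(1024, 1);
    ask.networkInMbps = 500;
    System.out.println("ask vcores=" + ask.vcores
        + " memoryMb=" + ask.memoryMb
        + " networkInMbps=" + ask.networkInMbps);
  }
}
{code}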



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2966) Extend ask request to include additional fields

2014-12-15 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247877#comment-14247877
 ] 

Varun Saxena commented on YARN-2966:


[~kasha], sorry about that. It sounded interesting and had been unassigned for some 
time, so I thought I could work on some part of it. 
I should have asked first. I have reassigned it to [~rgrandl]. 

> Extend ask request to include additional fields
> ---
>
> Key: YARN-2966
> URL: https://issues.apache.org/jira/browse/YARN-2966
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager, scheduler
>Reporter: Robert Grandl
>Assignee: Robert Grandl
> Attachments: ddoc_expanded_ask.docx
>
>
> This JIRA is about extending the ask request from AM to RM to include 
> additional information that describes tasks' resource requirements other than 
> CPU and memory.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

