[jira] [Commented] (YARN-370) CapacityScheduler app submission fails when min alloc size not multiple of AM size
[ https://issues.apache.org/jira/browse/YARN-370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572318#comment-13572318 ] Hudson commented on YARN-370: - Integrated in Hadoop-Yarn-trunk #119 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/119/]) YARN-370. Fix SchedulerUtils to correctly round up the resource for containers. Contributed by Zhijie Shen. (Revision 1442840) Result = FAILURE acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1442840 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerUtils.java CapacityScheduler app submission fails when min alloc size not multiple of AM size -- Key: YARN-370 URL: https://issues.apache.org/jira/browse/YARN-370 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.0.3-alpha Reporter: Thomas Graves Assignee: Zhijie Shen Priority: Blocker Attachments: YARN-370-branch-2_1.patch, YARN-370-branch-2.patch I was running 2.0.3-SNAPSHOT with the capacity scheduler configured with a minimum allocation size of 1G. The AM size was set to 1.5G. I didn't specify a resource calculator, so it was using DefaultResourceCalculator. The AM launch failed with the error below: Application application_1359688216672_0001 failed 1 times due to Error launching appattempt_1359688216672_0001_01. Got exception: RemoteTrace: at LocalTrace: org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: RemoteTrace: at LocalTrace: org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Unauthorized request to start container.
Expected resource memory:2048, vCores:1 but found memory:1536, vCores:1 at org.apache.hadoop.yarn.factories.impl.pb.YarnRemoteExceptionFactoryPBImpl.createYarnRemoteException(YarnRemoteExceptionFactoryPBImpl.java:39) at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:47) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.authorizeRequest(ContainerManagerImpl.java:383) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainer(ContainerManagerImpl.java:400) at org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagerPBServiceImpl.startContainer(ContainerManagerPBServiceImpl.java:68) at org.apache.hadoop.yarn.proto.ContainerManager$ContainerManagerService$2.callBlockingMethod(ContainerManager.java:83) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1735) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1731) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1441) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1729) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:525) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:90) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57) at org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:123) at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:111) at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:255) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) . Failing the application. It looks like the launchcontext for the app didn't have the resources rounded
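The root cause is the normalization step: each requested Resource must be rounded up to the nearest multiple of the scheduler's minimum allocation, which the AM resource (1536 MB against a 1024 MB minimum) was not. A minimal sketch of that rounding, using plain integer math rather than Hadoop's resource-calculator API:

{code:java}
// Sketch of the rounding YARN-370 fixes in SchedulerUtils: normalize a
// requested memory size up to the next multiple of the minimum allocation.
public final class ResourceRounding {
  private ResourceRounding() {}

  /** Round memory up to the nearest multiple of minimumMemory. */
  public static int roundUp(int memory, int minimumMemory) {
    return ((memory + minimumMemory - 1) / minimumMemory) * minimumMemory;
  }

  public static void main(String[] args) {
    // An AM asking for 1536 MB under a 1024 MB minimum allocation is
    // normalized to 2048 MB, matching the "Expected resource memory:2048
    // ... but found memory:1536" mismatch in the trace above.
    System.out.println(roundUp(1536, 1024)); // prints 2048
  }
}
{code}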
[jira] [Commented] (YARN-370) CapacityScheduler app submission fails when min alloc size not multiple of AM size
[ https://issues.apache.org/jira/browse/YARN-370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572389#comment-13572389 ] Hudson commented on YARN-370: - Integrated in Hadoop-Hdfs-trunk #1308 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1308/]) YARN-370. Fix SchedulerUtils to correctly round up the resource for containers. Contributed by Zhijie Shen. (Revision 1442840) Result = SUCCESS acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1442840 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerUtils.java
[jira] [Commented] (YARN-370) CapacityScheduler app submission fails when min alloc size not multiple of AM size
[ https://issues.apache.org/jira/browse/YARN-370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572421#comment-13572421 ] Hudson commented on YARN-370: - Integrated in Hadoop-Mapreduce-trunk #1336 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1336/]) YARN-370. Fix SchedulerUtils to correctly round up the resource for containers. Contributed by Zhijie Shen. (Revision 1442840) Result = SUCCESS acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1442840 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerUtils.java
[jira] [Updated] (YARN-382) SchedulerUtils improve way normalizeRequest sets the resource capabilities
[ https://issues.apache.org/jira/browse/YARN-382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated YARN-382: --- Description: In YARN-370, we changed it from setting the capability to directly setting memory and cores: -ask.setCapability(normalized); +ask.getCapability().setMemory(normalized.getMemory()); +ask.getCapability().setVirtualCores(normalized.getVirtualCores()); We did this because it is directly setting the values in the original resource object passed in when the AM gets allocated, and without it the AM doesn't get the resource normalized correctly in the submission context. See YARN-370 for more details. I think we should find a better way of doing this long term: one, so we don't have to keep adding things there when new resources are added; two, because it's a bit confusing as to what it's doing and prone to someone accidentally breaking it in the future again. Something closer to what Arun suggested in YARN-370 would be better, but we need to make sure all the places work and get some more testing on it before putting it in. was: In YARN-370, we changed it from setting the capability to directly setting memory and cores: -ask.setCapability(normalized); +ask.getCapability().setMemory(normalized.getMemory()); +ask.getCapability().setVirtualCores(normalized.getVirtualCores()); We did this because before it is directly setting the values in the original resource object passed in when the AM gets allocated, and without it the AM doesn't get the resource normalized correctly in the submission context. See YARN-370 for more details. I think we should find a better way of doing this long term: one, so we don't have to keep adding things there when new resources are added; two, because it's a bit confusing as to what it's doing and prone to someone accidentally breaking it in the future again. Something closer to what Arun suggested in YARN-370 would be better, but we need to make sure all the places work and get some more testing on it before putting it in. SchedulerUtils improve way normalizeRequest sets the resource capabilities -- Key: YARN-382 URL: https://issues.apache.org/jira/browse/YARN-382 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.0.3-alpha Reporter: Thomas Graves In YARN-370, we changed it from setting the capability to directly setting memory and cores: -ask.setCapability(normalized); +ask.getCapability().setMemory(normalized.getMemory()); +ask.getCapability().setVirtualCores(normalized.getVirtualCores()); We did this because it is directly setting the values in the original resource object passed in when the AM gets allocated, and without it the AM doesn't get the resource normalized correctly in the submission context. See YARN-370 for more details. I think we should find a better way of doing this long term: one, so we don't have to keep adding things there when new resources are added; two, because it's a bit confusing as to what it's doing and prone to someone accidentally breaking it in the future again. Something closer to what Arun suggested in YARN-370 would be better, but we need to make sure all the places work and get some more testing on it before putting it in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
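The subtlety driving that diff is object aliasing: the submission context and the scheduler hold references to the same Resource instance, so mutating its fields is visible to both, while replacing the reference with a new object is not. A self-contained sketch of the difference, with a plain POJO standing in for YARN's Resource:

{code:java}
// Why YARN-370 mutates the existing capability rather than replacing it:
// the submission context keeps a reference to the original object.
class Res {
  int memory;
  Res(int memory) { this.memory = memory; }
}

class Ask {
  Res capability;
  Ask(Res capability) { this.capability = capability; }
}

public class AliasingDemo {
  public static void main(String[] args) {
    Res original = new Res(1536);   // what the submission context sees
    Ask ask = new Ask(original);

    // Replacing the reference: the context's copy is NOT updated.
    ask.capability = new Res(2048);
    System.out.println(original.memory); // still 1536

    // Mutating in place: every holder of the reference sees the new value.
    ask = new Ask(original);
    ask.capability.memory = 2048;
    System.out.println(original.memory); // now 2048
  }
}
{code}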
[jira] [Commented] (YARN-3) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/YARN-3?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572493#comment-13572493 ] Hudson commented on YARN-3: --- Integrated in Hadoop-trunk-Commit #3329 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3329/]) YARN-3. Merged to branch-2. (Revision 1443011) Result = SUCCESS acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1443011 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt Add support for CPU isolation/monitoring of containers -- Key: YARN-3 URL: https://issues.apache.org/jira/browse/YARN-3 Project: Hadoop YARN Issue Type: Sub-task Reporter: Arun C Murthy Assignee: Andrew Ferguson Fix For: 2.0.3-alpha Attachments: mapreduce-4334-design-doc.txt, mapreduce-4334-design-doc-v2.txt, MAPREDUCE-4334-executor-v1.patch, MAPREDUCE-4334-executor-v2.patch, MAPREDUCE-4334-executor-v3.patch, MAPREDUCE-4334-executor-v4.patch, MAPREDUCE-4334-pre1.patch, MAPREDUCE-4334-pre2.patch, MAPREDUCE-4334-pre2-with_cpu.patch, MAPREDUCE-4334-pre3.patch, MAPREDUCE-4334-pre3-with_cpu.patch, MAPREDUCE-4334-v1.patch, MAPREDUCE-4334-v2.patch, YARN-3-lce_only-v1.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-357) App submission should not be synchronized
[ https://issues.apache.org/jira/browse/YARN-357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572545#comment-13572545 ] Hadoop QA commented on YARN-357: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12568245/YARN-357.branch-23.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/384//console This message is automatically generated. App submission should not be synchronized - Key: YARN-357 URL: https://issues.apache.org/jira/browse/YARN-357 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 0.23.3, 3.0.0, 2.0.0-alpha Reporter: Daryn Sharp Assignee: Daryn Sharp Attachments: YARN-357.branch-23.patch, YARN-357.patch, YARN-357.patch, YARN-357.txt MAPREDUCE-2953 fixed a race condition with querying of app status by making {{ClientRMService#submitApplication}} synchronously invoke {{RMAppManager#submitApplication}}. However, the {{synchronized}} keyword was also added to {{RMAppManager#submitApplication}} with the comment: bq. I made the submitApplication synchronized to keep it consistent with the other routines in RMAppManager although I do not believe it needs it since the rmapp datastructure is already a concurrentMap and I don't see anything else that would be an issue. It's been observed that app submission latency is being unnecessarily impacted. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
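The quoted comment already points at the fix: the backing map is concurrent, so the method-level lock only serializes submissions. A hedged sketch of the unsynchronized pattern (field and method names are illustrative, not RMAppManager's actual code):

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative only: a concurrent map makes the method-level lock
// unnecessary for the registration step, so submissions of different
// applications no longer queue behind one another.
public class AppManagerSketch {
  private final ConcurrentMap<String, Object> apps = new ConcurrentHashMap<>();

  // No 'synchronized' here: putIfAbsent atomically rejects duplicates.
  public void submitApplication(String appId, Object app) {
    if (apps.putIfAbsent(appId, app) != null) {
      throw new IllegalStateException("Application " + appId + " already submitted");
    }
    // ... continue per-app submission work outside any global lock ...
  }
}
{code}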
[jira] [Updated] (YARN-355) RM app submission jams under load
[ https://issues.apache.org/jira/browse/YARN-355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp updated YARN-355: - Attachment: YARN-355.patch YARN-355.branch-23.patch Thanks Sid. Updated patches to set TSM service after client has started. RM app submission jams under load - Key: YARN-355 URL: https://issues.apache.org/jira/browse/YARN-355 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.0-alpha, 0.23.6 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Blocker Attachments: YARN-355.branch-23.patch, YARN-355.branch-23.patch, YARN-355.branch-23.patch, YARN-355.patch, YARN-355.patch, YARN-355.patch The RM performs a loopback connection to itself to renew its own tokens. If app submissions consume all RPC handlers for {{ClientRMProtocol}}, then app submissions block because it cannot loopback to itself to do the renewal. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-355) RM app submission jams under load
[ https://issues.apache.org/jira/browse/YARN-355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated YARN-355: --- Target Version/s: 2.0.3-alpha, 0.23.7 (was: 2.0.3-alpha, 0.23.6) RM app submission jams under load - Key: YARN-355 URL: https://issues.apache.org/jira/browse/YARN-355 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.0-alpha, 0.23.6 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Blocker Attachments: YARN-355.branch-23.patch, YARN-355.branch-23.patch, YARN-355.branch-23.patch, YARN-355.patch, YARN-355.patch, YARN-355.patch The RM performs a loopback connection to itself to renew its own tokens. If app submissions consume all RPC handlers for {{ClientRMProtocol}}, then app submissions block because it cannot loopback to itself to do the renewal. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-383) AMRMClientImpl should handle null rmClient in stop()
[ https://issues.apache.org/jira/browse/YARN-383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-383: - Attachment: YARN-383.1.patch Trivial patch. AMRMClientImpl should handle null rmClient in stop() Key: YARN-383 URL: https://issues.apache.org/jira/browse/YARN-383 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Hitesh Shah Priority: Minor Attachments: YARN-383.1.patch 2013-02-06 09:31:33,813 INFO [Thread-2] service.CompositeService (CompositeService.java:stop(101)) - Error stopping org.apache.hadoop.yarn.client.AMRMClientImpl org.apache.hadoop.HadoopIllegalArgumentException: Cannot close proxy since it is null at org.apache.hadoop.ipc.RPC.stopProxy(RPC.java:605) at org.apache.hadoop.yarn.client.AMRMClientImpl.stop(AMRMClientImpl.java:150) at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:99) at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:89) at org.apache.hadoop.yarn.app.ampool.AMPoolAppMaster.stop(AMPoolAppMaster.java:171) at org.apache.hadoop.yarn.app.ampool.AMPoolAppMaster$AMPoolAppMasterShutdownHook.run(AMPoolAppMaster.java:196) at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
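The trivial fix amounts to a null guard in stop() so that tearing down a client whose start() never created the proxy does not throw. A minimal self-contained sketch; only the rmClient field and the "Cannot close proxy" failure mirror the stack trace above, the rest is illustrative:

{code:java}
// Illustrative null-safe stop(): only close the RPC proxy if it exists.
public class NullSafeStop {
  private Object rmClient; // stays null if start() never ran or failed

  public void stop() {
    if (rmClient != null) {   // the guard YARN-383 adds
      stopProxy(rmClient);
      rmClient = null;
    }
  }

  private static void stopProxy(Object proxy) {
    if (proxy == null) {      // mimics RPC.stopProxy's complaint
      throw new IllegalArgumentException("Cannot close proxy since it is null");
    }
    System.out.println("proxy closed");
  }

  public static void main(String[] args) {
    new NullSafeStop().stop(); // no exception even though start() never ran
  }
}
{code}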
[jira] [Updated] (YARN-40) Provide support for missing yarn commands
[ https://issues.apache.org/jira/browse/YARN-40?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated YARN-40: -- Fix Version/s: 0.23.7 Provide support for missing yarn commands - Key: YARN-40 URL: https://issues.apache.org/jira/browse/YARN-40 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.0.0-alpha Reporter: Devaraj K Assignee: Devaraj K Fix For: 2.0.3-alpha, 0.23.7 Attachments: MAPREDUCE-4155-1.patch, MAPREDUCE-4155.patch, YARN-40-1.patch, YARN-40-20120917.1.txt, YARN-40-20120917.txt, YARN-40-20120924.txt, YARN-40-20121008.txt, YARN-40.patch 1. status app-id 2. kill app-id (Already issue present with Id : MAPREDUCE-3793) 3. list-apps [all] 4. nodes-report -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-249) Capacity Scheduler web page should show list of active users per queue like it used to (in 1.x)
[ https://issues.apache.org/jira/browse/YARN-249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prakash updated YARN-249: -- Attachment: YARN-249.branch-0.23.patch Updated patch for branch-0.23 Capacity Scheduler web page should show list of active users per queue like it used to (in 1.x) --- Key: YARN-249 URL: https://issues.apache.org/jira/browse/YARN-249 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.0.2-alpha, 3.0.0, 0.23.5 Reporter: Ravi Prakash Assignee: Ravi Prakash Labels: scheduler, web-ui Attachments: YARN-249.branch-0.23.patch, YARN-249.branch-0.23.patch, YARN-249.branch-0.23.patch, YARN-249.branch-0.23.patch, YARN-249.patch, YARN-249.patch, YARN-249.patch, YARN-249.patch, YARN-249.patch, YARN-249.patch, YARN-249.patch, YARN-249.png On the jobtracker, the web ui showed the active users for each queue and how much resources each of those users were using. That currently isn't being displayed on the RM capacity scheduler web ui. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-249) Capacity Scheduler web page should show list of active users per queue like it used to (in 1.x)
[ https://issues.apache.org/jira/browse/YARN-249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prakash updated YARN-249: -- Attachment: YARN-249.patch Updated patch for trunk Capacity Scheduler web page should show list of active users per queue like it used to (in 1.x) --- Key: YARN-249 URL: https://issues.apache.org/jira/browse/YARN-249 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.0.2-alpha, 3.0.0, 0.23.5 Reporter: Ravi Prakash Assignee: Ravi Prakash Labels: scheduler, web-ui Attachments: YARN-249.branch-0.23.patch, YARN-249.branch-0.23.patch, YARN-249.branch-0.23.patch, YARN-249.branch-0.23.patch, YARN-249.patch, YARN-249.patch, YARN-249.patch, YARN-249.patch, YARN-249.patch, YARN-249.patch, YARN-249.patch, YARN-249.png On the jobtracker, the web ui showed the active users for each queue and how much resources each of those users were using. That currently isn't being displayed on the RM capacity scheduler web ui. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-355) RM app submission jams under load
[ https://issues.apache.org/jira/browse/YARN-355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572695#comment-13572695 ] Hudson commented on YARN-355: - Integrated in Hadoop-trunk-Commit # (See [https://builds.apache.org/job/Hadoop-trunk-Commit//]) YARN-355. Fixes a bug where RM app submission could jam under load. Contributed by Daryn Sharp. (Revision 1443131) Result = SUCCESS sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1443131 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/YarnClientImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/security/RMDelegationTokenRenewer.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/resources * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/RMDelegationTokenIdentifier.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/META-INF/services/org.apache.hadoop.security.token.TokenRenewer * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMTokens.java RM app submission jams under load - Key: YARN-355 URL: https://issues.apache.org/jira/browse/YARN-355 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.0-alpha, 0.23.6 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Blocker Fix For: 2.0.3-alpha, 0.23.7 Attachments: YARN-355.branch-23.patch, YARN-355.branch-23.patch, YARN-355.branch-23.patch, YARN-355.patch, YARN-355.patch, YARN-355.patch The RM performs a loopback connection to itself to renew its own tokens. If app submissions consume all RPC handlers for {{ClientRMProtocol}}, then app submissions block because it cannot loopback to itself to do the renewal. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
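The jam is a self-deadlock: every RPC handler is busy in submitApplication, and each submission waits on one more call into the same exhausted handler pool. A toy reproduction with a fixed thread pool standing in for the RPC handlers (illustrative; not YARN code):

{code:java}
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Toy model of YARN-355: both "handlers" run a submission, and each
// submission needs one more task (the loopback token renewal) from the
// SAME pool. With every handler occupied, the renewal never runs.
public class LoopbackJam {
  public static void main(String[] args) throws Exception {
    ExecutorService handlers = Executors.newFixedThreadPool(2);

    Callable<String> submit = () -> {
      // The renewal is submitted to the pool we are running in.
      Future<String> renewal = handlers.submit(() -> "renewed");
      return renewal.get(2, TimeUnit.SECONDS); // blocks a handler while waiting
    };

    Future<String> a = handlers.submit(submit);
    Future<String> b = handlers.submit(submit);
    try {
      System.out.println(a.get() + " " + b.get());
    } catch (ExecutionException e) {
      System.out.println("jammed: " + e.getCause()); // TimeoutException
    }
    handlers.shutdownNow();
  }
}
{code}

The committed change appears to sidestep the loopback rather than grow the pool, which is consistent with the file list touching the token-renewal classes instead of the RPC handler configuration.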
[jira] [Updated] (YARN-150) AppRejectedTransition does not unregister app from master service and scheduler
[ https://issues.apache.org/jira/browse/YARN-150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated YARN-150: --- Fix Version/s: 0.23.7 AppRejectedTransition does not unregister app from master service and scheduler --- Key: YARN-150 URL: https://issues.apache.org/jira/browse/YARN-150 Project: Hadoop YARN Issue Type: Bug Affects Versions: 0.23.3, 3.0.0, 2.0.0-alpha Reporter: Bikas Saha Assignee: Bikas Saha Fix For: 2.0.3-alpha, 0.23.7 Attachments: MAPREDUCE-4436.1.patch AttemptStartedTransition() adds the app to the ApplicationMasterService and scheduler. when the scheduler rejects the app then AppRejectedTransition() forgets to unregister it from the ApplicationMasterService. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-249) Capacity Scheduler web page should show list of active users per queue like it used to (in 1.x)
[ https://issues.apache.org/jira/browse/YARN-249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prakash updated YARN-249: -- Attachment: YARN-249.branch-0.23.patch Updated docs to include active and pending applications Capacity Scheduler web page should show list of active users per queue like it used to (in 1.x) --- Key: YARN-249 URL: https://issues.apache.org/jira/browse/YARN-249 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.0.2-alpha, 3.0.0, 0.23.5 Reporter: Ravi Prakash Assignee: Ravi Prakash Labels: scheduler, web-ui Attachments: YARN-249.branch-0.23.patch, YARN-249.branch-0.23.patch, YARN-249.branch-0.23.patch, YARN-249.branch-0.23.patch, YARN-249.branch-0.23.patch, YARN-249.patch, YARN-249.patch, YARN-249.patch, YARN-249.patch, YARN-249.patch, YARN-249.patch, YARN-249.patch, YARN-249.patch, YARN-249.png On the jobtracker, the web ui showed the active users for each queue and how much resources each of those users were using. That currently isn't being displayed on the RM capacity scheduler web ui. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-249) Capacity Scheduler web page should show list of active users per queue like it used to (in 1.x)
[ https://issues.apache.org/jira/browse/YARN-249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572709#comment-13572709 ] Ravi Prakash commented on YARN-249: --- Thanks a lot for your review Tom! I've incorporated all your suggestions in these updated patches for branch-0.23 and trunk. I chose to put the % in a span element, so that when you mouse over it, it shows what that percentage is based on. Capacity Scheduler web page should show list of active users per queue like it used to (in 1.x) --- Key: YARN-249 URL: https://issues.apache.org/jira/browse/YARN-249 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.0.2-alpha, 3.0.0, 0.23.5 Reporter: Ravi Prakash Assignee: Ravi Prakash Labels: scheduler, web-ui Attachments: YARN-249.branch-0.23.patch, YARN-249.branch-0.23.patch, YARN-249.branch-0.23.patch, YARN-249.branch-0.23.patch, YARN-249.branch-0.23.patch, YARN-249.patch, YARN-249.patch, YARN-249.patch, YARN-249.patch, YARN-249.patch, YARN-249.patch, YARN-249.patch, YARN-249.patch, YARN-249.png On the jobtracker, the web ui showed the active users for each queue and how much resources each of those users were using. That currently isn't being displayed on the RM capacity scheduler web ui. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
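For readers unfamiliar with the trick being described: the visible percentage is wrapped in a span whose title attribute supplies the mouse-over explanation. A minimal sketch of the resulting markup, with a hypothetical cell-rendering helper (the actual patch builds this through YARN's Hamlet DSL):

{code:java}
// Illustrative only: render a usage percentage whose tooltip states the basis.
public class UserUsageCell {
  static String usageCell(float usedPct, String basis) {
    // title="..." becomes the browser tooltip on mouse-over.
    return String.format("<span title=\"%s\">%.1f%%</span>", basis, usedPct);
  }

  public static void main(String[] args) {
    // Hovering over "62.5%" shows what the percentage is relative to.
    System.out.println(usageCell(62.5f, "of queue's configured capacity"));
  }
}
{code}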
[jira] [Commented] (YARN-249) Capacity Scheduler web page should show list of active users per queue like it used to (in 1.x)
[ https://issues.apache.org/jira/browse/YARN-249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572742#comment-13572742 ] Hadoop QA commented on YARN-249: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12568277/YARN-249.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/388//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/388//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/388//console This message is automatically generated. Capacity Scheduler web page should show list of active users per queue like it used to (in 1.x) --- Key: YARN-249 URL: https://issues.apache.org/jira/browse/YARN-249 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.0.2-alpha, 3.0.0, 0.23.5 Reporter: Ravi Prakash Assignee: Ravi Prakash Labels: scheduler, web-ui Attachments: YARN-249.branch-0.23.patch, YARN-249.branch-0.23.patch, YARN-249.branch-0.23.patch, YARN-249.branch-0.23.patch, YARN-249.branch-0.23.patch, YARN-249.patch, YARN-249.patch, YARN-249.patch, YARN-249.patch, YARN-249.patch, YARN-249.patch, YARN-249.patch, YARN-249.patch, YARN-249.png On the jobtracker, the web ui showed the active users for each queue and how much resources each of those users were using. That currently isn't being displayed on the RM capacity scheduler web ui. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-383) AMRMClientImpl should handle null rmClient in stop()
[ https://issues.apache.org/jira/browse/YARN-383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572892#comment-13572892 ] Hadoop QA commented on YARN-383: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12568299/YARN-383.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/389//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/389//console This message is automatically generated. AMRMClientImpl should handle null rmClient in stop() Key: YARN-383 URL: https://issues.apache.org/jira/browse/YARN-383 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Hitesh Shah Priority: Minor Attachments: YARN-383.1.patch, YARN-383.2.patch, YARN-383.3.patch 2013-02-06 09:31:33,813 INFO [Thread-2] service.CompositeService (CompositeService.java:stop(101)) - Error stopping org.apache.hadoop.yarn.client.AMRMClientImpl org.apache.hadoop.HadoopIllegalArgumentException: Cannot close proxy since it is null at org.apache.hadoop.ipc.RPC.stopProxy(RPC.java:605) at org.apache.hadoop.yarn.client.AMRMClientImpl.stop(AMRMClientImpl.java:150) at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:99) at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:89) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-385) ResourceRequestPBImpl's toString() is missing location and # containers
Sandy Ryza created YARN-385: --- Summary: ResourceRequestPBImpl's toString() is missing location and # containers Key: YARN-385 URL: https://issues.apache.org/jira/browse/YARN-385 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza ResourceRequestPBImpl's toString method includes priority and resource capability, but omits location and number of containers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (YARN-359) NodeManager container-related tests fail on branch-trunk-win
[ https://issues.apache.org/jira/browse/YARN-359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-359. -- Resolution: Fixed I just committed this to branch-trunk-win. Thanks Chris! NodeManager container-related tests fail on branch-trunk-win Key: YARN-359 URL: https://issues.apache.org/jira/browse/YARN-359 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: trunk-win Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: YARN-359-branch-trunk-win.1.patch, YARN-359-branch-trunk-win.2.patch On branch-trunk-win, there are test failures in {{TestContainerManager}}, {{TestNodeManagerShutdown}}, {{TestContainerLaunch}}, and {{TestContainersMonitor}}. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-359) NodeManager container-related tests fail on branch-trunk-win
[ https://issues.apache.org/jira/browse/YARN-359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573116#comment-13573116 ] Bikas Saha commented on YARN-359: - The main reason for moving these to Shell was to reduce the number of places where OS specific forks happen in code and limit all such behavior to the Shell object that mainly performs OS dependent tasks that cannot be done in Java. NodeManager container-related tests fail on branch-trunk-win Key: YARN-359 URL: https://issues.apache.org/jira/browse/YARN-359 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: trunk-win Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: YARN-359-branch-trunk-win.1.patch, YARN-359-branch-trunk-win.2.patch On branch-trunk-win, there are test failures in {{TestContainerManager}}, {{TestNodeManagerShutdown}}, {{TestContainerLaunch}}, and {{TestContainersMonitor}}. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
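Concretely, the pattern Bikas describes centralizes platform dispatch in one helper so call sites never test os.name themselves. A minimal illustration; the WINDOWS flag mirrors org.apache.hadoop.util.Shell, while getRunCommand here is a simplified stand-in for Shell's real helpers:

{code:java}
// Centralized OS dispatch: one place decides how a command line is forked.
public final class ShellSketch {
  // Mirrors org.apache.hadoop.util.Shell.WINDOWS
  public static final boolean WINDOWS =
      System.getProperty("os.name").startsWith("Windows");

  /** Wrap a command for the platform shell (simplified stand-in). */
  public static String[] getRunCommand(String command) {
    return WINDOWS
        ? new String[] {"cmd", "/c", command}
        : new String[] {"bash", "-c", command};
  }

  public static void main(String[] args) {
    // Call sites stay platform-agnostic.
    System.out.println(String.join(" ", getRunCommand("echo hello")));
  }
}
{code}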
[jira] [Updated] (YARN-20) More information for yarn.resourcemanager.webapp.address in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-20?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nemon lou updated YARN-20: -- Attachment: YARN-20.patch Adding annotation just as Harsh J said. Sorry for coming back so late. No test case is added since it's only a trivial documentation change. More information for yarn.resourcemanager.webapp.address in yarn-default.xml -- Key: YARN-20 URL: https://issues.apache.org/jira/browse/YARN-20 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.0.0-alpha Reporter: nemon lou Priority: Trivial Attachments: YARN-20.patch Original Estimate: 1h Remaining Estimate: 1h The parameter yarn.resourcemanager.webapp.address in yarn-default.xml is in host:port format, which is noted in the cluster setup guide (http://hadoop.apache.org/common/docs/r2.0.0-alpha/hadoop-yarn/hadoop-yarn-site/ClusterSetup.html). When I read through the code, I found that the host-only format is also supported; in that format, the port will be random. So we may add more documentation in yarn-default.xml to make this easier to understand. I will submit a patch if it's helpful. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
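What "the port will be random" means in practice: when no port is configured, the bind happens on port 0 and the OS assigns an ephemeral port. A small stand-alone illustration using the plain JDK (not the RM's actual binding code):

{code:java}
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Demonstrates the host-only behavior: with no ":port" suffix configured,
// binding on port 0 makes the OS pick an ephemeral (effectively random) port.
public class EphemeralPortDemo {
  public static void main(String[] args) throws Exception {
    String configured = "0.0.0.0";              // "host" form, no ":port"
    String[] parts = configured.split(":");
    int port = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
    try (ServerSocket ss = new ServerSocket()) {
      ss.bind(new InetSocketAddress(parts[0], port));
      System.out.println("bound to port " + ss.getLocalPort());
    }
  }
}
{code}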
[jira] [Commented] (YARN-111) Application level priority in Resource Manager Schedulers
[ https://issues.apache.org/jira/browse/YARN-111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573142#comment-13573142 ] nemon lou commented on YARN-111: Finally, I used two queues in the Capacity Scheduler to basically meet our needs. Both queues have an Absolute Max Capacity of 100%; the queue with higher priority has more Absolute Capacity configured (85%). Jobs which need high priority are submitted to the queue with the larger Absolute Capacity. Application level priority in Resource Manager Schedulers - Key: YARN-111 URL: https://issues.apache.org/jira/browse/YARN-111 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.1-alpha Reporter: nemon lou We need application-level priority for Hadoop 2.0, both in the FIFO scheduler and the Capacity Scheduler. In Hadoop 1.0.x, job priority is supported. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
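The workaround, expressed as configuration: two queues whose guaranteed shares are skewed toward the high-priority one, with both allowed to expand to the whole cluster when the other is idle. A hedged capacity-scheduler.xml sketch; the queue names are illustrative, the property keys are the Capacity Scheduler's standard ones:

{code:xml}
<!-- Illustrative two-queue priority workaround -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>high,low</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.high.capacity</name>
  <value>85</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.low.capacity</name>
  <value>15</value>
</property>
<!-- Either queue may use the whole cluster while the other is idle -->
<property>
  <name>yarn.scheduler.capacity.root.high.maximum-capacity</name>
  <value>100</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.low.maximum-capacity</name>
  <value>100</value>
</property>
{code}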
[jira] [Updated] (YARN-362) Unexpected extra results when using the task attempt table search
[ https://issues.apache.org/jira/browse/YARN-362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prakash updated YARN-362: -- Attachment: YARN-362.branch-0.23.patch Thanks for the review Jason! I hadn't realized that I hadn't jsonified the attempts table. I'm doing so in this patch. I've also fixed the pollution of search results, along with some minor code improvements. Unexpected extra results when using the task attempt table search - Key: YARN-362 URL: https://issues.apache.org/jira/browse/YARN-362 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.3-alpha, 0.23.5 Reporter: Jason Lowe Assignee: Ravi Prakash Priority: Minor Attachments: MAPREDUCE-4960.patch, YARN-362.branch-0.23.patch, YARN-362.patch When using the search box on the web UI to search for a specific task number (e.g.: 0831), sometimes unexpected extra results are shown. Using the web browser's built-in search-within-page does not show any hits, so these look like completely spurious results. It looks like the raw timestamp value for time columns, which is not shown in the table, is also being searched with the search box. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-362) Unexpected extra results when using the task attempt table search
[ https://issues.apache.org/jira/browse/YARN-362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prakash updated YARN-362: -- Attachment: YARN-362.patch The patch ported to trunk Unexpected extra results when using the task attempt table search - Key: YARN-362 URL: https://issues.apache.org/jira/browse/YARN-362 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.3-alpha, 0.23.5 Reporter: Jason Lowe Assignee: Ravi Prakash Priority: Minor Attachments: MAPREDUCE-4960.patch, YARN-362.branch-0.23.patch, YARN-362.patch When using the search box on the web UI to search for a specific task number (e.g.: 0831), sometimes unexpected extra results are shown. Using the web browser's built-in search-within-page does not show any hits, so these look like completely spurious results. It looks like the raw timestamp value for time columns, which is not shown in the table, is also being searched with the search box. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-111) Application level priority in Resource Manager Schedulers
[ https://issues.apache.org/jira/browse/YARN-111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573157#comment-13573157 ] Vinod Kumar Vavilapalli commented on YARN-111: -- So, can we close this as won't fix? Though it is a useful feature, it has many dangerous pitfalls as noted, and clearly also has alternative means of achieving it. Application level priority in Resource Manager Schedulers - Key: YARN-111 URL: https://issues.apache.org/jira/browse/YARN-111 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.1-alpha Reporter: nemon lou We need application-level priority for Hadoop 2.0, both in the FIFO scheduler and the Capacity Scheduler. In Hadoop 1.0.x, job priority is supported. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-374) Job History Server doesn't show jobs which killed by ClientRMProtocol.forceKillApplication
[ https://issues.apache.org/jira/browse/YARN-374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573165#comment-13573165 ] nemon lou commented on YARN-374: Thanks for the information. But why not have one more API like gracefullyKillApplication (or just change force kill's behavior)? With this method, the RM will ask the AM to kill the app itself; a force kill will be triggered if the AM hasn't killed itself within some period. Job History Server doesn't show jobs which killed by ClientRMProtocol.forceKillApplication -- Key: YARN-374 URL: https://issues.apache.org/jira/browse/YARN-374 Project: Hadoop YARN Issue Type: Bug Components: client, resourcemanager Affects Versions: 2.0.1-alpha Reporter: nemon lou After I kill an app by typing bin/yarn rmadmin app -kill APP_ID, no job info is kept on the JHS web page. However, when I kill a job by typing bin/mapred job -kill JOB_ID, I can see a killed job left on JHS. Some hive users are confused by the fact that their jobs have been killed but nothing is left on JHS, and the killed app's info on the RM web page is not enough. (They kill jobs via ClientRMProtocol.) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
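Sketched, the proposal is a two-phase kill: signal the AM to shut itself down (so it can write job history), and only fall back to the hard kill after a grace period. Everything below is hypothetical API, not YARN's:

{code:java}
// Hypothetical two-phase kill: ask the AM to stop itself, force-kill on timeout.
public class GracefulKillSketch {
  interface App {
    void requestSelfShutdown(); // hypothetical "please shut down" signal to the AM
    boolean isFinished();
    void forceKill();           // the existing hard kill
  }

  static void gracefullyKill(App app, long graceMillis) throws InterruptedException {
    app.requestSelfShutdown();
    long deadline = System.currentTimeMillis() + graceMillis;
    while (System.currentTimeMillis() < deadline) {
      if (app.isFinished()) {
        return;                 // AM exited cleanly; history is preserved
      }
      Thread.sleep(100);
    }
    app.forceKill();            // fallback: behave like today's force kill
  }
}
{code}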
[jira] [Commented] (YARN-359) NodeManager container-related tests fail on branch-trunk-win
[ https://issues.apache.org/jira/browse/YARN-359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573166#comment-13573166 ] Vinod Kumar Vavilapalli commented on YARN-359: -- bq. The main reason for moving these to Shell was to reduce the number of places where OS specific forks happen in code and limit all such behavior to the Shell object that mainly performs OS dependent tasks that cannot be done in Java. Sure, I found only one usage and didn't see these arguments otherwise, so suggested moving it out. As I mentioned, if we already have other uses, we can promote it. NodeManager container-related tests fail on branch-trunk-win Key: YARN-359 URL: https://issues.apache.org/jira/browse/YARN-359 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: trunk-win Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: YARN-359-branch-trunk-win.1.patch, YARN-359-branch-trunk-win.2.patch On branch-trunk-win, there are test failures in {{TestContainerManager}}, {{TestNodeManagerShutdown}}, {{TestContainerLaunch}}, and {{TestContainersMonitor}}. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-362) Unexpected extra results when using the task attempt table search
[ https://issues.apache.org/jira/browse/YARN-362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573168#comment-13573168 ] Hadoop QA commented on YARN-362: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12568368/YARN-362.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/391//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/391//console This message is automatically generated. Unexpected extra results when using the task attempt table search - Key: YARN-362 URL: https://issues.apache.org/jira/browse/YARN-362 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.3-alpha, 0.23.5 Reporter: Jason Lowe Assignee: Ravi Prakash Priority: Minor Attachments: MAPREDUCE-4960.patch, YARN-362.branch-0.23.patch, YARN-362.patch When using the search box on the web UI to search for a specific task number (e.g.: 0831), sometimes unexpected extra results are shown. Using the web browser's built-in search-within-page does not show any hits, so these look like completely spurious results. It looks like the raw timestamp value for time columns, which is not shown in the table, is also being searched with the search box. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-236) RM should point tracking URL to RM web page when app fails to start
[ https://issues.apache.org/jira/browse/YARN-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573184#comment-13573184 ] Vinod Kumar Vavilapalli commented on YARN-236: -- I agree that the null check can hit before the app starts, where redirecting is useful. But for crashing AMs: shouldn't YARN-165 have already fixed the original tracking URL to point to the RM web page in case of crashing AMs? I just checked the patch and it seems so. RM should point tracking URL to RM web page when app fails to start --- Key: YARN-236 URL: https://issues.apache.org/jira/browse/YARN-236 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 0.23.4 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-236.patch Similar to YARN-165, the RM should redirect the tracking URL to the specific app page on the RM web UI when the application fails to start. For example, if the AM completely fails to start due to bad AM config or bad job config like invalid queuename, then the user gets the unhelpful The requested application exited before setting a tracking URL. Usually the diagnostic string on the RM app page has something useful, so we might as well point there. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
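Reduced to its core, the behavior under discussion is a fallback: when no tracking URL was ever set, send the proxy to the RM's own application page, where the diagnostics live. A hedged sketch (the URL shape is illustrative):

{code:java}
// Illustrative fallback: use the RM app page when the AM set no tracking URL.
public class TrackingUrlFallback {
  static String trackingUrlFor(String rmWebAddr, String appId, String amTrackingUrl) {
    if (amTrackingUrl == null || amTrackingUrl.isEmpty()) {
      // The RM app page carries the diagnostics string, usually the most
      // useful thing to show for an AM that failed to start.
      return "http://" + rmWebAddr + "/cluster/app/" + appId;
    }
    return amTrackingUrl;
  }

  public static void main(String[] args) {
    System.out.println(trackingUrlFor("rm-host:8088",
        "application_1359688216672_0001", null)); // falls back to the RM page
  }
}
{code}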
[jira] [Commented] (YARN-359) NodeManager container-related tests fail on branch-trunk-win
[ https://issues.apache.org/jira/browse/YARN-359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573183#comment-13573183 ] Chris Nauroth commented on YARN-359: Thanks for the commit. Sorry, Bikas. I had forgotten the earlier discussion on YARN-233 when we chose to place these methods in Shell, so I forgot to point this out to Vinod during his review of this patch. We don't currently have other uses for these methods. However, a potential argument for moving them back to Shell is that if a need arises, then developers are far more likely to look in Shell for a utility method than to remember to promote something out of the nodemanager codebase. I'd be happy to do more refactoring if you want to discuss further. NodeManager container-related tests fail on branch-trunk-win Key: YARN-359 URL: https://issues.apache.org/jira/browse/YARN-359 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: trunk-win Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: YARN-359-branch-trunk-win.1.patch, YARN-359-branch-trunk-win.2.patch On branch-trunk-win, there are test failures in {{TestContainerManager}}, {{TestNodeManagerShutdown}}, {{TestContainerLaunch}}, and {{TestContainersMonitor}}.
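To make the placement question concrete, the methods in question are cross-platform command helpers along these lines; this is a sketch of the general shape, not the exact code from the patch, and the signature is an assumption.
{code:java}
// Sketch of the kind of helper discussed above; the real signature may
// differ. Hosted in org.apache.hadoop.util.Shell, this is where developers
// would naturally look for platform-specific command construction.
public static String[] getRunScriptCommand(File script) {
  String absolutePath = script.getAbsolutePath();
  return WINDOWS  // Shell.WINDOWS: true when running on Windows
      ? new String[] { "cmd", "/c", absolutePath }   // Windows shell
      : new String[] { "/bin/bash", absolutePath };  // POSIX shell
}
{code}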
[jira] [Commented] (YARN-209) Capacity scheduler can leave application in pending state
[ https://issues.apache.org/jira/browse/YARN-209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573188#comment-13573188 ] Vinod Kumar Vavilapalli commented on YARN-209: -- Haven't looked at the code yet, trying to understand the scenario. So, in other words, if an application gets submitted to the RM before any NM has registered, the application will be stuck in the pending state. Right? If so, we can write a test like that. Capacity scheduler can leave application in pending state - Key: YARN-209 URL: https://issues.apache.org/jira/browse/YARN-209 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Assignee: Bikas Saha Fix For: 3.0.0 Attachments: YARN-209.1.patch, YARN-209-test.patch Say application A is submitted, but at that time it does not meet the bar for activation because of the resource limit settings for applications. If more hardware is then added to the system and the application becomes valid, it still remains in the pending state, likely forever. This might be rare to hit in real life because enough NMs heartbeat to the RM before applications can get submitted, but a change in settings or heartbeat interval might make it easier to reproduce. In RM restart scenarios, this will likely hit more often if restart is implemented by re-playing events and re-submitting applications to the scheduler before the RPC to NMs is activated.
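A rough sketch of the test being suggested, using the MockRM/MockNM helpers from the RM test code; the exact helper calls and the asserted state are assumptions, not the committed YARN-209-test.patch.
{code:java}
// Hypothetical test sketch: submit before any NM registers, then add a
// node and expect the app to leave the pending state.
@Test
public void testAppActivatedByLateNodeAddition() throws Exception {
  MockRM rm = new MockRM(conf);
  rm.start();
  RMApp app = rm.submitApp(1024);        // no NMs registered yet
  MockNM nm = rm.registerNode("host1:1234", 8 * 1024);
  nm.nodeHeartbeat(true);                // hardware arrives afterwards
  // under the bug, the AM container is never allocated for the attempt
  rm.waitForState(app.getCurrentAppAttempt().getAppAttemptId(),
      RMAppAttemptState.ALLOCATED);
  rm.stop();
}
{code}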
[jira] [Created] (YARN-387) Fix inconsistent protocol naming
Vinod Kumar Vavilapalli created YARN-387: Summary: Fix inconsistent protocol naming Key: YARN-387 URL: https://issues.apache.org/jira/browse/YARN-387 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli We now have different and inconsistent naming schemes for the various protocols. Such naming has been hard to explain to users, mainly in direct interactions at talks/presentations and user group meetings. We should fix these names before we go beta.
[jira] [Updated] (YARN-387) Fix inconsistent protocol naming
[ https://issues.apache.org/jira/browse/YARN-387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-387: - Labels: incompatible (was: ) This is going to be an incompatible change for existing users of the alpha releases. Fix inconsistent protocol naming Key: YARN-387 URL: https://issues.apache.org/jira/browse/YARN-387 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Labels: incompatible We now have different and inconsistent naming schemes for the various protocols. Such naming has been hard to explain to users, mainly in direct interactions at talks/presentations and user group meetings. We should fix these names before we go beta.
[jira] [Commented] (YARN-387) Fix inconsistent protocol naming
[ https://issues.apache.org/jira/browse/YARN-387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573206#comment-13573206 ] Vinod Kumar Vavilapalli commented on YARN-387: -- I propose we do the following conversions:
Main protocols:
* client_RM_protocol.proto -> client_rm_protocol.proto
* AM_RM_protocol.proto -> am_rm_protocol.proto
* container_manager.proto -> am_nm_protocol.proto
* ResourceTracker.proto -> rm_nm_protocol.proto
* LocalizationProtocol.proto -> nm_localizer_protocol.proto
* RMAdminProtocol.proto -> rm_admin_protocol.proto
Misc:
* yarnprototunnelrpc.proto -> yarn_rpc_tunnel_protos.proto
In addition, we should:
* similarly rename all the java API classes backing the above protocols
* add comments to all the proto files describing what they do and can contain.
Thoughts? Fix inconsistent protocol naming Key: YARN-387 URL: https://issues.apache.org/jira/browse/YARN-387 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli We now have different and inconsistent naming schemes for the various protocols. Such naming has been hard to explain to users, mainly in direct interactions at talks/presentations and user group meetings. We should fix these names before we go beta.
[jira] [Commented] (YARN-387) Fix inconsistent protocol naming
[ https://issues.apache.org/jira/browse/YARN-387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573234#comment-13573234 ] Sandy Ryza commented on YARN-387: - +1 to the proposal. If I understand RMAdminProtocol correctly, its RPCs are sent to the RM? Would it make sense to call it AdminRMProtocol to reflect this, in line with the ordering in the other protocols? I think it would also be helpful to add to and go over the comments for the java protocol classes, as that is the first place many developers will go when trying to understand how YARN works and how to program against it. Not sure whether that's in the scope of this JIRA or not. Fix inconsistent protocol naming Key: YARN-387 URL: https://issues.apache.org/jira/browse/YARN-387 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Labels: incompatible We now have different and inconsistent naming schemes for the various protocols. Such naming has been hard to explain to users, mainly in direct interactions at talks/presentations and user group meetings. We should fix these names before we go beta.
[jira] [Commented] (YARN-374) Job History Server doesn't show jobs killed by ClientRMProtocol.forceKillApplication
[ https://issues.apache.org/jira/browse/YARN-374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573238#comment-13573238 ] nemon lou commented on YARN-374: Agree that YARN-321 will help. Job History Server doesn't show jobs killed by ClientRMProtocol.forceKillApplication -- Key: YARN-374 URL: https://issues.apache.org/jira/browse/YARN-374 Project: Hadoop YARN Issue Type: Bug Components: client, resourcemanager Affects Versions: 2.0.1-alpha Reporter: nemon lou After I kill an app by typing bin/yarn rmadmin app -kill APP_ID, no job info is kept on the JHS web page. However, when I kill a job by typing bin/mapred job -kill JOB_ID, I can see the killed job left on the JHS. Some Hive users are confused that their jobs have been killed but nothing is left on the JHS, and the killed app's info on the RM web page is not enough. (They kill jobs via ClientRMProtocol.)
[jira] [Commented] (YARN-365) Each NM heartbeat should not generate an event for the Scheduler
[ https://issues.apache.org/jira/browse/YARN-365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573244#comment-13573244 ] Siddharth Seth commented on YARN-365: - This isn't very different from configuring all nodes to have a higher heartbeat interval. With a high heartbeat interval, the NM would send a batch of updates over to the RM, and this heartbeat would trigger a scheduling pass. This change de-links RM scheduling passes from NM heartbeats. The NM can continue to provide node updates at a smaller interval, and the RM handles these, along with a scheduling pass, as and when it chooses to. In this particular case, the scheduler queue ends up with at most a single scheduling event per node - but it will attempt a scheduling run only on the next heartbeat from that node. At a later point, scheduling could be changed to be triggered by the arrival of a new application - or to just run in a tight loop. If the scheduler cannot keep up, it ends up scheduling as fast as it can - without node heartbeats affecting the queue size. Also, completed container information from heartbeats is processed earlier (instead of waiting for the event in the queue to be processed) - making each scheduler pass more efficient. bq. I can see cases where the all-at-once is actually worse, as it will spend more time on a single heartbeat and potentially not get to other things in the queue, like apps being added, as fast. The event should not be delayed by more than the time required to complete one scheduling pass across all nodes. I don't think this will be much better in the case of a growing scheduler queue. bq. The only way I can see this being beneficial is if we can aggregate the heartbeats and have the scheduler process less. Do you mean somehow aggregating heartbeats across nodes? This approach does aggregate heartbeats for a single node. Each NM heartbeat should not generate an event for the Scheduler - Key: YARN-365 URL: https://issues.apache.org/jira/browse/YARN-365 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, scheduler Affects Versions: 0.23.5 Reporter: Siddharth Seth Assignee: Xuan Gong Attachments: Prototype2.txt, Prototype3.txt, YARN-365.1.patch, YARN-365.2.patch, YARN-365.3.patch Follow up from YARN-275: https://issues.apache.org/jira/secure/attachment/12567075/Prototype.txt
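A condensed sketch of the de-linking described above, with illustrative names throughout (this is not the attached patch or prototypes): keep at most one pending scheduling event per node, and do completed-container bookkeeping eagerly on the heartbeat path.
{code:java}
// Illustrative only: per-node coalescing of node-update scheduler events.
// Class and helper names (processCompletedContainers, attemptScheduling,
// eventQueue) are assumptions, not identifiers from the YARN-365 patches.
private final Set<NodeId> pendingNodeUpdates =
    Collections.newSetFromMap(new ConcurrentHashMap<NodeId, Boolean>());

void onNodeHeartbeat(RMNode node, List<ContainerStatus> completed) {
  // completed-container bookkeeping happens immediately, off the event
  // queue, so each eventual scheduling pass has less work to do
  processCompletedContainers(node, completed);
  // enqueue at most one NODE_UPDATE per node, however fast it heartbeats
  if (pendingNodeUpdates.add(node.getNodeID())) {
    eventQueue.add(new NodeUpdateSchedulerEvent(node));
  }
}

void onNodeUpdate(NodeUpdateSchedulerEvent event) {
  NodeId id = event.getRMNode().getNodeID();
  pendingNodeUpdates.remove(id);        // next heartbeat may enqueue again
  attemptScheduling(event.getRMNode()); // one scheduling pass for this node
}
{code}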
[jira] [Commented] (YARN-387) Fix inconsistent protocol naming
[ https://issues.apache.org/jira/browse/YARN-387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573273#comment-13573273 ] Karthik Kambatla commented on YARN-387: --- Good idea! Fix inconsistent protocol naming Key: YARN-387 URL: https://issues.apache.org/jira/browse/YARN-387 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Labels: incompatible We now have different and inconsistent naming schemes for the various protocols. Such naming has been hard to explain to users, mainly in direct interactions at talks/presentations and user group meetings. We should fix these names before we go beta.