[jira] [Updated] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is > minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2637:
--
Attachment: YARN-2637.1.patch

I had forgotten to remove the resource when the application finishes - the updated patch does so. I think this actually needs to be a per-cluster (rather than a per-queue) limit, based on the name and the behavior most seem to expect - except that there can be a per-queue override of the value, and most other values like it end up being evaluated at the queue level. It seems as though either this should be a global value, or possibly one based on a portion of the cluster (perhaps the queue's baseline portion of the cluster, then adjusted). Most likely, the right approach is to make "usedAMResources" a single per-cluster value by attaching it to the parent queue (the AbstractCSQueue instance of the root queue) - which wouldn't be difficult - and then it would be per-cluster, as it probably should be.

> maximum-am-resource-percent could be violated when resource of AM is > minimumAllocation
>
> Key: YARN-2637
> URL: https://issues.apache.org/jira/browse/YARN-2637
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.6.0
> Reporter: Wangda Tan
> Assignee: Craig Welch
> Priority: Critical
> Attachments: YARN-2637.0.patch, YARN-2637.1.patch
>
> Currently, the number of AMs in a leaf queue is calculated in the following way:
> {code}
> max_am_resource = queue_max_capacity * maximum_am_resource_percent
> #max_am_number = max_am_resource / minimum_allocation
> #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
> {code}
> And when a new application is submitted to the RM, it checks whether the app can be activated in the following way:
> {code}
> for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); i.hasNext();) {
>   FiCaSchedulerApp application = i.next();
>
>   // Check queue limit
>   if (getNumActiveApplications() >= getMaximumActiveApplications()) {
>     break;
>   }
>
>   // Check user limit
>   User user = getUser(application.getUser());
>   if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) {
>     user.activateApplication();
>     activeApplications.add(application);
>     i.remove();
>     LOG.info("Application " + application.getApplicationId()
>         + " from user: " + application.getUser()
>         + " activated in queue: " + getQueueName());
>   }
> }
> {code}
> An example: if a queue has capacity = 1G and max_am_resource_percent = 0.2, the maximum resource AMs can use is 200M. Assuming minimum_allocation = 1M, the number of AMs that can be launched is 200. If the user instead uses 5M for each AM (> minimum_allocation), all apps can still be activated, and they will occupy all of the queue's resources instead of only max_am_resource_percent of the queue.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-2800) Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature
[ https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217569#comment-14217569 ] Tsuyoshi OZAWA commented on YARN-2800:
--
Thanks for your comment, Vinod, and thanks for the patch, Wangda. +1 for removing MemoryNodeLabelsStore. My comments:
* MemoryRMNodeLabelsManager does nothing for tests in the new patch. How about renaming MemoryRMNodeLabelsManager to NullRMNodeLabelsManager, for consistency with RMStateStore?
* Maybe not related to this JIRA, but it would be better to add a test of RMRestart with NodeLabelManager to avoid regressions.

> Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature
>
> Key: YARN-2800
> URL: https://issues.apache.org/jira/browse/YARN-2800
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: client, resourcemanager
> Reporter: Wangda Tan
> Assignee: Wangda Tan
> Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch, YARN-2800-20141118-1.patch, YARN-2800-20141118-2.patch
>
> In the past, we had a MemoryNodeLabelsStore, mostly so that users could try this feature without configuring where node labels are stored on the file system. It seems convenient for users to try, but it actually causes a bad user experience. A user may add/remove labels and edit capacity-scheduler.xml; after an RM restart the labels are gone (we store them in memory). And the RM cannot start if some queues use labels that don't exist in the cluster.
> As discussed, we should have an explicit way for users to specify whether they want this feature. If node labels are disabled, any operation trying to modify/use node labels will throw an exception.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
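For context, the enable/disable switch discussed above ends up as a single yarn-site.xml property. The property name below is the one this line of work introduced, but verify it against the yarn-default.xml of your release before relying on it:

```xml
<!-- Assumed property name; check yarn-default.xml for your Hadoop version. -->
<property>
  <name>yarn.node-labels.enabled</name>
  <value>true</value>
</property>
```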
[jira] [Updated] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is > minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2637:
--
Attachment: YARN-2637.0.patch

Attaching a rough but, I think, serviceable work-in-progress patch. Based on manual testing and checking the logs, it appears to work as it should; I still need to write some unit tests and validate it against the existing tests...

> maximum-am-resource-percent could be violated when resource of AM is > minimumAllocation
>
> Key: YARN-2637
> URL: https://issues.apache.org/jira/browse/YARN-2637
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.6.0
> Reporter: Wangda Tan
> Assignee: Craig Welch
> Priority: Critical
> Attachments: YARN-2637.0.patch

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Assigned] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is > minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch reassigned YARN-2637:
--
Assignee: Craig Welch

> maximum-am-resource-percent could be violated when resource of AM is > minimumAllocation
>
> Key: YARN-2637
> URL: https://issues.apache.org/jira/browse/YARN-2637
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.6.0
> Reporter: Wangda Tan
> Assignee: Craig Welch
> Priority: Critical

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-2878) Fix DockerContainerExecutor.apt.vm formatting
[ https://issues.apache.org/jira/browse/YARN-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217516#comment-14217516 ] Hadoop QA commented on YARN-2878: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682353/YARN-1964-docs.patch against trunk revision 79301e8. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5877//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5877//console This message is automatically generated. > Fix DockerContainerExecutor.apt.vm formatting > - > > Key: YARN-2878 > URL: https://issues.apache.org/jira/browse/YARN-2878 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Abin Shahab >Assignee: Abin Shahab > Attachments: YARN-1964-docs.patch > > > The formatting on DockerContainerExecutor.apt.vm is off. Needs correction -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2738) Add FairReservationSystem for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-2738: Attachment: YARN-2738.003.patch Removed configurability in fairscheduler configuration as per discussion. > Add FairReservationSystem for FairScheduler > --- > > Key: YARN-2738 > URL: https://issues.apache.org/jira/browse/YARN-2738 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-2738.001.patch, YARN-2738.002.patch, > YARN-2738.003.patch > > > Need to create a FairReservationSystem that will implement ReservationSystem > for FairScheduler -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2878) Fix DockerContainerExecutor.apt.vm formatting
[ https://issues.apache.org/jira/browse/YARN-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abin Shahab updated YARN-2878: -- Attachment: YARN-1964-docs.patch > Fix DockerContainerExecutor.apt.vm formatting > - > > Key: YARN-2878 > URL: https://issues.apache.org/jira/browse/YARN-2878 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Abin Shahab >Assignee: Abin Shahab > Attachments: YARN-1964-docs.patch > > > The formatting on DockerContainerExecutor.apt.vm is off. Needs correction -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2878) Fix DockerContainerExecutor.apt.vm formatting
Abin Shahab created YARN-2878: - Summary: Fix DockerContainerExecutor.apt.vm formatting Key: YARN-2878 URL: https://issues.apache.org/jira/browse/YARN-2878 Project: Hadoop YARN Issue Type: Improvement Reporter: Abin Shahab Assignee: Abin Shahab The formatting on DockerContainerExecutor.apt.vm is off. Needs correction -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2865) Application recovery continuously fails with "Application with id already present. Cannot duplicate"
[ https://issues.apache.org/jira/browse/YARN-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217448#comment-14217448 ] Tsuyoshi OZAWA commented on YARN-2865:
--
[~rohithsharma], thanks for taking this issue. +1 for adding Private and Unstable annotations to the methods defined in RMActiveServiceContext, as Karthik mentioned. The other points look good to me.

> Application recovery continuously fails with "Application with id already present. Cannot duplicate"
>
> Key: YARN-2865
> URL: https://issues.apache.org/jira/browse/YARN-2865
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Reporter: Rohith
> Assignee: Rohith
> Priority: Critical
> Attachments: YARN-2865.patch, YARN-2865.patch
>
> YARN-2588 handles the exception thrown while transitioning to active and resets activeServices, but it misses clearing the RMContext apps/nodes details as well as ClusterMetrics and QueueMetrics. This causes application recovery to fail.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-2865) Application recovery continuously fails with "Application with id already present. Cannot duplicate"
[ https://issues.apache.org/jira/browse/YARN-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217362#comment-14217362 ] Rohith commented on YARN-2865:
--
Thanks Karthik and Jian He for the review. I will update the patch.
bq. In RMActiveServices, some are using rmContext#setter, some are using activeServiceContext#setter, we may make it consistent to use the latter
RMContext has 5 setter methods. I used those methods to set from RMActiveServices just to retain the interface implementation.

> Application recovery continuously fails with "Application with id already present. Cannot duplicate"
>
> Key: YARN-2865
> URL: https://issues.apache.org/jira/browse/YARN-2865
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Reporter: Rohith
> Assignee: Rohith
> Priority: Critical
> Attachments: YARN-2865.patch, YARN-2865.patch

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-2800) Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature
[ https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217343#comment-14217343 ] Hadoop QA commented on YARN-2800: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682325/YARN-2800-20141118-2.patch against trunk revision 79301e8. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5875//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5875//console This message is automatically generated. 
> Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature
>
> Key: YARN-2800
> URL: https://issues.apache.org/jira/browse/YARN-2800
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: client, resourcemanager
> Reporter: Wangda Tan
> Assignee: Wangda Tan
> Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch, YARN-2800-20141118-1.patch, YARN-2800-20141118-2.patch

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-2404) Remove ApplicationAttemptState and ApplicationState class in RMStateStore class
[ https://issues.apache.org/jira/browse/YARN-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217337#comment-14217337 ] Tsuyoshi OZAWA commented on YARN-2404:
--
[~jianhe], thanks for pinging me. I'll update the patch soon.

> Remove ApplicationAttemptState and ApplicationState class in RMStateStore class
>
> Key: YARN-2404
> URL: https://issues.apache.org/jira/browse/YARN-2404
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Jian He
> Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2404.1.patch, YARN-2404.2.patch, YARN-2404.3.patch, YARN-2404.4.patch
>
> We can remove the ApplicationState and ApplicationAttemptState classes in RMStateStore, given that we already have the ApplicationStateData and ApplicationAttemptStateData records. We may just replace ApplicationState with ApplicationStateData, and similarly ApplicationAttemptState with ApplicationAttemptStateData.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-2356) yarn status command for non-existent application/application attempt/container is too verbose
[ https://issues.apache.org/jira/browse/YARN-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217334#comment-14217334 ] Jian He commented on YARN-2356:
--
[~sunilg], thanks for working on this. The patch looks good; one minor comment: for {{doesn't exist in RM or History Server.}}, we may say {{TimeLineServer}} instead of {{History Server}}.

> yarn status command for non-existent application/application attempt/container is too verbose
>
> Key: YARN-2356
> URL: https://issues.apache.org/jira/browse/YARN-2356
> Project: Hadoop YARN
> Issue Type: Bug
> Components: client
> Reporter: Sunil G
> Assignee: Sunil G
> Priority: Minor
> Attachments: Yarn-2356.1.patch
>
> The *yarn application -status*, *applicationattempt -status*, and *container -status* commands should suppress exceptions such as ApplicationNotFound, ApplicationAttemptNotFound, and ContainerNotFound for non-existent entries in the RM or History Server.
> For example, the exception below could be suppressed better:
> sunildev@host-a:~/hadoop/hadoop/bin> ./yarn application -status application_1402668848165_0015
> No GC_PROFILE is given. Defaults to medium.
> 14/07/25 16:21:45 INFO client.RMProxy: Connecting to ResourceManager at /10.18.40.77:45022
> Exception in thread "main" org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1402668848165_0015' doesn't exist in RM.
> at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:285)
> at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145)
> at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:607)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2099)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2095)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1626)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2093)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
> at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101)
> at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:166)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
> at $Proxy12.getApplicationReport(Unknown Source)
> at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:291)
> at org.apache.hadoop.yarn.client.cli.ApplicationCLI.printApplicationReport(ApplicationCLI.java:428)
> at org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:153)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:76)
> Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException): Application with id 'application_1402668848165_0015' doesn't exist in RM.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
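The fix being requested amounts to catching the not-found exception in the CLI and printing a one-line message instead of the full stack trace. A minimal, self-contained sketch of that pattern (the exception class below is a stand-in for org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException, and lookup() stands in for YarnClient#getApplicationReport; this is not the actual patch):

```java
// Sketch only: stand-in types, not the real YARN client API.
public class StatusCliSketch {

    // Stand-in for org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException.
    static class ApplicationNotFoundException extends Exception {
        ApplicationNotFoundException(String msg) { super(msg); }
    }

    // Stand-in for YarnClient#getApplicationReport on a non-existent app.
    static String lookup(String appId) throws ApplicationNotFoundException {
        throw new ApplicationNotFoundException(
            "Application with id '" + appId + "' doesn't exist in RM.");
    }

    // Catch the not-found case and return a concise, user-facing message
    // instead of letting the exception propagate with its stack trace.
    static String status(String appId) {
        try {
            return lookup(appId);
        } catch (ApplicationNotFoundException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(status("application_1402668848165_0015"));
    }
}
```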
[jira] [Commented] (YARN-2404) Remove ApplicationAttemptState and ApplicationState class in RMStateStore class
[ https://issues.apache.org/jira/browse/YARN-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217308#comment-14217308 ] Hadoop QA commented on YARN-2404:
--
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668577/YARN-2404.4.patch against trunk revision 79301e8.
{color:red}-1 patch{color}. The patch command could not apply the patch.
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5876//console
This message is automatically generated.

> Remove ApplicationAttemptState and ApplicationState class in RMStateStore class
>
> Key: YARN-2404
> URL: https://issues.apache.org/jira/browse/YARN-2404
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Jian He
> Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2404.1.patch, YARN-2404.2.patch, YARN-2404.3.patch, YARN-2404.4.patch

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-2404) Remove ApplicationAttemptState and ApplicationState class in RMStateStore class
[ https://issues.apache.org/jira/browse/YARN-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217302#comment-14217302 ] Jian He commented on YARN-2404:
--
[~ozawa], sorry for the late response. The patch no longer applies; mind updating it?

> Remove ApplicationAttemptState and ApplicationState class in RMStateStore class
>
> Key: YARN-2404
> URL: https://issues.apache.org/jira/browse/YARN-2404
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Jian He
> Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2404.1.patch, YARN-2404.2.patch, YARN-2404.3.patch, YARN-2404.4.patch

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-2206) Update document for applications REST API response examples
[ https://issues.apache.org/jira/browse/YARN-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217280#comment-14217280 ] Jian He commented on YARN-2206:
--
[~kj-ki], thanks for working on this. The patch no longer applies; mind updating it? I can commit once it is updated.

> Update document for applications REST API response examples
>
> Key: YARN-2206
> URL: https://issues.apache.org/jira/browse/YARN-2206
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: documentation
> Affects Versions: 2.4.0
> Reporter: Kenji Kikushima
> Assignee: Kenji Kikushima
> Priority: Minor
> Attachments: YARN-2206.patch
>
> In ResourceManagerRest.apt.vm, the Applications API response examples are missing some elements:
> - The JSON response should have "applicationType" and "applicationTags".
> - The XML response should have "applicationTags".

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
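For illustration, a corrected JSON response fragment would carry the two missing fields roughly like this (all field values are made up; the surrounding structure should match the actual ResourceManagerRest.apt.vm examples):

```json
{
  "app": {
    "id": "application_1402668848165_0001",
    "name": "example-app",
    "applicationType": "MAPREDUCE",
    "applicationTags": "tag1,tag2",
    "state": "FINISHED"
  }
}
```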
[jira] [Commented] (YARN-2157) Document YARN metrics
[ https://issues.apache.org/jira/browse/YARN-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217268#comment-14217268 ] Akira AJISAKA commented on YARN-2157: - Thank you, [~jianhe]! > Document YARN metrics > - > > Key: YARN-2157 > URL: https://issues.apache.org/jira/browse/YARN-2157 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation >Reporter: Akira AJISAKA >Assignee: Akira AJISAKA > Fix For: 2.7.0 > > Attachments: YARN-2157.2.patch, YARN-2157.3.patch, YARN-2157.patch > > > YARN-side of HADOOP-6350. Add YARN metrics to Metrics document. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2800) Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature
[ https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2800:
--
Attachment: YARN-2800-20141118-2.patch

Updated patch, fixed UT.

> Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature
>
> Key: YARN-2800
> URL: https://issues.apache.org/jira/browse/YARN-2800
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: client, resourcemanager
> Reporter: Wangda Tan
> Assignee: Wangda Tan
> Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch, YARN-2800-20141118-1.patch, YARN-2800-20141118-2.patch

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-2800) Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature
[ https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217225#comment-14217225 ] Hadoop QA commented on YARN-2800: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682292/YARN-2800-20141118-1.patch against trunk revision fbf81fb. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.nodelabels.TestFileSystemNodeLabelsStore {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5874//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5874//console This message is automatically generated. 
> Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature
>
> Key: YARN-2800
> URL: https://issues.apache.org/jira/browse/YARN-2800
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: client, resourcemanager
> Reporter: Wangda Tan
> Assignee: Wangda Tan
> Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch, YARN-2800-20141118-1.patch

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-2877) Extend YARN to support distributed scheduling
[ https://issues.apache.org/jira/browse/YARN-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217207#comment-14217207 ] Sriram Rao commented on YARN-2877:
The proposal:
# Extend the NM to support task queueing. AMs can queue tasks directly at the NMs, and the NMs will execute those tasks opportunistically.
# Extend the types of containers that YARN exposes:
#* CONSERVATIVE: corresponds to containers allocated by YARN today.
#* OPTIMISTIC: a new class of containers, which will be queued for execution at the NM.
This extension allows AMs to control what type of container they are requesting from the RM framework.
# Extend the NM with a "local RM" (i.e., a local Resource Manager) which uses local policies for deciding when an "OPTIMISTIC container" can be executed. We are exploring timed leases for OPTIMISTIC containers to ensure a minimum duration of execution. This mechanism also allows NMs to free up resources and thus guarantee predictable start times for CONSERVATIVE containers.
There are additional motivations for this feature, and we will discuss them in follow-up comments.
> Extend YARN to support distributed scheduling
> Key: YARN-2877
> URL: https://issues.apache.org/jira/browse/YARN-2877
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: nodemanager, resourcemanager
> Reporter: Sriram Rao
>
> This is an umbrella JIRA that proposes to extend YARN to support distributed scheduling. Briefly, some of the motivations for distributed scheduling are the following:
> 1. Improve cluster utilization by opportunistically executing tasks on otherwise idle resources on individual machines.
> 2. Reduce allocation latency for tasks where scheduling time dominates (i.e., task execution time is much less than the time required for obtaining a container from the RM).
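The admission behavior described in the proposal can be sketched in a few lines. This is a purely illustrative model under assumed names (LocalRmSketch, admit, queuedCount are not from any patch): CONSERVATIVE containers were already granted capacity by the RM and always run, while OPTIMISTIC ones run only when spare capacity exists and otherwise queue at the NM.

```java
// Hypothetical sketch of the proposed "local RM" admission check at the NM.
// All names are illustrative; the real design may track resources differently.
class LocalRmSketch {
    enum ContainerType { CONSERVATIVE, OPTIMISTIC }

    private final long capacityMb;      // total NM capacity
    private long guaranteedMb;          // capacity promised to CONSERVATIVE containers
    private int queuedOptimistic;       // OPTIMISTIC containers waiting for spare room

    LocalRmSketch(long capacityMb) { this.capacityMb = capacityMb; }

    /** Returns true if the container can start now; OPTIMISTIC requests that
     *  don't fit are queued instead of rejected. */
    boolean admit(ContainerType type, long requestMb) {
        if (type == ContainerType.CONSERVATIVE) {
            guaranteedMb += requestMb;   // the RM already guaranteed this space
            return true;
        }
        if (capacityMb - guaranteedMb >= requestMb) {
            return true;                 // run opportunistically in spare capacity
        }
        queuedOptimistic++;              // queue at the NM for later execution
        return false;
    }

    int queuedCount() { return queuedOptimistic; }
}
```

On a 4 GB node with 3 GB guaranteed, a 1 GB OPTIMISTIC request would start immediately while a 2 GB one would queue, which matches the predictable-start-time goal for CONSERVATIVE containers.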
[jira] [Created] (YARN-2877) Extend YARN to support distributed scheduling
Sriram Rao created YARN-2877: Summary: Extend YARN to support distributed scheduling Key: YARN-2877 URL: https://issues.apache.org/jira/browse/YARN-2877 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Reporter: Sriram Rao
This is an umbrella JIRA that proposes to extend YARN to support distributed scheduling. Briefly, some of the motivations for distributed scheduling are the following:
1. Improve cluster utilization by opportunistically executing tasks on otherwise idle resources on individual machines.
2. Reduce allocation latency for tasks where scheduling time dominates (i.e., task execution time is much less than the time required for obtaining a container from the RM).
[jira] [Commented] (YARN-2802) add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue.
[ https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217188#comment-14217188 ] Hadoop QA commented on YARN-2802:
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682281/YARN-2802.005.patch against trunk revision fbf81fb.
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5873//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5873//console
This message is automatically generated.
> add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issues.
> -- > > Key: YARN-2802 > URL: https://issues.apache.org/jira/browse/YARN-2802 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2802.000.patch, YARN-2802.001.patch, > YARN-2802.002.patch, YARN-2802.003.patch, YARN-2802.004.patch, > YARN-2802.005.patch > > > add AM container launch and register delay metrics in QueueMetrics to help > diagnose performance issue. > Added two metrics in QueueMetrics: > aMLaunchDelay: the time spent from sending event AMLauncherEventType.LAUNCH > to receiving event RMAppAttemptEventType.LAUNCHED in RMAppAttemptImpl. > aMRegisterDelay: the time waiting from receiving event > RMAppAttemptEventType.LAUNCHED to receiving event > RMAppAttemptEventType.REGISTERED(ApplicationMasterService#registerApplicationMaster) > in RMAppAttemptImpl. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
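The two metrics described above are differences between event timestamps on the same application attempt. A minimal model of that computation, under assumed field names (the actual QueueMetrics/RMAppAttemptImpl wiring is more involved):

```java
// Illustrative sketch of how aMLaunchDelay and aMRegisterDelay are derived:
// each is the gap between two attempt lifecycle timestamps. Field and method
// names are assumptions for this example, not the patch's actual code.
class AmDelaySketch {
    final long launchSentMs;    // when AMLauncherEventType.LAUNCH was sent
    final long launchedMs;      // when RMAppAttemptEventType.LAUNCHED was received
    final long registeredMs;    // when RMAppAttemptEventType.REGISTERED was received

    AmDelaySketch(long launchSentMs, long launchedMs, long registeredMs) {
        this.launchSentMs = launchSentMs;
        this.launchedMs = launchedMs;
        this.registeredMs = registeredMs;
    }

    /** Time to launch the AM container. */
    long amLaunchDelayMs() { return launchedMs - launchSentMs; }

    /** Time from AM launch until it registers with the RM. */
    long amRegisterDelayMs() { return registeredMs - launchedMs; }
}
```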
[jira] [Updated] (YARN-2522) AHSClient may be not necessary
[ https://issues.apache.org/jira/browse/YARN-2522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2522: Target Version/s: 2.7.0
> AHSClient may be not necessary
> Key: YARN-2522
> URL: https://issues.apache.org/jira/browse/YARN-2522
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Zhijie Shen
> Assignee: Zhijie Shen
>
> Per discussion in [YARN-2033|https://issues.apache.org/jira/browse/YARN-2033?focusedCommentId=14126073&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14126073], it may not be necessary to have a separate AHSClient. The methods can be incorporated into TimelineClient. APPLICATION_HISTORY_ENABLED would then also be useless.
[jira] [Commented] (YARN-2876) In Fair Scheduler, JMX and Scheduler UI display wrong maxResource info for subqueues
[ https://issues.apache.org/jira/browse/YARN-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217126#comment-14217126 ] Hadoop QA commented on YARN-2876:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682279/YARN-2876.v1.patch against trunk revision fbf81fb.
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5871//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5871//console
This message is automatically generated.
> In Fair Scheduler, JMX and Scheduler UI display wrong maxResource info for > subqueues > > > Key: YARN-2876 > URL: https://issues.apache.org/jira/browse/YARN-2876 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Siqi Li >Assignee: Siqi Li > Attachments: YARN-2876.v1.patch, screenshot-1.png > > > If a subqueue doesn't have a maxResource set in fair-scheduler.xml, JMX and > Scheduler UI will display the entire cluster capacity as its maxResource > instead of its parent queue's maxResource. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2157) Document YARN metrics
[ https://issues.apache.org/jira/browse/YARN-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217122#comment-14217122 ] Hudson commented on YARN-2157:
FAILURE: Integrated in Hadoop-trunk-Commit #6570 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6570/])
YARN-2157. Added YARN metrics in the documentation. Contributed by Akira AJISAKA (jianhe: rev 90a968d6757511b6d89538516db0e699129d854c)
* hadoop-common-project/hadoop-common/src/site/apt/Metrics.apt.vm
* hadoop-yarn-project/CHANGES.txt
> Document YARN metrics
> Key: YARN-2157
> URL: https://issues.apache.org/jira/browse/YARN-2157
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: documentation
> Reporter: Akira AJISAKA
> Assignee: Akira AJISAKA
> Fix For: 2.7.0
> Attachments: YARN-2157.2.patch, YARN-2157.3.patch, YARN-2157.patch
>
> YARN-side of HADOOP-6350. Add YARN metrics to the Metrics document.
[jira] [Commented] (YARN-2870) Update examples in document of Timeline Server
[ https://issues.apache.org/jira/browse/YARN-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217105#comment-14217105 ] Hudson commented on YARN-2870:
SUCCESS: Integrated in Hadoop-trunk-Commit #6569 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6569/])
YARN-2870. Updated the command to run the timeline server in the document. Contributed by Masatake Iwasaki. (zjshen: rev ef38fb9758f230c3021e70b749d7a11f8bac03f5)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/TimelineServer.apt.vm
* hadoop-yarn-project/CHANGES.txt
> Update examples in document of Timeline Server
> Key: YARN-2870
> URL: https://issues.apache.org/jira/browse/YARN-2870
> Project: Hadoop YARN
> Issue Type: Bug
> Components: documentation, timelineserver
> Reporter: Masatake Iwasaki
> Assignee: Masatake Iwasaki
> Priority: Trivial
> Attachments: YARN-2870.1.patch
>
> YARN-1982 renamed historyserver to timelineserver, but there are still deprecated names in the docs.
[jira] [Commented] (YARN-2157) Document YARN metrics
[ https://issues.apache.org/jira/browse/YARN-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217084#comment-14217084 ] Hadoop QA commented on YARN-2157:
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682280/YARN-2157.3.patch against trunk revision fbf81fb.
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5872//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5872//console
This message is automatically generated.
[jira] [Commented] (YARN-2870) Update examples in document of Timeline Server
[ https://issues.apache.org/jira/browse/YARN-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217074#comment-14217074 ] Zhijie Shen commented on YARN-2870: It's better to completely update the document (YARN-2854). Anyway, the patch is ready now, let's commit it. Thanks for the contribution, [~iwasakims]!
[jira] [Updated] (YARN-2870) Update examples in document of Timeline Server
[ https://issues.apache.org/jira/browse/YARN-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2870: Assignee: Masatake Iwasaki
[jira] [Updated] (YARN-2800) Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature
[ https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2800: Attachment: YARN-2800-20141118-1.patch Uploaded the patch to kick Jenkins.
[jira] [Commented] (YARN-2375) Allow enabling/disabling timeline server per framework
[ https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217049#comment-14217049 ] Zhijie Shen commented on YARN-2375:
bq. Do you mean that we should not check for TIMELINE_SERVICE_ENABLED flag in the Application Master and rather have it work same way that it was doing before and only check that flag while sending data to timeline server?
I think the logic could be: when TIMELINE_SERVICE_ENABLED == true, read the domain env var and construct the timeline client. Only if the timeline client is not null will the AM send data to the timeline server at the points where it should.
> Allow enabling/disabling timeline server per framework
> Key: YARN-2375
> URL: https://issues.apache.org/jira/browse/YARN-2375
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Jonathan Eagles
> Assignee: Mit Desai
> Attachments: YARN-2375.patch, YARN-2375.patch
>
> This JIRA is to remove the ats enabled flag check within the TimelineClientImpl. An example where this fails: while running a secure timeline server with the ats flag set to disabled on the resource manager, the timeline delegation token renewer throws an NPE.
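The gating logic suggested in the comment above can be sketched as: construct the client only when the feature is enabled, and publish only when a client exists, so the AM keeps working when the timeline service is off. This is a stand-alone illustration; TimelineClientStub and the field names are assumptions, not the real TimelineClient API.

```java
// Sketch of "construct client only if enabled; send only if client != null".
// TimelineClientStub stands in for the real YARN TimelineClient.
class TimelineGateSketch {
    static class TimelineClientStub {
        int published;                  // counts entities we "sent"
        void putEntities() { published++; }
    }

    TimelineClientStub client;          // stays null when the feature is disabled

    TimelineGateSketch(boolean timelineServiceEnabled) {
        if (timelineServiceEnabled) {
            // The real AM would also read the timeline domain env var here.
            client = new TimelineClientStub();
        }
    }

    /** Called at AM lifecycle points; a no-op when the service is disabled. */
    void publishEvent() {
        if (client != null) {
            client.putEntities();
        }
    }
}
```

With this shape, a disabled timeline service never produces an NPE in the AM: publishEvent() simply does nothing.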
[jira] [Commented] (YARN-2876) In Fair Scheduler, JMX and Scheduler UI display wrong maxResource info for subqueues
[ https://issues.apache.org/jira/browse/YARN-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217036#comment-14217036 ] Wei Yan commented on YARN-2876: Oh, you're right, I misunderstood it.
[jira] [Commented] (YARN-2876) In Fair Scheduler, JMX and Scheduler UI display wrong maxResource info for subqueues
[ https://issues.apache.org/jira/browse/YARN-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217034#comment-14217034 ] Siqi Li commented on YARN-2876: No. If the parent queue is also not configured, it will keep querying ancestor queues until one of them has a configured maxResource. Only if root is also not configured will this method return UNBOUNDED.
[jira] [Updated] (YARN-2802) add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue.
[ https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2802: Attachment: YARN-2802.005.patch
[jira] [Commented] (YARN-2876) In Fair Scheduler, JMX and Scheduler UI display wrong maxResource info for subqueues
[ https://issues.apache.org/jira/browse/YARN-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217012#comment-14217012 ] Wei Yan commented on YARN-2876:
{code}
+if (maxShare.equals(Resources.unbounded()) && parent != null) {
+  return parent.getMaxShare();
{code}
If the parent queue is also not configured, does it still return UNBOUNDED?
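The recursion being discussed in this thread can be shown with a tiny stand-alone model: an unconfigured queue walks up toward root and reports the nearest configured ancestor's maxResource, and only a fully unconfigured chain yields UNBOUNDED. This is a simplified sketch, not the actual FSQueue/Resources code; the class and field names are assumptions.

```java
// Toy model of the ancestor walk behind the YARN-2876 patch. UNBOUNDED
// plays the role of Resources.unbounded(); sizes are plain longs (MB).
class QueueSketch {
    static final long UNBOUNDED = Long.MAX_VALUE;

    final QueueSketch parent;       // null for the root queue
    final long configuredMax;       // UNBOUNDED == "not set in fair-scheduler.xml"

    QueueSketch(QueueSketch parent, long configuredMax) {
        this.parent = parent;
        this.configuredMax = configuredMax;
    }

    /** Unconfigured queues inherit the nearest configured ancestor's limit. */
    long getMaxShare() {
        if (configuredMax == UNBOUNDED && parent != null) {
            return parent.getMaxShare();   // keep climbing toward root
        }
        return configuredMax;              // configured value, or UNBOUNDED at root
    }
}
```

For root=8192 with an unconfigured child, the child reports 8192 instead of the full cluster capacity; if nothing in the chain is configured, the result is UNBOUNDED, matching Siqi Li's description.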
[jira] [Commented] (YARN-2375) Allow enabling/disabling timeline server per framework
[ https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217010#comment-14217010 ] Mit Desai commented on YARN-2375: Thanks for reviewing [~zjshen]. One clarification.
bq. 1. We should still let DS work when the timeline service is disabled, and we just need to prevent sending the timeline data to the timeline server while the DS app is running.
Do you mean that we should not check for the TIMELINE_SERVICE_ENABLED flag in the Application Master and rather have it work the same way it was working before, only checking that flag while sending data to the timeline server?
[jira] [Commented] (YARN-2876) In Fair Scheduler, JMX and Scheduler UI display wrong maxResource info for subqueues
[ https://issues.apache.org/jira/browse/YARN-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217008#comment-14217008 ] Siqi Li commented on YARN-2876: Hi [~sandyr], can you take a look at this? Although this problem only affects observability, it would be great to get it right, so that users have less to worry about with hierarchical queue structures.
[jira] [Commented] (YARN-2157) Document YARN metrics
[ https://issues.apache.org/jira/browse/YARN-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217006#comment-14217006 ] Jian He commented on YARN-2157: thanks [~ajisakaa], sorry for the late feedback. looks good to me. just made some very minor edits myself. pending jenkins.
[jira] [Updated] (YARN-2157) Document YARN metrics
[ https://issues.apache.org/jira/browse/YARN-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2157: Attachment: YARN-2157.3.patch
[jira] [Updated] (YARN-2876) In Fair Scheduler, JMX and Scheduler UI display wrong maxResource info for subqueues
[ https://issues.apache.org/jira/browse/YARN-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-2876: Attachment: YARN-2876.v1.patch
[jira] [Updated] (YARN-2876) In Fair Scheduler, JMX and Scheduler UI display wrong maxResource info for subqueues
[ https://issues.apache.org/jira/browse/YARN-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-2876: Attachment: screenshot-1.png
[jira] [Updated] (YARN-2876) In Fair Scheduler, JMX and Scheduler UI display wrong maxResource info for subqueues
[ https://issues.apache.org/jira/browse/YARN-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-2876: Description: If a subqueue doesn't have a maxResource set in fair-scheduler.xml, JMX and Scheduler UI will display the entire cluster capacity as its maxResource instead of its parent queue's maxResource.
[jira] [Commented] (YARN-2375) Allow enabling/disabling timeline server per framework
[ https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216973#comment-14216973 ] Zhijie Shen commented on YARN-2375: [~mitdesai], thanks for the patch. Two suggestions:
1. We should still let DS work when the timeline service is disabled, and we just need to prevent sending the timeline data to the timeline server while the DS app is running.
2. In JobHistoryEventHandler we need to check both the global config and the MR-specific config to decide whether we emit MR history events.
[jira] [Created] (YARN-2876) In Fair Scheduler, JMX and Scheduler UI display wrong maxResource info for subqueues
Siqi Li created YARN-2876: - Summary: In Fair Scheduler, JMX and Scheduler UI display wrong maxResource info for subqueues Key: YARN-2876 URL: https://issues.apache.org/jira/browse/YARN-2876 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2301) Improve yarn container command
[ https://issues.apache.org/jira/browse/YARN-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216945#comment-14216945 ] Jian He commented on YARN-2301: thanks [~Naganarasimha], looked at the latest patch, some comments:
- we can just use containerReport.getFinishTime(), as it internally checks ">0" already.
{code}
(containerReport.getFinishTime() == 0 ? "N/A" : Times.format(containerReport.getFinishTime())),
{code}
- the scheme could also be https, so we should use WebAppUtils#getHttpSchemePrefix
{code}
"http://" + WebAppUtils.getRunningLogURL(container.getNodeHttpAddress(), ConverterUtils.toString(containerId), user);
{code}
> Improve yarn container command
> Key: YARN-2301
> URL: https://issues.apache.org/jira/browse/YARN-2301
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Jian He
> Assignee: Naganarasimha G R
> Labels: usability
> Attachments: YARN-2301.01.patch, YARN-2301.03.patch, YARN-2303.patch
>
> While running yarn container -list command, some observations:
> 1) the scheme (e.g. http/https) before LOG-URL is missing
> 2) the start-time is printed as milliseconds (e.g. 1405540544844). Better to print in a time format.
> 3) finish-time is 0 if the container is not yet finished. Maybe "N/A"
> 4) May have an option to run as yarn container -list OR yarn application -list-containers also.
> As attempt Id is not shown on console, this is easier for the user to just copy the appId and run it; it may also be useful for container-preserving AM restart.
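The two display fixes discussed above reduce to small pure functions, sketched below under assumed helper names (ContainerReportFormat, formatFinishTime, logUrl are illustrative; the real code uses Times.format and WebAppUtils#getHttpSchemePrefix):

```java
// Illustrative sketch of the yarn container -list display fixes:
// show "N/A" for an unfinished container, and pick the URL scheme from
// configuration instead of hard-coding "http://".
class ContainerReportFormat {
    /** An unfinished container reports finishTime == 0; show "N/A" then. */
    static String formatFinishTime(long finishTimeMs) {
        // The real CLI would format the millis with Times.format(...)
        return finishTimeMs <= 0 ? "N/A" : String.valueOf(finishTimeMs);
    }

    /** Scheme chosen from the web-app config, mirroring getHttpSchemePrefix. */
    static String logUrl(boolean httpsEnabled, String nodeHttpAddress, String path) {
        String scheme = httpsEnabled ? "https://" : "http://";
        return scheme + nodeHttpAddress + path;
    }
}
```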
[jira] [Commented] (YARN-2802) add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue.
[ https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216915#comment-14216915 ] Anubhav Dhoot commented on YARN-2802: LGTM
[jira] [Commented] (YARN-2165) Timelineserver should validate that yarn.timeline-service.ttl-ms is greater than zero
[ https://issues.apache.org/jira/browse/YARN-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216914#comment-14216914 ] Zhijie Shen commented on YARN-2165: --- [~vasanthkumar], thanks for your contribution! Some comments about the patch. 1. TIMELINE_SERVICE_CLIENT_MAX_RETRIES can be -1 for endless retry. It's good to make that clear in yarn-default.xml too. 2. Instead of {{" property value should be positive and non-zero"}}, can we simply say {{" property value should be greater than zero"}}? 3. You can use {{com.google.common.base.Preconditions.checkArgument}}. 4. Multiple lines are longer than 80 chars. 5. TIMELINE_SERVICE_LEVELDB_READ_CACHE_SIZE can be zero. 6. TIMELINE_SERVICE_LEVELDB_START_TIME_READ_CACHE_SIZE and TIMELINE_SERVICE_LEVELDB_START_TIME_WRITE_CACHE_SIZE seem to need to be > 0 because LRUMap requires this. However, ideally we should be able to disable the cache completely. Let's deal with that separately. > Timelineserver should validate that yarn.timeline-service.ttl-ms is greater > than zero > - > > Key: YARN-2165 > URL: https://issues.apache.org/jira/browse/YARN-2165 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Karam Singh >Assignee: Vasanth kumar RJ > Attachments: YARN-2165.1.patch, YARN-2165.2.patch, YARN-2165.patch > > > Timelineserver should validate that yarn.timeline-service.ttl-ms is greater > than zero > Currently if set yarn.timeline-service.ttl-ms=0 > Or yarn.timeline-service.ttl-ms=-86400 > the Timeline server starts successfully, merely logging > {code} > 2014-06-15 14:52:16,562 INFO timeline.LeveldbTimelineStore > (LeveldbTimelineStore.java:(247)) - Starting deletion thread with ttl > -60480 and cycle interval 30 > {code} > At startup the timeline server should validate that yarn.timeline-service.ttl-ms > 0; > otherwise, especially for a negative value, the discard-old-entities timestamp will be set to > a future value.
This may lead to inconsistent behavior > {code} > public void run() { > while (true) { > long timestamp = System.currentTimeMillis() - ttl; > try { > discardOldEntities(timestamp); > Thread.sleep(ttlInterval); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
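The validation requested above is a one-line startup check. Here is a dependency-free sketch in the spirit of the {{Preconditions.checkArgument}} suggestion; the real patch would use Guava's {{com.google.common.base.Preconditions}}, while the local {{checkArgument}} below just mirrors its behavior:

```java
public class TtlValidation {

    // Local mirror of Guava's Preconditions.checkArgument(expression, message).
    public static void checkArgument(boolean expression, String message) {
        if (!expression) {
            throw new IllegalArgumentException(message);
        }
    }

    // Reject yarn.timeline-service.ttl-ms values of zero or below at startup,
    // instead of silently starting the deletion thread with a negative ttl.
    public static long validateTtl(long ttlMs) {
        checkArgument(ttlMs > 0,
            "yarn.timeline-service.ttl-ms property value should be greater than zero");
        return ttlMs;
    }

    public static void main(String[] args) {
        System.out.println(validateTtl(604_800_000L)); // 7 days in ms, passes
    }
}
```

Failing fast here turns the silent misconfiguration shown in the log above into an immediate, explicit startup error.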
[jira] [Commented] (YARN-2800) Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature
[ https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216897#comment-14216897 ] Wangda Tan commented on YARN-2800: -- [~vinodkv], [~ozawa], thanks for your comments, I've edited the title/desc of this JIRA, will upload a patch soon. > Remove MemoryNodeLabelsStore and add a way to enable/disable node labels > feature > > > Key: YARN-2800 > URL: https://issues.apache.org/jira/browse/YARN-2800 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch > > > In the past, we have a MemoryNodeLabelStore, mostly for user to try this > feature without configuring where to store node labels on file system. It > seems convenient for user to try this, but actually it causes some bad use > experience. User may add/remove labels, and edit capacity-scheduler.xml. > After RM restart, labels will gone, (we store it in mem). And RM cannot get > started if we have some queue uses labels, and the labels don't exist in > cluster. > As what we discussed, we should have an explicitly way to let user specify if > he/she wants this feature or not. If node label is disabled, any operations > trying to modify/use node labels will throw exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2865) Application recovery continuously fails with "Application with id already present. Cannot duplicate"
[ https://issues.apache.org/jira/browse/YARN-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216893#comment-14216893 ] Jian He commented on YARN-2865: --- looks good to me too. minor thing: In RMActiveServices, some are using {{rmContext#setter}}, some are using {{activeServiceContext#setter}}, we may make it consistent to use the latter > Application recovery continuously fails with "Application with id already > present. Cannot duplicate" > > > Key: YARN-2865 > URL: https://issues.apache.org/jira/browse/YARN-2865 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Rohith >Assignee: Rohith >Priority: Critical > Attachments: YARN-2865.patch, YARN-2865.patch > > > YARN-2588 handles exception thrown while transitioningToActive and reset > activeServices. But it misses out clearing RMcontext apps/nodes details and > ClusterMetrics and QueueMetrics. This causes application recovery to fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2800) Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature
[ https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2800: - Description: In the past, we have a MemoryNodeLabelStore, mostly for user to try this feature without configuring where to store node labels on file system. It seems convenient for user to try this, but actually it causes some bad experiecne. User may add/remove labels, and edit capacity-scheduler.xml. After RM restart, labels will gone, (we store it in mem). And RM cannot start if we have some queue uses labels, and the labels don't exist in cluster. As what we discussed, we should have an explicitly way to let user specify if he/she wants this feature or not. If node label is disabled, all operations trying to modify/use node labels will throw exception. was:Even though we have documented this, but it will be better to explicitly print a message in both RM/RMAdminCLI side to explicitly say that the node label being added will be lost across RM restart. > Remove MemoryNodeLabelsStore and add a way to enable/disable node labels > feature > > > Key: YARN-2800 > URL: https://issues.apache.org/jira/browse/YARN-2800 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch > > > In the past, we have a MemoryNodeLabelStore, mostly for user to try this > feature without configuring where to store node labels on file system. It > seems convenient for user to try this, but actually it causes some bad > experiecne. User may add/remove labels, and edit capacity-scheduler.xml. > After RM restart, labels will gone, (we store it in mem). And RM cannot start > if we have some queue uses labels, and the labels don't exist in cluster. > As what we discussed, we should have an explicitly way to let user specify if > he/she wants this feature or not. 
If node label is disabled, all operations > trying to modify/use node labels will throw exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2800) Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature
[ https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2800: - Description: In the past, we have a MemoryNodeLabelStore, mostly for user to try this feature without configuring where to store node labels on file system. It seems convenient for user to try this, but actually it causes some bad use experience. User may add/remove labels, and edit capacity-scheduler.xml. After RM restart, labels will gone, (we store it in mem). And RM cannot get started if we have some queue uses labels, and the labels don't exist in cluster. As what we discussed, we should have an explicitly way to let user specify if he/she wants this feature or not. If node label is disabled, any operations trying to modify/use node labels will throw exception. was: In the past, we have a MemoryNodeLabelStore, mostly for user to try this feature without configuring where to store node labels on file system. It seems convenient for user to try this, but actually it causes some bad experiecne. User may add/remove labels, and edit capacity-scheduler.xml. After RM restart, labels will gone, (we store it in mem). And RM cannot start if we have some queue uses labels, and the labels don't exist in cluster. As what we discussed, we should have an explicitly way to let user specify if he/she wants this feature or not. If node label is disabled, all operations trying to modify/use node labels will throw exception. > Remove MemoryNodeLabelsStore and add a way to enable/disable node labels > feature > > > Key: YARN-2800 > URL: https://issues.apache.org/jira/browse/YARN-2800 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch > > > In the past, we have a MemoryNodeLabelStore, mostly for user to try this > feature without configuring where to store node labels on file system. 
It > seems convenient for user to try this, but actually it causes some bad use > experience. User may add/remove labels, and edit capacity-scheduler.xml. > After RM restart, labels will gone, (we store it in mem). And RM cannot get > started if we have some queue uses labels, and the labels don't exist in > cluster. > As what we discussed, we should have an explicitly way to let user specify if > he/she wants this feature or not. If node label is disabled, any operations > trying to modify/use node labels will throw exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2800) Remove MemoryNodeLabelsStore and add a way to disable node labels feature
[ https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2800: - Summary: Remove MemoryNodeLabelsStore and add a way to disable node labels feature (was: Remove MemoryNodeLabelsStore and add a way to disable ) > Remove MemoryNodeLabelsStore and add a way to disable node labels feature > - > > Key: YARN-2800 > URL: https://issues.apache.org/jira/browse/YARN-2800 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch > > > Even though we have documented this, but it will be better to explicitly > print a message in both RM/RMAdminCLI side to explicitly say that the node > label being added will be lost across RM restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2800) Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature
[ https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2800: - Summary: Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature (was: Remove MemoryNodeLabelsStore and add a way to disable node labels feature) > Remove MemoryNodeLabelsStore and add a way to enable/disable node labels > feature > > > Key: YARN-2800 > URL: https://issues.apache.org/jira/browse/YARN-2800 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch > > > Even though we have documented this, but it will be better to explicitly > print a message in both RM/RMAdminCLI side to explicitly say that the node > label being added will be lost across RM restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2800) Remove MemoryNodeLabelsStore and add a way to disable
[ https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2800: - Summary: Remove MemoryNodeLabelsStore and add a way to disable (was: Should print WARN log in both RM/RMAdminCLI side when MemoryRMNodeLabelsManager is enabled) > Remove MemoryNodeLabelsStore and add a way to disable > -- > > Key: YARN-2800 > URL: https://issues.apache.org/jira/browse/YARN-2800 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch > > > Even though we have documented this, but it will be better to explicitly > print a message in both RM/RMAdminCLI side to explicitly say that the node > label being added will be lost across RM restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2375) Allow enabling/disabling timeline server per framework
[ https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216865#comment-14216865 ] Hadoop QA commented on YARN-2375: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682195/YARN-2375.patch against trunk revision bcd402a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5870//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5870//console This message is automatically generated. > Allow enabling/disabling timeline server per framework > -- > > Key: YARN-2375 > URL: https://issues.apache.org/jira/browse/YARN-2375 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jonathan Eagles >Assignee: Mit Desai > Attachments: YARN-2375.patch, YARN-2375.patch > > > This JIRA is to remove the ats enabled flag check within the > TimelineClientImpl. Example where this fails is below. 
> While running secure timeline server with ats flag set to disabled on > resource manager, Timeline delegation token renewer throws an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2802) add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue.
[ https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216853#comment-14216853 ] Hadoop QA commented on YARN-2802: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682228/YARN-2802.004.patch against trunk revision bcd402a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5869//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5869//console This message is automatically generated. > add AM container launch and register delay metrics in QueueMetrics to help > diagnose performance issue. 
> -- > > Key: YARN-2802 > URL: https://issues.apache.org/jira/browse/YARN-2802 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2802.000.patch, YARN-2802.001.patch, > YARN-2802.002.patch, YARN-2802.003.patch, YARN-2802.004.patch > > > add AM container launch and register delay metrics in QueueMetrics to help > diagnose performance issue. > Added two metrics in QueueMetrics: > aMLaunchDelay: the time spent from sending event AMLauncherEventType.LAUNCH > to receiving event RMAppAttemptEventType.LAUNCHED in RMAppAttemptImpl. > aMRegisterDelay: the time waiting from receiving event > RMAppAttemptEventType.LAUNCHED to receiving event > RMAppAttemptEventType.REGISTERED(ApplicationMasterService#registerApplicationMaster) > in RMAppAttemptImpl. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2745) YARN new pluggable scheduler which does multi-resource packing
[ https://issues.apache.org/jira/browse/YARN-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216828#comment-14216828 ] Wangda Tan commented on YARN-2745: -- [~srikanthkandula], Exactly, but I think YARN-314 permits one priority to map to multiple resources, while locality might be different. Thanks, > YARN new pluggable scheduler which does multi-resource packing > -- > > Key: YARN-2745 > URL: https://issues.apache.org/jira/browse/YARN-2745 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager, scheduler >Reporter: Robert Grandl > Attachments: sigcomm_14_tetris_talk.pptx, tetris_paper.pdf > > > In this umbrella JIRA we propose a new pluggable scheduler, which accounts > for all resources used by a task (CPU, memory, disk, network) and is able > to achieve three competing objectives: fairness, improved cluster utilization, > and reduced average job completion time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2865) Application recovery continuously fails with "Application with id already present. Cannot duplicate"
[ https://issues.apache.org/jira/browse/YARN-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216806#comment-14216806 ] Karthik Kambatla commented on YARN-2865: Patch looks mostly good. Minor comments - we should reduce the visibility of the class and its methods to package-private, mark it @Private @Unstable, and add comments that this class is expected to be used only by RMContext and ResourceManager. I just want to guard against new code using this instead of RMContext; we might want this to be accessible in the future, but we should probably keep the changes small in this JIRA. > Application recovery continuously fails with "Application with id already > present. Cannot duplicate" > > > Key: YARN-2865 > URL: https://issues.apache.org/jira/browse/YARN-2865 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Rohith >Assignee: Rohith >Priority: Critical > Attachments: YARN-2865.patch, YARN-2865.patch > > > YARN-2588 handles exception thrown while transitioningToActive and reset > activeServices. But it misses out clearing RMcontext apps/nodes details and > ClusterMetrics and QueueMetrics. This causes application recovery to fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2165) Timelineserver should validate that yarn.timeline-service.ttl-ms is greater than zero
[ https://issues.apache.org/jira/browse/YARN-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216765#comment-14216765 ] Hadoop QA commented on YARN-2165: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682226/YARN-2165.2.patch against trunk revision bcd402a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice: org.apache.hadoop.yarn.server.timeline.security.TestTimelineAuthenticationFilter {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5868//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5868//console This message is automatically generated. 
> Timelineserver should validate that yarn.timeline-service.ttl-ms is greater > than zero > - > > Key: YARN-2165 > URL: https://issues.apache.org/jira/browse/YARN-2165 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Karam Singh >Assignee: Vasanth kumar RJ > Attachments: YARN-2165.1.patch, YARN-2165.2.patch, YARN-2165.patch > > > Timelineserver should validate that yarn.timeline-service.ttl-ms is greater > than zero > Currently if set yarn.timeline-service.ttl-ms=0 > Or yarn.timeline-service.ttl-ms=-86400 > Timeline server start successfully with complaining > {code} > 2014-06-15 14:52:16,562 INFO timeline.LeveldbTimelineStore > (LeveldbTimelineStore.java:(247)) - Starting deletion thread with ttl > -60480 and cycle interval 30 > {code} > At starting timelinserver should that yarn.timeline-service-ttl-ms > 0 > otherwise specially for -ive value discard oldvalues timestamp will be set > future value. Which may lead to inconsistancy in behavior > {code} > public void run() { > while (true) { > long timestamp = System.currentTimeMillis() - ttl; > try { > discardOldEntities(timestamp); > Thread.sleep(ttlInterval); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2802) add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue.
[ https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216741#comment-14216741 ] zhihai xu commented on YARN-2802: - Hi [~adhoot], thanks for the review. Good finding, I uploaded a new patch YARN-2802.004.patch, which addressed your comments. thanks zhihai > add AM container launch and register delay metrics in QueueMetrics to help > diagnose performance issue. > -- > > Key: YARN-2802 > URL: https://issues.apache.org/jira/browse/YARN-2802 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2802.000.patch, YARN-2802.001.patch, > YARN-2802.002.patch, YARN-2802.003.patch, YARN-2802.004.patch > > > add AM container launch and register delay metrics in QueueMetrics to help > diagnose performance issue. > Added two metrics in QueueMetrics: > aMLaunchDelay: the time spent from sending event AMLauncherEventType.LAUNCH > to receiving event RMAppAttemptEventType.LAUNCHED in RMAppAttemptImpl. > aMRegisterDelay: the time waiting from receiving event > RMAppAttemptEventType.LAUNCHED to receiving event > RMAppAttemptEventType.REGISTERED(ApplicationMasterService#registerApplicationMaster) > in RMAppAttemptImpl. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2802) add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue.
[ https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2802: Attachment: YARN-2802.004.patch > add AM container launch and register delay metrics in QueueMetrics to help > diagnose performance issue. > -- > > Key: YARN-2802 > URL: https://issues.apache.org/jira/browse/YARN-2802 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2802.000.patch, YARN-2802.001.patch, > YARN-2802.002.patch, YARN-2802.003.patch, YARN-2802.004.patch > > > add AM container launch and register delay metrics in QueueMetrics to help > diagnose performance issue. > Added two metrics in QueueMetrics: > aMLaunchDelay: the time spent from sending event AMLauncherEventType.LAUNCH > to receiving event RMAppAttemptEventType.LAUNCHED in RMAppAttemptImpl. > aMRegisterDelay: the time waiting from receiving event > RMAppAttemptEventType.LAUNCHED to receiving event > RMAppAttemptEventType.REGISTERED(ApplicationMasterService#registerApplicationMaster) > in RMAppAttemptImpl. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2165) Timelineserver should validate that yarn.timeline-service.ttl-ms is greater than zero
[ https://issues.apache.org/jira/browse/YARN-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasanth kumar RJ updated YARN-2165: --- Attachment: YARN-2165.2.patch [~airbots] Implemented your suggestion and attached a patch > Timelineserver should validate that yarn.timeline-service.ttl-ms is greater > than zero > - > > Key: YARN-2165 > URL: https://issues.apache.org/jira/browse/YARN-2165 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Karam Singh >Assignee: Vasanth kumar RJ > Attachments: YARN-2165.1.patch, YARN-2165.2.patch, YARN-2165.patch > > > Timelineserver should validate that yarn.timeline-service.ttl-ms is greater > than zero > Currently if set yarn.timeline-service.ttl-ms=0 > Or yarn.timeline-service.ttl-ms=-86400 > the Timeline server starts successfully, merely logging > {code} > 2014-06-15 14:52:16,562 INFO timeline.LeveldbTimelineStore > (LeveldbTimelineStore.java:(247)) - Starting deletion thread with ttl > -60480 and cycle interval 30 > {code} > At startup the timeline server should validate that yarn.timeline-service.ttl-ms > 0; > otherwise, especially for a negative value, the discard-old-entities timestamp will be set to > a future value. This may lead to inconsistent behavior > {code} > public void run() { > while (true) { > long timestamp = System.currentTimeMillis() - ttl; > try { > discardOldEntities(timestamp); > Thread.sleep(ttlInterval); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2738) Add FairReservationSystem for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216693#comment-14216693 ] Subru Krishnan commented on YARN-2738: -- [~adhoot], [~kasha] the approach you suggested (use global defaults and just have the ability to mark queues as usable by the reservation system) sounds good to me. > Add FairReservationSystem for FairScheduler > --- > > Key: YARN-2738 > URL: https://issues.apache.org/jira/browse/YARN-2738 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-2738.001.patch, YARN-2738.002.patch > > > Need to create a FairReservationSystem that will implement ReservationSystem > for FairScheduler -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216674#comment-14216674 ] Wangda Tan commented on YARN-1963: -- Thanks [~sunilg], [~vinodkv] for your great effort on this! I've just read through the design doc, some comments: 1) yarn.app.priority How will this be implemented? Does this mean any YARN application can specify a priority when submitting the app using the yarn CLI, without changing a line of code? I think if this can be done, we should extend this to other YARN parameters like queue, node-label-expression, etc. 2) Specify only highest priority for queue and user I found there are properties like {{yarn.scheduler.root..priority_label=high,low}} and {{yarn.scheduler.capacity.root...acl=user1,user2}}. I would prefer to specify only the highest priority for queue and user. For example, it doesn't make sense to me if priority = \{high,mid,low\}, and a queue can access \{high,low\} only. Is there any benefit to specifying individual priorities instead of just the highest priority? 3) User limit and priority I think we shouldn't consider user limit within priority level, because priority is not a specific kind of resource. Compared to node labels, you cannot say user-X of queue-A used 8G of highest-priority resource, but you can say user-X of queue-A used 8G of resource on nodes with label=GPU. There's no difference for a 2G resource allocated at highest/lowest priority. If we want to implement this, bq. it will not be fair to schedule resources in a uniform manner for all application in a queue with respect to user limits. I suggest adding priority-aware preemption within the queue. Building on YARN-2069, we can consider user-limit and priority together: while enforcing user-limit, we always preempt from lower-priority applications. Any thoughts?
Thanks, Wangda > Support priorities across applications within the same queue > - > > Key: YARN-1963 > URL: https://issues.apache.org/jira/browse/YARN-1963 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, resourcemanager >Reporter: Arun C Murthy >Assignee: Sunil G > Attachments: YARN Application Priorities Design.pdf > > > It will be very useful to support priorities among applications within the > same queue, particularly in production scenarios. It allows for finer-grained > controls without having to force admins to create a multitude of queues, plus > allows existing applications to continue using existing queues which are > usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2874) Dead lock in "DelegationTokenRenewer" which blocks RM to execute any further apps
[ https://issues.apache.org/jira/browse/YARN-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthik Kambatla updated YARN-2874:
-----------------------------------
    Target Version/s: 2.7.0
   Affects Version/s: (was: 2.4.1)
                      (was: 2.5.0)

> Dead lock in "DelegationTokenRenewer" which blocks RM to execute any further
> apps
> ----------------------------------------------------------------------------
>
>                 Key: YARN-2874
>                 URL: https://issues.apache.org/jira/browse/YARN-2874
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.5.1
>            Reporter: Naganarasimha G R
>            Assignee: Naganarasimha G R
>            Priority: Blocker
>         Attachments: YARN-2874.20141118-1.patch, YARN-2874.20141118-2.patch
>
> When token renewal fails and the application finishes this dead lock can occur
> Jstack dump:
> {quote}
> Found one Java-level deadlock:
> =============================
> "DelegationTokenRenewer #181865":
>   waiting to lock monitor 0x00900918 (object 0xc18a9998, a java.util.Collections$SynchronizedSet),
>   which is held by "DelayedTokenCanceller"
> "DelayedTokenCanceller":
>   waiting to lock monitor 0x04141718 (object 0xc7eae720, a org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask),
>   which is held by "Timer-4"
> "Timer-4":
>   waiting to lock monitor 0x00900918 (object 0xc18a9998, a java.util.Collections$SynchronizedSet),
>   which is held by "DelayedTokenCanceller"
>
> Java stack information for the threads listed above:
> ===================================================
> "DelegationTokenRenewer #181865":
>   at java.util.Collections$SynchronizedCollection.add(Collections.java:1636)
>   - waiting to lock <0xc18a9998> (a java.util.Collections$SynchronizedSet)
>   at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.addTokenToList(DelegationTokenRenewer.java:322)
>   at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:398)
>   at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$500(DelegationTokenRenewer.java:70)
>   at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:657)
>   at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:638)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> "DelayedTokenCanceller":
>   at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask.cancel(DelegationTokenRenewer.java:443)
>   - waiting to lock <0xc7eae720> (a org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask)
>   at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.removeApplicationFromRenewal(DelegationTokenRenewer.java:558)
>   - locked <0xc18a9998> (a java.util.Collections$SynchronizedSet)
>   at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$300(DelegationTokenRenewer.java:70)
>   at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelayedTokenRemovalRunnable.run(DelegationTokenRenewer.java:599)
>   at java.lang.Thread.run(Thread.java:745)
> "Timer-4":
>   at java.util.Collections$SynchronizedCollection.remove(Collections.java:1639)
>   - waiting to lock <0xc18a9998> (a java.util.Collections$SynchronizedSet)
>   at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.removeFailedDelegationToken(DelegationTokenRenewer.java:503)
>   at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$100(DelegationTokenRenewer.java:70)
>   at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask.run(DelegationTokenRenewer.java:437)
>   - locked <0xc7eae720> (a org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask)
>   at java.util.TimerThread.mainLoop(Timer.java:555)
>   at java.util.TimerThread.run(Timer.java:505)
>
> Found 1 deadlock.
> {quote}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
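The cycle in the dump above is a classic lock-ordering deadlock: the canceller holds the synchronized token set and waits for the RenewalTimerTask monitor, while the timer thread holds the task monitor and waits for the set. One common fix direction, sketched below with a hypothetical `CancellableTask` (not necessarily what the attached patches do), is to make cancellation a lock-free flag so that `cancel()` never needs the task's monitor:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: cancellation via an atomic flag removes the task
// monitor from the lock graph, so the synchronized set and the task can
// no longer be acquired in opposite orders by two threads.
class CancellableTask {
    private final AtomicBoolean cancelled = new AtomicBoolean(false);

    // Safe to call while holding the token-set lock: it blocks on nothing.
    boolean cancel() {
        return cancelled.compareAndSet(false, true);
    }

    // The timer thread checks the flag instead of synchronizing on `this`.
    void runOnce(Runnable renewal) {
        if (!cancelled.get()) {
            renewal.run();
        }
    }
}
```

With this shape, `removeApplicationFromRenewal` can keep holding the set lock while cancelling tasks, because cancellation no longer contends with a running `RenewalTimerTask`.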
[jira] [Commented] (YARN-2802) add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue.
[ https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216635#comment-14216635 ]

Anubhav Dhoot commented on YARN-2802:
-------------------------------------
This metric probably will not catch issues with queue configuration, so cluster-wide might be fine. But in general, adding to QueueMetrics should give both queue-specific metrics and cluster-wide metrics via the root queue's metrics; it just gives us more granular data. For this patch:
1. Can we please use separate variables for the two uses of launchAMStartTime? In AMLaunchedTransition we should use a new variable at {{appAttempt.launchAMStartTime = System.currentTimeMillis();}}
2. In TestQueueMetrics we are adding to the metrics but not checking them. It would be good to check the values back if possible.

> add AM container launch and register delay metrics in QueueMetrics to help
> diagnose performance issue.
> --------------------------------------------------------------------------
>
>                 Key: YARN-2802
>                 URL: https://issues.apache.org/jira/browse/YARN-2802
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>    Affects Versions: 2.5.0
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>         Attachments: YARN-2802.000.patch, YARN-2802.001.patch, YARN-2802.002.patch, YARN-2802.003.patch
>
> add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue.
> Added two metrics in QueueMetrics:
> aMLaunchDelay: the time spent from sending event AMLauncherEventType.LAUNCH to receiving event RMAppAttemptEventType.LAUNCHED in RMAppAttemptImpl.
> aMRegisterDelay: the time waiting from receiving event RMAppAttemptEventType.LAUNCHED to receiving event RMAppAttemptEventType.REGISTERED (ApplicationMasterService#registerApplicationMaster) in RMAppAttemptImpl.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
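Both delays described above reduce to differences of three timestamps recorded across the attempt lifecycle. A minimal sketch, with illustrative field names echoing the metric names rather than the patch's actual variables:

```java
// Illustrative timestamps for one AM attempt, all in epoch milliseconds.
// These are assumed names for the sake of the sketch, not RMAppAttemptImpl fields.
class AmAttemptTimes {
    long launchRequestedMs; // AMLauncherEventType.LAUNCH sent
    long launchedMs;        // RMAppAttemptEventType.LAUNCHED received
    long registeredMs;      // RMAppAttemptEventType.REGISTERED received

    // Time the NM took to launch the AM container.
    long aMLaunchDelay() {
        return launchedMs - launchRequestedMs;
    }

    // Time the AM took to register back with the RM after launch.
    long aMRegisterDelay() {
        return registeredMs - launchedMs;
    }
}
```

Keeping the two start timestamps separate, as the review comment asks, is what makes the two delays independently meaningful.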
[jira] [Updated] (YARN-2874) Dead lock in "DelegationTokenRenewer" which blocks RM to execute any further apps
[ https://issues.apache.org/jira/browse/YARN-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthik Kambatla updated YARN-2874:
-----------------------------------
    Priority: Blocker (was: Critical)

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2351) YARN CLI should provide a command to list the configurations in use
[ https://issues.apache.org/jira/browse/YARN-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216556#comment-14216556 ] Allen Wittenauer commented on YARN-2351: We really just need to move getconf to common. > YARN CLI should provide a command to list the configurations in use > --- > > Key: YARN-2351 > URL: https://issues.apache.org/jira/browse/YARN-2351 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0, 2.6.0 >Reporter: Zhijie Shen >Assignee: Varun Saxena > > To more easily understand the expected behavior of a yarn component, it is > good have the command line to be able to print the configurations in use for > RM, NM and timeline server daemons, as what we can do now via the web > interfaces: > {code} > http://:/conf > {code} > The command line could be something like: > {code} > yarn conf resourcemanager|nodemanager|timelineserver [host] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2875) Bump SLF4J to 1.7.7 from 1.7.5
Tim Robertson created YARN-2875: --- Summary: Bump SLF4J to 1.7.7 from 1.7.5 Key: YARN-2875 URL: https://issues.apache.org/jira/browse/YARN-2875 Project: Hadoop YARN Issue Type: Bug Reporter: Tim Robertson Priority: Minor hadoop-yarn-common [uses log4j directly|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/pom.xml#L167] and when trying to redirect that through an SLF4J bridge version 1.7.5 has issues, due to use of methods missing in log4j-over-slf4j version 1.7.5. This is documented on the [1.7.6 release notes|http://www.slf4j.org/news.html] but 1.7.7 should be suitable. This is applicable to all the projects using Hadoop motherpom, but Yarn appears to be bringing Log4J in, rather than coding to the SLF4J API. The issue shows in the logs as follows in Yarn MR apps, which is painful to diagnose. {code} WARN [2014-11-18 09:58:06,390+0100] [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Caught exception in callback postStart java.lang.reflect.InvocationTargetException: null at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.7.0_71] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[na:1.7.0_71] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_71] at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_71] at org.apache.hadoop.metrics2.impl.MetricsSystemImpl$3.invoke(MetricsSystemImpl.java:290) ~[job.jar:0.22-SNAPSHOT] at com.sun.proxy.$Proxy2.postStart(Unknown Source) [na:na] at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.start(MetricsSystemImpl.java:185) [job.jar:0.22-SNAPSHOT] at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.init(MetricsSystemImpl.java:157) [job.jar:0.22-SNAPSHOT] at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.init(DefaultMetricsSystem.java:54) [job.jar:0.22-SNAPSHOT] at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.initialize(DefaultMetricsSystem.java:50) 
[job.jar:0.22-SNAPSHOT] at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1036) [job.jar:0.22-SNAPSHOT] at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) [job.jar:0.22-SNAPSHOT] at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1478) [job.jar:0.22-SNAPSHOT] at java.security.AccessController.doPrivileged(Native Method) [na:1.7.0_71] at javax.security.auth.Subject.doAs(Subject.java:415) [na:1.7.0_71] at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) [job.jar:0.22-SNAPSHOT] at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1474) [job.jar:0.22-SNAPSHOT] at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1407) [job.jar:0.22-SNAPSHOT] Caused by: java.lang.IncompatibleClassChangeError: Implementing class at java.lang.ClassLoader.defineClass1(Native Method) ~[na:1.7.0_71] at java.lang.ClassLoader.defineClass(ClassLoader.java:800) ~[na:1.7.0_71] at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) ~[na:1.7.0_71] at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) ~[na:1.7.0_71] at java.net.URLClassLoader.access$100(URLClassLoader.java:71) ~[na:1.7.0_71] at java.net.URLClassLoader$1.run(URLClassLoader.java:361) ~[na:1.7.0_71] at java.net.URLClassLoader$1.run(URLClassLoader.java:355) ~[na:1.7.0_71] at java.security.AccessController.doPrivileged(Native Method) [na:1.7.0_71] at java.net.URLClassLoader.findClass(URLClassLoader.java:354) ~[na:1.7.0_71] at java.lang.ClassLoader.loadClass(ClassLoader.java:425) ~[na:1.7.0_71] at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) ~[na:1.7.0_71] at java.lang.ClassLoader.loadClass(ClassLoader.java:358) ~[na:1.7.0_71] at org.apache.hadoop.metrics2.source.JvmMetrics.getEventCounters(JvmMetrics.java:183) ~[job.jar:0.22-SNAPSHOT] at org.apache.hadoop.metrics2.source.JvmMetrics.getMetrics(JvmMetrics.java:100) 
~[job.jar:0.22-SNAPSHOT] at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:195) ~[job.jar:0.22-SNAPSHOT] at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:172) ~[job.jar:0.22-SNAPSHOT] at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:151) ~[job.jar:0.22-SNAPSHOT] at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanClassName(DefaultMBeanServerInterceptor.java:333) ~[na:1.7.0_71]
[jira] [Assigned] (YARN-2351) YARN CLI should provide a command to list the configurations in use
[ https://issues.apache.org/jira/browse/YARN-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena reassigned YARN-2351: -- Assignee: Varun Saxena > YARN CLI should provide a command to list the configurations in use > --- > > Key: YARN-2351 > URL: https://issues.apache.org/jira/browse/YARN-2351 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0, 2.6.0 >Reporter: Zhijie Shen >Assignee: Varun Saxena > > To more easily understand the expected behavior of a yarn component, it is > good have the command line to be able to print the configurations in use for > RM, NM and timeline server daemons, as what we can do now via the web > interfaces: > {code} > http://:/conf > {code} > The command line could be something like: > {code} > yarn conf resourcemanager|nodemanager|timelineserver [host] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2351) YARN CLI should provide a command to list the configurations in use
[ https://issues.apache.org/jira/browse/YARN-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216465#comment-14216465 ]

Rohith commented on YARN-2351:
------------------------------
HDFS has a command, {{hdfs getconf -namenodes}}, to determine the cluster's NameNodes. Similarly, it would be good if YARN supported commands to get cluster details, e.g. {{yarn getConf -resourcemanager}} and others.

> YARN CLI should provide a command to list the configurations in use
> -------------------------------------------------------------------
>
>                 Key: YARN-2351
>                 URL: https://issues.apache.org/jira/browse/YARN-2351
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.0.0, 2.6.0
>            Reporter: Zhijie Shen
>
> To more easily understand the expected behavior of a yarn component, it is good to have the command line be able to print the configurations in use for the RM, NM and timeline server daemons, as we can do now via the web interfaces:
> {code}
> http://<host>:<port>/conf
> {code}
> The command line could be something like:
> {code}
> yarn conf resourcemanager|nodemanager|timelineserver [host]
> {code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
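The suggestions in this thread all amount to mapping a daemon name and address onto the daemon's existing {{/conf}} servlet. A trivial sketch of that mapping (a hypothetical helper, not an existing YARN API):

```java
// Hypothetical helper: the proposed `yarn conf <daemon> [host]` command could
// resolve each daemon to its existing /conf web endpoint and fetch from there.
class ConfEndpoint {
    static String url(String host, int port) {
        return "http://" + host + ":" + port + "/conf";
    }
}
```

The CLI would then only need per-daemon default ports (e.g. the RM webapp port) plus an optional host override, matching the {{yarn conf resourcemanager|nodemanager|timelineserver [host]}} shape proposed in the description.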
[jira] [Commented] (YARN-2873) improve LevelDB error handling for missing files DBException to avoid NM start failure.
[ https://issues.apache.org/jira/browse/YARN-2873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216464#comment-14216464 ] zhihai xu commented on YARN-2873: - Hi [~jlowe], I agree with you. The root cause is the Sorted Tables(*.sst) and MANIFEST file being deleted. If these files are stored away from tmp directory, it may solve the problem. thanks zhihai > improve LevelDB error handling for missing files DBException to avoid NM > start failure. > --- > > Key: YARN-2873 > URL: https://issues.apache.org/jira/browse/YARN-2873 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2873.000.patch, YARN-2873.001.patch > > > improve LevelDB error handling for missing files DBException to avoid NM > start failure. > We saw the following three level DB exceptions, all these exceptions cause NM > start failure. > DBException 1 in ShuffleHandler > {code} > INFO org.apache.hadoop.service.AbstractService: Service > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl > failed in state STARTED; cause: > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 1 > missing files; e.g.: > /tmp/hadoop-yarn/yarn-nm-recovery/nm-aux-services/mapreduce_shuffle/mapreduce_shuffle_state/05.sst > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 1 > missing files; e.g.: > /tmp/hadoop-yarn/yarn-nm-recovery/nm-aux-services/mapreduce_shuffle/mapreduce_shuffle_state/05.sst > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:204) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceStart(AuxServices.java:159) > at > 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceStart(ContainerManagerImpl.java:441) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:261) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:446) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:492) > Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: > Corruption: 1 missing files; e.g.: > /tmp/hadoop-yarn/yarn-nm-recovery/nm-aux-services/mapreduce_shuffle/mapreduce_shuffle_state/05.sst > at > org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200) > at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218) > at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168) > at > org.apache.hadoop.mapred.ShuffleHandler.startStore(ShuffleHandler.java:475) > at > org.apache.hadoop.mapred.ShuffleHandler.recoverState(ShuffleHandler.java:443) > at > org.apache.hadoop.mapred.ShuffleHandler.serviceStart(ShuffleHandler.java:379) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > ... 
10 more > {code} > DBException 2 in NMLeveldbStateStoreService: > {code} > Error starting NodeManager > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 1 > missing files; e.g.: > /tmp/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/05.sst > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) > > at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:152) > > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:190) > > at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:445) > > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:492) > > Caused by: org.fusesource.level
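The improvement being discussed amounts to treating a corrupt or partially deleted recovery store as absent state rather than a fatal startup error. A hedged sketch of that fallback, with illustrative types (the real stores throw leveldbjni's NativeDB$DBException, and the actual patch may recover differently):

```java
// Illustrative recovery-store fallback: if opening the persisted state fails
// (e.g. leveldb reports missing *.sst or MANIFEST files), fall back to an
// empty store so the NM can still start, at the cost of lost recovery state.
// All names here are assumptions for the sketch, not YARN classes.
class StoreCorruptException extends RuntimeException {
    StoreCorruptException(String msg) { super(msg); }
}

interface StateStore {
    void open(); // may throw StoreCorruptException
}

class StoreOpener {
    static StateStore openWithFallback(StateStore persisted, StateStore empty) {
        try {
            persisted.open();   // normal path: reuse on-disk state
            return persisted;
        } catch (StoreCorruptException corruption) {
            empty.open();       // corrupt DB: start fresh instead of failing NM start
            return empty;
        }
    }
}
```

As the comment notes, moving the store out of /tmp addresses the root cause (file deletion); the fallback above only softens the failure mode when corruption still happens.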
[jira] [Resolved] (YARN-2863) ResourceManager will shutdown when job's queuename is empty
[ https://issues.apache.org/jira/browse/YARN-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K resolved YARN-2863. - Resolution: Invalid > ResourceManager will shutdown when job's queuename is empty > --- > > Key: YARN-2863 > URL: https://issues.apache.org/jira/browse/YARN-2863 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.2.0 >Reporter: yangping wu > Labels: hadoop > Fix For: 3.0.0 > > Original Estimate: 8h > Remaining Estimate: 8h > > When I submit a job to hadoop cluster, but don't specified a queuename as > follow > {code} > $HADOOP_HOME/bin/hadoop jar statistics.jar com.iteblog.Sts > -Dmapreduce.job.queuename= > {code} > and if *yarn.scheduler.fair.allow-undeclared-pools* is not overwrite by > user(default is true), then QueueManager will call createLeafQueue method to > create the queue, because mapreduce.job.queuename is empty and cann't find it > in QueueManager .But this will throw MetricsException > {code} > 2014-11-14 16:07:57,358 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type APP_ADDED to the scheduler > org.apache.hadoop.metrics2.MetricsException: Metrics source > QueueMetrics,q0=root already exists! 
> at > org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:126) > at > org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:107) > at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:217) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:94) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.(FSQueue.java:57) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.(FSLeafQueue.java:57) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.createLeafQueue(QueueManager.java:191) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:136) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:652) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:610) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1015) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440) > at java.lang.Thread.run(Thread.java:744) > 2014-11-14 16:07:57,359 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
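Although this issue was resolved as Invalid, the failure mode suggests a simple guard: normalize the requested queue name before the scheduler tries to create a leaf queue named "". A hypothetical sketch, not YARN's actual code path (FairScheduler's real fallback behavior depends on configuration such as yarn.scheduler.fair.allow-undeclared-pools):

```java
// Hypothetical guard: an empty or blank mapreduce.job.queuename falls back to
// a default queue name instead of reaching QueueManager.createLeafQueue("").
class QueueNames {
    static String normalize(String requested) {
        if (requested == null || requested.trim().isEmpty()) {
            return "default"; // assumed fallback queue name for this sketch
        }
        return requested.trim();
    }
}
```

Applied in {{FairScheduler.assignToQueue}}, such a check would keep a blank {{-Dmapreduce.job.queuename=}} from producing the duplicate "QueueMetrics,q0=root" registration seen in the trace above.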
[jira] [Commented] (YARN-2675) the containersKilled metrics is not updated when the container is killed during localization.
[ https://issues.apache.org/jira/browse/YARN-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216452#comment-14216452 ] zhihai xu commented on YARN-2675: - Hi [~vinodkv], Are you ok with the new patch? The new patch passed Hadoop QA test. thanks zhihai > the containersKilled metrics is not updated when the container is killed > during localization. > - > > Key: YARN-2675 > URL: https://issues.apache.org/jira/browse/YARN-2675 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2675.000.patch, YARN-2675.001.patch, > YARN-2675.002.patch, YARN-2675.003.patch > > > The containersKilled metrics is not updated when the container is killed > during localization. We should add KILLING state in finished of > ContainerImpl.java to update killedContainer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2375) Allow enabling/disabling timeline server per framework
[ https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-2375: Attachment: YARN-2375.patch Attaching the patch. [~jeagles], [~zjshen] can you see if this approach is good? > Allow enabling/disabling timeline server per framework > -- > > Key: YARN-2375 > URL: https://issues.apache.org/jira/browse/YARN-2375 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jonathan Eagles >Assignee: Mit Desai > Attachments: YARN-2375.patch, YARN-2375.patch > > > This JIRA is to remove the ats enabled flag check within the > TimelineClientImpl. Example where this fails is below. > While running secure timeline server with ats flag set to disabled on > resource manager, Timeline delegation token renewer throws an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2375) Allow enabling/disabling timeline server per framework
[ https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216422#comment-14216422 ] Mit Desai commented on YARN-2375: - Some changes missed. I will upload another patch > Allow enabling/disabling timeline server per framework > -- > > Key: YARN-2375 > URL: https://issues.apache.org/jira/browse/YARN-2375 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jonathan Eagles >Assignee: Mit Desai > Attachments: YARN-2375.patch > > > This JIRA is to remove the ats enabled flag check within the > TimelineClientImpl. Example where this fails is below. > While running secure timeline server with ats flag set to disabled on > resource manager, Timeline delegation token renewer throws an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2375) Allow enabling/disabling timeline server per framework
[ https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-2375: Attachment: YARN-2375.patch Attaching the patch > Allow enabling/disabling timeline server per framework > -- > > Key: YARN-2375 > URL: https://issues.apache.org/jira/browse/YARN-2375 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jonathan Eagles >Assignee: Mit Desai > Attachments: YARN-2375.patch > > > This JIRA is to remove the ats enabled flag check within the > TimelineClientImpl. Example where this fails is below. > While running secure timeline server with ats flag set to disabled on > resource manager, Timeline delegation token renewer throws an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2165) Timelineserver should validate that yarn.timeline-service.ttl-ms is greater than zero
[ https://issues.apache.org/jira/browse/YARN-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216413#comment-14216413 ] Chen He commented on YARN-2165: --- Agree with [~zhijie shen] to combine parameter checking together. > Timelineserver should validate that yarn.timeline-service.ttl-ms is greater > than zero > - > > Key: YARN-2165 > URL: https://issues.apache.org/jira/browse/YARN-2165 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Karam Singh >Assignee: Vasanth kumar RJ > Attachments: YARN-2165.1.patch, YARN-2165.patch > > > Timelineserver should validate that yarn.timeline-service.ttl-ms is greater > than zero > Currently if set yarn.timeline-service.ttl-ms=0 > Or yarn.timeline-service.ttl-ms=-86400 > Timeline server start successfully with complaining > {code} > 2014-06-15 14:52:16,562 INFO timeline.LeveldbTimelineStore > (LeveldbTimelineStore.java:(247)) - Starting deletion thread with ttl > -60480 and cycle interval 30 > {code} > At starting timelinserver should that yarn.timeline-service-ttl-ms > 0 > otherwise specially for -ive value discard oldvalues timestamp will be set > future value. Which may lead to inconsistancy in behavior > {code} > public void run() { > while (true) { > long timestamp = System.currentTimeMillis() - ttl; > try { > discardOldEntities(timestamp); > Thread.sleep(ttlInterval); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2165) Timelineserver should validate that yarn.timeline-service.ttl-ms is greater than zero
[ https://issues.apache.org/jira/browse/YARN-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216407#comment-14216407 ] Chen He commented on YARN-2165: --- Hi [~vasanthkumar], thank you for the patch. A small nit in the unit test code. Once the unit test gets expected exception, it will be great to verify which parameter produces this exception. eg. verify the message in exception to check which parameter causes this exception. > Timelineserver should validate that yarn.timeline-service.ttl-ms is greater > than zero > - > > Key: YARN-2165 > URL: https://issues.apache.org/jira/browse/YARN-2165 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Karam Singh >Assignee: Vasanth kumar RJ > Attachments: YARN-2165.1.patch, YARN-2165.patch > > > Timelineserver should validate that yarn.timeline-service.ttl-ms is greater > than zero > Currently if set yarn.timeline-service.ttl-ms=0 > Or yarn.timeline-service.ttl-ms=-86400 > Timeline server start successfully with complaining > {code} > 2014-06-15 14:52:16,562 INFO timeline.LeveldbTimelineStore > (LeveldbTimelineStore.java:(247)) - Starting deletion thread with ttl > -60480 and cycle interval 30 > {code} > At starting timelinserver should that yarn.timeline-service-ttl-ms > 0 > otherwise specially for -ive value discard oldvalues timestamp will be set > future value. Which may lead to inconsistancy in behavior > {code} > public void run() { > while (true) { > long timestamp = System.currentTimeMillis() - ttl; > try { > discardOldEntities(timestamp); > Thread.sleep(ttlInterval); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
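The validation being asked for in this thread is a fail-fast check when the timeline store starts, before the deletion thread is created. A hedged sketch (hypothetical class, not the attached patch, which may also combine checks for related parameters as suggested above):

```java
// Illustrative fail-fast validation for yarn.timeline-service.ttl-ms: reject
// zero or negative TTLs at service start instead of launching a deletion
// thread whose cutoff timestamp lies in the future.
class TtlValidator {
    static long checkTtl(long ttlMs) {
        if (ttlMs <= 0) {
            throw new IllegalArgumentException(
                "yarn.timeline-service.ttl-ms must be greater than 0, got " + ttlMs);
        }
        return ttlMs;
    }
}
```

Including the offending value and property name in the exception message addresses the review nit above: a test can then verify which parameter caused the failure.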
[jira] [Updated] (YARN-2874) Dead lock in "DelegationTokenRenewer" which blocks RM to execute any further apps
[ https://issues.apache.org/jira/browse/YARN-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-2874: Attachment: YARN-2874.20141118-2.patch Updated patch with fixes for review comment > Dead lock in "DelegationTokenRenewer" which blocks RM to execute any further > apps > - > > Key: YARN-2874 > URL: https://issues.apache.org/jira/browse/YARN-2874 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.0, 2.4.1, 2.5.1 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Critical > Attachments: YARN-2874.20141118-1.patch, YARN-2874.20141118-2.patch > > > When token renewal fails and the application finishes this dead lock can occur > Jstack dump : > {quote} > Found one Java-level deadlock: > = > "DelegationTokenRenewer #181865": > waiting to lock monitor 0x00900918 (object 0xc18a9998, a > java.util.Collections$SynchronizedSet), > which is held by "DelayedTokenCanceller" > "DelayedTokenCanceller": > waiting to lock monitor 0x04141718 (object 0xc7eae720, a > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask), > which is held by "Timer-4" > "Timer-4": > waiting to lock monitor 0x00900918 (object 0xc18a9998, a > java.util.Collections$SynchronizedSet), > which is held by "DelayedTokenCanceller" > > Java stack information for the threads listed above: > === > "DelegationTokenRenewer #181865": > at java.util.Collections$SynchronizedCollection.add(Collections.java:1636) > - waiting to lock <0xc18a9998> (a > java.util.Collections$SynchronizedSet) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.addTokenToList(DelegationTokenRenewer.java:322) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:398) > at > 
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$500(DelegationTokenRenewer.java:70) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:657) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:638) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > "DelayedTokenCanceller": > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask.cancel(DelegationTokenRenewer.java:443) > - waiting to lock <0xc7eae720> (a > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.removeApplicationFromRenewal(DelegationTokenRenewer.java:558) > - locked <0xc18a9998> (a java.util.Collections$SynchronizedSet) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$300(DelegationTokenRenewer.java:70) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelayedTokenRemovalRunnable.run(DelegationTokenRenewer.java:599) > at java.lang.Thread.run(Thread.java:745) > "Timer-4": > at java.util.Collections$SynchronizedCollection.remove(Collections.java:1639) > - waiting to lock <0xc18a9998> (a > java.util.Collections$SynchronizedSet) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.removeFailedDelegationToken(DelegationTokenRenewer.java:503) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$100(DelegationTokenRenewer.java:70) > at > 
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask.run(DelegationTokenRenewer.java:437) > - locked <0xc7eae720> (a > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask) > at java.util.TimerThread.mainLoop(Timer.java:555) > at java.util.TimerThread.run(Timer.java:505) > > Found 1 deadlock. > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
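The jstack above boils down to a classic lock-order inversion: the DelayedTokenCanceller path takes the synchronized token set first and then the RenewalTimerTask monitor, while the Timer path takes the same two locks in the reverse order. A minimal standalone model of that cycle (not YARN code; the class and lock names are stand-ins):

```java
// Minimal model of the lock cycle in the jstack above: one path locks the
// token set then the timer task, the other locks them in the reverse order.
// With two threads interleaving, each ends up waiting on the other's lock.
class DeadlockModel {
    final Object tokenSetLock = new Object();  // stands in for the Collections$SynchronizedSet
    final Object timerTaskLock = new Object(); // stands in for the RenewalTimerTask monitor
    int steps = 0;

    // DelayedTokenCanceller path: token-set lock first, then task monitor
    void removeApplicationFromRenewal() {
        synchronized (tokenSetLock) {
            synchronized (timerTaskLock) { steps++; } // cancel the timer task
        }
    }

    // Timer path: task monitor first, then token-set lock -- reversed order
    void renewalTimerTaskRun() {
        synchronized (timerTaskLock) {
            synchronized (tokenSetLock) { steps++; } // remove the failed token
        }
    }
}
```

Run single-threaded, both calls complete; the hang only appears when the two paths race, which is why the bug needed a failing token renewal to coincide with an application finishing.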
[jira] [Commented] (YARN-2874) Dead lock in "DelegationTokenRenewer" which blocks RM to execute any further apps
[ https://issues.apache.org/jira/browse/YARN-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216315#comment-14216315 ] Naganarasimha G R commented on YARN-2874: - Sorry, I was supposed to remove synchronized from both method signatures (RenewalTimerTask.run and RenewalTimerTask.cancel). Will reupload the patch... > Dead lock in "DelegationTokenRenewer" which blocks RM to execute any further > apps > - > > Key: YARN-2874 > URL: https://issues.apache.org/jira/browse/YARN-2874 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.0, 2.4.1, 2.5.1 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Critical > Attachments: YARN-2874.20141118-1.patch > > > When token renewal fails and the application finishes, this dead lock can occur; see the jstack dump quoted in the update above. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2874) Dead lock in "DelegationTokenRenewer" which blocks RM to execute any further apps
[ https://issues.apache.org/jira/browse/YARN-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216307#comment-14216307 ] Jason Lowe commented on YARN-2874: -- Thanks for the report and the patch, [~Naganarasimha]. However, I don't see how the provided patch resolves the deadlock: it only changes a boolean to an AtomicBoolean, which by itself won't resolve an existing deadlock scenario. With the introduction of a concurrent data structure like AtomicBoolean, I would expect some existing locks to be removed as a result. > Dead lock in "DelegationTokenRenewer" which blocks RM to execute any further > apps > - > > Key: YARN-2874 > URL: https://issues.apache.org/jira/browse/YARN-2874 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.0, 2.4.1, 2.5.1 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Critical > Attachments: YARN-2874.20141118-1.patch > > > When token renewal fails and the application finishes, this dead lock can occur; see the jstack dump quoted in the update above. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
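One way to act on that observation (a sketch of the direction being discussed, not the actual YARN-2874 patch): if the task's cancelled state lives in an AtomicBoolean instead of behind synchronized methods, cancel() no longer needs the task monitor, and the cycle with the token-set lock is broken.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch only: cancellation via an AtomicBoolean flag instead of
// synchronized run()/cancel(), so cancel() never blocks on the task monitor.
class CancellableRenewalTask {
    private final AtomicBoolean cancelled = new AtomicBoolean(false);

    public void run() {
        if (cancelled.get()) {
            return; // cancelled before the renewal fired; do nothing
        }
        // ... renew the token; on failure, update the shared token set ...
    }

    // Returns true only for the first caller; needs no monitor at all, so a
    // thread already holding the token-set lock can cancel without waiting.
    public boolean cancel() {
        return cancelled.compareAndSet(false, true);
    }
}
```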
[jira] [Commented] (YARN-2690) Make ReservationSystem and its dependent classes independent of Scheduler type
[ https://issues.apache.org/jira/browse/YARN-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216294#comment-14216294 ] Hudson commented on YARN-2690: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #9 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/9/]) YARN-2690. [YARN-2574] Make ReservationSystem and its dependent classes independent of Scheduler type. (Anubhav Dhoot via kasha) (kasha: rev 2fce6d61412843f0447f60cfe02326f769edae25) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractReservationSystem.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/SimpleCapacityReplanner.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/NoOverCommitPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/CapacityReservationSystem.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/Planner.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemTestUtil.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestGreedyReservationAgent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSchedulerConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/SharingPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/CapacityOverTimePolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestSimpleCapacityReplanner.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestNoOverCommitPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacityOverTimePolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacityReservationSystem.java > Make ReservationSystem and its dependent classes independent of Scheduler > type > > > Key: YARN-2690 > URL: https://issues.apache.org/jira/browse/YARN-2690 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Fix For: 2.7.0 > > Attachments: YARN-2690.001.patch, YARN-2690.002.patch, > YARN-2690.002.patch, YARN-2690.003.patch, YARN-2690.004.patch, > YARN-2690.004.patch > > > A lot of common 
reservation classes depend on CapacityScheduler and > specifically its configuration. This jira is to make them ready for other > Schedulers by abstracting out the configuration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
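The shape of the refactoring is visible from the file list: reservation components now program against a scheduler-agnostic ReservationSchedulerConfiguration, with each scheduler supplying its own implementation. A rough sketch of that pattern (class and method names here are illustrative, not the exact YARN API):

```java
// Illustrative sketch of the abstraction: reservation components depend on
// this scheduler-agnostic type, and each scheduler provides a subclass.
abstract class ReservationConfigSketch {
    abstract boolean isReservable(String queuePath);
    abstract long getReservationWindowMs(String queuePath);
}

// A capacity-scheduler-flavored implementation (values are placeholders)
class CapacityReservationConfigSketch extends ReservationConfigSketch {
    @Override boolean isReservable(String queuePath) { return true; }
    @Override long getReservationWindowMs(String queuePath) { return 24L * 60 * 60 * 1000; }
}
```

With this split, a FairScheduler-backed reservation system (YARN-2574) only needs another subclass rather than changes to the planners and policies themselves.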
[jira] [Commented] (YARN-2574) Add support for FairScheduler to the ReservationSystem
[ https://issues.apache.org/jira/browse/YARN-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216300#comment-14216300 ] Hudson commented on YARN-2574: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #9 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/9/]) YARN-2690. [YARN-2574] Make ReservationSystem and its dependent classes independent of Scheduler type. (Anubhav Dhoot via kasha) (kasha: rev 2fce6d61412843f0447f60cfe02326f769edae25) (changed-file list identical to the YARN-2690 notification above) > Add support for FairScheduler to the ReservationSystem > -- > > Key: YARN-2574 > URL: https://issues.apache.org/jira/browse/YARN-2574 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Subru Krishnan >Assignee: Anubhav Dhoot > > YARN-1051 introduces the ReservationSystem and the current implementation is > based on CapacityScheduler. This JIRA proposes adding support for > FairScheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2414) RM web UI: app page will crash if app is failed before any attempt has been created
[ https://issues.apache.org/jira/browse/YARN-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216293#comment-14216293 ] Hudson commented on YARN-2414: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #9 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/9/]) YARN-2414. RM web UI: app page will crash if app is failed before any attempt has been created. Contributed by Wangda Tan (jlowe: rev 81c9d17af84ed87b9ded7057cb726a3855ddd32d) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestAppPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/QueueACLsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/security/ApplicationACLsManager.java * hadoop-yarn-project/CHANGES.txt > RM web UI: app page will crash if app is failed before any attempt has been > created > --- > > Key: YARN-2414 > URL: https://issues.apache.org/jira/browse/YARN-2414 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Reporter: Zhijie Shen >Assignee: Wangda Tan > Fix For: 2.7.0 > > Attachments: YARN-2414.20141104-1.patch, YARN-2414.20141104-2.patch, > YARN-2414.patch > > > {code} > 2014-08-12 16:45:13,573 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error > handling URI: /cluster/app/application_1407887030038_0001 > java.lang.reflect.InvocationTargetException > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) > at > com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) > at > com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) > at > com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) > at > com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:84) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) > at > com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163) > at > com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58) > at > com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118) > at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:460) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > 
org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1191) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) > at > org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) > at > org.mortbay.j
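The crash pattern behind this bug is rendering code that assumes at least one application attempt exists. A hypothetical guard in the spirit of the fix (not the literal AppBlock change):

```java
// Hypothetical guard: an app that failed before scheduling has no attempts,
// so render a placeholder instead of dereferencing a null attempt.
class AttemptRenderSketch {
    static String render(String currentAttemptId) {
        if (currentAttemptId == null) {
            return "N/A"; // app failed before any attempt was created
        }
        return "Attempt: " + currentAttemptId;
    }
}
```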
[jira] [Commented] (YARN-2165) Timelineserver should validate that yarn.timeline-service.ttl-ms is greater than zero
[ https://issues.apache.org/jira/browse/YARN-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216289#comment-14216289 ] Hadoop QA commented on YARN-2165: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682170/YARN-2165.1.patch against trunk revision 9dd5d67. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice: org.apache.hadoop.yarn.server.timeline.security.TestTimelineAuthenticationFilter {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5867//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5867//console This message is automatically generated. 
> Timelineserver should validate that yarn.timeline-service.ttl-ms is greater > than zero > - > > Key: YARN-2165 > URL: https://issues.apache.org/jira/browse/YARN-2165 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Karam Singh >Assignee: Vasanth kumar RJ > Attachments: YARN-2165.1.patch, YARN-2165.patch > > > Timelineserver should validate that yarn.timeline-service.ttl-ms is greater > than zero. > Currently, if you set yarn.timeline-service.ttl-ms=0 > or yarn.timeline-service.ttl-ms=-86400, > the timeline server starts successfully without complaining: > {code} > 2014-06-15 14:52:16,562 INFO timeline.LeveldbTimelineStore > (LeveldbTimelineStore.java:(247)) - Starting deletion thread with ttl > -60480 and cycle interval 30 > {code} > At startup, the timeline server should validate that yarn.timeline-service.ttl-ms > 0; > otherwise, especially for a negative value, the discard-old-entities cutoff timestamp will be set to a > future value, which may lead to inconsistent behavior: > {code} > public void run() { > while (true) { > long timestamp = System.currentTimeMillis() - ttl; > try { > discardOldEntities(timestamp); > Thread.sleep(ttlInterval); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
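The validation the reporter asks for is a fail-fast startup check. A sketch (hypothetical helper, not the actual LeveldbTimelineStore code):

```java
// Hypothetical startup validation: reject a non-positive TTL instead of
// starting a deletion thread whose cutoff timestamp lies in the future.
class TtlCheckSketch {
    static long validateTtlMs(long ttlMs) {
        if (ttlMs <= 0) {
            throw new IllegalArgumentException(
                "yarn.timeline-service.ttl-ms must be greater than zero, got " + ttlMs);
        }
        return ttlMs;
    }
}
```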
[jira] [Commented] (YARN-2414) RM web UI: app page will crash if app is failed before any attempt has been created
[ https://issues.apache.org/jira/browse/YARN-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216272#comment-14216272 ] Hudson commented on YARN-2414: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1961 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1961/]) YARN-2414. RM web UI: app page will crash if app is failed before any attempt has been created. Contributed by Wangda Tan (jlowe: rev 81c9d17af84ed87b9ded7057cb726a3855ddd32d) (changed-file list identical to the YARN-2414 notification above) > RM web UI: app page will crash if app is failed before any attempt has been > created > --- > > Key: YARN-2414 > URL: https://issues.apache.org/jira/browse/YARN-2414 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Reporter: Zhijie Shen >Assignee: Wangda Tan > Fix For: 2.7.0 > > Attachments: YARN-2414.20141104-1.patch, YARN-2414.20141104-2.patch, > YARN-2414.patch > > > (Dispatcher stack trace identical to the one quoted in the first YARN-2414 notification above.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2690) Make ReservationSystem and its dependent classes independent of Scheduler type
[ https://issues.apache.org/jira/browse/YARN-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216273#comment-14216273 ] Hudson commented on YARN-2690: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1961 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1961/]) YARN-2690. [YARN-2574] Make ReservationSystem and its dependent classes independent of Scheduler type. (Anubhav Dhoot via kasha) (kasha: rev 2fce6d61412843f0447f60cfe02326f769edae25) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/CapacityReservationSystem.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSchedulerConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestNoOverCommitPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/SimpleCapacityReplanner.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacityOverTimePolicy.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/Planner.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/CapacityOverTimePolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestGreedyReservationAgent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractReservationSystem.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/NoOverCommitPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestSimpleCapacityReplanner.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacityReservationSystem.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/SharingPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemTestUtil.java > Make ReservationSystem and its dependent classes independent of Scheduler > type > > > Key: YARN-2690 > URL: https://issues.apache.org/jira/browse/YARN-2690 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Fix For: 2.7.0 > > Attachments: YARN-2690.001.patch, YARN-2690.002.patch, > YARN-2690.002.patch, YARN-2690.003.patch, YARN-2690.004.patch, > YARN-2690.004.patch > > > A lot of common reservation classes 
depend on CapacityScheduler and > specifically its configuration. This jira is to make them ready for other > Schedulers by abstracting out the configuration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2574) Add support for FairScheduler to the ReservationSystem
[ https://issues.apache.org/jira/browse/YARN-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216279#comment-14216279 ] Hudson commented on YARN-2574: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1961 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1961/]) YARN-2690. [YARN-2574] Make ReservationSystem and its dependent classes independent of Scheduler type. (Anubhav Dhoot via kasha) (kasha: rev 2fce6d61412843f0447f60cfe02326f769edae25) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/CapacityReservationSystem.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSchedulerConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestNoOverCommitPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/SimpleCapacityReplanner.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacityOverTimePolicy.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/Planner.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/CapacityOverTimePolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestGreedyReservationAgent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractReservationSystem.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/NoOverCommitPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestSimpleCapacityReplanner.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacityReservationSystem.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/SharingPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemTestUtil.java > Add support for FairScheduler to the ReservationSystem > -- > > Key: YARN-2574 > URL: https://issues.apache.org/jira/browse/YARN-2574 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Subru Krishnan >Assignee: Anubhav Dhoot > > YARN-1051 introduces the ReservationSystem and the current implementation is > based on CapacityScheduler. This JIRA proposes adding support for > FairScheduler -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2874) Dead lock in "DelegationTokenRenewer" which blocks RM to execute any further apps
[ https://issues.apache.org/jira/browse/YARN-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-2874: Attachment: YARN-2874.20141118-1.patch > Dead lock in "DelegationTokenRenewer" which blocks RM to execute any further > apps > - > > Key: YARN-2874 > URL: https://issues.apache.org/jira/browse/YARN-2874 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.0, 2.4.1, 2.5.1 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Critical > Attachments: YARN-2874.20141118-1.patch > > > When token renewal fails and the application finishes this dead lock can occur > Jstack dump : > {quote} > Found one Java-level deadlock: > = > "DelegationTokenRenewer #181865": > waiting to lock monitor 0x00900918 (object 0xc18a9998, a > java.util.Collections$SynchronizedSet), > which is held by "DelayedTokenCanceller" > "DelayedTokenCanceller": > waiting to lock monitor 0x04141718 (object 0xc7eae720, a > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask), > which is held by "Timer-4" > "Timer-4": > waiting to lock monitor 0x00900918 (object 0xc18a9998, a > java.util.Collections$SynchronizedSet), > which is held by "DelayedTokenCanceller" > > Java stack information for the threads listed above: > === > "DelegationTokenRenewer #181865": > at java.util.Collections$SynchronizedCollection.add(Collections.java:1636) > - waiting to lock <0xc18a9998> (a > java.util.Collections$SynchronizedSet) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.addTokenToList(DelegationTokenRenewer.java:322) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:398) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$500(DelegationTokenRenewer.java:70) > at > 
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:657) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:638) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > "DelayedTokenCanceller": > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask.cancel(DelegationTokenRenewer.java:443) > - waiting to lock <0xc7eae720> (a > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.removeApplicationFromRenewal(DelegationTokenRenewer.java:558) > - locked <0xc18a9998> (a java.util.Collections$SynchronizedSet) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$300(DelegationTokenRenewer.java:70) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelayedTokenRemovalRunnable.run(DelegationTokenRenewer.java:599) > at java.lang.Thread.run(Thread.java:745) > "Timer-4": > at java.util.Collections$SynchronizedCollection.remove(Collections.java:1639) > - waiting to lock <0xc18a9998> (a > java.util.Collections$SynchronizedSet) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.removeFailedDelegationToken(DelegationTokenRenewer.java:503) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$100(DelegationTokenRenewer.java:70) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask.run(DelegationTokenRenewer.java:437) > - locked <0xc7eae720> (a > 
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask) > at java.util.TimerThread.mainLoop(Timer.java:555) > at java.util.TimerThread.run(Timer.java:505) > > Found 1 deadlock. > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
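The jstack above shows a classic lock-ordering cycle: one thread holds the SynchronizedSet monitor while waiting for a RenewalTimerTask monitor, and another thread holds the task monitor while waiting for the set. A common fix is to never invoke cancel() while holding the set's monitor: drain the set under its lock, then cancel outside it. The sketch below illustrates that pattern only; class and method names are illustrative and this is not the actual YARN-2874 patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of breaking the set-monitor / task-monitor cycle: cancel() is only
// ever called with the synchronized set's monitor already released, so no
// thread can hold one monitor while blocking on the other.
class TokenRenewer {
    static class RenewalTask {
        private boolean cancelled;
        synchronized void cancel() { cancelled = true; }     // task's own monitor
        synchronized boolean isCancelled() { return cancelled; }
    }

    private final Set<RenewalTask> tasks =
        Collections.synchronizedSet(new HashSet<RenewalTask>());

    void add(RenewalTask t) { tasks.add(t); }

    // Drain under the set's monitor, cancel outside it: the two monitors are
    // never held (or waited on) simultaneously, so the deadlock cannot form.
    void removeAllForApp() {
        List<RenewalTask> toCancel;
        synchronized (tasks) {
            toCancel = new ArrayList<>(tasks);
            tasks.clear();
        }
        for (RenewalTask t : toCancel) {
            t.cancel();
        }
    }
}
```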
[jira] [Commented] (YARN-2873) improve LevelDB error handling for missing files DBException to avoid NM start failure.
[ https://issues.apache.org/jira/browse/YARN-2873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216259#comment-14216259 ] Jason Lowe commented on YARN-2873: -- I have some serious concerns about this approach. As I mentioned during the discussions on YARN-2816, this is trying to recover from a completely invalid setup. If something is coming along and deleting (i.e.: corrupting) parts of the database then _that_ is the problem that needs to be corrected rather than worked around in the NM. Reaching into the internals of the leveldb files and assuming we can just delete some files and the database can open isn't a general solution. At that point arbitrary state has been lost, potentially entire container/application lifecycles, and who knows what will happen. Rather than assume we know how leveldb internals work (which could completely change if we upgrade the leveldb dependency and invalidate our assumptions), we should use JniDBFactory.factory.repair to try to repair the database rather than delete files here and there ourselves. Arguably if leveldb's own repair doesn't work and we're insistent that the NM must come up at all costs then we should just nuke the database and start without state. Of course the log should be filled with all sorts of errors to indicate this was in no way a normal startup. > improve LevelDB error handling for missing files DBException to avoid NM > start failure. > --- > > Key: YARN-2873 > URL: https://issues.apache.org/jira/browse/YARN-2873 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2873.000.patch, YARN-2873.001.patch > > > improve LevelDB error handling for missing files DBException to avoid NM > start failure. > We saw the following three level DB exceptions, all these exceptions cause NM > start failure. 
> DBException 1 in ShuffleHandler > {code} > INFO org.apache.hadoop.service.AbstractService: Service > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl > failed in state STARTED; cause: > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 1 > missing files; e.g.: > /tmp/hadoop-yarn/yarn-nm-recovery/nm-aux-services/mapreduce_shuffle/mapreduce_shuffle_state/05.sst > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 1 > missing files; e.g.: > /tmp/hadoop-yarn/yarn-nm-recovery/nm-aux-services/mapreduce_shuffle/mapreduce_shuffle_state/05.sst > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:204) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceStart(AuxServices.java:159) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceStart(ContainerManagerImpl.java:441) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:261) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:446) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:492) > Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: > Corruption: 1 missing files; e.g.: > 
/tmp/hadoop-yarn/yarn-nm-recovery/nm-aux-services/mapreduce_shuffle/mapreduce_shuffle_state/05.sst > at > org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200) > at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218) > at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168) > at > org.apache.hadoop.mapred.ShuffleHandler.startStore(ShuffleHandler.java:475) > at > org.apache.hadoop.mapred.ShuffleHandler.recoverState(ShuffleHandler.java:443) > at > org.apache.hadoop.mapred.ShuffleHandler.serviceStart(ShuffleHandler.java:379) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > ... 10 more > {code} > DBException 2 in NMLe
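The recovery strategy Jason Lowe argues for above can be sketched as a three-step fallback: open the store; on corruption, attempt leveldb's own repair (JniDBFactory.factory.repair in the real code) and retry; if that still fails, discard the database and start with empty state while logging loudly. The openStore / repairStore / wipeStore hooks below are hypothetical stand-ins for the actual leveldbjni calls, not YARN's real API.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Sketch of open-with-recovery: normal open, then repair-and-retry, then
// wipe-and-start-fresh as the last resort. Each fallback is logged so the
// startup is clearly not a normal one.
class StateStoreRecovery {
    static <T> T openWithRecovery(Callable<T> openStore,
                                  Runnable repairStore,
                                  Runnable wipeStore) throws Exception {
        try {
            return openStore.call();          // normal path
        } catch (IOException e) {
            System.err.println("State store corrupt, attempting repair: " + e);
            repairStore.run();                // e.g. JniDBFactory.factory.repair(path, options)
            try {
                return openStore.call();      // retry after repair
            } catch (IOException e2) {
                System.err.println("Repair failed, discarding state: " + e2);
                wipeStore.run();              // e.g. JniDBFactory.factory.destroy(path, options)
                return openStore.call();      // fresh, empty store; all state lost
            }
        }
    }
}
```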
[jira] [Created] (YARN-2874) Dead lock in "DelegationTokenRenewer" which blocks RM to execute any further apps
Naganarasimha G R created YARN-2874: --- Summary: Dead lock in "DelegationTokenRenewer" which blocks RM to execute any further apps Key: YARN-2874 URL: https://issues.apache.org/jira/browse/YARN-2874 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.1, 2.4.1, 2.5.0 Reporter: Naganarasimha G R Assignee: Naganarasimha G R Priority: Critical When token renewal fails and the application finishes this dead lock can occur Jstack dump : {quote} Found one Java-level deadlock: = "DelegationTokenRenewer #181865": waiting to lock monitor 0x00900918 (object 0xc18a9998, a java.util.Collections$SynchronizedSet), which is held by "DelayedTokenCanceller" "DelayedTokenCanceller": waiting to lock monitor 0x04141718 (object 0xc7eae720, a org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask), which is held by "Timer-4" "Timer-4": waiting to lock monitor 0x00900918 (object 0xc18a9998, a java.util.Collections$SynchronizedSet), which is held by "DelayedTokenCanceller" Java stack information for the threads listed above: === "DelegationTokenRenewer #181865": at java.util.Collections$SynchronizedCollection.add(Collections.java:1636) - waiting to lock <0xc18a9998> (a java.util.Collections$SynchronizedSet) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.addTokenToList(DelegationTokenRenewer.java:322) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:398) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$500(DelegationTokenRenewer.java:70) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:657) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:638) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) "DelayedTokenCanceller": at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask.cancel(DelegationTokenRenewer.java:443) - waiting to lock <0xc7eae720> (a org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.removeApplicationFromRenewal(DelegationTokenRenewer.java:558) - locked <0xc18a9998> (a java.util.Collections$SynchronizedSet) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$300(DelegationTokenRenewer.java:70) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelayedTokenRemovalRunnable.run(DelegationTokenRenewer.java:599) at java.lang.Thread.run(Thread.java:745) "Timer-4": at java.util.Collections$SynchronizedCollection.remove(Collections.java:1639) - waiting to lock <0xc18a9998> (a java.util.Collections$SynchronizedSet) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.removeFailedDelegationToken(DelegationTokenRenewer.java:503) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$100(DelegationTokenRenewer.java:70) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask.run(DelegationTokenRenewer.java:437) - locked <0xc7eae720> (a org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask) at java.util.TimerThread.mainLoop(Timer.java:555) at java.util.TimerThread.run(Timer.java:505) Found 1 deadlock. {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2738) Add FairReservationSystem for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216246#comment-14216246 ] Anubhav Dhoot commented on YARN-2738: - The minimal configuration we need is the ability to mark queues as usable by the reservation system. We can use global defaults for the rest to begin with. > Add FairReservationSystem for FairScheduler > --- > > Key: YARN-2738 > URL: https://issues.apache.org/jira/browse/YARN-2738 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-2738.001.patch, YARN-2738.002.patch > > > Need to create a FairReservationSystem that will implement ReservationSystem > for FairScheduler -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2865) Application recovery continuously fails with "Application with id already present. Cannot duplicate"
[ https://issues.apache.org/jira/browse/YARN-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216242#comment-14216242 ] Hadoop QA commented on YARN-2865: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682163/YARN-2865.patch against trunk revision 9dd5d67. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5866//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5866//console This message is automatically generated. > Application recovery continuously fails with "Application with id already > present. Cannot duplicate" > > > Key: YARN-2865 > URL: https://issues.apache.org/jira/browse/YARN-2865 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Rohith >Assignee: Rohith >Priority: Critical > Attachments: YARN-2865.patch, YARN-2865.patch > > > YARN-2588 handles exception thrown while transitioningToActive and reset > activeServices. 
But it misses clearing the RMContext apps/nodes details and > ClusterMetrics and QueueMetrics. This causes application recovery to fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2165) Timelineserver should validate that yarn.timeline-service.ttl-ms is greater than zero
[ https://issues.apache.org/jira/browse/YARN-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasanth kumar RJ updated YARN-2165: --- Attachment: YARN-2165.1.patch [~zjshen] I have implemented your comments. Please let me know if any further changes are required. > Timelineserver should validate that yarn.timeline-service.ttl-ms is greater > than zero > - > > Key: YARN-2165 > URL: https://issues.apache.org/jira/browse/YARN-2165 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Karam Singh >Assignee: Vasanth kumar RJ > Attachments: YARN-2165.1.patch, YARN-2165.patch > > > Timelineserver should validate that yarn.timeline-service.ttl-ms is greater > than zero > Currently, if yarn.timeline-service.ttl-ms=0 > or yarn.timeline-service.ttl-ms=-86400 is set, > the timeline server still starts successfully, only logging > {code} > 2014-06-15 14:52:16,562 INFO timeline.LeveldbTimelineStore > (LeveldbTimelineStore.java:(247)) - Starting deletion thread with ttl > -60480 and cycle interval 30 > {code} > At startup the timeline server should validate that yarn.timeline-service.ttl-ms > 0; > otherwise, especially for negative values, the discard-old-entities timestamp will be > set to a future value, which may lead to inconsistent behavior > {code} > public void run() { > while (true) { > long timestamp = System.currentTimeMillis() - ttl; > try { > discardOldEntities(timestamp); > Thread.sleep(ttlInterval); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
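The validation the issue asks for can be sketched as a fail-fast check at service init: reject a non-positive yarn.timeline-service.ttl-ms so the deletion thread can never compute System.currentTimeMillis() - ttl as a timestamp in the future and discard entities it should keep. Only the configuration key comes from the issue; the class and method names below are illustrative, not the actual patch.

```java
// Sketch of the startup check: a non-positive ttl makes
// "System.currentTimeMillis() - ttl" a future timestamp, so
// discardOldEntities(timestamp) would delete entities newer than "now".
// Failing fast at startup avoids that silent inconsistency.
class TimelineTtlCheck {
    static long validateTtlMs(long ttlMs) {
        if (ttlMs <= 0) {
            throw new IllegalArgumentException(
                "yarn.timeline-service.ttl-ms should be greater than 0, was: " + ttlMs);
        }
        return ttlMs;
    }
}
```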
[jira] [Commented] (YARN-2690) Make ReservationSystem and its dependent classes independent of Scheduler type
[ https://issues.apache.org/jira/browse/YARN-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216213#comment-14216213 ] Hudson commented on YARN-2690: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #9 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/9/]) YARN-2690. [YARN-2574] Make ReservationSystem and its dependent classes independent of Scheduler type. (Anubhav Dhoot via kasha) (kasha: rev 2fce6d61412843f0447f60cfe02326f769edae25) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestNoOverCommitPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/CapacityReservationSystem.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/CapacityOverTimePolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/NoOverCommitPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/SimpleCapacityReplanner.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/SharingPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractReservationSystem.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestSimpleCapacityReplanner.java 
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacityOverTimePolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestGreedyReservationAgent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacityReservationSystem.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSchedulerConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemTestUtil.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/Planner.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java > Make ReservationSystem and its dependent classes independent of Scheduler > type > > > Key: YARN-2690 > URL: https://issues.apache.org/jira/browse/YARN-2690 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Fix For: 2.7.0 > > Attachments: YARN-2690.001.patch, YARN-2690.002.patch, > YARN-2690.002.patch, YARN-2690.003.patch, YARN-2690.004.patch, > YARN-2690.004.patch > > > A lot of common reservation classes depend on CapacityScheduler and > specifically its configuration. 
This jira is to make them ready for other > Schedulers by abstracting out the configuration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2414) RM web UI: app page will crash if app is failed before any attempt has been created
[ https://issues.apache.org/jira/browse/YARN-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216212#comment-14216212 ] Hudson commented on YARN-2414:
--
FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #9 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/9/])
YARN-2414. RM web UI: app page will crash if app is failed before any attempt has been created. Contributed by Wangda Tan (jlowe: rev 81c9d17af84ed87b9ded7057cb726a3855ddd32d)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/security/ApplicationACLsManager.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/QueueACLsManager.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppBlock.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestAppPage.java

> RM web UI: app page will crash if app is failed before any attempt has been created
> -----------------------------------------------------------------------------------
>
> Key: YARN-2414
> URL: https://issues.apache.org/jira/browse/YARN-2414
> Project: Hadoop YARN
> Issue Type: Bug
> Components: webapp
> Reporter: Zhijie Shen
> Assignee: Wangda Tan
> Fix For: 2.7.0
>
> Attachments: YARN-2414.20141104-1.patch, YARN-2414.20141104-2.patch, YARN-2414.patch
>
> {code}
> 2014-08-12 16:45:13,573 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error handling URI: /cluster/app/application_1407887030038_0001
> java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
> at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
> at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
> at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
> at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
> at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
> at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:84)
> at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
> at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
> at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
> at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
> at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
> at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
> at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:460)
> at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1191)
> at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
> at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
> at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
> at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> at org.mortbay.jetty.servl
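The trace above is an NPE surfacing through reflection: the app page render path dereferences the current application attempt, which does not exist when an app fails before any attempt is created. A minimal sketch of the guard pattern such a fix typically takes (hypothetical names only — `AppPageGuard`, `Attempt`, and `renderAttemptSection` are illustrations, not the actual YARN-2414 patch to AppBlock.java):

```java
// Hypothetical sketch of the crash pattern and its guard -- NOT the real patch.
public class AppPageGuard {

  /** Minimal stand-in for an application attempt (hypothetical). */
  static class Attempt {
    final String id;
    Attempt(String id) { this.id = id; }
  }

  /**
   * Renders the attempt section of the app page. Before a guard like this,
   * code equivalent to "current.id" would throw NullPointerException when
   * the app failed before any attempt was created -- the crash the trace
   * above shows bubbling up through the webapp dispatcher.
   */
  public static String renderAttemptSection(Attempt current) {
    if (current == null) {
      return "Application has no attempts.";
    }
    return "Attempt: " + current.id;
  }

  public static void main(String[] args) {
    // Failed-before-first-attempt case: renders a message instead of crashing.
    System.out.println(renderAttemptSection(null));
    System.out.println(renderAttemptSection(
        new Attempt("appattempt_1407887030038_0001_000001")));
  }
}
```

The accompanying TestAppPage.java in the commit's file list suggests the fix was verified by rendering the page for an attempt-less app, which is the case the first call above simulates.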
[jira] [Commented] (YARN-2574) Add support for FairScheduler to the ReservationSystem
[ https://issues.apache.org/jira/browse/YARN-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216219#comment-14216219 ] Hudson commented on YARN-2574:
--
FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #9 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/9/])
YARN-2690. [YARN-2574] Make ReservationSystem and its dependent classes independent of Scheduler type. (Anubhav Dhoot via kasha) (kasha: rev 2fce6d61412843f0447f60cfe02326f769edae25)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestNoOverCommitPolicy.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/CapacityReservationSystem.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/CapacityOverTimePolicy.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/NoOverCommitPolicy.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/SimpleCapacityReplanner.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/SharingPolicy.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractReservationSystem.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestSimpleCapacityReplanner.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacityOverTimePolicy.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestGreedyReservationAgent.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacityReservationSystem.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSchedulerConfiguration.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemTestUtil.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/Planner.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java

> Add support for FairScheduler to the ReservationSystem
> ------------------------------------------------------
>
> Key: YARN-2574
> URL: https://issues.apache.org/jira/browse/YARN-2574
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: fairscheduler
> Reporter: Subru Krishnan
> Assignee: Anubhav Dhoot
>
> YARN-1051 introduces the ReservationSystem and the current implementation is based on CapacityScheduler. This JIRA proposes adding support for FairScheduler

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
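The commit above (YARN-2690) is the groundwork step: before FairScheduler support can be added, the reservation classes must stop depending on CapacityScheduler directly. A minimal sketch of that decoupling idea — note that `PlanContext`, `CapacityBacked`, `FairBacked`, and `describePlan` are hypothetical names for illustration, not the real YARN classes (the actual refactor touches AbstractReservationSystem, ReservationSchedulerConfiguration, and friends listed above):

```java
// Hypothetical sketch of scheduler-independent reservation logic -- the
// interface and class names here are invented for illustration only.
public class ReservationSketch {

  /** Scheduler-neutral view of what the reservation planner needs. */
  interface PlanContext {
    long totalClusterMemoryMb();
    String schedulerName();
  }

  /** CapacityScheduler-backed context (hypothetical stand-in). */
  static class CapacityBacked implements PlanContext {
    public long totalClusterMemoryMb() { return 8192; }
    public String schedulerName() { return "capacity"; }
  }

  /** FairScheduler-backed context (hypothetical stand-in). */
  static class FairBacked implements PlanContext {
    public long totalClusterMemoryMb() { return 8192; }
    public String schedulerName() { return "fair"; }
  }

  /**
   * Reservation logic written once against the neutral interface; swapping
   * the scheduler only swaps the PlanContext implementation.
   */
  public static String describePlan(PlanContext ctx) {
    return ctx.schedulerName() + ":" + ctx.totalClusterMemoryMb();
  }

  public static void main(String[] args) {
    System.out.println(describePlan(new CapacityBacked()));
    System.out.println(describePlan(new FairBacked()));
  }
}
```

The design choice mirrors what the commit message states: once the shared classes talk to a scheduler-neutral abstraction, YARN-2574 reduces to providing the FairScheduler-side implementation.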
[jira] [Commented] (YARN-2414) RM web UI: app page will crash if app is failed before any attempt has been created
[ https://issues.apache.org/jira/browse/YARN-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216198#comment-14216198 ] Hudson commented on YARN-2414:
--
FAILURE: Integrated in Hadoop-Hdfs-trunk #1937 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1937/])
YARN-2414. RM web UI: app page will crash if app is failed before any attempt has been created. Contributed by Wangda Tan (jlowe: rev 81c9d17af84ed87b9ded7057cb726a3855ddd32d)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestAppPage.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/security/ApplicationACLsManager.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppBlock.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/QueueACLsManager.java
[jira] [Commented] (YARN-2574) Add support for FairScheduler to the ReservationSystem
[ https://issues.apache.org/jira/browse/YARN-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216205#comment-14216205 ] Hudson commented on YARN-2574:
--
FAILURE: Integrated in Hadoop-Hdfs-trunk #1937 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1937/])
YARN-2690. [YARN-2574] Make ReservationSystem and its dependent classes independent of Scheduler type. (Anubhav Dhoot via kasha) (kasha: rev 2fce6d61412843f0447f60cfe02326f769edae25)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractReservationSystem.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/CapacityOverTimePolicy.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/NoOverCommitPolicy.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/SharingPolicy.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacityReservationSystem.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestGreedyReservationAgent.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSchedulerConfiguration.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestNoOverCommitPolicy.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/Planner.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/SimpleCapacityReplanner.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacityOverTimePolicy.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemTestUtil.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/CapacityReservationSystem.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestSimpleCapacityReplanner.java