[jira] [Commented] (YARN-3999) RM hangs on draining events
[ https://issues.apache.org/jira/browse/YARN-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14682496#comment-14682496 ] Xuan Gong commented on YARN-3999:

+1, LGTM. Will commit later if there are no other comments.

RM hangs on draining events
Key: YARN-3999
URL: https://issues.apache.org/jira/browse/YARN-3999
Project: Hadoop YARN
Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
Attachments: YARN-3999.1.patch, YARN-3999.2.patch, YARN-3999.2.patch, YARN-3999.3.patch, YARN-3999.4.patch, YARN-3999.5.patch, YARN-3999.patch, YARN-3999.patch

If external systems like ATS or ZK become very slow, draining all the events takes a long time. If this time exceeds 10 minutes, all applications will expire. Fixes include:
1. Add a timeout and stop the dispatcher even if not all events are drained.
2. Move the ATS service out of the RM active services so that the RM doesn't need to wait for ATS to flush the events when transitioning to standby.
3. Stop client-facing services (ClientRMService etc.) first so that clients get fast notification that the RM is stopping/transitioning.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
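To make fix (1) concrete, here is a minimal, self-contained sketch of a dispatcher whose stop path drains with a deadline instead of waiting indefinitely on a slow downstream such as ATS or ZK. This is a sketch under assumed names (BoundedDrainDispatcher and everything in it are illustrative), not the committed YARN patch.
{code}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Illustrative only: a dispatcher whose stop() gives up draining after a deadline. */
public class BoundedDrainDispatcher {
  private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
  private volatile boolean stopped = false;
  private final Thread worker = new Thread(() -> {
    try {
      while (!stopped) {
        queue.take().run(); // a slow handler (e.g. flushing to ATS/ZK) stalls here
      }
    } catch (InterruptedException e) {
      // interrupted by stop(): exit, abandoning any remaining events
    }
  }, "event-dispatcher");

  public void start() { worker.start(); }

  public void dispatch(Runnable event) { queue.add(event); }

  /** Drain queued events, but never block shutdown longer than timeoutMs. */
  public void stop(long timeoutMs) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!queue.isEmpty() && System.currentTimeMillis() < deadline) {
      Thread.sleep(50); // poll: either the queue drains or the deadline passes
    }
    stopped = true;     // past the deadline, undrained events are dropped
    worker.interrupt(); // unblock take()
    worker.join();
  }
}
{code}
Without such a deadline, a stop path that waits for an empty queue can block far past the 10-minute application-expiry window whenever the downstream sink is slow, which is exactly the hang described above.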
[jira] [Commented] (YARN-4046) Applications fail on NM restart on some linux distro because NM container recovery declares AM container as LOST
[ https://issues.apache.org/jira/browse/YARN-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14687395#comment-14687395 ] Anubhav Dhoot commented on YARN-4046:

[~cnauroth], I appreciate your review.

Applications fail on NM restart on some linux distro because NM container recovery declares AM container as LOST
Key: YARN-4046
URL: https://issues.apache.org/jira/browse/YARN-4046
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
Priority: Critical
Attachments: YARN-4096.001.patch

On a Debian machine we have seen NodeManager recovery of containers fail because the signal syntax for a process group may not work. We see errors when checking whether a process is alive during container recovery, which causes the container to be declared LOST (154) on a NodeManager restart. The application then fails with the error:
{noformat}
Application application_1439244348718_0001 failed 1 times due to Attempt recovered after RM restartAM Container for appattempt_1439244348718_0001_01 exited with exitCode: 154
{noformat}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-3045:

Attachment: YARN-3045-YARN-2928.009.patch

Hi [~djp], attaching a new patch resolving your comments. I have also modified one approach: for cases where we need to publish timeline entities directly (not through wrapped application or container events), such as ContainerMetrics, I have added a new NMTimelineEvent that accepts the TimelineEntity and ApplicationId. This avoids creating new event classes; it suffices to expose a method in NMTimelinePublisher. I have also fixed the test case failures, but the javac warnings do not seem related to my modifications, and findbugs did not report any issues in its report; I will check again in the next Jenkins run.

[Event producers] Implement NM writing container lifecycle events to ATS
Key: YARN-3045
URL: https://issues.apache.org/jira/browse/YARN-3045
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Sangjin Lee
Assignee: Naganarasimha G R
Attachments: YARN-3045-YARN-2928.002.patch, YARN-3045-YARN-2928.003.patch, YARN-3045-YARN-2928.004.patch, YARN-3045-YARN-2928.005.patch, YARN-3045-YARN-2928.006.patch, YARN-3045-YARN-2928.007.patch, YARN-3045-YARN-2928.008.patch, YARN-3045-YARN-2928.009.patch, YARN-3045.20150420-1.patch

Per the design in YARN-2928, implement the NM writing container lifecycle events and container system metrics to ATS.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4045) Negative availableMB is being reported for root queue.
[ https://issues.apache.org/jira/browse/YARN-4045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14687389#comment-14687389 ] Wangda Tan commented on YARN-4045:

[~tgraves]/[~shahrs87], I think this case can happen when container reservation interacts with a node disconnect. One example:
{code}
A cluster has 6 nodes, each with 20G of resource. N1-N4 are fully used; N5 and N6 each use 10G.
An app asks for a 15G container; assume it is reserved at N5, so total used resource = 20G * 4 + 10G * 2 + 15G (just reserved) = 115G.
Then N6 disconnects: cluster resource becomes 100G, while used resource = 105G.
{code}
I've just checked: YARN-3361 doesn't contain a related fix, and we currently don't have a fix for the above corner case.

Another problem is caused by DRC. Since 2.7.1 we set availableResource = max(availableResource, Resources.none()):
{code}
childQueue.getMetrics().setAvailableResourcesToQueue(
    Resources.max(calculator, clusterResource, available, Resources.none()));
{code}
But if you're using DRC, a resource with availableMB < 0 and availableVCores > 0 could still compare as > Resources.none(), so the negative memory slips through. We may need to fix this case as well. Thoughts?

Negative availableMB is being reported for root queue.
Key: YARN-4045
URL: https://issues.apache.org/jira/browse/YARN-4045
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.7.1
Reporter: Rushabh S Shah

We recently deployed 2.7 on one of our clusters. We are seeing a negative availableMB reported for queue=root. This is from the jmx output:
{noformat}
<clusterMetrics> ... <availableMB>-163328</availableMB> ... </clusterMetrics>
{noformat}
The following is the RM log:
{noformat}
2015-08-10 14:42:28,280 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=<memory:5332480, vCores:6202> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:28,404 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=<memory:5334016, vCores:6212> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:30,913 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=<memory:5332480, vCores:6202> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:30,913 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=<memory:5334016, vCores:6212> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:33,093 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=<memory:5332480, vCores:6202> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:33,093 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=<memory:5334016, vCores:6212> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:35,548 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=<memory:5332480, vCores:6202> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:35,549 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=<memory:5334016, vCores:6212> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:39,088 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=<memory:5332480, vCores:6202> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:39,089 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=<memory:5334016, vCores:6212> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:39,338 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=<memory:5332480, vCores:6202> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:39,339 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=<memory:5334016, vCores:6212> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:39,757
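To illustrate the DRC gap described above: Resources.max under DominantResourceCalculator compares dominant shares, so a mixed-sign value can still win against Resources.none() while carrying negative memory. One possible remedy, sketched below with an illustrative helper name (clampToNonNegative is not an actual YARN API), is to clamp each dimension independently.
{code}
import org.apache.hadoop.yarn.api.records.Resource;

// Illustrative helper, not the committed fix: clamp each dimension separately so
// a value like <memory:-163328, vCores:12> can never be reported as available.
public final class ResourceClamp {
  private ResourceClamp() {}

  public static Resource clampToNonNegative(Resource r) {
    return Resource.newInstance(
        Math.max(0, r.getMemory()),        // never report negative availableMB
        Math.max(0, r.getVirtualCores())); // never report negative vCores
  }
}
{code}
A componentwise clamp sidesteps the calculator entirely, so the choice of DefaultResourceCalculator vs. DominantResourceCalculator no longer affects whether negative metrics can leak out.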
[jira] [Updated] (YARN-4046) Applications fail on NM restart on some linux distro because NM container recovery declares AM container as LOST
[ https://issues.apache.org/jira/browse/YARN-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-4046:

Attachment: YARN-4096.001.patch

Attaching a patch that prefixes {{--}} when using a negative pid for kill.

Applications fail on NM restart on some linux distro because NM container recovery declares AM container as LOST
Key: YARN-4046
URL: https://issues.apache.org/jira/browse/YARN-4046
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
Priority: Critical
Attachments: YARN-4096.001.patch

On a Debian machine we have seen NodeManager recovery of containers fail because the signal syntax for a process group may not work. We see errors when checking whether a process is alive during container recovery, which causes the container to be declared LOST (154) on a NodeManager restart. The application then fails with the error:
{noformat}
Application application_1439244348718_0001 failed 1 times due to Attempt recovered after RM restartAM Container for appattempt_1439244348718_0001_01 exited with exitCode: 154
{noformat}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
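The idea behind that small patch, as a hedged sketch: container recovery checks liveness by sending signal 0 to the process group, i.e. to a negative pid, and without a preceding {{--}} some kill implementations parse that negative pid as an option flag. The class and method below are illustrative, not the NodeManager's actual code.
{code}
import java.io.IOException;

// Illustrative sketch of the fix's idea, not the actual NM patch.
public class ProcessGroupAlive {
  /** Returns true if the process group led by pid still exists (signal 0). */
  public static boolean isGroupAlive(long pid)
      throws IOException, InterruptedException {
    // Without "--", a command like "kill -0 -4567" can be rejected on some
    // distros (e.g. Debian), because "-4567" looks like an option; "--" ends
    // option parsing so the negative pid is read as a process-group target.
    ProcessBuilder pb = new ProcessBuilder("bash", "-c", "kill -0 -- -" + pid);
    return pb.start().waitFor() == 0; // exit 0 => group exists and is signallable
  }
}
{code}
When the liveness check spuriously fails on restart, recovery concludes the AM's container is gone and marks it LOST (exit code 154), which is the failure mode reported above.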
[jira] [Commented] (YARN-4026) FiCaSchedulerApp: ContainerAllocator should be able to choose how to order pending resource requests
[ https://issues.apache.org/jira/browse/YARN-4026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14687398#comment-14687398 ] Jian He commented on YARN-4026:

- Why is assignment.setFulfilledReservation(true); called in the Reserved state?
{code}
if (result.getAllocationState() == AllocationState.RESERVED) {
  // This is a reserved container
  LOG.info("Reserved container " + " application=" + application.getApplicationId()
      + " resource=" + allocatedResource + " queue=" + this.toString()
      + " cluster=" + clusterResource);
  assignment.getAssignmentInformation().addReservationDetails(
      updatedContainer.getId(), application.getCSLeafQueue().getQueuePath());
  assignment.getAssignmentInformation().incrReservations();
  Resources.addTo(assignment.getAssignmentInformation().getReserved(),
      allocatedResource);
  assignment.setFulfilledReservation(true);
} else {
{code}
- I think this can always return ContainerAllocation.LOCALITY_SKIPPED, since the semantics of this method are to try to allocate a container for a certain locality.
{code}
return type == NodeType.OFF_SWITCH ? ContainerAllocation.APP_SKIPPED :
    ContainerAllocation.LOCALITY_SKIPPED;
{code}
The caller here can choose to return APP_SKIPPED if it sees LOCALITY_SKIPPED:
{code}
assigned = assignOffSwitchContainers(clusterResource, offSwitchResourceRequest,
    node, priority, reservedContainer, schedulingMode, currentResoureLimits);
assigned.requestNodeType = requestType;
return assigned;
}
{code}

FiCaSchedulerApp: ContainerAllocator should be able to choose how to order pending resource requests
Key: YARN-4026
URL: https://issues.apache.org/jira/browse/YARN-4026
Project: Hadoop YARN
Issue Type: Bug
Reporter: Wangda Tan
Assignee: Wangda Tan
Attachments: YARN-4026.1.patch

After YARN-3983, we have an extensible ContainerAllocator which FiCaSchedulerApp can use to decide how to allocate resources. While working on YARN-1651 (allocate resources to increase containers), I found one part of the existing logic not flexible enough:
- ContainerAllocator decides what to allocate for a given node and priority. To support different kinds of resource allocation (for example, priority as weight, or whether to skip a priority), it is better to let ContainerAllocator choose how to order pending resource requests.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
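To make the suggested restructuring concrete, here is a sketch of how the caller could widen the locality skip itself; it continues the reviewed fragment above and is illustrative only, not the final patch.
{code}
// Illustrative continuation of the caller above: the per-locality method always
// returns LOCALITY_SKIPPED, and only here do we widen it to APP_SKIPPED.
assigned = assignOffSwitchContainers(clusterResource, offSwitchResourceRequest,
    node, priority, reservedContainer, schedulingMode, currentResoureLimits);
if (requestType == NodeType.OFF_SWITCH
    && assigned.getAllocationState() == AllocationState.LOCALITY_SKIPPED) {
  // Off-switch was the last locality to try, so skip the whole app on this node.
  assigned = ContainerAllocation.APP_SKIPPED;
}
assigned.requestNodeType = requestType;
return assigned;
{code}
This keeps the per-locality method's contract uniform (it only ever reports on its own locality) and concentrates the app-level decision in one place.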
[jira] [Updated] (YARN-4046) Applications fail on NM restart on some linux distro because NM container recovery declares AM container as LOST
[ https://issues.apache.org/jira/browse/YARN-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-4046:

Description:
On a Debian machine we have seen NodeManager recovery of containers fail because the signal syntax for a process group may not work. We see errors when checking whether a process is alive during container recovery, which causes the container to be declared LOST (154) on a NodeManager restart. The application then fails with the error below. The attempts are not retried.
{noformat}
Application application_1439244348718_0001 failed 1 times due to Attempt recovered after RM restartAM Container for appattempt_1439244348718_0001_01 exited with exitCode: 154
{noformat}

was:
On a Debian machine we have seen NodeManager recovery of containers fail because the signal syntax for a process group may not work. We see errors when checking whether a process is alive during container recovery, which causes the container to be declared LOST (154) on a NodeManager restart. The application then fails with the error:
{noformat}
Application application_1439244348718_0001 failed 1 times due to Attempt recovered after RM restartAM Container for appattempt_1439244348718_0001_01 exited with exitCode: 154
{noformat}

Applications fail on NM restart on some linux distro because NM container recovery declares AM container as LOST
Key: YARN-4046
URL: https://issues.apache.org/jira/browse/YARN-4046
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
Priority: Critical
Attachments: YARN-4096.001.patch

On a Debian machine we have seen NodeManager recovery of containers fail because the signal syntax for a process group may not work. We see errors when checking whether a process is alive during container recovery, which causes the container to be declared LOST (154) on a NodeManager restart. The application then fails with the error below. The attempts are not retried.
{noformat}
Application application_1439244348718_0001 failed 1 times due to Attempt recovered after RM restartAM Container for appattempt_1439244348718_0001_01 exited with exitCode: 154
{noformat}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2657) MiniYARNCluster to (optionally) add MicroZookeeper service
[ https://issues.apache.org/jira/browse/YARN-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692525#comment-14692525 ] Sangjin Lee commented on YARN-2657:

Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me know.

MiniYARNCluster to (optionally) add MicroZookeeper service
Key: YARN-2657
URL: https://issues.apache.org/jira/browse/YARN-2657
Project: Hadoop YARN
Issue Type: Sub-task
Components: test
Reporter: Steve Loughran
Assignee: Steve Loughran
Attachments: YARN-2567-001.patch, YARN-2657-002.patch

This is needed for testing things like YARN-2646: add an option for the {{MiniYarnCluster}} to start a {{MicroZookeeperService}}. This is just another YARN service to create and track through the lifecycle. The {{MicroZookeeperService}} publishes its binding information for direct takeup by the registry services...this can address in-VM race conditions. The default setting for this service is off.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2599) Standby RM should also expose some jmx and metrics
[ https://issues.apache.org/jira/browse/YARN-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692526#comment-14692526 ] Sangjin Lee commented on YARN-2599:

Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me know.

Standby RM should also expose some jmx and metrics
Key: YARN-2599
URL: https://issues.apache.org/jira/browse/YARN-2599
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.5.1
Reporter: Karthik Kambatla
Assignee: Rohith Sharma K S

YARN-1898 redirects jmx and metrics to the Active. As discussed there, we need to separate out the metrics displayed so the Standby RM can also be monitored.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2746) YARNDelegationTokenID misses serializing version from the common abstract ID
[ https://issues.apache.org/jira/browse/YARN-2746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692523#comment-14692523 ] Sangjin Lee commented on YARN-2746:

Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me know.

YARNDelegationTokenID misses serializing version from the common abstract ID
Key: YARN-2746
URL: https://issues.apache.org/jira/browse/YARN-2746
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Jian He

I found this during review of YARN-2743.
bq. AbstractDTId had a version, we dropped that in the protobuf serialization. We should just write it during the serialization and read it back?

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
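A minimal sketch of the quoted suggestion, assuming a Writable-style identifier that wraps a protobuf payload; the class, fields, and VERSION constant here are illustrative, not the actual YARNDelegationTokenIdentifier code.
{code}
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Illustrative only: write the version byte before the protobuf payload and
// read it back first, so the common abstract ID's version is not lost.
public class VersionedTokenId {
  private static final byte VERSION = 0;
  private byte[] protoBytes; // the existing protobuf serialization

  public void write(DataOutput out) throws IOException {
    out.writeByte(VERSION);           // version first, as the abstract ID did
    out.writeInt(protoBytes.length);
    out.write(protoBytes);            // then the protobuf payload
  }

  public void readFields(DataInput in) throws IOException {
    byte version = in.readByte();     // read it back before the payload
    if (version != VERSION) {
      throw new IOException("Unknown token version " + version);
    }
    protoBytes = new byte[in.readInt()];
    in.readFully(protoBytes);
  }
}
{code}
Keeping the version byte on the wire preserves a forward-compatibility hook: a newer reader can branch on the version instead of failing to parse an older token.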
[jira] [Commented] (YARN-3478) FairScheduler page not performed because different enum of YarnApplicationState and RMAppState
[ https://issues.apache.org/jira/browse/YARN-3478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692520#comment-14692520 ] Sangjin Lee commented on YARN-3478:

Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me know.

FairScheduler page not performed because different enum of YarnApplicationState and RMAppState
Key: YARN-3478
URL: https://issues.apache.org/jira/browse/YARN-3478
Project: Hadoop YARN
Issue Type: Bug
Components: fairscheduler
Affects Versions: 2.6.0
Reporter: Xu Chen
Attachments: YARN-3478.1.patch, YARN-3478.2.patch, YARN-3478.3.patch, screenshot-1.png

Got this exception from the log:
java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
	at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
	at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
	at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
	at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
	at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
	at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
	at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:79)
	at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
	at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
	at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
	at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
	at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
	at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:96)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
	at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1225)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
	at org.apache.hadoop.http.lib.DynamicUserWebFilter$DynamicUserFilter.doFilter(DynamicUserWebFilter.java:59)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
	at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
	at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
	at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
	at org.mortbay.jetty.Server.handle(Server.java:326)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
	at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
	at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
	at
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692524#comment-14692524 ] Hadoop QA commented on YARN-3045:

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 16m 17s | Findbugs (version ) appears to be broken on YARN-2928. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 9 new or modified test files. |
| {color:red}-1{color} | javac | 7m 55s | The applied patch generated 3 additional warning messages. |
| {color:green}+1{color} | javadoc | 9m 53s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 48s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 8s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 27s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 40s | The patch built with eclipse:eclipse. |
| {color:red}-1{color} | findbugs | 2m 46s | The patch appears to introduce 5 new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 8m 6s | Tests passed in hadoop-yarn-applications-distributedshell. |
| {color:red}-1{color} | yarn tests | 6m 4s | Tests failed in hadoop-yarn-server-nodemanager. |
| {color:green}+1{color} | yarn tests | 1m 22s | Tests passed in hadoop-yarn-server-timelineservice. |
| | | 55m 53s | |

|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-nodemanager |
| Failed unit tests | hadoop.yarn.server.nodemanager.TestDeletionService |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12749943/YARN-3045-YARN-2928.009.patch |
| Optional Tests | javac unit findbugs checkstyle javadoc |
| git revision | YARN-2928 / 07433c2 |
| javac | https://builds.apache.org/job/PreCommit-YARN-Build/8826/artifact/patchprocess/diffJavacWarnings.txt |
| Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8826/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html |
| hadoop-yarn-applications-distributedshell test log | https://builds.apache.org/job/PreCommit-YARN-Build/8826/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt |
| hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8826/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt |
| hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8826/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8826/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8826/console |

This message was automatically generated.
[Event producers] Implement NM writing container lifecycle events to ATS
Key: YARN-3045
URL: https://issues.apache.org/jira/browse/YARN-3045
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Sangjin Lee
Assignee: Naganarasimha G R
Attachments: YARN-3045-YARN-2928.002.patch, YARN-3045-YARN-2928.003.patch, YARN-3045-YARN-2928.004.patch, YARN-3045-YARN-2928.005.patch, YARN-3045-YARN-2928.006.patch, YARN-3045-YARN-2928.007.patch, YARN-3045-YARN-2928.008.patch, YARN-3045-YARN-2928.009.patch, YARN-3045.20150420-1.patch

Per the design in YARN-2928, implement the NM writing container lifecycle events and container system metrics to ATS.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2457) FairScheduler: Handle preemption to help starved parent queues
[ https://issues.apache.org/jira/browse/YARN-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692528#comment-14692528 ] Sangjin Lee commented on YARN-2457:

Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me know.

FairScheduler: Handle preemption to help starved parent queues
Key: YARN-2457
URL: https://issues.apache.org/jira/browse/YARN-2457
Project: Hadoop YARN
Issue Type: Bug
Components: scheduler
Affects Versions: 2.5.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla

YARN-2395/YARN-2394 add a preemption timeout and threshold per queue, but don't check for parent queue starvation. We need to check that.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2859) ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster
[ https://issues.apache.org/jira/browse/YARN-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692521#comment-14692521 ] Sangjin Lee commented on YARN-2859:

Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me know.

ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster
Key: YARN-2859
URL: https://issues.apache.org/jira/browse/YARN-2859
Project: Hadoop YARN
Issue Type: Bug
Components: timelineserver
Reporter: Hitesh Shah
Assignee: Zhijie Shen
Priority: Critical
Labels: 2.6.1-candidate

In the mini cluster, a random port should be used. Also, the config is not updated to the host that the process actually bound to.
{code}
2014-11-13 13:07:01,905 INFO [main] server.MiniYARNCluster (MiniYARNCluster.java:serviceStart(722)) - MiniYARN ApplicationHistoryServer address: localhost:10200
2014-11-13 13:07:01,905 INFO [main] server.MiniYARNCluster (MiniYARNCluster.java:serviceStart(724)) - MiniYARN ApplicationHistoryServer web address: 0.0.0.0:8188
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
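The usual mini-cluster remedy, as a hedged sketch: bind to port 0 so the OS picks a free ephemeral port, and have the cluster publish the actually-bound address back into the config afterwards. The YarnConfiguration constants below are real; the wrapper class and its use here are illustrative, not the committed fix.
{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Illustrative only: configure the timeline/history server for random ports.
public class RandomPortTimelineConf {
  public static YarnConfiguration create() {
    YarnConfiguration conf = new YarnConfiguration();
    // Port 0 lets the OS choose a free ephemeral port instead of 10200/8188,
    // so concurrent test JVMs on one host cannot collide on fixed ports.
    conf.set(YarnConfiguration.TIMELINE_SERVICE_ADDRESS, "localhost:0");
    conf.set(YarnConfiguration.TIMELINE_SERVICE_WEBAPP_ADDRESS, "localhost:0");
    // Once the server starts, the cluster must write the actually-bound
    // host:port back into this config so clients can read the real address.
    return conf;
  }
}
{code}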
[jira] [Commented] (YARN-2038) Revisit how AMs learn of containers from previous attempts
[ https://issues.apache.org/jira/browse/YARN-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692709#comment-14692709 ] sandflee commented on YARN-2038:

I thought this was the same issue as YARN-3519, but it seems not. I'm also confused about the purpose of this issue now.

Revisit how AMs learn of containers from previous attempts
Key: YARN-2038
URL: https://issues.apache.org/jira/browse/YARN-2038
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla

Based on YARN-556, we need to update the way AMs learn about container allocations from previous attempts.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3999) RM hangs on draining events
[ https://issues.apache.org/jira/browse/YARN-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692729#comment-14692729 ] Xuan Gong commented on YARN-3999:

Thanks, Jian. Committed into trunk/branch-2/branch-2.7.

RM hangs on draining events
Key: YARN-3999
URL: https://issues.apache.org/jira/browse/YARN-3999
Project: Hadoop YARN
Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
Fix For: 2.7.2
Attachments: YARN-3999-branch-2.7.patch, YARN-3999.1.patch, YARN-3999.2.patch, YARN-3999.2.patch, YARN-3999.3.patch, YARN-3999.4.patch, YARN-3999.5.patch, YARN-3999.patch, YARN-3999.patch

If external systems like ATS or ZK become very slow, draining all the events takes a long time. If this time exceeds 10 minutes, all applications will expire. Fixes include:
1. Add a timeout and stop the dispatcher even if not all events are drained.
2. Move the ATS service out of the RM active services so that the RM doesn't need to wait for ATS to flush the events when transitioning to standby.
3. Stop client-facing services (ClientRMService etc.) first so that clients get fast notification that the RM is stopping/transitioning.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4047) ClientRMService getApplications has high scheduler lock contention
[ https://issues.apache.org/jira/browse/YARN-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692412#comment-14692412 ] Hadoop QA commented on YARN-4047:

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 16m 47s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac | 8m 1s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 59s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 50s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 22s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 27s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests | 53m 27s | Tests failed in hadoop-yarn-server-resourcemanager. |
| | | 92m 53s | |

|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation |
| | hadoop.yarn.server.resourcemanager.TestRMAdminService |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12749935/YARN-4047.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 7c796fd |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8824/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8824/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8824/console |

This message was automatically generated.

ClientRMService getApplications has high scheduler lock contention
Key: YARN-4047
URL: https://issues.apache.org/jira/browse/YARN-4047
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: Jason Lowe
Assignee: Jason Lowe
Labels: 2.6.1-candidate
Attachments: YARN-4047.001.patch

The getApplications call can be particularly expensive because the code can call checkAccess on every application being tracked by the RM. checkAccess will often call scheduler.checkAccess, which grabs the big scheduler lock. This can cause a lot of contention with the scheduler thread, which is busy trying to process node heartbeats, app allocation requests, etc.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3250) Support admin cli interface for Application Priority
[ https://issues.apache.org/jira/browse/YARN-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681416#comment-14681416 ] Hadoop QA commented on YARN-3250:

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12749621/0002-YARN-3250.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / fa1d84a |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8820/console |

This message was automatically generated.

Support admin cli interface for Application Priority
Key: YARN-3250
URL: https://issues.apache.org/jira/browse/YARN-3250
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Sunil G
Assignee: Rohith Sharma K S
Attachments: 0001-YARN-3250-V1.patch, 0002-YARN-3250.patch

The current Application Priority Manager supports configuration only via file. To support runtime configuration via the admin CLI and REST, a common management interface has to be added which can be shared with NodeLabelsManager.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3964) Support NodeLabelsProvider at Resource Manager side
[ https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu updated YARN-3964:

Attachment: YARN-3964.002.patch

Support NodeLabelsProvider at Resource Manager side
Key: YARN-3964
URL: https://issues.apache.org/jira/browse/YARN-3964
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Dian Fu
Assignee: Dian Fu
Attachments: YARN-3964 design doc.pdf, YARN-3964.002.patch, YARN-3964.1.patch

Currently, a CLI/REST API is provided in the Resource Manager to allow users to specify labels for nodes. For labels which may change over time, users have to run a cron job to update the labels. This has the following limitations:
- The cron job needs to run as the YARN admin user.
- It is a little complicated to maintain, as users have to make sure this service/daemon stays alive.
Adding a Node Labels Provider in the Resource Manager will give users more flexibility.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4037) Hadoop - failed redirect for container
[ https://issues.apache.org/jira/browse/YARN-4037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681316#comment-14681316 ] Mohammad Shahid Khan commented on YARN-4037:

Hi Gagan, use the convention below when configuring the log server URL:
{code}
<property>
  <name>yarn.log.server.url</name>
  <value>http://$jobhistoryserver.full.hostname:port/jobhistory/logs</value>
  <description>URL for job history server</description>
</property>
{code}
Note: the port should be the port used for mapreduce.jobhistory.webapp.address. For example, if mapreduce.jobhistory.webapp.address has the value ip1:10988, then the host and port for the log server should be ip1:10988. Please verify your configuration.

Hadoop - failed redirect for container
Key: YARN-4037
URL: https://issues.apache.org/jira/browse/YARN-4037
Project: Hadoop YARN
Issue Type: Bug
Components: yarn
Affects Versions: 2.7.1
Environment: Windows 7, Apache Hadoop 2.7.1
Reporter: Gagan

I believe this issue was addressed earlier in https://issues.apache.org/jira/browse/YARN-1473, though I am not sure, because the description of that JIRA does not mention the following message:
Failed while trying to construct the redirect url to the log server. Log Server url may not be configured java.lang.Exception: Unknown container. Container either has not started or has already completed or doesn't belong to this node at all.
Could someone look at this and provide detail on the root cause and resolution?

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3924) Submitting an application to standby ResourceManager should respond better than Connection Refused
[ https://issues.apache.org/jira/browse/YARN-3924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681526#comment-14681526 ] Rohith Sharma K S commented on YARN-3924:

From the user's side, one would expect different exceptions for *connecting to a standby RM* and *connecting to an invalid/not-started ResourceManager address*. But per the RM HA design, both scenarios are treated the same: a standby RM does not open any RPC server for client communication. If the client tries to submit a job, it retries for a certain amount of time against both configured rm.ha-ids and then throws a ConnectionRefused exception. There are 2 scenarios in which the client might see connection refused:
# Configuring wrong/invalid *ha.rm-ids* at the client is a user mistake; this can be rechecked by the user.
# Both RMs being in standby for a long time is a problem in YARN, and we need to find the reason for that state. Ideally, if there is any issue with ZK, the RM will shut down after some time. If you can share logs for both RMs while they are in standby, that would help the analysis.

Submitting an application to standby ResourceManager should respond better than Connection Refused
Key: YARN-3924
URL: https://issues.apache.org/jira/browse/YARN-3924
Project: Hadoop YARN
Issue Type: Improvement
Components: resourcemanager
Reporter: Dustin Cote
Assignee: Ajith S
Priority: Minor

When submitting an application directly to a standby resource manager, the resource manager responds with 'Connection Refused' rather than indicating that it is a standby resource manager. Because the resource manager is aware of its own state, I feel like we can have the 8032 port open for standby resource managers and reject the request with something like 'Cannot process application submission from this standby resource manager'. This would be especially helpful for debugging oozie problems when users put in the wrong address for the 'jobtracker' (i.e. they don't put the logical RM address but rather point to a specific resource manager).

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3964) Support NodeLabelsProvider at Resource Manager side
[ https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681574#comment-14681574 ] Dian Fu commented on YARN-3964:

Updated the patch with the following changes:
- removed the interface modification to NodeLabelsProvider
- improved the Fetcher implementation to update node labels in batch

Support NodeLabelsProvider at Resource Manager side
Key: YARN-3964
URL: https://issues.apache.org/jira/browse/YARN-3964
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Dian Fu
Assignee: Dian Fu
Attachments: YARN-3964 design doc.pdf, YARN-3964.002.patch, YARN-3964.1.patch

Currently, a CLI/REST API is provided in the Resource Manager to allow users to specify labels for nodes. For labels which may change over time, users have to run a cron job to update the labels. This has the following limitations:
- The cron job needs to run as the YARN admin user.
- It is a little complicated to maintain, as users have to make sure this service/daemon stays alive.
Adding a Node Labels Provider in the Resource Manager will give users more flexibility.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3814) REST API implementation for getting raw entities in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681312#comment-14681312 ] Li Lu commented on YARN-3814:

Thanks [~varun_saxena]!
bq. So if GenericObjectMapper cannot convert the value and throws an Exception, we merely forward whatever came in the request. Hope I am not misunderstanding your comment.
OK, that's fine. Actually I originally meant parseKeyStrValuesStr and parseKeyStrValueStr, not parseKeyStrValueObj, but never mind.
bq. Ok. Will move these helper functions to another file.
Actually my key point here is not about separating the file, but maximizing code reuse among those parse methods. Parts of parseKeyStrValuesStr and parseKeyValue are quite similar. If we can reuse most of the code in parseKeyValue without mixing it into the middle of a sequence of parse helper methods, I'm totally fine with keeping them in the same file.
bq. Would a code comment be fine ?
Sure, but it's certainly not harmful to generate one more log line, especially since the FS storage implementation is for debug only? It's up to you though. :)

REST API implementation for getting raw entities in TimelineReader
Key: YARN-3814
URL: https://issues.apache.org/jira/browse/YARN-3814
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Affects Versions: YARN-2928
Reporter: Varun Saxena
Assignee: Varun Saxena
Attachments: YARN-3814-YARN-2928.01.patch, YARN-3814-YARN-2928.02.patch, YARN-3814-YARN-2928.03.patch, YARN-3814-YARN-2928.04.patch, YARN-3814.reference.patch

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3814) REST API implementation for getting raw entities in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681288#comment-14681288 ] Varun Saxena commented on YARN-3814:

[~gtCarrera9],
bq. Why we're not using if statement but using exception handling to decide the type of pairStrs[1]? I think we can pretty much restrict the type of pairStrs[1] here, so maybe instanceof will do the work?
We are not using exception handling to determine the type. readValue can throw an Exception, which is caught here. So if GenericObjectMapper cannot convert the value and throws an Exception, we merely forward whatever came in the request. I hope I am not misunderstanding your comment.
bq. We may further want to move those parse methods into a helper method collection file?
OK. Will move these helper functions to another file.
bq. Change its name into newEntity or initEntity?
Hmm, well, this function entity is used as shorthand for creating an entity during assertions. Maybe name it newEntity.
bq. But still we need some rationales in the code as comment for why we can swallow the exception. It will be also helpful if we output something about the 404? I think it will be quite hard to debug if we hit an exception which causes a null return value and then a 404.
Would a code comment be fine?
bq. String msg = new String(); instead of using immediate value
OK.
bq. We don't really need a separate JIRA for each set of RESTful APIs since this part is relatively trivial.
I agree. I also meant to do it in a single JIRA.

REST API implementation for getting raw entities in TimelineReader
Key: YARN-3814
URL: https://issues.apache.org/jira/browse/YARN-3814
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Affects Versions: YARN-2928
Reporter: Varun Saxena
Assignee: Varun Saxena
Attachments: YARN-3814-YARN-2928.01.patch, YARN-3814-YARN-2928.02.patch, YARN-3814-YARN-2928.03.patch, YARN-3814-YARN-2928.04.patch, YARN-3814.reference.patch

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3873) pendingApplications in LeafQueue should also use OrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681649#comment-14681649 ] Hudson commented on YARN-3873:

FAILURE: Integrated in Hadoop-Yarn-trunk #1014 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1014/])
YARN-3873. PendingApplications in LeafQueue should also use OrderingPolicy. (Sunil G via wangda) (wangda: rev cf9d3c925608e8bc650d43975382ed3014081057)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestParentQueue.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicyForNodePartitions.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerContext.java

pendingApplications in LeafQueue should also use OrderingPolicy
Key: YARN-3873
URL: https://issues.apache.org/jira/browse/YARN-3873
Project: Hadoop YARN
Issue Type: Sub-task
Components: capacityscheduler
Affects Versions: 2.7.0
Reporter: Sunil G
Assignee: Sunil G
Fix For: 2.8.0
Attachments: 0001-YARN-3873.patch, 0002-YARN-3873.patch, 0003-YARN-3873.patch, 0004-YARN-3873.patch, 0005-YARN-3873.patch, 0006-YARN-3873.patch

Currently *pendingApplications* in LeafQueue uses the {{applicationComparator}} from CapacityScheduler. This can be changed so that pendingApplications uses the OrderingPolicy configured at the queue level (Fifo/Fair, as configured).

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3887) Support for changing Application priority during runtime
[ https://issues.apache.org/jira/browse/YARN-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681650#comment-14681650 ] Hudson commented on YARN-3887:

FAILURE: Integrated in Hadoop-Yarn-trunk #1014 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1014/])
YARN-3887. Support changing Application priority during runtime. Contributed by Sunil G (jianhe: rev fa1d84ae2739a1e76f58b9c96d1378f9453cc0d2)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/AbstractComparatorOrderingPolicy.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationPriority.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java

Support for changing Application priority during runtime
Key: YARN-3887
URL: https://issues.apache.org/jira/browse/YARN-3887
Project: Hadoop YARN
Issue Type: Sub-task
Components: capacityscheduler, resourcemanager
Reporter: Sunil G
Assignee: Sunil G
Fix For: 2.8.0
Attachments: 0001-YARN-3887.patch, 0002-YARN-3887.patch, 0003-YARN-3887.patch, 0004-YARN-3887.patch, 0005-YARN-3887.patch, 0006-YARN-3887.patch

After YARN-2003, adding support to change the priority of an application after submission. This ticket handles the server-side implementation of the same. A new RMAppEvent will be created to handle this, common to all schedulers.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3924) Submitting an application to standby ResourceManager should respond better than Connection Refused
[ https://issues.apache.org/jira/browse/YARN-3924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681677#comment-14681677 ] Ajith S commented on YARN-3924:

Hi [~rohithsharma], +1 and thanks for the input; I agree with you regarding the RM HA design. However, I think what [~cotedm] is conveying is that if both RM nodes in HA are in standby (for whatever reason), the client should get back a reasonable StandbyException instead of connection refused. If I may suggest: can we change it so that the RPC server is started in standby too, but before it sends a response we check whether the RM is active, and otherwise throw StandbyException? Any thoughts?

Submitting an application to standby ResourceManager should respond better than Connection Refused
Key: YARN-3924
URL: https://issues.apache.org/jira/browse/YARN-3924
Project: Hadoop YARN
Issue Type: Improvement
Components: resourcemanager
Reporter: Dustin Cote
Assignee: Ajith S
Priority: Minor

When submitting an application directly to a standby resource manager, the resource manager responds with 'Connection Refused' rather than indicating that it is a standby resource manager. Because the resource manager is aware of its own state, I feel like we can have the 8032 port open for standby resource managers and reject the request with something like 'Cannot process application submission from this standby resource manager'. This would be especially helpful for debugging oozie problems when users put in the wrong address for the 'jobtracker' (i.e. they don't put the logical RM address but rather point to a specific resource manager).

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
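A sketch of that suggestion, assuming the client-facing RPC server also runs on the standby: each handler gates on HA state and throws StandbyException, which client-side retry policies can recognize and use to fail over to the other rm-id. The HAGate class and its methods are illustrative; only StandbyException and HAServiceState are real Hadoop types.
{code}
import org.apache.hadoop.ha.HAServiceProtocol.HAServiceState;
import org.apache.hadoop.ipc.StandbyException;

// Illustrative only: a gate that client-facing RPC handlers would consult.
public class HAGate {
  private volatile HAServiceState state = HAServiceState.STANDBY;

  /** Called at the top of every client-facing RPC handler. */
  public void checkActive() throws StandbyException {
    if (state != HAServiceState.ACTIVE) {
      // Clients that recognize StandbyException can fail over to the other
      // rm-id instead of surfacing a bare "Connection refused".
      throw new StandbyException("ResourceManager is in standby state");
    }
  }

  public void setState(HAServiceState s) { state = s; }
}
{code}
The design trade-off is keeping the standby's client port open (a small extra attack/monitoring surface) in exchange for an error clients can act on rather than a refused connection they cannot distinguish from a dead host.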
[jira] [Commented] (YARN-3964) Support NodeLabelsProvider at Resource Manager side
[ https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681778#comment-14681778 ] Hadoop QA commented on YARN-3964:

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 20m 44s | Pre-patch trunk has 7 extant Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 8 new or modified test files. |
| {color:green}+1{color} | javac | 7m 56s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 44s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 2m 21s | The applied patch generated 2 new checkstyle issues (total was 211, now 212). |
| {color:red}-1{color} | checkstyle | 2m 45s | The applied patch generated 9 new checkstyle issues (total was 0, now 9). |
| {color:red}-1{color} | whitespace | 0m 2s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 24s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. |
| {color:red}-1{color} | findbugs | 6m 15s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests | 0m 21s | Tests failed in hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests | 7m 0s | Tests passed in hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-server-common. |
| {color:green}+1{color} | yarn tests | 6m 38s | Tests passed in hadoop-yarn-server-nodemanager. |
| {color:red}-1{color} | yarn tests | 56m 43s | Tests failed in hadoop-yarn-server-resourcemanager. |
| | | 121m 20s | |

|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-resourcemanager |
| Failed unit tests | hadoop.yarn.conf.TestYarnConfigurationFields |
| | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler |
| | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation |
| | hadoop.yarn.server.resourcemanager.TestRMAdminService |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12749814/YARN-3964.002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / fa1d84a |
| Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8822/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8822/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt https://builds.apache.org/job/PreCommit-YARN-Build/8822/artifact/patchprocess/diffcheckstylehadoop-yarn-server-common.txt |
| whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8822/artifact/patchprocess/whitespace.txt |
| Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8822/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html |
| hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8822/artifact/patchprocess/testrun_hadoop-yarn-api.txt |
| hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8822/artifact/patchprocess/testrun_hadoop-yarn-client.txt |
| hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8822/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt |
| hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8822/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8822/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8822/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8822/console |

This message was automatically generated.

Support NodeLabelsProvider at Resource Manager side
Key: YARN-3964
URL: https://issues.apache.org/jira/browse/YARN-3964
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Dian Fu
Assignee: Dian Fu
[jira] [Commented] (YARN-3873) pendingApplications in LeafQueue should also use OrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681644#comment-14681644 ] Hudson commented on YARN-3873: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #284 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/284/]) YARN-3873. PendingApplications in LeafQueue should also use OrderingPolicy. (Sunil G via wangda) (wangda: rev cf9d3c925608e8bc650d43975382ed3014081057) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicyForNodePartitions.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java pendingApplications in LeafQueue should also use OrderingPolicy --- Key: YARN-3873 URL: https://issues.apache.org/jira/browse/YARN-3873 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Affects Versions: 2.7.0 Reporter: Sunil G Assignee: Sunil G Fix For: 2.8.0 Attachments: 0001-YARN-3873.patch, 0002-YARN-3873.patch, 0003-YARN-3873.patch, 0004-YARN-3873.patch, 0005-YARN-3873.patch, 0006-YARN-3873.patch Currently *pendingApplications* in LeafQueue is using {{applicationComparator}} from CapacityScheduler. This can be changed and pendingApplications can use the OrderingPolicy configured in Queue level (Fifo/Fair as configured). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
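For context on the commit above: the essence of YARN-3873 is that LeafQueue stops ordering its pending applications with a single comparator fixed by CapacityScheduler and instead delegates to the OrderingPolicy configured on the queue. Below is a minimal self-contained sketch of that idea; the interface and class names are illustrative stand-ins, not the actual YARN classes.
{code:java}
import java.util.Comparator;
import java.util.Iterator;
import java.util.TreeSet;

// Illustrative stand-in for the queue-level ordering policy (FIFO/fair).
interface OrderingPolicySketch<T> {
  void add(T entity);
  void remove(T entity);
  Iterator<T> assignmentIterator(); // order in which entities are considered
}

class FifoPolicySketch<T> implements OrderingPolicySketch<T> {
  private final TreeSet<T> entities;
  FifoPolicySketch(Comparator<T> fifoComparator) {
    entities = new TreeSet<>(fifoComparator);
  }
  public void add(T e) { entities.add(e); }
  public void remove(T e) { entities.remove(e); }
  public Iterator<T> assignmentIterator() { return entities.iterator(); }
}

class LeafQueueSketch<A> {
  // Before: TreeSet<A> pendingApplications = new TreeSet<>(schedulerComparator);
  // After: the queue-configured policy owns the ordering of pending apps too.
  private final OrderingPolicySketch<A> pendingOrderingPolicy;

  LeafQueueSketch(OrderingPolicySketch<A> configuredPolicy) {
    this.pendingOrderingPolicy = configuredPolicy;
  }
  void addPendingApplication(A app) { pendingOrderingPolicy.add(app); }
  Iterator<A> activationCandidates() {
    return pendingOrderingPolicy.assignmentIterator();
  }
}
{code}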
[jira] [Commented] (YARN-3999) RM hangs on draing events
[ https://issues.apache.org/jira/browse/YARN-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681583#comment-14681583 ] Hadoop QA commented on YARN-3999: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 20m 54s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 8 new or modified test files. | | {color:green}+1{color} | javac | 7m 48s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 41s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 21s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 3m 2s | The applied patch generated 1 new checkstyle issues (total was 211, now 211). | | {color:red}-1{color} | checkstyle | 3m 40s | The applied patch generated 7 new checkstyle issues (total was 87, now 88). | | {color:red}-1{color} | whitespace | 0m 2s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 23s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 31s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 6m 20s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | common tests | 22m 17s | Tests failed in hadoop-common. | | {color:green}+1{color} | yarn tests | 0m 22s | Tests passed in hadoop-yarn-api. | | {color:red}-1{color} | yarn tests | 1m 54s | Tests failed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 53m 9s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 128m 26s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.net.TestNetUtils | | | hadoop.ha.TestZKFailoverController | | | hadoop.yarn.util.TestRackResolver | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12749775/YARN-3999.5.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / fa1d84a | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8821/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt https://builds.apache.org/job/PreCommit-YARN-Build/8821/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8821/artifact/patchprocess/whitespace.txt | | hadoop-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8821/artifact/patchprocess/testrun_hadoop-common.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8821/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8821/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8821/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8821/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8821/console | This message was automatically generated. RM hangs on draing events - Key: YARN-3999 URL: https://issues.apache.org/jira/browse/YARN-3999 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-3999.1.patch, YARN-3999.2.patch, YARN-3999.2.patch, YARN-3999.3.patch, YARN-3999.4.patch, YARN-3999.5.patch, YARN-3999.patch, YARN-3999.patch If external systems like ATS, or ZK becomes very slow, draining all the events take a lot of time. If this time becomes larger than 10 mins, all applications will expire. Fixes include: 1. add a timeout and stop the dispatcher even if not all events are drained. 2. Move ATS service out from RM active service so that RM doesn't need to wait for ATS to flush the events when transitioning to standby. 3. Stop client-facing services (ClientRMService etc.) first so that clients get fast notification that RM is stopping/transitioning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3887) Support for changing Application priority during runtime
[ https://issues.apache.org/jira/browse/YARN-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681645#comment-14681645 ] Hudson commented on YARN-3887: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #284 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/284/]) YARN-3887. Support changing Application priority during runtime. Contributed by Sunil G (jianhe: rev fa1d84ae2739a1e76f58b9c96d1378f9453cc0d2) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/AbstractComparatorOrderingPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationPriority.java Support for changing Application priority during runtime Key: YARN-3887 URL: https://issues.apache.org/jira/browse/YARN-3887 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager Reporter: Sunil G Assignee: Sunil G Fix For: 2.8.0 Attachments: 0001-YARN-3887.patch, 0002-YARN-3887.patch, 0003-YARN-3887.patch, 0004-YARN-3887.patch, 0005-YARN-3887.patch, 0006-YARN-3887.patch After YARN-2003, adding support to change priority of an application after submission. This ticket will handle the server side implementation for same. A new RMAppEvent will be created to handle this, and will be common for all schedulers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
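A note on why a runtime priority change has to go through the scheduler and its ordering policy: applications sit inside comparator-ordered collections, so a priority field cannot simply be mutated in place; the entity must be removed, updated, and re-inserted, or the collection's ordering invariants break. A small self-contained sketch of that pattern (names are illustrative, not the actual patch):
{code:java}
import java.util.Comparator;
import java.util.TreeSet;

class AppSketch {
  final String id;
  int priority;
  AppSketch(String id, int priority) { this.id = id; this.priority = priority; }
}

class PriorityOrderedQueueSketch {
  private final TreeSet<AppSketch> ordered = new TreeSet<>(
      Comparator.comparingInt((AppSketch a) -> -a.priority) // higher priority first
                .thenComparing(a -> a.id));                 // tie-break for set semantics

  void add(AppSketch app) { ordered.add(app); }

  // Remove, update, re-insert: mutating priority while the app is inside the
  // TreeSet would leave it filed under its old position.
  void updatePriority(AppSketch app, int newPriority) {
    ordered.remove(app);
    app.priority = newPriority;
    ordered.add(app);
  }

  AppSketch next() { return ordered.first(); }
}
{code}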
[jira] [Commented] (YARN-3999) RM hangs on draing events
[ https://issues.apache.org/jira/browse/YARN-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681691#comment-14681691 ] Rohith Sharma K S commented on YARN-3999: - Thanks [~jianhe] for updating the patch. One doubt: SystemMetricsPublisher has been moved from RMActiveServices to ResourceManager, so this service will not be reinitialized on every RM switch. This could lead to processing stale events even after the RM is in standby. If the same RM becomes active again, the SystemMetricsPublisher dispatcher publishes the stale events plus the recovered application events; event processing still happens in sequential order in that case. But an issue may occur when a different RM becomes active, i.e. # RM1 is active and publishing events # RM1 is transitioning to standby, and some events are still in the queue to be written to the timeline server # RM2 becomes active and recovers the applications. When an application finishes, RM2's SystemMetricsPublisher publishes the app status as finished. # RM1 is still processing events for that app and processes them a bit late, i.e. after RM2 already has. Doesn't this cause a problem? Any thoughts? RM hangs on draing events - Key: YARN-3999 URL: https://issues.apache.org/jira/browse/YARN-3999 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-3999.1.patch, YARN-3999.2.patch, YARN-3999.2.patch, YARN-3999.3.patch, YARN-3999.4.patch, YARN-3999.5.patch, YARN-3999.patch, YARN-3999.patch If external systems like ATS, or ZK becomes very slow, draining all the events take a lot of time. If this time becomes larger than 10 mins, all applications will expire. Fixes include: 1. add a timeout and stop the dispatcher even if not all events are drained. 2. Move ATS service out from RM active service so that RM doesn't need to wait for ATS to flush the events when transitioning to standby. 3. Stop client-facing services (ClientRMService etc.) first so that clients get fast notification that RM is stopping/transitioning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3873) pendingApplications in LeafQueue should also use OrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681702#comment-14681702 ] Sunil G commented on YARN-3873: --- Thank you very much [~leftnoteasy] for the review and commit! pendingApplications in LeafQueue should also use OrderingPolicy --- Key: YARN-3873 URL: https://issues.apache.org/jira/browse/YARN-3873 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Affects Versions: 2.7.0 Reporter: Sunil G Assignee: Sunil G Fix For: 2.8.0 Attachments: 0001-YARN-3873.patch, 0002-YARN-3873.patch, 0003-YARN-3873.patch, 0004-YARN-3873.patch, 0005-YARN-3873.patch, 0006-YARN-3873.patch Currently *pendingApplications* in LeafQueue is using {{applicationComparator}} from CapacityScheduler. This can be changed and pendingApplications can use the OrderingPolicy configured in Queue level (Fifo/Fair as configured). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3887) Support for changing Application priority during runtime
[ https://issues.apache.org/jira/browse/YARN-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681920#comment-14681920 ] Hudson commented on YARN-3887: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #281 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/281/]) YARN-3887. Support changing Application priority during runtime. Contributed by Sunil G (jianhe: rev fa1d84ae2739a1e76f58b9c96d1378f9453cc0d2) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/AbstractComparatorOrderingPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationPriority.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java Support for changing Application priority during runtime Key: YARN-3887 URL: https://issues.apache.org/jira/browse/YARN-3887 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager Reporter: Sunil G Assignee: Sunil G Fix For: 2.8.0 Attachments: 0001-YARN-3887.patch, 0002-YARN-3887.patch, 0003-YARN-3887.patch, 0004-YARN-3887.patch, 0005-YARN-3887.patch, 0006-YARN-3887.patch After YARN-2003, adding support to change priority of an application after submission. This ticket will handle the server side implementation for same. A new RMAppEvent will be created to handle this, and will be common for all schedulers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3873) pendingApplications in LeafQueue should also use OrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681916#comment-14681916 ] Hudson commented on YARN-3873: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2230 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2230/]) YARN-3873. PendingApplications in LeafQueue should also use OrderingPolicy. (Sunil G via wangda) (wangda: rev cf9d3c925608e8bc650d43975382ed3014081057) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicyForNodePartitions.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java pendingApplications in LeafQueue should also use OrderingPolicy --- Key: YARN-3873 URL: https://issues.apache.org/jira/browse/YARN-3873 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Affects Versions: 2.7.0 Reporter: Sunil G Assignee: Sunil G Fix For: 2.8.0 Attachments: 0001-YARN-3873.patch, 0002-YARN-3873.patch, 0003-YARN-3873.patch, 0004-YARN-3873.patch, 0005-YARN-3873.patch, 0006-YARN-3873.patch Currently *pendingApplications* in LeafQueue is using {{applicationComparator}} from CapacityScheduler. This can be changed and pendingApplications can use the OrderingPolicy configured in Queue level (Fifo/Fair as configured). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4037) Hadoop - failed redirect for container
[ https://issues.apache.org/jira/browse/YARN-4037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gagan updated YARN-4037: Attachment: mapred-site.xml yarn-site.xml Hi Mohammad, maybe I am making a fundamental mistake. My goal is to run the wordcount example jar with 2.7.1 on YARN in pseudo-distributed mode (on my laptop). When I run it I get an exception; to check what it is, I navigate from http://garima-pc:8088/cluster/apps/FAILED to http://garima-pc:8088/cluster/app/application_1439303739376_0001 to the logs link, and after implementing what you described it takes me to http://garima-pc:19888/jobhistory/logs/Garima-PC:50415/container_1439303739376_0001_02_01/container_1439303739376_0001_02_01/Garima, which is broken. It seems port 50415 is being generated dynamically. I am attaching my configuration xml files. Hadoop - failed redirect for container -- Key: YARN-4037 URL: https://issues.apache.org/jira/browse/YARN-4037 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.7.1 Environment: Windows 7, Apache Hadoop 2.7.1 Reporter: Gagan Attachments: mapred-site.xml, yarn-site.xml I believe this issue has been addressed earlier in https://issues.apache.org/jira/browse/YARN-1473 though I am not sure because the description of that JIRA does not mention the following message: Failed while trying to construct the redirect url to the log server. Log Server url may not be configured java.lang.Exception: Unknown container. Container either has not started or has already completed or doesn't belong to this node at all. Could someone look at this and provide details on the root cause and resolution? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
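For anyone hitting the same broken redirect: the "Failed while trying to construct the redirect url to the log server. Log Server url may not be configured" message quoted in the description usually means log aggregation and the log server URL are not set, and the dynamically chosen port (50415 above) is just the NM address recorded for the container, which is normal. A hedged sketch of the two properties that typically matter, shown programmatically here though they would normally live in yarn-site.xml (the host below is this reporter's machine, used purely as a placeholder):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class LogRedirectConfigSketch {
  public static Configuration sketch() {
    Configuration conf = new YarnConfiguration();
    // Ship finished-container logs off the NM's local disk.
    conf.setBoolean("yarn.log-aggregation-enable", true);
    // Tell the NM web UI where the log server lives, so the redirect for
    // completed containers has a valid target.
    conf.set("yarn.log.server.url", "http://garima-pc:19888/jobhistory/logs");
    return conf;
  }
}
{code}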
[jira] [Commented] (YARN-3887) Support for changing Application priority during runtime
[ https://issues.apache.org/jira/browse/YARN-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681917#comment-14681917 ] Hudson commented on YARN-3887: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2230 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2230/]) YARN-3887. Support changing Application priority during runtime. Contributed by Sunil G (jianhe: rev fa1d84ae2739a1e76f58b9c96d1378f9453cc0d2) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/AbstractComparatorOrderingPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationPriority.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java Support for changing Application priority during runtime Key: YARN-3887 URL: https://issues.apache.org/jira/browse/YARN-3887 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager Reporter: Sunil G Assignee: Sunil G Fix For: 2.8.0 Attachments: 0001-YARN-3887.patch, 0002-YARN-3887.patch, 0003-YARN-3887.patch, 0004-YARN-3887.patch, 0005-YARN-3887.patch, 0006-YARN-3887.patch After YARN-2003, adding support to change priority of an application after submission. This ticket will handle the server side implementation for same. A new RMAppEvent will be created to handle this, and will be common for all schedulers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3887) Support for changing Application priority during runtime
[ https://issues.apache.org/jira/browse/YARN-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681964#comment-14681964 ] Hudson commented on YARN-3887: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2211 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2211/]) YARN-3887. Support changing Application priority during runtime. Contributed by Sunil G (jianhe: rev fa1d84ae2739a1e76f58b9c96d1378f9453cc0d2) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/AbstractComparatorOrderingPolicy.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationPriority.java Support for changing Application priority during runtime Key: YARN-3887 URL: https://issues.apache.org/jira/browse/YARN-3887 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager Reporter: Sunil G Assignee: Sunil G Fix For: 2.8.0 Attachments: 0001-YARN-3887.patch, 0002-YARN-3887.patch, 0003-YARN-3887.patch, 0004-YARN-3887.patch, 0005-YARN-3887.patch, 0006-YARN-3887.patch After YARN-2003, adding support to change priority of an application after submission. This ticket will handle the server side implementation for same. A new RMAppEvent will be created to handle this, and will be common for all schedulers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3887) Support for changing Application priority during runtime
[ https://issues.apache.org/jira/browse/YARN-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681998#comment-14681998 ] Hudson commented on YARN-3887: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #273 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/273/]) YARN-3887. Support changing Application priority during runtime. Contributed by Sunil G (jianhe: rev fa1d84ae2739a1e76f58b9c96d1378f9453cc0d2) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationPriority.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/AbstractComparatorOrderingPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java Support for changing Application priority during runtime Key: YARN-3887 URL: https://issues.apache.org/jira/browse/YARN-3887 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager Reporter: Sunil G Assignee: Sunil G Fix For: 2.8.0 Attachments: 0001-YARN-3887.patch, 0002-YARN-3887.patch, 0003-YARN-3887.patch, 0004-YARN-3887.patch, 0005-YARN-3887.patch, 0006-YARN-3887.patch After YARN-2003, adding support to change priority of an application after submission. This ticket will handle the server side implementation for same. A new RMAppEvent will be created to handle this, and will be common for all schedulers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3873) pendingApplications in LeafQueue should also use OrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681997#comment-14681997 ] Hudson commented on YARN-3873: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #273 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/273/]) YARN-3873. PendingApplications in LeafQueue should also use OrderingPolicy. (Sunil G via wangda) (wangda: rev cf9d3c925608e8bc650d43975382ed3014081057) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicyForNodePartitions.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestParentQueue.java pendingApplications in LeafQueue should also use OrderingPolicy --- Key: YARN-3873 URL: https://issues.apache.org/jira/browse/YARN-3873 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Affects Versions: 2.7.0 Reporter: Sunil G Assignee: Sunil G Fix For: 2.8.0 Attachments: 0001-YARN-3873.patch, 0002-YARN-3873.patch, 0003-YARN-3873.patch, 0004-YARN-3873.patch, 0005-YARN-3873.patch, 0006-YARN-3873.patch Currently *pendingApplications* in LeafQueue is using {{applicationComparator}} from CapacityScheduler. This can be changed and pendingApplications can use the OrderingPolicy configured in Queue level (Fifo/Fair as configured). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3999) RM hangs on draing events
[ https://issues.apache.org/jira/browse/YARN-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14682011#comment-14682011 ] Jian He commented on YARN-3999: --- I talked to [~zjshen] about this too. I think it's fine, as the event processing order is not that critical. Also, each timeline entity has a timestamp which itself indicates the order of the events. IMO, this is similar to multiple containers writing to ATS at the same time: there's no guarantee that the earliest generated event gets published into ATS first. RM hangs on draing events - Key: YARN-3999 URL: https://issues.apache.org/jira/browse/YARN-3999 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-3999.1.patch, YARN-3999.2.patch, YARN-3999.2.patch, YARN-3999.3.patch, YARN-3999.4.patch, YARN-3999.5.patch, YARN-3999.patch, YARN-3999.patch If external systems like ATS, or ZK becomes very slow, draining all the events take a lot of time. If this time becomes larger than 10 mins, all applications will expire. Fixes include: 1. add a timeout and stop the dispatcher even if not all events are drained. 2. Move ATS service out from RM active service so that RM doesn't need to wait for ATS to flush the events when transitioning to standby. 3. Stop client-facing services (ClientRMService etc.) first so that clients get fast notification that RM is stopping/transitioning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
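To make fix (1) from the issue description concrete, here is a minimal sketch of a dispatcher whose stop path drains the queue only up to a deadline and then shuts down even if events remain. This is an illustrative stand-in for the AsyncDispatcher change, not the actual patch:
{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class BoundedDrainDispatcherSketch {
  private final BlockingQueue<Runnable> eventQueue = new LinkedBlockingQueue<>();
  private volatile boolean stopped = false;

  void dispatch(Runnable event) {
    if (!stopped) {
      eventQueue.offer(event);
    }
  }

  // Returns true only if the queue fully drained before the deadline.
  boolean stop(long drainTimeoutMs) throws InterruptedException {
    stopped = true; // reject new events
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(drainTimeoutMs);
    while (!eventQueue.isEmpty() && System.nanoTime() < deadline) {
      Runnable event = eventQueue.poll(100, TimeUnit.MILLISECONDS);
      if (event != null) {
        event.run();
      }
    }
    // Without the deadline, a slow downstream system (ATS, ZK) could hold the
    // RM here past the 10-minute application expiry window described above.
    return eventQueue.isEmpty();
  }
}
{code}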
[jira] [Commented] (YARN-3212) RMNode State Transition Update with DECOMMISSIONING state
[ https://issues.apache.org/jira/browse/YARN-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14682044#comment-14682044 ] Junping Du commented on YARN-3212: -- Can someone give it a review? With this patch in, the basic flow for graceful decommission works now. Thanks! RMNode State Transition Update with DECOMMISSIONING state - Key: YARN-3212 URL: https://issues.apache.org/jira/browse/YARN-3212 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Junping Du Assignee: Junping Du Attachments: RMNodeImpl - new.png, YARN-3212-v1.patch, YARN-3212-v2.patch, YARN-3212-v3.patch, YARN-3212-v4.1.patch, YARN-3212-v4.patch, YARN-3212-v5.1.patch, YARN-3212-v5.patch As proposed in YARN-914, a new state of “DECOMMISSIONING” will be added and can transition from the “running” state, triggered by a new event - “decommissioning”. This new state can transition to “decommissioned” on Resource_Update if there are no running apps on this NM or the NM reconnects after restart, or when it receives a DECOMMISSIONED event (after timeout from the CLI). In addition, it can go back to “running” if the user decides to cancel the previous decommission by calling recommission on the same node. The reaction to other events is similar to the RUNNING state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
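The DECOMMISSIONING flow described in this issue can be summarized as a small transition table. The real implementation uses YARN's StateMachineFactory inside RMNodeImpl; the toy table below is only meant to make the proposed transitions easy to scan, and the event names are illustrative:
{code:java}
import java.util.EnumMap;
import java.util.Map;

class NodeStateSketch {
  enum State { RUNNING, DECOMMISSIONING, DECOMMISSIONED }
  enum Event { GRACEFUL_DECOMMISSION, RECOMMISSION, DECOMMISSION, RESOURCE_UPDATE_NO_RUNNING_APPS }

  private static final Map<State, Map<Event, State>> TABLE = new EnumMap<>(State.class);
  static {
    Map<Event, State> fromRunning = new EnumMap<>(Event.class);
    fromRunning.put(Event.GRACEFUL_DECOMMISSION, State.DECOMMISSIONING);
    fromRunning.put(Event.DECOMMISSION, State.DECOMMISSIONED);
    TABLE.put(State.RUNNING, fromRunning);

    Map<Event, State> fromDecommissioning = new EnumMap<>(Event.class);
    fromDecommissioning.put(Event.RECOMMISSION, State.RUNNING);         // user cancels
    fromDecommissioning.put(Event.DECOMMISSION, State.DECOMMISSIONED);  // timeout from CLI
    fromDecommissioning.put(Event.RESOURCE_UPDATE_NO_RUNNING_APPS, State.DECOMMISSIONED);
    TABLE.put(State.DECOMMISSIONING, fromDecommissioning);
  }

  // Unlisted events keep the current state, mirroring "similar to RUNNING".
  static State transition(State current, Event event) {
    State next = TABLE.getOrDefault(current, Map.of()).get(event);
    return next != null ? next : current;
  }
}
{code}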
[jira] [Created] (YARN-4044) Running applications information changes such as movequeue is not published to TimeLine server
Sunil G created YARN-4044: - Summary: Running applications information changes such as movequeue is not published to TimeLine server Key: YARN-4044 URL: https://issues.apache.org/jira/browse/YARN-4044 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, timelineserver Affects Versions: 2.7.0 Reporter: Sunil G Assignee: Sunil G Priority: Critical SystemMetricsPublisher needs to expose an appUpdated API to publish any change for a running application. Events can be - change of queue for a running application. - change of application priority for a running application. This ticket intends to handle both RM and timeline side changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4044) Running applications information changes such as movequeue is not published to TimeLine server
[ https://issues.apache.org/jira/browse/YARN-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14682033#comment-14682033 ] Sunil G commented on YARN-4044: --- Timeline v2 changes can be tracked in a separate ticket once the API changes are done. I will file a ticket under the v2 umbrella JIRA if there are no objections. Running applications information changes such as movequeue is not published to TimeLine server -- Key: YARN-4044 URL: https://issues.apache.org/jira/browse/YARN-4044 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, timelineserver Affects Versions: 2.7.0 Reporter: Sunil G Assignee: Sunil G Priority: Critical SystemMetricsPublisher needs to expose an appUpdated API to publish any change for a running application. Events can be - change of queue for a running application. - change of application priority for a running application. This ticket intends to handle both RM and timeline side changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
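A rough sketch of the appUpdated hook this ticket asks SystemMetricsPublisher to expose: when a running application's queue or priority changes, publish an update event carrying the new values. Every name below is an illustrative placeholder, not the final API:
{code:java}
import java.util.HashMap;
import java.util.Map;

class SystemMetricsPublisherSketch {
  interface TimelineSink {
    void publish(String entityId, String eventType, Map<String, Object> info);
  }

  private final TimelineSink sink;
  SystemMetricsPublisherSketch(TimelineSink sink) { this.sink = sink; }

  // Hypothetical hook: called by the RM whenever a running app is updated.
  void appUpdated(String appId, String newQueue, int newPriority, long updatedTime) {
    Map<String, Object> info = new HashMap<>();
    info.put("YARN_APPLICATION_QUEUE", newQueue);        // illustrative keys
    info.put("YARN_APPLICATION_PRIORITY", newPriority);
    info.put("YARN_APPLICATION_UPDATED_TIME", updatedTime);
    sink.publish(appId, "YARN_APPLICATION_UPDATED", info);
  }
}
{code}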
[jira] [Commented] (YARN-3212) RMNode State Transition Update with DECOMMISSIONING state
[ https://issues.apache.org/jira/browse/YARN-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14682082#comment-14682082 ] Sunil G commented on YARN-3212: --- Hi [~djp], I have one doubt here. For {{StatusUpdateWhenHealthyTransition}}, if the node is in DECOMMISSIONING state initially, we now move it to DECOMMISSIONED directly. Could we give it a chance to move to UNHEALTHY here, so that after some rounds we can mark it DECOMMISSIONED if it cannot be revived? Your thoughts? RMNode State Transition Update with DECOMMISSIONING state - Key: YARN-3212 URL: https://issues.apache.org/jira/browse/YARN-3212 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Junping Du Assignee: Junping Du Attachments: RMNodeImpl - new.png, YARN-3212-v1.patch, YARN-3212-v2.patch, YARN-3212-v3.patch, YARN-3212-v4.1.patch, YARN-3212-v4.patch, YARN-3212-v5.1.patch, YARN-3212-v5.patch As proposed in YARN-914, a new state of “DECOMMISSIONING” will be added and can transition from the “running” state, triggered by a new event - “decommissioning”. This new state can transition to “decommissioned” on Resource_Update if there are no running apps on this NM or the NM reconnects after restart, or when it receives a DECOMMISSIONED event (after timeout from the CLI). In addition, it can go back to “running” if the user decides to cancel the previous decommission by calling recommission on the same node. The reaction to other events is similar to the RUNNING state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4045) Negative avaialbleMB is being reported for root queue.
Rushabh S Shah created YARN-4045: Summary: Negative avaialbleMB is being reported for root queue. Key: YARN-4045 URL: https://issues.apache.org/jira/browse/YARN-4045 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Reporter: Rushabh S Shah We recently deployed 2.7 in one of our cluster. We are seeing negative availableMB being reported for queue=root. This is from the jmx output: {noformat} clusterMetrics ... availableMB-163328/availableMB ... /clusterMetrics {noformat} The following is the RM log: {noformat} 2015-08-10 14:42:28,280 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:28,404 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:30,913 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:30,913 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:33,093 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:33,093 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:35,548 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:35,549 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:39,088 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:39,089 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:39,338 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:39,339 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:39,757 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 
absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:39,758 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:43,056 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:43,070 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:44,486 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:44,487 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212
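The log lines above already show the inconsistency: used memory exceeds the cluster total (usedCapacity > 1.0), so any availableMB derived from them has to go negative. A quick check against the logged numbers (the jmx value of -163328 was presumably captured at a different moment; the point here is only the sign):
{code:java}
public class AvailableMbCheck {
  public static void main(String[] args) {
    long clusterMB = 5_316_608L; // cluster=memory:5316608 from the log above
    long usedMB = 5_334_016L;    // used=memory:5334016 at assignedContainer time
    System.out.printf("usedCapacity = %.7f%n", usedMB / (double) clusterMB); // 1.0032743, matches the log
    System.out.println("availableMB = " + (clusterMB - usedMB));             // -17408 at that instant
  }
}
{code}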
[jira] [Commented] (YARN-4045) Negative avaialbleMB is being reported for root queue.
[ https://issues.apache.org/jira/browse/YARN-4045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14682118#comment-14682118 ] Rushabh S Shah commented on YARN-4045: -- bq. Thanks Rushabh S Shah for reporting this. One doubt, Which ResourceCalculator is used here? Is it Dominant RC. yes. Negative avaialbleMB is being reported for root queue. -- Key: YARN-4045 URL: https://issues.apache.org/jira/browse/YARN-4045 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Reporter: Rushabh S Shah We recently deployed 2.7 in one of our cluster. We are seeing negative availableMB being reported for queue=root. This is from the jmx output: {noformat} clusterMetrics ... availableMB-163328/availableMB ... /clusterMetrics {noformat} The following is the RM log: {noformat} 2015-08-10 14:42:28,280 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:28,404 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:30,913 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:30,913 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:33,093 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:33,093 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:35,548 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:35,549 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:39,088 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:39,089 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:39,338 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:39,339 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root 
usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:39,757 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:39,758 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:43,056 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:43,070 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:44,486 [ResourceManager Event
[jira] [Commented] (YARN-4023) Publish Application Priority to TimelineServer
[ https://issues.apache.org/jira/browse/YARN-4023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14682077#comment-14682077 ] Rohith Sharma K S commented on YARN-4023: - +1 for the latest patch. If there are no objections, I will commit it. Publish Application Priority to TimelineServer -- Key: YARN-4023 URL: https://issues.apache.org/jira/browse/YARN-4023 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-4023.patch, 0001-YARN-4023.patch, ApplicationPage.png, TimelineserverMainpage.png Publish Application priority details to Timeline Server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
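For reference, a minimal sketch of what publishing the priority to the v1 timeline service can look like: attach it as an "otherinfo" field on the application's TimelineEntity so it can be rendered on the application page. The field name used here is an assumption for illustration; the patch defines the real constant:
{code:java}
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;

public class PriorityPublishSketch {
  static TimelineEntity withPriority(String appId, int priority) {
    TimelineEntity entity = new TimelineEntity();
    entity.setEntityId(appId);
    entity.setEntityType("YARN_APPLICATION");
    // otherinfo fields surface on the timeline web UI's application page.
    entity.addOtherInfo("YARN_APPLICATION_PRIORITY", priority); // illustrative key
    return entity;
  }
}
{code}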
[jira] [Commented] (YARN-3924) Submitting an application to standby ResourceManager should respond better than Connection Refused
[ https://issues.apache.org/jira/browse/YARN-3924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14682099#comment-14682099 ] Rohith Sharma K S commented on YARN-3924: - I agree with the concern that the user should be able to get a standby exception. I am not sure whether this point was discussed when RM HA was initially designed. cc: [~ka...@cloudera.com] [~jianhe] [~xgong] [~vinodkv] for more discussion on this. Submitting an application to standby ResourceManager should respond better than Connection Refused -- Key: YARN-3924 URL: https://issues.apache.org/jira/browse/YARN-3924 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Dustin Cote Assignee: Ajith S Priority: Minor When submitting an application directly to a standby resource manager, the resource manager responds with 'Connection Refused' rather than indicating that it is a standby resource manager. Because the resource manager is aware of its own state, I feel like we can have the 8032 port open for standby resource managers and reject the request with something like 'Cannot process application submission from this standby resource manager'. This would be especially helpful for debugging oozie problems when users put in the wrong address for the 'jobtracker' (i.e. they don't put the logical RM address but rather point to a specific resource manager). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
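A sketch of the behaviour being requested: the standby RM keeps its client port open and fails requests with a descriptive exception instead of letting clients see "Connection refused". Hadoop already has org.apache.hadoop.ipc.StandbyException for this pattern (HDFS HA signals standby NameNodes the same way); the surrounding service structure below is illustrative only:
{code:java}
import java.io.IOException;
import org.apache.hadoop.ipc.StandbyException;

class ClientRMServiceSketch {
  private volatile boolean active = false;

  void setActive(boolean isActive) { this.active = isActive; }

  void submitApplication(String appId) throws IOException {
    if (!active) {
      // The port stays open; the client gets a meaningful error to act on.
      throw new StandbyException(
          "Cannot process application submission: this ResourceManager is standby");
    }
    // ... actual submission path ...
  }
}
{code}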
[jira] [Commented] (YARN-3906) split the application table from the entity table
[ https://issues.apache.org/jira/browse/YARN-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14682102#comment-14682102 ] Junping Du commented on YARN-3906: -- Thanks [~sjlee0] for the patch work and [~gtCarrera9] for the review! The latest patch LGTM. However, I will wait for our decision on the sequencing of YARN-4025. split the application table from the entity table - Key: YARN-3906 URL: https://issues.apache.org/jira/browse/YARN-3906 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: YARN-3906-YARN-2928.001.patch, YARN-3906-YARN-2928.002.patch, YARN-3906-YARN-2928.003.patch, YARN-3906-YARN-2928.004.patch, YARN-3906-YARN-2928.005.patch, YARN-3906-YARN-2928.006.patch, YARN-3906-YARN-2928.007.patch Per discussions on YARN-3815, we need to split the application entities from the main entity table into their own table (application). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
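For readers following along, the gist of the split shows up in the row-key design: application rows move to their own table whose key no longer carries the entity-type and entity-id components needed by the generic entity table. A simplified sketch of the two key shapes (component order and separator here are illustrative; the patch defines the real encoding):
{code:java}
class TimelineRowKeySketch {
  // Generic entity table: one row per entity of any type under an app.
  static String entityRowKey(String cluster, String user, String flow,
      long runId, String appId, String entityType, String entityId) {
    return String.join("!", cluster, user, flow, Long.toString(runId),
        appId, entityType, entityId);
  }

  // Dedicated application table: one row per application, shorter key.
  static String applicationRowKey(String cluster, String user, String flow,
      long runId, String appId) {
    return String.join("!", cluster, user, flow, Long.toString(runId), appId);
  }
}
{code}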
[jira] [Commented] (YARN-4045) Negative avaialbleMB is being reported for root queue.
[ https://issues.apache.org/jira/browse/YARN-4045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14682110#comment-14682110 ] Sunil G commented on YARN-4045: --- Thanks [~shahrs87] for reporting this. One doubt, Which ResourceCalculator is used here? Is it Dominant RC. Negative avaialbleMB is being reported for root queue. -- Key: YARN-4045 URL: https://issues.apache.org/jira/browse/YARN-4045 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Reporter: Rushabh S Shah We recently deployed 2.7 in one of our cluster. We are seeing negative availableMB being reported for queue=root. This is from the jmx output: {noformat} clusterMetrics ... availableMB-163328/availableMB ... /clusterMetrics {noformat} The following is the RM log: {noformat} 2015-08-10 14:42:28,280 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:28,404 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:30,913 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:30,913 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:33,093 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:33,093 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:35,548 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:35,549 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:39,088 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:39,089 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:39,338 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:39,339 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root 
usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:39,757 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:39,758 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:43,056 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=memory:5332480, vCores:6202 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:43,070 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=memory:5334016, vCores:6212 cluster=memory:5316608, vCores:28320 2015-08-10 14:42:44,486 [ResourceManager Event Processor] INFO
[jira] [Commented] (YARN-3999) RM hangs on draining events
[ https://issues.apache.org/jira/browse/YARN-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14682112#comment-14682112 ] Rohith Sharma K S commented on YARN-3999: - Thanks [~jianhe] for the explanation. Overall the patch looks good to me. RM hangs on draining events - Key: YARN-3999 URL: https://issues.apache.org/jira/browse/YARN-3999 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-3999.1.patch, YARN-3999.2.patch, YARN-3999.2.patch, YARN-3999.3.patch, YARN-3999.4.patch, YARN-3999.5.patch, YARN-3999.patch, YARN-3999.patch If external systems like ATS or ZK become very slow, draining all the events takes a long time. If this time exceeds 10 minutes, all applications will expire. Fixes include: 1. Add a timeout and stop the dispatcher even if not all events are drained. 2. Move the ATS service out of the RM active services so that the RM doesn't need to wait for ATS to flush the events when transitioning to standby. 3. Stop client-facing services (ClientRMService etc.) first so that clients get fast notification that the RM is stopping/transitioning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
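To make fix (1) from the YARN-3999 description concrete, here is a minimal sketch of a drain-with-timeout on service stop. This is an illustrative stand-in, not the actual AsyncDispatcher code; the class, field names, and the 10-second default are assumptions:
{code}
// Illustrative sketch only: on stop, wait for the queue to drain, but give
// up after a timeout so a slow ATS/ZK cannot block the RM transition.
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class DrainingDispatcherSketch {
  private final BlockingQueue<Object> eventQueue = new LinkedBlockingQueue<>();
  private volatile boolean stopped = false;
  private final long drainTimeoutMs = 10_000; // assumed configurable

  public void serviceStop() throws InterruptedException {
    long deadline = System.currentTimeMillis() + drainTimeoutMs;
    // Poll until the pending events are drained or the deadline passes.
    while (!eventQueue.isEmpty() && System.currentTimeMillis() < deadline) {
      Thread.sleep(100);
    }
    stopped = true; // the dispatcher thread checks this flag and exits
  }
}
{code}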
[jira] [Commented] (YARN-4045) Negative availableMB is being reported for root queue.
[ https://issues.apache.org/jira/browse/YARN-4045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14682115#comment-14682115 ] Thomas Graves commented on YARN-4045: - I remember seeing that this was fixed in branch-2 by some of the capacity scheduler work for labels. I thought this might be fixed by https://issues.apache.org/jira/browse/YARN-3243, but that is already included. This might be fixed as part of https://issues.apache.org/jira/browse/YARN-3361, which is probably too big to backport in its entirety. [~leftnoteasy] Do you remember this issue? Note that it also shows up in the capacity scheduler UI as the root queue going over 100%. I remember that when I was testing YARN-3434 it wasn't occurring for me on branch-2 (2.8), and I thought it was one of the above jiras that fixed it. Negative availableMB is being reported for root queue. -- Key: YARN-4045 URL: https://issues.apache.org/jira/browse/YARN-4045 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Reporter: Rushabh S Shah We recently deployed 2.7 in one of our clusters. We are seeing negative availableMB being reported for queue=root. This is from the jmx output:
{noformat}
<clusterMetrics> ... <availableMB>-163328</availableMB> ... </clusterMetrics>
{noformat}
The following is the RM log:
{noformat}
2015-08-10 14:42:28,280 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=<memory:5332480, vCores:6202> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:28,404 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=<memory:5334016, vCores:6212> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:30,913 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=<memory:5332480, vCores:6202> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:30,913 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=<memory:5334016, vCores:6212> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:33,093 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=<memory:5332480, vCores:6202> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:33,093 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=<memory:5334016, vCores:6212> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:35,548 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=<memory:5332480, vCores:6202> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:35,549 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=<memory:5334016, vCores:6212> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:39,088 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=<memory:5332480, vCores:6202> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:39,089 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=<memory:5334016, vCores:6212> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:39,338 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=<memory:5332480, vCores:6202> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:39,339 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=<memory:5334016, vCores:6212> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:39,757 [ResourceManager Event Processor] INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 absoluteUsedCapacity=1.0029854 used=<memory:5332480, vCores:6202> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:39,758 [ResourceManager Event Processor] INFO capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 absoluteUsedCapacity=1.0032743 used=<memory:5334016, vCores:6212> cluster=<memory:5316608, vCores:28320>
2015-08-10 14:42:43,056 [ResourceManager Event Processor]
{noformat}
[jira] [Commented] (YARN-3979) AM in ResourceLocalizationService hangs 10 min, causing RM to kill AM
[ https://issues.apache.org/jira/browse/YARN-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14682240#comment-14682240 ] Rohith Sharma K S commented on YARN-3979: - I had a look at the shared RM logs, and I strongly suspect this is due to the same reason as YARN-3990. From the shared log, I see the entries below, which indicate that the AsyncDispatcher is overloaded with unnecessary events. Maybe you can apply the patch from YARN-3990 and test it.
{noformat}
2015-07-29 01:58:27,112 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: BJHC-HERA-18352.hadoop.jd.local:50086 Node Transitioned from RUNNING to LOST
2015-07-29 01:58:27,112 INFO org.apache.hadoop.yarn.util.RackResolver: Resolved BJHC-HADOOP-HERA-17280.jd.local to /rack/rack4065
2015-07-29 01:58:27,112 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 2515000
2015-07-29 01:58:27,112 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 2515000
2015-07-29 01:58:27,112 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: NodeManager from node BJHC-HADOOP-HERA-17280.jd.local(cmPort: 50086 httpPort: 8042) registered with capability: <memory:57344, vCores:28>, assigned nodeId BJHC-HADOOP-HERA-17280.jd.local:50086
2015-07-29 01:58:27,112 INFO org.apache.hadoop.yarn.util.RackResolver: Resolved BJHC-HERA-164102.hadoop.jd.local to /rack/rack41007
2015-07-29 01:58:27,112 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: NodeManager from node BJHC-HERA-164102.hadoop.jd.local(cmPort: 50086 httpPort: 8042) registered with capability: <memory:57344, vCores:28>, assigned nodeId BJHC-HERA-164102.hadoop.jd.local:50086
2015-07-29 01:58:27,112 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 2516000
2015-07-29 01:58:27,112 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 2516000
2015-07-29 01:58:27,112 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Node not found resyncing BJHC-HERA-18043.hadoop.jd.local:50086
2015-07-29 01:58:27,112 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 2517000
2015-07-29 01:58:27,112 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 2517000
2015-07-29 01:58:27,113 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 2518000
2015-07-29 01:58:27,113 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 2518000
2015-07-29 01:58:27,113 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 2519000
{noformat}
AM in ResourceLocalizationService hangs 10 min, causing RM to kill AM --- Key: YARN-3979 URL: https://issues.apache.org/jira/browse/YARN-3979 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Environment: CentOS 6.5 Hadoop-2.2.0 Reporter: zhangyubiao Attachments: ERROR103.log
2015-07-27 02:46:17,348 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_1437735375558_104282_01_01
2015-07-27 02:56:18,510 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1437735375558_104282_01 (auth:SIMPLE)
2015-07-27 02:56:18,510 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for appattempt_1437735375558_104282_01 (auth:TOKEN) for protocol=interface org.apache.hadoop.yarn.api.ContainerManagementProtocolPB -- This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-4046) Applications fail on NM restart on some linux distro because NM container recovery declares AM container as LOST
[ https://issues.apache.org/jira/browse/YARN-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14682284#comment-14682284 ] Anubhav Dhoot commented on YARN-4046: - The error in the NodeManager shows:
{noformat}
2015-08-10 15:14:05,567 ERROR org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch: Unable to recover container container_e45_1439244348718_0001_01_01
java.io.IOException: Timeout while waiting for exit code from container_e45_1439244348718_0001_01_01
at org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:199)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:83)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:46)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
{noformat}
Looking under the debugger, the actual shell command that checks whether the container is alive fails because the kill syntax kill -0 -20773 is rejected:
{noformat}
this = {org.apache.hadoop.util.Shell$ShellCommandExecutor@6740} kill -0 -20773
builder = {java.lang.ProcessBuilder@6789}
command = {java.util.ArrayList@6813} size = 3
directory = null
environment = null
redirectErrorStream = false
redirects = null
timeOutTimer = null
timeoutTimerTask = null
errReader = {java.io.BufferedReader@6830}
inReader = {java.io.BufferedReader@6833}
errMsg = {java.lang.StringBuffer@6836} kill: invalid option -- '2'\n\nUsage:\n kill [options] pid [...]\n\nOptions:\n pid [...]send signal to every pid listed\n -signal, -s, --signal signal\n specify the signal to be sent\n -l, --list=[signal] list all signal names, or convert one to a name\n -L, --tablelist all signal names in a nice table\n\n -h, --help display this help and exit\n -V, --version output version information and exit\n\nFor more details see kill(1).\n
errThread = {org.apache.hadoop.util.Shell$1@6839} Thread[Thread-102,5,]
line = null
exitCode = 1
completed = {java.util.concurrent.atomic.AtomicBoolean@6806} true
{noformat}
This causes DefaultContainerExecutor#containerIsAlive to catch the ExitCodeException thrown by ShellCommandExecutor.execute, making it assume the container is lost. Applications fail on NM restart on some linux distro because NM container recovery declares AM container as LOST Key: YARN-4046 URL: https://issues.apache.org/jira/browse/YARN-4046 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Priority: Critical On a Debian machine we have seen NodeManager recovery of containers fail because the signal syntax for a process group may not work. We see errors when checking whether the process is alive during container recovery, which causes the container to be declared LOST (154) on a NodeManager restart. The application then fails with the error {noformat} Application application_1439244348718_0001 failed 1 times due to Attempt recovered after RM restartAM Container for appattempt_1439244348718_0001_01 exited with exitCode: 154 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4046) Applications fail on NM restart on some linux distro because NM container recovery declares AM container as LOST
[ https://issues.apache.org/jira/browse/YARN-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14682293#comment-14682293 ] Anubhav Dhoot commented on YARN-4046: - As per the GNU/Linux [documentation|http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html#kill-invocation], -- may not be required, but it appears that not all distros (e.g., Debian) support omitting it. {noformat} If a negative pid argument is desired as the first one, it should be preceded by --. However, as a common extension to POSIX, -- is not required with ‘kill -signal -pid’. {noformat} So the fix is to always prefix --, matching the recommendation. Applications fail on NM restart on some linux distro because NM container recovery declares AM container as LOST Key: YARN-4046 URL: https://issues.apache.org/jira/browse/YARN-4046 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Priority: Critical On a Debian machine we have seen NodeManager recovery of containers fail because the signal syntax for a process group may not work. We see errors when checking whether the process is alive during container recovery, which causes the container to be declared LOST (154) on a NodeManager restart. The application then fails with the error {noformat} Application application_1439244348718_0001 failed 1 times due to Attempt recovered after RM restartAM Container for appattempt_1439244348718_0001_01 exited with exitCode: 154 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
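For illustration, a minimal sketch of the fix described above. This is not the actual ContainerExecutor code; the helper name is hypothetical, and the point is simply that "--" ends option parsing so a negative pid (a process group) is read as an operand rather than an option:
{code}
// Illustrative sketch only: build a kill command that is safe for
// process groups on distros whose kill rejects "kill -0 -PID".
public class KillCommandSketch {
  static String[] signalCommand(String signal, String pid) {
    // "--" terminates option parsing before the (possibly negative) pid.
    return new String[] {"kill", "-" + signal, "--", pid};
  }

  public static void main(String[] args) {
    // Prints: kill -0 -- -20773 (liveness check of process group 20773)
    System.out.println(String.join(" ", signalCommand("0", "-20773")));
  }
}
{code}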
[jira] [Commented] (YARN-4023) Publish Application Priority to TimelineServer
[ https://issues.apache.org/jira/browse/YARN-4023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14682360#comment-14682360 ] Hadoop QA commented on YARN-4023: -
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 24m 12s | Pre-patch trunk has 7 extant Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 4 new or modified test files. |
| {color:green}+1{color} | javac | 7m 40s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 34s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | site | 2m 57s | Site still builds. |
| {color:red}-1{color} | checkstyle | 2m 38s | The applied patch generated 1 new checkstyle issues (total was 16, now 16). |
| {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 23s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 31s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 7m 22s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests | 6m 56s | Tests passed in hadoop-yarn-client. |
| {color:red}-1{color} | yarn tests | 1m 53s | Tests failed in hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests | 3m 13s | Tests passed in hadoop-yarn-server-applicationhistoryservice. |
| {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-server-common. |
| {color:red}-1{color} | yarn tests | 53m 22s | Tests failed in hadoop-yarn-server-resourcemanager. |
| | | | 123m 59s | |
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.util.TestRackResolver |
| | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation |
| | hadoop.yarn.server.resourcemanager.TestRMAdminService |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12749303/0001-YARN-4023.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle site |
| git revision | trunk / 1fc3c77 |
| Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8823/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8823/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt |
| hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8823/artifact/patchprocess/testrun_hadoop-yarn-api.txt |
| hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8823/artifact/patchprocess/testrun_hadoop-yarn-client.txt |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8823/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| hadoop-yarn-server-applicationhistoryservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8823/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt |
| hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8823/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8823/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8823/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8823/console |
This message was automatically generated. Publish Application Priority to TimelineServer -- Key: YARN-4023 URL: https://issues.apache.org/jira/browse/YARN-4023 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-4023.patch, 0001-YARN-4023.patch, ApplicationPage.png, TimelineserverMainpage.png Publish Application priority details to Timeline Server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4046) NM container recovery is broken on some linux distro because of syntax of signal
Anubhav Dhoot created YARN-4046: --- Summary: NM container recovery is broken on some linux distro because of syntax of signal Key: YARN-4046 URL: https://issues.apache.org/jira/browse/YARN-4046 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Priority: Critical On a Debian machine we have seen NodeManager recovery of containers fail because the signal syntax for a process group may not work. We see errors when checking whether the process is alive during container recovery, which causes the container to be declared LOST (154) on a NodeManager restart. The application then fails with the error {noformat} Application application_1439244348718_0001 failed 1 times due to Attempt recovered after RM restartAM Container for appattempt_1439244348718_0001_01 exited with exitCode: 154 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4046) Applications fail on NM restart on some linux distro because NM container recovery declares AM container as LOST
[ https://issues.apache.org/jira/browse/YARN-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-4046: Summary: Applications fail on NM restart on some linux distro because NM container recovery declares AM container as LOST (was: NM container recovery is broken on some linux distro because of syntax of signal) Applications fail on NM restart on some linux distro because NM container recovery declares AM container as LOST Key: YARN-4046 URL: https://issues.apache.org/jira/browse/YARN-4046 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Priority: Critical On a Debian machine we have seen NodeManager recovery of containers fail because the signal syntax for a process group may not work. We see errors when checking whether the process is alive during container recovery, which causes the container to be declared LOST (154) on a NodeManager restart. The application then fails with the error {noformat} Application application_1439244348718_0001 failed 1 times due to Attempt recovered after RM restartAM Container for appattempt_1439244348718_0001_01 exited with exitCode: 154 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-4047) ClientRMService getApplications has high scheduler lock contention
[ https://issues.apache.org/jira/browse/YARN-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe reassigned YARN-4047: Assignee: Jason Lowe In OOZIE-1729 Oozie started calling getApplications to look for applications with specific tags. This significantly increases the utilization of this method on a cluster that makes heavy use of Oozie. One quick fix for the Oozie use-case may be to swap the filter order. Rather than doing the expensive checkAccess call first, we can do all the other filtering first and finally verify the user has access before adding the app to the response. In the Oozie scenario most apps will be filtered by the tag check before we ever get to the checkAccess call. ClientRMService getApplications has high scheduler lock contention -- Key: YARN-4047 URL: https://issues.apache.org/jira/browse/YARN-4047 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Jason Lowe Assignee: Jason Lowe The getApplications call can be particularly expensive because the code can call checkAccess on every application being tracked by the RM. checkAccess will often call scheduler.checkAccess, which will grab the big scheduler lock. This can cause a lot of contention with the scheduler thread, which is busy trying to process node heartbeats, app allocation requests, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
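A rough sketch of the reordering described above. The App class and the access-check predicate are simplified stand-ins for RMApp and the real ACL check that takes the scheduler lock; this is not the actual ClientRMService code:
{code}
// Illustrative reordering: cheap filters first, expensive ACL check last.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

public class GetApplicationsSketch {
  static class App {
    final Set<String> tags;
    App(Set<String> tags) { this.tags = tags; }
  }

  static List<App> getApplications(List<App> apps, Set<String> wantedTags,
      Predicate<App> checkAccess) {
    List<App> result = new ArrayList<>();
    for (App app : apps) {
      // Cheap tag filter first: in the Oozie scenario most apps are
      // rejected here without ever touching the scheduler lock.
      if (!wantedTags.isEmpty() && Collections.disjoint(app.tags, wantedTags)) {
        continue;
      }
      // Expensive access check (scheduler lock) only for the survivors.
      if (checkAccess.test(app)) {
        result.add(app);
      }
    }
    return result;
  }
}
{code}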
[jira] [Updated] (YARN-4047) ClientRMService getApplications has high scheduler lock contention
[ https://issues.apache.org/jira/browse/YARN-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-4047: -- Labels: 2.6.1-candidate (was: ) ClientRMService getApplications has high scheduler lock contention -- Key: YARN-4047 URL: https://issues.apache.org/jira/browse/YARN-4047 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Jason Lowe Assignee: Jason Lowe Labels: 2.6.1-candidate Attachments: YARN-4047.001.patch The getApplications call can be particularly expensive because the code can call checkAccess on every application being tracked by the RM. checkAccess will often call scheduler.checkAccess, which will grab the big scheduler lock. This can cause a lot of contention with the scheduler thread, which is busy trying to process node heartbeats, app allocation requests, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4047) ClientRMService getApplications has high scheduler lock contention
Jason Lowe created YARN-4047: Summary: ClientRMService getApplications has high scheduler lock contention Key: YARN-4047 URL: https://issues.apache.org/jira/browse/YARN-4047 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Jason Lowe The getApplications call can be particularly expensive because the code can call checkAccess on every application being tracked by the RM. checkAccess will often call scheduler.checkAccess, which will grab the big scheduler lock. This can cause a lot of contention with the scheduler thread, which is busy trying to process node heartbeats, app allocation requests, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4047) ClientRMService getApplications has high scheduler lock contention
[ https://issues.apache.org/jira/browse/YARN-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-4047: - Attachment: YARN-4047.001.patch Patch that performs the checkAccess filter last rather than first. ClientRMService getApplications has high scheduler lock contention -- Key: YARN-4047 URL: https://issues.apache.org/jira/browse/YARN-4047 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-4047.001.patch The getApplications call can be particularly expensive because the code can call checkAccess on every application being tracked by the RM. checkAccess will often call scheduler.checkAccess, which will grab the big scheduler lock. This can cause a lot of contention with the scheduler thread, which is busy trying to process node heartbeats, app allocation requests, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2369) Environment variable handling assumes values should be appended
[ https://issues.apache.org/jira/browse/YARN-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14682379#comment-14682379 ] Dustin Cote commented on YARN-2369: --- [~jlowe] thanks for all the input. I'll clean this latest patch up based on these comments this week. Happy to throw this in the MAPREDUCE project instead as well, since basically all the changes are in the MR client. I don't think sub JIRAs would be necessary since it's a pretty small change on the YARN side, but I leave that to the project management experts. I don't see any organizational problem keeping it all in one JIRA here. Environment variable handling assumes values should be appended --- Key: YARN-2369 URL: https://issues.apache.org/jira/browse/YARN-2369 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Reporter: Jason Lowe Assignee: Dustin Cote Attachments: YARN-2369-1.patch, YARN-2369-2.patch, YARN-2369-3.patch, YARN-2369-4.patch, YARN-2369-5.patch, YARN-2369-6.patch When processing environment variables for a container context the code assumes that the value should be appended to any pre-existing value in the environment. This may be desired behavior for handling path-like environment variables such as PATH, LD_LIBRARY_PATH, CLASSPATH, etc. but it is a non-intuitive and harmful way to handle any variable that does not have path-like semantics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
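To make the YARN-2369 distinction concrete, a hedged sketch of append-for-path-like versus replace-for-everything-else. The whitelist of path-like variables here is an assumption for illustration, not the project's actual policy:
{code}
// Sketch: append only for path-like variables, replace all others.
import java.io.File;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class EnvMergeSketch {
  // Assumed whitelist for illustration; the real set is a design decision.
  private static final Set<String> PATH_LIKE = new HashSet<>();
  static {
    PATH_LIKE.add("PATH");
    PATH_LIKE.add("CLASSPATH");
    PATH_LIKE.add("LD_LIBRARY_PATH");
  }

  static void put(Map<String, String> env, String name, String value) {
    String old = env.get(name);
    if (old != null && PATH_LIKE.contains(name)) {
      // Path-like semantics: appending is the intuitive behavior.
      env.put(name, old + File.pathSeparator + value);
    } else {
      // Non-path variables: the new value should win outright.
      env.put(name, value);
    }
  }
}
{code}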
[jira] [Updated] (YARN-3999) RM hangs on draining events
[ https://issues.apache.org/jira/browse/YARN-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3999: -- Attachment: (was: YARN-3999.5.patch) RM hangs on draining events - Key: YARN-3999 URL: https://issues.apache.org/jira/browse/YARN-3999 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-3999.1.patch, YARN-3999.2.patch, YARN-3999.2.patch, YARN-3999.3.patch, YARN-3999.4.patch, YARN-3999.patch, YARN-3999.patch If external systems like ATS or ZK become very slow, draining all the events takes a long time. If this time exceeds 10 minutes, all applications will expire. Fixes include: 1. Add a timeout and stop the dispatcher even if not all events are drained. 2. Move the ATS service out of the RM active services so that the RM doesn't need to wait for ATS to flush the events when transitioning to standby. 3. Stop client-facing services (ClientRMService etc.) first so that clients get fast notification that the RM is stopping/transitioning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3999) RM hangs on draining events
[ https://issues.apache.org/jira/browse/YARN-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3999: -- Attachment: YARN-3999.5.patch RM hangs on draining events - Key: YARN-3999 URL: https://issues.apache.org/jira/browse/YARN-3999 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-3999.1.patch, YARN-3999.2.patch, YARN-3999.2.patch, YARN-3999.3.patch, YARN-3999.4.patch, YARN-3999.5.patch, YARN-3999.patch, YARN-3999.patch If external systems like ATS or ZK become very slow, draining all the events takes a long time. If this time exceeds 10 minutes, all applications will expire. Fixes include: 1. Add a timeout and stop the dispatcher even if not all events are drained. 2. Move the ATS service out of the RM active services so that the RM doesn't need to wait for ATS to flush the events when transitioning to standby. 3. Stop client-facing services (ClientRMService etc.) first so that clients get fast notification that the RM is stopping/transitioning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4026) FiCaSchedulerApp: ContainerAllocator should be able to choose how to order pending resource requests
[ https://issues.apache.org/jira/browse/YARN-4026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-4026: - Attachment: YARN-4026.2.patch Thanks for the comments [~jianhe]; attached ver.2 patch. FiCaSchedulerApp: ContainerAllocator should be able to choose how to order pending resource requests Key: YARN-4026 URL: https://issues.apache.org/jira/browse/YARN-4026 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-4026.1.patch, YARN-4026.2.patch After YARN-3983, we have an extensible ContainerAllocator which can be used by FiCaSchedulerApp to decide how to allocate resources. While working on YARN-1651 (allocate resources to increase containers), I found one thing in the existing logic that is not flexible enough: ContainerAllocator decides what to allocate for a given node and priority. To support different kinds of resource allocation (for example, priority as a weight, or whether to skip a priority), it's better to let the ContainerAllocator choose how to order pending resource requests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
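For illustration only, the kind of hook being described in YARN-4026 might look like the following. This is a hypothetical interface, not the actual patch; the idea is that the allocator, rather than the scheduler app, supplies the iteration order of pending requests:
{code}
// Hypothetical shape of the ordering hook; not the actual YARN-4026 patch.
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

interface OrderingContainerAllocatorSketch<R> {
  // Each allocator supplies its own policy, e.g. strict priority order,
  // priority-as-weight, or skipping some priorities entirely.
  Comparator<R> pendingRequestOrder();

  default List<R> orderedPendingRequests(Collection<R> pending) {
    List<R> ordered = new ArrayList<>(pending);
    ordered.sort(pendingRequestOrder());
    return ordered;
  }
}
{code}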
[jira] [Updated] (YARN-2599) Standby RM should also expose some jmx and metrics
[ https://issues.apache.org/jira/browse/YARN-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2599: -- Target Version/s: 2.7.2, 2.6.2 (was: 2.6.1, 2.7.2) Standby RM should also expose some jmx and metrics -- Key: YARN-2599 URL: https://issues.apache.org/jira/browse/YARN-2599 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.1 Reporter: Karthik Kambatla Assignee: Rohith Sharma K S YARN-1898 redirects jmx and metrics to the Active. As discussed there, we need to separate out metrics displayed so the Standby RM can also be monitored. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2506) TimelineClient should NOT be in yarn-common project
[ https://issues.apache.org/jira/browse/YARN-2506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2506: -- Target Version/s: 2.7.2, 2.6.2 (was: 2.6.1, 2.7.2) TimelineClient should NOT be in yarn-common project --- Key: YARN-2506 URL: https://issues.apache.org/jira/browse/YARN-2506 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen Priority: Critical YARN-2298 incorrectly moved TimelineClient to yarn-common project. It doesn't belong there, we should move it back to yarn-client module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2037) Add restart support for Unmanaged AMs
[ https://issues.apache.org/jira/browse/YARN-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2037: -- Target Version/s: 2.7.2, 2.6.2 (was: 2.6.1, 2.7.2) Add restart support for Unmanaged AMs - Key: YARN-2037 URL: https://issues.apache.org/jira/browse/YARN-2037 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla It would be nice to allow Unmanaged AMs also to restart in a work-preserving way. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4046) Applications fail on NM restart on some linux distro because NM container recovery declares AM container as LOST
[ https://issues.apache.org/jira/browse/YARN-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-4046: Attachment: YARN-4046.002.patch Fixed whitespace Applications fail on NM restart on some linux distro because NM container recovery declares AM container as LOST Key: YARN-4046 URL: https://issues.apache.org/jira/browse/YARN-4046 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Priority: Critical Attachments: YARN-4046.002.patch, YARN-4046.002.patch, YARN-4096.001.patch On a Debian machine we have seen NodeManager recovery of containers fail because the signal syntax for a process group may not work. We see errors when checking whether the process is alive during container recovery, which causes the container to be declared LOST (154) on a NodeManager restart. The application then fails with the error. The attempts are not retried. {noformat} Application application_1439244348718_0001 failed 1 times due to Attempt recovered after RM restartAM Container for appattempt_1439244348718_0001_01 exited with exitCode: 154 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3999) RM hangs on draining events
[ https://issues.apache.org/jira/browse/YARN-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3999: -- Attachment: YARN-3999-branch-2.7.patch Uploaded the branch-2.7 patch. RM hangs on draining events - Key: YARN-3999 URL: https://issues.apache.org/jira/browse/YARN-3999 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-3999-branch-2.7.patch, YARN-3999.1.patch, YARN-3999.2.patch, YARN-3999.2.patch, YARN-3999.3.patch, YARN-3999.4.patch, YARN-3999.5.patch, YARN-3999.patch, YARN-3999.patch If external systems like ATS or ZK become very slow, draining all the events takes a long time. If this time exceeds 10 minutes, all applications will expire. Fixes include: 1. Add a timeout and stop the dispatcher even if not all events are drained. 2. Move the ATS service out of the RM active services so that the RM doesn't need to wait for ATS to flush the events when transitioning to standby. 3. Stop client-facing services (ClientRMService etc.) first so that clients get fast notification that the RM is stopping/transitioning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3978) Configurably turn off the saving of container info in Generic AHS
[ https://issues.apache.org/jira/browse/YARN-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-3978: -- Labels: 2.6.1-candidate (was: ) Configurably turn off the saving of container info in Generic AHS - Key: YARN-3978 URL: https://issues.apache.org/jira/browse/YARN-3978 Project: Hadoop YARN Issue Type: Improvement Components: timelineserver, yarn Affects Versions: 2.8.0, 2.7.1 Reporter: Eric Payne Assignee: Eric Payne Labels: 2.6.1-candidate Fix For: 3.0.0, 2.8.0, 2.7.2 Attachments: YARN-3978.001.patch, YARN-3978.002.patch, YARN-3978.003.patch, YARN-3978.004.patch Depending on how each application's metadata is stored, one week's worth of data stored in the Generic Application History Server's database can grow to be almost a terabyte of local disk space. In order to alleviate this, I suggest that there is a need for a configuration option to turn off saving of non-AM container metadata in the GAHS data store. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2859) ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster
[ https://issues.apache.org/jira/browse/YARN-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692916#comment-14692916 ] Sangjin Lee commented on YARN-2859: --- [~zjshen], can this be done for 2.6.1, or are you OK with deferring it to 2.6.2? ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster -- Key: YARN-2859 URL: https://issues.apache.org/jira/browse/YARN-2859 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Hitesh Shah Assignee: Zhijie Shen Priority: Critical Labels: 2.6.1-candidate In mini cluster, a random port should be used. Also, the config is not updated to the host that the process got bound to. {code} 2014-11-13 13:07:01,905 INFO [main] server.MiniYARNCluster (MiniYARNCluster.java:serviceStart(722)) - MiniYARN ApplicationHistoryServer address: localhost:10200 2014-11-13 13:07:01,905 INFO [main] server.MiniYARNCluster (MiniYARNCluster.java:serviceStart(724)) - MiniYARN ApplicationHistoryServer web address: 0.0.0.0:8188 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
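A sketch of the behavior being asked for in YARN-2859, under stated assumptions: requesting port 0 makes the OS pick a free port, and the mini cluster would then need to publish the actually-bound address back into the config. The config constants are the standard timeline service ones; the surrounding wiring is simplified and not the actual MiniYARNCluster code:
{code}
// Illustrative only: ephemeral ports for the timeline service in tests.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class EphemeralTimelinePortSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Port 0 = ephemeral; avoids clashing with a fixed default like 8188.
    conf.set(YarnConfiguration.TIMELINE_SERVICE_ADDRESS, "localhost:0");
    conf.set(YarnConfiguration.TIMELINE_SERVICE_WEBAPP_ADDRESS, "localhost:0");
    // After serviceStart(), the cluster should write the real host:port
    // it bound to back into conf, which is the missing piece reported.
  }
}
{code}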
[jira] [Commented] (YARN-4025) Deal with byte representations of Longs in writer code
[ https://issues.apache.org/jira/browse/YARN-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692511#comment-14692511 ] Vrushali C commented on YARN-4025: -- Yes, +1 Deal with byte representations of Longs in writer code -- Key: YARN-4025 URL: https://issues.apache.org/jira/browse/YARN-4025 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Vrushali C Assignee: Vrushali C Attachments: YARN-4025-YARN-2928.001.patch Timestamps are being stored as Longs in HBase by the HBaseTimelineWriterImpl code. There seem to be some places in the code with conversions from Long to byte[] to String for easier argument passing between function calls. These values then end up being converted back to byte[] while storing in HBase. It would be better to pass around byte[] or the Longs themselves as applicable. This may result in some API changes (the store function) as well as adding a few more function calls, like a getColumnQualifier that accepts a pre-encoded byte array, in addition to the existing API which accepts a String, and having ColumnHelper return a byte[] column name instead of a String one. Filing this jira to track these changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
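A small sketch of the point above, using the standard HBase Bytes utility: encoding the Long directly keeps the fixed-width, byte-sortable representation that the round trip through String loses. The class name is a stand-in for illustration:
{code}
// Encode timestamps as fixed-width byte[] directly; avoid the
// Long -> String -> byte[] round trip discussed above.
import org.apache.hadoop.hbase.util.Bytes;

public class TimestampEncodingSketch {
  public static void main(String[] args) {
    long ts = 1439244348718L;

    byte[] direct = Bytes.toBytes(ts);                    // 8 bytes, sortable
    byte[] viaString = Bytes.toBytes(Long.toString(ts));  // 13 bytes here

    System.out.println(direct.length + " vs " + viaString.length);
  }
}
{code}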
[jira] [Updated] (YARN-1848) Persist ClusterMetrics across RM HA transitions
[ https://issues.apache.org/jira/browse/YARN-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-1848: -- Target Version/s: 2.7.2, 2.6.2 (was: 2.6.1, 2.7.2) Persist ClusterMetrics across RM HA transitions --- Key: YARN-1848 URL: https://issues.apache.org/jira/browse/YARN-1848 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Post YARN-1705, ClusterMetrics are reset on transition to standby. This is acceptable as the metrics show statistics since an RM has become active. Users might want to see metrics since the cluster was ever started. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2014) Performance: AM scalability is 10% slower in 2.4 compared to 0.23.9
[ https://issues.apache.org/jira/browse/YARN-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2014: -- Target Version/s: 2.7.2, 2.6.2 (was: 2.6.1, 2.7.2) Performance: AM scalability is 10% slower in 2.4 compared to 0.23.9 Key: YARN-2014 URL: https://issues.apache.org/jira/browse/YARN-2014 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: patrick white Assignee: Jason Lowe Performance comparison benchmarks of 2.x against 0.23 show that the AM scalability benchmark's runtime is approximately 10% slower in 2.4.0. The trend is consistent across later releases in both lines; the latest release numbers are: 2.4.0.0 runtime 255.6 seconds (avg 5 passes) 0.23.9.12 runtime 230.4 seconds (avg 5 passes) Diff: -9.9% The AM scalability test is essentially a sleep job that measures the time to launch and complete a large number of mappers. The diff is consistent and has been reproduced in both a larger (350 node, 100,000 mappers) perf environment and a small (10 node, 2,900 mappers) demo cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2055) Preemption: Jobs are failing due to AMs getting launched and killed multiple times
[ https://issues.apache.org/jira/browse/YARN-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2055: -- Target Version/s: 2.7.2, 2.6.2 (was: 2.6.1, 2.7.2) Preemption: Jobs are failing due to AMs getting launched and killed multiple times -- Key: YARN-2055 URL: https://issues.apache.org/jira/browse/YARN-2055 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Mayank Bansal If queue A does not have enough capacity to run an AM, the AM will borrow capacity from queue B. In that case the AM will be killed when queue B reclaims its capacity, then launched and killed again, and the job will fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2457) FairScheduler: Handle preemption to help starved parent queues
[ https://issues.apache.org/jira/browse/YARN-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2457: -- Target Version/s: 2.7.2, 2.6.2 (was: 2.6.1, 2.7.2) FairScheduler: Handle preemption to help starved parent queues -- Key: YARN-2457 URL: https://issues.apache.org/jira/browse/YARN-2457 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.5.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla YARN-2395/YARN-2394 add preemption timeout and threshold per queue, but don't check for parent queue starvation. We need to check that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1856) cgroups based memory monitoring for containers
[ https://issues.apache.org/jira/browse/YARN-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-1856: -- Target Version/s: 2.7.2, 2.6.2 (was: 2.6.1, 2.7.2) cgroups based memory monitoring for containers -- Key: YARN-1856 URL: https://issues.apache.org/jira/browse/YARN-1856 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Varun Vasudev -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3906) split the application table from the entity table
[ https://issues.apache.org/jira/browse/YARN-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692476#comment-14692476 ] Junping Du commented on YARN-3906: -- Ok. Committing this patch now. split the application table from the entity table - Key: YARN-3906 URL: https://issues.apache.org/jira/browse/YARN-3906 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: YARN-3906-YARN-2928.001.patch, YARN-3906-YARN-2928.002.patch, YARN-3906-YARN-2928.003.patch, YARN-3906-YARN-2928.004.patch, YARN-3906-YARN-2928.005.patch, YARN-3906-YARN-2928.006.patch, YARN-3906-YARN-2928.007.patch Per discussions on YARN-3815, we need to split the application entities from the main entity table into its own table (application). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-313) Add Admin API for supporting node resource configuration in command line
[ https://issues.apache.org/jira/browse/YARN-313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692632#comment-14692632 ] Inigo Goiri commented on YARN-313: -- Not critical; I think it can be deferred. I would appreciate ideas on why this change breaks refreshNodes with a graceful period. Add Admin API for supporting node resource configuration in command line Key: YARN-313 URL: https://issues.apache.org/jira/browse/YARN-313 Project: Hadoop YARN Issue Type: Sub-task Components: client Reporter: Junping Du Assignee: Junping Du Priority: Critical Attachments: YARN-313-sample.patch, YARN-313-v1.patch, YARN-313-v2.patch, YARN-313-v3.patch, YARN-313-v4.patch, YARN-313-v5.patch, YARN-313-v6.patch, YARN-313-v7.patch We should provide an admin interface, e.g. yarn rmadmin -refreshResources, to support changes to a node's resources as specified in a config file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2038) Revisit how AMs learn of containers from previous attempts
[ https://issues.apache.org/jira/browse/YARN-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2038: -- Target Version/s: 2.7.2, 2.6.2 (was: 2.6.1, 2.7.2) Revisit how AMs learn of containers from previous attempts -- Key: YARN-2038 URL: https://issues.apache.org/jira/browse/YARN-2038 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Based on YARN-556, we need to update the way AMs learn about containers allocated in previous attempts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-313) Add Admin API for supporting node resource configuration in command line
[ https://issues.apache.org/jira/browse/YARN-313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692843#comment-14692843 ] Hadoop QA commented on YARN-313: -
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 19m 19s | Findbugs (version 3.0.0) appears to be broken on trunk. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 4 new or modified test files. |
| {color:green}+1{color} | javac | 7m 55s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 3s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 1m 56s | The applied patch generated 4 new checkstyle issues (total was 229, now 232). |
| {color:green}+1{color} | whitespace | 0m 6s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 37s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 5m 39s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 0m 27s | Tests passed in hadoop-yarn-api. |
| {color:red}-1{color} | yarn tests | 6m 58s | Tests failed in hadoop-yarn-client. |
| {color:red}-1{color} | yarn tests | 2m 0s | Tests failed in hadoop-yarn-common. |
| {color:red}-1{color} | yarn tests | 53m 35s | Tests failed in hadoop-yarn-server-resourcemanager. |
| | | | 111m 5s | |
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.client.cli.TestRMAdminCLI |
| | hadoop.yarn.client.api.impl.TestYarnClient |
| | hadoop.yarn.util.TestRackResolver |
| | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12749993/YARN-313-v7.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 3ae716f |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8828/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt |
| hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8828/artifact/patchprocess/testrun_hadoop-yarn-api.txt |
| hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8828/artifact/patchprocess/testrun_hadoop-yarn-client.txt |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8828/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8828/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8828/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8828/console |
This message was automatically generated.
Add Admin API for supporting node resource configuration in command line Key: YARN-313 URL: https://issues.apache.org/jira/browse/YARN-313 Project: Hadoop YARN Issue Type: Sub-task Components: client Reporter: Junping Du Assignee: Junping Du Priority: Critical Attachments: YARN-313-sample.patch, YARN-313-v1.patch, YARN-313-v2.patch, YARN-313-v3.patch, YARN-313-v4.patch, YARN-313-v5.patch, YARN-313-v6.patch, YARN-313-v7.patch We should provide an admin interface, e.g. yarn rmadmin -refreshResources, to support changes to a node's resources as specified in a config file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1480) RM web services getApps() accepts many more filters than ApplicationCLI list command
[ https://issues.apache.org/jira/browse/YARN-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-1480: -- Target Version/s: 2.7.2, 2.6.2 (was: 2.6.1, 2.7.2) RM web services getApps() accepts many more filters than ApplicationCLI list command -- Key: YARN-1480 URL: https://issues.apache.org/jira/browse/YARN-1480 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Kenji Kikushima Attachments: YARN-1480-2.patch, YARN-1480-3.patch, YARN-1480-4.patch, YARN-1480-5.patch, YARN-1480-6.patch, YARN-1480.patch Nowadays RM web services getApps() accepts many more filters than ApplicationCLI list command, which only accepts state and type. IMHO, ideally, different interfaces should provide consistent functionality. Is it better to allow more filters in ApplicationCLI? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1681) When banned.users is not set in LCE's container-executor.cfg, submitting a job with a user in DEFAULT_BANNED_USERS produces an unclear error message
[ https://issues.apache.org/jira/browse/YARN-1681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-1681: -- Target Version/s: 2.7.2, 2.6.2 (was: 2.6.1, 2.7.2) When banned.users is not set in LCE's container-executor.cfg, submitting a job with a user in DEFAULT_BANNED_USERS produces an unclear error message --- Key: YARN-1681 URL: https://issues.apache.org/jira/browse/YARN-1681 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.2.0 Reporter: Zhichun Wu Assignee: Zhichun Wu Labels: container, usability Attachments: YARN-1681.patch When using LCE in a secure setup, if banned.users is not set in container-executor.cfg, submitting a job with a user in DEFAULT_BANNED_USERS (mapred, hdfs, bin, 0) produces an unclear error message. For example, if we use hdfs to submit an MR job, we may see the following on the YARN app overview page: {code} appattempt_1391353981633_0003_02 exited with exitCode: -1000 due to: Application application_1391353981633_0003 initialization failed (exitCode=139) with output: {code} while the preferred error message would look like: {code} appattempt_1391353981633_0003_02 exited with exitCode: -1000 due to: Application application_1391353981633_0003 initialization failed (exitCode=139) with output: Requested user hdfs is banned {code} It's just a minor bug, and I would like to start contributing to hadoop-common with it :) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1767) Windows: Allow a way for users to augment classpath of YARN daemons
[ https://issues.apache.org/jira/browse/YARN-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-1767: -- Target Version/s: 2.7.2, 2.6.2 (was: 2.6.1, 2.7.2) Windows: Allow a way for users to augment classpath of YARN daemons --- Key: YARN-1767 URL: https://issues.apache.org/jira/browse/YARN-1767 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.3.0 Reporter: Karthik Kambatla YARN-1429 adds a way to augment the classpath for *nix-based systems. Need something similar for Windows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4046) Applications fail on NM restart on some linux distro because NM container recovery declares AM container as LOST
[ https://issues.apache.org/jira/browse/YARN-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692518#comment-14692518 ] Hadoop QA commented on YARN-4046: -
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 17m 7s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 7m 59s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 52s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 1m 9s | The applied patch generated 3 new checkstyle issues (total was 97, now 99). |
| {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 21s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 57s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | common tests | 22m 42s | Tests failed in hadoop-common. |
| | | | 63m 5s | |
|| Reason || Tests ||
| Failed unit tests | hadoop.ha.TestZKFailoverController |
| | hadoop.net.TestNetUtils |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12749949/YARN-4096.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 7c796fd |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8825/artifact/patchprocess/diffcheckstylehadoop-common.txt |
| whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8825/artifact/patchprocess/whitespace.txt |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8825/artifact/patchprocess/testrun_hadoop-common.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8825/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8825/console |
This message was automatically generated. Applications fail on NM restart on some linux distro because NM container recovery declares AM container as LOST Key: YARN-4046 URL: https://issues.apache.org/jira/browse/YARN-4046 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Priority: Critical Attachments: YARN-4096.001.patch On a Debian machine we have seen NodeManager recovery of containers fail because the signal syntax for a process group may not work. We see errors when checking whether the process is alive during container recovery, which causes the container to be declared LOST (154) on a NodeManager restart. The application then fails with the error. The attempts are not retried.
{noformat} Application application_1439244348718_0001 failed 1 times due to Attempt recovered after RM restartAM Container for appattempt_1439244348718_0001_01 exited with exitCode: 154 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
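For context on the failure mode above: container recovery probes liveness by delivering signal 0 to the container's process group, i.e. a negative pid. The sketch below is illustrative only, not the actual Hadoop ContainerExecutor/Shell code; the class and method names are hypothetical. The point it demonstrates is that the "--" guard is what keeps the negative (process-group) pid from being parsed as an option, and a check routed through a shell or kill variant lacking that guard can fail even when the process group is alive.
{code:java}
import java.io.IOException;

public class ProcessGroupLiveness {
  // Illustrative liveness probe: signal 0 delivers nothing but reports
  // whether the target exists. "--" ends option parsing so "-pgid" is
  // read as a process-group id rather than as a flag.
  static boolean isProcessGroupAlive(long pgid)
      throws IOException, InterruptedException {
    Process p = new ProcessBuilder("kill", "-0", "--", "-" + pgid).start();
    return p.waitFor() == 0;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(isProcessGroupAlive(Long.parseLong(args[0])));
  }
}
{code}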
[jira] [Updated] (YARN-4026) FiCaSchedulerApp: ContainerAllocator should be able to choose how to order pending resource requests
[ https://issues.apache.org/jira/browse/YARN-4026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-4026: - Attachment: YARN-4026.3.patch Attached ver.3, added more comments and fixed a findbugs warning. FiCaSchedulerApp: ContainerAllocator should be able to choose how to order pending resource requests Key: YARN-4026 URL: https://issues.apache.org/jira/browse/YARN-4026 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-4026.1.patch, YARN-4026.2.patch, YARN-4026.3.patch After YARN-3983, we have an extensible ContainerAllocator that FiCaSchedulerApp can use to decide how to allocate resources. While working on YARN-1651 (allocating resources to increase a container), I found one part of the existing logic that is not flexible enough: ContainerAllocator decides what to allocate for a given node and priority. To support different kinds of resource allocation, for example treating priority as a weight, or choosing whether to skip a priority, it is better to let ContainerAllocator choose how to order the pending resource requests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
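To illustrate the kind of flexibility being proposed, here is a minimal sketch of letting each allocator supply its own ordering over pending requests. The PendingRequest type and both comparators are hypothetical stand-ins for illustration, not classes from the YARN-4026 patch:
{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a pending resource request (not a YARN class).
final class PendingRequest {
  final int priority;
  final int memoryMb;
  PendingRequest(int priority, int memoryMb) {
    this.priority = priority;
    this.memoryMb = memoryMb;
  }
  @Override
  public String toString() {
    return "prio=" + priority + " mem=" + memoryMb;
  }
}

public class RequestOrderingDemo {
  // The scheduler asks the allocator for its ordering instead of
  // hard-coding "iterate strictly by priority".
  static List<PendingRequest> ordered(List<PendingRequest> pending,
      Comparator<PendingRequest> order) {
    List<PendingRequest> copy = new ArrayList<>(pending);
    copy.sort(order);
    return copy;
  }

  public static void main(String[] args) {
    List<PendingRequest> pending = Arrays.asList(
        new PendingRequest(2, 4096), new PendingRequest(1, 1024));

    // Regular allocation: strict priority order (lower value = higher priority).
    System.out.println(ordered(pending,
        Comparator.comparingInt((PendingRequest r) -> r.priority)));

    // A container-increase allocator might prefer the largest request first.
    System.out.println(ordered(pending,
        Comparator.comparingInt((PendingRequest r) -> r.memoryMb).reversed()));
  }
}
{code}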
[jira] [Commented] (YARN-4025) Deal with byte representations of Longs in writer code
[ https://issues.apache.org/jira/browse/YARN-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692468#comment-14692468 ] Sangjin Lee commented on YARN-4025: --- For the record, we will go ahead with YARN-3906 first. We'll need to update this patch to reflect the changes in YARN-3906. I'll work with [~vrushalic] on that. Deal with byte representations of Longs in writer code -- Key: YARN-4025 URL: https://issues.apache.org/jira/browse/YARN-4025 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Vrushali C Assignee: Vrushali C Attachments: YARN-4025-YARN-2928.001.patch Timestamps are stored as Longs in HBase by the HBaseTimelineWriterImpl code. In some places the code converts a Long to a byte[] and then to a String for easier argument passing between function calls; those values then end up being converted back to byte[] when stored in HBase. It would be better to pass around byte[] or the Longs themselves as applicable. This may result in some API changes (the store function) as well as a few additional function calls, such as a getColumnQualifier that accepts a pre-encoded byte array (in addition to the existing API that accepts a String) and a ColumnHelper that returns a byte[] column name instead of a String one. Filing this JIRA to track these changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
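A minimal sketch of the encoding concern, assuming org.apache.hadoop.hbase:hbase-common on the classpath; the class and variable names here are illustrative, not from the patch:
{code:java}
import org.apache.hadoop.hbase.util.Bytes;

public class LongEncodingDemo {
  public static void main(String[] args) {
    long timestamp = System.currentTimeMillis();

    // Round-tripping through String allocates intermediate objects and
    // loses the fixed-width encoding (13 bytes for a millisecond value).
    byte[] viaString = Bytes.toBytes(Long.toString(timestamp));

    // Encoding the Long directly yields a fixed 8-byte array that can be
    // passed between helper methods and written to HBase as-is.
    byte[] direct = Bytes.toBytes(timestamp);

    System.out.println("via String: " + viaString.length + " bytes"); // 13
    System.out.println("direct:     " + direct.length + " bytes");    // 8
  }
}
{code}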
[jira] [Commented] (YARN-3906) split the application table from the entity table
[ https://issues.apache.org/jira/browse/YARN-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692465#comment-14692465 ] Sangjin Lee commented on YARN-3906: --- I checked with [~vrushalic], and we decided to land the patch for this JIRA (YARN-3906) first. split the application table from the entity table - Key: YARN-3906 URL: https://issues.apache.org/jira/browse/YARN-3906 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: YARN-3906-YARN-2928.001.patch, YARN-3906-YARN-2928.002.patch, YARN-3906-YARN-2928.003.patch, YARN-3906-YARN-2928.004.patch, YARN-3906-YARN-2928.005.patch, YARN-3906-YARN-2928.006.patch, YARN-3906-YARN-2928.007.patch Per discussions on YARN-3815, we need to split the application entities out of the main entity table into their own table (application). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2014) Performance: AM scalability is 10% slower in 2.4 compared to 0.23.9
[ https://issues.apache.org/jira/browse/YARN-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692537#comment-14692537 ] Sangjin Lee commented on YARN-2014: --- Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me know. Performance: AM scalability is 10% slower in 2.4 compared to 0.23.9 Key: YARN-2014 URL: https://issues.apache.org/jira/browse/YARN-2014 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: patrick white Assignee: Jason Lowe Performance comparison benchmarks of 2.x against 0.23 show that the AM scalability benchmark's runtime is approximately 10% slower in 2.4.0. The trend is consistent across later releases in both lines; the latest release numbers are: 2.4.0.0 runtime 255.6 seconds (avg of 5 passes), 0.23.9.12 runtime 230.4 seconds (avg of 5 passes), diff: -9.9%. The AM scalability test is essentially a sleep job that measures the time to launch and complete a large number of mappers. The diff is consistent and has been reproduced in both a larger (350 nodes, 100,000 mappers) perf environment and a small (10 nodes, 2,900 mappers) demo cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
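For reference, the quoted -9.9% is consistent with taking the runtime delta relative to the 2.4.0 number, a reading inferred from the figures above rather than stated in the report: (230.4 - 255.6) / 255.6 ≈ -0.099. Equivalently, 2.4.0 is about 10.9% slower when measured against the 0.23.9.12 baseline (255.6 / 230.4 - 1 ≈ 0.109).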
[jira] [Commented] (YARN-1848) Persist ClusterMetrics across RM HA transitions
[ https://issues.apache.org/jira/browse/YARN-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692541#comment-14692541 ] Sangjin Lee commented on YARN-1848: --- Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me know. Persist ClusterMetrics across RM HA transitions --- Key: YARN-1848 URL: https://issues.apache.org/jira/browse/YARN-1848 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Post YARN-1705, ClusterMetrics are reset on the transition to standby. This is acceptable in that the metrics show statistics since an RM became active. However, users might want to see metrics since the cluster was first started. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1856) cgroups based memory monitoring for containers
[ https://issues.apache.org/jira/browse/YARN-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692538#comment-14692538 ] Sangjin Lee commented on YARN-1856: --- Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me know. cgroups based memory monitoring for containers -- Key: YARN-1856 URL: https://issues.apache.org/jira/browse/YARN-1856 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Varun Vasudev -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1480) RM web services getApps() accepts many more filters than ApplicationCLI list command
[ https://issues.apache.org/jira/browse/YARN-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692545#comment-14692545 ] Sangjin Lee commented on YARN-1480: --- Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me know. RM web services getApps() accepts many more filters than ApplicationCLI list command -- Key: YARN-1480 URL: https://issues.apache.org/jira/browse/YARN-1480 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Kenji Kikushima Attachments: YARN-1480-2.patch, YARN-1480-3.patch, YARN-1480-4.patch, YARN-1480-5.patch, YARN-1480-6.patch, YARN-1480.patch The RM web services getApps() call accepts many more filters than the ApplicationCLI list command, which only accepts state and type. Ideally, different interfaces should provide consistent functionality. Would it be better to allow more filters in ApplicationCLI? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
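For comparison, a minimal sketch of the two filters the CLI does support, via the public YarnClient API; the application type "MAPREDUCE" is just an example value, and the cluster configuration is assumed to be on the classpath:
{code:java}
import java.util.Collections;
import java.util.EnumSet;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class ListAppsByStateAndType {
  public static void main(String[] args) throws Exception {
    YarnClient client = YarnClient.createYarnClient();
    client.init(new Configuration());
    client.start();
    try {
      // Type and state: the same two filters ApplicationCLI exposes via
      // -appTypes and -appStates; the REST getApps() additionally filters
      // by user, queue, limit, time ranges, and more.
      List<ApplicationReport> apps = client.getApplications(
          Collections.singleton("MAPREDUCE"),
          EnumSet.of(YarnApplicationState.RUNNING));
      for (ApplicationReport app : apps) {
        System.out.println(app.getApplicationId() + " " + app.getName());
      }
    } finally {
      client.stop();
    }
  }
}
{code}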
[jira] [Commented] (YARN-313) Add Admin API for supporting node resource configuration in command line
[ https://issues.apache.org/jira/browse/YARN-313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692546#comment-14692546 ] Sangjin Lee commented on YARN-313: -- Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me know. Add Admin API for supporting node resource configuration in command line Key: YARN-313 URL: https://issues.apache.org/jira/browse/YARN-313 Project: Hadoop YARN Issue Type: Sub-task Components: client Reporter: Junping Du Assignee: Junping Du Priority: Critical Attachments: YARN-313-sample.patch, YARN-313-v1.patch, YARN-313-v2.patch, YARN-313-v3.patch, YARN-313-v4.patch, YARN-313-v5.patch, YARN-313-v6.patch We should provide an admin interface, e.g. yarn rmadmin -refreshResources, to support changes to a node's resources as specified in a config file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1767) Windows: Allow a way for users to augment classpath of YARN daemons
[ https://issues.apache.org/jira/browse/YARN-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692543#comment-14692543 ] Sangjin Lee commented on YARN-1767: --- Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me know. Windows: Allow a way for users to augment classpath of YARN daemons --- Key: YARN-1767 URL: https://issues.apache.org/jira/browse/YARN-1767 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.3.0 Reporter: Karthik Kambatla YARN-1429 adds a way to augment the classpath for *nix-based systems. Need something similar for Windows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1480) RM web services getApps() accepts many more filters than ApplicationCLI list command
[ https://issues.apache.org/jira/browse/YARN-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-1480: -- Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it to 2.6.2. Let me know if you have comments. Thanks! RM web services getApps() accepts many more filters than ApplicationCLI list command -- Key: YARN-1480 URL: https://issues.apache.org/jira/browse/YARN-1480 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Kenji Kikushima Attachments: YARN-1480-2.patch, YARN-1480-3.patch, YARN-1480-4.patch, YARN-1480-5.patch, YARN-1480-6.patch, YARN-1480.patch The RM web services getApps() call accepts many more filters than the ApplicationCLI list command, which only accepts state and type. Ideally, different interfaces should provide consistent functionality. Would it be better to allow more filters in ApplicationCLI? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2506) TimelineClient should NOT be in yarn-common project
[ https://issues.apache.org/jira/browse/YARN-2506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2506: -- Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it to 2.6.2. Let me know if you have comments. Thanks! TimelineClient should NOT be in yarn-common project --- Key: YARN-2506 URL: https://issues.apache.org/jira/browse/YARN-2506 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen Priority: Critical YARN-2298 incorrectly moved TimelineClient to the yarn-common project. It doesn't belong there; we should move it back to the yarn-client module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2599) Standby RM should also expose some jmx and metrics
[ https://issues.apache.org/jira/browse/YARN-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2599: -- Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it to 2.6.2. Let me know if you have comments. Thanks! Standby RM should also expose some jmx and metrics -- Key: YARN-2599 URL: https://issues.apache.org/jira/browse/YARN-2599 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.1 Reporter: Karthik Kambatla Assignee: Rohith Sharma K S YARN-1898 redirects jmx and metrics to the Active RM. As discussed there, we need to separate out the metrics displayed so that the Standby RM can also be monitored. -- This message was sent by Atlassian JIRA (v6.3.4#6332)