[jira] [Updated] (YARN-3478) FairScheduler page fails to render because of differing enums YarnApplicationState and RMAppState
[ https://issues.apache.org/jira/browse/YARN-3478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-3478: -- Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it to 2.6.2. Let me know if you have comments. Thanks! FairScheduler page fails to render because of differing enums YarnApplicationState and RMAppState --- Key: YARN-3478 URL: https://issues.apache.org/jira/browse/YARN-3478 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Xu Chen Attachments: YARN-3478.1.patch, YARN-3478.2.patch, YARN-3478.3.patch, screenshot-1.png Got the following exception from the log:
{code}
java.lang.reflect.InvocationTargetException
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
 at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
 at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
 at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
 at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
 at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
 at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
 at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:79)
 at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
 at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
 at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
 at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
 at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
 at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:96)
 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
 at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1225)
 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
 at org.apache.hadoop.http.lib.DynamicUserWebFilter$DynamicUserFilter.doFilter(DynamicUserWebFilter.java:59)
 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
 at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
 at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
 at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
 at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
 at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
 at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
 at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
 at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
 at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
 at org.mortbay.jetty.Server.handle(Server.java:326)
 at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
 at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
 at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
 at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
 at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
 at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
 at
{code}
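For context, a minimal sketch of the state translation the web page relies on; the helper class and the fallback are illustrative assumptions, not the attached patch. The failure mode is an RMAppState value with no corresponding YarnApplicationState handling, which surfaces as the InvocationTargetException above while rendering the FairScheduler page.
{code}
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppState;

// Hypothetical helper: map the RM-internal state to the public enum.
// If a newer RMAppState value is not handled, the web UI fails while
// rendering the FairScheduler page.
public final class AppStateMapper {
  private AppStateMapper() {}

  public static YarnApplicationState toYarnApplicationState(RMAppState state) {
    switch (state) {
      case NEW:       return YarnApplicationState.NEW;
      case SUBMITTED: return YarnApplicationState.SUBMITTED;
      case ACCEPTED:  return YarnApplicationState.ACCEPTED;
      case RUNNING:   return YarnApplicationState.RUNNING;
      case FINISHED:  return YarnApplicationState.FINISHED;
      case FAILED:    return YarnApplicationState.FAILED;
      case KILLED:    return YarnApplicationState.KILLED;
      default:
        // Illustrative fallback so the page can still render instead of throwing.
        return YarnApplicationState.SUBMITTED;
    }
  }
}
{code}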
[jira] [Updated] (YARN-2076) Minor error in TestLeafQueue files
[ https://issues.apache.org/jira/browse/YARN-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2076: -- Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it to 2.6.2. Let me know if you have comments. Thanks! Minor error in TestLeafQueue files -- Key: YARN-2076 URL: https://issues.apache.org/jira/browse/YARN-2076 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Chen He Assignee: Chen He Priority: Minor Labels: test Attachments: YARN-2076.patch numNodes should be 2 instead of 3 in testReservationExchange() since only two nodes are defined. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1767) Windows: Allow a way for users to augment classpath of YARN daemons
[ https://issues.apache.org/jira/browse/YARN-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-1767: -- Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it to 2.6.2. Let me know if you have comments. Thanks! Windows: Allow a way for users to augment classpath of YARN daemons --- Key: YARN-1767 URL: https://issues.apache.org/jira/browse/YARN-1767 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.3.0 Reporter: Karthik Kambatla YARN-1429 adds a way to augment the classpath for *nix-based systems. Need something similar for Windows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2859) ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster
[ https://issues.apache.org/jira/browse/YARN-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2859: -- Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it to 2.6.2. Let me know if you have comments. Thanks! ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster -- Key: YARN-2859 URL: https://issues.apache.org/jira/browse/YARN-2859 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Hitesh Shah Assignee: Zhijie Shen Priority: Critical Labels: 2.6.1-candidate In the mini cluster, a random port should be used. Also, the config is not updated with the address that the process actually bound to. {code} 2014-11-13 13:07:01,905 INFO [main] server.MiniYARNCluster (MiniYARNCluster.java:serviceStart(722)) - MiniYARN ApplicationHistoryServer address: localhost:10200 2014-11-13 13:07:01,905 INFO [main] server.MiniYARNCluster (MiniYARNCluster.java:serviceStart(724)) - MiniYARN ApplicationHistoryServer web address: 0.0.0.0:8188 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
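A minimal sketch of the ephemeral-port pattern the description asks for, using the timeline-service config keys; the actual MiniYARNCluster change may differ.
{code}
import org.apache.hadoop.conf.Configuration;

// Sketch: ask for port 0 so the OS picks a free port, then write the address
// the service actually bound to back into the Configuration used by tests,
// instead of leaving the 0.0.0.0:8188 default in place.
public class EphemeralPortPattern {

  static void requestRandomPorts(Configuration conf) {
    conf.set("yarn.timeline-service.address", "localhost:0");
    conf.set("yarn.timeline-service.webapp.address", "localhost:0");
  }

  // Call after the ApplicationHistoryServer has started; how the bound
  // address is obtained from the running server is left out of this sketch.
  static void publishBoundWebAddress(Configuration conf, String actualWebAddress) {
    conf.set("yarn.timeline-service.webapp.address", actualWebAddress);
  }
}
{code}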
[jira] [Updated] (YARN-2746) YARNDelegationTokenID misses serializing version from the common abstract ID
[ https://issues.apache.org/jira/browse/YARN-2746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2746: -- Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it to 2.6.2. Let me know if you have comments. Thanks! YARNDelegationTokenID misses serializing version from the common abstract ID Key: YARN-2746 URL: https://issues.apache.org/jira/browse/YARN-2746 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Vinod Kumar Vavilapalli Assignee: Jian He I found this during review of YARN-2743. bq. AbstractDTId had a version, we dropped that in the protobuf serialization. We should just write it during the serialization and read it back? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
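A minimal sketch of the suggestion quoted above, with illustrative class names (not the actual YARN delegation-token identifier types): write the version explicitly during serialization and read it back.
{code}
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Illustrative only: the abstract delegation-token ID carries a version, so the
// protobuf-based serialization should still write and read it explicitly.
abstract class VersionedTokenIdentifier {
  private static final byte VERSION = 0;

  public void write(DataOutput out) throws IOException {
    out.writeByte(VERSION);        // serialize the version first
    writeProtoFields(out);         // then the protobuf-backed payload
  }

  public void readFields(DataInput in) throws IOException {
    byte version = in.readByte();  // read the version back
    if (version != VERSION) {
      throw new IOException("Unknown token identifier version " + version);
    }
    readProtoFields(in);
  }

  protected abstract void writeProtoFields(DataOutput out) throws IOException;
  protected abstract void readProtoFields(DataInput in) throws IOException;
}
{code}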
[jira] [Updated] (YARN-1681) When banned.users is not set in LCE's container-executor.cfg, submitting a job as a user in DEFAULT_BANNED_USERS produces an unclear error message
[ https://issues.apache.org/jira/browse/YARN-1681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-1681: -- Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it to 2.6.2. Let me know if you have comments. Thanks! When banned.users is not set in LCE's container-executor.cfg, submitting a job as a user in DEFAULT_BANNED_USERS produces an unclear error message --- Key: YARN-1681 URL: https://issues.apache.org/jira/browse/YARN-1681 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.2.0 Reporter: Zhichun Wu Assignee: Zhichun Wu Labels: container, usability Attachments: YARN-1681.patch When using LCE in a secure setup, if banned.users is not set in container-executor.cfg, submitting a job as a user in DEFAULT_BANNED_USERS (mapred, hdfs, bin, 0) produces an unclear error message. For example, if we use hdfs to submit an MR job, we may see the following on the YARN app overview page: {code} appattempt_1391353981633_0003_02 exited with exitCode: -1000 due to: Application application_1391353981633_0003 initialization failed (exitCode=139) with output: {code} while the preferred error message would look like: {code} appattempt_1391353981633_0003_02 exited with exitCode: -1000 due to: Application application_1391353981633_0003 initialization failed (exitCode=139) with output: Requested user hdfs is banned {code} Just a minor bug, and I would like to start contributing to hadoop-common with it :)
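For reference, a hedged example of the relevant container-executor.cfg entries (values are illustrative): when banned.users is absent, the compiled-in DEFAULT_BANNED_USERS list applies, and the point of the patch is that the resulting refusal should say so explicitly.
{code}
# container-executor.cfg (illustrative values)
yarn.nodemanager.linux-container-executor.group=yarn
# If banned.users is not set, the built-in defaults (mapred, hdfs, bin, uid 0) apply.
banned.users=hdfs,yarn,mapred,bin
min.user.id=1000
{code}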
[jira] [Updated] (YARN-1856) cgroups based memory monitoring for containers
[ https://issues.apache.org/jira/browse/YARN-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-1856: -- Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it to 2.6.2. Let me know if you have comments. Thanks! cgroups based memory monitoring for containers -- Key: YARN-1856 URL: https://issues.apache.org/jira/browse/YARN-1856 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Varun Vasudev -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2657) MiniYARNCluster to (optionally) add MicroZookeeper service
[ https://issues.apache.org/jira/browse/YARN-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2657: -- Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it to 2.6.2. Let me know if you have comments. Thanks! MiniYARNCluster to (optionally) add MicroZookeeper service -- Key: YARN-2657 URL: https://issues.apache.org/jira/browse/YARN-2657 Project: Hadoop YARN Issue Type: Sub-task Components: test Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-2567-001.patch, YARN-2657-002.patch This is needed for testing things like YARN-2646: add an option for the {{MiniYarnCluster}} to start a {{MicroZookeeperService}}. This is just another YARN service to create and track the lifecycle. The {{MicroZookeeperService}} publishes its binding information for direct takeup by the registry services...this can address in-VM race conditions. The default setting for this service is off -- This message was sent by Atlassian JIRA (v6.3.4#6332)
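A rough usage sketch of what the proposal would enable, assuming the MicroZookeeperService from the YARN registry; the class name, accessor, and wiring here are assumptions, since the MiniYarnCluster option itself did not exist when this was filed.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.registry.client.api.RegistryConstants;
import org.apache.hadoop.registry.server.services.MicroZookeeperService;
import org.apache.hadoop.yarn.server.MiniYARNCluster;

public class MiniClusterWithRegistryZk {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Start an in-JVM ZooKeeper before the YARN services so registry users
    // never race against an external ZK (constructor/accessor names assumed).
    MicroZookeeperService zk = new MicroZookeeperService("test-zk");
    zk.init(conf);
    zk.start();
    conf.set(RegistryConstants.KEY_REGISTRY_ZK_QUORUM, zk.getConnectionString());

    MiniYARNCluster cluster = new MiniYARNCluster("registry-test", 1, 1, 1);
    cluster.init(conf);
    cluster.start();
    // ... run the test against the cluster and the registry ...
    cluster.stop();
    zk.stop();
  }
}
{code}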
[jira] [Updated] (YARN-2014) Performance: AM scalability is 10% slower in 2.4 compared to 0.23.9
[ https://issues.apache.org/jira/browse/YARN-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2014: -- Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it to 2.6.2. Let me know if you have comments. Thanks! Performance: AM scalability is 10% slower in 2.4 compared to 0.23.9 Key: YARN-2014 URL: https://issues.apache.org/jira/browse/YARN-2014 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: patrick white Assignee: Jason Lowe Performance comparison benchmarks from 2.x against 0.23 show the AM scalability benchmark's runtime is approximately 10% slower in 2.4.0. The trend is consistent across later releases in both lines; the latest release numbers are: 2.4.0.0 runtime 255.6 seconds (avg 5 passes) 0.23.9.12 runtime 230.4 seconds (avg 5 passes) Diff: -9.9% The AM Scalability test is essentially a sleep job that measures time to launch and complete a large number of mappers. The diff is consistent and has been reproduced in both a larger (350 node, 100,000 mappers) perf environment, as well as a small (10 node, 2,900 mappers) demo cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
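For clarity, the quoted -9.9% is the delta taken against the 2.4.0 runtime, i.e. 0.23.9.12 completes in about 9.9% less time than 2.4.0.0:
{code}
(255.6 - 230.4) / 255.6 = 25.2 / 255.6 ≈ 9.9%
{code}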
[jira] [Updated] (YARN-313) Add Admin API for supporting node resource configuration in command line
[ https://issues.apache.org/jira/browse/YARN-313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated YARN-313: - Attachment: YARN-313-v7.patch Updated to trunk. It looks like it still breaks the unit test for the graceful refresh but I cannot figure out why. Add Admin API for supporting node resource configuration in command line Key: YARN-313 URL: https://issues.apache.org/jira/browse/YARN-313 Project: Hadoop YARN Issue Type: Sub-task Components: client Reporter: Junping Du Assignee: Junping Du Priority: Critical Attachments: YARN-313-sample.patch, YARN-313-v1.patch, YARN-313-v2.patch, YARN-313-v3.patch, YARN-313-v4.patch, YARN-313-v5.patch, YARN-313-v6.patch, YARN-313-v7.patch We should provide some admin interface, e.g. yarn rmadmin -refreshResources to support changes of node's resource specified in a config file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
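A usage sketch of the proposed interface; the command flag comes from this JIRA's description, and the config file name is illustrative only.
{code}
# Edit the per-node resource settings (file name/format illustrative of the proposal)
$ vi $HADOOP_CONF_DIR/dynamic-resources.xml

# Ask the RM admin service to pick up the new node resources
$ yarn rmadmin -refreshResources
{code}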
[jira] [Updated] (YARN-2055) Preemption: Jobs are failing because AMs are getting launched and killed multiple times
[ https://issues.apache.org/jira/browse/YARN-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2055: -- Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it to 2.6.2. Let me know if you have comments. Thanks! Preemption: Jobs are failing because AMs are getting launched and killed multiple times -- Key: YARN-2055 URL: https://issues.apache.org/jira/browse/YARN-2055 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Mayank Bansal If queue A does not have enough capacity to run the AM, the AM will borrow capacity from queue B. When queue B reclaims its capacity the AM is killed, then launched and killed again, and eventually the job fails. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1848) Persist ClusterMetrics across RM HA transitions
[ https://issues.apache.org/jira/browse/YARN-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-1848: -- Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it to 2.6.2. Let me know if you have comments. Thanks! Persist ClusterMetrics across RM HA transitions --- Key: YARN-1848 URL: https://issues.apache.org/jira/browse/YARN-1848 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Post YARN-1705, ClusterMetrics are reset on transition to standby. This is acceptable as the metrics show statistics since an RM has become active. Users might want to see metrics since the cluster was first started. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2037) Add restart support for Unmanaged AMs
[ https://issues.apache.org/jira/browse/YARN-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2037: -- Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it to 2.6.2. Let me know if you have comments. Thanks! Add restart support for Unmanaged AMs - Key: YARN-2037 URL: https://issues.apache.org/jira/browse/YARN-2037 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla It would be nice to allow Unmanaged AMs also to restart in a work-preserving way. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2457) FairScheduler: Handle preemption to help starved parent queues
[ https://issues.apache.org/jira/browse/YARN-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2457: -- Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it to 2.6.2. Let me know if you have comments. Thanks! FairScheduler: Handle preemption to help starved parent queues -- Key: YARN-2457 URL: https://issues.apache.org/jira/browse/YARN-2457 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.5.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla YARN-2395/YARN-2394 add preemption timeout and threshold per queue, but don't check for parent queue starvation. We need to check that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-313) Add Admin API for supporting node resource configuration in command line
[ https://issues.apache.org/jira/browse/YARN-313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-313: - Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it to 2.6.2. Let me know if you have comments. Thanks! Add Admin API for supporting node resource configuration in command line Key: YARN-313 URL: https://issues.apache.org/jira/browse/YARN-313 Project: Hadoop YARN Issue Type: Sub-task Components: client Reporter: Junping Du Assignee: Junping Du Priority: Critical Attachments: YARN-313-sample.patch, YARN-313-v1.patch, YARN-313-v2.patch, YARN-313-v3.patch, YARN-313-v4.patch, YARN-313-v5.patch, YARN-313-v6.patch, YARN-313-v7.patch We should provide some admin interface, e.g. yarn rmadmin -refreshResources to support changes of node's resource specified in a config file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2038) Revisit how AMs learn of containers from previous attempts
[ https://issues.apache.org/jira/browse/YARN-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2038: -- Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it to 2.6.2. Let me know if you have comments. Thanks! Revisit how AMs learn of containers from previous attempts -- Key: YARN-2038 URL: https://issues.apache.org/jira/browse/YARN-2038 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Based on YARN-556, we need to update the way AMs learn about containers allocated in previous attempts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2076) Minor error in TestLeafQueue files
[ https://issues.apache.org/jira/browse/YARN-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692605#comment-14692605 ] Chen He commented on YARN-2076: --- I will update patch. Minor error in TestLeafQueue files -- Key: YARN-2076 URL: https://issues.apache.org/jira/browse/YARN-2076 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Chen He Assignee: Chen He Priority: Minor Labels: test Attachments: YARN-2076.patch numNodes should be 2 instead of 3 in testReservationExchange() since only two nodes are defined. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3999) RM hangs on draining events
[ https://issues.apache.org/jira/browse/YARN-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692901#comment-14692901 ] Hudson commented on YARN-3999: -- FAILURE: Integrated in Hadoop-trunk-Commit #8286 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8286/]) YARN-3999. RM hangs on draing events. Contributed by Jian He (xgong: rev 3ae716fa696b87e849dae40225dc59fb5ed114cb) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/JvmPauseMonitor.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/TestAsyncDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/logaggregationstatus/TestRMAppLogAggregationStatus.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMActiveServiceContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java RM hangs on draining events - Key: YARN-3999 URL: https://issues.apache.org/jira/browse/YARN-3999 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Fix For: 2.7.2 Attachments: YARN-3999-branch-2.7.patch, YARN-3999.1.patch, YARN-3999.2.patch, YARN-3999.2.patch, YARN-3999.3.patch, YARN-3999.4.patch, YARN-3999.5.patch, YARN-3999.patch, YARN-3999.patch If external systems like ATS or ZK become very slow, draining all the events takes a lot of time. If this time becomes larger than 10 mins, all applications will expire. Fixes include: 1. Add a timeout and stop the dispatcher even if not all events are drained. 2. 
Move ATS service out from RM active service so that RM doesn't need to wait for ATS to flush the events when transitioning to standby. 3. Stop client-facing services (ClientRMService etc.) first so that clients get fast notification that RM is stopping/transitioning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
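A minimal sketch of fix (1) above, assuming a dispatcher with an event queue and a bounded drain wait on stop; names and the timeout are illustrative, not the committed AsyncDispatcher change.
{code}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch: stop after a bounded drain wait instead of waiting indefinitely for
// slow downstream systems (ATS, ZK) to consume every queued event.
class BoundedDrainDispatcher {
  private final BlockingQueue<Object> eventQueue = new LinkedBlockingQueue<>();
  private volatile boolean stopped = false;

  void stop(long drainTimeoutMs) throws InterruptedException {
    long deadline = System.currentTimeMillis() + drainTimeoutMs;
    // Poll until the queue drains or the timeout expires.
    while (!eventQueue.isEmpty() && System.currentTimeMillis() < deadline) {
      Thread.sleep(100);
    }
    stopped = true;  // stop even if some events were never delivered
  }
}
{code}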
[jira] [Commented] (YARN-2038) Revisit how AMs learn of containers from previous attempts
[ https://issues.apache.org/jira/browse/YARN-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692533#comment-14692533 ] Sangjin Lee commented on YARN-2038: --- Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me know. Revisit how AMs learn of containers from previous attempts -- Key: YARN-2038 URL: https://issues.apache.org/jira/browse/YARN-2038 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Based on YARN-556, we need to update the way AMs learn about containers allocated in previous attempts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2506) TimelineClient should NOT be in yarn-common project
[ https://issues.apache.org/jira/browse/YARN-2506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692527#comment-14692527 ] Sangjin Lee commented on YARN-2506: --- Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me know. TimelineClient should NOT be in yarn-common project --- Key: YARN-2506 URL: https://issues.apache.org/jira/browse/YARN-2506 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen Priority: Critical YARN-2298 incorrectly moved TimelineClient to yarn-common project. It doesn't belong there, we should move it back to yarn-client module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2076) Minor error in TestLeafQueue files
[ https://issues.apache.org/jira/browse/YARN-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692529#comment-14692529 ] Sangjin Lee commented on YARN-2076: --- Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me know. Minor error in TestLeafQueue files -- Key: YARN-2076 URL: https://issues.apache.org/jira/browse/YARN-2076 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Chen He Assignee: Chen He Priority: Minor Labels: test Attachments: YARN-2076.patch numNodes should be 2 instead of 3 in testReservationExchange() since only two nodes are defined. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2037) Add restart support for Unmanaged AMs
[ https://issues.apache.org/jira/browse/YARN-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692534#comment-14692534 ] Sangjin Lee commented on YARN-2037: --- Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me know. Add restart support for Unmanaged AMs - Key: YARN-2037 URL: https://issues.apache.org/jira/browse/YARN-2037 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla It would be nice to allow Unmanaged AMs also to restart in a work-preserving way. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2055) Preemption: Jobs are failing because AMs are getting launched and killed multiple times
[ https://issues.apache.org/jira/browse/YARN-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692532#comment-14692532 ] Sangjin Lee commented on YARN-2055: --- Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me know. Preemption: Jobs are failing because AMs are getting launched and killed multiple times -- Key: YARN-2055 URL: https://issues.apache.org/jira/browse/YARN-2055 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Mayank Bansal If queue A does not have enough capacity to run the AM, the AM will borrow capacity from queue B. When queue B reclaims its capacity the AM is killed, then launched and killed again, and eventually the job fails. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4046) Applications fail on NM restart on some Linux distros because NM container recovery declares the AM container as LOST
[ https://issues.apache.org/jira/browse/YARN-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-4046: Attachment: YARN-4046.002.patch Fixed a new checkstyle issue that was added; the other two are preexisting and should not be fixed. Applications fail on NM restart on some Linux distros because NM container recovery declares the AM container as LOST Key: YARN-4046 URL: https://issues.apache.org/jira/browse/YARN-4046 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Priority: Critical Attachments: YARN-4046.002.patch, YARN-4096.001.patch On a Debian machine we have seen NodeManager recovery of containers fail because the signal syntax for process groups may not work. We see errors when checking whether the process is alive during container recovery, which causes the container to be declared LOST (154) on a NodeManager restart. The application then fails with an error. The attempts are not retried. {noformat} Application application_1439244348718_0001 failed 1 times due to Attempt recovered after RM restart. AM Container for appattempt_1439244348718_0001_01 exited with exitCode: 154 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
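The recovery-time liveness probe is done by the native container-executor; purely to illustrate where the distro difference bites, here is a hedged Java equivalent of a "signal 0" check, single process versus process group (the `-- -PID` form is the part that can behave differently).
{code}
import java.io.IOException;

// Sketch: "kill -0" returns 0 if the target still exists, non-zero otherwise.
// Signalling a whole process group uses a negative pid after "--", and that
// syntax is what differs across some distros/shells.
public class ProcessLiveness {
  static boolean isAlive(String pid, boolean processGroup)
      throws IOException, InterruptedException {
    String[] cmd = processGroup
        ? new String[] {"kill", "-0", "--", "-" + pid}  // whole process group
        : new String[] {"kill", "-0", pid};             // single process
    return new ProcessBuilder(cmd).start().waitFor() == 0;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(isAlive("1", false));  // pid 1 should exist
  }
}
{code}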
[jira] [Updated] (YARN-313) Add Admin API for supporting node resource configuration in command line
[ https://issues.apache.org/jira/browse/YARN-313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-313: - Target Version/s: 2.7.2, 2.6.2 (was: 2.6.1, 2.7.2) Add Admin API for supporting node resource configuration in command line Key: YARN-313 URL: https://issues.apache.org/jira/browse/YARN-313 Project: Hadoop YARN Issue Type: Sub-task Components: client Reporter: Junping Du Assignee: Junping Du Priority: Critical Attachments: YARN-313-sample.patch, YARN-313-v1.patch, YARN-313-v2.patch, YARN-313-v3.patch, YARN-313-v4.patch, YARN-313-v5.patch, YARN-313-v6.patch, YARN-313-v7.patch We should provide some admin interface, e.g. yarn rmadmin -refreshResources to support changes of node's resource specified in a config file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4026) FiCaSchedulerApp: ContainerAllocator should be able to choose how to order pending resource requests
[ https://issues.apache.org/jira/browse/YARN-4026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692827#comment-14692827 ] Hadoop QA commented on YARN-4026: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 16s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 8m 18s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 17s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 54s | The applied patch generated 3 new checkstyle issues (total was 128, now 128). | | {color:red}-1{color} | whitespace | 0m 5s | The patch has 30 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 27s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 36s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 34s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 53m 47s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 94m 41s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.TestRMAdminService | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12750004/YARN-4026.3.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 7c796fd | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8827/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8827/artifact/patchprocess/whitespace.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8827/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8827/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8827/console | This message was automatically generated. FiCaSchedulerApp: ContainerAllocator should be able to choose how to order pending resource requests Key: YARN-4026 URL: https://issues.apache.org/jira/browse/YARN-4026 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-4026.1.patch, YARN-4026.2.patch, YARN-4026.3.patch After YARN-3983, we have an extensible ContainerAllocator which can be used by FiCaSchedulerApp to decide how to allocate resources. 
While working on YARN-1651 (allocate resources to increase containers), I found one thing in the existing logic that is not flexible enough: ContainerAllocator decides what to allocate for a given node and priority. To support different kinds of resource allocation (for example, treating priority as a weight, or choosing whether to skip a priority), it is better to let ContainerAllocator choose how to order pending resource requests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
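A sketch of the flexibility being requested, with illustrative types (the real FiCaSchedulerApp/ContainerAllocator APIs differ): the allocator, not the scheduler app, supplies the ordering of pending requests.
{code}
import java.util.Comparator;
import java.util.List;

// Illustrative types only.
interface PendingRequest {
  int getPriority();
  long getSubmissionTime();
}

interface OrderingContainerAllocator {
  // Each allocator chooses its own ordering: strict priority, priority as a
  // weight, or skipping certain priorities entirely.
  Comparator<PendingRequest> requestOrdering();

  default void allocate(List<PendingRequest> pending) {
    pending.sort(requestOrdering());
    // ... walk the sorted requests and attempt allocation on the node ...
  }
}
{code}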
[jira] [Updated] (YARN-2657) MiniYARNCluster to (optionally) add MicroZookeeper service
[ https://issues.apache.org/jira/browse/YARN-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2657: -- Target Version/s: 2.7.2, 2.6.2 (was: 2.6.1, 2.7.2) MiniYARNCluster to (optionally) add MicroZookeeper service -- Key: YARN-2657 URL: https://issues.apache.org/jira/browse/YARN-2657 Project: Hadoop YARN Issue Type: Sub-task Components: test Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-2567-001.patch, YARN-2657-002.patch This is needed for testing things like YARN-2646: add an option for the {{MiniYarnCluster}} to start a {{MicroZookeeperService}}. This is just another YARN service to create and track the lifecycle. The {{MicroZookeeperService}} publishes its binding information for direct takeup by the registry services...this can address in-VM race conditions. The default setting for this service is off -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2746) YARNDelegationTokenID misses serializing version from the common abstract ID
[ https://issues.apache.org/jira/browse/YARN-2746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2746: -- Target Version/s: 2.7.2, 2.6.2 (was: 2.6.1, 2.7.2) YARNDelegationTokenID misses serializing version from the common abstract ID Key: YARN-2746 URL: https://issues.apache.org/jira/browse/YARN-2746 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Vinod Kumar Vavilapalli Assignee: Jian He I found this during review of YARN-2743. bq. AbstractDTId had a version, we dropped that in the protobuf serialization. We should just write it during the serialization and read it back? -- This message was sent by Atlassian JIRA (v6.3.4#6332)