[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327117#comment-14327117 ] Sunil G commented on YARN-2004: --- Thank you [~jlowe] and [~leftnoteasy] for the input. Yes, there are alternate ways we can achieve scenario 1. Also, for scenario 2, YARN-2009 will help. Hence this JIRA can now focus on the basic priority addition to the schedulers. bq. Priority is only considered if both applications have a priority that was set. If a set of priorities is loaded to the RM and one is chosen as the default priority for a queue, it can be any priority from lowest to highest. So all the applications running w/o priority will be given this default priority, and hence some lower-priority applications will end up with lower preference than an application running w/o priority. But this is also a matter of user perception: if the user understands that all applications running w/o priority fall back to the default chosen per queue, then the behavior will be as expected. On that note, I also feel we can consider all applications running w/o priority to be of the default priority. [~jlowe], please share your thoughts w.r.t. the above scenario. Priority scheduling support in Capacity scheduler - Key: YARN-2004 URL: https://issues.apache.org/jira/browse/YARN-2004 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2004.patch Based on the priority of the application, the Capacity Scheduler should be able to give preference to applications while doing scheduling. Comparator<FiCaSchedulerApp> applicationComparator can be changed as below. 1. Check for application priority. If priority is available, then return the highest-priority job. 2. Otherwise continue with the existing logic, such as app ID comparison and then timestamp comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
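A minimal sketch of the comparator change described above, assuming FiCaSchedulerApp exposes an application-level getPriority() and that a higher integer value means higher priority (both are assumptions; the attached patch is authoritative):
{code}
import java.util.Comparator;

import org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp;

// Sketch only: compares by application priority first, then falls back to
// the existing application-ID ordering (which embeds submission order).
public class PriorityAwareApplicationComparator
    implements Comparator<FiCaSchedulerApp> {
  @Override
  public int compare(FiCaSchedulerApp a1, FiCaSchedulerApp a2) {
    // 1. Check application priority: the higher-priority app sorts first.
    int byPriority = Integer.compare(
        a2.getPriority().getPriority(), a1.getPriority().getPriority());
    if (byPriority != 0) {
      return byPriority;
    }
    // 2. Otherwise keep the existing ordering by application ID.
    return a1.getApplicationAttemptId().getApplicationId()
        .compareTo(a2.getApplicationAttemptId().getApplicationId());
  }
}
{code}
With this ordering, applications submitted w/o a priority would simply carry the queue's default priority, per the discussion above.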
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328071#comment-14328071 ] Jason Lowe commented on YARN-1963: -- I'd like to see changing app priorities addressed as it is a common ask from users. In many cases jobs are submitted to the cluster via some workflow/pipeline, and they would like to change the priority of apps already submitted. Otherwise they have to update their workflow/pipeline to change the submit-time priority, kill the active jobs, and resubmit the apps for the priority to take effect. Then eventually they need to change it all back to normal priorities later. Support priorities across applications within the same queue - Key: YARN-1963 URL: https://issues.apache.org/jira/browse/YARN-1963 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Reporter: Arun C Murthy Assignee: Sunil G Attachments: YARN Application Priorities Design.pdf, YARN Application Priorities Design_01.pdf It will be very useful to support priorities among applications within the same queue, particularly in production scenarios. It allows for finer-grained controls without having to force admins to create a multitude of queues, plus allows existing applications to continue using existing queues which are usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2693) Priority Label Manager in RM to manage priority labels
[ https://issues.apache.org/jira/browse/YARN-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327981#comment-14327981 ] Wangda Tan commented on YARN-2693: -- [~sunilg], After thinking about this, I feel this part may not be required before adding the major functionalities. I found the existing implementation of the priority label manager is very similar to the node label manager, but they're two different use cases. In the node label manager, each node can be assigned labels, so there are lots of mappings in the cluster. However, priority labels will be much simpler: fewer than two dozen text-based priority labels should satisfy most use cases, and priority labels are not likely to change frequently. So what I suggest now is making simple configuration-based labels first; if RM HA needs to be supported, the admin can put the same priority-label configuration item on different RM nodes. Since we don't have a centralized configuration for Hadoop daemons, we assume different RM nodes have the same yarn-site.xml settings. After the major functionality is completed (say, RM / scheduler / API / client side), more time could be spent on this part :). Ideas? Priority Label Manager in RM to manage priority labels -- Key: YARN-2693 URL: https://issues.apache.org/jira/browse/YARN-2693 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2693.patch, 0002-YARN-2693.patch, 0003-YARN-2693.patch, 0004-YARN-2693.patch, 0005-YARN-2693.patch Focus of this JIRA is to have a centralized service to handle priority labels. Support operations such as * Add/Delete a priority label for a specified queue * Manage the integer mapping associated with each priority label * Support managing the default priority label of a given queue * ACL support at queue level for priority labels * Expose an interface to the RM to validate priority labels Storage for these labels will be done in FileSystem and in Memory, similar to NodeLabel * FileSystem based: persistent across RM restart * Memory based: non-persistent across RM restart -- This message was sent by Atlassian JIRA (v6.3.4#6332)
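A minimal sketch of the configuration-based approach suggested above; the property name yarn.scheduler.priority.labels and the label:integer syntax are hypothetical placeholders, not actual keys:
{code}
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;

// Sketch only: load a label -> integer mapping from yarn-site.xml, e.g.
//   yarn.scheduler.priority.labels = low:1,normal:5,high:10
// Each RM node would carry the same setting, per the HA note above.
public class ConfigBasedPriorityLabels {
  public static Map<String, Integer> load(Configuration conf) {
    Map<String, Integer> labelToPriority = new HashMap<>();
    for (String entry : conf.getTrimmedStrings("yarn.scheduler.priority.labels")) {
      String[] parts = entry.split(":");
      labelToPriority.put(parts[0], Integer.parseInt(parts[1]));
    }
    return labelToPriority;
  }
}
{code}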
[jira] [Commented] (YARN-3166) [Source organization] Decide detailed package structures for timeline service v2 components
[ https://issues.apache.org/jira/browse/YARN-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327987#comment-14327987 ] Sangjin Lee commented on YARN-3166: --- [~gtCarrera9], sorry for my delayed response. It looks good mostly. I have some feedback: - As [~zjshen] mentioned, we need to sort out the RM/NM dependency on the timelineservice module. The NM dependency is more of a fluke, but we need to think about the RM dependency because it needs to start its own aggregator service. I believe [~Naganarasimha] mentioned this in another JIRA. Perhaps this is unavoidable if RM is going to start the aggregator? I am not aware of any clean pluggable service mechanism for RM (like the aux services for NM). Another idea if we don't want that is to move the base aggregator class into yarn-server-common. - I think as a rule, it would be good to make sure not to disturb the old ATS classes. IIUC we're deprecating the old ATS classes, but we're not going to modify them in an incompatible way (e.g. moving classes, removing classes, changing interfaces, etc.), as that would be extremely disruptive once this is merged. - What is the difference between TimelineStorage and TimelineStorageImpl? [Source organization] Decide detailed package structures for timeline service v2 components --- Key: YARN-3166 URL: https://issues.apache.org/jira/browse/YARN-3166 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu Open this JIRA to track all discussions on detailed package structures for timeline services v2. This JIRA is for discussion only. For our current timeline service v2 design, aggregator (previously called writer) implementation is in hadoop-yarn-server's: {{org.apache.hadoop.yarn.server.timelineservice.aggregator}} In YARN-2928's design, the next gen ATS reader is also a server. Maybe we want to put reader related implementations into hadoop-yarn-server's: {{org.apache.hadoop.yarn.server.timelineservice.reader}} Both readers and aggregators will expose features that may be used by YARN and other 3rd party components, such as aggregator/reader APIs. For those features, maybe we would like to expose their interfaces to hadoop-yarn-common's {{org.apache.hadoop.yarn.timelineservice}}? Let's use this JIRA as a centralized place to track all related discussions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327995#comment-14327995 ] Wangda Tan commented on YARN-1963: -- One more question: I didn't see an API proposed to update app priority. I think it may be very useful when a job has run for some time and needs to get completed as soon as we can. Is this a valid use case that we need to handle within the YARN-1963 scope? Support priorities across applications within the same queue - Key: YARN-1963 URL: https://issues.apache.org/jira/browse/YARN-1963 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Reporter: Arun C Murthy Assignee: Sunil G Attachments: YARN Application Priorities Design.pdf, YARN Application Priorities Design_01.pdf It will be very useful to support priorities among applications within the same queue, particularly in production scenarios. It allows for finer-grained controls without having to force admins to create a multitude of queues, plus allows existing applications to continue using existing queues which are usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
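For reference, a hypothetical client-side signature for the update API being asked about; no such method existed at the time of this comment:
{code}
import java.io.IOException;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.exceptions.YarnException;

// Hypothetical sketch of what a priority-update entry point could look
// like, so a running job's priority can be raised without resubmission.
public interface ApplicationPriorityUpdater {
  void updateApplicationPriority(ApplicationId applicationId,
      Priority newPriority) throws YarnException, IOException;
}
{code}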
[jira] [Commented] (YARN-3197) Confusing log generated by CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327319#comment-14327319 ] Sunil G commented on YARN-3197: --- Yes. The remark from [~rohithsharma] makes sense. I also came across scenarios where the NM was slightly delayed in reporting its status, and the application completed in the meantime. A lot of this log gets printed at that time. Confusing log generated by CapacityScheduler Key: YARN-3197 URL: https://issues.apache.org/jira/browse/YARN-3197 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Hitesh Shah Assignee: Varun Saxena Priority: Minor Attachments: YARN-3197.001.patch 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:40,960 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:40,960 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327243#comment-14327243 ] Sunil G commented on YARN-1963: --- Thank you [~devaraj.k] for the input. I have updated the sub-JIRAs and uploaded a patch that uses integers rather than label names. As mentioned, we can have the enums supported from the MR side (can try using enums). But a translation table is needed for the same, and it is better to keep it on the YarnClient side. Support priorities across applications within the same queue - Key: YARN-1963 URL: https://issues.apache.org/jira/browse/YARN-1963 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Reporter: Arun C Murthy Assignee: Sunil G Attachments: YARN Application Priorities Design.pdf, YARN Application Priorities Design_01.pdf It will be very useful to support priorities among applications within the same queue, particularly in production scenarios. It allows for finer-grained controls without having to force admins to create a multitude of queues, plus allows existing applications to continue using existing queues which are usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1615) Fix typos in description about delay scheduling
[ https://issues.apache.org/jira/browse/YARN-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327274#comment-14327274 ] Hudson commented on YARN-1615: -- FAILURE: Integrated in Hadoop-Yarn-trunk #843 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/843/]) YARN-1615. Fix typos in delay scheduler's description. Contributed by Akira Ajisaka. (ozawa: rev b8a14efdf535d42bcafa58d380bd2c7f4d36f8cb) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java Fix typos in description about delay scheduling --- Key: YARN-1615 URL: https://issues.apache.org/jira/browse/YARN-1615 Project: Hadoop YARN Issue Type: Bug Components: documentation, scheduler Affects Versions: 2.6.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Trivial Labels: newbie Fix For: 2.7.0 Attachments: YARN-1615-002.patch, YARN-1615.patch In FSAppAttempt.java there're 4 typos: {code} * containers over rack-local or off-switch containers. To acheive this * we first only allow node-local assigments for a given prioirty level, * then relax the locality threshold once we've had a long enough period * without succesfully scheduling. We measure both the number of missed {code} They should be fixed as follows: {code} * containers over rack-local or off-switch containers. To achieve this * we first only allow node-local assignments for a given priority level, * then relax the locality threshold once we've had a long enough period * without successfully scheduling. We measure both the number of missed {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3132) RMNodeLabelsManager should remove node from node-to-label mapping when node becomes deactivated
[ https://issues.apache.org/jira/browse/YARN-3132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327279#comment-14327279 ] Hudson commented on YARN-3132: -- FAILURE: Integrated in Hadoop-Yarn-trunk #843 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/843/]) YARN-3132. RMNodeLabelsManager should remove node from node-to-label mapping when node becomes deactivated. Contributed by Wangda Tan (jianhe: rev f5da5566d9c392a5df71a2dce4c2d0d50eea51ee) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/RMNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/TestRMNodeLabelsManager.java * hadoop-yarn-project/CHANGES.txt RMNodeLabelsManager should remove node from node-to-label mapping when node becomes deactivated --- Key: YARN-3132 URL: https://issues.apache.org/jira/browse/YARN-3132 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.7.0 Attachments: YARN-3132.1.patch Using an example to explain: 1) Admin specifies host1 has label=x 2) node=host1:123 registered 3) Get node-to-label mapping returns host1/host1:123 4) node=host1:123 unregistered 5) Get node-to-label mapping still returns host1:123 Probably we should remove host1:123 when it becomes deactivated and no direct label is assigned to it (direct assignment means the admin specifies host1:123 has x instead of host1 has x). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
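A simplified sketch of the removal rule in this change (the real fix touches CommonNodeLabelsManager and RMNodeLabelsManager; the data structures here are stand-ins):
{code}
import java.util.Map;
import java.util.Set;

import org.apache.hadoop.yarn.api.records.NodeId;

// Sketch only: drop a NodeId from the node-to-label map on deactivation
// unless a label was directly assigned to that host:port.
public class NodeLabelDeactivation {
  public static void onNodeDeactivated(NodeId nodeId,
      Map<NodeId, Set<String>> nodeToLabels, Set<NodeId> directlyAssigned) {
    // Labels inherited from the host (e.g. "host1 has x") disappear with the
    // node; only a direct "host1:123 has x" assignment survives deactivation.
    if (!directlyAssigned.contains(nodeId)) {
      nodeToLabels.remove(nodeId);
    }
  }
}
{code}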
[jira] [Commented] (YARN-1514) Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA
[ https://issues.apache.org/jira/browse/YARN-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327281#comment-14327281 ] Hudson commented on YARN-1514: -- FAILURE: Integrated in Hadoop-Yarn-trunk #843 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/843/]) YARN-1514. Utility to benchmark ZKRMStateStore#loadState for RM HA. Contributed by Tsuyoshi OZAWA (jianhe: rev 1c03376300a46722d4147f5b8f37242f68dba0a2) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStorePerf.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/test/YarnTestDriver.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java * hadoop-yarn-project/CHANGES.txt * hadoop-project/pom.xml Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA Key: YARN-1514 URL: https://issues.apache.org/jira/browse/YARN-1514 Project: Hadoop YARN Issue Type: Sub-task Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Fix For: 2.7.0 Attachments: YARN-1514.1.patch, YARN-1514.2.patch, YARN-1514.3.patch, YARN-1514.4.patch, YARN-1514.4.patch, YARN-1514.5.patch, YARN-1514.5.patch, YARN-1514.6.patch, YARN-1514.7.patch, YARN-1514.wip-2.patch, YARN-1514.wip.patch ZKRMStateStore is very sensitive to ZNode-related operations, as discussed in YARN-1307, YARN-1378 and so on. In particular, ZKRMStateStore#loadState is called when an RM-HA cluster does a failover, so its execution time impacts the failover time of RM-HA. We need a utility to benchmark the execution time of ZKRMStateStore#loadState as a development tool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3132) RMNodeLabelsManager should remove node from node-to-label mapping when node becomes deactivated
[ https://issues.apache.org/jira/browse/YARN-3132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327295#comment-14327295 ] Hudson commented on YARN-3132: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #109 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/109/]) YARN-3132. RMNodeLabelsManager should remove node from node-to-label mapping when node becomes deactivated. Contributed by Wangda Tan (jianhe: rev f5da5566d9c392a5df71a2dce4c2d0d50eea51ee) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/RMNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/TestRMNodeLabelsManager.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java RMNodeLabelsManager should remove node from node-to-label mapping when node becomes deactivated --- Key: YARN-3132 URL: https://issues.apache.org/jira/browse/YARN-3132 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.7.0 Attachments: YARN-3132.1.patch Using an example to explain: 1) Admin specifies host1 has label=x 2) node=host1:123 registered 3) Get node-to-label mapping returns host1/host1:123 4) node=host1:123 unregistered 5) Get node-to-label mapping still returns host1:123 Probably we should remove host1:123 when it becomes deactivated and no direct label is assigned to it (direct assignment means the admin specifies host1:123 has x instead of host1 has x). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1514) Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA
[ https://issues.apache.org/jira/browse/YARN-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327297#comment-14327297 ] Hudson commented on YARN-1514: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #109 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/109/]) YARN-1514. Utility to benchmark ZKRMStateStore#loadState for RM HA. Contributed by Tsuyoshi OZAWA (jianhe: rev 1c03376300a46722d4147f5b8f37242f68dba0a2) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/test/YarnTestDriver.java * hadoop-project/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStorePerf.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA Key: YARN-1514 URL: https://issues.apache.org/jira/browse/YARN-1514 Project: Hadoop YARN Issue Type: Sub-task Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Fix For: 2.7.0 Attachments: YARN-1514.1.patch, YARN-1514.2.patch, YARN-1514.3.patch, YARN-1514.4.patch, YARN-1514.4.patch, YARN-1514.5.patch, YARN-1514.5.patch, YARN-1514.6.patch, YARN-1514.7.patch, YARN-1514.wip-2.patch, YARN-1514.wip.patch ZKRMStateStore is very sensitive to ZNode-related operations, as discussed in YARN-1307, YARN-1378 and so on. In particular, ZKRMStateStore#loadState is called when an RM-HA cluster does a failover, so its execution time impacts the failover time of RM-HA. We need a utility to benchmark the execution time of ZKRMStateStore#loadState as a development tool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1615) Fix typos in description about delay scheduling
[ https://issues.apache.org/jira/browse/YARN-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327290#comment-14327290 ] Hudson commented on YARN-1615: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #109 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/109/]) YARN-1615. Fix typos in delay scheduler's description. Contributed by Akira Ajisaka. (ozawa: rev b8a14efdf535d42bcafa58d380bd2c7f4d36f8cb) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java Fix typos in description about delay scheduling --- Key: YARN-1615 URL: https://issues.apache.org/jira/browse/YARN-1615 Project: Hadoop YARN Issue Type: Bug Components: documentation, scheduler Affects Versions: 2.6.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Trivial Labels: newbie Fix For: 2.7.0 Attachments: YARN-1615-002.patch, YARN-1615.patch In FSAppAttempt.java there're 4 typos: {code} * containers over rack-local or off-switch containers. To acheive this * we first only allow node-local assigments for a given prioirty level, * then relax the locality threshold once we've had a long enough period * without succesfully scheduling. We measure both the number of missed {code} They should be fixed as follows: {code} * containers over rack-local or off-switch containers. To achieve this * we first only allow node-local assignments for a given priority level, * then relax the locality threshold once we've had a long enough period * without successfully scheduling. We measure both the number of missed {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327196#comment-14327196 ] Devaraj K commented on YARN-1963: - I would also agree with numbers rather than labels, so as not to make it more complex. If we are moving with numbers, I think we can just use the existing priority API, ApplicationSubmissionContext.setPriority(Priority priority), and no new APIs are required to expose to clients. We may need to think about the M/R job priority case: an M/R job supports enums for priority (i.e. VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW) and we need to have some mechanism to map these enums to priority numbers. Support priorities across applications within the same queue - Key: YARN-1963 URL: https://issues.apache.org/jira/browse/YARN-1963 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Reporter: Arun C Murthy Assignee: Sunil G Attachments: YARN Application Priorities Design.pdf, YARN Application Priorities Design_01.pdf It will be very useful to support priorities among applications within the same queue, particularly in production scenarios. It allows for finer-grained controls without having to force admins to create a multitude of queues, plus allows existing applications to continue using existing queues which are usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
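A sketch of the translation-table idea for the M/R case; the specific integer values are illustrative only, not a decided scheme:
{code}
import org.apache.hadoop.mapreduce.JobPriority;
import org.apache.hadoop.yarn.api.records.Priority;

// Sketch only: map the MR JobPriority enum onto integer YARN priorities
// (higher number = higher priority in this illustration).
public class JobPriorityMapper {
  public static Priority toYarnPriority(JobPriority jobPriority) {
    switch (jobPriority) {
      case VERY_HIGH: return Priority.newInstance(4);
      case HIGH:      return Priority.newInstance(3);
      case NORMAL:    return Priority.newInstance(2);
      case LOW:       return Priority.newInstance(1);
      case VERY_LOW:  return Priority.newInstance(0);
      default:        return Priority.newInstance(2); // treat unknown as NORMAL
    }
  }
}
{code}
Keeping this table on the YarnClient side, as suggested above, means the RM only ever sees integers.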
[jira] [Commented] (YARN-3087) [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager
[ https://issues.apache.org/jira/browse/YARN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327164#comment-14327164 ] Devaraj K commented on YARN-3087: - Thanks [~gtCarrera9] for the link. The patch is trying to move the static member 'pipeline' to instance level, but there are still other places accessing the static members that have not been fixed, which I mentioned in the above comment. The patch owner also says they are experiencing other issues with the same patch; it could probably be due to the other static references. [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager - Key: YARN-3087 URL: https://issues.apache.org/jira/browse/YARN-3087 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Devaraj K This is related to YARN-3030. YARN-3030 sets up a per-node timeline aggregator and the associated REST server. It runs fine as a standalone process, but does not work if it runs inside the node manager due to possible collisions of servlet mapping. Exception: {noformat} org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline: controller for v2 not found at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232) at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:140) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:134) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) ... {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
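For illustration, a minimal sketch of why shared static state breaks two web apps in one JVM; the class names here are stand-ins, not the actual org.apache.hadoop.yarn.webapp code:
{code}
// Sketch only: a static routing field is shared by every web app in the
// JVM, so the NM web app and the per-node aggregator web app clobber each
// other's routes; an instance field keeps each web app's state separate.
public class WebDispatcher {
  // private static RoutingTable pipeline; // shared: second web app clobbers it
  private final RoutingTable pipeline;     // per-instance: no collision

  public WebDispatcher(RoutingTable pipeline) {
    this.pipeline = pipeline;
  }

  public RoutingTable pipeline() {
    return pipeline;
  }
}

class RoutingTable { /* controller mappings, elided */ }
{code}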
[jira] [Commented] (YARN-3217) Remove httpclient dependency from hadoop-yarn-server-web-proxy
[ https://issues.apache.org/jira/browse/YARN-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327210#comment-14327210 ] Brahma Reddy Battula commented on YARN-3217: Manually I executed test cases, all are passing .. From Jenkins also all are passing Please check following for same {noformat}All Tests Test name Duration Status{noformat} {noformat} testWebAppProxyServerMainMethod1.4 sec Passed{noformat} {noformat}testWebAppProxyServlet 0.63 sec Passed{noformat} {{TestWebAppProxyServer}} not executed in jenkins Remove httpclient dependency from hadoop-yarn-server-web-proxy -- Key: YARN-3217 URL: https://issues.apache.org/jira/browse/YARN-3217 Project: Hadoop YARN Issue Type: Task Reporter: Akira AJISAKA Assignee: Brahma Reddy Battula Attachments: YARN-3217.patch Sub-task of HADOOP-10105. Remove httpclient dependency from WebAppProxyServlet.java. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327610#comment-14327610 ] Junping Du commented on YARN-914: - Breaking this feature down into sub-JIRAs. Support graceful decommission of nodemanager Key: YARN-914 URL: https://issues.apache.org/jira/browse/YARN-914 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Luke Lu Assignee: Junping Du Attachments: Gracefully Decommission of NodeManager (v1).pdf, Gracefully Decommission of NodeManager (v2).pdf, GracefullyDecommissionofNodeManagerv3.pdf When NMs are decommissioned for non-fault reasons (capacity change etc.), it's desirable to minimize the impact on running applications. Currently, if an NM is decommissioned, all running containers on the NM need to be rescheduled on other NMs. Furthermore, for finished map tasks, if their map outputs are not fetched by the reducers of the job, these map tasks will need to be rerun as well. We propose to introduce a mechanism to optionally gracefully decommission a node manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-914) (Umbrella) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-914: Summary: (Umbrella) Support graceful decommission of nodemanager (was: Support graceful decommission of nodemanager) (Umbrella) Support graceful decommission of nodemanager --- Key: YARN-914 URL: https://issues.apache.org/jira/browse/YARN-914 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Luke Lu Assignee: Junping Du Attachments: Gracefully Decommission of NodeManager (v1).pdf, Gracefully Decommission of NodeManager (v2).pdf, GracefullyDecommissionofNodeManagerv3.pdf When NMs are decommissioned for non-fault reasons (capacity change etc.), it's desirable to minimize the impact on running applications. Currently, if an NM is decommissioned, all running containers on the NM need to be rescheduled on other NMs. Furthermore, for finished map tasks, if their map outputs are not fetched by the reducers of the job, these map tasks will need to be rerun as well. We propose to introduce a mechanism to optionally gracefully decommission a node manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3132) RMNodeLabelsManager should remove node from node-to-label mapping when node becomes deactivated
[ https://issues.apache.org/jira/browse/YARN-3132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327620#comment-14327620 ] Hudson commented on YARN-3132: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2060 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2060/]) YARN-3132. RMNodeLabelsManager should remove node from node-to-label mapping when node becomes deactivated. Contributed by Wangda Tan (jianhe: rev f5da5566d9c392a5df71a2dce4c2d0d50eea51ee) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/TestRMNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/RMNodeLabelsManager.java RMNodeLabelsManager should remove node from node-to-label mapping when node becomes deactivated --- Key: YARN-3132 URL: https://issues.apache.org/jira/browse/YARN-3132 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.7.0 Attachments: YARN-3132.1.patch Using an example to explain: 1) Admin specifies host1 has label=x 2) node=host1:123 registered 3) Get node-to-label mapping returns host1/host1:123 4) node=host1:123 unregistered 5) Get node-to-label mapping still returns host1:123 Probably we should remove host1:123 when it becomes deactivated and no direct label is assigned to it (direct assignment means the admin specifies host1:123 has x instead of host1 has x). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1514) Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA
[ https://issues.apache.org/jira/browse/YARN-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327622#comment-14327622 ] Hudson commented on YARN-1514: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2060 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2060/]) YARN-1514. Utility to benchmark ZKRMStateStore#loadState for RM HA. Contributed by Tsuyoshi OZAWA (jianhe: rev 1c03376300a46722d4147f5b8f37242f68dba0a2) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStorePerf.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/test/YarnTestDriver.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java * hadoop-project/pom.xml Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA Key: YARN-1514 URL: https://issues.apache.org/jira/browse/YARN-1514 Project: Hadoop YARN Issue Type: Sub-task Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Fix For: 2.7.0 Attachments: YARN-1514.1.patch, YARN-1514.2.patch, YARN-1514.3.patch, YARN-1514.4.patch, YARN-1514.4.patch, YARN-1514.5.patch, YARN-1514.5.patch, YARN-1514.6.patch, YARN-1514.7.patch, YARN-1514.wip-2.patch, YARN-1514.wip.patch ZKRMStateStore is very sensitive to ZNode-related operations, as discussed in YARN-1307, YARN-1378 and so on. In particular, ZKRMStateStore#loadState is called when an RM-HA cluster does a failover, so its execution time impacts the failover time of RM-HA. We need a utility to benchmark the execution time of ZKRMStateStore#loadState as a development tool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1615) Fix typos in description about delay scheduling
[ https://issues.apache.org/jira/browse/YARN-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327615#comment-14327615 ] Hudson commented on YARN-1615: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2060 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2060/]) YARN-1615. Fix typos in delay scheduler's description. Contributed by Akira Ajisaka. (ozawa: rev b8a14efdf535d42bcafa58d380bd2c7f4d36f8cb) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java * hadoop-yarn-project/CHANGES.txt Fix typos in description about delay scheduling --- Key: YARN-1615 URL: https://issues.apache.org/jira/browse/YARN-1615 Project: Hadoop YARN Issue Type: Bug Components: documentation, scheduler Affects Versions: 2.6.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Trivial Labels: newbie Fix For: 2.7.0 Attachments: YARN-1615-002.patch, YARN-1615.patch In FSAppAttempt.java there're 4 typos: {code} * containers over rack-local or off-switch containers. To acheive this * we first only allow node-local assigments for a given prioirty level, * then relax the locality threshold once we've had a long enough period * without succesfully scheduling. We measure both the number of missed {code} They should be fixed as follows: {code} * containers over rack-local or off-switch containers. To achieve this * we first only allow node-local assignments for a given priority level, * then relax the locality threshold once we've had a long enough period * without successfully scheduling. We measure both the number of missed {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3194) After NM restart, RM should handle NMContainerStatuses sent by NM while registering if NM is Reconnected node
[ https://issues.apache.org/jira/browse/YARN-3194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327630#comment-14327630 ] Jason Lowe commented on YARN-3194: -- +1 lgtm. Will commit this tomorrow if there are no further comments. After NM restart, RM should handle NMContainerStatuses sent by NM while registering if NM is Reconnected node Key: YARN-3194 URL: https://issues.apache.org/jira/browse/YARN-3194 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Environment: NM restart is enabled Reporter: Rohith Assignee: Rohith Priority: Blocker Attachments: 0001-YARN-3194.patch, 0001-yarn-3194-v1.patch On NM restart, the NM sends all the outstanding NMContainerStatuses to the RM during registration. The registration can be treated by the RM as a new node or a reconnecting node. The RM triggers the corresponding event on the basis of node-added or node-reconnected state. # Node added event: Again here 2 scenarios can occur ## New node is registering with a different ip:port – NOT A PROBLEM ## Old node is re-registering because of a RESYNC command from the RM when the RM restarts – NOT A PROBLEM # Node reconnected event: ## Existing node is re-registering, i.e. the RM treats it as a reconnecting node when the RM is not restarted ### NM RESTART NOT Enabled – NOT A PROBLEM ### NM RESTART is Enabled Some applications are running on this node – *Problem is here* Zero applications are running on this node – NOT A PROBLEM Since the NMContainerStatuses are not handled, the RM never gets to know about completed containers and never releases the resources held by those containers. The RM will not allocate new containers for pending resource requests until the completedContainer event is triggered. This results in applications waiting indefinitely because their pending containers are not served by the RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
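A minimal sketch of the fix direction described above (the handler and facade names are assumptions, not the actual patch): replay the registration's NMContainerStatus list on reconnect so the scheduler learns about containers that completed while the NM was down.
{code}
import java.util.List;

import org.apache.hadoop.yarn.api.records.ContainerState;
import org.apache.hadoop.yarn.server.api.protocolrecords.NMContainerStatus;

// Sketch only: on a node-reconnected event with NM restart enabled, route
// completed container statuses through the same path used for heartbeats.
public class ReconnectedNodeHandler {
  private final SchedulerFacade scheduler; // hypothetical wrapper

  ReconnectedNodeHandler(SchedulerFacade scheduler) {
    this.scheduler = scheduler;
  }

  void onNodeReconnected(List<NMContainerStatus> statuses) {
    for (NMContainerStatus status : statuses) {
      if (status.getContainerState() == ContainerState.COMPLETE) {
        // Release the resources the RM still thinks this container holds.
        scheduler.releaseCompletedContainer(status);
      }
    }
  }

  interface SchedulerFacade {
    void releaseCompletedContainer(NMContainerStatus status);
  }
}
{code}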
[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327650#comment-14327650 ] Sunil G commented on YARN-2004: --- Yes [~jlowe], I agree with your point. As of now, I have given a configuration to specify the default priority in a queue. That can be applied for those applications which are submitted w/o priority. A cluster-wide config will also be added; given a queue-level config, it can override the cluster-wide default value. I will update the patch as per this understanding. Thank you. Priority scheduling support in Capacity scheduler - Key: YARN-2004 URL: https://issues.apache.org/jira/browse/YARN-2004 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2004.patch Based on the priority of the application, the Capacity Scheduler should be able to give preference to applications while doing scheduling. Comparator<FiCaSchedulerApp> applicationComparator can be changed as below. 1. Check for application priority. If priority is available, then return the highest-priority job. 2. Otherwise continue with the existing logic, such as app ID comparison and then timestamp comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
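A minimal sketch of the override order described above; both property keys are hypothetical placeholders, not actual CapacityScheduler configuration names:
{code}
import org.apache.hadoop.conf.Configuration;

// Sketch only: a queue-level default priority, when set, overrides the
// cluster-wide default; apps submitted w/o a priority get the result.
public class DefaultPriorityResolver {
  static final String CLUSTER_DEFAULT_KEY =
      "yarn.scheduler.default-application-priority";
  static final String QUEUE_DEFAULT_KEY_FMT =
      "yarn.scheduler.capacity.%s.default-application-priority";

  public static int getEffectiveDefaultPriority(Configuration conf,
      String queuePath) {
    int clusterDefault = conf.getInt(CLUSTER_DEFAULT_KEY, 0);
    return conf.getInt(String.format(QUEUE_DEFAULT_KEY_FMT, queuePath),
        clusterDefault);
  }
}
{code}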
[jira] [Updated] (YARN-3204) Fix new findbugs warnings in hadoop-yarn-server-resourcemanager (resourcemanager.scheduler.fair)
[ https://issues.apache.org/jira/browse/YARN-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-3204: --- Attachment: YARN-3204-001.patch Fix new findbugs warnings in hadoop-yarn-server-resourcemanager (resourcemanager.scheduler.fair) - Key: YARN-3204 URL: https://issues.apache.org/jira/browse/YARN-3204 Project: Hadoop YARN Issue Type: Bug Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Attachments: YARN-3204-001.patch Please check the following findbugs report: https://builds.apache.org/job/PreCommit-YARN-Build/6644//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3195) [YARN] Missing uniformity in Yarn Queue CLI command
[ https://issues.apache.org/jira/browse/YARN-3195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jagadesh Kiran N updated YARN-3195: --- Attachment: (was: YARN-3195.patch) [YARN] Missing uniformity in Yarn Queue CLI command --- Key: YARN-3195 URL: https://issues.apache.org/jira/browse/YARN-3195 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.6.0 Environment: SUSE Linux SP3 Reporter: Jagadesh Kiran N Assignee: Jagadesh Kiran N Priority: Minor Fix For: 2.7.0 Attachments: Helptobe removed in Queue.png Help is a generic command and should not be placed here; because of this, uniformity is missing compared to other commands. Remove the -help command inside ./yarn queue for uniformity with respect to other commands {code} SO486LDPag65:/home/OpenSource/HA/install/hadoop/resourcemanager/bin # ./yarn queue -help 15/02/13 19:30:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable usage: queue * -help Displays help for all commands.* -status Queue Name List queue information about given queue. SO486LDPag65:/home/OpenSource/HA/install/hadoop/resourcemanager/bin # ./yarn queue 15/02/13 19:33:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Invalid Command Usage : usage: queue * -help Displays help for all commands.* -status Queue Name List queue information about given queue. {code} * -help Displays help for all commands.* -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3226) UI changes for decommissioning node
Junping Du created YARN-3226: Summary: UI changes for decommissioning node Key: YARN-3226 URL: https://issues.apache.org/jira/browse/YARN-3226 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du An initial thought: decommissioning nodes should still show up in the active-nodes list, since they are still running containers. A separate decommissioning tab to filter for those nodes would be nice, although I suppose users can also just use the jQuery table to sort/search for nodes in that state from the active-nodes list if it's too crowded to add yet another node-state tab (or maybe get rid of some effectively dead tabs, like the reboot-state tab). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K reassigned YARN-3225: --- Assignee: Devaraj K New parameter or CLI for decommissioning node gracefully in RMAdmin CLI --- Key: YARN-3225 URL: https://issues.apache.org/jira/browse/YARN-3225 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Devaraj K A new CLI (or an existing CLI with parameters) should put each node on the decommission list into decommissioning status and track a timeout to terminate the nodes that haven't finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327941#comment-14327941 ] Devaraj K commented on YARN-3225: - What would be the timeout units here? Are we thinking of any constrained range for the timeout value? Thanks New parameter or CLI for decommissioning node gracefully in RMAdmin CLI --- Key: YARN-3225 URL: https://issues.apache.org/jira/browse/YARN-3225 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du A new CLI (or an existing CLI with parameters) should put each node on the decommission list into decommissioning status and track a timeout to terminate the nodes that haven't finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3194) After NM restart, RM should handle NMContainerStatuses sent by NM while registering if NM is Reconnected node
[ https://issues.apache.org/jira/browse/YARN-3194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327937#comment-14327937 ] Jian He commented on YARN-3194: --- lgtm too After NM restart, RM should handle NMContainerStatuses sent by NM while registering if NM is Reconnected node Key: YARN-3194 URL: https://issues.apache.org/jira/browse/YARN-3194 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Environment: NM restart is enabled Reporter: Rohith Assignee: Rohith Priority: Blocker Attachments: 0001-YARN-3194.patch, 0001-yarn-3194-v1.patch On NM restart, the NM sends all the outstanding NMContainerStatuses to the RM during registration. The registration can be treated by the RM as a new node or a reconnecting node. The RM triggers the corresponding event on the basis of node-added or node-reconnected state. # Node added event: Again here 2 scenarios can occur ## New node is registering with a different ip:port – NOT A PROBLEM ## Old node is re-registering because of a RESYNC command from the RM when the RM restarts – NOT A PROBLEM # Node reconnected event: ## Existing node is re-registering, i.e. the RM treats it as a reconnecting node when the RM is not restarted ### NM RESTART NOT Enabled – NOT A PROBLEM ### NM RESTART is Enabled Some applications are running on this node – *Problem is here* Zero applications are running on this node – NOT A PROBLEM Since the NMContainerStatuses are not handled, the RM never gets to know about completed containers and never releases the resources held by those containers. The RM will not allocate new containers for pending resource requests until the completedContainer event is triggered. This results in applications waiting indefinitely because their pending containers are not served by the RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3076) Add API/Implementation to YarnClient to retrieve label-to-node mapping
[ https://issues.apache.org/jira/browse/YARN-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3076: - Summary: Add API/Implementation to YarnClient to retrieve label-to-node mapping (was: YarnClient implementation to retrieve label to node mapping) Add API/Implementation to YarnClient to retrieve label-to-node mapping -- Key: YARN-3076 URL: https://issues.apache.org/jira/browse/YARN-3076 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3076.001.patch, YARN-3076.002.patch, YARN-3076.003.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3229) Incorrect processing of container as LOST on Interruption during NM shutdown
Anubhav Dhoot created YARN-3229: --- Summary: Incorrect processing of container as LOST on Interruption during NM shutdown Key: YARN-3229 URL: https://issues.apache.org/jira/browse/YARN-3229 Project: Hadoop YARN Issue Type: Bug Reporter: Anubhav Dhoot YARN-2846 fixed the issue of incorrectly writing to the state store that the process is LOST. But even after that we still process the ContainerExitEvent. If notInterrupted is false in RecoveredContainerLaunch#call, we should skip the following: {noformat} if (retCode != 0) { LOG.warn("Recovered container exited with a non-zero exit code " + retCode); this.dispatcher.getEventHandler().handle(new ContainerExitEvent( containerId, ContainerEventType.CONTAINER_EXITED_WITH_FAILURE, retCode, "Container exited with a non-zero exit code " + retCode)); return retCode; } {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
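A sketch of the proposed guard, mirroring the snippet above (variable names follow the snippet; the actual patch may structure this differently):
{code}
// Sketch only: raise the exit event only when the wait was not interrupted
// by NM shutdown, so an interrupted recovery is not reported as a failure.
if (retCode != 0 && notInterrupted) {
  LOG.warn("Recovered container exited with a non-zero exit code " + retCode);
  this.dispatcher.getEventHandler().handle(new ContainerExitEvent(
      containerId, ContainerEventType.CONTAINER_EXITED_WITH_FAILURE, retCode,
      "Container exited with a non-zero exit code " + retCode));
  return retCode;
}
{code}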
[jira] [Commented] (YARN-3076) Add API/Implementation to YarnClient to retrieve label-to-node mapping
[ https://issues.apache.org/jira/browse/YARN-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327969#comment-14327969 ] Hudson commented on YARN-3076: -- FAILURE: Integrated in Hadoop-trunk-Commit #7157 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7157/]) YARN-3076. Add API/Implementation to YarnClient to retrieve label-to-node mapping (Varun Saxena via wangda) (wangda: rev d49ae725d5fa3eecf879ac42c42a368dd811f854) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetLabelsToNodesResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/applicationclient_protocol.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/service/ApplicationClientProtocolPBServiceImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestPBImplRecords.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetLabelsToNodesRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ApplicationClientProtocolPBClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetLabelsToNodesRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetLabelsToNodesResponse.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientRedirect.java Add API/Implementation to YarnClient to retrieve label-to-node mapping -- Key: YARN-3076 URL: https://issues.apache.org/jira/browse/YARN-3076 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Fix For: 2.7.0 Attachments: YARN-3076.001.patch, YARN-3076.002.patch, YARN-3076.003.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3230) Clarify application states on the web UI
[ https://issues.apache.org/jira/browse/YARN-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3230: -- Attachment: YARN-3230.1.patch Uploaded a patch to add more text to clarify the application states. Clarify application states on the web UI Key: YARN-3230 URL: https://issues.apache.org/jira/browse/YARN-3230 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Jian He Attachments: YARN-3230.1.patch Today, application states are simply surfaced as a single word on the web UI. Not everyone understands the meaning of NEW_SAVING, SUBMITTED, ACCEPTED. This JIRA is to clarify the meaning of these states, e.g. what the application is waiting for in each state. In addition, the difference between application state and FinalStatus is fairly confusing to users, especially when state=FINISHED but FinalStatus=FAILED -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3229) Incorrect processing of container as LOST on Interruption during NM shutdown
[ https://issues.apache.org/jira/browse/YARN-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot reassigned YARN-3229: --- Assignee: Anubhav Dhoot Incorrect processing of container as LOST on Interruption during NM shutdown Key: YARN-3229 URL: https://issues.apache.org/jira/browse/YARN-3229 Project: Hadoop YARN Issue Type: Bug Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot YARN-2846 fixed the issue of incorrectly writing to the state store that the process is LOST. But even after that we still process the ContainerExitEvent. If notInterrupted is false in RecoveredContainerLaunch#call, we should skip the following: {noformat} if (retCode != 0) { LOG.warn("Recovered container exited with a non-zero exit code " + retCode); this.dispatcher.getEventHandler().handle(new ContainerExitEvent( containerId, ContainerEventType.CONTAINER_EXITED_WITH_FAILURE, retCode, "Container exited with a non-zero exit code " + retCode)); return retCode; } {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3230) Clarify application states on the web UI
[ https://issues.apache.org/jira/browse/YARN-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3230: -- Attachment: YARN-3230.2.patch Clarify application states on the web UI Key: YARN-3230 URL: https://issues.apache.org/jira/browse/YARN-3230 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Jian He Attachments: YARN-3230.1.patch, YARN-3230.2.patch Today, application states are simply surfaced as a single word on the web UI. Not everyone understands the meaning of NEW_SAVING, SUBMITTED, ACCEPTED. This JIRA is to clarify the meaning of these states, e.g. what the application is waiting for in each state. In addition, the difference between application state and FinalStatus is fairly confusing to users, especially when state=FINISHED but FinalStatus=FAILED -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3230) Clarify application states on the web UI
[ https://issues.apache.org/jira/browse/YARN-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3230: -- Attachment: application page.png Clarify application states on the web UI Key: YARN-3230 URL: https://issues.apache.org/jira/browse/YARN-3230 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Jian He Attachments: YARN-3230.1.patch, YARN-3230.2.patch Today, application states are simply surfaced as a single word on the web UI. Not everyone understands the meaning of NEW_SAVING, SUBMITTED, ACCEPTED. This JIRA is to clarify the meaning of these states, e.g. what the application is waiting for in each state. In addition, the difference between application state and FinalStatus is fairly confusing to users, especially when state=FINISHED but FinalStatus=FAILED -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3230) Clarify application states on the web UI
[ https://issues.apache.org/jira/browse/YARN-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3230: -- Attachment: (was: application page.png) Clarify application states on the web UI Key: YARN-3230 URL: https://issues.apache.org/jira/browse/YARN-3230 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Jian He Attachments: YARN-3230.1.patch, YARN-3230.2.patch Today, application states are simply surfaced as a single word on the web UI. Not everyone understands the meaning of NEW_SAVING, SUBMITTED, ACCEPTED. This jira is to clarify the meaning of these states, e.g. what the application is waiting for in each state. In addition, the difference between application state and FinalStatus is fairly confusing to users, especially when state=FINISHED but FinalStatus=FAILED. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3230) Clarify application states on the web UI
[ https://issues.apache.org/jira/browse/YARN-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3230: -- Attachment: application page.png Uploaded an application page screenshot. Clarify application states on the web UI Key: YARN-3230 URL: https://issues.apache.org/jira/browse/YARN-3230 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Jian He Attachments: YARN-3230.1.patch, YARN-3230.2.patch, application page.png Today, application states are simply surfaced as a single word on the web UI. Not everyone understands the meaning of NEW_SAVING, SUBMITTED, ACCEPTED. This jira is to clarify the meaning of these states, e.g. what the application is waiting for in each state. In addition, the difference between application state and FinalStatus is fairly confusing to users, especially when state=FINISHED but FinalStatus=FAILED. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2820) Do retry in FileSystemRMStateStore for better error recovery when update/store failure due to IOException.
[ https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328593#comment-14328593 ] Tsuyoshi OZAWA commented on YARN-2820: -- I'll take a look. Do retry in FileSystemRMStateStore for better error recovery when update/store failure due to IOException. -- Key: YARN-2820 URL: https://issues.apache.org/jira/browse/YARN-2820 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2820.000.patch, YARN-2820.001.patch, YARN-2820.002.patch, YARN-2820.003.patch Do retry in FileSystemRMStateStore for better error recovery when an update/store failure is due to an IOException. When we use FileSystemRMStateStore as yarn.resourcemanager.store.class, we saw the following IOException cause the RM to shut down.
{code}
2014-10-29 23:49:12,202 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Updating info for attempt: appattempt_1409135750325_109118_01 at: /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/appattempt_1409135750325_109118_01
2014-10-29 23:49:19,495 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/appattempt_1409135750325_109118_01.new.tmp retrying...
2014-10-29 23:49:23,757 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/appattempt_1409135750325_109118_01.new.tmp retrying...
2014-10-29 23:49:31,120 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/appattempt_1409135750325_109118_01.new.tmp retrying...
2014-10-29 23:49:46,283 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Error updating info for attempt: appattempt_1409135750325_109118_01
java.io.IOException: Unable to close file because the last block does not have enough number of replicas.
2014-10-29 23:49:46,284 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error storing/updating appAttempt: appattempt_1409135750325_109118_01
2014-10-29 23:49:46,916 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause: java.io.IOException: Unable to close file because the last block does not have enough number of replicas.
	at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132)
	at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100)
	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70)
	at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.writeFile(FileSystemRMStateStore.java:522)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateFile(FileSystemRMStateStore.java:534)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationAttemptStateInternal(FileSystemRMStateStore.java:389)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
	at java.lang.Thread.run(Thread.java:744)
{code}
As discussed at YARN-1778, the TestFSRMStateStore failure is also due to an IOException in storeApplicationStateInternal. Stack trace from the TestFSRMStateStore failure:
{code}
2015-02-03 00:09:19,092 INFO [Thread-110] recovery.TestFSRMStateStore (TestFSRMStateStore.java:run(285)) - testFSRMStateStoreClientRetry: Exception org.apache.hadoop.ipc.RemoteException(java.io.IOException): NameNode still not started
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.checkNNStartup(NameNodeRpcServer.java:1876)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:971)
	at
{code}
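The fix direction here is a bounded retry around the state-store write. A minimal sketch, assuming hypothetical names (MAX_RETRIES, RETRY_INTERVAL_MS) and the existing FileSystemRMStateStore#writeFile; this illustrates the idea, not the attached patches:
{code}
// Sketch only: retry the store/update operation on IOException a bounded
// number of times before surfacing the failure to the RM.
private void writeFileWithRetries(org.apache.hadoop.fs.Path path, byte[] data)
    throws java.io.IOException, InterruptedException {
  final int MAX_RETRIES = 5;            // assumed value; would be configurable
  final long RETRY_INTERVAL_MS = 1000;  // assumed value; would be configurable
  for (int retry = 0; ; retry++) {
    try {
      writeFile(path, data);  // existing FileSystemRMStateStore#writeFile
      return;
    } catch (java.io.IOException e) {
      if (retry >= MAX_RETRIES) {
        throw e;  // retries exhausted; let the caller handle the failure
      }
      Thread.sleep(RETRY_INTERVAL_MS);
    }
  }
}
{code}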
[jira] [Updated] (YARN-3236) cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY.
[ https://issues.apache.org/jira/browse/YARN-3236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3236: Issue Type: Improvement (was: Bug) cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY. - Key: YARN-3236 URL: https://issues.apache.org/jira/browse/YARN-3236 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Trivial Attachments: YARN-3236.000.patch cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY. RMAuthenticationFilter#AUTH_HANDLER_PROPERTY was added in YARN-2247, but the code which used AUTH_HANDLER_PROPERTY was removed in YARN-2656. We had better remove it to avoid confusion, since it was only introduced for a very short time and no one uses it now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3236) cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY.
[ https://issues.apache.org/jira/browse/YARN-3236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3236: Attachment: YARN-3236.000.patch cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY. - Key: YARN-3236 URL: https://issues.apache.org/jira/browse/YARN-3236 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Trivial Attachments: YARN-3236.000.patch cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY. RMAuthenticationFilter#AUTH_HANDLER_PROPERTY was added in YARN-2247, but the code which used AUTH_HANDLER_PROPERTY was removed in YARN-2656. We had better remove it to avoid confusion, since it was only introduced for a very short time and no one uses it now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3195) [YARN]Missing uniformity In Yarn Queue CLI command
[ https://issues.apache.org/jira/browse/YARN-3195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328619#comment-14328619 ] Hadoop QA commented on YARN-3195: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12699685/YARN-3195.patch against trunk revision c0d9b93. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.client.cli.TestLogsCLI org.apache.hadoop.yarn.client.cli.TestYarnCLI The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.client.TestResourceTrackerOnHA Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6679//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6679//console This message is automatically generated. [YARN]Missing uniformity In Yarn Queue CLI command --- Key: YARN-3195 URL: https://issues.apache.org/jira/browse/YARN-3195 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.6.0 Environment: SUSE Linux SP3 Reporter: Jagadesh Kiran N Assignee: Jagadesh Kiran N Priority: Minor Fix For: 2.7.0 Attachments: Helptobe removed in Queue.png, YARN-3195.patch Help is a generic command and should not be placed here; because of this, uniformity is missing compared to other commands. Remove the -help option inside ./yarn queue for uniformity with respect to other commands:
{code}
SO486LDPag65:/home/OpenSource/HA/install/hadoop/resourcemanager/bin # ./yarn queue -help
15/02/13 19:30:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
usage: queue
 -help                 Displays help for all commands.
 -status <Queue Name>  List queue information about given queue.
SO486LDPag65:/home/OpenSource/HA/install/hadoop/resourcemanager/bin # ./yarn queue
15/02/13 19:33:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Invalid Command Usage :
usage: queue
 -help                 Displays help for all commands.
 -status <Queue Name>  List queue information about given queue.
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3003) Provide API for client to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena resolved YARN-3003. Resolution: Fixed Thanks [~tedyu] for reporting. Resolving it as fixed by YARN-3075 and YARN-3076. Not sure if it needs to be marked as Duplicate or some other resolution status. Provide API for client to retrieve label to node mapping Key: YARN-3003 URL: https://issues.apache.org/jira/browse/YARN-3003 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Reporter: Ted Yu Assignee: Varun Saxena Attachments: YARN-3003.001.patch, YARN-3003.002.patch Currently YarnClient#getNodeToLabels() returns the mapping from NodeId to the set of labels associated with the node. Clients (such as Slider) may be interested in the label-to-node mapping: given a label, return the nodes with this label. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
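For reference, a short usage sketch of the inverse mapping as exposed by YARN-3075/YARN-3076; the getLabelsToNodes name follows those patches, but treat the exact signature as an assumption:
{code}
// Hedged usage sketch: query the RM for the label-to-node mapping,
// the inverse of YarnClient#getNodeToLabels().
import java.util.Map;
import java.util.Set;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.NodeId;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class LabelsToNodesExample {
  public static void main(String[] args) throws Exception {
    YarnClient client = YarnClient.createYarnClient();
    client.init(new Configuration());
    client.start();
    try {
      Map<String, Set<NodeId>> labelsToNodes = client.getLabelsToNodes();
      for (Map.Entry<String, Set<NodeId>> e : labelsToNodes.entrySet()) {
        System.out.println(e.getKey() + " -> " + e.getValue());
      }
    } finally {
      client.stop();
    }
  }
}
{code}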
[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328629#comment-14328629 ] Devaraj K commented on YARN-3225: - Thanks [~djp] for the clarification. New parameter or CLI for decommissioning node gracefully in RMAdmin CLI --- Key: YARN-3225 URL: https://issues.apache.org/jira/browse/YARN-3225 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Devaraj K A new CLI (or an existing CLI with parameters) should put each node on the decommission list into decommissioning status and track a timeout to terminate the nodes that haven't finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328627#comment-14328627 ] Devaraj K commented on YARN-3225: - I see the same mentioned in the design doc https://issues.apache.org/jira/secure/attachment/12699496/GracefullyDecommissionofNodeManagerv3.pdf {quote} Before NMs get decommissioned, the timeout can be updated to shorter or longer. e.g. admin can terminate the CLI and resubmit it with a different timeout value.{quote} New parameter or CLI for decommissioning node gracefully in RMAdmin CLI --- Key: YARN-3225 URL: https://issues.apache.org/jira/browse/YARN-3225 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Devaraj K A new CLI (or an existing CLI with parameters) should put each node on the decommission list into decommissioning status and track a timeout to terminate the nodes that haven't finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3236) cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY.
[ https://issues.apache.org/jira/browse/YARN-3236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328641#comment-14328641 ] zhihai xu commented on YARN-3236: - This is a code cleanup (removing an unused variable), so I think a test case is not needed. cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY. - Key: YARN-3236 URL: https://issues.apache.org/jira/browse/YARN-3236 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Trivial Labels: cleanup, maintenance Attachments: YARN-3236.000.patch cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY. RMAuthenticationFilter#AUTH_HANDLER_PROPERTY was added in YARN-2247, but the code which used AUTH_HANDLER_PROPERTY was removed in YARN-2656. We had better remove it to avoid confusion, since it was only introduced for a very short time and no one uses it now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3236) cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY.
[ https://issues.apache.org/jira/browse/YARN-3236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3236: Labels: cleanup maintenance (was: maintenance) cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY. - Key: YARN-3236 URL: https://issues.apache.org/jira/browse/YARN-3236 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Trivial Labels: cleanup, maintenance Attachments: YARN-3236.000.patch cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY. RMAuthenticationFilter#AUTH_HANDLER_PROPERTY was added in YARN-2247, but the code which used AUTH_HANDLER_PROPERTY was removed in YARN-2656. We had better remove it to avoid confusion, since it was only introduced for a very short time and no one uses it now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3236) cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY.
zhihai xu created YARN-3236: --- Summary: cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY. Key: YARN-3236 URL: https://issues.apache.org/jira/browse/YARN-3236 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Trivial cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY. RMAuthenticationFilter#AUTH_HANDLER_PROPERTY was added in YARN-2247, but the code which used AUTH_HANDLER_PROPERTY was removed in YARN-2656. We had better remove it to avoid confusion, since it was only introduced for a very short time and no one uses it now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328631#comment-14328631 ] Sunil G commented on YARN-3225: --- Yes [~devaraj.k]. Thank you for the clarification. New parameter or CLI for decommissioning node gracefully in RMAdmin CLI --- Key: YARN-3225 URL: https://issues.apache.org/jira/browse/YARN-3225 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Devaraj K A new CLI (or an existing CLI with parameters) should put each node on the decommission list into decommissioning status and track a timeout to terminate the nodes that haven't finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2820) Do retry in FileSystemRMStateStore for better error recovery when update/store failure due to IOException.
[ https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328576#comment-14328576 ] zhihai xu commented on YARN-2820: - All these 5 findbugs warnings are not related to my change. Do retry in FileSystemRMStateStore for better error recovery when update/store failure due to IOException. -- Key: YARN-2820 URL: https://issues.apache.org/jira/browse/YARN-2820 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2820.000.patch, YARN-2820.001.patch, YARN-2820.002.patch, YARN-2820.003.patch Do retry in FileSystemRMStateStore for better error recovery when an update/store failure is due to an IOException. When we use FileSystemRMStateStore as yarn.resourcemanager.store.class, we saw the following IOException cause the RM to shut down.
{code}
2014-10-29 23:49:12,202 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Updating info for attempt: appattempt_1409135750325_109118_01 at: /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/appattempt_1409135750325_109118_01
2014-10-29 23:49:19,495 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/appattempt_1409135750325_109118_01.new.tmp retrying...
2014-10-29 23:49:23,757 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/appattempt_1409135750325_109118_01.new.tmp retrying...
2014-10-29 23:49:31,120 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/appattempt_1409135750325_109118_01.new.tmp retrying...
2014-10-29 23:49:46,283 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Error updating info for attempt: appattempt_1409135750325_109118_01
java.io.IOException: Unable to close file because the last block does not have enough number of replicas.
2014-10-29 23:49:46,284 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error storing/updating appAttempt: appattempt_1409135750325_109118_01
2014-10-29 23:49:46,916 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause: java.io.IOException: Unable to close file because the last block does not have enough number of replicas.
	at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132)
	at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100)
	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70)
	at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.writeFile(FileSystemRMStateStore.java:522)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateFile(FileSystemRMStateStore.java:534)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationAttemptStateInternal(FileSystemRMStateStore.java:389)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
	at java.lang.Thread.run(Thread.java:744)
{code}
As discussed at YARN-1778, the TestFSRMStateStore failure is also due to an IOException in storeApplicationStateInternal. Stack trace from the TestFSRMStateStore failure:
{code}
2015-02-03 00:09:19,092 INFO [Thread-110] recovery.TestFSRMStateStore (TestFSRMStateStore.java:run(285)) - testFSRMStateStoreClientRetry: Exception org.apache.hadoop.ipc.RemoteException(java.io.IOException): NameNode still not started
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.checkNNStartup(NameNodeRpcServer.java:1876)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:971)
{code}
[jira] [Commented] (YARN-2693) Priority Label Manager in RM to manage priority labels
[ https://issues.apache.org/jira/browse/YARN-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328582#comment-14328582 ] Sunil G commented on YARN-2693: --- Hi [~leftnoteasy], thank you for the update. The NodeLabels and AppPriority managers are more or less the same, but we can't merge them any closer as we have different PBs for each operation. However, a plan can be laid to merge most of the FileSystem and Manager classes so that more of the common code can be shared. As mentioned, I will move the parsing and config support changes to RMAppManager (as a separate class), and will have a minimal implementation. I will still keep this JIRA open so as to handle the same after the major scheduler changes and API support are done. Priority Label Manager in RM to manage priority labels -- Key: YARN-2693 URL: https://issues.apache.org/jira/browse/YARN-2693 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2693.patch, 0002-YARN-2693.patch, 0003-YARN-2693.patch, 0004-YARN-2693.patch, 0005-YARN-2693.patch Focus of this JIRA is to have a centralized service to handle priority labels, supporting operations such as:
* Add/delete a priority label to a specified queue
* Manage the integer mapping associated with each priority label
* Support managing the default priority label of a given queue
* ACL support at queue level for priority labels
* Expose an interface to the RM to validate priority labels
Storage for these labels will be done in FileSystem and in Memory, similar to NodeLabel:
* FileSystem based: persistent across RM restart
* Memory based: non-persistent across RM restart
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
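A hypothetical sketch of the manager surface the description above calls for; every name and signature here is illustrative only, not code from the attached patches:
{code}
// Illustrative interface for a centralized priority-label service;
// all names are assumptions, not taken from 0005-YARN-2693.patch.
import java.util.Map;

public interface PriorityLabelManager {
  void addPriorityLabel(String queue, String label, int priority);  // add a label to a queue
  void removePriorityLabel(String queue, String label);             // delete a label from a queue
  Map<String, Integer> getPriorityMapping(String queue);            // label -> integer priority
  String getDefaultPriorityLabel(String queue);                     // per-queue default label
  boolean isValidPriorityLabel(String queue, String label);         // used by the RM to validate
}
{code}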
[jira] [Commented] (YARN-3076) Add API/Implementation to YarnClient to retrieve label-to-node mapping
[ https://issues.apache.org/jira/browse/YARN-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328617#comment-14328617 ] Varun Saxena commented on YARN-3076: Thanks [~leftnoteasy] for the review and commit. Add API/Implementation to YarnClient to retrieve label-to-node mapping -- Key: YARN-3076 URL: https://issues.apache.org/jira/browse/YARN-3076 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Fix For: 2.7.0 Attachments: YARN-3076.001.patch, YARN-3076.002.patch, YARN-3076.003.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3236) cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY.
[ https://issues.apache.org/jira/browse/YARN-3236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328638#comment-14328638 ] Hadoop QA commented on YARN-3236: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12699819/YARN-3236.000.patch against trunk revision c0d9b93. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6680//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6680//console This message is automatically generated. cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY. - Key: YARN-3236 URL: https://issues.apache.org/jira/browse/YARN-3236 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Trivial Labels: cleanup, maintenance Attachments: YARN-3236.000.patch cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY. RMAuthenticationFilter#AUTH_HANDLER_PROPERTY was added in YARN-2247, but the code which used AUTH_HANDLER_PROPERTY was removed in YARN-2656. We had better remove it to avoid confusion, since it was only introduced for a very short time and no one uses it now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2556) Tool to measure the performance of the timeline server
[ https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amit Tiwari updated YARN-2556: -- Attachment: YARN-2556.patch Hi guys, I've made the following enhancements to the previously posted patches: 1) Earlier, the payload was being set as the entityId. Since the entityId is used as a key by LevelDB, it was crashing under moderate loads because each key was ~2MB. Hence I've changed it to send the payload as part of OtherInfo, which is handled well. 2) Instead of posting a string of repeated 'a's as the payload, I choose from a set of characters. This ensures that LevelDB does not get away easily with compression (since algorithms can easily compress a string that comprises a single repeated character). Here are some of the performance numbers I've got: I run 20 concurrent jobs with the arguments -m 300 -s 10 -t 20. On a 36-node cluster, this results in ~830 concurrent containers (e.g. maps), each firing 10KB of payload, 20 times. LevelDB seems to hold up fine. Would you have other ways that I could stress/load the system even more? thanks --amit Tool to measure the performance of the timeline server -- Key: YARN-2556 URL: https://issues.apache.org/jira/browse/YARN-2556 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Chang Li Attachments: YARN-2556-WIP.patch, YARN-2556-WIP.patch, YARN-2556.patch, yarn2556.patch, yarn2556.patch, yarn2556_wip.patch We need to be able to understand the capacity model for the timeline server to give users the tools they need to deploy a timeline server with the correct capacity. I propose we create a mapreduce job that can measure timeline server write and read performance. Transactions per second and I/O for both read and write would be a good start. This could be done as an example or test job that could be tied into gridmix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
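A minimal sketch of the payload change in point 2, assuming a hypothetical generator class: drawing characters from a varied alphabet keeps LevelDB's compression from trivially collapsing the payload.
{code}
// Sketch only: build a harder-to-compress payload by sampling a varied
// character set instead of repeating a single character.
import java.util.Random;

public final class PayloadGenerator {
  private static final char[] ALPHABET =
      "abcdefghijklmnopqrstuvwxyz0123456789".toCharArray();
  private static final Random RANDOM = new Random();

  public static String payload(int sizeBytes) {
    StringBuilder sb = new StringBuilder(sizeBytes);
    for (int i = 0; i < sizeBytes; i++) {
      sb.append(ALPHABET[RANDOM.nextInt(ALPHABET.length)]);
    }
    return sb.toString();  // stored under OtherInfo, not as the entityId
  }
}
{code}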
[jira] [Updated] (YARN-3230) Clarify application states on the web UI
[ https://issues.apache.org/jira/browse/YARN-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3230: -- Attachment: YARN-3230.3.patch Clarify application states on the web UI Key: YARN-3230 URL: https://issues.apache.org/jira/browse/YARN-3230 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Jian He Attachments: YARN-3230.1.patch, YARN-3230.2.patch, YARN-3230.3.patch Today, application states are simply surfaced as a single word on the web UI. Not everyone understands the meaning of NEW_SAVING, SUBMITTED, ACCEPTED. This jira is to clarify the meaning of these states, e.g. what the application is waiting for in each state. In addition, the difference between application state and FinalStatus is fairly confusing to users, especially when state=FINISHED but FinalStatus=FAILED. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3034) [Aggregator wireup] Implement RM starting its ATS writer
[ https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328346#comment-14328346 ] Li Lu commented on YARN-3034: - I agree that the RM may have a derived type of aggregator. Meanwhile, maybe we'd like to consider reusing the code for the web server/data storage layer connections? BTW, I've done a simple write-up for app-level aggregators and their relationships with the RM/NMs, posted in YARN-3033. To make sure we're on the same page, could one of you take a look at it? Thanks! [Aggregator wireup] Implement RM starting its ATS writer Key: YARN-3034 URL: https://issues.apache.org/jira/browse/YARN-3034 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3034.20150205-1.patch Per design in YARN-2928, implement resource managers starting their own ATS writers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3230) Clarify application states on the web UI
[ https://issues.apache.org/jira/browse/YARN-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328386#comment-14328386 ] Wangda Tan commented on YARN-3230: -- Since the new_saving issue seems hard to fit in this ticket, I suggest filing a separate one to track it. Patch looks good to me; the findbugs warning is not related to this patch. I will commit it today. Clarify application states on the web UI Key: YARN-3230 URL: https://issues.apache.org/jira/browse/YARN-3230 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Jian He Attachments: YARN-3230.1.patch, YARN-3230.2.patch, YARN-3230.3.patch, YARN-3230.3.patch, application page.png Today, application states are simply surfaced as a single word on the web UI. Not everyone understands the meaning of NEW_SAVING, SUBMITTED, ACCEPTED. This jira is to clarify the meaning of these states, e.g. what the application is waiting for in each state. In addition, the difference between application state and FinalStatus is fairly confusing to users, especially when state=FINISHED but FinalStatus=FAILED. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users
[ https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328388#comment-14328388 ] Zhijie Shen commented on YARN-2423: --- Sure, I'll review the last patch. Whether the java client lib exists or not, we have exposed the REST getter APIs and have users that depend on them. Having a java client lib may put more burden on the backward compatibility of TS v2, but hopefully it's not going to be a big addition, as we anyway need to keep the REST APIs compatible, which is the internal stuff within the java wrapper. TimelineClient should wrap all GET APIs to facilitate Java users Key: YARN-2423 URL: https://issues.apache.org/jira/browse/YARN-2423 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Robert Kanter Attachments: YARN-2423.004.patch, YARN-2423.005.patch, YARN-2423.006.patch, YARN-2423.007.patch, YARN-2423.patch, YARN-2423.patch, YARN-2423.patch TimelineClient provides the Java method to put timeline entities. It's also good to wrap over all GET APIs (both entity and domain), and deserialize the JSON response into Java POJO objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
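As a rough illustration of what such a GET wrapper could look like, a hedged sketch: the REST path follows the v1 timeline endpoint, but the use of Jersey here and the method shape are assumptions, not the patch under review:
{code}
// Hypothetical wrapper: fetch one entity over REST and let Jersey
// deserialize the JSON response into the TimelineEntity POJO.
import com.sun.jersey.api.client.Client;
import javax.ws.rs.core.MediaType;
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;

public class TimelineGetExample {
  public static TimelineEntity getEntity(String baseUrl, String entityType,
      String entityId) {
    Client client = Client.create();
    try {
      return client.resource(baseUrl + "/ws/v1/timeline/" + entityType + "/" + entityId)
          .accept(MediaType.APPLICATION_JSON)
          .get(TimelineEntity.class);
    } finally {
      client.destroy();
    }
  }
}
{code}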
[jira] [Updated] (YARN-2986) (Umbrella) Support hierarchical and unified scheduler configuration
[ https://issues.apache.org/jira/browse/YARN-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2986: - Summary: (Umbrella) Support hierarchical and unified scheduler configuration (was: Support hierarchical and unified scheduler configuration) (Umbrella) Support hierarchical and unified scheduler configuration --- Key: YARN-2986 URL: https://issues.apache.org/jira/browse/YARN-2986 Project: Hadoop YARN Issue Type: Improvement Reporter: Vinod Kumar Vavilapalli Assignee: Wangda Tan Attachments: YARN-2986.1.patch Today's scheduler configuration is fragmented and non-intuitive, and needs to be improved. Details in comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328450#comment-14328450 ] Sunil G commented on YARN-1963: --- Thank you Wangda and Jason for the input. Yes, it's good to change the priority of an application at runtime. I had mentioned it in the design doc. I have already created a user API jira, and its client part can be handled there. Support priorities across applications within the same queue - Key: YARN-1963 URL: https://issues.apache.org/jira/browse/YARN-1963 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Reporter: Arun C Murthy Assignee: Sunil G Attachments: YARN Application Priorities Design.pdf, YARN Application Priorities Design_01.pdf It will be very useful to support priorities among applications within the same queue, particularly in production scenarios. It allows for finer-grained controls without having to force admins to create a multitude of queues, plus allows existing applications to continue using existing queues which are usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327662#comment-14327662 ] Sunil G commented on YARN-3225: --- Hi [~djp], to understand the idea correctly: do you mean a command is to be added so that a given node can be decommissioned, and it can be given a timeout to gracefully verify the same is done? So something like ./yarn -node nodeID -timeout 200 -decommission. Pls help to clarify, and I would like to pursue this if you are not assigning it to yourself. :) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI --- Key: YARN-3225 URL: https://issues.apache.org/jira/browse/YARN-3225 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du A new CLI (or an existing CLI with parameters) should put each node on the decommission list into decommissioning status and track a timeout to terminate the nodes that haven't finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3223) Resource update during NM graceful decommission
[ https://issues.apache.org/jira/browse/YARN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327711#comment-14327711 ] Varun Saxena commented on YARN-3223: Junping Du, pls reassign if you plan to work on this. Resource update during NM graceful decommission --- Key: YARN-3223 URL: https://issues.apache.org/jira/browse/YARN-3223 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Junping Du Assignee: Varun Saxena During NM graceful decommission, we should handle resource updates properly, including: making RMNode keep track of the old resource for possible rollback, keeping the available resource at 0, and updating the used resource when containers finish. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
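A hypothetical sketch of the bookkeeping the description above asks for; the class and method names are illustrative, not the eventual patch:
{code}
// Sketch only: remember the node's original capacity for rollback, report
// zero available capacity while decommissioning, and shrink used capacity
// as containers finish.
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

class DecommissioningNodeResource {
  private Resource originalTotal;  // kept for possible rollback
  private Resource used;

  void startDecommissioning(Resource currentTotal, Resource currentUsed) {
    originalTotal = currentTotal;
    used = currentUsed;
    // From here on, the scheduler sees 0 available: no new containers land here.
  }

  void containerFinished(Resource released) {
    used = Resources.subtract(used, released);  // used shrinks; available stays 0
  }

  Resource rollbackTotal() {
    return originalTotal;  // restored if the decommission is cancelled
  }
}
{code}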
[jira] [Commented] (YARN-3224) Notify AM with containers (on decommissioning node) could be preempted after timeout.
[ https://issues.apache.org/jira/browse/YARN-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327668#comment-14327668 ] Sunil G commented on YARN-3224: --- We have an event named PREEMPT_CONTAINER which is used in ProportionalPolicy preemption to notify the AM; it can be used here. Do you mind if I also participate in this JIRA? Thank you [~djp] Notify AM with containers (on decommissioning node) could be preempted after timeout. - Key: YARN-3224 URL: https://issues.apache.org/jira/browse/YARN-3224 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3204) Fix new findbugs warnings in hadoop-yarn-server-resourcemanager (resourcemanager.scheduler.fair)
[ https://issues.apache.org/jira/browse/YARN-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327673#comment-14327673 ] Hadoop QA commented on YARN-3204: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12699676/YARN-3204-001.patch against trunk revision 2fd02af. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6668//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6668//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6668//console This message is automatically generated. Fix new findbugs warnings in hadoop-yarn-server-resourcemanager (resourcemanager.scheduler.fair) - Key: YARN-3204 URL: https://issues.apache.org/jira/browse/YARN-3204 Project: Hadoop YARN Issue Type: Bug Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Attachments: YARN-3204-001.patch Please check the following findbugs report: https://builds.apache.org/job/PreCommit-YARN-Build/6644//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3223) Resource update during NM graceful decommission
[ https://issues.apache.org/jira/browse/YARN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena reassigned YARN-3223: -- Assignee: Varun Saxena Resource update during NM graceful decommission --- Key: YARN-3223 URL: https://issues.apache.org/jira/browse/YARN-3223 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Junping Du Assignee: Varun Saxena During NM graceful decommission, we should handle resource updates properly, including: making RMNode keep track of the old resource for possible rollback, keeping the available resource at 0, and updating the used resource when containers finish. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327680#comment-14327680 ] Sunil G commented on YARN-3225: --- Sorry, I slightly misunderstood earlier. You meant the rmadmin command with a new option such as -g. So one doubt here: can a timeout also be passed here? New parameter or CLI for decommissioning node gracefully in RMAdmin CLI --- Key: YARN-3225 URL: https://issues.apache.org/jira/browse/YARN-3225 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du A new CLI (or an existing CLI with parameters) should put each node on the decommission list into decommissioning status and track a timeout to terminate the nodes that haven't finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3204) Fix new findbugs warnings in hadoop-yarn-server-resourcemanager (resourcemanager.scheduler.fair)
[ https://issues.apache.org/jira/browse/YARN-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-3204: --- Attachment: YARN-3204-002.patch Fix new findbugs warnings in hadoop-yarn-server-resourcemanager (resourcemanager.scheduler.fair) - Key: YARN-3204 URL: https://issues.apache.org/jira/browse/YARN-3204 Project: Hadoop YARN Issue Type: Bug Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Attachments: YARN-3204-001.patch, YARN-3204-002.patch Please check the following findbugs report: https://builds.apache.org/job/PreCommit-YARN-Build/6644//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2004: -- Attachment: 0002-YARN-2004.patch Priority scheduling support in Capacity scheduler - Key: YARN-2004 URL: https://issues.apache.org/jira/browse/YARN-2004 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2004.patch, 0002-YARN-2004.patch Based on the priority of the application, the Capacity Scheduler should be able to give preference to an application while doing scheduling. Comparator<FiCaSchedulerApp> applicationComparator can be changed as below. 1. Check for application priority. If priority is available, then return the highest-priority job. 2. Otherwise continue with the existing logic such as App ID comparison and then TimeStamp comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
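A minimal sketch of that comparator, assuming FiCaSchedulerApp exposes a getPriority() accessor and that a larger integer means higher priority (both assumptions here); priority wins when both apps carry one, otherwise the existing app-id ordering applies:
{code}
// Sketch only: order by application priority first, then fall back to the
// existing application-id comparison.
import java.util.Comparator;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp;

Comparator<FiCaSchedulerApp> applicationComparator =
    new Comparator<FiCaSchedulerApp>() {
      @Override
      public int compare(FiCaSchedulerApp a1, FiCaSchedulerApp a2) {
        // 1. If both applications carry a priority, the higher one comes first
        //    (assuming a larger integer means higher priority).
        if (a1.getPriority() != null && a2.getPriority() != null) {
          int byPriority = Integer.compare(
              a2.getPriority().getPriority(), a1.getPriority().getPriority());
          if (byPriority != 0) {
            return byPriority;
          }
        }
        // 2. Otherwise keep the existing ordering by application id.
        return a1.getApplicationId().compareTo(a2.getApplicationId());
      }
    };
{code}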
[jira] [Created] (YARN-3228) Deadlock altering user resource queue
Christian Hott created YARN-3228: Summary: Deadlock altering user resource queue Key: YARN-3228 URL: https://issues.apache.org/jira/browse/YARN-3228 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager, scheduler Affects Versions: 2.0.1-alpha Environment: hadoop yarn, postgresql Reporter: Christian Hott Priority: Blocker Let me introduce my problem: all of this began after we created some resource queues on PostgreSQL. We created them, assigned them to the users, and all was fine... until we ran a process (a large iterative query) and I did an ALTER ROLE over the user and the resource queue he was using. After that I couldn't log in with the user and got a message saying "deadlock detection, locking against self". Do you have any idea why this happens? Or is there any comprehensible log I can search for more information? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3132) RMNodeLabelsManager should remove node from node-to-label mapping when node becomes deactivated
[ https://issues.apache.org/jira/browse/YARN-3132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327452#comment-14327452 ] Hudson commented on YARN-3132: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2041 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2041/]) YARN-3132. RMNodeLabelsManager should remove node from node-to-label mapping when node becomes deactivated. Contributed by Wangda Tan (jianhe: rev f5da5566d9c392a5df71a2dce4c2d0d50eea51ee)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/RMNodeLabelsManager.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/TestRMNodeLabelsManager.java
RMNodeLabelsManager should remove node from node-to-label mapping when node becomes deactivated --- Key: YARN-3132 URL: https://issues.apache.org/jira/browse/YARN-3132 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.7.0 Attachments: YARN-3132.1.patch Using an example to explain: 1) Admin specifies host1 has label=x 2) node=host1:123 registers 3) Get node-to-label mapping returns host1/host1:123 4) node=host1:123 unregisters 5) Get node-to-label mapping still returns host1:123. Probably we should remove host1:123 when it becomes deactivated and has no label directly assigned to it (directly assigned means the admin specifies host1:123 has x instead of host1 has x). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1514) Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA
[ https://issues.apache.org/jira/browse/YARN-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327454#comment-14327454 ] Hudson commented on YARN-1514: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2041 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2041/]) YARN-1514. Utility to benchmark ZKRMStateStore#loadState for RM HA. Contributed by Tsuyoshi OZAWA (jianhe: rev 1c03376300a46722d4147f5b8f37242f68dba0a2)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStorePerf.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/test/YarnTestDriver.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-project/pom.xml
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml
Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA Key: YARN-1514 URL: https://issues.apache.org/jira/browse/YARN-1514 Project: Hadoop YARN Issue Type: Sub-task Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Fix For: 2.7.0 Attachments: YARN-1514.1.patch, YARN-1514.2.patch, YARN-1514.3.patch, YARN-1514.4.patch, YARN-1514.4.patch, YARN-1514.5.patch, YARN-1514.5.patch, YARN-1514.6.patch, YARN-1514.7.patch, YARN-1514.wip-2.patch, YARN-1514.wip.patch ZKRMStateStore is very sensitive to ZNode-related operations as discussed in YARN-1307, YARN-1378 and so on. Especially, ZKRMStateStore#loadState is called when an RM-HA cluster does failover. Therefore, its execution time impacts the failover time of RM-HA. We need a utility to benchmark the execution time of ZKRMStateStore#loadState as a development tool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
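A rough sketch of the measurement at the heart of such a utility: time a single loadState call against a populated store. Setting up and starting the store is elided, and the surrounding harness is an assumption, not TestZKRMStateStorePerf itself:
{code}
// Fragment only, not the committed benchmark: time ZKRMStateStore#loadState
// once; 'store' is assumed to be an already-started ZKRMStateStore.
long start = System.currentTimeMillis();
RMStateStore.RMState state = store.loadState();
long elapsedMs = System.currentTimeMillis() - start;
System.out.println("loadState took " + elapsedMs + " ms for "
    + state.getApplicationState().size() + " applications");
{code}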
[jira] [Commented] (YARN-1615) Fix typos in description about delay scheduling
[ https://issues.apache.org/jira/browse/YARN-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327447#comment-14327447 ] Hudson commented on YARN-1615: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2041 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2041/]) YARN-1615. Fix typos in delay scheduler's description. Contributed by Akira Ajisaka. (ozawa: rev b8a14efdf535d42bcafa58d380bd2c7f4d36f8cb) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java Fix typos in description about delay scheduling --- Key: YARN-1615 URL: https://issues.apache.org/jira/browse/YARN-1615 Project: Hadoop YARN Issue Type: Bug Components: documentation, scheduler Affects Versions: 2.6.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Trivial Labels: newbie Fix For: 2.7.0 Attachments: YARN-1615-002.patch, YARN-1615.patch In FSAppAttempt.java there're 4 typos: {code} * containers over rack-local or off-switch containers. To acheive this * we first only allow node-local assigments for a given prioirty level, * then relax the locality threshold once we've had a long enough period * without succesfully scheduling. We measure both the number of missed {code} They should be fixed as follows: {code} * containers over rack-local or off-switch containers. To achieve this * we first only allow node-local assignments for a given priority level, * then relax the locality threshold once we've had a long enough period * without successfully scheduling. We measure both the number of missed {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1514) Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA
[ https://issues.apache.org/jira/browse/YARN-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327489#comment-14327489 ] Hudson commented on YARN-1514: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #110 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/110/]) YARN-1514. Utility to benchmark ZKRMStateStore#loadState for RM HA. Contributed by Tsuyoshi OZAWA (jianhe: rev 1c03376300a46722d4147f5b8f37242f68dba0a2)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStorePerf.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/test/YarnTestDriver.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml
* hadoop-project/pom.xml
* hadoop-yarn-project/CHANGES.txt
Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA Key: YARN-1514 URL: https://issues.apache.org/jira/browse/YARN-1514 Project: Hadoop YARN Issue Type: Sub-task Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Fix For: 2.7.0 Attachments: YARN-1514.1.patch, YARN-1514.2.patch, YARN-1514.3.patch, YARN-1514.4.patch, YARN-1514.4.patch, YARN-1514.5.patch, YARN-1514.5.patch, YARN-1514.6.patch, YARN-1514.7.patch, YARN-1514.wip-2.patch, YARN-1514.wip.patch ZKRMStateStore is very sensitive to ZNode-related operations as discussed in YARN-1307, YARN-1378 and so on. Especially, ZKRMStateStore#loadState is called when an RM-HA cluster does failover. Therefore, its execution time impacts the failover time of RM-HA. We need a utility to benchmark the execution time of ZKRMStateStore#loadState as a development tool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3132) RMNodeLabelsManager should remove node from node-to-label mapping when node becomes deactivated
[ https://issues.apache.org/jira/browse/YARN-3132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327487#comment-14327487 ] Hudson commented on YARN-3132: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #110 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/110/]) YARN-3132. RMNodeLabelsManager should remove node from node-to-label mapping when node becomes deactivated. Contributed by Wangda Tan (jianhe: rev f5da5566d9c392a5df71a2dce4c2d0d50eea51ee)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/RMNodeLabelsManager.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/TestRMNodeLabelsManager.java
RMNodeLabelsManager should remove node from node-to-label mapping when node becomes deactivated --- Key: YARN-3132 URL: https://issues.apache.org/jira/browse/YARN-3132 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.7.0 Attachments: YARN-3132.1.patch Using an example to explain: 1) Admin specifies host1 has label=x 2) node=host1:123 registers 3) Get node-to-label mapping returns host1/host1:123 4) node=host1:123 unregisters 5) Get node-to-label mapping still returns host1:123. Probably we should remove host1:123 when it becomes deactivated and has no label directly assigned to it (directly assigned means the admin specifies host1:123 has x instead of host1 has x). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3195) [YARN]Missing uniformity In Yarn Queue CLI command
[ https://issues.apache.org/jira/browse/YARN-3195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jagadesh Kiran N updated YARN-3195: --- Attachment: YARN-3195.patch Attached the patch after the fix. Please check. [YARN]Missing uniformity In Yarn Queue CLI command --- Key: YARN-3195 URL: https://issues.apache.org/jira/browse/YARN-3195 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.6.0 Environment: SUSE Linux SP3 Reporter: Jagadesh Kiran N Assignee: Jagadesh Kiran N Priority: Minor Fix For: 2.7.0 Attachments: Helptobe removed in Queue.png, YARN-3195.patch Help is a generic command and should not be placed here; because of this, uniformity is missing compared to other commands. Remove the -help option inside ./yarn queue for uniformity with respect to other commands:
{code}
SO486LDPag65:/home/OpenSource/HA/install/hadoop/resourcemanager/bin # ./yarn queue -help
15/02/13 19:30:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
usage: queue
 -help                 Displays help for all commands.
 -status <Queue Name>  List queue information about given queue.
SO486LDPag65:/home/OpenSource/HA/install/hadoop/resourcemanager/bin # ./yarn queue
15/02/13 19:33:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Invalid Command Usage :
usage: queue
 -help                 Displays help for all commands.
 -status <Queue Name>  List queue information about given queue.
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3224) Notify AM with containers (on decommissioning node) could be preempted after timeout.
Junping Du created YARN-3224: Summary: Notify AM with containers (on decommissioning node) could be preempted after timeout. Key: YARN-3224 URL: https://issues.apache.org/jira/browse/YARN-3224 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3197) Confusing log generated by CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327445#comment-14327445 ] Varun Saxena commented on YARN-3197: [~devaraj.k] and others, I meant that printing unknown container or unknown application alongside their respective IDs might be deemed confusing by some too. Can't we say something like "Non-alive container containerId"? The AppID can probably be derived from the ContainerID. Thoughts? Confusing log generated by CapacityScheduler Key: YARN-3197 URL: https://issues.apache.org/jira/browse/YARN-3197 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Hitesh Shah Assignee: Varun Saxena Priority: Minor Attachments: YARN-3197.001.patch 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:40,960 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:40,960 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1615) Fix typos in description about delay scheduling
[ https://issues.apache.org/jira/browse/YARN-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327464#comment-14327464 ] Hudson commented on YARN-1615: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #100 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/100/]) YARN-1615. Fix typos in delay scheduler's description. Contributed by Akira Ajisaka. (ozawa: rev b8a14efdf535d42bcafa58d380bd2c7f4d36f8cb) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java Fix typos in description about delay scheduling --- Key: YARN-1615 URL: https://issues.apache.org/jira/browse/YARN-1615 Project: Hadoop YARN Issue Type: Bug Components: documentation, scheduler Affects Versions: 2.6.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Trivial Labels: newbie Fix For: 2.7.0 Attachments: YARN-1615-002.patch, YARN-1615.patch In FSAppAttempt.java there're 4 typos: {code} * containers over rack-local or off-switch containers. To acheive this * we first only allow node-local assigments for a given prioirty level, * then relax the locality threshold once we've had a long enough period * without succesfully scheduling. We measure both the number of missed {code} They should be fixed as follows: {code} * containers over rack-local or off-switch containers. To achieve this * we first only allow node-local assignments for a given priority level, * then relax the locality threshold once we've had a long enough period * without successfully scheduling. We measure both the number of missed {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3204) Fix new findbug warnings inhadoop-yarn-server-resourcemanager(resourcemanager.scheduler.fair)
[ https://issues.apache.org/jira/browse/YARN-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327514#comment-14327514 ] Brahma Reddy Battula commented on YARN-3204: updateInterval is read during initialization of FairScheduler, so it will not change. Hence it need not be protected by a lock for the following piece of code. I want to add this to the findbugs-exclude file: {code} public void run() { while (!Thread.currentThread().isInterrupted()) { try { Thread.sleep(updateInterval); long start = getClock().getTime(); update(); {code} Fix new findbug warnings inhadoop-yarn-server-resourcemanager(resourcemanager.scheduler.fair) - Key: YARN-3204 URL: https://issues.apache.org/jira/browse/YARN-3204 Project: Hadoop YARN Issue Type: Bug Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Attachments: YARN-3204-001.patch Please check following findbug report.. https://builds.apache.org/job/PreCommit-YARN-Build/6644//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
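An alternative to a findbugs exclusion, sketched under the assumption Brahma states (the value is fixed at initialization and never reassigned): making the field final makes the unsynchronized read provably safe. Names here are illustrative, not the actual FairScheduler members.
{code}
// Sketch of the pattern at issue: a value set once during service init and
// then only read by the update thread. Since the field is final, the read
// inside run() needs no lock, which is what the findbugs warning is about.
class UpdateLoop implements Runnable {
  private final long updateIntervalMs; // fixed at construction, so no lock needed

  UpdateLoop(long updateIntervalMs) {
    this.updateIntervalMs = updateIntervalMs;
  }

  @Override
  public void run() {
    while (!Thread.currentThread().isInterrupted()) {
      try {
        Thread.sleep(updateIntervalMs); // safe unsynchronized read: value never changes
        // ... recompute fair shares here ...
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // restore the flag and exit the loop
      }
    }
  }
}
{code}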
[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327527#comment-14327527 ] Jason Lowe commented on YARN-2004: -- My thoughts are as I stated above. We should not ignore priorities if one of the apps does not have a priority specified. A lack of a specified priority on an application should imply a default priority value and still be compared to the other application's priority rather than skipping the priority comparison. That would be the expected behavior. We can come up with all sorts of schemes to determine what the default priority value should be (e.g.: hardcoded default value, cluster-wide configurable, queue-specific configurable, etc.). The important part is to not skip the priority comparison completely as that would be unexpected behavior for users. Priority scheduling support in Capacity scheduler - Key: YARN-2004 URL: https://issues.apache.org/jira/browse/YARN-2004 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2004.patch Based on the priority of the application, Capacity Scheduler should be able to give preference to applications while scheduling. Comparator<FiCaSchedulerApp> applicationComparator can be changed as below. 1. Check for application priority. If priority is available, then return the highest-priority job. 2. Otherwise continue with existing logic such as App ID comparison and then TimeStamp comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
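A minimal sketch of the comparison Jason describes: a missing priority is mapped to a configurable default instead of short-circuiting the check. The App interface below is a stand-in, not the FiCaSchedulerApp API.
{code}
import java.util.Comparator;

// Stand-in for FiCaSchedulerApp: only the two fields this comparison needs.
interface App {
  Integer getPriority();     // null when the submitter set no priority
  long getApplicationId();
}

// Missing priorities map to a default value instead of being skipped, so
// the priority comparison always happens.
class AppPriorityComparator implements Comparator<App> {
  private final int defaultPriority;

  AppPriorityComparator(int defaultPriority) {
    this.defaultPriority = defaultPriority;
  }

  @Override
  public int compare(App a, App b) {
    int pa = a.getPriority() != null ? a.getPriority() : defaultPriority;
    int pb = b.getPriority() != null ? b.getPriority() : defaultPriority;
    if (pa != pb) {
      return Integer.compare(pb, pa); // higher priority sorts first
    }
    return Long.compare(a.getApplicationId(), b.getApplicationId()); // existing tie-break
  }
}
{code}
Whether the default comes from a hardcoded value, a cluster-wide setting, or per-queue configuration only changes where defaultPriority is read from; the comparator itself is unaffected.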
[jira] [Created] (YARN-3223) Resource update during NM graceful decommission
Junping Du created YARN-3223: Summary: Resource update during NM graceful decommission Key: YARN-3223 URL: https://issues.apache.org/jira/browse/YARN-3223 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Junping Du During NM graceful decommission, we should handle resource updates properly, including: making RMNode keep track of the old resource for possible rollback, keeping the available resource at 0, and updating the used resource as containers finish. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
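A hedged sketch of the bookkeeping this description calls for; the class and field names are hypothetical, not the actual RMNode implementation.
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

// Remember the pre-decommission total for rollback, advertise zero
// available capacity, and let "used" shrink as containers finish.
class DecommissioningNodeResources {
  private final Resource originalTotal; // kept for possible recommission/rollback
  private Resource used;                // updated as containers complete

  DecommissioningNodeResources(Resource total, Resource used) {
    this.originalTotal = total;
    this.used = used;
  }

  Resource available() {
    return Resources.none(); // schedule nothing new on a decommissioning node
  }

  void containerFinished(Resource released) {
    used = Resources.subtract(used, released); // drains toward zero over time
  }

  Resource rollbackTotal() {
    return originalTotal; // restored if the decommission is cancelled
  }
}
{code}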
[jira] [Commented] (YARN-1514) Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA
[ https://issues.apache.org/jira/browse/YARN-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327471#comment-14327471 ] Hudson commented on YARN-1514: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #100 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/100/]) YARN-1514. Utility to benchmark ZKRMStateStore#loadState for RM HA. Contributed by Tsuyoshi OZAWA (jianhe: rev 1c03376300a46722d4147f5b8f37242f68dba0a2) * hadoop-project/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStorePerf.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/test/YarnTestDriver.java Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA Key: YARN-1514 URL: https://issues.apache.org/jira/browse/YARN-1514 Project: Hadoop YARN Issue Type: Sub-task Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Fix For: 2.7.0 Attachments: YARN-1514.1.patch, YARN-1514.2.patch, YARN-1514.3.patch, YARN-1514.4.patch, YARN-1514.4.patch, YARN-1514.5.patch, YARN-1514.5.patch, YARN-1514.6.patch, YARN-1514.7.patch, YARN-1514.wip-2.patch, YARN-1514.wip.patch ZKRMStateStore is very sensitive to ZNode-related operations as discussed in YARN-1307, YARN-1378 and so on. In particular, ZKRMStateStore#loadState is called when an RM-HA cluster fails over; therefore, its execution time impacts the failover time of RM-HA. We need a utility to benchmark the execution time of ZKRMStateStore#loadState as a development tool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
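A minimal timing harness in the spirit of the utility described above, assuming an already-configured store; populating the state (apps, attempts, tokens) before measuring is omitted, and the real benchmark shipped as TestZKRMStateStorePerf.
{code}
import org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore;

class LoadStateBench {
  // Mean nanoseconds per loadState() call over the given number of iterations.
  static long meanLoadNanos(RMStateStore store, int iterations) throws Exception {
    long total = 0;
    for (int i = 0; i < iterations; i++) {
      long start = System.nanoTime();
      store.loadState(); // the call whose latency dominates RM-HA failover
      total += System.nanoTime() - start;
    }
    return total / iterations;
  }
}
{code}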
[jira] [Commented] (YARN-3132) RMNodeLabelsManager should remove node from node-to-label mapping when node becomes deactivated
[ https://issues.apache.org/jira/browse/YARN-3132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327469#comment-14327469 ] Hudson commented on YARN-3132: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #100 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/100/]) YARN-3132. RMNodeLabelsManager should remove node from node-to-label mapping when node becomes deactivated. Contributed by Wangda Tan (jianhe: rev f5da5566d9c392a5df71a2dce4c2d0d50eea51ee) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/RMNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/TestRMNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java RMNodeLabelsManager should remove node from node-to-label mapping when node becomes deactivated --- Key: YARN-3132 URL: https://issues.apache.org/jira/browse/YARN-3132 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.7.0 Attachments: YARN-3132.1.patch An example to explain: 1) Admin specifies host1 has label=x 2) node=host1:123 registers 3) Get node-to-label mapping returns host1/host1:123 4) node=host1:123 unregisters 5) Get node-to-label mapping still returns host1:123 We should probably remove host1:123 when it becomes deactivated and no label is directly assigned to it (directly assigned means the admin specified host1:123 has x, instead of host1 has x). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1615) Fix typos in description about delay scheduling
[ https://issues.apache.org/jira/browse/YARN-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327482#comment-14327482 ] Hudson commented on YARN-1615: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #110 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/110/]) YARN-1615. Fix typos in delay scheduler's description. Contributed by Akira Ajisaka. (ozawa: rev b8a14efdf535d42bcafa58d380bd2c7f4d36f8cb) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java Fix typos in description about delay scheduling --- Key: YARN-1615 URL: https://issues.apache.org/jira/browse/YARN-1615 Project: Hadoop YARN Issue Type: Bug Components: documentation, scheduler Affects Versions: 2.6.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Trivial Labels: newbie Fix For: 2.7.0 Attachments: YARN-1615-002.patch, YARN-1615.patch In FSAppAttempt.java there're 4 typos: {code} * containers over rack-local or off-switch containers. To acheive this * we first only allow node-local assigments for a given prioirty level, * then relax the locality threshold once we've had a long enough period * without succesfully scheduling. We measure both the number of missed {code} They should be fixed as follows: {code} * containers over rack-local or off-switch containers. To achieve this * we first only allow node-local assignments for a given priority level, * then relax the locality threshold once we've had a long enough period * without successfully scheduling. We measure both the number of missed {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
Junping Du created YARN-3225: Summary: New parameter or CLI for decommissioning node gracefully in RMAdmin CLI Key: YARN-3225 URL: https://issues.apache.org/jira/browse/YARN-3225 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du A new CLI (or an existing CLI with new parameters) should put each node on the decommission list into decommissioning status and track a timeout to terminate the nodes that haven't finished in time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3197) Confusing log generated by CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327604#comment-14327604 ] Devaraj K commented on YARN-3197: - I am not completely convinced about changing the log level to debug; even if there are many such logs, those would be one log per container. If we change the log level to debug then we would miss the update for those containers after an NM restart in the usual case where the log level is INFO. Also, there is a debug log in the caller method that would probably serve the same purpose, and the (rmContainer == null) log wouldn't be required if you decide to make the log level debug. {code:xml} LOG.debug("Container FINISHED: " + containerId); {code} IMO, we don't need to explicitly derive and print the application id from the container id; just logging the container id would be enough, and users can derive the application id from it if they really want. Confusing log generated by CapacityScheduler Key: YARN-3197 URL: https://issues.apache.org/jira/browse/YARN-3197 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Hitesh Shah Assignee: Varun Saxena Priority: Minor Attachments: YARN-3197.001.patch 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:40,960 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:40,960 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
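One possible shape of the reworded message, for illustration only; LOG, containerId and rmContainer are assumed to be the ones in CapacityScheduler#completedContainer, and the wording the JIRA finally settled on may differ.
{code}
// Still one line per container, but it now says which container is affected
// and why the scheduler no longer tracks it.
if (rmContainer == null) {
  LOG.info("Completed event for container " + containerId
      + " which is no longer tracked by the scheduler"
      + " (already completed, or recovered after an NM restart); ignoring");
  return;
}
{code}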
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327716#comment-14327716 ] Craig Welch commented on YARN-2495: --- So, here's my proposal [~Naganarasimha] [~leftnoteasy]: take a minute and consider whether or not DECENTRALIZED_CONFIGURATION_ENABLED is more likely to cause difficulty than prevent it, as I'm suggesting, and then you all can decide to keep it or not as you wish - I don't want to hold up the way forward over something which is, on the whole, a detail... Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495_20141022.1.patch Target of this JIRA is to allow admins to specify labels on each NM; this covers - Users can set labels on each NM (by setting yarn-site.xml (YARN-2923) or using a script suggested by [~aw] (YARN-2729)) - The NM will send labels to the RM via the ResourceTracker API - The RM will set labels in NodeLabelManager when the NM registers/updates labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327718#comment-14327718 ] Hadoop QA commented on YARN-2495: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12685787/YARN-2495.20141208-1.patch against trunk revision 2fd02af. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6670//console This message is automatically generated. Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495_20141022.1.patch Target of this JIRA is to allow admins to specify labels on each NM; this covers - Users can set labels on each NM (by setting yarn-site.xml (YARN-2923) or using a script suggested by [~aw] (YARN-2729)) - The NM will send labels to the RM via the ResourceTracker API - The RM will set labels in NodeLabelManager when the NM registers/updates labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3227) Timeline renew delegation token fails when RM user's TGT is expired
Jonathan Eagles created YARN-3227: - Summary: Timeline renew delegation token fails when RM user's TGT is expired Key: YARN-3227 URL: https://issues.apache.org/jira/browse/YARN-3227 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Priority: Critical When the RM user's kerberos TGT is expired, the RM renew delegation token operation fails as part of job submission. Expected behavior is that RM will relogin to get a new TGT. {quote} 2015-02-06 18:54:05,617 [DelegationTokenRenewer #25954] WARN security.DelegationTokenRenewer: Unable to add the application to the delegation token renewer. java.io.IOException: Failed to renew token: Kind: TIMELINE_DELEGATION_TOKEN, Service: timelineserver.example.com:4080, Ident: (owner=user, renewer=rmuser, realUser=oozie, issueDate=1423248845528, maxDate=1423853645528, sequenceNumber=9716, masterKeyId=9) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:443) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$800(DelegationTokenRenewer.java:77) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:808) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:789) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) Caused by: java.io.IOException: HTTP status [401], message [Unauthorized] at org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:169) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.doDelegationTokenOperation(DelegationTokenAuthenticator.java:286) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.renewDelegationToken(DelegationTokenAuthenticator.java:211) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.renewDelegationToken(DelegationTokenAuthenticatedURL.java:414) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:374) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:360) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$4.run(TimelineClientImpl.java:429) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:161) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.operateDelegationToken(TimelineClientImpl.java:444) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.renewDelegationToken(TimelineClientImpl.java:378) at org.apache.hadoop.yarn.security.client.TimelineDelegationTokenIdentifier$Renewer.renew(TimelineDelegationTokenIdentifier.java:81) at org.apache.hadoop.security.token.Token.renew(Token.java:377) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:532) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:529) {quote} -- This message 
was sent by Atlassian JIRA (v6.3.4#6332)
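A sketch of the expected behavior described above, assuming a keytab login: refresh the RM's TGT before attempting the renewal so an expired ticket does not surface as HTTP 401. Where exactly this call belongs (DelegationTokenRenewer vs. the timeline client) was still open on this JIRA.
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.Token;

class RenewWithRelogin {
  static long renew(Token<?> token, Configuration conf) throws IOException {
    // No-op unless the ticket is close to expiry; requires a keytab login.
    UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab();
    try {
      return token.renew(conf);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IOException("Interrupted while renewing " + token.getKind(), e);
    }
  }
}
{code}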
[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327730#comment-14327730 ] Sunil G commented on YARN-2004: --- As per YARN-2003, RMAppManager#submitApplication processes input from the submissionContext. I will add a case there to handle the scenario where the priority from the submission context is NULL; it can be updated with the default priority of the queue. As for this patch, I can remove the NULL check and have only a direct compareTo check for priority. Priority scheduling support in Capacity scheduler - Key: YARN-2004 URL: https://issues.apache.org/jira/browse/YARN-2004 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2004.patch Based on the priority of the application, Capacity Scheduler should be able to give preference to applications while scheduling. Comparator<FiCaSchedulerApp> applicationComparator can be changed as below. 1. Check for application priority. If priority is available, then return the highest-priority job. 2. Otherwise continue with existing logic such as App ID comparison and then TimeStamp comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
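A minimal sketch of the submission-time handling Sunil outlines, so the scheduler comparator can rely on every application carrying a non-null priority. The queue-default parameter is an assumption here; where it is read from (queue configuration, cluster default) was part of the design discussion.
{code}
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.Priority;

class SubmissionPriorityDefaults {
  // Substitute the queue's default when the client submitted no priority.
  static void applyDefault(ApplicationSubmissionContext context,
                           int queueDefaultPriority) {
    if (context.getPriority() == null) {
      context.setPriority(Priority.newInstance(queueDefaultPriority));
    }
  }
}
{code}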
[jira] [Resolved] (YARN-3228) Deadlock altering user resource queue
[ https://issues.apache.org/jira/browse/YARN-3228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-3228. --- Resolution: Incomplete Not sure how/why this is related to Hadoop. In any case, please first try to resolve user issues in the user mailing lists (http://hadoop.apache.org/mailing_lists.html). The JIRA is a place to address existing bugs/new features in the project. Closing this for now. Thanks. Deadlock altering user resource queue - Key: YARN-3228 URL: https://issues.apache.org/jira/browse/YARN-3228 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager, scheduler Affects Versions: 2.0.1-alpha Environment: hadoop yarn, postgresql Reporter: Christian Hott Priority: Blocker Labels: newbie Original Estimate: 203h Remaining Estimate: 203h Let me introduce you to my problem: all of this began after we created some resource queues on postgresql. We created them, assigned them to the users, and all was fine... until we ran a process (a large iterative query) and I did an ALTER ROLE over the user and the resource queue he was using. After that I can't log in with the user and get a message saying deadlock detected, locking against self. Do you have any idea what causes this? Or is there any comprehensible log I can search for more information? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3227) Timeline renew delegation token fails when RM user's TGT is expired
[ https://issues.apache.org/jira/browse/YARN-3227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327798#comment-14327798 ] Vinod Kumar Vavilapalli commented on YARN-3227: --- Is it only the Timeline delegation token that fails renewal or all the tokens? Timeline renew delegation token fails when RM user's TGT is expired --- Key: YARN-3227 URL: https://issues.apache.org/jira/browse/YARN-3227 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Priority: Critical When the RM user's kerberos TGT is expired, the RM renew delegation token operation fails as part of job submission. Expected behavior is that RM will relogin to get a new TGT. {quote} 2015-02-06 18:54:05,617 [DelegationTokenRenewer #25954] WARN security.DelegationTokenRenewer: Unable to add the application to the delegation token renewer. java.io.IOException: Failed to renew token: Kind: TIMELINE_DELEGATION_TOKEN, Service: timelineserver.example.com:4080, Ident: (owner=user, renewer=rmuser, realUser=oozie, issueDate=1423248845528, maxDate=1423853645528, sequenceNumber=9716, masterKeyId=9) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:443) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$800(DelegationTokenRenewer.java:77) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:808) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:789) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) Caused by: java.io.IOException: HTTP status [401], message [Unauthorized] at org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:169) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.doDelegationTokenOperation(DelegationTokenAuthenticator.java:286) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.renewDelegationToken(DelegationTokenAuthenticator.java:211) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.renewDelegationToken(DelegationTokenAuthenticatedURL.java:414) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:374) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:360) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$4.run(TimelineClientImpl.java:429) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:161) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.operateDelegationToken(TimelineClientImpl.java:444) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.renewDelegationToken(TimelineClientImpl.java:378) at org.apache.hadoop.yarn.security.client.TimelineDelegationTokenIdentifier$Renewer.renew(TimelineDelegationTokenIdentifier.java:81) at org.apache.hadoop.security.token.Token.renew(Token.java:377) at 
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:532) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:529) {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327807#comment-14327807 ] Junping Du commented on YARN-3225: -- Thanks [~sunilg] for the comments! Yes, I mean the rmadmin command line. I think it would be better to pass a timeout by adding a parameter, something like -t. Without this parameter, it will decommission nodes forcefully just like the old behavior. Thoughts? New parameter or CLI for decommissioning node gracefully in RMAdmin CLI --- Key: YARN-3225 URL: https://issues.apache.org/jira/browse/YARN-3225 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du A new CLI (or an existing CLI with new parameters) should put each node on the decommission list into decommissioning status and track a timeout to terminate the nodes that haven't finished in time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3224) Notify AM with containers (on decommissioning node) could be preempted after timeout.
[ https://issues.apache.org/jira/browse/YARN-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327811#comment-14327811 ] Junping Du commented on YARN-3224: -- Sure. Please go ahead to take on this JIRA. Thanks [~sunilg]! Notify AM with containers (on decommissioning node) could be preempted after timeout. - Key: YARN-3224 URL: https://issues.apache.org/jira/browse/YARN-3224 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3204) Fix new findbug warnings inhadoop-yarn-server-resourcemanager(resourcemanager.scheduler.fair)
[ https://issues.apache.org/jira/browse/YARN-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327816#comment-14327816 ] Hadoop QA commented on YARN-3204: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12699701/YARN-3204-002.patch against trunk revision 2fd02af. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6669//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6669//console This message is automatically generated. Fix new findbug warnings inhadoop-yarn-server-resourcemanager(resourcemanager.scheduler.fair) - Key: YARN-3204 URL: https://issues.apache.org/jira/browse/YARN-3204 Project: Hadoop YARN Issue Type: Bug Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Attachments: YARN-3204-001.patch, YARN-3204-002.patch Please check following findbug report.. https://builds.apache.org/job/PreCommit-YARN-Build/6644//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3204) Fix new findbug warnings in hadoop-yarn-server-resourcemanager(resourcemanager.scheduler.fair)
[ https://issues.apache.org/jira/browse/YARN-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3204: --- Summary: Fix new findbug warnings in hadoop-yarn-server-resourcemanager(resourcemanager.scheduler.fair) (was: Fix new findbug warnings inhadoop-yarn-server-resourcemanager(resourcemanager.scheduler.fair)) Fix new findbug warnings in hadoop-yarn-server-resourcemanager(resourcemanager.scheduler.fair) -- Key: YARN-3204 URL: https://issues.apache.org/jira/browse/YARN-3204 Project: Hadoop YARN Issue Type: Bug Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Attachments: YARN-3204-001.patch, YARN-3204-002.patch Please check following findbug report.. https://builds.apache.org/job/PreCommit-YARN-Build/6644//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3230) Clarify application states on the web UI
[ https://issues.apache.org/jira/browse/YARN-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328264#comment-14328264 ] Wangda Tan commented on YARN-3230: -- [~jianhe], thanks for working on this, generally looks good to me, some minor comments: 1) FinalStatus from Application's POV: to Final State Reported by Application Master? 2) NEW_SAVING: is not necessary to be seen by client? 3) RUNNING: AM container has registered to RM and started running. Wangda Clarify application states on the web UI Key: YARN-3230 URL: https://issues.apache.org/jira/browse/YARN-3230 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Jian He Attachments: YARN-3230.1.patch, YARN-3230.2.patch, application page.png Today, application states are simply surfaced as single words on the web UI. Not everyone understands the meaning of NEW_SAVING, SUBMITTED, ACCEPTED. This jira is to clarify the meaning of these states, e.g. what the application is waiting for in each state. In addition, the difference between the application state and FinalStatus is fairly confusing to users, especially when state=FINISHED but FinalStatus=FAILED -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3230) Clarify application states on the web UI
[ https://issues.apache.org/jira/browse/YARN-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3230: -- Attachment: (was: application page.png) Clarify application states on the web UI Key: YARN-3230 URL: https://issues.apache.org/jira/browse/YARN-3230 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Jian He Attachments: YARN-3230.1.patch, YARN-3230.2.patch Today, application states are simply surfaced as single words on the web UI. Not everyone understands the meaning of NEW_SAVING, SUBMITTED, ACCEPTED. This jira is to clarify the meaning of these states, e.g. what the application is waiting for in each state. In addition, the difference between the application state and FinalStatus is fairly confusing to users, especially when state=FINISHED but FinalStatus=FAILED -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3231) FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck
[ https://issues.apache.org/jira/browse/YARN-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-3231: --- Priority: Critical (was: Major) FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck -- Key: YARN-3231 URL: https://issues.apache.org/jira/browse/YARN-3231 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Priority: Critical Attachments: YARN-3231.v1.patch When a queue piles up with a lot of pending jobs due to the maxRunningApps limit, we want to increase this property on the fly to make some of the pending jobs active. However, once we increase the limit, the pending jobs are not assigned any resources and are stuck forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
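Purely illustrative sketch of one fix direction for the bug described above: when queueMaxRunningApps is raised, promote pending applications up to the new headroom instead of leaving them waiting for an app-removal event that may never come. The SchedulerQueue type and its methods are hypothetical, not the FairScheduler API, and the attached patch may take a different approach.
{code}
class MaxRunningAppsSketch {
  interface SchedulerQueue {
    int runnableAppCount();
    boolean hasPendingApps();
    void promoteNextPendingApp(); // moves one app from pending to runnable
  }

  static void onMaxRunningAppsIncreased(SchedulerQueue queue, int newMax) {
    // Each promotion consumes headroom, so re-check the count every step.
    while (queue.runnableAppCount() < newMax && queue.hasPendingApps()) {
      queue.promoteNextPendingApp();
    }
  }
}
{code}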
[jira] [Commented] (YARN-3131) YarnClientImpl should check FAILED and KILLED state in submitApplication
[ https://issues.apache.org/jira/browse/YARN-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328350#comment-14328350 ] Hadoop QA commented on YARN-3131: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12699763/yarn_3131_v1.patch against trunk revision d49ae72. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6676//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6676//console This message is automatically generated. YarnClientImpl should check FAILED and KILLED state in submitApplication Key: YARN-3131 URL: https://issues.apache.org/jira/browse/YARN-3131 Project: Hadoop YARN Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: yarn_3131_v1.patch Just ran into an issue when submitting a job to a non-existent queue: YarnClient raises no exception. Though the job indeed gets submitted successfully and just fails immediately after, it would be better if YarnClient could handle the immediate-failure situation like YarnRunner does -- This message was sent by Atlassian JIRA (v6.3.4#6332)
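A client-side sketch of the check this JIRA asks for; the actual patch changes YarnClientImpl#submitApplication itself, and the poll count and sleep interval below are illustrative.
{code}
import java.io.IOException;
import java.util.EnumSet;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.YarnException;

class CheckedSubmit {
  private static final EnumSet<YarnApplicationState> TERMINAL =
      EnumSet.of(YarnApplicationState.FAILED, YarnApplicationState.KILLED);

  static ApplicationId submit(YarnClient client, ApplicationSubmissionContext ctx)
      throws IOException, YarnException, InterruptedException {
    ApplicationId appId = client.submitApplication(ctx);
    // An app submitted to a non-existent queue fails almost immediately,
    // so a short poll is enough to surface it as an exception.
    for (int i = 0; i < 10; i++) {
      ApplicationReport report = client.getApplicationReport(appId);
      YarnApplicationState state = report.getYarnApplicationState();
      if (TERMINAL.contains(state)) {
        throw new YarnException("Application " + appId + " reached " + state
            + " right after submission: " + report.getDiagnostics());
      }
      if (state == YarnApplicationState.ACCEPTED
          || state == YarnApplicationState.RUNNING) {
        break; // the submission took effect
      }
      Thread.sleep(200);
    }
    return appId;
  }
}
{code}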