[jira] [Commented] (YARN-1327) Fix nodemgr native compilation problems on FreeBSD9
[ https://issues.apache.org/jira/browse/YARN-1327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846155#comment-13846155 ] Radim Kolar commented on YARN-1327: --- Can anybody look at this? It breaks Hadoop on FreeBSD. Fix nodemgr native compilation problems on FreeBSD9 --- Key: YARN-1327 URL: https://issues.apache.org/jira/browse/YARN-1327 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0 Reporter: Radim Kolar Assignee: Radim Kolar Fix For: 3.0.0, 2.3.0 Attachments: nodemgr-portability.txt There are several portability problems preventing the native component from compiling on FreeBSD:
1. libgen.h is not included. The correct function prototype is present, but Linux glibc has a workaround that defines it for the user even when libgen.h is not included directly. Include this file directly.
2. Query the maximum login name size using sysconf; this follows the same code style as the rest of the code, which already uses sysconf.
3. cgroups are a Linux-only feature; compile that code conditionally and return an error if mount_cgroup is attempted on a non-Linux OS.
4. Do not use the POSIX function setpgrp(), since it clashes with the function of the same name from BSD 4.2; use an equivalent function instead. After inspecting the glibc sources, setpgrp() is just a shortcut for setpgid(0,0).
These changes make it compile on both Linux and FreeBSD.
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (YARN-1028) Add FailoverProxyProvider like capability to RMProxy
[ https://issues.apache.org/jira/browse/YARN-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1028: --- Attachment: yarn-1028-6.patch Add FailoverProxyProvider like capability to RMProxy Key: YARN-1028 URL: https://issues.apache.org/jira/browse/YARN-1028 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: yarn-1028-1.patch, yarn-1028-2.patch, yarn-1028-3.patch, yarn-1028-4.patch, yarn-1028-5.patch, yarn-1028-6.patch, yarn-1028-draft-cumulative.patch RMProxy layer currently abstracts RM discovery and implements it by looking up service information from configuration. Motivated by HDFS and using existing classes from Common, we can add failover proxy providers that may provide RM discovery in extensible ways. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (YARN-1028) Add FailoverProxyProvider like capability to RMProxy
[ https://issues.apache.org/jira/browse/YARN-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846315#comment-13846315 ] Karthik Kambatla commented on YARN-1028: Thanks Tom. Updated the patch to address your comments.
bq. It looks like the behaviour in this patch differs from the way failover is implemented for HDFS HA, where it is controlled by dfs.client.failover settings
For consistency, all YARN failover configs are prefixed with yarn.client.failover. The suffixes are also similar to the ones HDFS uses, but use hyphens instead of dots for consistency with the rest of the YARN configs.
bq. Why do you need both YarnFailoverProxyProvider and ConfiguredFailoverProxyProvider?
Changed {{YarnFailoverProxyProvider}} to an interface with a single method {{#init(Conf, RMProxy, Class<T> protocol)}}. This init() is called after creating an instance of the specified class. HDFS, on the other hand, expects the plugged-in FailoverProxyProvider to have a constructor of a particular form. I think the approach in the current patch is cleaner, since anyone writing a plugin knows they should have an init method. What do you think? I can remove YarnFailoverProxyProvider altogether if you think that is a better approach.
Add FailoverProxyProvider like capability to RMProxy Key: YARN-1028 URL: https://issues.apache.org/jira/browse/YARN-1028 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: yarn-1028-1.patch, yarn-1028-2.patch, yarn-1028-3.patch, yarn-1028-4.patch, yarn-1028-5.patch, yarn-1028-6.patch, yarn-1028-draft-cumulative.patch RMProxy layer currently abstracts RM discovery and implements it by looking up service information from configuration. Motivated by HDFS and using existing classes from Common, we can add failover proxy providers that may provide RM discovery in extensible ways.
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
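A minimal sketch of the interface shape described in the comment above, reconstructed from the comment text rather than the attached patch (the generic parameter and exact signature are assumptions):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.client.RMProxy;

// Hypothetical shape of YarnFailoverProxyProvider as described above; the
// committed patch may differ. Implementations would additionally provide
// the usual FailoverProxyProvider behaviour from Hadoop Common.
public interface YarnFailoverProxyProvider<T> {
  // Called after the provider is instantiated reflectively from
  // configuration, instead of requiring a constructor of a particular
  // form as HDFS does.
  void init(Configuration conf, RMProxy<T> rmProxy, Class<T> protocol);
}
{code}
The design choice being debated: an explicit init() makes the plugin contract visible in the interface itself, whereas HDFS's constructor convention is only discoverable from documentation.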
[jira] [Created] (YARN-1502) Protocol changes and implementations in RM side to support change container resource
Wangda Tan created YARN-1502: Summary: Protocol changes and implementations in RM side to support change container resource Key: YARN-1502 URL: https://issues.apache.org/jira/browse/YARN-1502 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan As described in YARN-1197, we need to add API/implementation changes:
1) Add a List<ContainerResourceIncreaseRequest> to the YarnScheduler interface
2) Make resource-changed containers retrievable from AllocateResponse
3) Add an implementation on the Capacity Scheduler side to support increase/decrease
For other details, please refer to the design doc and the discussion in YARN-1197.
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
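To make change 1) concrete, a rough sketch of how the increase-request list might be threaded through the scheduler API; the surrounding parameters mirror the existing 2.x allocate() call, and ContainerResourceIncreaseRequest is taken from the issue text, so treat this as an illustration rather than the actual patch:
{code}
import java.util.List;
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

// Illustrative only: one possible way to pass increase requests into the
// scheduler. Allocation is the scheduler-side response type;
// ContainerResourceIncreaseRequest is the record type proposed in the issue.
public interface YarnScheduler {
  Allocation allocate(
      ApplicationAttemptId appAttemptId,
      List<ResourceRequest> ask,
      List<ContainerId> release,
      List<String> blacklistAdditions,
      List<String> blacklistRemovals,
      // new: resource-increase requests for already-allocated containers
      List<ContainerResourceIncreaseRequest> increaseRequests);
}
{code}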
[jira] [Updated] (YARN-1502) Protocol changes and implementations in RM side to support change container resource
[ https://issues.apache.org/jira/browse/YARN-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1502: - Attachment: yarn-1502.1.patch Attached the first patch of scheduler changes for preview, mainly to find out whether there is any big issue with this approach. It is still in development: container increasing is supported and has unit tests, but container decreasing is not tested yet. Hope you can shed some light on it! :) Protocol changes and implementations in RM side to support change container resource Key: YARN-1502 URL: https://issues.apache.org/jira/browse/YARN-1502 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Attachments: yarn-1502.1.patch As described in YARN-1197, we need to add API/implementation changes:
1) Add a List<ContainerResourceIncreaseRequest> to the YarnScheduler interface
2) Make resource-changed containers retrievable from AllocateResponse
3) Add an implementation on the Capacity Scheduler side to support increase/decrease
For other details, please refer to the design doc and the discussion in YARN-1197.
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (YARN-1029) Allow embedding leader election into the RM
[ https://issues.apache.org/jira/browse/YARN-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1029: --- Attachment: yarn-1029-0.patch yarn-1029-0.patch is a working patch that uses ActiveStandbyElector directly. The ZKFCProtocol implementation of the code is straightforward: 60 lines of code between AdminService and RMZKActiveStandbyElector, along with error/exception handling. Allow embedding leader election into the RM --- Key: YARN-1029 URL: https://issues.apache.org/jira/browse/YARN-1029 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: yarn-1029-0.patch, yarn-1029-approach.patch It should be possible to embed the common ActiveStandbyElector into the RM such that ZooKeeper-based leader election and notification is built in. In conjunction with a ZK state store, this configuration will be a simple deployment option.
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (YARN-1028) Add FailoverProxyProvider like capability to RMProxy
[ https://issues.apache.org/jira/browse/YARN-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846325#comment-13846325 ] Hadoop QA commented on YARN-1028: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12618407/yarn-1028-6.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests: org.apache.hadoop.yarn.server.TestContainerManagerSecurity org.apache.hadoop.yarn.server.TestRMNMSecretKeys
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2649//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2649//console
This message is automatically generated. Add FailoverProxyProvider like capability to RMProxy Key: YARN-1028 URL: https://issues.apache.org/jira/browse/YARN-1028 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: yarn-1028-1.patch, yarn-1028-2.patch, yarn-1028-3.patch, yarn-1028-4.patch, yarn-1028-5.patch, yarn-1028-6.patch, yarn-1028-draft-cumulative.patch RMProxy layer currently abstracts RM discovery and implements it by looking up service information from configuration. Motivated by HDFS and using existing classes from Common, we can add failover proxy providers that may provide RM discovery in extensible ways.
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (YARN-1029) Allow embedding leader election into the RM
[ https://issues.apache.org/jira/browse/YARN-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1029: --- Attachment: embedded-zkfc-approach.patch Also uploading the embedded-zkfc-approach patch that I worked on earlier. It might not apply to trunk/branch-2 anymore, though. The uploaded patch doesn't remove any of the overheads or handle the short-circuit. Having implemented both approaches, I sincerely feel the ActiveStandbyElector approach is simpler, cleaner, and more straightforward than the embedded ZKFC approach. Refactoring ZKFC would only add more work, without apparent gains. When working on the embedded ZKFC approach, I ran it by Todd and he suggested we might want to use ActiveStandbyElector directly and do away with unnecessary failover code paths if we are not using the rest of the ZKFC features. Thanks to his suggestion, the code definitely looks simpler that way. [~vinodkv] - is there a good technical reason for using ZKFC instead of ActiveStandbyElector directly? In this case, we only need election and would be using ZKFC just for the elector. Allow embedding leader election into the RM --- Key: YARN-1029 URL: https://issues.apache.org/jira/browse/YARN-1029 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: embedded-zkfc-approach.patch, yarn-1029-0.patch, yarn-1029-approach.patch It should be possible to embed the common ActiveStandbyElector into the RM such that ZooKeeper-based leader election and notification is built in. In conjunction with a ZK state store, this configuration will be a simple deployment option.
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
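For readers comparing the two approaches, a minimal sketch of what embedding the elector directly involves, assuming the callback interface of Hadoop Common's ActiveStandbyElector; the class name and transition bodies are hypothetical:
{code}
import org.apache.hadoop.ha.ActiveStandbyElector;
import org.apache.hadoop.ha.ServiceFailedException;

// Hypothetical RM-side callback; the real wiring may differ.
class EmbeddedElectorCallback
    implements ActiveStandbyElector.ActiveStandbyElectorCallback {
  @Override
  public void becomeActive() throws ServiceFailedException {
    // won the election: transition the RM to active
  }
  @Override
  public void becomeStandby() {
    // lost the election: transition the RM to standby
  }
  @Override
  public void enterNeutralMode() {
    // ZK connection lost; hold state or defensively go standby
  }
  @Override
  public void notifyFatalError(String errorMessage) {
    // unrecoverable elector error; fail fast
  }
  @Override
  public void fenceOldActive(byte[] oldActiveData) {
    // fencing hook; can be a no-op when the state store itself fences
  }
}
{code}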
[jira] [Commented] (YARN-1028) Add FailoverProxyProvider like capability to RMProxy
[ https://issues.apache.org/jira/browse/YARN-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846332#comment-13846332 ] Karthik Kambatla commented on YARN-1028: The test failures are unrelated. Add FailoverProxyProvider like capability to RMProxy Key: YARN-1028 URL: https://issues.apache.org/jira/browse/YARN-1028 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: yarn-1028-1.patch, yarn-1028-2.patch, yarn-1028-3.patch, yarn-1028-4.patch, yarn-1028-5.patch, yarn-1028-6.patch, yarn-1028-draft-cumulative.patch RMProxy layer currently abstracts RM discovery and implements it by looking up service information from configuration. Motivated by HDFS and using existing classes from Common, we can add failover proxy providers that may provide RM discovery in extensible ways. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (YARN-1488) Allow containers to delegate resources to another container
[ https://issues.apache.org/jira/browse/YARN-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846343#comment-13846343 ] Arun C Murthy commented on YARN-1488: - [~henryr] Good to see you around here.
bq. Would the recipient and delegated containers have to match the queues to which their original resources were granted?
No, not at all. The requirement, gleaned from discussions in YARN-1404 (and HDFS-4949, if you squint hard *smile*), is that you'd want an external framework, call it Gamma, which runs in a separate queue, call it GammaQ. Now, a user Alice, belonging to queue Phi, needs to use Gamma. The key requirement is that Gamma would like to leverage YARN's workload-management capabilities (queues, SLAs etc.) rather than merely run under YARN to leverage YARN's resource management. Use-cases:
# Alice running queries on Impala (resource: memory, cpu, others in future)
# Bob caching data-sets in HDFS i.e. DataNodes (resource: memory)
# Charlie doing a bunch of I/O operations on HBase/Accumulo (resource: cpu, iops).
If we all agree on the use cases, then it is very critical to support source and target containers belonging to different queues - that is key to allowing these external frameworks to leverage YARN's workload management. Does that make sense? This would definitely require the NodeManager (and, potentially, the external system i.e. impalad, datanode etc.) to maintain the resource-map so that they can return the original source container to YARN for various reasons (finished the task at hand, preemption to respect queue SLAs etc.) We could, and should, allow the recipient service to decide how to manage the resource map for itself (i.e. decouple that from how the NodeManager manages the mapping) - this could be either a single cgroup (which the NodeManager has to manage for the external framework anyway) or a hierarchy within. Thoughts? Thanks. Allow containers to delegate resources to another container --- Key: YARN-1488 URL: https://issues.apache.org/jira/browse/YARN-1488 Project: Hadoop YARN Issue Type: New Feature Reporter: Arun C Murthy We should allow containers to delegate resources to another container. This would allow external frameworks to share not just YARN's resource-management capabilities but also its workload-management capabilities.
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (YARN-1488) Allow containers to delegate resources to another container
[ https://issues.apache.org/jira/browse/YARN-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846347#comment-13846347 ] Arun C Murthy commented on YARN-1488: -
bq. We could, and should, allow the recipient service to decide how to manage the resource map for itself
[~henryr]: To clarify, we would have to do this whether or not we take the delegation approach. Thanks. Allow containers to delegate resources to another container --- Key: YARN-1488 URL: https://issues.apache.org/jira/browse/YARN-1488 Project: Hadoop YARN Issue Type: New Feature Reporter: Arun C Murthy We should allow containers to delegate resources to another container. This would allow external frameworks to share not just YARN's resource-management capabilities but also its workload-management capabilities.
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Comment Edited] (YARN-1488) Allow containers to delegate resources to another container
[ https://issues.apache.org/jira/browse/YARN-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846343#comment-13846343 ] Arun C Murthy edited comment on YARN-1488 at 12/12/13 3:06 PM: --- [~henryr] Good to see you around here.
bq. Would the recipient and delegated containers have to match the queues to which their original resources were granted?
No, not at all. The requirement, gleaned from discussions in YARN-1404 (and HDFS-4949, if you squint hard *smile*), is that you'd want an external framework, call it Gamma, which runs in a separate queue, call it GammaQ. Now, a user Alice, belonging to queue Alpha, needs to use Gamma. The key requirement is that Gamma would like to leverage YARN's workload-management capabilities (queues, SLAs etc.) rather than merely run under YARN to leverage YARN's resource management. Use-cases:
# Alice, belonging to queue Phi, running queries on Impala (resource: memory, cpu, others in future)
# Bob, belonging to queue Alpha, caching data-sets in HDFS i.e. DataNodes (resource: memory)
# Charlie, belonging to queue Beta, doing a bunch of I/O operations on HBase/Accumulo (resource: cpu, iops).
If we all agree on the use cases, then it is very critical to support source and target containers belonging to different queues - that is key to allowing these external frameworks to leverage YARN's workload management. Does that make sense? This would definitely require the NodeManager (and, potentially, the external system i.e. impalad, datanode etc.) to maintain the resource-map so that they can return the original source container to YARN for various reasons (finished the task at hand, preemption to respect queue SLAs etc.) We could, and should, allow the recipient service to decide how to manage the resource map for itself (i.e. decouple that from how the NodeManager manages the mapping) - this could be either a single cgroup (which the NodeManager has to manage for the external framework anyway) or a hierarchy within. Thoughts? Thanks.
was (Author: acmurthy): [~henryr] Good to see you around here.
bq. Would the recipient and delegated containers have to match the queues to which their original resources were granted?
No, not at all. The requirement, gleaned from discussions in YARN-1404 (and HDFS-4949, if you squint hard *smile*), is that you'd want an external framework, call it Gamma, which runs in a separate queue, call it GammaQ. Now, a user Alice, belonging to queue Phi, needs to use Gamma. The key requirement is that Gamma would like to leverage YARN's workload-management capabilities (queues, SLAs etc.) rather than merely run under YARN to leverage YARN's resource management. Use-cases:
# Alice running queries on Impala (resource: memory, cpu, others in future)
# Bob caching data-sets in HDFS i.e. DataNodes (resource: memory)
# Charlie doing a bunch of I/O operations on HBase/Accumulo (resource: cpu, iops).
If we all agree on the use cases, then it is very critical to support source and target containers belonging to different queues - that is key to allowing these external frameworks to leverage YARN's workload management. Does that make sense? This would definitely require the NodeManager (and, potentially, the external system i.e. impalad, datanode etc.) to maintain the resource-map so that they can return the original source container to YARN for various reasons (finished the task at hand, preemption to respect queue SLAs etc.)
We could, and should, allow the recipient service to decide how to manage the resource map for itself (i.e. decouple that from how the NodeManager manages the mapping) - this could be either a single cgroup (which the NodeManager has to manage for the external framework anyway) or a hierarchy within. Thoughts? Thanks. Allow containers to delegate resources to another container --- Key: YARN-1488 URL: https://issues.apache.org/jira/browse/YARN-1488 Project: Hadoop YARN Issue Type: New Feature Reporter: Arun C Murthy We should allow containers to delegate resources to another container. This would allow external frameworks to share not just YARN's resource-management capabilities but also its workload-management capabilities.
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (YARN-1491) Upgrade JUnit3 TestCase to JUnit 4
[ https://issues.apache.org/jira/browse/YARN-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846369#comment-13846369 ] Hudson commented on YARN-1491: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1610 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1610/]) YARN-1491. Upgrade JUnit3 TestCase to JUnit 4 (Chen He via jeagles) (jeagles: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550204)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestLinuxResourceCalculatorPlugin.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestWindowsBasedProcessTree.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestWindowsResourceCalculatorPlugin.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestYarnVersionInfo.java
Upgrade JUnit3 TestCase to JUnit 4 -- Key: YARN-1491 URL: https://issues.apache.org/jira/browse/YARN-1491 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Jonathan Eagles Assignee: Chen He Labels: newbie Fix For: 3.0.0, 2.4.0 Attachments: Yarn-1491.patch There are still four references to test classes that extend from junit.framework.TestCase:
hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestYarnVersionInfo.java
hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestWindowsResourceCalculatorPlugin.java
hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestLinuxResourceCalculatorPlugin.java
hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestWindowsBasedProcessTree.java
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
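The migration pattern behind this change is mechanical; an illustrative before/after (not taken from the patch) for one of the four classes:
{code}
// Before (JUnit 3): the class extends junit.framework.TestCase and test
// methods are discovered by the "test" name prefix.
//
//   public class TestYarnVersionInfo extends junit.framework.TestCase { ... }
//
// After (JUnit 4): no base class; tests are annotated with @Test and
// assertions are statically imported.
import static org.junit.Assert.assertNotNull;

import org.apache.hadoop.yarn.util.YarnVersionInfo;
import org.junit.Test;

public class TestYarnVersionInfo {
  @Test
  public void testVersionInfoGenerated() {
    assertNotNull("version should not be null", YarnVersionInfo.getVersion());
  }
}
{code}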
[jira] [Commented] (YARN-1481) Move internal services logic from AdminService to ResourceManager
[ https://issues.apache.org/jira/browse/YARN-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846371#comment-13846371 ] Hudson commented on YARN-1481: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1610 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1610/]) YARN-1481. Move internal services logic from AdminService to ResourceManager. (vinodkv via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550167)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContext.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
Move internal services logic from AdminService to ResourceManager - Key: YARN-1481 URL: https://issues.apache.org/jira/browse/YARN-1481 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Fix For: 2.4.0 Attachments: YARN-1481-20131207.txt, YARN-1481-20131209.txt This is something I found while reviewing YARN-1318, but didn't halt that patch as many cycles went there already. Some top level issues
- Not easy to follow RM's service life cycle
-- RM adds only AdminService as its service directly.
-- Other services are added to RM when AdminService's init calls RM.activeServices.init()
- Overall, AdminService shouldn't encompass all of RM's HA state management. It was originally supposed to be the implementation of just the RPC server.
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (YARN-408) Capacity Scheduler delay scheduling should not be disabled by default
[ https://issues.apache.org/jira/browse/YARN-408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846375#comment-13846375 ] Hudson commented on YARN-408: - FAILURE: Integrated in Hadoop-Hdfs-trunk #1610 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1610/]) YARN-408. Change CapacityScheduler to not disable delay-scheduling by default. Contributed by Mayank Bansal. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550245)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/conf/capacity-scheduler.xml
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRM.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
Capacity Scheduler delay scheduling should not be disabled by default - Key: YARN-408 URL: https://issues.apache.org/jira/browse/YARN-408 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.0.3-alpha Reporter: Mayank Bansal Assignee: Mayank Bansal Priority: Minor Fix For: 2.4.0 Attachments: YARN-408-trunk-2.patch, YARN-408-trunk-3.patch, YARN-408-trunk.patch Capacity Scheduler delay scheduling should not be disabled by default. Enable it, set to the number of nodes in one rack. Thanks, Mayank
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1197: - Attachment: yarn-1197-scheduler-v1.pdf Attached the scheduler design doc for increasing and decreasing. I've uploaded a draft preview patch for the scheduler changes in YARN-1502. Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Assignee: Wangda Tan Attachments: mapreduce-project.patch.ver.1, tools-project.patch.ver.1, yarn-1197-scheduler-v1.pdf, yarn-1197-v2.pdf, yarn-1197-v3.pdf, yarn-1197-v4.pdf, yarn-1197-v5.pdf, yarn-1197.pdf, yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, yarn-server-resourcemanager.patch.ver.1 The current YARN resource management logic assumes the resource allocated to a container is fixed during its lifetime. When users want to change the resource of an allocated container, the only way is to release it and allocate a new container with the expected size. Allowing run-time changes to the resources of an allocated container will give us better control of resource usage on the application side.
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846389#comment-13846389 ] Wangda Tan commented on YARN-1197: -- Copying the text from the scheduler design doc here for easier discussion; please feel free to let me know your comments!
*Basic Requirements*
We need to support handling resource increase requests from the AM and resource decrease notifications from the NM.
* Such resource changes should be reflected in FiCaSchedulerNode/App, LeafQueue, ParentQueue (usedResource, reservedResource, etc.)
* If a user requests an increase and it cannot be satisfied immediately, it will be reserved in the node/app (node/app means FiCaSchedulerApp/Node, same below) as before.
*Advanced Requirements*
* We need to gracefully handle race conditions:
** Only acquired/running containers can be increased.
** Container decreasing only takes effect on acquired/running containers. (If a container is finished/killed, etc., all of its resources are released; we don't need to decrease it.)
** A user may submit a new increase request on a container while a pending increase request for the same container exists. We need to replace the pending request with the new one.
** When a requested container resource is less than or equal to the existing container resource:
*** the request is ignored if there is no pending increase request for this container;
*** otherwise the request is ignored and the pending increase request is canceled.
** When a pending increase request exists and a decrease notification for the same container arrives, the container is decreased and the pending increase request is canceled.
*Requirements not clear*
* Do we need a time-out parameter for a reserved resource increase request, to avoid it occupying the node's resources for too long? (Do we have such a parameter for reserving a "normal" container in CS?)
* How do we decide whether an increase request or a normal container request is satisfied first? (Currently, I simply make CS satisfy increase requests first.) Should this be a configurable parameter?
*Current Implementation*
*1) Decrease Container*
I start with container decrease because it is easier to understand. Decreased containers are handled in nodeUpdate() of the Capacity Scheduler. When CS receives decreased containers from the NM, it processes them one by one with the following steps:
* Check if the container is in the running state (because this is reported by the NM, its state will be either running or completed); skip it if not.
* Remove the increase request on the same container-id if it exists.
* Decrease/update the container resource in FiCaSchedulerApp/AppSchedulingInfo/FiCaSchedulerNode/LeafQueue/ParentQueue/other related metrics.
* Update the resource in the Container.
* Return the decreased container to the AM by calling setDecreasedContainer on the AllocateResponse.
*2) Increase Container*
Increasing a container is much more complex than decreasing one.
*Steps to add a container increase request (pseudo code)*
In CapacityScheduler.allocate(...):
{code}
foreach (increase_request):
  if (state != ACQUIRED) and (state != RUNNING):
    continue;
  // Remove the old request on the same container-id if it exists
  if increase_request_exist(increase_request.getContainerId()):
    remove(increaseRequest);
  // The target resource asked for should be larger than the existing resource
  if increase_request.ask_resource <= existing_resource(increase_request.getContainerId()):
    continue;
  // Add it to the application
  getApplication(increase_request.getContainerId()).add(increase_request)
{code}
*Steps to handle container increase request*
2.1) In CapacityScheduler.nodeUpdate(...):
{code}
if node.is_reserved():
  if reserved-increase-request:
    LeafQueue.assignReservedIncreaseRequest(...)
  elif reserved-normal-container:
    ...
else:
  ParentQueue.assignContainers(...) // this will finally call
                                    // LeafQueue.assignContainers(...)
{code}
2.2) In CapacityScheduler.nodeUpdate(...):
{code}
if request-is-fit-in-resource:
  allocate-resource
  update container token
  add to AllocateResponse
  return allocated-resource
else:
  return None
{code}
2.3) In LeafQueue.assignContainers(...):
{code}
foreach (application):
  // do increase allocation first
  foreach (increase_request):
    // check if we can allocate it
    // within queue/user limits, etc.
    // return None if not satisfied
    if request-is-fit-in-resource:
      allocate-resource
      update container token
      add to AllocateResponse
    else:
      reserve in app/node
      return reserved-resource
  // do normal allocation
  ...
{code}
*API changes in CapacityScheduler*
1) YarnScheduler
{code} public Allocation
[jira] [Commented] (YARN-1028) Add FailoverProxyProvider like capability to RMProxy
[ https://issues.apache.org/jira/browse/YARN-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846395#comment-13846395 ] Tom White commented on YARN-1028: - Thanks for the explanation of how failover works, Karthik. I think the failover configuration is much better now - the patch is very close. Just a few minor comments:
* The YarnFailoverProxyProvider interface is an improvement. It might be good to have RM in its name since it is about RM failover. Ditto for ConfiguredFailoverProxyProvider.
* It would be nice to have YarnClientImpl still report which RM it submitted to - the logical name when HA is enabled, the host/port when not.
* Nit: TestRMFailover has a spurious log message LOG.error(KK)
* Nit: YARN_MINI_CLUSTER_USE_RPC and DEFAULT_YARN_MINI_CLUSTER_USE_RPC - should be MINICLUSTER (without a space) for consistency with existing names.
Add FailoverProxyProvider like capability to RMProxy Key: YARN-1028 URL: https://issues.apache.org/jira/browse/YARN-1028 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: yarn-1028-1.patch, yarn-1028-2.patch, yarn-1028-3.patch, yarn-1028-4.patch, yarn-1028-5.patch, yarn-1028-6.patch, yarn-1028-draft-cumulative.patch RMProxy layer currently abstracts RM discovery and implements it by looking up service information from configuration. Motivated by HDFS and using existing classes from Common, we can add failover proxy providers that may provide RM discovery in extensible ways.
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (YARN-1028) Add FailoverProxyProvider like capability to RMProxy
[ https://issues.apache.org/jira/browse/YARN-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1028: --- Attachment: yarn-1028-7.patch Thanks again, Tom. New patch that addresses your comments. Add FailoverProxyProvider like capability to RMProxy Key: YARN-1028 URL: https://issues.apache.org/jira/browse/YARN-1028 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: yarn-1028-1.patch, yarn-1028-2.patch, yarn-1028-3.patch, yarn-1028-4.patch, yarn-1028-5.patch, yarn-1028-6.patch, yarn-1028-7.patch, yarn-1028-draft-cumulative.patch RMProxy layer currently abstracts RM discovery and implements it by looking up service information from configuration. Motivated by HDFS and using existing classes from Common, we can add failover proxy providers that may provide RM discovery in extensible ways. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (YARN-1028) Add FailoverProxyProvider like capability to RMProxy
[ https://issues.apache.org/jira/browse/YARN-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846501#comment-13846501 ] Hadoop QA commented on YARN-1028: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12618440/yarn-1028-7.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests: org.apache.hadoop.yarn.server.TestContainerManagerSecurity org.apache.hadoop.yarn.server.TestRMNMSecretKeys
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2650//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2650//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-client.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2650//console
This message is automatically generated. Add FailoverProxyProvider like capability to RMProxy Key: YARN-1028 URL: https://issues.apache.org/jira/browse/YARN-1028 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: yarn-1028-1.patch, yarn-1028-2.patch, yarn-1028-3.patch, yarn-1028-4.patch, yarn-1028-5.patch, yarn-1028-6.patch, yarn-1028-7.patch, yarn-1028-draft-cumulative.patch RMProxy layer currently abstracts RM discovery and implements it by looking up service information from configuration. Motivated by HDFS and using existing classes from Common, we can add failover proxy providers that may provide RM discovery in extensible ways.
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (YARN-1028) Add FailoverProxyProvider like capability to RMProxy
[ https://issues.apache.org/jira/browse/YARN-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1028: --- Attachment: yarn-1028-8.patch Fixed the findbugs warning. Also verified the output from YarnClientImpl by running a job against an HA cluster. Add FailoverProxyProvider like capability to RMProxy Key: YARN-1028 URL: https://issues.apache.org/jira/browse/YARN-1028 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: yarn-1028-1.patch, yarn-1028-2.patch, yarn-1028-3.patch, yarn-1028-4.patch, yarn-1028-5.patch, yarn-1028-6.patch, yarn-1028-7.patch, yarn-1028-8.patch, yarn-1028-draft-cumulative.patch RMProxy layer currently abstracts RM discovery and implements it by looking up service information from configuration. Motivated by HDFS and using existing classes from Common, we can add failover proxy providers that may provide RM discovery in extensible ways. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (YARN-1028) Add FailoverProxyProvider like capability to RMProxy
[ https://issues.apache.org/jira/browse/YARN-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846574#comment-13846574 ] Hadoop QA commented on YARN-1028: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12618449/yarn-1028-8.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests: org.apache.hadoop.yarn.server.TestContainerManagerSecurity org.apache.hadoop.yarn.server.TestRMNMSecretKeys
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2651//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2651//console
This message is automatically generated. Add FailoverProxyProvider like capability to RMProxy Key: YARN-1028 URL: https://issues.apache.org/jira/browse/YARN-1028 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: yarn-1028-1.patch, yarn-1028-2.patch, yarn-1028-3.patch, yarn-1028-4.patch, yarn-1028-5.patch, yarn-1028-6.patch, yarn-1028-7.patch, yarn-1028-8.patch, yarn-1028-draft-cumulative.patch RMProxy layer currently abstracts RM discovery and implements it by looking up service information from configuration. Motivated by HDFS and using existing classes from Common, we can add failover proxy providers that may provide RM discovery in extensible ways.
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (YARN-1325) Enabling HA should check Configuration contains multiple RMs
[ https://issues.apache.org/jira/browse/YARN-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846696#comment-13846696 ] Xuan Gong commented on YARN-1325: - The test case failures are unrelated. Enabling HA should check Configuration contains multiple RMs Key: YARN-1325 URL: https://issues.apache.org/jira/browse/YARN-1325 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Xuan Gong Labels: ha Attachments: YARN-1325.1.patch, YARN-1325.2.patch, YARN-1325.3.patch, YARN-1325.4.patch Currently, we can enable the RM HA configuration without multiple RM ids (YarnConfiguration.RM_HA_IDS). This behaviour can cause wrong operations. The ResourceManager should verify that more than one RM id is specified in RM_HA_IDS. One idea is to support a strict mode that enforces this check as configuration (e.g. yarn.resourcemanager.ha.strict-mode.enabled). -- This message was sent by Atlassian JIRA (v6.1.4#6159)
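A sketch of the kind of check being discussed; YarnConfiguration.RM_HA_IDS is the real config key, but the validation below paraphrases the issue text rather than the committed patch:
{code}
import java.util.Collection;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.exceptions.YarnRuntimeException;

// Illustrative validation, not the committed patch.
public final class RMHAIdsCheck {
  static void verifyRMHAIds(Configuration conf) {
    Collection<String> ids =
        conf.getStringCollection(YarnConfiguration.RM_HA_IDS);
    if (ids.size() < 2) {
      // HA is enabled but fewer than two RMs are configured
      throw new YarnRuntimeException("Invalid configuration: "
          + YarnConfiguration.RM_HA_IDS
          + " should contain at least two RM ids when HA is enabled, but has "
          + ids);
    }
  }
}
{code}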
[jira] [Assigned] (YARN-1180) Update capacity scheduler docs to include types on the configs
[ https://issues.apache.org/jira/browse/YARN-1180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He reassigned YARN-1180: - Assignee: Chen He Update capacity scheduler docs to include types on the configs -- Key: YARN-1180 URL: https://issues.apache.org/jira/browse/YARN-1180 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 3.0.0, 2.1.0-beta, 0.23.9 Reporter: Thomas Graves Assignee: Chen He Labels: documentation, newbie The capacity scheduler docs (http://hadoop.apache.org/docs/r2.1.0-beta/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html) don't include types for all the configs. For instance, minimum-user-limit-percent doesn't say it's an Int. It is also the only setting among the Resource Allocation configs that is an Int rather than a float. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (YARN-1325) Enabling HA should check Configuration contains multiple RMs
[ https://issues.apache.org/jira/browse/YARN-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846730#comment-13846730 ] Xuan Gong commented on YARN-1325: - Thanks. https://issues.apache.org/jira/browse/YARN-1463 is used to track the test case failures. Enabling HA should check Configuration contains multiple RMs Key: YARN-1325 URL: https://issues.apache.org/jira/browse/YARN-1325 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Xuan Gong Labels: ha Fix For: 2.4.0 Attachments: YARN-1325.1.patch, YARN-1325.2.patch, YARN-1325.3.patch, YARN-1325.4.patch Currently, we can enable the RM HA configuration without multiple RM ids (YarnConfiguration.RM_HA_IDS). This behaviour can cause wrong operations. The ResourceManager should verify that more than one RM id is specified in RM_HA_IDS. One idea is to support a strict mode that enforces this check as configuration (e.g. yarn.resourcemanager.ha.strict-mode.enabled). -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Created] (YARN-1503) Support making additional 'LocalResources' available to running containers
Siddharth Seth created YARN-1503: Summary: Support making additional 'LocalResources' available to running containers Key: YARN-1503 URL: https://issues.apache.org/jira/browse/YARN-1503 Project: Hadoop YARN Issue Type: Improvement Reporter: Siddharth Seth Assignee: Siddharth Seth We have a use case where additional resources (jars, libraries, etc.) need to be made available to an already running container. Ideally, we'd like this to be done via YARN (instead of having potentially multiple containers per node download resources on their own). Proposal: the NM would support an additional API where a list of resources can be specified, something like localizeResource(ContainerId, Map<String, LocalResource>). The NM would also require an additional API to get the state of these resources - getLocalizationState(ContainerId) - which returns the current state of all local resources for the specified container(s). -- This message was sent by Atlassian JIRA (v6.1.4#6159)
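A sketch of what the proposed NM additions could look like; the interface name, the ResourceState enum, and the exact signatures are extrapolations from the proposal text, not an existing YARN API:
{code}
import java.util.Map;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.LocalResource;

// Hypothetical NM-side API per the proposal above.
public interface RunningContainerLocalization {
  enum ResourceState { PENDING, LOCALIZING, LOCALIZED, FAILED }

  // push additional resources to an already-running container
  void localizeResource(ContainerId containerId,
      Map<String, LocalResource> resources);

  // current state of all local resources for the specified container
  Map<String, ResourceState> getLocalizationState(ContainerId containerId);
}
{code}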
[jira] [Updated] (YARN-1498) RM changes for moving apps between queues
[ https://issues.apache.org/jira/browse/YARN-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1498: - Attachment: YARN-1498.patch RM changes for moving apps between queues - Key: YARN-1498 URL: https://issues.apache.org/jira/browse/YARN-1498 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1498.patch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (YARN-1325) Enabling HA should check Configuration contains multiple RMs
[ https://issues.apache.org/jira/browse/YARN-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846784#comment-13846784 ] Hudson commented on YARN-1325: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4875 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4875/]) YARN-1325. Modified RM HA configuration validation to also ensure that multiple RMs are configured. Contributed by Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550524)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/HAUtil.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestHAUtil.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
Enabling HA should check Configuration contains multiple RMs Key: YARN-1325 URL: https://issues.apache.org/jira/browse/YARN-1325 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Xuan Gong Labels: ha Fix For: 2.4.0 Attachments: YARN-1325.1.patch, YARN-1325.2.patch, YARN-1325.3.patch, YARN-1325.4.patch Currently, we can enable the RM HA configuration without multiple RM ids (YarnConfiguration.RM_HA_IDS). This behaviour can cause wrong operations. The ResourceManager should verify that more than one RM id is specified in RM_HA_IDS. One idea is to support a strict mode that enforces this check as configuration (e.g. yarn.resourcemanager.ha.strict-mode.enabled).
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (YARN-1311) Fix app specific scheduler-events' names to be app-attempt based
[ https://issues.apache.org/jira/browse/YARN-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846816#comment-13846816 ] Jian He commented on YARN-1311: --- Patch looks good; checking it in. Fix app specific scheduler-events' names to be app-attempt based Key: YARN-1311 URL: https://issues.apache.org/jira/browse/YARN-1311 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Trivial Attachments: YARN-1311-20131015.txt, YARN-1311-20131211.1.txt, YARN-1311-20131211.txt Today, APP_ADDED and APP_REMOVED are sent to the scheduler. They are misnomers as schedulers only deal with AppAttempts today. This JIRA is for fixing their names so that we can add App-level events in the near future, notably for work-preserving RM-restart. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (YARN-1498) RM changes for moving apps between queues
[ https://issues.apache.org/jira/browse/YARN-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846822#comment-13846822 ] Hadoop QA commented on YARN-1498: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12618484/YARN-1498.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.TestQueueMetrics
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2652//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2652//console
This message is automatically generated. RM changes for moving apps between queues - Key: YARN-1498 URL: https://issues.apache.org/jira/browse/YARN-1498 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1498.patch
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (YARN-1485) Enabling HA should verify the RM service addresses configurations have been set for every RM Ids defined in RM_HA_IDs
[ https://issues.apache.org/jira/browse/YARN-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1485: Attachment: YARN-1485.1.patch Enabling HA should verify the RM service addresses configurations have been set for every RM Ids defined in RM_HA_IDs - Key: YARN-1485 URL: https://issues.apache.org/jira/browse/YARN-1485 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-1485.1.patch After YARN-1325, YarnConfiguration.RM_HA_IDS will contain multiple RM ids. We need to verify that the RM service address configurations have been set for all of the RM ids. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
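A sketch of the verification this issue asks for; HAUtil.addSuffix and the YarnConfiguration address keys exist in YARN, but the check itself is paraphrased from the issue text rather than taken from the attached patch:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.HAUtil;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.exceptions.YarnRuntimeException;

// Illustrative check: every RM id listed in yarn.resourcemanager.ha.rm-ids
// must have its per-id service addresses configured, e.g.
// yarn.resourcemanager.address.<rm-id>.
public final class RMServiceAddressCheck {
  private static final String[] ADDRESS_KEYS = {
    YarnConfiguration.RM_ADDRESS,
    YarnConfiguration.RM_SCHEDULER_ADDRESS,
    YarnConfiguration.RM_ADMIN_ADDRESS,
    YarnConfiguration.RM_RESOURCE_TRACKER_ADDRESS,
    YarnConfiguration.RM_WEBAPP_ADDRESS,
  };

  static void verifyRMServiceAddresses(Configuration conf) {
    for (String rmId : conf.getStringCollection(YarnConfiguration.RM_HA_IDS)) {
      for (String key : ADDRESS_KEYS) {
        String perIdKey = HAUtil.addSuffix(key, rmId);
        if (conf.getTrimmed(perIdKey) == null) {
          throw new YarnRuntimeException(perIdKey
              + " must be set when RM HA is enabled");
        }
      }
    }
  }
}
{code}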
[jira] [Commented] (YARN-1463) TestContainerManagerSecurity#testContainerManager fails
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846865#comment-13846865 ] Haohui Mai commented on YARN-1463: -- Just discussed with [~vinodkv]; we believe that the unit tests should be fixed as well. Maybe we can fix the YARN tests by specifying the keytabs, just like TestSecureNamenode does. TestContainerManagerSecurity#testContainerManager fails --- Key: YARN-1463 URL: https://issues.apache.org/jira/browse/YARN-1463 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Binglin Chang Attachments: YARN-1463.v1.patch, YARN-1463.v2.patch Here is the stack trace:
{code}
testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity) Time elapsed: 1.756 sec ERROR!
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED
at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110)
{code}
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (YARN-1180) Update capacity scheduler docs to include types on the configs
[ https://issues.apache.org/jira/browse/YARN-1180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated YARN-1180: -- Attachment: Yarn-1180.patch Update capacity scheduler docs to include types on the configs -- Key: YARN-1180 URL: https://issues.apache.org/jira/browse/YARN-1180 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 3.0.0, 2.1.0-beta, 0.23.9 Reporter: Thomas Graves Assignee: Chen He Labels: documentation, newbie Fix For: 2.4.0 Attachments: Yarn-1180.patch The capacity scheduler docs (http://hadoop.apache.org/docs/r2.1.0-beta/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html) don't include types for all the configs. For instance, minimum-user-limit-percent doesn't say it's an Int. It is also the only setting among the Resource Allocation configs that is an Int rather than a float. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (YARN-1180) Update capacity scheduler docs to include types on the configs
[ https://issues.apache.org/jira/browse/YARN-1180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846893#comment-13846893 ] Chen He commented on YARN-1180: --- Patch submitted! Update capacity scheduler docs to include types on the configs -- Key: YARN-1180 URL: https://issues.apache.org/jira/browse/YARN-1180 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 3.0.0, 2.1.0-beta, 0.23.9 Reporter: Thomas Graves Assignee: Chen He Labels: documentation, newbie Fix For: 2.4.0 Attachments: Yarn-1180.patch The capacity scheduler docs (http://hadoop.apache.org/docs/r2.1.0-beta/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html) don't include types for all the configs. For instance, minimum-user-limit-percent doesn't say it's an Int. It is also the only setting among the Resource Allocation configs that is an Int rather than a float. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (YARN-1311) Fix app specific scheduler-events' names to be app-attempt based
[ https://issues.apache.org/jira/browse/YARN-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846898#comment-13846898 ] Hudson commented on YARN-1311: -- FAILURE: Integrated in Hadoop-trunk-Commit #4876 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4876/]) YARN-1311. Fixed app specific scheduler-events' names to be app-attempt based. Contributed by Vinod Kumar Vavilapalli (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1550579) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/AppAddedSchedulerEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/AppAttemptAddedSchedulerEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/AppAttemptRemovedSchedulerEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/AppRemovedSchedulerEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/SchedulerEventType.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/Application.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestFifoScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java Fix app specific scheduler-events' names to be app-attempt based Key: YARN-1311 URL: https://issues.apache.org/jira/browse/YARN-1311 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Trivial Attachments: YARN-1311-20131015.txt, YARN-1311-20131211.1.txt, YARN-1311-20131211.txt Today, APP_ADDED and APP_REMOVED are sent to the scheduler. They are misnomers as schedulers only deal with AppAttempts today. This JIRA is for fixing their names so that we can add App-level events in the near future, notably for work-preserving RM-restart. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (YARN-1180) Update capacity scheduler docs to include types on the configs
[ https://issues.apache.org/jira/browse/YARN-1180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846907#comment-13846907 ] Hadoop QA commented on YARN-1180: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12618492/Yarn-1180.patch against trunk revision . {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2654//console This message is automatically generated. Update capacity scheduler docs to include types on the configs -- Key: YARN-1180 URL: https://issues.apache.org/jira/browse/YARN-1180 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 3.0.0, 2.1.0-beta, 0.23.9 Reporter: Thomas Graves Assignee: Chen He Labels: documentation, newbie Fix For: 2.4.0 Attachments: Yarn-1180.patch The capacity scheduler docs (http://hadoop.apache.org/docs/r2.1.0-beta/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html) don't include types for all the configs. For instance, the minimum-user-limit-percent doesn't say it's an Int. It is also the only setting for the Resource Allocation configs that is an Int rather than a float. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (YARN-1485) Enabling HA should verify the RM service address configurations have been set for every RM Id defined in RM_HA_IDs
[ https://issues.apache.org/jira/browse/YARN-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846914#comment-13846914 ] Hadoop QA commented on YARN-1485: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12618490/YARN-1485.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests: org.apache.hadoop.yarn.server.TestContainerManagerSecurity org.apache.hadoop.yarn.server.TestRMNMSecretKeys {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2653//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2653//console This message is automatically generated. Enabling HA should verify the RM service address configurations have been set for every RM Id defined in RM_HA_IDs - Key: YARN-1485 URL: https://issues.apache.org/jira/browse/YARN-1485 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-1485.1.patch After YARN-1325, the YarnConfiguration.RM_HA_IDS will contain multiple RM Ids. We need to verify that the RM service address configurations have been set for all of the RM Ids. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
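A hedged sketch of the verification YARN-1485 asks for; the yarn.resourcemanager.* keys are the real per-RM service address keys, but the method shape and exception text are illustrative rather than the attached patch:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.exceptions.YarnRuntimeException;

public class HAConfigValidator {
  // For every RM id listed under yarn.resourcemanager.ha.rm-ids, each per-RM
  // service address key (e.g. yarn.resourcemanager.address.rm1) must be set.
  static void verifyHAAddresses(Configuration conf) {
    String[] prefixes = {
        "yarn.resourcemanager.address",
        "yarn.resourcemanager.scheduler.address",
        "yarn.resourcemanager.admin.address",
        "yarn.resourcemanager.resource-tracker.address",
        "yarn.resourcemanager.webapp.address" };
    for (String rmId : conf.getStringCollection("yarn.resourcemanager.ha.rm-ids")) {
      for (String prefix : prefixes) {
        String key = prefix + "." + rmId;
        if (conf.get(key) == null) {
          throw new YarnRuntimeException("HA is enabled but " + key + " is not set");
        }
      }
    }
  }
}
{code}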
[jira] [Commented] (YARN-1415) In scheduler UI, including used memory in Memory Total seems to be inaccurate
[ https://issues.apache.org/jira/browse/YARN-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846938#comment-13846938 ] Siqi Li commented on YARN-1415: --- According to the tests in TestQueueMetrics.java, AvailableMB is never deducted when allocating memory to applications. It actually means the total available memory of the cluster. Therefore, totalMB displayed in the UI should only include AvailableMB. In scheduler UI, including used memory in Memory Total seems to be inaccurate --- Key: YARN-1415 URL: https://issues.apache.org/jira/browse/YARN-1415 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, scheduler Reporter: Siqi Li Fix For: 2.1.0-beta Attachments: 1.png, 2.png Memory Total is currently a sum of availableMB, allocatedMB, and reservedMB. It seems that the term availableMB actually means total memory, since it doesn't get decreased when some jobs use a certain amount of memory. Hence, the Memory Total should not include allocatedMB, or availableMB doesn't get updated properly. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (YARN-1311) Fix app specific scheduler-events' names to be app-attempt based
[ https://issues.apache.org/jira/browse/YARN-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846940#comment-13846940 ] Hudson commented on YARN-1311: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4877 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4877/]) Reverting YARN-1311. Fixed app specific scheduler-events' names to be app-attempt based. Contributed by Vinod Kumar Vavilapalli (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1550594) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/AppAddedSchedulerEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/AppAttemptAddedSchedulerEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/AppAttemptRemovedSchedulerEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/AppRemovedSchedulerEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/SchedulerEventType.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/Application.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestFifoScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java Fix app specific scheduler-events' names to be app-attempt based Key: YARN-1311 URL: https://issues.apache.org/jira/browse/YARN-1311 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Trivial Fix For: 2.4.0 Attachments: YARN-1311-20131015.txt, YARN-1311-20131211.1.txt, YARN-1311-20131211.txt Today, APP_ADDED and APP_REMOVED are sent to the scheduler. They are misnomers as schedulers only deal with AppAttempts today. This JIRA is for fixing their names so that we can add App-level events in the near future, notably for work-preserving RM-restart. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (YARN-1415) In scheduler UI, including used memory in Memory Total seems to be inaccurate
[ https://issues.apache.org/jira/browse/YARN-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846951#comment-13846951 ] Sandy Ryza commented on YARN-1415: -- Currently, all the schedulers interpret available MB to mean non-allocated memory. Check out CSQueueUtils.updateQueueStatistics, FairScheduler.updateRootQueueMetrics, and FifoScheduler.nodeUpdate. If TestQueueMetrics does not reflect this, it's TestQueueMetrics that is misinterpreting. In scheduler UI, including used memory in Memory Total seems to be inaccurate --- Key: YARN-1415 URL: https://issues.apache.org/jira/browse/YARN-1415 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, scheduler Reporter: Siqi Li Fix For: 2.1.0-beta Attachments: 1.png, 2.png Memory Total is currently a sum of availableMB, allocatedMB, and reservedMB. It seems that the term availableMB actually means total memory, since it doesn't get decreased when some jobs use a certain amount of memory. Hence, the Memory Total should not include allocatedMB, or availableMB doesn't get updated properly. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
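To make the YARN-1415 disagreement concrete, here is the arithmetic with illustrative numbers only:
{code}
public class MemoryTotalExample {
  public static void main(String[] args) {
    // If availableMB means "not yet allocated", as CSQueueUtils.updateQueueStatistics
    // and friends maintain, then the web UI's sum is the true cluster total:
    long allocatedMB = 4096; // handed out to running containers
    long reservedMB = 1024;  // reserved on nodes for pending requests
    long availableMB = 3072; // what the scheduler can still hand out
    long totalMB = availableMB + allocatedMB + reservedMB;
    System.out.println(totalMB); // 8192, the cluster total
    // Under the other reading (availableMB == cluster total), the same sum
    // would double count, and the UI should display availableMB alone.
  }
}
{code}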
[jira] [Commented] (YARN-1391) Lost node list should be identified by NodeId
[ https://issues.apache.org/jira/browse/YARN-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846969#comment-13846969 ] Gera Shegalov commented on YARN-1391: - Siqi asked me to chime in. An example of a real yet atypical scenario I am aware of is an HPC scale-up machine. A single node manager does not scale to manage all the containers that can run concurrently there. So you have a choice of either unnecessarily fragmenting this machine into a bunch of smaller VMs/OSes, or running a bunch of NMs without any virtualization overhead. It's always been possible to run multiple TTs in MRv1 as well. Lost node list should be identified by NodeId --- Key: YARN-1391 URL: https://issues.apache.org/jira/browse/YARN-1391 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.5-alpha Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-1391.v1.patch In the case of multiple node managers on a single machine, each of them should be identified by NodeId, which is more unique than just the host name. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
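A minimal sketch of the intent behind YARN-1391 (field and method names are illustrative, not the attached patch): key the lost-node map by the full NodeId rather than the host name, so two NMs on one machine stay distinct.
{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import org.apache.hadoop.yarn.api.records.NodeId;
import org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNode;

public class LostNodeTracking {
  // Before: a map keyed by host name, so when two NMs on one machine are
  // lost, the second entry overwrites the first. Keying by NodeId
  // (host:port) keeps each NM instance distinct.
  private final ConcurrentMap<NodeId, RMNode> inactiveNodes =
      new ConcurrentHashMap<NodeId, RMNode>();

  void nodeLost(RMNode node) {
    inactiveNodes.put(node.getNodeID(), node);
  }
}
{code}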
[jira] [Commented] (YARN-1311) Fix app specific scheduler-events' names to be app-attempt based
[ https://issues.apache.org/jira/browse/YARN-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13847050#comment-13847050 ] Hudson commented on YARN-1311: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4878 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4878/]) YARN-1311. Fixed app specific scheduler-events' names to be app-attempt based. Contributed by Vinod Kumar Vavilapalli (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1550613) * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/ResourceSchedulerWrapper.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/AppAddedSchedulerEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/AppAttemptAddedSchedulerEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/AppAttemptRemovedSchedulerEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/AppRemovedSchedulerEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/SchedulerEventType.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/Application.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestFifoScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java Fix app specific scheduler-events' names to be app-attempt based Key: YARN-1311 URL: https://issues.apache.org/jira/browse/YARN-1311 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Trivial Fix For: 2.4.0 Attachments: YARN-1311-20131015.txt, YARN-1311-20131211.1.txt, YARN-1311-20131211.txt Today, APP_ADDED and APP_REMOVED are sent to the scheduler. They are misnomers as schedulers only deal with AppAttempts today. This JIRA is for fixing their names so that we can add App-level events in the near future, notably for work-preserving RM-restart. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (YARN-1311) Fix app specific scheduler-events' names to be app-attempt based
[ https://issues.apache.org/jira/browse/YARN-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13847054#comment-13847054 ] Hudson commented on YARN-1311: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4879 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4879/]) Updated CHANGES.txt for YARN-1311. (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1550615) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt Fix app specific scheduler-events' names to be app-attempt based Key: YARN-1311 URL: https://issues.apache.org/jira/browse/YARN-1311 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Trivial Fix For: 2.4.0 Attachments: YARN-1311-20131015.txt, YARN-1311-20131211.1.txt, YARN-1311-20131211.txt Today, APP_ADDED and APP_REMOVED are sent to the scheduler. They are misnomers as schedulers only deal with AppAttempts today. This JIRA is for fixing their names so that we can add App-level events in the near future, notably for work-preserving RM-restart. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (YARN-312) Add updateNodeResource in ResourceManagerAdministrationProtocol
[ https://issues.apache.org/jira/browse/YARN-312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13847062#comment-13847062 ] Junping Du commented on YARN-312: - Thanks [~vinodkv] for the review and comments! All points make sense to me. Please see my reply. bq. The patch isn't applying anymore. Please update. Sure. Will update in the next patch. bq. There's a better way to implement the map. See ApplicationACLMapProto in yarn_protos.proto for example and its usage. This should avoid the length checks in AdminService. In a similar vein, the Java APIs can directly deal with maps. Thanks for your suggestion here. Yes. That seems better, will update in the next patch. bq. Didn't review the previous patches, but I think we should have a better name instead of ResourceOption. Will file a JIRA. Yes. Please share your idea there. Thanks. bq. The UpdateNodeResourceRequest and response objects need to be @Public too? Yes. Nice catch. Will change it to public. bq. Failure handling: If there is an invalid node, should we reject the change completely or partially update all the correctly defined nodes? You've done the former. Seems fine. Maybe say the same in the exception message? That we are rejecting all requests? I tried to keep it simple by getting rid of partial updates. Will update the exception message. bq. Are you not doing the CLI support for the update resources in this patch? I think we should. Here or separate patch. Yes. This is the major work for YARN-313. Does that make sense? bq. Again, didn't review previous patch. So we need to fix here or elsewhere: RMNode is supposed to be a read-only interface, so setResourceOption() doesn't belong there. It should be an event to the node informing the change in resource. That's a good point! Can we fix it in a separate JIRA, given this patch is big enough and we may want it dedicated to the RPC changes? Add updateNodeResource in ResourceManagerAdministrationProtocol --- Key: YARN-312 URL: https://issues.apache.org/jira/browse/YARN-312 Project: Hadoop YARN Issue Type: Sub-task Components: api Affects Versions: 2.2.0 Reporter: Junping Du Assignee: Junping Du Attachments: YARN-312-v1.patch, YARN-312-v2.patch, YARN-312-v3.patch, YARN-312-v4.1.patch, YARN-312-v4.patch, YARN-312-v5.1.patch, YARN-312-v5.patch, YARN-312-v6.patch, YARN-312-v7.1.patch, YARN-312-v7.1.patch, YARN-312-v7.patch, YARN-312-v8.patch Add fundamental RPC (ResourceManagerAdministrationProtocol) to support node's resource change. For design details, please refer to the parent JIRA: YARN-291. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
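A hedged sketch of the map-based record shape Vinod suggests for YARN-312; the class and method names are illustrative, and ResourceOption stands in for the type introduced by this patch series:
{code}
import java.util.Map;
import org.apache.hadoop.yarn.api.records.NodeId;

// Hypothetical record shape: carry the node -> resource mapping as a Map so
// callers avoid parallel lists and AdminService avoids length checks.
interface ResourceOption { } // stand-in for the type added by this patch series

public abstract class UpdateNodeResourceRequest {
  public abstract Map<NodeId, ResourceOption> getNodeResourceMap();

  public abstract void setNodeResourceMap(Map<NodeId, ResourceOption> map);
}
{code}
On the protobuf side, the ApplicationACLMapProto pattern this refers to encodes the map as a repeated entry message rather than two parallel lists.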
[jira] [Commented] (YARN-1495) Allow moving apps between queues
[ https://issues.apache.org/jira/browse/YARN-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13847081#comment-13847081 ] Vinod Kumar Vavilapalli commented on YARN-1495: --- Hi Sandy, some questions and quick thoughts on this ticket: - Any specific use-case? Example where it can be used? To justify this isn't feature creep. - What happens when scheduling-constraints are violated? The client will just get an error? It kind of depends on the type of scheduling constraint. - Who initiates the move - any regular user or just admins? Given your description of ACLs, seems like anyone. - Only running apps can be moved? There are races w.r.t. apps that are submitted but not accepted and close-to-completion apps. - The ACLs choice seems straightforward and makes sense. There is some non-trivial stuff that needs ironing out, outside of schedulers. - While the move happens, -- Apps may be in the process of submitting new requests. What happens to them? I guess queue-move and new-requests should be synchronized. -- Preemption monitors will need to be notified, as they kind of know a lot about schedulers but sit outside the schedulers. -- There will be a potential wild change in the headroom for the application. Allow moving apps between queues Key: YARN-1495 URL: https://issues.apache.org/jira/browse/YARN-1495 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza This is an umbrella JIRA for work needed to allow moving YARN applications from one queue to another. The work will consist of additions in the command line options, additions in the client RM protocol, and changes in the schedulers to support this. I have a picture of how this should function in the Fair Scheduler, but I'm not familiar enough with the Capacity Scheduler for the same there. Ultimately, the decision as to whether an application can be moved should go down to the scheduler - some schedulers may wish not to support this at all. However, schedulers that do support it should share some common semantics around ACLs and what happens to running containers. Here is how I see the general semantics working out: * A move request is issued by the client. After it gets past ACLs, the scheduler checks whether executing the move will violate any constraints. For the Fair Scheduler, these would be queue maxRunningApps and queue maxResources constraints * All running containers are transferred from the old queue to the new queue * All outstanding requests are transferred from the old queue to the new queue Here is how I see the ACLs working out: * To move an app from a queue a user must have modify access on the app or administer access on the queue * To move an app to a queue a user must have submit access on the queue or administer access on the queue -- This message was sent by Atlassian JIRA (v6.1.4#6159)
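To pin down the flow being discussed on YARN-1495, a hedged sketch of the check-then-transfer semantics from the description. Every type and method here is a hypothetical stand-in; no scheduler exposes such an API yet:
{code}
// Queue and App are stand-ins, not RM classes.
interface App { long usedMB(); }
interface Queue {
  String name();
  int runningApps();
  int maxRunningApps();
  void charge(long mb);
  void release(long mb);
}

final class MoveSketch {
  // Constraints are validated up front and the caller gets an error, rather
  // than containers being killed to force the move through.
  synchronized void moveApp(App app, Queue oldQueue, Queue newQueue)
      throws Exception {
    if (newQueue.runningApps() + 1 > newQueue.maxRunningApps()) {
      throw new Exception("Move would violate maxRunningApps on " + newQueue.name());
    }
    // Transfer the charge for all running containers atomically; outstanding
    // requests would move the same way so future allocation happens against
    // the new queue.
    oldQueue.release(app.usedMB());
    newQueue.charge(app.usedMB());
  }
}
{code}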
[jira] [Commented] (YARN-312) Add updateNodeResource in ResourceManagerAdministrationProtocol
[ https://issues.apache.org/jira/browse/YARN-312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13847084#comment-13847084 ] Vinod Kumar Vavilapalli commented on YARN-312: -- Sure, go ahead and update the patch. Tx. Add updateNodeResource in ResourceManagerAdministrationProtocol --- Key: YARN-312 URL: https://issues.apache.org/jira/browse/YARN-312 Project: Hadoop YARN Issue Type: Sub-task Components: api Affects Versions: 2.2.0 Reporter: Junping Du Assignee: Junping Du Attachments: YARN-312-v1.patch, YARN-312-v2.patch, YARN-312-v3.patch, YARN-312-v4.1.patch, YARN-312-v4.patch, YARN-312-v5.1.patch, YARN-312-v5.patch, YARN-312-v6.patch, YARN-312-v7.1.patch, YARN-312-v7.1.patch, YARN-312-v7.patch, YARN-312-v8.patch Add fundamental RPC (ResourceManagerAdministrationProtocol) to support node's resource change. For design details, please refer to the parent JIRA: YARN-291. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (YARN-1029) Allow embedding leader election into the RM
[ https://issues.apache.org/jira/browse/YARN-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1029: --- Attachment: yarn-1029-0.patch Patch with tests for automatic and manual failover. The major pending item is adding the configs and their descriptions to yarn-default.xml. Will address that once we agree that using ActiveStandbyElector is a simpler approach. BTW, this patch is to be applied on top of the latest one for YARN-1028. Allow embedding leader election into the RM --- Key: YARN-1029 URL: https://issues.apache.org/jira/browse/YARN-1029 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: embedded-zkfc-approach.patch, yarn-1029-0.patch, yarn-1029-0.patch, yarn-1029-approach.patch It should be possible to embed the common ActiveStandbyElector into the RM such that ZooKeeper-based leader election and notification is built in. In conjunction with a ZK state store, this configuration will be a simple deployment option. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (YARN-1495) Allow moving apps between queues
[ https://issues.apache.org/jira/browse/YARN-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13847108#comment-13847108 ] Sandy Ryza commented on YARN-1495: -- Thanks for taking a look Vinod. bq. Any specific use-case? Example where it can be used? To justify this isn't feature creep. Yeah, we've seen requests for this a few times. I think the most common scenario is that someone experiences job slowness because of the queue it's in, and the job needs to be placed in a queue where it can complete more quickly. This can occur because it's taking longer than expected and a deadline is approaching, the original queue is fuller than expected, the job was submitted incorrectly in the first place but has made some progress, or for a number of other reasons. bq. What happens when scheduling-constraints are violated? The client will just get an error? It kind of depends on the type of scheduling constraint. Not sure how this should play out for the Capacity Scheduler, but for the Fair Scheduler constraints I mentioned in the description I think the client should get an error. I suppose another option would be to kill containers until the constraints would be satisfied, but I think this is a lot more work and not clearly better behavior. bq. Who initiates the move - any regular user or just admins? My opinion is any regular user, within ACLs. I.e. if I could kill my job and resubmit it to a different queue, I should be able to move it. bq. Only running apps can be moved? I don't see a reason that we shouldn't be able to move an app that has been submitted, but not accepted, or that is very close to completion. In some cases we may not need to touch the scheduler. There are definitely race conditions we need to be careful of here. bq. Apps may be in the process of submitting new requests. What happens to them? I guess queue-move and new-requests should be synchronized. Right. Allow moving apps between queues Key: YARN-1495 URL: https://issues.apache.org/jira/browse/YARN-1495 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza This is an umbrella JIRA for work needed to allow moving YARN applications from one queue to another. The work will consist of additions in the command line options, additions in the client RM protocol, and changes in the schedulers to support this. I have a picture of how this should function in the Fair Scheduler, but I'm not familiar enough with the Capacity Scheduler for the same there. Ultimately, the decision as to whether an application can be moved should go down to the scheduler - some schedulers may wish not to support this at all. However, schedulers that do support it should share some common semantics around ACLs and what happens to running containers. Here is how I see the general semantics working out: * A move request is issued by the client. After it gets past ACLs, the scheduler checks whether executing the move will violate any constraints.
For the Fair Scheduler, these would be queue maxRunningApps and queue maxResources constraints * All running containers are transferred from the old queue to the new queue * All outstanding requests are transferred from the old queue to the new queue Here is how I see the ACLs working out: * To move an app from a queue a user must have modify access on the app or administer access on the queue * To move an app to a queue a user must have submit access on the queue or administer access on the queue -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (YARN-1498) Common scheduler changes for moving apps between queues
[ https://issues.apache.org/jira/browse/YARN-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1498: - Attachment: YARN-1498-1.patch Common scheduler changes for moving apps between queues --- Key: YARN-1498 URL: https://issues.apache.org/jira/browse/YARN-1498 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1498-1.patch, YARN-1498.patch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (YARN-1498) Common scheduler changes for moving apps between queues
[ https://issues.apache.org/jira/browse/YARN-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1498: - Summary: Common scheduler changes for moving apps between queues (was: RM changes for moving apps between queues) Common scheduler changes for moving apps between queues --- Key: YARN-1498 URL: https://issues.apache.org/jira/browse/YARN-1498 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1498.patch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (YARN-1498) Common scheduler changes for moving apps between queues
[ https://issues.apache.org/jira/browse/YARN-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1498: - Description: This JIRA is to track changes that aren't in particular schedulers but that help them support moving apps between queues. Common scheduler changes for moving apps between queues --- Key: YARN-1498 URL: https://issues.apache.org/jira/browse/YARN-1498 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1498-1.patch, YARN-1498.patch This JIRA is to track changes that aren't in particular schedulers but that help them support moving apps between queues. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (YARN-1498) Common scheduler changes for moving apps between queues
[ https://issues.apache.org/jira/browse/YARN-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13847136#comment-13847136 ] Hadoop QA commented on YARN-1498: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12618534/YARN-1498-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2655//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2655//console This message is automatically generated. Common scheduler changes for moving apps between queues --- Key: YARN-1498 URL: https://issues.apache.org/jira/browse/YARN-1498 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1498-1.patch, YARN-1498.patch This JIRA is to track changes that aren't in particular schedulers but that help them support moving apps between queues. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (YARN-1495) Allow moving apps between queues
[ https://issues.apache.org/jira/browse/YARN-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13847134#comment-13847134 ] Sandy Ryza commented on YARN-1495: -- Also, a coding question you can maybe provide me guidance on: Ideally, we would like to return the RPC with whether or not the operation succeeded. However, we need to go down through the app, app attempt, and finally the scheduler to determine this. We could achieve this in a couple of ways: * Use an async event at each level, as is the convention (e.g. as is done for killing an application). Have the call in ClientRMService block and wait for things to get sorted out lower down before returning. Not entirely sure what we would wait for because the ClientRMService itself doesn't receive events. A Future might be clean. * Bypass events and go synchronously through to the scheduler. Is one of these preferred? Is there a third path I'm missing? Allow moving apps between queues Key: YARN-1495 URL: https://issues.apache.org/jira/browse/YARN-1495 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza This is an umbrella JIRA for work needed to allow moving YARN applications from one queue to another. The work will consist of additions in the command line options, additions in the client RM protocol, and changes in the schedulers to support this. I have a picture of how this should function in the Fair Scheduler, but I'm not familiar enough with the Capacity Scheduler for the same there. Ultimately, the decision as to whether an application can be moved should go down to the scheduler - some schedulers may wish not to support this at all. However, schedulers that do support it should share some common semantics around ACLs and what happens to running containers. Here is how I see the general semantics working out: * A move request is issued by the client. After it gets past ACLs, the scheduler checks whether executing the move will violate any constraints. For the Fair Scheduler, these would be queue maxRunningApps and queue maxResources constraints * All running containers are transferred from the old queue to the new queue * All outstanding requests are transferred from the old queue to the new queue Here is how I see the ACLs working out: * To move an app from a queue a user must have modify access on the app or administer access on the queue * To move an app to a queue a user must have submit access on the queue or administer access on the queue -- This message was sent by Atlassian JIRA (v6.1.4#6159)
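For the first option Sandy lists, a hedged sketch of the Future-based shape, as a fragment from inside the RPC handler; RMAppMoveEvent and its constructor are hypothetical, while SettableFuture is Guava, which is already on Hadoop's classpath:
{code}
import java.util.concurrent.ExecutionException;
import com.google.common.util.concurrent.SettableFuture;
import org.apache.hadoop.yarn.exceptions.YarnException;

// In the ClientRMService handler: fire the usual async event, but carry a
// future that whoever finally performs the move (the scheduler) resolves.
SettableFuture<Object> result = SettableFuture.create();
rmContext.getDispatcher().getEventHandler().handle(
    new RMAppMoveEvent(appId, targetQueue, result)); // hypothetical event type
try {
  result.get(); // block this handler thread until the move succeeds or fails
} catch (ExecutionException e) {
  throw new YarnException(e.getCause()); // surface the scheduler's rejection
} catch (InterruptedException e) {
  throw new YarnException(e);
}
{code}
This keeps the event-driven convention intact while still letting the RPC report success or failure synchronously to the client.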
[jira] [Updated] (YARN-1363) Get / Cancel / Renew delegation token API should be non-blocking
[ https://issues.apache.org/jira/browse/YARN-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1363: -- Attachment: YARN-1363.3.patch Uploaded a new patch which includes test cases and some bug fixes Get / Cancel / Renew delegation token API should be non-blocking Key: YARN-1363 URL: https://issues.apache.org/jira/browse/YARN-1363 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Zhijie Shen Attachments: YARN-1363.1.patch, YARN-1363.2.patch, YARN-1363.3.patch Today, GetDelegationToken, CancelDelegationToken and RenewDelegationToken are all blocking APIs. * As a part of these calls we try to update the RMStateStore, and that may slow them down. * As we have a limited number of client request handlers, we may fill them up quickly. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
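A hedged sketch of one non-blocking shape for the YARN-1363 calls; the executor and the store interface here are illustrative stand-ins, not the attached patch. The point is only that the limited IPC handler threads stop waiting on the state-store write:
{code}
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AsyncTokenStoreSketch {
  private final ExecutorService stateStoreExecutor =
      Executors.newSingleThreadExecutor();

  // Hand the state-store update to a dedicated thread instead of doing it
  // inline in the RPC handler; the handler returns (or polls) immediately.
  Future<Long> renewAsync(final TokenStore store, final String tokenId,
      final long newExpiryTime) {
    return stateStoreExecutor.submit(new Callable<Long>() {
      @Override
      public Long call() throws Exception {
        store.updateToken(tokenId, newExpiryTime); // hypothetical store call
        return newExpiryTime;
      }
    });
  }

  interface TokenStore { // stand-in for the RMStateStore surface used here
    void updateToken(String tokenId, long expiry) throws Exception;
  }
}
{code}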
[jira] [Commented] (YARN-1413) [YARN-321] AHS WebUI should serve aggregated logs as well
[ https://issues.apache.org/jira/browse/YARN-1413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13847208#comment-13847208 ] Zhijie Shen commented on YARN-1413: --- Some comments: 1. The wrong javadoc below: {code} + /* + * (non-Javadoc) + * + * @see + * org.apache.hadoop.mapreduce.v2.hs.webapp.AHSView#preHead(org.apache.hadoop + * .yarn.webapp.hamlet.Hamlet.HTML) + */ {code} {code} + /** + * The content of this page is the JobBlock + * + * @return HsJobBlock.class + */ {code} 2. I think a better way to construct the logURL in the attempt/container blocks is to use ContainerReport.getLogURL directly (adding the host:port prefix), instead of combining several attributes. The logURL should be set correctly in RMContainer's final transition. [YARN-321] AHS WebUI should serve aggregated logs as well -- Key: YARN-1413 URL: https://issues.apache.org/jira/browse/YARN-1413 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Mayank Bansal Attachments: YARN-1413-1.patch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
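For the second comment, a hedged sketch of the direct construction; ContainerReport.getLogUrl is the YARN-321-branch accessor being referenced, and the web app address is a placeholder:
{code}
import org.apache.hadoop.yarn.api.records.ContainerReport;

public class LogUrlSketch {
  // Build the link from the report instead of reassembling it from node,
  // app and user attributes inside the web blocks. webAppHostPort is a
  // placeholder for the serving web app's host:port.
  static String fullLogUrl(ContainerReport report, String webAppHostPort) {
    return "http://" + webAppHostPort + report.getLogUrl();
  }
}
{code}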