[jira] [Commented] (YARN-2187) FairScheduler: Disable max-AM-share check by default
[ https://issues.apache.org/jira/browse/YARN-2187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039726#comment-14039726 ] Karthik Kambatla commented on YARN-2187:

+1. Committing this.

FairScheduler: Disable max-AM-share check by default
Key: YARN-2187
URL: https://issues.apache.org/jira/browse/YARN-2187
Project: Hadoop YARN
Issue Type: Bug
Components: fairscheduler
Affects Versions: 2.5.0
Reporter: Robert Kanter
Assignee: Robert Kanter
Attachments: YARN-2187.patch

Say you have a small cluster with 8GB of memory and 5 queues. Each queue then gets an equal share of 8GB / 5 = 1.6GB, but an AM requires 2GB to start, so no AMs can be started. By default, the max-AM-share check should be disabled so users don't see a regression. On medium-sized clusters, it still makes sense to set the max-AM-share to a value between 0 and 1.

--
This message was sent by Atlassian JIRA (v6.2#6252)
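The arithmetic in the description can be sketched as follows; a minimal illustration with hypothetical names (`can_start_am`, `max_am_share`), not the actual FSLeafQueue logic. A negative share value stands in for the disabled check:

```python
# Illustrative sketch of the max-AM-share problem described above.
# Names are hypothetical; this is not the FairScheduler implementation.

def can_start_am(cluster_mb, num_queues, am_mb, max_am_share):
    """Return True if an AM fits within the queue's AM share."""
    fair_share_mb = cluster_mb / num_queues       # equal fair share per queue
    if max_am_share < 0:                          # negative value: check disabled
        return True
    return am_mb <= fair_share_mb * max_am_share

# The scenario from the description: 8GB cluster, 5 queues, 2GB AM.
print(can_start_am(8192, 5, 2048, 1.0))   # fair share ~1.6GB < 2GB: False
print(can_start_am(8192, 5, 2048, -1.0))  # check disabled: True
```

Even with the share set to its most permissive enabled value (1.0), the 2GB AM cannot fit in a 1.6GB fair share, which is why disabling the check by default avoids the regression on small clusters.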
[jira] [Commented] (YARN-2187) FairScheduler: Disable max-AM-share check by default
[ https://issues.apache.org/jira/browse/YARN-2187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039729#comment-14039729 ] Hudson commented on YARN-2187:

SUCCESS: Integrated in Hadoop-trunk-Commit #5749 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5749/])
YARN-2187. FairScheduler: Disable max-AM-share check by default. (Robert Kanter via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1604321)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationConfiguration.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationFileLoaderService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/FairScheduler.apt.vm

Fix For: 2.5.0
[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039733#comment-14039733 ] Niels Basjes commented on YARN-1680:

Looks like YARN-2105 has been fixed. Can someone please retrigger this patch?

availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory
Key: YARN-1680
URL: https://issues.apache.org/jira/browse/YARN-1680
Project: Hadoop YARN
Issue Type: Sub-task
Affects Versions: 2.2.0, 2.3.0
Environment: SuSE 11 SP2 + Hadoop-2.3
Reporter: Rohith
Assignee: Chen He
Attachments: YARN-1680-v2.patch, YARN-1680.patch

There are 4 NodeManagers with 8GB each; total cluster capacity is 32GB. Cluster slow start is set to 1. A job's reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4) became unstable (3 map tasks got killed), so the MRAppMaster blacklisted it. All reducer tasks are now running in the cluster. The MRAppMaster does not preempt the reducers, because the headroom used in the reducer-preemption calculation includes the blacklisted node's memory. This makes jobs hang forever: the ResourceManager does not assign any new containers on blacklisted nodes, but the availableResources it returns still counts the whole cluster's free memory.
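The headroom mismatch described above can be sketched numerically; the node names and the `headroom` helper are hypothetical illustrations, not the actual ResourceManager code:

```python
# Sketch of the headroom bug described above (hypothetical names).
# The RM reports free memory across the whole cluster, including nodes
# the AM has blacklisted and therefore cannot actually use.

def headroom(nodes_free_mb, blacklisted):
    """nodes_free_mb: dict of node -> free MB; blacklisted: set of node names."""
    naive = sum(nodes_free_mb.values())              # what the RM reports today
    usable = sum(mb for node, mb in nodes_free_mb.items()
                 if node not in blacklisted)         # what the AM can really get
    return naive, usable

# 4 NMs with 8GB each, 29GB in use, so the only 3GB free is on blacklisted NM-4.
free = {"NM-1": 0, "NM-2": 0, "NM-3": 0, "NM-4": 3072}
naive, usable = headroom(free, {"NM-4"})
print(naive, usable)  # 3072 0
```

With the naive number (3072) the AM believes reducers will eventually get room and never preempts; the usable number (0) shows the job can make no progress, which matches the hang in the report.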
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039744#comment-14039744 ] Varun Vasudev commented on YARN-1039:

I agree with [~zjshen]. Using the tags field also means we don't have to worry about switching to an enum, as [~cwelch] mentioned in one of the earlier comments.

Add parameter for YARN resource requests to indicate long lived
Key: YARN-1039
URL: https://issues.apache.org/jira/browse/YARN-1039
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Affects Versions: 3.0.0, 2.1.1-beta
Reporter: Steve Loughran
Assignee: Craig Welch
Priority: Minor
Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch

A container request could support a new parameter, long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot-priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node.
[jira] [Commented] (YARN-2187) FairScheduler: Disable max-AM-share check by default
[ https://issues.apache.org/jira/browse/YARN-2187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039783#comment-14039783 ] Hudson commented on YARN-2187:

SUCCESS: Integrated in Hadoop-Yarn-trunk #590 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/590/])
YARN-2187. FairScheduler: Disable max-AM-share check by default. (Robert Kanter via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1604321)
[jira] [Commented] (YARN-2187) FairScheduler: Disable max-AM-share check by default
[ https://issues.apache.org/jira/browse/YARN-2187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039843#comment-14039843 ] Hudson commented on YARN-2187:

FAILURE: Integrated in Hadoop-Hdfs-trunk #1781 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1781/])
YARN-2187. FairScheduler: Disable max-AM-share check by default. (Robert Kanter via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1604321)
[jira] [Commented] (YARN-2187) FairScheduler: Disable max-AM-share check by default
[ https://issues.apache.org/jira/browse/YARN-2187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039867#comment-14039867 ] Hudson commented on YARN-2187:

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1808 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1808/])
YARN-2187. FairScheduler: Disable max-AM-share check by default. (Robert Kanter via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1604321)
[jira] [Commented] (YARN-2144) Add logs when preemption occurs
[ https://issues.apache.org/jira/browse/YARN-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039934#comment-14039934 ] Carlo Curino commented on YARN-2144:

I only skimmed the patch quickly, but I see several places where you are changing method signatures by adding booleans to communicate that the container was preempted. Would it be possible to use/extend some of the container state / event objects that are already passed around? It might be less intrusive, and if we ever get to different levels of preemption or anything like that, it would also be a more flexible mechanism.

Add logs when preemption occurs
Key: YARN-2144
URL: https://issues.apache.org/jira/browse/YARN-2144
Project: Hadoop YARN
Issue Type: Improvement
Components: capacityscheduler
Affects Versions: 2.5.0
Reporter: Tassapol Athiapinya
Assignee: Wangda Tan
Attachments: AM-page-preemption-info.png, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch, YARN-2144.patch

There should be easy-to-read logs when preemption occurs. RM logs should have the following properties:
* Logs are retrievable while an application is still running, and are flushed often.
* AM container preemption can be distinguished from task container preemption, with the container ID shown.
* Logs should be at INFO level.
[jira] [Commented] (YARN-2190) Provide a Windows container executor that can limit memory and CPU
[ https://issues.apache.org/jira/browse/YARN-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039973#comment-14039973 ] Steve Loughran commented on YARN-2190:

# What would the implications of the move to the Windows 8 API be? What does it mean for server versions and builds?
# Something is cutting off all the ASF copyright comments... probably the IDE. That'll have the RAT tool complaining. It may be something to ignore during iterative development, but it would need to be fixed before committing.
# Would it be possible to have a command line like
{code}
task create --memory 2048 name command-line
{code}
so that new options could go in (--cpu, --io) without confusion... the current approach looks a bit brittle.

Provide a Windows container executor that can limit memory and CPU
Key: YARN-2190
URL: https://issues.apache.org/jira/browse/YARN-2190
Project: Hadoop YARN
Issue Type: New Feature
Components: nodemanager
Reporter: Chuan Liu
Attachments: YARN-2190-prototype.patch

The YARN default container executor on Windows does not currently set resource limits on containers; the memory limit is enforced by a separate monitoring thread. The container implementation on Windows uses Job Objects. The latest Windows API (8 or later) allows CPU and memory limits on job objects. We want to create a Windows container executor that sets the limits on job objects, thus providing resource enforcement at the OS level. http://msdn.microsoft.com/en-us/library/windows/desktop/ms686216(v=vs.85).aspx
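The extensible command line suggested in point 3 might parse like this with a standard option parser; this is a sketch of the proposed interface only (the `--cpu` flag is the hypothetical future option from the comment, not part of any existing tool):

```python
# Sketch of the extensible "task create" command line suggested above.
# The subcommand and flags are the proposal, not an existing winutils task CLI.
import argparse

def build_parser():
    parser = argparse.ArgumentParser(prog="task")
    sub = parser.add_subparsers(dest="action", required=True)
    create = sub.add_parser("create")
    create.add_argument("--memory", type=int, default=None)  # MB limit, optional
    create.add_argument("--cpu", type=int, default=None)     # future option slots in cleanly
    create.add_argument("name")
    create.add_argument("command_line")
    return parser

args = build_parser().parse_args(
    ["create", "--memory", "2048", "job1", "cmd /c myapp.exe"])
print(args.memory, args.name)  # 2048 job1
```

Because each limit is a named flag rather than a positional slot, adding `--io` later would not shift the meaning of existing arguments, which is the brittleness the comment points at.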
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039976#comment-14039976 ] Steve Loughran commented on YARN-1039:

# I'd make the long-lived flag a container request, *not the AM launch request*. An AM may wish to indicate that some containers are short-lived, others long-lived.
# If the tag approach lets my AM add this request while running with the 2.4 JARs (even though the hint will be ignored), I'm happy. Protobuf may be agile, but the generated proto classes aren't; working with fields directly is hard to do, and introspection is brittle. I know that from working with the AM restart flag.
# Otherwise, I'd like a long64 with bits we can set and read. It's the cross-platform way and would give us a single field for future additions.
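The long64-with-bits idea in point 3 could be sketched as follows; the flag names and values here are hypothetical, not a proposed YARN API:

```python
# Sketch of a 64-bit flags field for container requests (hypothetical bits).
LONG_LIVED = 1 << 0   # container expected to outlive a typical task
# Future bits, e.g. PREEMPTIBLE = 1 << 1, could be added without a schema change.

def is_long_lived(flags):
    """Check the long-lived bit in a request's flags field."""
    return bool(flags & LONG_LIVED)

request_flags = LONG_LIVED      # set the bit when building the request
print(is_long_lived(request_flags), is_long_lived(0))  # True False
```

A scheduler that predates a given bit simply ignores it, which gives the same forward-compatibility the tag approach offers, but in a single fixed-width field.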
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040001#comment-14040001 ] Zhijie Shen commented on YARN-1039:

bq. An AM may wish to indicate that some containers are shortlife, others long-lived.

A container-level long-lived flag is an interesting idea. If any container of an app is long-lived, the AM container is automatically going to be long-lived as well, right? (Assuming the AM should last until the exit of the whole app.) Shall we mark an app long-lived, and then allow a long-lived app to start long-lived containers?

bq. If the tag approach lets my AM add this request while running with the 2.4 JARs even though the hint will be ignored I'm happy.

If the granularity is going to be the container, the tag may not help, as it's application-level information.
[jira] [Commented] (YARN-2144) Add logs when preemption occurs
[ https://issues.apache.org/jira/browse/YARN-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040010#comment-14040010 ] Wangda Tan commented on YARN-2144:

As suggested by [~jianhe], for the purpose of this JIRA I will simply add logs to CapacityScheduler.killContainer(). [~curino], thanks for your comment; since I may not need to change the event objects, I will do as you suggested when working on other items like YARN-2181.
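The kind of INFO-level log line YARN-2144 asks for (distinguishing AM from task container preemption, with the container ID shown) might look like this; a sketch of the message shape only, not the actual CapacityScheduler.killContainer() change:

```python
# Sketch of the kind of INFO-level preemption log requested above.
# The message format and container IDs are hypothetical illustrations.
import logging

logging.basicConfig(level=logging.INFO, format="%(name)s %(levelname)s %(message)s")
log = logging.getLogger("CapacityScheduler")

def preemption_message(container_id, is_am):
    """Build a log line that distinguishes AM from task container preemption."""
    kind = "AM container" if is_am else "task container"
    return "Preempting %s %s" % (kind, container_id)

log.info(preemption_message("container_1403000000000_0001_01_000001", True))
log.info(preemption_message("container_1403000000000_0001_01_000002", False))
```

Emitting the line at INFO from the single kill path keeps it retrievable while the application is still running, which covers the properties listed in the issue description.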