[jira] [Updated] (YARN-18) Make locality in YARN's container assignment and task scheduling pluggable for other deployment topology
[ https://issues.apache.org/jira/browse/YARN-18?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Junping Du updated YARN-18:
    Attachment: YARN-18-v7.3.patch

Make locality in YARN's container assignment and task scheduling pluggable for other deployment topology

    Key: YARN-18
    URL: https://issues.apache.org/jira/browse/YARN-18
    Project: Hadoop YARN
    Issue Type: New Feature
    Affects Versions: 2.0.3-alpha
    Reporter: Junping Du
    Assignee: Junping Du
    Labels: features
    Attachments: HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch, MAPREDUCE-4309.patch, MAPREDUCE-4309-v2.patch, MAPREDUCE-4309-v3.patch, MAPREDUCE-4309-v4.patch, MAPREDUCE-4309-v5.patch, MAPREDUCE-4309-v6.patch, MAPREDUCE-4309-v7.patch, Pluggable topologies with NodeGroup for YARN.pdf, YARN-18.patch, YARN-18-v2.patch, YARN-18-v3.1.patch, YARN-18-v3.2.patch, YARN-18-v3.patch, YARN-18-v4.1.patch, YARN-18-v4.2.patch, YARN-18-v4.3.patch, YARN-18-v4.patch, YARN-18-v5.1.patch, YARN-18-v5.patch, YARN-18-v6.1.patch, YARN-18-v6.2.patch, YARN-18-v6.3.patch, YARN-18-v6.4.patch, YARN-18-v6.patch, YARN-18-v7.1.patch, YARN-18-v7.2.patch, YARN-18-v7.3.patch, YARN-18-v7.patch

Several classes in YARN's container assignment and task scheduling algorithms relate to data locality and were updated to give preference to running a container at locality levels beyond node-local and rack-local (such as nodegroup-local). This issue proposes to make these data structures/algorithms pluggable, e.g. SchedulerNode, RMNodeImpl. The inner class ScheduledRequests was made a package-level class so it would be easier to create a subclass, ScheduledRequestsWithNodeGroup.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
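The pluggability the issue describes can be pictured as swapping in a topology with an extra locality tier. This is a minimal standalone sketch, not the actual YARN-18 patch: the `TopologyResolver` interface and class names here are hypothetical, and stand in for the pluggable SchedulerNode/RMNodeImpl-style hooks the issue proposes.

```java
// Hypothetical hook: a deployment topology advertises its ordered locality levels.
interface TopologyResolver {
    String[] localityLevels();
}

// Default two-tier topology (plus off-switch), as stock YARN assumes.
class DefaultTopology implements TopologyResolver {
    public String[] localityLevels() {
        return new String[] { "node-local", "rack-local", "off-switch" };
    }
}

// Extended topology inserting a nodegroup tier between node and rack,
// in the spirit of the ScheduledRequestsWithNodeGroup subclass.
class NodeGroupTopology implements TopologyResolver {
    public String[] localityLevels() {
        return new String[] { "node-local", "nodegroup-local", "rack-local", "off-switch" };
    }
}
```

A scheduler written against `TopologyResolver` would walk the levels in order when assigning a container, so a new deployment topology only needs a new implementation, not scheduler changes.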
[jira] [Resolved] (YARN-794) YarnClientImpl.submitApplication() to add a timeout
[ https://issues.apache.org/jira/browse/YARN-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran resolved YARN-794.
    Resolution: Duplicate
    Fix Version/s: 3.0.0, 2.1.0-beta

YarnClientImpl.submitApplication() to add a timeout

    Key: YARN-794
    URL: https://issues.apache.org/jira/browse/YARN-794
    Project: Hadoop YARN
    Issue Type: Improvement
    Components: client
    Affects Versions: 3.0.0, 2.1.0-beta
    Reporter: Steve Loughran
    Priority: Minor
    Fix For: 2.1.0-beta, 3.0.0

{{YarnClientImpl.submitApplication()}} can spin forever waiting for the RM to accept the submission, ignoring interrupts on the sleep.
# A timeout allows client applications to recognise and react to a failure of the RM to accept work in a timely manner.
# The interrupt exception could be converted to an {{InterruptedIOException}} and raised within the current method signature.
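The two points above can be combined in one polling loop. This is a minimal standalone sketch of the proposed pattern, not the real YarnClientImpl code: the `isAccepted` probe is a hypothetical stand-in for checking the RM's reported application state.

```java
import java.io.InterruptedIOException;
import java.util.function.BooleanSupplier;

class SubmitWithTimeout {
    /**
     * Polls the (hypothetical) acceptance probe until it reports true, the
     * deadline passes, or the thread is interrupted. Returns true if the
     * submission was accepted within the timeout, false on timeout.
     */
    static boolean waitForAcceptance(BooleanSupplier isAccepted,
                                     long timeoutMillis,
                                     long pollIntervalMillis)
            throws InterruptedIOException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!isAccepted.getAsBoolean()) {
            if (System.currentTimeMillis() >= deadline) {
                return false;  // caller can now react to the RM not accepting work
            }
            try {
                Thread.sleep(pollIntervalMillis);
            } catch (InterruptedException e) {
                // Point 2: surface the interrupt as an IOException subclass so the
                // existing method signature (throws IOException) still fits.
                Thread.currentThread().interrupt();
                throw new InterruptedIOException("interrupted while waiting for RM to accept submission");
            }
        }
        return true;
    }
}
```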
[jira] [Updated] (YARN-427) Coverage fix for org.apache.hadoop.yarn.server.api.*
[ https://issues.apache.org/jira/browse/YARN-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksey Gorshkov updated YARN-427:
    Attachment: YARN-427-trunk-c.patch
                YARN-427-branch-0.23-c.patch

Coverage fix for org.apache.hadoop.yarn.server.api.*

    Key: YARN-427
    URL: https://issues.apache.org/jira/browse/YARN-427
    Project: Hadoop YARN
    Issue Type: Sub-task
    Affects Versions: 3.0.0, 0.23.7, 2.1.0-beta
    Reporter: Aleksey Gorshkov
    Assignee: Aleksey Gorshkov
    Attachments: YARN-427-branch-0.23-b.patch, YARN-427-branch-0.23-c.patch, YARN-427-branch-2-a.patch, YARN-427-branch-2-b.patch, YARN-427-branch-2.patch, YARN-427-trunk-a.patch, YARN-427-trunk-b.patch, YARN-427-trunk-c.patch, YARN-427-trunk.patch

Coverage fix for org.apache.hadoop.yarn.server.api.*
patch YARN-427-trunk.patch for trunk
patch YARN-427-branch-2.patch for branch-2 and branch-0.23
[jira] [Commented] (YARN-427) Coverage fix for org.apache.hadoop.yarn.server.api.*
[ https://issues.apache.org/jira/browse/YARN-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13679439#comment-13679439 ]

Aleksey Gorshkov commented on YARN-427:

Thanks for the review, Jonathan. I've updated the tests.
patch YARN-427-branch-0.23-c.patch for branch-0.23
patch YARN-427-trunk-c.patch for trunk and branch-2

Coverage with the latest tests:
org.apache.hadoop.yarn.server.api.impl.pb.client (86.7%)
org.apache.hadoop.yarn.server.api.impl.pb.service (87.5%)
org.apache.hadoop.yarn.server.api.protocolrecords.impl.pb (80.7%)
org.apache.hadoop.yarn.server.api.records.impl.pb (83.8%)
org.apache.hadoop.yarn.server.nodemanager (82%)

The test testHeartbeatResponsePBImpl does not make sense in trunk and branch-2 because the class HeartbeatResponsePBImpl is present only in branch-0.23.

Coverage fix for org.apache.hadoop.yarn.server.api.*

    Key: YARN-427
    URL: https://issues.apache.org/jira/browse/YARN-427
    Project: Hadoop YARN
    Issue Type: Sub-task
    Affects Versions: 3.0.0, 0.23.7, 2.1.0-beta
    Reporter: Aleksey Gorshkov
    Assignee: Aleksey Gorshkov
    Attachments: YARN-427-branch-0.23-b.patch, YARN-427-branch-0.23-c.patch, YARN-427-branch-2-a.patch, YARN-427-branch-2-b.patch, YARN-427-branch-2.patch, YARN-427-trunk-a.patch, YARN-427-trunk-b.patch, YARN-427-trunk-c.patch, YARN-427-trunk.patch

Coverage fix for org.apache.hadoop.yarn.server.api.*
patch YARN-427-trunk.patch for trunk
patch YARN-427-branch-2.patch for branch-2 and branch-0.23
[jira] [Commented] (YARN-427) Coverage fix for org.apache.hadoop.yarn.server.api.*
[ https://issues.apache.org/jira/browse/YARN-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13679444#comment-13679444 ]

Hadoop QA commented on YARN-427:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12587050/YARN-427-trunk-c.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1177//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1177//console

This message is automatically generated.

Coverage fix for org.apache.hadoop.yarn.server.api.*

    Key: YARN-427
    URL: https://issues.apache.org/jira/browse/YARN-427
    Project: Hadoop YARN
    Issue Type: Sub-task
    Affects Versions: 3.0.0, 0.23.7, 2.1.0-beta
    Reporter: Aleksey Gorshkov
    Assignee: Aleksey Gorshkov
    Attachments: YARN-427-branch-0.23-b.patch, YARN-427-branch-0.23-c.patch, YARN-427-branch-2-a.patch, YARN-427-branch-2-b.patch, YARN-427-branch-2.patch, YARN-427-trunk-a.patch, YARN-427-trunk-b.patch, YARN-427-trunk-c.patch, YARN-427-trunk.patch

Coverage fix for org.apache.hadoop.yarn.server.api.*
patch YARN-427-trunk.patch for trunk
patch YARN-427-branch-2.patch for branch-2 and branch-0.23
[jira] [Commented] (YARN-18) Make locality in YARN's container assignment and task scheduling pluggable for other deployment topology
[ https://issues.apache.org/jira/browse/YARN-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13679544#comment-13679544 ]

Junping Du commented on YARN-18:

The findbugs warning seems to be unrelated. [~vicaya] and [~acmurthy], would you help to review it? Thanks!

Make locality in YARN's container assignment and task scheduling pluggable for other deployment topology

    Key: YARN-18
    URL: https://issues.apache.org/jira/browse/YARN-18
    Project: Hadoop YARN
    Issue Type: New Feature
    Affects Versions: 2.0.3-alpha
    Reporter: Junping Du
    Assignee: Junping Du
    Labels: features
    Attachments: HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch, MAPREDUCE-4309.patch, MAPREDUCE-4309-v2.patch, MAPREDUCE-4309-v3.patch, MAPREDUCE-4309-v4.patch, MAPREDUCE-4309-v5.patch, MAPREDUCE-4309-v6.patch, MAPREDUCE-4309-v7.patch, Pluggable topologies with NodeGroup for YARN.pdf, YARN-18.patch, YARN-18-v2.patch, YARN-18-v3.1.patch, YARN-18-v3.2.patch, YARN-18-v3.patch, YARN-18-v4.1.patch, YARN-18-v4.2.patch, YARN-18-v4.3.patch, YARN-18-v4.patch, YARN-18-v5.1.patch, YARN-18-v5.patch, YARN-18-v6.1.patch, YARN-18-v6.2.patch, YARN-18-v6.3.patch, YARN-18-v6.4.patch, YARN-18-v6.patch, YARN-18-v7.1.patch, YARN-18-v7.2.patch, YARN-18-v7.3.patch, YARN-18-v7.patch

Several classes in YARN's container assignment and task scheduling algorithms relate to data locality and were updated to give preference to running a container at locality levels beyond node-local and rack-local (such as nodegroup-local). This issue proposes to make these data structures/algorithms pluggable, e.g. SchedulerNode, RMNodeImpl. The inner class ScheduledRequests was made a package-level class so it would be easier to create a subclass, ScheduledRequestsWithNodeGroup.
[jira] [Updated] (YARN-641) Make AMLauncher in RM Use NMClient
[ https://issues.apache.org/jira/browse/YARN-641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen updated YARN-641:
    Attachment: YARN-641.1.patch

In the patch, ApplicationMasterLauncher is changed to extend NMClientAsync, and AMLauncher is changed to use the NMClient APIs to start/stop AM containers. A number of tests that previously used the ContainerManager APIs directly were changed to use the NMClient APIs instead. Last but not least, due to the mvn dependency check issue, all the tests in yarn-client have been moved to server-tests, yarn-client drops its dependency on the server sub-projects, and server-resourcemanager adds a dependency on yarn-client.

Make AMLauncher in RM Use NMClient

    Key: YARN-641
    URL: https://issues.apache.org/jira/browse/YARN-641
    Project: Hadoop YARN
    Issue Type: Bug
    Reporter: Zhijie Shen
    Assignee: Zhijie Shen
    Attachments: YARN-641.1.patch

YARN-422 adds NMClient. RM's AMLauncher is responsible for the interactions with an application's AM container. AMLauncher should also replace the raw ContainerManager proxy with NMClient.
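The move from a raw protocol proxy to a callback-driven client is the core of the refactor. This toy sketch is not the real NMClientAsync API; the `AsyncLauncherSketch` class and its `Callback` interface are invented here purely to illustrate the shape of an asynchronous start-container call with a completion callback.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Toy stand-in for a callback-driven container launcher (names are hypothetical).
class AsyncLauncherSketch {
    interface Callback {
        void onContainerStarted(String containerId);
    }

    private final ExecutorService pool = Executors.newFixedThreadPool(2);

    // Returns immediately; the callback fires on a worker thread once the
    // (simulated) start completes, instead of blocking the caller on an RPC.
    void startContainerAsync(String containerId, Callback cb) {
        pool.submit(() -> cb.onContainerStarted(containerId));
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

The design point is that the launcher no longer blocks per container; completion is reported through the callback, which is what extending an async client buys the ApplicationMasterLauncher.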
[jira] [Commented] (YARN-766) TestNodeManagerShutdown should use Shell to form the output path
[ https://issues.apache.org/jira/browse/YARN-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13679703#comment-13679703 ]

Chris Nauroth commented on YARN-766:

Hi Sid,
There are a couple of other minor differences between trunk and branch-2 for {{TestNodeManagerShutdown}}. Would you mind including those in your patch too, just so the files are identical and easier to maintain between the 2 branches? Below is the full output I'm seeing from {{git diff trunk branch-2 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerShutdown.java}}. Thank you!

{code}
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apa
index e0db826..95c1c10 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/had
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/had
@@ -149,8 +149,8 @@ public void testKillContainersOnShutdown() throws IOException,
   }

   public static void startContainer(NodeManager nm, ContainerId cId,
-      FileContext localFS, File scriptFileDir, File processStartFile)
-      throws IOException, YarnException {
+      FileContext localFS, File scriptFileDir, File processStartFile)
+      throws IOException, YarnException {
     File scriptFile = createUnhaltingScriptFile(cId, scriptFileDir, processStartFile);
@@ -158,7 +158,7 @@ public static void startContainer(NodeManager nm, ContainerId cId,
       recordFactory.newRecordInstance(ContainerLaunchContext.class);
     NodeId nodeId = BuilderUtils.newNodeId("localhost", 1234);
-
+
     URL localResourceUri = ConverterUtils.getYarnUrlFromPath(localFS
         .makeQualified(new Path(scriptFile.getAbsolutePath())));
@@ -235,7 +235,7 @@ private YarnConfiguration createNMConfig() {
    */
   private static File createUnhaltingScriptFile(ContainerId cId, File scriptFileDir, File processStartFile) throws IOException {
-    File scriptFile = Shell.appendScriptExtension(scriptFileDir, "scriptFile");
+    File scriptFile = new File(scriptFileDir, "scriptFile.sh");
     PrintWriter fileWriter = new PrintWriter(scriptFile);
     if (Shell.WINDOWS) {
       fileWriter.println("@echo \"Running testscript for delayed kill\"");
@@ -272,4 +272,4 @@ public void setMasterKey(MasterKey masterKey) {
     getNMContext().getContainerTokenSecretManager().setMasterKey(masterKey);
   }
 }
-}
\ No newline at end of file
+}
{code}

TestNodeManagerShutdown should use Shell to form the output path

    Key: YARN-766
    URL: https://issues.apache.org/jira/browse/YARN-766
    Project: Hadoop YARN
    Issue Type: Bug
    Affects Versions: 2.1.0-beta
    Reporter: Siddharth Seth
    Priority: Minor
    Attachments: YARN-766.txt

File scriptFile = new File(tmpDir, "scriptFile.sh"); should be replaced with File scriptFile = Shell.appendScriptExtension(tmpDir, "scriptFile"); to match trunk.
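The point of the fix is that hard-coding {{".sh"}} breaks the test on Windows, where Hadoop's Shell helper picks {{".cmd"}} instead. As a standalone illustration of that behavior (the real helper lives in org.apache.hadoop.util.Shell; this `ScriptExt` class is a simplified stand-in):

```java
import java.io.File;

class ScriptExt {
    // Mirrors the idea of Hadoop's Shell.WINDOWS flag: detect the OS once.
    static final boolean WINDOWS =
        System.getProperty("os.name").toLowerCase().startsWith("windows");

    // Appends ".cmd" on Windows and ".sh" elsewhere, like Shell.appendScriptExtension,
    // so tests never hard-code a platform-specific script suffix.
    static File appendScriptExtension(File parent, String basename) {
        return new File(parent, basename + (WINDOWS ? ".cmd" : ".sh"));
    }
}
```

A call like `ScriptExt.appendScriptExtension(tmpDir, "scriptFile")` then yields `scriptFile.sh` or `scriptFile.cmd` depending on the platform, which is exactly the substitution the issue asks for.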
[jira] [Commented] (YARN-641) Make AMLauncher in RM Use NMClient
[ https://issues.apache.org/jira/browse/YARN-641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13679733#comment-13679733 ]

Hadoop QA commented on YARN-641:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12587109/YARN-641.1.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 13 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1178//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1178//console

This message is automatically generated.

Make AMLauncher in RM Use NMClient

    Key: YARN-641
    URL: https://issues.apache.org/jira/browse/YARN-641
    Project: Hadoop YARN
    Issue Type: Bug
    Reporter: Zhijie Shen
    Assignee: Zhijie Shen
    Attachments: YARN-641.1.patch

YARN-422 adds NMClient. RM's AMLauncher is responsible for the interactions with an application's AM container. AMLauncher should also replace the raw ContainerManager proxy with NMClient.
[jira] [Commented] (YARN-767) Initialize Application status metrics when QueueMetrics is initialized
[ https://issues.apache.org/jira/browse/YARN-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13679763#comment-13679763 ]

Jian He commented on YARN-767:

We cannot call initAppStatusMetrics immediately after forQueue is called, because the metrics system has not been initialized by that time.

Initialize Application status metrics when QueueMetrics is initialized

    Key: YARN-767
    URL: https://issues.apache.org/jira/browse/YARN-767
    Project: Hadoop YARN
    Issue Type: Bug
    Reporter: Jian He
    Assignee: Jian He
    Attachments: YARN-767.1.patch, YARN-767.2.patch, YARN-767.3.patch

Applications:
ResourceManager.QueueMetrics.AppsSubmitted
ResourceManager.QueueMetrics.AppsRunning
ResourceManager.QueueMetrics.AppsPending
ResourceManager.QueueMetrics.AppsCompleted
ResourceManager.QueueMetrics.AppsKilled
ResourceManager.QueueMetrics.AppsFailed

Currently these metrics are created only when they are first needed; we want them to be visible as soon as QueueMetrics is initialized.
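The lazy-vs-eager distinction above is the whole issue: a counter registered on first use is invisible to monitoring until an application actually reaches that state. A toy sketch of eager registration (this `QueueMetricsSketch` with a plain map is a hypothetical stand-in, not the real Hadoop metrics system):

```java
import java.util.LinkedHashMap;
import java.util.Map;

class QueueMetricsSketch {
    // The six app-status counters named in the issue description.
    static final String[] APP_METRICS = {
        "AppsSubmitted", "AppsRunning", "AppsPending",
        "AppsCompleted", "AppsKilled", "AppsFailed"
    };

    final Map<String, Long> registry = new LinkedHashMap<>();

    // Eagerly register every counter at 0 so monitoring sees all of them
    // immediately, instead of each one appearing on its first increment.
    void initAppStatusMetrics() {
        for (String m : APP_METRICS) {
            registry.putIfAbsent(m, 0L);
        }
    }
}
```

Per the comment above, in the real code this eager step can only run after the metrics system itself is up, not directly inside the forQueue factory.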
[jira] [Updated] (YARN-612) Cleanup BuilderUtils
[ https://issues.apache.org/jira/browse/YARN-612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthik Kambatla updated YARN-612:
    Attachment: yarn-612-2.patch

Uploading a patch that cleans up BuilderUtils further. [~vinodkv], I tried to remove BuilderUtils altogether, but there are 100s of calls to the BuilderUtils methods. In most of these cases, BuilderUtils simplifies code and avoids duplication - I think we should hold onto BuilderUtils.

Cleanup BuilderUtils

    Key: YARN-612
    URL: https://issues.apache.org/jira/browse/YARN-612
    Project: Hadoop YARN
    Issue Type: Sub-task
    Affects Versions: 2.0.4-alpha
    Reporter: Siddharth Seth
    Assignee: Karthik Kambatla
    Attachments: yarn-612-1.patch, yarn-612-2.patch

There are 4 different methods to create ApplicationId. There are likely other such methods as well that could be consolidated.
[jira] [Commented] (YARN-612) Cleanup BuilderUtils
[ https://issues.apache.org/jira/browse/YARN-612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13679805#comment-13679805 ]

Hadoop QA commented on YARN-612:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12587114/yarn-612-2.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1179//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1179//console

This message is automatically generated.

Cleanup BuilderUtils

    Key: YARN-612
    URL: https://issues.apache.org/jira/browse/YARN-612
    Project: Hadoop YARN
    Issue Type: Sub-task
    Affects Versions: 2.0.4-alpha
    Reporter: Siddharth Seth
    Assignee: Karthik Kambatla
    Attachments: yarn-612-1.patch, yarn-612-2.patch

There are 4 different methods to create ApplicationId. There are likely other such methods as well that could be consolidated.
[jira] [Commented] (YARN-791) Ensure that RM RPC APIs that return nodes are consistent with /nodes REST API
[ https://issues.apache.org/jira/browse/YARN-791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13679884#comment-13679884 ]

Sandy Ryza commented on YARN-791:

Uploading a patch that does the latter. It also refactors the web API and the RPC API to use the same underlying code.

Ensure that RM RPC APIs that return nodes are consistent with /nodes REST API

    Key: YARN-791
    URL: https://issues.apache.org/jira/browse/YARN-791
    Project: Hadoop YARN
    Issue Type: Sub-task
    Components: api, resourcemanager
    Affects Versions: 2.0.4-alpha
    Reporter: Sandy Ryza
    Assignee: Sandy Ryza
    Attachments: YARN-791.patch
[jira] [Updated] (YARN-791) Ensure that RM RPC APIs that return nodes are consistent with /nodes REST API
[ https://issues.apache.org/jira/browse/YARN-791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Ryza updated YARN-791:
    Attachment: YARN-791.patch

Ensure that RM RPC APIs that return nodes are consistent with /nodes REST API

    Key: YARN-791
    URL: https://issues.apache.org/jira/browse/YARN-791
    Project: Hadoop YARN
    Issue Type: Sub-task
    Components: api, resourcemanager
    Affects Versions: 2.0.4-alpha
    Reporter: Sandy Ryza
    Assignee: Sandy Ryza
    Attachments: YARN-791.patch
[jira] [Commented] (YARN-752) In AMRMClient, automatically add corresponding rack requests for requested nodes
[ https://issues.apache.org/jira/browse/YARN-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13679899#comment-13679899 ]

Sandy Ryza commented on YARN-752:

[~bikassaha], your updated patch looks good to me. I noticed a minor spacing issue that originated in my code; uploading a patch to fix it.

In AMRMClient, automatically add corresponding rack requests for requested nodes

    Key: YARN-752
    URL: https://issues.apache.org/jira/browse/YARN-752
    Project: Hadoop YARN
    Issue Type: Improvement
    Components: api, applications
    Affects Versions: 2.0.4-alpha
    Reporter: Sandy Ryza
    Assignee: Sandy Ryza
    Attachments: YARN-752-1.patch, YARN-752-1.patch, YARN-752-2.patch, YARN-752.3.patch, YARN-752.4.patch, YARN-752.patch

A ContainerRequest that includes node-level requests must also include matching rack-level requests for the racks that those nodes are on. When a node is present without its rack, it makes sense for the client to automatically add the node's rack.
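The rule in the description - every requested node must be backed by a request for its rack - reduces to resolving each node and unioning the results into the rack set. A minimal standalone sketch, assuming a hypothetical node-to-rack resolver function (the real client would consult the cluster's rack-resolution machinery):

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

class RackFill {
    // For each requested node, add its rack (via the supplied resolver) to the
    // request's rack set if not already present, in the spirit of YARN-752.
    static Set<String> addMissingRacks(List<String> nodes,
                                       Set<String> racks,
                                       Function<String, String> nodeToRack) {
        Set<String> out = new LinkedHashSet<>(racks);
        for (String node : nodes) {
            out.add(nodeToRack.apply(node));
        }
        return out;
    }
}
```

With nodes n1 (on /rack1) and n2 (on /rack2) and an explicit rack set of {/rack1}, the result is {/rack1, /rack2}: the client silently supplies the rack the caller forgot.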
[jira] [Commented] (YARN-612) Cleanup BuilderUtils
[ https://issues.apache.org/jira/browse/YARN-612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13679902#comment-13679902 ]

Jian He commented on YARN-612:

Hi [~kkambatl], we have already added a built-in factory method for each record; you can call the factory method on each record directly to replace those BuilderUtils methods.

Cleanup BuilderUtils

    Key: YARN-612
    URL: https://issues.apache.org/jira/browse/YARN-612
    Project: Hadoop YARN
    Issue Type: Sub-task
    Affects Versions: 2.0.4-alpha
    Reporter: Siddharth Seth
    Assignee: Karthik Kambatla
    Attachments: yarn-612-1.patch, yarn-612-2.patch

There are 4 different methods to create ApplicationId. There are likely other such methods as well that could be consolidated.
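The suggestion above is the standard fix for scattered construction helpers: give the record one canonical static factory and delete the duplicates. A simplified standalone sketch (this `ApplicationIdSketch` class is a stand-in for the real YARN record, and the string format here is illustrative):

```java
class ApplicationIdSketch {
    private final long clusterTimestamp;
    private final int id;

    // Private constructor: all creation funnels through the one factory,
    // instead of 4 different BuilderUtils-style helpers.
    private ApplicationIdSketch(long clusterTimestamp, int id) {
        this.clusterTimestamp = clusterTimestamp;
        this.id = id;
    }

    // The single built-in factory method the record exposes.
    static ApplicationIdSketch newInstance(long clusterTimestamp, int id) {
        return new ApplicationIdSketch(clusterTimestamp, id);
    }

    @Override
    public String toString() {
        return "application_" + clusterTimestamp + "_" + String.format("%04d", id);
    }
}
```

Callers then write `ApplicationIdSketch.newInstance(ts, id)` everywhere, so there is exactly one place that knows how the record is built.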
[jira] [Updated] (YARN-752) In AMRMClient, automatically add corresponding rack requests for requested nodes
[ https://issues.apache.org/jira/browse/YARN-752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Ryza updated YARN-752:
    Attachment: YARN-752-5.patch

In AMRMClient, automatically add corresponding rack requests for requested nodes

    Key: YARN-752
    URL: https://issues.apache.org/jira/browse/YARN-752
    Project: Hadoop YARN
    Issue Type: Improvement
    Components: api, applications
    Affects Versions: 2.0.4-alpha
    Reporter: Sandy Ryza
    Assignee: Sandy Ryza
    Attachments: YARN-752-1.patch, YARN-752-1.patch, YARN-752-2.patch, YARN-752.3.patch, YARN-752.4.patch, YARN-752-5.patch, YARN-752.patch

A ContainerRequest that includes node-level requests must also include matching rack-level requests for the racks that those nodes are on. When a node is present without its rack, it makes sense for the client to automatically add the node's rack.
[jira] [Commented] (YARN-791) Ensure that RM RPC APIs that return nodes are consistent with /nodes REST API
[ https://issues.apache.org/jira/browse/YARN-791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13679912#comment-13679912 ]

Hadoop QA commented on YARN-791:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12587125/YARN-791.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1180//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1180//console

This message is automatically generated.

Ensure that RM RPC APIs that return nodes are consistent with /nodes REST API

    Key: YARN-791
    URL: https://issues.apache.org/jira/browse/YARN-791
    Project: Hadoop YARN
    Issue Type: Sub-task
    Components: api, resourcemanager
    Affects Versions: 2.0.4-alpha
    Reporter: Sandy Ryza
    Assignee: Sandy Ryza
    Attachments: YARN-791.patch
[jira] [Updated] (YARN-299) Node Manager throws org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: RESOURCE_FAILED at DONE
[ https://issues.apache.org/jira/browse/YARN-299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mayank Bansal updated YARN-299:
    Attachment: YARN-299-trunk-1.patch

Adding patch.
Thanks,
Mayank

Node Manager throws org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: RESOURCE_FAILED at DONE

    Key: YARN-299
    URL: https://issues.apache.org/jira/browse/YARN-299
    Project: Hadoop YARN
    Issue Type: Sub-task
    Components: nodemanager
    Affects Versions: 2.0.1-alpha, 2.0.0-alpha
    Reporter: Devaraj K
    Assignee: Mayank Bansal
    Attachments: YARN-299-trunk-1.patch

{code:xml}
2012-12-31 10:36:27,844 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Can't handle this event at current state: Current: [DONE], eventType: [RESOURCE_FAILED]
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: RESOURCE_FAILED at DONE
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:819)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:71)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:504)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:497)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
	at java.lang.Thread.run(Thread.java:662)
2012-12-31 10:36:27,845 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1356792558130_0002_01_01 transitioned from DONE to null
{code}
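The exception arises because the container's state table has no entry for RESOURCE_FAILED once the container is already DONE; the usual fix for this class of bug is to register the stray event as a self-loop at the terminal state so it is absorbed rather than thrown. A toy sketch (not Hadoop's StateMachineFactory; `TinyStateMachine` and its enums are invented for illustration):

```java
import java.util.EnumMap;
import java.util.Map;

class TinyStateMachine {
    enum State { RUNNING, DONE }
    enum Event { KILL, RESOURCE_FAILED }

    private final Map<State, Map<Event, State>> transitions = new EnumMap<>(State.class);
    private State current = State.RUNNING;

    void addTransition(State from, Event on, State to) {
        transitions.computeIfAbsent(from, s -> new EnumMap<>(Event.class)).put(on, to);
    }

    // Throws on an event with no registered transition at the current state,
    // mirroring InvalidStateTransitonException.
    void handle(Event e) {
        Map<Event, State> legal = transitions.get(current);
        if (legal == null || !legal.containsKey(e)) {
            throw new IllegalStateException("Invalid event: " + e + " at " + current);
        }
        current = legal.get(e);
    }

    State state() { return current; }
}
```

Registering `addTransition(DONE, RESOURCE_FAILED, DONE)` turns the late event into a harmless no-op at DONE, which is the shape of fix this kind of invalid-event-at-terminal-state bug typically gets.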
[jira] [Commented] (YARN-752) In AMRMClient, automatically add corresponding rack requests for requested nodes
[ https://issues.apache.org/jira/browse/YARN-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13679915#comment-13679915 ] Hadoop QA commented on YARN-752: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12587130/YARN-752-5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1181//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1181//console This message is automatically generated. 
In AMRMClient, automatically add corresponding rack requests for requested nodes Key: YARN-752 URL: https://issues.apache.org/jira/browse/YARN-752 Project: Hadoop YARN Issue Type: Improvement Components: api, applications Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-752-1.patch, YARN-752-1.patch, YARN-752-2.patch, YARN-752.3.patch, YARN-752.4.patch, YARN-752-5.patch, YARN-752.patch A ContainerRequest that includes node-level requests must also include matching rack-level requests for the racks that those nodes are on. When a node is present without its rack, it makes sense for the client to automatically add the node's rack.
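The automatic addition described above reduces to a small set computation: for each requested node, look up its rack (in YARN this mapping comes from the cluster topology via RackResolver) and add any rack the caller has not already requested. A hedged sketch with hypothetical names, not the actual AMRMClient code:

```java
import java.util.Collection;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the YARN-752 idea: given node-level asks and a
// node -> rack mapping, compute which rack-level asks the client should
// add automatically on the user's behalf.
class RackRequestHelper {
    static Set<String> racksToAdd(Collection<String> requestedNodes,
                                  Collection<String> requestedRacks,
                                  Map<String, String> nodeToRack) {
        Set<String> missing = new TreeSet<>();
        for (String node : requestedNodes) {
            String rack = nodeToRack.get(node);
            // Add the node's rack unless the caller already asked for it.
            if (rack != null && !requestedRacks.contains(rack)) {
                missing.add(rack);
            }
        }
        return missing;
    }
}
```

In the real client the resulting racks would be turned into rack-level ResourceRequests alongside the node-level ones, so the scheduler can fall back from node-local to rack-local placement.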
[jira] [Updated] (YARN-795) Fair scheduler root queue metrics should subtract allocated vCores from available vCores
[ https://issues.apache.org/jira/browse/YARN-795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-795: - Description: The root queue metrics of fair scheduler doesn't subtract allocated vCores from available vCores, causing the available vCores returned to be incorrect. This is happening because QueueMetrics.getAllocateResources() doesn't return the allocated vCores. was: The root queue metrics of fair scheduler doesn't subtract allocated vCores from available vCores, causing the available vCores returned to be incorrect. This is happening because codeQueueMetrics.getAllocateResources()code doesn't return the allocated vCores. Fair scheduler root queue metrics should subtract allocated vCores from available vCores Key: YARN-795 URL: https://issues.apache.org/jira/browse/YARN-795 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.0.2-alpha Reporter: Wei Yan Assignee: Wei Yan The root queue metrics of fair scheduler doesn't subtract allocated vCores from available vCores, causing the available vCores returned to be incorrect. This is happening because QueueMetrics.getAllocateResources() doesn't return the allocated vCores.
[jira] [Created] (YARN-795) Fair scheduler root queue metrics should subtract allocated vCores from available vCores
Wei Yan created YARN-795: Summary: Fair scheduler root queue metrics should subtract allocated vCores from available vCores Key: YARN-795 URL: https://issues.apache.org/jira/browse/YARN-795 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.0.2-alpha Reporter: Wei Yan Assignee: Wei Yan The root queue metrics of fair scheduler doesn't subtract allocated vCores from available vCores, causing the available vCores returned to be incorrect. This is happening because codeQueueMetrics.getAllocateResources()code doesn't return the allocated vCores.
[jira] [Updated] (YARN-795) Fair scheduler root queue metrics should subtract allocated vCores from available vCores
[ https://issues.apache.org/jira/browse/YARN-795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-795: - Description: The root queue metrics of fair scheduler doesn't subtract allocated vCores from available vCores, causing the available vCores returned to be incorrect. This is happening because {code}QueueMetrics.getAllocateResources(){code} doesn't return the allocated vCores. was: The root queue metrics of fair scheduler doesn't subtract allocated vCores from available vCores, causing the available vCores returned to be incorrect. This is happening because QueueMetrics.getAllocateResources() doesn't return the allocated vCores. Fair scheduler root queue metrics should subtract allocated vCores from available vCores Key: YARN-795 URL: https://issues.apache.org/jira/browse/YARN-795 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.0.2-alpha Reporter: Wei Yan Assignee: Wei Yan The root queue metrics of fair scheduler doesn't subtract allocated vCores from available vCores, causing the available vCores returned to be incorrect. This is happening because {code}QueueMetrics.getAllocateResources(){code} doesn't return the allocated vCores.
[jira] [Updated] (YARN-795) Fair scheduler root queue metrics should subtract allocated vCores from available vCores
[ https://issues.apache.org/jira/browse/YARN-795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-795: - Description: The root queue metrics of fair scheduler doesn't subtract allocated vCores from available vCores, causing the available vCores returned to be incorrect. This is happening because QueueMetrics.getAllocateResources() doesn't return the allocated vCores. was: The root queue metrics of fair scheduler doesn't subtract allocated vCores from available vCores, causing the available vCores returned to be incorrect. This is happening because QueueMetrics.getAllocateResources() doesn't return the allocated vCores. Fair scheduler root queue metrics should subtract allocated vCores from available vCores Key: YARN-795 URL: https://issues.apache.org/jira/browse/YARN-795 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.0.2-alpha Reporter: Wei Yan Assignee: Wei Yan The root queue metrics of fair scheduler doesn't subtract allocated vCores from available vCores, causing the available vCores returned to be incorrect. This is happening because QueueMetrics.getAllocateResources() doesn't return the allocated vCores.
[jira] [Updated] (YARN-795) Fair scheduler queue metrics should subtract allocated vCores from available vCores
[ https://issues.apache.org/jira/browse/YARN-795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-795: - Summary: Fair scheduler queue metrics should subtract allocated vCores from available vCores (was: Fair scheduler root queue metrics should subtract allocated vCores from available vCores) Fair scheduler queue metrics should subtract allocated vCores from available vCores --- Key: YARN-795 URL: https://issues.apache.org/jira/browse/YARN-795 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.0.2-alpha Reporter: Wei Yan Assignee: Wei Yan The root queue metrics of fair scheduler doesn't subtract allocated vCores from available vCores, causing the available vCores returned to be incorrect. This is happening because {code}QueueMetrics.getAllocateResources(){code} doesn't return the allocated vCores.
[jira] [Updated] (YARN-795) Fair scheduler queue metrics should subtract allocated vCores from available vCores
[ https://issues.apache.org/jira/browse/YARN-795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-795: - Description: The queue metrics of fair scheduler doesn't subtract allocated vCores from available vCores, causing the available vCores returned to be incorrect. This is happening because {code}QueueMetrics.getAllocateResources(){code} doesn't return the allocated vCores. was: The root queue metrics of fair scheduler doesn't subtract allocated vCores from available vCores, causing the available vCores returned to be incorrect. This is happening because {code}QueueMetrics.getAllocateResources(){code} doesn't return the allocated vCores. Fair scheduler queue metrics should subtract allocated vCores from available vCores --- Key: YARN-795 URL: https://issues.apache.org/jira/browse/YARN-795 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.0.2-alpha Reporter: Wei Yan Assignee: Wei Yan The queue metrics of fair scheduler doesn't subtract allocated vCores from available vCores, causing the available vCores returned to be incorrect. This is happening because {code}QueueMetrics.getAllocateResources(){code} doesn't return the allocated vCores.
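The fix amounts to mirroring for vCores the bookkeeping that already exists for memory: available = total - allocated. An illustrative sketch of that accounting, with hypothetical names rather than the real QueueMetrics API:

```java
// Minimal sketch of the accounting YARN-795 targets: available vCores must
// shrink as containers are allocated, just as available memory does.
// Names are illustrative, not the actual QueueMetrics fields.
class MiniQueueMetrics {
    private final int totalVCores;
    private int allocatedVCores;

    MiniQueueMetrics(int totalVCores) {
        this.totalVCores = totalVCores;
    }

    void allocate(int vcores) { allocatedVCores += vcores; }
    void release(int vcores)  { allocatedVCores -= vcores; }

    // The reported bug is effectively this method returning totalVCores,
    // because the allocated count was never subtracted.
    int getAvailableVCores() { return totalVCores - allocatedVCores; }
}
```

With the subtraction in place, allocating 3 of 8 vCores leaves 5 available, and releasing them restores 8.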
[jira] [Commented] (YARN-299) Node Manager throws org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: RESOURCE_FAILED at DONE
[ https://issues.apache.org/jira/browse/YARN-299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13679940#comment-13679940 ] Hadoop QA commented on YARN-299: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12587131/YARN-299-trunk-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1182//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1182//console This message is automatically generated. 
Node Manager throws org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: RESOURCE_FAILED at DONE --- Key: YARN-299 URL: https://issues.apache.org/jira/browse/YARN-299 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.0.1-alpha, 2.0.0-alpha Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-299-trunk-1.patch {code:xml} 2012-12-31 10:36:27,844 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Can't handle this event at current state: Current: [DONE], eventType: [RESOURCE_FAILED] org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: RESOURCE_FAILED at DONE at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:819) at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:71) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:504) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:497) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:662) 2012-12-31 10:36:27,845 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1356792558130_0002_01_01 transitioned from DONE to null {code} -- This message is automatically generated by JIRA. 
[jira] [Assigned] (YARN-502) RM crash with NPE on NODE_REMOVED event
[ https://issues.apache.org/jira/browse/YARN-502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal reassigned YARN-502: -- Assignee: Mayank Bansal RM crash with NPE on NODE_REMOVED event --- Key: YARN-502 URL: https://issues.apache.org/jira/browse/YARN-502 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.0.3-alpha Reporter: Lohit Vijayarenu Assignee: Mayank Bansal While running some tests and adding/removing nodes, we saw the RM crash with the exception below. We are testing with the fair scheduler and running hadoop-2.0.3-alpha {noformat} 2013-03-22 18:54:27,015 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating Node :55680 as it is now LOST 2013-03-22 18:54:27,015 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: :55680 Node Transitioned from UNHEALTHY to LOST 2013-03-22 18:54:27,015 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_REMOVED to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeNode(FairScheduler.java:619) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:856) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:98) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:375) at java.lang.Thread.run(Thread.java:662) 2013-03-22 18:54:27,016 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. 2013-03-22 18:54:27,020 INFO org.mortbay.log: Stopped SelectChannelConnector@:50030 {noformat}
[jira] [Commented] (YARN-502) RM crash with NPE on NODE_REMOVED event
[ https://issues.apache.org/jira/browse/YARN-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13679941#comment-13679941 ] Mayank Bansal commented on YARN-502: Thanks [~sandyr]. I did not reproduce it. As nobody is working on it, let me take a look. Thanks, Mayank RM crash with NPE on NODE_REMOVED event --- Key: YARN-502 URL: https://issues.apache.org/jira/browse/YARN-502 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.0.3-alpha Reporter: Lohit Vijayarenu While running some tests and adding/removing nodes, we saw the RM crash with the exception below. We are testing with the fair scheduler and running hadoop-2.0.3-alpha {noformat} 2013-03-22 18:54:27,015 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating Node :55680 as it is now LOST 2013-03-22 18:54:27,015 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: :55680 Node Transitioned from UNHEALTHY to LOST 2013-03-22 18:54:27,015 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_REMOVED to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeNode(FairScheduler.java:619) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:856) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:98) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:375) at java.lang.Thread.run(Thread.java:662) 2013-03-22 18:54:27,016 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. 2013-03-22 18:54:27,020 INFO org.mortbay.log: Stopped SelectChannelConnector@:50030 {noformat} -- This message is automatically generated by JIRA. 
[jira] [Commented] (YARN-502) RM crash with NPE on NODE_REMOVED event
[ https://issues.apache.org/jira/browse/YARN-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13679982#comment-13679982 ] Mayank Bansal commented on YARN-502: Looking at the code, it seems there is a race condition between ReconnectNodeTransition and UnhealthyTransition in the event dispatcher. This condition may arise when the NodeManager tries to register itself: ResourceTrackerService puts this node in the nodes list and schedules the reconnect event, but in the meantime an unhealthy event reaches the RM first and deletes this node from the nodes map. Thanks, Mayank RM crash with NPE on NODE_REMOVED event --- Key: YARN-502 URL: https://issues.apache.org/jira/browse/YARN-502 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.0.3-alpha Reporter: Lohit Vijayarenu Assignee: Mayank Bansal While running some tests and adding/removing nodes, we saw the RM crash with the exception below. We are testing with the fair scheduler and running hadoop-2.0.3-alpha {noformat} 2013-03-22 18:54:27,015 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating Node :55680 as it is now LOST 2013-03-22 18:54:27,015 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: :55680 Node Transitioned from UNHEALTHY to LOST 2013-03-22 18:54:27,015 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_REMOVED to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeNode(FairScheduler.java:619) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:856) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:98) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:375) at 
java.lang.Thread.run(Thread.java:662) 2013-03-22 18:54:27,016 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. 2013-03-22 18:54:27,020 INFO org.mortbay.log: Stopped SelectChannelConnector@:50030 {noformat}
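If the race described in this comment is real, the narrow mitigation on the scheduler side is a defensive check in removeNode: if a racing event has already removed the node from the tracking map, log and return instead of dereferencing null. A self-contained sketch of that guard, with hypothetical names rather than the actual FairScheduler fields:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a defensive removeNode for the YARN-502 race: a duplicate or
// late NODE_REMOVED must not crash the RM. Names are hypothetical, not the
// real FairScheduler code.
class MiniScheduler {
    private final Map<String, Integer> nodes = new HashMap<>(); // nodeId -> vcores

    void addNode(String id, int vcores) {
        nodes.put(id, vcores);
    }

    // Returns true if the node was removed, false if it was already gone.
    boolean removeNode(String id) {
        Integer info = nodes.remove(id);
        if (info == null) {
            // Another event (e.g. the unhealthy transition) got here first;
            // tolerate it rather than throwing a NullPointerException.
            return false;
        }
        return true;
    }
}
```

The real fix may instead need to close the race in the event dispatching itself, but a null guard at least keeps the ResourceManager from exiting on the duplicate event.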
[jira] [Updated] (YARN-795) Fair scheduler queue metrics should subtract allocated vCores from available vCores
[ https://issues.apache.org/jira/browse/YARN-795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-795: - Attachment: YARN-795.patch Uploaded patch that subtracts allocated vcores from available vcores in QueueMetrics. Fair scheduler queue metrics should subtract allocated vCores from available vCores --- Key: YARN-795 URL: https://issues.apache.org/jira/browse/YARN-795 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.0.2-alpha Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-795.patch The queue metrics of fair scheduler doesn't subtract allocated vCores from available vCores, causing the available vCores returned to be incorrect. This is happening because {code}QueueMetrics.getAllocateResources(){code} doesn't return the allocated vCores.
[jira] [Commented] (YARN-795) Fair scheduler queue metrics should subtract allocated vCores from available vCores
[ https://issues.apache.org/jira/browse/YARN-795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13680028#comment-13680028 ] Hadoop QA commented on YARN-795: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12587150/YARN-795.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1183//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1183//console This message is automatically generated. Fair scheduler queue metrics should subtract allocated vCores from available vCores --- Key: YARN-795 URL: https://issues.apache.org/jira/browse/YARN-795 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.0.2-alpha Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-795.patch The queue metrics of fair scheduler doesn't subtract allocated vCores from available vCores, causing the available vCores returned to be incorrect. 
This is happening because {code}QueueMetrics.getAllocateResources(){code} doesn't return the allocated vCores.
[jira] [Created] (YARN-796) Allow for (admin) labels on nodes and resource-requests
Arun C Murthy created YARN-796: -- Summary: Allow for (admin) labels on nodes and resource-requests Key: YARN-796 URL: https://issues.apache.org/jira/browse/YARN-796 Project: Hadoop YARN Issue Type: New Feature Reporter: Arun C Murthy Assignee: Arun C Murthy It will be useful for admins to specify labels for nodes. We should expose these labels and allow applications to specify labels on resource-requests. Obviously we need to support admin operations on adding/removing node labels.
[jira] [Updated] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-796: --- Description: It will be useful for admins to specify labels for nodes. Examples of labels are OS, processor architecture, etc. We should expose these labels and allow applications to specify labels on resource-requests. Obviously we need to support admin operations on adding/removing node labels. was: It will be useful for admins to specify labels for nodes. We should expose these labels and allow applications to specify labels on resource-requests. Obviously we need to support admin operations on adding/removing node labels. Allow for (admin) labels on nodes and resource-requests --- Key: YARN-796 URL: https://issues.apache.org/jira/browse/YARN-796 Project: Hadoop YARN Issue Type: New Feature Reporter: Arun C Murthy Assignee: Arun C Murthy It will be useful for admins to specify labels for nodes. Examples of labels are OS, processor architecture, etc. We should expose these labels and allow applications to specify labels on resource-requests. Obviously we need to support admin operations on adding/removing node labels.
[jira] [Updated] (YARN-731) RPCUtil.unwrapAndThrowException should unwrap remote RuntimeExceptions
[ https://issues.apache.org/jira/browse/YARN-731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-731: - Attachment: YARN-731.1.patch Added the block in RPCUtil#unwrapAndThrowException to handle the runtime exceptions. Corresponding tests are added. RPCUtil.unwrapAndThrowException should unwrap remote RuntimeExceptions -- Key: YARN-731 URL: https://issues.apache.org/jira/browse/YARN-731 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Siddharth Seth Assignee: Zhijie Shen Attachments: YARN-731.1.patch Will be required for YARN-662. Also, remote NPEs show up incorrectly for some unit tests.
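The shape of such an unwrap step can be sketched as follows; the names are illustrative, not the actual RPCUtil code, which works with the Hadoop RemoteException types rather than a plain class-name string:

```java
// Hypothetical sketch of the YARN-731 idea: if the exception that came back
// over RPC declares a RuntimeException class, reconstruct and return that
// runtime type so callers see e.g. a real NullPointerException instead of a
// generic wrapped exception.
class MiniUnwrap {
    static RuntimeException unwrap(Exception remote, String declaredClass) {
        try {
            Class<?> cls = Class.forName(declaredClass);
            if (RuntimeException.class.isAssignableFrom(cls)) {
                // Rebuild the remote runtime exception with its message.
                return (RuntimeException) cls
                    .getConstructor(String.class)
                    .newInstance(remote.getMessage());
            }
        } catch (ReflectiveOperationException e) {
            // Fall through: the remote type couldn't be reconstructed.
        }
        // Not a runtime type (or not reconstructible): wrap generically.
        return new RuntimeException(remote.getMessage(), remote);
    }
}
```

This is why the JIRA mentions remote NPEs showing up incorrectly: without the unwrap, a server-side NullPointerException surfaces as a generic exception on the client.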
[jira] [Updated] (YARN-782) vcores-pcores ratio functions differently from vmem-pmem ratio in misleading way
[ https://issues.apache.org/jira/browse/YARN-782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-782: Attachment: YARN-782.patch vcores-pcores ratio functions differently from vmem-pmem ratio in misleading way - Key: YARN-782 URL: https://issues.apache.org/jira/browse/YARN-782 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Priority: Critical Attachments: YARN-782.patch The vcores-pcores ratio functions differently from the vmem-pmem ratio in the sense that the vcores-pcores ratio has an impact on allocations and the vmem-pmem ratio does not. If I double my vmem-pmem ratio, the only change that occurs is that my containers, after being scheduled, are less likely to be killed for using too much virtual memory. But if I double my vcore-pcore ratio, my nodes will appear to the ResourceManager to contain double the amount of CPU space, which will affect scheduling decisions. The lack of consistency will exacerbate the already difficult problem of resource configuration.
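The asymmetry described in this issue can be stated numerically with hypothetical numbers: doubling the vcores-pcores ratio doubles the CPU capacity the ResourceManager schedules against, while doubling vmem-pmem only moves the enforcement threshold applied after scheduling. A sketch:

```java
// Illustration (hypothetical names and numbers) of why YARN-782 calls the
// two ratios inconsistent: one changes scheduling inputs, the other only
// changes a post-scheduling kill limit.
class RatioEffect {
    // The vcores the RM believes a node offers: scales with the ratio,
    // so the ratio directly affects allocation decisions.
    static int advertisedVCores(int physicalCores, int vcoresPcoresRatio) {
        return physicalCores * vcoresPcoresRatio;
    }

    // The virtual-memory limit a running container may reach before being
    // killed: scales with the ratio, but never changes scheduling.
    static long vmemKillLimitBytes(long pmemBytes, double vmemPmemRatio) {
        return (long) (pmemBytes * vmemPmemRatio);
    }
}
```

With 8 physical cores, moving the vcores-pcores ratio from 1 to 2 makes the node advertise 16 schedulable vCores; moving vmem-pmem from 2.0 to 4.0 changes nothing about placement, only the point at which the NodeManager kills a container.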
[jira] [Commented] (YARN-782) vcores-pcores ratio functions differently from vmem-pmem ratio in misleading way
[ https://issues.apache.org/jira/browse/YARN-782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13680109#comment-13680109 ] Sandy Ryza commented on YARN-782: - My opinion is that it would be best to remove the property altogether, as it's an unnecessary layer of indirection. Will upload a patch that does this. vcores-pcores ratio functions differently from vmem-pmem ratio in misleading way - Key: YARN-782 URL: https://issues.apache.org/jira/browse/YARN-782 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Priority: Critical Attachments: YARN-782.patch The vcores-pcores ratio functions differently from the vmem-pmem ratio in the sense that the vcores-pcores ratio has an impact on allocations and the vmem-pmem ratio does not. If I double my vmem-pmem ratio, the only change that occurs is that my containers, after being scheduled, are less likely to be killed for using too much virtual memory. But if I double my vcore-pcore ratio, my nodes will appear to the ResourceManager to contain double the amount of CPU space, which will affect scheduling decisions. The lack of consistency will exacerbate the already difficult problem of resource configuration.
[jira] [Commented] (YARN-795) Fair scheduler queue metrics should subtract allocated vCores from available vCores
[ https://issues.apache.org/jira/browse/YARN-795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13680114#comment-13680114 ] Sandy Ryza commented on YARN-795: - Thanks for working on this, Wei. The patch looks good. My only nit is that there's an unnecessary whitespace change in TestFairScheduler. Fair scheduler queue metrics should subtract allocated vCores from available vCores --- Key: YARN-795 URL: https://issues.apache.org/jira/browse/YARN-795 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.0.2-alpha Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-795.patch The queue metrics of fair scheduler doesn't subtract allocated vCores from available vCores, causing the available vCores returned to be incorrect. This is happening because {code}QueueMetrics.getAllocateResources(){code} doesn't return the allocated vCores.
[jira] [Commented] (YARN-795) Fair scheduler queue metrics should subtract allocated vCores from available vCores
[ https://issues.apache.org/jira/browse/YARN-795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680149#comment-13680149 ]

Hadoop QA commented on YARN-795:
--------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12587187/YARN-795-2.patch against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1185//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1185//console

This message is automatically generated.
[jira] [Commented] (YARN-782) vcores-pcores ratio functions differently from vmem-pmem ratio in misleading way
[ https://issues.apache.org/jira/browse/YARN-782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680150#comment-13680150 ]

Hadoop QA commented on YARN-782:
--------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12587178/YARN-782.patch against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1186//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1186//console

This message is automatically generated.
[jira] [Commented] (YARN-795) Fair scheduler queue metrics should subtract allocated vCores from available vCores
[ https://issues.apache.org/jira/browse/YARN-795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680163#comment-13680163 ]

Sandy Ryza commented on YARN-795:
---------------------------------

+1
[jira] [Commented] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)
[ https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680176#comment-13680176 ]

Bikas Saha commented on YARN-569:
---------------------------------

Sorry for the delayed response.

This doesn't seem to affect the fair scheduler, or does it? If not, then it can be misleading for users.
{code}
+  public static final String RM_SCHEDULER_ENABLE_PREEMPTION =
{code}

Missing default?
{code}
+  /** List of ScheduleEditPolicy classes affecting scheduler preemption. */
+  public static final String RM_SCHEDULER_PREEMPTION_POLICIES =
+    RM_PREFIX + "scheduler.preemption.policies";
{code}

Why cast when one has generic T?
{code}
+    public RMContainerPreemptEventDispatcher(ResourceScheduler scheduler) {
+      this.scheduler = (T) scheduler;
+    }
{code}

How do we envisage multiple policies working together without stepping on each other? Better off limiting to 1?
{code}
+    for (ScheduleEditPolicy policy : policies) {
+      LOG.info("LOADING ScheduleEditPolicy: " + policy.toString());
+      policy.init(conf, this.rmContext.getDispatcher().getEventHandler(),
+          (PreemptableResourceScheduler) scheduler);
+      // preemption service, periodically check whether we need to
+      // preempt to guarantee capacity constraints
+      ScheduleMonitor mon = new ScheduleMonitor(policy);
+      addService(mon);
+    }
{code}

Might be a personal choice, but ScheduleMonitor and ScheduleEditPolicy would sound better if they used Scheduling instead of Schedule.

Why would we want to get this from the policy (which seems natural) as well as be able to set it? If it needs to be configurable, then it can be done via the policy config, right?
{code}
+  protected void setMonitorInterval(int monitorInterval) {
+    this.monitorInterval = monitorInterval;
+  }
{code}

Having multiple threads named "Preemption Checker" will probably not help debugging. Not joining the thread to make sure it's cleaned up?
{code}
+  public void stop() {
+    stopped = true;
+    if (checkerThread != null) {
+      checkerThread.interrupt();
+    }
{code}

Nothing else other than this seems to be synchronized. Then why this?
{code}
+  private class PreepmtionChecker implements Runnable {
+    @Override
+    public void run() {
+      while (!stopped && !Thread.currentThread().isInterrupted()) {
+        synchronized (ScheduleMonitor.this) {
{code}

Couldn't quite grok this. What is delta? What is 0.5? A percentage? What's the math behind the calculation? Should it be "even absent preemption" instead of "even absent natural termination"? Is this applied before or after TOTAL_PREEMPTION_PER_ROUND?
{code}
+  /**
+   * Given a computed preemption target, account for containers naturally
+   * expiring and preempt only this percentage of the delta. This determines
+   * the rate of geometric convergence into the deadzone ({@link
+   * #MAX_IGNORED_OVER_CAPACITY}). For example, a termination factor of 0.5
+   * will reclaim almost 95% of resources within 5 * {@link
+   * #WAIT_TIME_BEFORE_KILL}, even absent natural termination.
+   */
+  public static final String NATURAL_TERMINATION_FACTOR =
{code}

In which config file do these above configurations go when defined by the admin? Shouldn't they be defined in the config defaults of that file, e.g. capacity-scheduler.xml? If they get it from the scheduler config, then we probably shouldn't pass it a configuration object during init.

RMContainer already has the ApplicationAttemptId inside it. No need for extra args.
{code}
+  void preemptContainer(ApplicationAttemptId aid, RMContainer container);
{code}

Why no lock here when the other new methods have a lock? Do we not care that the app remains in applications during the duration of the operations?
{code}
+  @Override
+  @Lock(Lock.NoLock.class)
+  public void preemptContainer(ApplicationAttemptId aid, RMContainer cont) {
+    if (LOG.isDebugEnabled()) {
+      LOG.debug("PREEMPT_CONTAINER: application:" + aid.toString() +
+          " container:" + cont.toString());
+    }
+    FiCaSchedulerApp app = applications.get(aid);
+    if (app != null) {
+      app.addPreemptContainer(cont.getContainerId());
+    }
+  }
{code}

UnmodifiableSet?
{code}
+  // need to access the list of apps from the preemption monitor
+  public Set<FiCaSchedulerApp> getApplications() {
+    return activeApplications;
+  }
{code}

containersToPreempt?
{code}
+  private final Set<ContainerId> containerToPreempt =
{code}

There is one critical difference between old and new behavior. The new code will not send the finish event to the container if it's not part of the liveContainers. This probably is wrong. Secondly, the parent/queue metrics etc. are not updated either. I am not sure if this book-keeping is actually designed to be in sync with liveContainers - which is what the new code enforces it to be. Same comment for the hierarchical callers of this method who now
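The "math behind the calculation" questioned above is a geometric series: if each preemption round reclaims the natural-termination factor's share of the remaining over-capacity delta, the unreclaimed fraction after n rounds is (1 - factor)^n. A hypothetical sketch (not YARN code) checks the quoted javadoc's claim that a factor of 0.5 reclaims almost 95% within 5 * WAIT_TIME_BEFORE_KILL: 0.5^5 = 3.125% remaining, i.e. about 96.9% reclaimed.

```java
// Hypothetical sketch (not YARN code) of the geometric convergence described
// in the quoted NATURAL_TERMINATION_FACTOR javadoc. Each round preempts
// `factor` of the remaining delta, so the unreclaimed fraction after n
// rounds is (1 - factor)^n.
class ConvergenceDemo {
    static double remainingFraction(double factor, int rounds) {
        double remaining = 1.0;
        for (int i = 0; i < rounds; i++) {
            remaining *= (1.0 - factor);  // this round removes factor of what's left
        }
        return remaining;
    }
}
```

This also suggests an answer to the "0.5" question: it is the fraction of the computed delta preempted per round, and convergence holds even if no container terminates naturally in the meantime.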