[jira] [Updated] (YARN-2280) Resource manager web service fields are not accessible
[ https://issues.apache.org/jira/browse/YARN-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Horvath updated YARN-2280: Attachment: YARN-2280.patch Resource manager web service fields are not accessible -- Key: YARN-2280 URL: https://issues.apache.org/jira/browse/YARN-2280 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0, 2.4.1 Reporter: Krisztian Horvath Assignee: Krisztian Horvath Priority: Minor Fix For: 2.5.0 Attachments: YARN-2280.patch Using the resource manager's REST API (org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices), some REST calls return a class whose fields cannot be accessed after unmarshalling. For example SchedulerTypeInfo - schedulerInfo. When the same classes are used on the client side, these fields are only accessible via reflection. -- This message was sent by Atlassian JIRA (v6.2#6252)
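The inaccessible-field problem described above can be illustrated with a short, self-contained sketch. The SchedulerTypeInfo and SchedulerInfo classes below are simplified stand-ins for the real web-service DAO classes, not the actual YARN source; in the real case the DAOs live in a server-side package, so client code that unmarshals the response has no public getter to call and falls back to reflection, as shown here.
{code:title=InaccessibleFieldSketch.java|borderStyle=solid}
import java.lang.reflect.Field;

public class InaccessibleFieldSketch {

    // Simplified stand-ins; in the real case these are populated by JAXB when the
    // REST response is unmarshalled on the client.
    static class SchedulerInfo { }

    static class SchedulerTypeInfo {
        private SchedulerInfo schedulerInfo = new SchedulerInfo();
        // no public getter, so code outside the DAO's package cannot read the field directly
    }

    public static void main(String[] args) throws Exception {
        SchedulerTypeInfo typeInfo = new SchedulerTypeInfo();

        // The workaround the description mentions: read the field through reflection.
        Field field = SchedulerTypeInfo.class.getDeclaredField("schedulerInfo");
        field.setAccessible(true);
        SchedulerInfo info = (SchedulerInfo) field.get(typeInfo);
        System.out.println("read via reflection: " + info);
    }
}
{code}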
[jira] [Updated] (YARN-2280) Resource manager web service fields are not accessible
[ https://issues.apache.org/jira/browse/YARN-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Horvath updated YARN-2280: Attachment: (was: YARN-2280.patch) Resource manager web service fields are not accessible -- Key: YARN-2280 URL: https://issues.apache.org/jira/browse/YARN-2280 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0, 2.4.1 Reporter: Krisztian Horvath Assignee: Krisztian Horvath Priority: Minor Fix For: 2.5.0 Attachments: YARN-2280.patch Using the resource manager's REST API (org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices), some REST calls return a class whose fields cannot be accessed after unmarshalling. For example SchedulerTypeInfo - schedulerInfo. When the same classes are used on the client side, these fields are only accessible via reflection. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2280) Resource manager web service fields are not accessible
[ https://issues.apache.org/jira/browse/YARN-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075334#comment-14075334 ] Hadoop QA commented on YARN-2280: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12657984/YARN-2280.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4447//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4447//console This message is automatically generated. Resource manager web service fields are not accessible -- Key: YARN-2280 URL: https://issues.apache.org/jira/browse/YARN-2280 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0, 2.4.1 Reporter: Krisztian Horvath Assignee: Krisztian Horvath Priority: Minor Fix For: 2.5.0 Attachments: YARN-2280.patch Using the resource manager's REST API (org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices), some REST calls return a class whose fields cannot be accessed after unmarshalling. For example SchedulerTypeInfo - schedulerInfo. When the same classes are used on the client side, these fields are only accessible via reflection. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2335) Annotate all hadoop-sls APIs as @Private
[ https://issues.apache.org/jira/browse/YARN-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075338#comment-14075338 ] Hudson commented on YARN-2335: -- FAILURE: Integrated in Hadoop-Yarn-trunk #624 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/624/]) YARN-2335. Annotate all hadoop-sls APIs as @Private. (Wei Yan via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1613478) * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/RumenToSLSConverter.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/SLSRunner.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/AMSimulator.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/MRAMSimulator.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/conf/SLSConfiguration.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NMSimulator.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NodeInfo.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/CapacitySchedulerMetrics.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/ContainerSimulator.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/FairSchedulerMetrics.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/FifoSchedulerMetrics.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/NodeUpdateSchedulerEventWrapper.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/RMNodeWrapper.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/ResourceSchedulerWrapper.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSCapacityScheduler.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SchedulerMetrics.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SchedulerWrapper.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/TaskRunner.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/utils/SLSUtils.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/web/SLSWebApp.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt Annotate all hadoop-sls APIs as @Private Key: YARN-2335 URL: https://issues.apache.org/jira/browse/YARN-2335 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Priority: Minor Fix For: 2.5.0 Attachments: YARN-2335-1.branch2.patch, YARN-2335-1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1796) container-executor shouldn't require o-r permissions
[ https://issues.apache.org/jira/browse/YARN-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075346#comment-14075346 ] Hudson commented on YARN-1796: -- FAILURE: Integrated in Hadoop-Yarn-trunk #624 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/624/]) YARN-1796. container-executor shouldn't require o-r permissions. Contributed by Aaron T. Myers. (atm: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1613548) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c container-executor shouldn't require o-r permissions Key: YARN-1796 URL: https://issues.apache.org/jira/browse/YARN-1796 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.4.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Priority: Minor Fix For: 2.6.0 Attachments: YARN-1796.patch The container-executor currently checks that other users don't have read permissions. This is unnecessary and runs contrary to the debian packaging policy manual. This is the analogous fix for YARN that was done for MR1 in MAPREDUCE-2103. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2211) RMStateStore needs to save AMRMToken master key for recovery when RM restart/failover happens
[ https://issues.apache.org/jira/browse/YARN-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075344#comment-14075344 ] Hudson commented on YARN-2211: -- FAILURE: Integrated in Hadoop-Yarn-trunk #624 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/624/]) YARN-2211. Persist AMRMToken master key in RMStateStore for RM recovery. Contributed by Xuan Gong (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1613515) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationMasterServiceOnHA.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMSecretManagerService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/NullRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/AMRMTokenSecretManagerState.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/AMRMTokenSecretManagerStatePBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/AMRMTokenSecretManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/proto/yarn_server_resourcemanager_recovery.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestAMRMTokens.java RMStateStore needs to save AMRMToken master key for recovery when RM restart/failover happens
[jira] [Commented] (YARN-2214) FairScheduler: preemptContainerPreCheck() in FSParentQueue delays convergence towards fairness
[ https://issues.apache.org/jira/browse/YARN-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075342#comment-14075342 ] Hudson commented on YARN-2214: -- FAILURE: Integrated in Hadoop-Yarn-trunk #624 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/624/]) YARN-2214. FairScheduler: preemptContainerPreCheck() in FSParentQueue delays convergence towards fairness. (Ashwin Shankar via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1613459) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java FairScheduler: preemptContainerPreCheck() in FSParentQueue delays convergence towards fairness -- Key: YARN-2214 URL: https://issues.apache.org/jira/browse/YARN-2214 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.5.0 Reporter: Ashwin Shankar Assignee: Ashwin Shankar Fix For: 2.6.0 Attachments: YARN-2214-v1.txt, YARN-2214-v2.txt preemptContainerPreCheck() in FSParentQueue rejects preemption requests if the parent queue is below fair share. This can cause a delay in converging towards fairness when the starved leaf queue and the queue above fairshare belong under a non-root parent queue(ie their least common ancestor is a parent queue which is not root). Here is an example : root.parent has fair share = 80% and usage = 80% root.parent.child1 has fair share =40% usage = 80% root.parent.child2 has fair share=40% usage=0% Now a job is submitted to child2 and the demand is 40%. Preemption will kick in and try to reclaim all the 40% from child1. When it preempts the first container from child1,the usage of root.parent will become 80%, which is less than root.parent's fair share,causing preemption to stop.So only one container gets preempted in this round although the need is a lot more. child2 would eventually get to half its fair share but only after multiple rounds of preemption. Solution is to remove preemptContainerPreCheck() in FSParentQueue and keep it only in FSLeafQueue(which is already there). -- This message was sent by Atlassian JIRA (v6.2#6252)
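To make the round-by-round behavior in the example above concrete, here is a self-contained sketch. It is not the FairScheduler code: the percentages follow the description (root.parent at 80% fair share and 80% usage, child1 at 80% usage against a 40% fair share, child2 starved with 40% demand), and the 5%-per-container size is an assumption just to give the loop something to count. With the parent-level pre-check in place, only the first preemption goes through before the parent drops below its fair share and the round stops.
{code:title=PreemptionPreCheckSketch.java|borderStyle=solid}
public class PreemptionPreCheckSketch {
    public static void main(String[] args) {
        // Percent of cluster resources, taken from the example in the description.
        double parentFairShare = 80, parentUsage = 80;
        double child1FairShare = 40, child1Usage = 80;
        double child2Demand = 40;
        double containerSize = 5;  // assumed: each container is 5% of the cluster

        int preempted = 0;
        while (child2Demand > 0) {
            // Parent-level pre-check (what this JIRA removes): stop preempting once
            // the parent queue has fallen below its own fair share.
            if (parentUsage < parentFairShare) {
                break;
            }
            // Leaf-level pre-check (kept by the fix): only take from a child that is
            // above its own fair share.
            if (child1Usage <= child1FairShare) {
                break;
            }
            child1Usage -= containerSize;
            parentUsage -= containerSize;
            child2Demand -= containerSize;
            preempted++;
        }
        // Prints 1: the parent-level check halts the round after a single container.
        System.out.println("containers preempted this round: " + preempted);
    }
}
{code}
Dropping the first check (the parent-level one) lets the loop keep preempting until child1 is back at its fair share, which is the faster convergence the issue asks for.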
[jira] [Commented] (YARN-1726) ResourceSchedulerWrapper broken due to AbstractYarnScheduler
[ https://issues.apache.org/jira/browse/YARN-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075341#comment-14075341 ] Hudson commented on YARN-1726: -- FAILURE: Integrated in Hadoop-Yarn-trunk #624 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/624/]) YARN-1726. Add missing files. ResourceSchedulerWrapper broken due to AbstractYarnScheduler. (Wei Yan via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1613552) * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/test/java/org/apache/hadoop/yarn/sls/appmaster * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/test/java/org/apache/hadoop/yarn/sls/appmaster/TestAMSimulator.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/test/java/org/apache/hadoop/yarn/sls/nodemanager * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/test/java/org/apache/hadoop/yarn/sls/nodemanager/TestNMSimulator.java YARN-1726. ResourceSchedulerWrapper broken due to AbstractYarnScheduler. (Wei Yan via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1613547) * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/AMSimulator.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/MRAMSimulator.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NMSimulator.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/ResourceSchedulerWrapper.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSCapacityScheduler.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/TaskRunner.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/test/java/org/apache/hadoop/yarn/sls/TestSLSRunner.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt ResourceSchedulerWrapper broken due to AbstractYarnScheduler Key: YARN-1726 URL: https://issues.apache.org/jira/browse/YARN-1726 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.1 Reporter: Wei Yan Assignee: Wei Yan Priority: Blocker Fix For: 2.5.0 Attachments: YARN-1726-5.patch, YARN-1726-6-branch2.patch, YARN-1726-6.patch, YARN-1726-7-branch2.patch, YARN-1726-7.patch, YARN-1726.patch, YARN-1726.patch, YARN-1726.patch, YARN-1726.patch The YARN scheduler simulator failed when running Fair Scheduler, due to AbstractYarnScheduler introduced in YARN-1041. The ResourceSchedulerWrapper should inherit AbstractYarnScheduler, instead of implementing ResourceScheduler interface directly. -- This message was sent by Atlassian JIRA (v6.2#6252)
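The failure mode behind this issue can be sketched with simplified stand-ins; these are not the real YARN classes, and recoverContainers is just an illustrative placeholder for logic that lives in the base class. Once ResourceManager code starts treating every configured scheduler as an AbstractYarnScheduler, a wrapper that only implements the ResourceScheduler interface fails at the cast, which is why the description says the wrapper should extend the abstract class instead.
{code:title=SchedulerWrapperSketch.java|borderStyle=solid}
public class SchedulerWrapperSketch {

    interface ResourceScheduler {
        void handleNodeUpdate();
    }

    // Stand-in for the base class introduced by YARN-1041.
    static abstract class AbstractYarnScheduler implements ResourceScheduler {
        void recoverContainers() { /* shared logic the RM now relies on */ }
    }

    // Broken shape: implements the interface directly, as the old wrapper did.
    static class InterfaceOnlyWrapper implements ResourceScheduler {
        public void handleNodeUpdate() { }
    }

    // Fixed shape: inherits the abstract base class.
    static class InheritingWrapper extends AbstractYarnScheduler {
        public void handleNodeUpdate() { }
    }

    public static void main(String[] args) {
        ResourceScheduler fixed = new InheritingWrapper();
        ((AbstractYarnScheduler) fixed).recoverContainers();   // works

        ResourceScheduler broken = new InterfaceOnlyWrapper();
        ((AbstractYarnScheduler) broken).recoverContainers();  // ClassCastException: the simulator's failure mode
    }
}
{code}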
[jira] [Commented] (YARN-2335) Annotate all hadoop-sls APIs as @Private
[ https://issues.apache.org/jira/browse/YARN-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075385#comment-14075385 ] Hudson commented on YARN-2335: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1816 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1816/]) YARN-2335. Annotate all hadoop-sls APIs as @Private. (Wei Yan via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1613478) * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/RumenToSLSConverter.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/SLSRunner.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/AMSimulator.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/MRAMSimulator.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/conf/SLSConfiguration.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NMSimulator.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NodeInfo.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/CapacitySchedulerMetrics.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/ContainerSimulator.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/FairSchedulerMetrics.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/FifoSchedulerMetrics.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/NodeUpdateSchedulerEventWrapper.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/RMNodeWrapper.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/ResourceSchedulerWrapper.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSCapacityScheduler.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SchedulerMetrics.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SchedulerWrapper.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/TaskRunner.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/utils/SLSUtils.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/web/SLSWebApp.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt Annotate all hadoop-sls APIs as @Private Key: YARN-2335 URL: https://issues.apache.org/jira/browse/YARN-2335 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Priority: Minor Fix For: 2.5.0 Attachments: YARN-2335-1.branch2.patch, YARN-2335-1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2214) FairScheduler: preemptContainerPreCheck() in FSParentQueue delays convergence towards fairness
[ https://issues.apache.org/jira/browse/YARN-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075389#comment-14075389 ] Hudson commented on YARN-2214: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1816 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1816/]) YARN-2214. FairScheduler: preemptContainerPreCheck() in FSParentQueue delays convergence towards fairness. (Ashwin Shankar via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1613459) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java FairScheduler: preemptContainerPreCheck() in FSParentQueue delays convergence towards fairness -- Key: YARN-2214 URL: https://issues.apache.org/jira/browse/YARN-2214 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.5.0 Reporter: Ashwin Shankar Assignee: Ashwin Shankar Fix For: 2.6.0 Attachments: YARN-2214-v1.txt, YARN-2214-v2.txt preemptContainerPreCheck() in FSParentQueue rejects preemption requests if the parent queue is below fair share. This can cause a delay in converging towards fairness when the starved leaf queue and the queue above fairshare belong under a non-root parent queue(ie their least common ancestor is a parent queue which is not root). Here is an example : root.parent has fair share = 80% and usage = 80% root.parent.child1 has fair share =40% usage = 80% root.parent.child2 has fair share=40% usage=0% Now a job is submitted to child2 and the demand is 40%. Preemption will kick in and try to reclaim all the 40% from child1. When it preempts the first container from child1,the usage of root.parent will become 80%, which is less than root.parent's fair share,causing preemption to stop.So only one container gets preempted in this round although the need is a lot more. child2 would eventually get to half its fair share but only after multiple rounds of preemption. Solution is to remove preemptContainerPreCheck() in FSParentQueue and keep it only in FSLeafQueue(which is already there). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1726) ResourceSchedulerWrapper broken due to AbstractYarnScheduler
[ https://issues.apache.org/jira/browse/YARN-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075388#comment-14075388 ] Hudson commented on YARN-1726: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1816 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1816/]) YARN-1726. Add missing files. ResourceSchedulerWrapper broken due to AbstractYarnScheduler. (Wei Yan via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1613552) * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/test/java/org/apache/hadoop/yarn/sls/appmaster * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/test/java/org/apache/hadoop/yarn/sls/appmaster/TestAMSimulator.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/test/java/org/apache/hadoop/yarn/sls/nodemanager * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/test/java/org/apache/hadoop/yarn/sls/nodemanager/TestNMSimulator.java YARN-1726. ResourceSchedulerWrapper broken due to AbstractYarnScheduler. (Wei Yan via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1613547) * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/AMSimulator.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/MRAMSimulator.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NMSimulator.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/ResourceSchedulerWrapper.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSCapacityScheduler.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/TaskRunner.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/test/java/org/apache/hadoop/yarn/sls/TestSLSRunner.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt ResourceSchedulerWrapper broken due to AbstractYarnScheduler Key: YARN-1726 URL: https://issues.apache.org/jira/browse/YARN-1726 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.1 Reporter: Wei Yan Assignee: Wei Yan Priority: Blocker Fix For: 2.5.0 Attachments: YARN-1726-5.patch, YARN-1726-6-branch2.patch, YARN-1726-6.patch, YARN-1726-7-branch2.patch, YARN-1726-7.patch, YARN-1726.patch, YARN-1726.patch, YARN-1726.patch, YARN-1726.patch The YARN scheduler simulator failed when running Fair Scheduler, due to AbstractYarnScheduler introduced in YARN-1041. The ResourceSchedulerWrapper should inherit AbstractYarnScheduler, instead of implementing ResourceScheduler interface directly. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2211) RMStateStore needs to save AMRMToken master key for recovery when RM restart/failover happens
[ https://issues.apache.org/jira/browse/YARN-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075391#comment-14075391 ] Hudson commented on YARN-2211: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1816 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1816/]) YARN-2211. Persist AMRMToken master key in RMStateStore for RM recovery. Contributed by Xuan Gong (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1613515) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationMasterServiceOnHA.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMSecretManagerService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/NullRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/AMRMTokenSecretManagerState.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/AMRMTokenSecretManagerStatePBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/AMRMTokenSecretManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/proto/yarn_server_resourcemanager_recovery.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestAMRMTokens.java RMStateStore needs to save AMRMToken master key for recovery when RM restart/failover happens
[jira] [Commented] (YARN-1796) container-executor shouldn't require o-r permissions
[ https://issues.apache.org/jira/browse/YARN-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075393#comment-14075393 ] Hudson commented on YARN-1796: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1816 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1816/]) YARN-1796. container-executor shouldn't require o-r permissions. Contributed by Aaron T. Myers. (atm: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1613548) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c container-executor shouldn't require o-r permissions Key: YARN-1796 URL: https://issues.apache.org/jira/browse/YARN-1796 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.4.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Priority: Minor Fix For: 2.6.0 Attachments: YARN-1796.patch The container-executor currently checks that other users don't have read permissions. This is unnecessary and runs contrary to the debian packaging policy manual. This is the analogous fix for YARN that was done for MR1 in MAPREDUCE-2103. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2335) Annotate all hadoop-sls APIs as @Private
[ https://issues.apache.org/jira/browse/YARN-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075402#comment-14075402 ] Hudson commented on YARN-2335: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1843 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1843/]) YARN-2335. Annotate all hadoop-sls APIs as @Private. (Wei Yan via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1613478) * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/RumenToSLSConverter.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/SLSRunner.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/AMSimulator.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/MRAMSimulator.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/conf/SLSConfiguration.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NMSimulator.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NodeInfo.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/CapacitySchedulerMetrics.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/ContainerSimulator.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/FairSchedulerMetrics.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/FifoSchedulerMetrics.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/NodeUpdateSchedulerEventWrapper.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/RMNodeWrapper.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/ResourceSchedulerWrapper.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSCapacityScheduler.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SchedulerMetrics.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SchedulerWrapper.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/TaskRunner.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/utils/SLSUtils.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/web/SLSWebApp.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt Annotate all hadoop-sls APIs as @Private Key: YARN-2335 URL: https://issues.apache.org/jira/browse/YARN-2335 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Priority: Minor Fix For: 2.5.0 Attachments: YARN-2335-1.branch2.patch, YARN-2335-1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1796) container-executor shouldn't require o-r permissions
[ https://issues.apache.org/jira/browse/YARN-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075410#comment-14075410 ] Hudson commented on YARN-1796: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1843 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1843/]) YARN-1796. container-executor shouldn't require o-r permissions. Contributed by Aaron T. Myers. (atm: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1613548) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c container-executor shouldn't require o-r permissions Key: YARN-1796 URL: https://issues.apache.org/jira/browse/YARN-1796 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.4.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Priority: Minor Fix For: 2.6.0 Attachments: YARN-1796.patch The container-executor currently checks that other users don't have read permissions. This is unnecessary and runs contrary to the debian packaging policy manual. This is the analogous fix for YARN that was done for MR1 in MAPREDUCE-2103. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2211) RMStateStore needs to save AMRMToken master key for recovery when RM restart/failover happens
[ https://issues.apache.org/jira/browse/YARN-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075408#comment-14075408 ] Hudson commented on YARN-2211: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1843 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1843/]) YARN-2211. Persist AMRMToken master key in RMStateStore for RM recovery. Contributed by Xuan Gong (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1613515) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationMasterServiceOnHA.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMSecretManagerService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/NullRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/AMRMTokenSecretManagerState.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/AMRMTokenSecretManagerStatePBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/AMRMTokenSecretManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/proto/yarn_server_resourcemanager_recovery.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestAMRMTokens.java RMStateStore needs to save AMRMToken master key for recovery when RM restart/failover happens
[jira] [Commented] (YARN-1726) ResourceSchedulerWrapper broken due to AbstractYarnScheduler
[ https://issues.apache.org/jira/browse/YARN-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075405#comment-14075405 ] Hudson commented on YARN-1726: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1843 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1843/]) YARN-1726. Add missing files. ResourceSchedulerWrapper broken due to AbstractYarnScheduler. (Wei Yan via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1613552) * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/test/java/org/apache/hadoop/yarn/sls/appmaster * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/test/java/org/apache/hadoop/yarn/sls/appmaster/TestAMSimulator.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/test/java/org/apache/hadoop/yarn/sls/nodemanager * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/test/java/org/apache/hadoop/yarn/sls/nodemanager/TestNMSimulator.java YARN-1726. ResourceSchedulerWrapper broken due to AbstractYarnScheduler. (Wei Yan via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1613547) * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/AMSimulator.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/MRAMSimulator.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NMSimulator.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/ResourceSchedulerWrapper.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSCapacityScheduler.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/TaskRunner.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/test/java/org/apache/hadoop/yarn/sls/TestSLSRunner.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt ResourceSchedulerWrapper broken due to AbstractYarnScheduler Key: YARN-1726 URL: https://issues.apache.org/jira/browse/YARN-1726 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.1 Reporter: Wei Yan Assignee: Wei Yan Priority: Blocker Fix For: 2.5.0 Attachments: YARN-1726-5.patch, YARN-1726-6-branch2.patch, YARN-1726-6.patch, YARN-1726-7-branch2.patch, YARN-1726-7.patch, YARN-1726.patch, YARN-1726.patch, YARN-1726.patch, YARN-1726.patch The YARN scheduler simulator failed when running Fair Scheduler, due to AbstractYarnScheduler introduced in YARN-1041. The ResourceSchedulerWrapper should inherit AbstractYarnScheduler, instead of implementing ResourceScheduler interface directly. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2214) FairScheduler: preemptContainerPreCheck() in FSParentQueue delays convergence towards fairness
[ https://issues.apache.org/jira/browse/YARN-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075406#comment-14075406 ] Hudson commented on YARN-2214: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1843 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1843/]) YARN-2214. FairScheduler: preemptContainerPreCheck() in FSParentQueue delays convergence towards fairness. (Ashwin Shankar via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1613459) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java FairScheduler: preemptContainerPreCheck() in FSParentQueue delays convergence towards fairness -- Key: YARN-2214 URL: https://issues.apache.org/jira/browse/YARN-2214 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.5.0 Reporter: Ashwin Shankar Assignee: Ashwin Shankar Fix For: 2.6.0 Attachments: YARN-2214-v1.txt, YARN-2214-v2.txt preemptContainerPreCheck() in FSParentQueue rejects preemption requests if the parent queue is below fair share. This can cause a delay in converging towards fairness when the starved leaf queue and the queue above fairshare belong under a non-root parent queue(ie their least common ancestor is a parent queue which is not root). Here is an example : root.parent has fair share = 80% and usage = 80% root.parent.child1 has fair share =40% usage = 80% root.parent.child2 has fair share=40% usage=0% Now a job is submitted to child2 and the demand is 40%. Preemption will kick in and try to reclaim all the 40% from child1. When it preempts the first container from child1,the usage of root.parent will become 80%, which is less than root.parent's fair share,causing preemption to stop.So only one container gets preempted in this round although the need is a lot more. child2 would eventually get to half its fair share but only after multiple rounds of preemption. Solution is to remove preemptContainerPreCheck() in FSParentQueue and keep it only in FSLeafQueue(which is already there). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2362) Capacity Scheduler apps with requests that exceed capacity can starve pending apps
Ram Venkatesh created YARN-2362: --- Summary: Capacity Scheduler apps with requests that exceed capacity can starve pending apps Key: YARN-2362 URL: https://issues.apache.org/jira/browse/YARN-2362 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.4.1 Reporter: Ram Venkatesh Cluster configuration: Total memory: 8GB yarn.scheduler.minimum-allocation-mb 256 yarn.scheduler.capacity.maximum-am-resource-percent 1 (100%, test only config) App 1 makes a request for 4.6 GB, succeeds, and the app transitions to the RUNNING state. It subsequently makes a request for 4.6 GB, which cannot be granted, and it waits. App 2 makes a request for 1 GB - it never receives it, so the app stays in the ACCEPTED state forever. I think this can happen in leaf queues that are near capacity. The fix is likely in LeafQueue.java assignContainers near line 861, where it returns if the assignment would exceed queue capacity, instead of checking whether requests for other active applications can be met.
{code:title=LeafQueue.java|borderStyle=solid}
// Check queue max-capacity limit
if (!assignToQueue(clusterResource, required)) {
-  return NULL_ASSIGNMENT;
+  break;
}
{code}
With this change, the scenario above allows App 2 to start and finish while App 1 continues to wait. I have a patch available, but I am wondering whether the current behavior is by design. -- This message was sent by Atlassian JIRA (v6.2#6252)
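As a rough illustration of the return-versus-break distinction above, here is a self-contained sketch. It is not the actual CapacityScheduler code; the app names, request sizes, and remainingQueueCapacityMb are made-up stand-ins. The point is that returning on the first request that exceeds the remaining queue capacity ends assignment for every later application in the queue, while breaking out of only the current application's request loop lets App 2's smaller request still be considered.
{code:title=AssignContainersSketch.java|borderStyle=solid}
public class AssignContainersSketch {
    public static void main(String[] args) {
        String[] apps = {"app1", "app2"};
        int[][] requestsMb = {{4710}, {1024}};  // app1 asks for ~4.6 GB, app2 for 1 GB
        int remainingQueueCapacityMb = 1024;    // the queue is nearly full already

        for (int i = 0; i < apps.length; i++) {
            for (int requestMb : requestsMb[i]) {
                if (requestMb > remainingQueueCapacityMb) {
                    // A 'return' here (what the current code effectively does) would also
                    // skip app2; 'break' only gives up on the rest of app1's requests.
                    break;
                }
                remainingQueueCapacityMb -= requestMb;
                System.out.println("assigned " + requestMb + " MB to " + apps[i]);
            }
        }
        // Output with 'break': assigned 1024 MB to app2
    }
}
{code}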
[jira] [Updated] (YARN-2362) Capacity Scheduler apps with requests that exceed capacity can starve pending apps
[ https://issues.apache.org/jira/browse/YARN-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ram Venkatesh updated YARN-2362: Description: Cluster configuration: Total memory: 8GB yarn.scheduler.minimum-allocation-mb 256 yarn.scheduler.capacity.maximum-am-resource-percent 1 (100%, test only config) App 1 makes a request for 4.6 GB, succeeds, app transitions to RUNNING state. It subsequently makes a request for 4.6 GB, which cannot be granted and it waits. App 2 makes a request for 1 GB - never receives it, so the app stays in the ACCEPTED state for ever. I think this can happen in leaf queues that are near capacity. The fix is likely in LeafQueue.java assignContainers near line 861, where it returns if the assignment would exceed queue capacity, instead of checking if requests for other active applications can be met. {code:title=LeafQueue.java|borderStyle=solid} // Check queue max-capacity limit if (!assignToQueue(clusterResource, required)) { -return NULL_ASSIGNMENT; +break; } {code} With this change, the scenario above allows App 2 to start and finish while App 1 continues to wait. I have a patch available, but wondering if the current behavior is by design. was: Cluster configuration: Total memory: 8GB yarn.scheduler.minimum-allocation-mb 256 yarn.scheduler.capacity.maximum-am-resource-percent 1 (100%, test only config) App 1 makes a request for 4.6 GB, succeeds, app transitions to RUNNING state. It subsequently makes a request for 4.6 GB, which cannot be granted and it waits. App 2 makes a request for 1 GB - never receives it, so the app stays in the ACCEPTED state for ever. I think this can happen in leaf queues that are near capacity. The fix is likely in LeafQueue.java assignContainers near line 861, where it returns if the assignment would exceed queue capacity, instead of checking if requests for other active applications can be met. // Check queue max-capacity limit if (!assignToQueue(clusterResource, required)) { -return NULL_ASSIGNMENT; +break; } With this change, the scenario above allows App 2 to start and finish while App 1 continues to wait. I have a patch available, but wondering if the current behavior is by design. Capacity Scheduler apps with requests that exceed capacity can starve pending apps -- Key: YARN-2362 URL: https://issues.apache.org/jira/browse/YARN-2362 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.4.1 Reporter: Ram Venkatesh Cluster configuration: Total memory: 8GB yarn.scheduler.minimum-allocation-mb 256 yarn.scheduler.capacity.maximum-am-resource-percent 1 (100%, test only config) App 1 makes a request for 4.6 GB, succeeds, app transitions to RUNNING state. It subsequently makes a request for 4.6 GB, which cannot be granted and it waits. App 2 makes a request for 1 GB - never receives it, so the app stays in the ACCEPTED state for ever. I think this can happen in leaf queues that are near capacity. The fix is likely in LeafQueue.java assignContainers near line 861, where it returns if the assignment would exceed queue capacity, instead of checking if requests for other active applications can be met. {code:title=LeafQueue.java|borderStyle=solid} // Check queue max-capacity limit if (!assignToQueue(clusterResource, required)) { -return NULL_ASSIGNMENT; +break; } {code} With this change, the scenario above allows App 2 to start and finish while App 1 continues to wait. I have a patch available, but wondering if the current behavior is by design. 
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2346) Add a 'status' command to yarn-daemon.sh
[ https://issues.apache.org/jira/browse/YARN-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075462#comment-14075462 ] Nikunj Bansal commented on YARN-2346: - HADOOP-9902 is being resolved for 3.0.0. Meanwhile, for 2.5.0 I do have a patch based on the current scripts. Add a 'status' command to yarn-daemon.sh Key: YARN-2346 URL: https://issues.apache.org/jira/browse/YARN-2346 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0, 2.3.0, 2.2.1, 2.4.0, 2.4.1 Reporter: Nikunj Bansal Assignee: Allen Wittenauer Priority: Minor Original Estimate: 24h Remaining Estimate: 24h Adding a 'status' command to yarn-daemon.sh will be useful for finding out the status of yarn daemons. Running the 'status' command should exit with a 0 exit code if the target daemon is running and a non-zero code in case it's not. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2362) Capacity Scheduler apps with requests that exceed capacity can starve pending apps
[ https://issues.apache.org/jira/browse/YARN-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075476#comment-14075476 ] Chen He commented on YARN-2362: --- This is interesting. In general, a user is unlikely to submit an application that asks for 50% of the whole cluster's resources. It is also possible for a cluster to have more than two applications: if a third application finishes, App 2 can get enough resources and run, and the deadlock breaks. Is this reasonable, [~venkateshrin]? Capacity Scheduler apps with requests that exceed capacity can starve pending apps -- Key: YARN-2362 URL: https://issues.apache.org/jira/browse/YARN-2362 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.4.1 Reporter: Ram Venkatesh Cluster configuration: Total memory: 8GB yarn.scheduler.minimum-allocation-mb 256 yarn.scheduler.capacity.maximum-am-resource-percent 1 (100%, test only config) App 1 makes a request for 4.6 GB, succeeds, app transitions to RUNNING state. It subsequently makes a request for 4.6 GB, which cannot be granted and it waits. App 2 makes a request for 1 GB - never receives it, so the app stays in the ACCEPTED state for ever. I think this can happen in leaf queues that are near capacity. The fix is likely in LeafQueue.java assignContainers near line 861, where it returns if the assignment would exceed queue capacity, instead of checking if requests for other active applications can be met. {code:title=LeafQueue.java|borderStyle=solid} // Check queue max-capacity limit if (!assignToQueue(clusterResource, required)) { -return NULL_ASSIGNMENT; +break; } {code} With this change, the scenario above allows App 2 to start and finish while App 1 continues to wait. I have a patch available, but wondering if the current behavior is by design. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2346) Add a 'status' command to yarn-daemon.sh
[ https://issues.apache.org/jira/browse/YARN-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075508#comment-14075508 ] Allen Wittenauer commented on YARN-2346: In order to make this work reliably, it's a significant refactoring of how daemons launch. All of that refactoring has already been done in HADOOP-9902. Specifically, pid handling has to get moved to the yarn, hdfs, and mapred commands from the *-daemon.sh commands. For example, if one runs 'yarn resourcemanager' it will not generate a pid file. This in turn means that if one were to modify only yarn-daemon.sh, the status subcommand will be giving incorrect information because it doesn't see a pid file. Now one could try to suss out the Java process running the RM, but that's a bit to fragile for my tastes. Another option would be to just do even more copypasta in the shell code, but that's just making bad code even worse. There's been talking of backporting HADOOP-9902 to branch-2, so it is worthwhile to take a wait and see approach, especially given the door is pretty shut on getting anything more into 2.5. Add a 'status' command to yarn-daemon.sh Key: YARN-2346 URL: https://issues.apache.org/jira/browse/YARN-2346 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0, 2.3.0, 2.2.1, 2.4.0, 2.4.1 Reporter: Nikunj Bansal Assignee: Allen Wittenauer Priority: Minor Original Estimate: 24h Remaining Estimate: 24h Adding a 'status' command to yarn-daemon.sh will be useful for finding out the status of yarn daemons. Running the 'status' command should exit with a 0 exit code if the target daemon is running and non-zero code in case its not. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (YARN-2346) Add a 'status' command to yarn-daemon.sh
[ https://issues.apache.org/jira/browse/YARN-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075508#comment-14075508 ] Allen Wittenauer edited comment on YARN-2346 at 7/26/14 11:10 PM: -- In order to make this work reliably, it's a significant refactoring of how daemons launch. All of that refactoring has already been done in HADOOP-9902. Specifically, pid handling has to get moved to the yarn, hdfs, and mapred commands from the *-daemon.sh commands. For example, if one runs 'yarn resourcemanager' it will not generate a pid file. This in turn means that if one were to modify only yarn-daemon.sh, the status subcommand will be giving incorrect information because it doesn't see a pid file. Now one could try to suss out the Java process running the RM, but that's a bit to fragile for my tastes. Another option would be to just do even more copypasta in the shell code, but that's just making bad code even worse. There's been talk of backporting HADOOP-9902 to branch-2, so it is worthwhile to take a wait and see approach, especially given the door is pretty shut on getting anything more into 2.5. was (Author: aw): In order to make this work reliably, it's a significant refactoring of how daemons launch. All of that refactoring has already been done in HADOOP-9902. Specifically, pid handling has to get moved to the yarn, hdfs, and mapred commands from the *-daemon.sh commands. For example, if one runs 'yarn resourcemanager' it will not generate a pid file. This in turn means that if one were to modify only yarn-daemon.sh, the status subcommand will be giving incorrect information because it doesn't see a pid file. Now one could try to suss out the Java process running the RM, but that's a bit to fragile for my tastes. Another option would be to just do even more copypasta in the shell code, but that's just making bad code even worse. There's been talking of backporting HADOOP-9902 to branch-2, so it is worthwhile to take a wait and see approach, especially given the door is pretty shut on getting anything more into 2.5. Add a 'status' command to yarn-daemon.sh Key: YARN-2346 URL: https://issues.apache.org/jira/browse/YARN-2346 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0, 2.3.0, 2.2.1, 2.4.0, 2.4.1 Reporter: Nikunj Bansal Assignee: Allen Wittenauer Priority: Minor Original Estimate: 24h Remaining Estimate: 24h Adding a 'status' command to yarn-daemon.sh will be useful for finding out the status of yarn daemons. Running the 'status' command should exit with a 0 exit code if the target daemon is running and non-zero code in case its not. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2347) Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in yarn-server-common
[ https://issues.apache.org/jira/browse/YARN-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075518#comment-14075518 ] Junping Du commented on YARN-2347: -- Thanks for the review and comments, [~zjshen]! Nice catch on the javadoc issue; I will fix it soon. As for the name of this generic version, I don't have a strong preference either way. YarnVersion seems a little misleading, as we already have the yarn version command line to list the version of YARN. Version sounds too generic and is easily duplicated (we had a writable object with the same name in Common). In practice, this version is used for RMState, NMState, the ShuffleHandler's state, etc. Seen that way, does StateVersion still sound so odd to you? Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in yarn-server-common Key: YARN-2347 URL: https://issues.apache.org/jira/browse/YARN-2347 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Junping Du Attachments: YARN-2347-v2.patch, YARN-2347-v3.patch, YARN-2347.patch We have similar things for version state for RM, NM, TS (TimelineServer), etc. I think we should consolidate them into a common object. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2347) Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in yarn-server-common
[ https://issues.apache.org/jira/browse/YARN-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075529#comment-14075529 ] Zhijie Shen commented on YARN-2347: --- bq. In practice, this version is used for RMState, NMState, the ShuffleHandler's state, etc. Seen that way, does StateVersion still sound so odd to you? IMHO, the version belongs to the stores; it just happens that the stores are storing state information, so it is more accurate to say the version is of the storage schema. On the other hand, the timeline server is stateless, but it will still use this version stack, and StateVersion may lead users to think it is stateful. If StateVersion is only going to be used for the storage layer, something like StoreVersion sounds better to me. Conversely, if it is going to be used to annotate other things, such as RPC interfaces, a more generalized name makes sense. Anyway, it's not a critical problem, and I'm not strongly attached to renaming it. However, it reminds me of another issue: it may be better to add more javadoc to StateVersion so users know what it is really about. Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in yarn-server-common Key: YARN-2347 URL: https://issues.apache.org/jira/browse/YARN-2347 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Junping Du Attachments: YARN-2347-v2.patch, YARN-2347-v3.patch, YARN-2347.patch We have similar things for version state for RM, NM, TS (TimelineServer), etc. I think we should consolidate them into a common object. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2359) Application is hung without timeout and retry after DNS/network is down.
[ https://issues.apache.org/jira/browse/YARN-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2359: Attachment: YARN-2359.001.patch Application is hung without timeout and retry after DNS/network is down. - Key: YARN-2359 URL: https://issues.apache.org/jira/browse/YARN-2359 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Attachments: YARN-2359.000.patch, YARN-2359.001.patch The application hangs without timeout or retry after the DNS/network goes down. This happens when, right after the container is allocated for the AM, the DNS/network goes down on the node that holds the AM container. The application attempt is in state RMAppAttemptState.SCHEDULED; it receives a RMAppAttemptEventType.CONTAINER_ALLOCATED event, but because an IllegalArgumentException (due to the DNS error) occurs, it stays in RMAppAttemptState.SCHEDULED. In the state machine, only two events are processed in this state: RMAppAttemptEventType.CONTAINER_ALLOCATED and RMAppAttemptEventType.KILL. The code does not handle the RMAppAttemptEventType.CONTAINER_FINISHED event, which is generated when the node and container time out. So even after the node is removed, the application is still stuck in RMAppAttemptState.SCHEDULED. The only way to make the application leave this state is to send a RMAppAttemptEventType.KILL event, which is only generated when the application is killed manually from the Job Client via forceKillApplication. To fix the issue, we should add an entry to the state machine table to handle the RMAppAttemptEventType.CONTAINER_FINISHED event in state RMAppAttemptState.SCHEDULED, by adding the following code in the StateMachineFactory: {code}.addTransition(RMAppAttemptState.SCHEDULED, RMAppAttemptState.FINAL_SAVING, RMAppAttemptEventType.CONTAINER_FINISHED, new FinalSavingTransition( new AMContainerCrashedBeforeRunningTransition(), RMAppAttemptState.FAILED)){code} -- This message was sent by Atlassian JIRA (v6.2#6252)
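To make the placement of that entry concrete, a trimmed and approximate view of the SCHEDULED-state transitions in RMAppAttemptImpl's StateMachineFactory is sketched below; the surrounding transitions are elided and the existing lines shown may differ slightly between versions.
{code:title=Illustrative placement in the RMAppAttemptImpl transition table (trimmed, approximate)|borderStyle=solid}
// Existing transition: CONTAINER_ALLOCATED either moves the attempt forward
// or leaves it in SCHEDULED (the case described above, when the allocation
// cannot be used because of the DNS error).
.addTransition(RMAppAttemptState.SCHEDULED,
    EnumSet.of(RMAppAttemptState.ALLOCATED_SAVING, RMAppAttemptState.SCHEDULED),
    RMAppAttemptEventType.CONTAINER_ALLOCATED,
    new AMContainerAllocatedTransition())
// Proposed new entry: CONTAINER_FINISHED (e.g. the node holding the AM
// container expires) fails the attempt instead of being dropped, so the
// attempt can no longer hang in SCHEDULED.
.addTransition(RMAppAttemptState.SCHEDULED,
    RMAppAttemptState.FINAL_SAVING,
    RMAppAttemptEventType.CONTAINER_FINISHED,
    new FinalSavingTransition(
        new AMContainerCrashedBeforeRunningTransition(),
        RMAppAttemptState.FAILED))
{code}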
[jira] [Commented] (YARN-2359) Application is hung without timeout and retry after DNS/network is down.
[ https://issues.apache.org/jira/browse/YARN-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075531#comment-14075531 ] zhihai xu commented on YARN-2359: - I just added a unit test case (testAMCrashAtScheduled) in the patch to verify this state transition in RMAppAttempt state machine. Application is hung without timeout and retry after DNS/network is down. - Key: YARN-2359 URL: https://issues.apache.org/jira/browse/YARN-2359 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Attachments: YARN-2359.000.patch, YARN-2359.001.patch Application is hung without timeout and retry after DNS/network is down. It is because right after the container is allocated for the AM, the DNS/network is down for the node which has the AM container. The application attempt is at state RMAppAttemptState.SCHEDULED, it receive RMAppAttemptEventType.CONTAINER_ALLOCATED event, because the IllegalArgumentException(due to DNS error) happened, it stay at state RMAppAttemptState.SCHEDULED. In the state machine, only two events will be processed at this state: RMAppAttemptEventType.CONTAINER_ALLOCATED and RMAppAttemptEventType.KILL. The code didn't handle the event(RMAppAttemptEventType.CONTAINER_FINISHED) which will be generated when the node and container timeout. So even the node is removed, the Application is still hung in this state RMAppAttemptState.SCHEDULED. The only way to make the application exit this state is to send RMAppAttemptEventType.KILL event which will only be generated when you manually kill the application from Job Client by forceKillApplication. To fix the issue, we should add an entry in the state machine table to handle RMAppAttemptEventType.CONTAINER_FINISHED event at state RMAppAttemptState.SCHEDULED add the following code in StateMachineFactory: {code}.addTransition(RMAppAttemptState.SCHEDULED, RMAppAttemptState.FINAL_SAVING, RMAppAttemptEventType.CONTAINER_FINISHED, new FinalSavingTransition( new AMContainerCrashedBeforeRunningTransition(), RMAppAttemptState.FAILED)){code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2359) Application is hung without timeout and retry after DNS/network is down.
[ https://issues.apache.org/jira/browse/YARN-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075537#comment-14075537 ] Hadoop QA commented on YARN-2359: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658009/YARN-2359.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4448//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4448//console This message is automatically generated. Application is hung without timeout and retry after DNS/network is down. - Key: YARN-2359 URL: https://issues.apache.org/jira/browse/YARN-2359 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Attachments: YARN-2359.000.patch, YARN-2359.001.patch Application is hung without timeout and retry after DNS/network is down. It is because right after the container is allocated for the AM, the DNS/network is down for the node which has the AM container. The application attempt is at state RMAppAttemptState.SCHEDULED, it receive RMAppAttemptEventType.CONTAINER_ALLOCATED event, because the IllegalArgumentException(due to DNS error) happened, it stay at state RMAppAttemptState.SCHEDULED. In the state machine, only two events will be processed at this state: RMAppAttemptEventType.CONTAINER_ALLOCATED and RMAppAttemptEventType.KILL. The code didn't handle the event(RMAppAttemptEventType.CONTAINER_FINISHED) which will be generated when the node and container timeout. So even the node is removed, the Application is still hung in this state RMAppAttemptState.SCHEDULED. The only way to make the application exit this state is to send RMAppAttemptEventType.KILL event which will only be generated when you manually kill the application from Job Client by forceKillApplication. To fix the issue, we should add an entry in the state machine table to handle RMAppAttemptEventType.CONTAINER_FINISHED event at state RMAppAttemptState.SCHEDULED add the following code in StateMachineFactory: {code}.addTransition(RMAppAttemptState.SCHEDULED, RMAppAttemptState.FINAL_SAVING, RMAppAttemptEventType.CONTAINER_FINISHED, new FinalSavingTransition( new AMContainerCrashedBeforeRunningTransition(), RMAppAttemptState.FAILED)){code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2347) Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in yarn-server-common
[ https://issues.apache.org/jira/browse/YARN-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075538#comment-14075538 ] Junping Du commented on YARN-2347: -- bq. On the other hand, the timeline server is stateless, but it will still use this version stack, and StateVersion may lead users to think it is stateful. If StateVersion is only going to be used for the storage layer, something like StoreVersion sounds better to me. That's a good point. Could we think of it as versioning the application state that is stored in the timeline store? If that still doesn't seem reasonable, let's go back to Version. The problem with StoreVersion is that it sounds like a version of the store implementation itself, for example v1 for LevelDB, v2 for something else (HBase), etc. What do you think? bq. it may be better to add more javadoc to StateVersion so users know what it is really about. Also a good point. Will fix it soon. Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in yarn-server-common Key: YARN-2347 URL: https://issues.apache.org/jira/browse/YARN-2347 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Junping Du Attachments: YARN-2347-v2.patch, YARN-2347-v3.patch, YARN-2347.patch We have similar things for version state for RM, NM, TS (TimelineServer), etc. I think we should consolidate them into a common object. -- This message was sent by Atlassian JIRA (v6.2#6252)
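Independently of the final name, a minimal sketch of the consolidated object under discussion might look like the following; the class shape, the compatibility rule, and the plain-Java form are illustrative assumptions rather than the contents of the attached patches (which may, for instance, use a protobuf-backed record).
{code:title=Illustrative sketch of a consolidated version object (not the actual patch)|borderStyle=solid}
/**
 * Hypothetical shape of a shared major/minor version record for the RM, NM
 * and timeline-server state/schema stores; names and semantics are
 * illustrative only.
 */
public abstract class StateVersion {

  public abstract int getMajorVersion();

  public abstract int getMinorVersion();

  /**
   * One plausible compatibility rule: stored data is loadable as long as the
   * major versions match, while a differing minor version is tolerated.
   */
  public boolean isCompatibleTo(StateVersion other) {
    return getMajorVersion() == other.getMajorVersion();
  }

  @Override
  public String toString() {
    return getMajorVersion() + "." + getMinorVersion();
  }
}
{code}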
[jira] [Commented] (YARN-2362) Capacity Scheduler apps with requests that exceed capacity can starve pending apps
[ https://issues.apache.org/jira/browse/YARN-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075554#comment-14075554 ] Ram Venkatesh commented on YARN-2362: - I agree that apps that need the entire cluster capacity are likely not common. However, I think the scenario above can happen in busy clusters where an app might make a request that exceeds _current_ capacity and hence block all other apps. Yes, whenever more resources get freed up and App1's request is satisfied, only then will App2 run. Note, since we are enumerating the set of active apps, the behavior is actually non-deterministic - if the new app happens to be enumerated before the large app, the allocation request will actually be satisfied. The change proposed here makes it deterministic and can also reduce the wait for jobs that can complete - the downside of course is the large app can now experience starvation if small apps keep getting through. Capacity Scheduler apps with requests that exceed capacity can starve pending apps -- Key: YARN-2362 URL: https://issues.apache.org/jira/browse/YARN-2362 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.4.1 Reporter: Ram Venkatesh Cluster configuration: Total memory: 8GB yarn.scheduler.minimum-allocation-mb 256 yarn.scheduler.capacity.maximum-am-resource-percent 1 (100%, test only config) App 1 makes a request for 4.6 GB, succeeds, app transitions to RUNNING state. It subsequently makes a request for 4.6 GB, which cannot be granted and it waits. App 2 makes a request for 1 GB - never receives it, so the app stays in the ACCEPTED state for ever. I think this can happen in leaf queues that are near capacity. The fix is likely in LeafQueue.java assignContainers near line 861, where it returns if the assignment would exceed queue capacity, instead of checking if requests for other active applications can be met. {code:title=LeafQueue.java|borderStyle=solid} // Check queue max-capacity limit if (!assignToQueue(clusterResource, required)) { -return NULL_ASSIGNMENT; +break; } {code} With this change, the scenario above allows App 2 to start and finish while App 1 continues to wait. I have a patch available, but wondering if the current behavior is by design. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2362) Capacity Scheduler: apps with requests that exceed capacity can starve pending apps
[ https://issues.apache.org/jira/browse/YARN-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ram Venkatesh updated YARN-2362: Summary: Capacity Scheduler: apps with requests that exceed capacity can starve pending apps (was: Capacity Scheduler apps with requests that exceed capacity can starve pending apps) Capacity Scheduler: apps with requests that exceed capacity can starve pending apps --- Key: YARN-2362 URL: https://issues.apache.org/jira/browse/YARN-2362 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.4.1 Reporter: Ram Venkatesh Cluster configuration: Total memory: 8GB yarn.scheduler.minimum-allocation-mb 256 yarn.scheduler.capacity.maximum-am-resource-percent 1 (100%, test only config) App 1 makes a request for 4.6 GB, succeeds, app transitions to RUNNING state. It subsequently makes a request for 4.6 GB, which cannot be granted and it waits. App 2 makes a request for 1 GB - never receives it, so the app stays in the ACCEPTED state for ever. I think this can happen in leaf queues that are near capacity. The fix is likely in LeafQueue.java assignContainers near line 861, where it returns if the assignment would exceed queue capacity, instead of checking if requests for other active applications can be met. {code:title=LeafQueue.java|borderStyle=solid} // Check queue max-capacity limit if (!assignToQueue(clusterResource, required)) { -return NULL_ASSIGNMENT; +break; } {code} With this change, the scenario above allows App 2 to start and finish while App 1 continues to wait. I have a patch available, but wondering if the current behavior is by design. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2362) Capacity Scheduler: apps with requests that exceed current capacity can starve pending apps
[ https://issues.apache.org/jira/browse/YARN-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ram Venkatesh updated YARN-2362: Summary: Capacity Scheduler: apps with requests that exceed current capacity can starve pending apps (was: Capacity Scheduler: apps with requests that exceed capacity can starve pending apps) Capacity Scheduler: apps with requests that exceed current capacity can starve pending apps --- Key: YARN-2362 URL: https://issues.apache.org/jira/browse/YARN-2362 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.4.1 Reporter: Ram Venkatesh Cluster configuration: Total memory: 8GB yarn.scheduler.minimum-allocation-mb 256 yarn.scheduler.capacity.maximum-am-resource-percent 1 (100%, test only config) App 1 makes a request for 4.6 GB, succeeds, app transitions to RUNNING state. It subsequently makes a request for 4.6 GB, which cannot be granted and it waits. App 2 makes a request for 1 GB - never receives it, so the app stays in the ACCEPTED state for ever. I think this can happen in leaf queues that are near capacity. The fix is likely in LeafQueue.java assignContainers near line 861, where it returns if the assignment would exceed queue capacity, instead of checking if requests for other active applications can be met. {code:title=LeafQueue.java|borderStyle=solid} // Check queue max-capacity limit if (!assignToQueue(clusterResource, required)) { -return NULL_ASSIGNMENT; +break; } {code} With this change, the scenario above allows App 2 to start and finish while App 1 continues to wait. I have a patch available, but wondering if the current behavior is by design. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2347) Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in yarn-server-common
[ https://issues.apache.org/jira/browse/YARN-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-2347: - Attachment: YARN-2347-v4.patch Update patch in v4 as [~zjshen]'s comments. Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in yarn-server-common Key: YARN-2347 URL: https://issues.apache.org/jira/browse/YARN-2347 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Junping Du Attachments: YARN-2347-v2.patch, YARN-2347-v3.patch, YARN-2347-v4.patch, YARN-2347.patch We have similar things for version state for RM, NM, TS (TimelineServer), etc. I think we should consolidate them into a common object. -- This message was sent by Atlassian JIRA (v6.2#6252)