[jira] [Updated] (YARN-2807) Option --forceactive does not work as described in usage of yarn rmadmin -transitionToActive
[ https://issues.apache.org/jira/browse/YARN-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated YARN-2807: --- Attachment: YARN-2807.2.patch Thanks for the comment [~ajisakaa]. I rethought this and left --forcemanual undocumented in the usage of HAAdmin, as intended, because it is dangerous, especially for HDFS. I just added a description of --forcemanual to the YARN site documentation. Option --forceactive does not work as described in usage of yarn rmadmin -transitionToActive Key: YARN-2807 URL: https://issues.apache.org/jira/browse/YARN-2807 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Priority: Critical Attachments: YARN-2807.1.patch, YARN-2807.2.patch Currently the help message of yarn rmadmin -transitionToActive is: {code} transitionToActive: incorrect number of arguments Usage: HAAdmin [-transitionToActive serviceId [--forceactive]] {code} But --forceactive does not work as expected. When transitioning the RM state with --forceactive: {code} yarn rmadmin -transitionToActive rm2 --forceactive Automatic failover is enabled for org.apache.hadoop.yarn.client.RMHAServiceTarget@64c9f31e Refusing to manually manage HA state, since it may cause a split-brain scenario or other incorrect state. If you are very sure you know what you are doing, please specify the forcemanual flag. {code} As shown above, we still cannot transitionToActive when automatic failover is enabled, even with --forceactive. The option that does work is {{--forcemanual}}, yet nothing in the usage describes it. I think we should fix this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
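For illustration, a hedged sketch of the invocations involved. Only the refused --forceactive command and the hint to use the forcemanual flag appear in the report above; the placement of --forcemanual on the command line is an assumption and may differ by version.
{code}
# refused when automatic failover is enabled (output quoted in the description)
yarn rmadmin -transitionToActive rm2 --forceactive

# hypothetical invocation using the flag the error message asks for;
# --forcemanual is deliberately left out of the printed usage
yarn rmadmin -transitionToActive rm2 --forcemanual
{code}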
[jira] [Updated] (YARN-2807) Option --forceactive does not work as described in usage of yarn rmadmin -transitionToActive
[ https://issues.apache.org/jira/browse/YARN-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated YARN-2807: --- Priority: Minor (was: Critical) Option --forceactive does not work as described in usage of yarn rmadmin -transitionToActive Key: YARN-2807 URL: https://issues.apache.org/jira/browse/YARN-2807 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Priority: Minor Attachments: YARN-2807.1.patch, YARN-2807.2.patch Currently the help message of yarn rmadmin -transitionToActive is: {code} transitionToActive: incorrect number of arguments Usage: HAAdmin [-transitionToActive serviceId [--forceactive]] {code} But --forceactive does not work as expected. When transitioning the RM state with --forceactive: {code} yarn rmadmin -transitionToActive rm2 --forceactive Automatic failover is enabled for org.apache.hadoop.yarn.client.RMHAServiceTarget@64c9f31e Refusing to manually manage HA state, since it may cause a split-brain scenario or other incorrect state. If you are very sure you know what you are doing, please specify the forcemanual flag. {code} As shown above, we still cannot transitionToActive when automatic failover is enabled, even with --forceactive. The option that does work is {{--forcemanual}}, yet nothing in the usage describes it. I think we should fix this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2427) Add support for moving apps between queues in RM web services
[ https://issues.apache.org/jira/browse/YARN-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265860#comment-14265860 ] Hadoop QA commented on YARN-2427: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12690283/apache-yarn-2427.4.patch against trunk revision 4cd66f7. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6251//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6251//console This message is automatically generated. Add support for moving apps between queues in RM web services - Key: YARN-2427 URL: https://issues.apache.org/jira/browse/YARN-2427 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2427.0.patch, apache-yarn-2427.1.patch, apache-yarn-2427.2.patch, apache-yarn-2427.3.patch, apache-yarn-2427.4.patch Support for moving apps from one queue to another is now present in CapacityScheduler and FairScheduler. We should expose the functionality via RM web services as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2997) NM keeps sending finished containers to RM until app is finished
[ https://issues.apache.org/jira/browse/YARN-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265865#comment-14265865 ] Hadoop QA commented on YARN-2997: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12690286/YARN-2997.4.patch against trunk revision 4cd66f7. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6252//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6252//console This message is automatically generated. NM keeps sending finished containers to RM until app is finished Key: YARN-2997 URL: https://issues.apache.org/jira/browse/YARN-2997 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Chengbing Liu Assignee: Chengbing Liu Attachments: YARN-2997.2.patch, YARN-2997.3.patch, YARN-2997.4.patch, YARN-2997.patch We have seen in RM log a lot of {quote} INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... {quote} It is caused by NM sending completed containers repeatedly until the app is finished. On the RM side, the container is already released, hence {{getRMContainer}} returns null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2997) NM keeps sending finished containers to RM until app is finished
[ https://issues.apache.org/jira/browse/YARN-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengbing Liu updated YARN-2997: Attachment: YARN-2997.4.patch Updated patch. The testing-only method is removed. {{pendingCompletedContainers.clear()}} is added at the end of {{removeOrTrackCompletedContainersFromContext}}, and also in the RESYNC section, to clear the cache so that these outdated container statuses will not be reported to the restarted RM. NM keeps sending finished containers to RM until app is finished Key: YARN-2997 URL: https://issues.apache.org/jira/browse/YARN-2997 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Chengbing Liu Assignee: Chengbing Liu Attachments: YARN-2997.2.patch, YARN-2997.3.patch, YARN-2997.4.patch, YARN-2997.patch We have seen a lot of the following in the RM log: {quote} INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... {quote} It is caused by the NM sending completed containers repeatedly until the app is finished. On the RM side, the container has already been released, hence {{getRMContainer}} returns null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
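As a rough illustration of the bookkeeping described in the comment above, here is a minimal sketch. It is not the actual NodeStatusUpdater code; the class, method signatures, and types are assumptions, and only the {{pendingCompletedContainers}} cache and the points at which it is cleared come from the comment.
{code}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Illustrative sketch only: completed containers are cached in
 * pendingCompletedContainers so that repeated heartbeats do not keep
 * re-sending the same finished containers, and the cache is cleared once the
 * containers have been reported (and on RESYNC, so stale statuses never reach
 * a restarted ResourceManager).
 */
public class CompletedContainerReporter {

  // completed containers already handed to the RM, keyed by container id
  private final Map<String, String> pendingCompletedContainers = new HashMap<>();

  /** Pick the statuses to send in this heartbeat, skipping ones already sent. */
  List<String> removeOrTrackCompletedContainersFromContext(Map<String, String> completedInContext) {
    List<String> toReport = new ArrayList<>();
    for (Map.Entry<String, String> e : completedInContext.entrySet()) {
      if (!pendingCompletedContainers.containsKey(e.getKey())) {
        pendingCompletedContainers.put(e.getKey(), e.getValue());
        toReport.add(e.getValue());
      }
    }
    return toReport;
  }

  /** Called once the reported containers have been removed from the NM context. */
  void clearPending() {
    // the patch clears the cache at the end of
    // removeOrTrackCompletedContainersFromContext for the same reason
    pendingCompletedContainers.clear();
  }

  /** On RESYNC the cache is cleared so outdated statuses are not re-sent. */
  void onResync() {
    pendingCompletedContainers.clear();
  }
}
{code}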
[jira] [Commented] (YARN-2996) Refine some fs operations in FileSystemRMStateStore to improve performance
[ https://issues.apache.org/jira/browse/YARN-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265835#comment-14265835 ] Yi Liu commented on YARN-2996: -- The test failure and the findbugs warning are not related. Refine some fs operations in FileSystemRMStateStore to improve performance -- Key: YARN-2996 URL: https://issues.apache.org/jira/browse/YARN-2996 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Yi Liu Assignee: Yi Liu Attachments: YARN-2996.001.patch, YARN-2996.002.patch In {{FileSystemRMStateStore}}, we can refine some fs operations to improve performance: *1.* Several places invoke {{fs.exists}} and then {{fs.getFileStatus}}; we can merge them to save one RPC call: {code} if (fs.exists(versionNodePath)) { FileStatus status = fs.getFileStatus(versionNodePath); {code} *2.* {code} protected void updateFile(Path outputPath, byte[] data) throws Exception { Path newPath = new Path(outputPath.getParent(), outputPath.getName() + ".new"); // use writeFile to make sure .new file is created atomically writeFile(newPath, data); replaceFile(newPath, outputPath); } {code} The {{updateFile}} method is not ideal either: it writes the file to _output\_file_.tmp, renames it to _output\_file_.new, and then renames that to _output\_file_; we can eliminate one rename operation. Also, there is one unnecessary import that we can remove. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
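A minimal sketch of the first refinement, assuming a helper method of our own naming; it only shows the standard FileSystem idiom of calling {{getFileStatus}} once and treating {{FileNotFoundException}} as a missing node, instead of issuing a separate {{fs.exists}} RPC first.
{code}
import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class VersionNodeReader {
  /**
   * Sketch: instead of fs.exists() followed by fs.getFileStatus() (two RPCs),
   * call getFileStatus() once and treat FileNotFoundException as
   * "the node does not exist".
   */
  static FileStatus getFileStatusIfExists(FileSystem fs, Path versionNodePath)
      throws IOException {
    try {
      return fs.getFileStatus(versionNodePath);   // single RPC
    } catch (FileNotFoundException e) {
      return null;                                // replaces the exists() check
    }
  }
}
{code}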
[jira] [Updated] (YARN-3010) Fix recent findbug issue in AbstractYarnScheduler
[ https://issues.apache.org/jira/browse/YARN-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated YARN-3010: - Description: A new findbugs issue was reported recently in the latest trunk: {quote} IS Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext; locked 91% of time {quote} https://issues.apache.org/jira/browse/YARN-2996?focusedCommentId=14265760page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14265760 https://builds.apache.org/job/PreCommit-YARN-Build/6249//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html was: A new findbugs issue was reported recently in the latest trunk: {quote} IS Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext; locked 91% of time {quote} https://builds.apache.org/job/PreCommit-YARN-Build/6249//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Fix recent findbug issue in AbstractYarnScheduler - Key: YARN-3010 URL: https://issues.apache.org/jira/browse/YARN-3010 Project: Hadoop YARN Issue Type: Bug Reporter: Yi Liu Assignee: Yi Liu Priority: Minor Attachments: YARN-3010.001.patch A new findbugs issue was reported recently in the latest trunk: {quote} IS Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext; locked 91% of time {quote} https://issues.apache.org/jira/browse/YARN-2996?focusedCommentId=14265760page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14265760 https://builds.apache.org/job/PreCommit-YARN-Build/6249//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
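For context, a hedged illustration of the pattern behind this kind of FindBugs warning. This is not the actual AbstractYarnScheduler change; it only shows the generic shape of a field that is locked on most accesses and one typical way to make the accesses consistent.
{code}
/**
 * Illustration only: an "Inconsistent synchronization ... locked 91% of time"
 * warning flags a field guarded by the lock in most places but read or written
 * without it somewhere. Routing every access through synchronized accessors
 * (or declaring the field volatile when only the reference is published) is a
 * common way to make the accesses consistent.
 */
public class SchedulerContextHolder {

  private Object rmContext;   // stand-in for the RMContext reference

  public synchronized void setRMContext(Object rmContext) {
    this.rmContext = rmContext;
  }

  // an unsynchronized variant of this getter is what would trigger the warning
  public synchronized Object getRMContext() {
    return rmContext;
  }
}
{code}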
[jira] [Commented] (YARN-2616) Add CLI client to the registry to list/view entries
[ https://issues.apache.org/jira/browse/YARN-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265959#comment-14265959 ] Steve Loughran commented on YARN-2616: -- Thanks for doing the tests; without them the code that was checked in earlier doesn't officially exist. There's an outstanding patch, YARN-2683, which I'm trying to get in; it moves all registry config settings to core-default and documents the registry in the hadoop site docs. This impacts the CLI in a couple of ways: *config* Once YARN-2683 is in, all registry options will move to the core config, not the yarn config; this lets the registry run without any other YARN dependencies. Can you switch to using the basic {{Configuration}}? *docs* The YARN-2683 patch will provide the structure for adding documentation on the CLI. If we can get that patch in, it'll be easy to round off the CLI with a basic manpage. h3. Testing *test assertions* There are lots of test operations like {code} result = cli.run(new String[] { "ls", NonSlashPath }); assertEquals(-1, result); {code} This could be factored out into some method assertResult(cli, int code, String... args) which includes the arg list on a failure. Minor: lots of tabs in the source. Indent with (two) spaces please. *failure testing* Can you add some tests with invalid bindings and see how the CLI fails? e.g. no valid ZK host/port. Add CLI client to the registry to list/view entries --- Key: YARN-2616 URL: https://issues.apache.org/jira/browse/YARN-2616 Project: Hadoop YARN Issue Type: Sub-task Components: client Reporter: Steve Loughran Assignee: Akshay Radia Attachments: YARN-2616-003.patch, yarn-2616-v1.patch, yarn-2616-v2.patch, yarn-2616-v4.patch, yarn-2616-v5.patch, yarn-2616-v6.patch registry needs a CLI interface -- This message was sent by Atlassian JIRA (v6.3.4#6332)
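A sketch of the helper suggested above, assuming JUnit assertions and a placeholder CLI interface; only the {{assertResult(cli, int code, String... args)}} idea and the requirement to include the argument list in the failure message come from the comment.
{code}
import static org.junit.Assert.assertEquals;

import java.util.Arrays;

/**
 * Sketch of the suggested test helper; the enclosing class and the Cli type
 * are placeholders for the registry CLI under test.
 */
public class RegistryCliTestHelper {

  interface Cli {            // stand-in for the registry CLI object in the tests
    int run(String[] args);
  }

  static void assertResult(Cli cli, int expectedCode, String... args) {
    int result = cli.run(args);
    // the argument list is part of the message so a failure is self-describing
    assertEquals("Unexpected exit code for: " + Arrays.toString(args),
        expectedCode, result);
  }
}
{code}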
[jira] [Updated] (YARN-3010) Fix recent findbug issue in AbstractYarnScheduler
[ https://issues.apache.org/jira/browse/YARN-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated YARN-3010: - Attachment: YARN-3010.002.patch Updated patch. Fix recent findbug issue in AbstractYarnScheduler - Key: YARN-3010 URL: https://issues.apache.org/jira/browse/YARN-3010 Project: Hadoop YARN Issue Type: Bug Reporter: Yi Liu Assignee: Yi Liu Priority: Minor Attachments: YARN-3010.001.patch, YARN-3010.002.patch A new findbugs issue was reported recently in the latest trunk: {quote} IS Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext; locked 91% of time {quote} https://issues.apache.org/jira/browse/YARN-2996?focusedCommentId=14265760page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14265760 https://builds.apache.org/job/PreCommit-YARN-Build/6249//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2360) Fair Scheduler: Display dynamic fair share for queues on the scheduler page
[ https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265977#comment-14265977 ] Hudson commented on YARN-2360: -- FAILURE: Integrated in Hadoop-Yarn-trunk #799 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/799/]) Move YARN-2360 from 2.6 to 2.7 in CHANGES.txt (kasha: rev 41d72cbd48e6df7be3d177eaf04d73e88cf38381) * hadoop-yarn-project/CHANGES.txt Fair Scheduler: Display dynamic fair share for queues on the scheduler page --- Key: YARN-2360 URL: https://issues.apache.org/jira/browse/YARN-2360 Project: Hadoop YARN Issue Type: New Feature Components: fairscheduler Reporter: Ashwin Shankar Assignee: Ashwin Shankar Fix For: 2.7.0 Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, Screen_Shot_v3.png, Screen_Shot_v4.png, Screen_Shot_v5.png, YARN-2360-v1.txt, YARN-2360-v2.txt, YARN-2360-v3.patch, YARN-2360-v4.patch, YARN-2360-v5.patch, yarn-2360-6.patch Based on the discussion in YARN-2026, we'd like to display dynamic fair share for queues on the scheduler page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2958) RMStateStore seems to unnecessarily and wrongly store sequence number separately
[ https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265982#comment-14265982 ] Hudson commented on YARN-2958: -- FAILURE: Integrated in Hadoop-Yarn-trunk #799 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/799/]) YARN-2958. Made RMStateStore not update the last sequence number when updating the delegation token. Contributed by Varun Saxena. (zjshen: rev 562a701945be3a672f9cb5a52cc6db2c1589ba2b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/NullRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/LeveldbRMStateStore.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/RMDelegationTokenSecretManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreRMDTEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java RMStateStore seems to unnecessarily and wrongly store sequence number separately Key: YARN-2958 URL: https://issues.apache.org/jira/browse/YARN-2958 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Zhijie Shen Assignee: Varun Saxena Priority: Blocker Fix For: 2.7.0 Attachments: YARN-2958.001.patch, YARN-2958.002.patch, YARN-2958.003.patch, YARN-2958.004.patch It seems that RMStateStore updates last sequence number when storing or updating each individual DT, to recover the latest sequence number when RM restarting. First, the current logic seems to be problematic: {code} public synchronized void updateRMDelegationTokenAndSequenceNumber( RMDelegationTokenIdentifier rmDTIdentifier, Long renewDate, int latestSequenceNumber) { if(isFencedState()) { LOG.info(State store is in Fenced state. 
Can't update RM Delegation Token."); return; } try { updateRMDelegationTokenAndSequenceNumberInternal(rmDTIdentifier, renewDate, latestSequenceNumber); } catch (Exception e) { notifyStoreOperationFailed(e); } } {code} {code} @Override protected void updateStoredToken(RMDelegationTokenIdentifier id, long renewDate) { try { LOG.info("updating RMDelegation token with sequence number: " + id.getSequenceNumber()); rmContext.getStateStore().updateRMDelegationTokenAndSequenceNumber(id, renewDate, id.getSequenceNumber()); } catch (Exception e) { LOG.error("Error in updating persisted RMDelegationToken with sequence number: " + id.getSequenceNumber()); ExitUtil.terminate(1, e); } } {code} According to the code above, the last sequence number is updated in the store even when a DT is merely renewed, which is wrong. For example, we have the following sequence: 1. Get DT 1 (seq = 1) 2. Get DT 2 (seq = 2) 3. Renew DT 1 (seq = 1) 4. Restart RM The stored and then recovered last sequence number is 1, so the next DT created after the RM restarts will conflict with DT 2 on sequence number. Second, the aforementioned bug doesn't actually happen, because the recovered last sequence number is later overwritten by the correct one. {code} public void recover(RMState rmState) throws Exception { LOG.info("recovering RMDelegationTokenSecretManager."); //
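To illustrate the last point, the following is a hedged sketch, not the real {{RMDelegationTokenSecretManager.recover()}}: it only shows why the stale persisted counter is harmless, namely that recovery can re-derive the correct value by taking the maximum sequence number among the recovered tokens.
{code}
import java.util.Map;

/**
 * Sketch only: recovery ignores a stale persisted "last sequence number" by
 * folding in the sequence numbers of the tokens that are actually recovered.
 */
public class SequenceNumberRecovery {

  static int recoverSequenceNumber(int persistedLastSequenceNumber,
      Map<Integer, Long> recoveredTokenSeqToRenewDate) {
    int seq = persistedLastSequenceNumber;
    for (int tokenSeq : recoveredTokenSeqToRenewDate.keySet()) {
      // the largest sequence number among recovered tokens overrides the
      // possibly stale value that was stored separately
      seq = Math.max(seq, tokenSeq);
    }
    return seq;
  }
}
{code}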
[jira] [Commented] (YARN-2574) Add support for FairScheduler to the ReservationSystem
[ https://issues.apache.org/jira/browse/YARN-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265978#comment-14265978 ] Hudson commented on YARN-2574: -- FAILURE: Integrated in Hadoop-Yarn-trunk #799 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/799/]) YARN-2881. [YARN-2574] Implement PlanFollower for FairScheduler. (Anubhav Dhoot via kasha) (kasha: rev 0c4b11267717eb451fa6ed4c586317f2db32fbd5) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerDynamicBehavior.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerPreemption.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemTestUtil.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/ReservationQueueConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractSchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/FairSchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractReservationSystem.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/PlanQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacitySchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestFairReservationSystem.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestFairSchedulerPlanFollower.java Add support for FairScheduler to the ReservationSystem -- Key: YARN-2574 URL: https://issues.apache.org/jira/browse/YARN-2574 Project: Hadoop YARN Issue Type: New Feature Components: fairscheduler Reporter: Subru Krishnan Assignee: Anubhav Dhoot YARN-1051 introduces the ReservationSystem and the current implementation is based on CapacityScheduler. This JIRA proposes adding support for FairScheduler -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3010) Fix recent findbug issue in AbstractYarnScheduler
[ https://issues.apache.org/jira/browse/YARN-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266041#comment-14266041 ] Hadoop QA commented on YARN-3010: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12690315/YARN-3010.002.patch against trunk revision 4cd66f7. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6253//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6253//console This message is automatically generated. Fix recent findbug issue in AbstractYarnScheduler - Key: YARN-3010 URL: https://issues.apache.org/jira/browse/YARN-3010 Project: Hadoop YARN Issue Type: Bug Reporter: Yi Liu Assignee: Yi Liu Priority: Minor Attachments: YARN-3010.001.patch, YARN-3010.002.patch A new findbug issues reported recently in latest trunk: {quote} ISInconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext; locked 91% of time {quote} https://issues.apache.org/jira/browse/YARN-2996?focusedCommentId=14265760page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14265760 https://builds.apache.org/job/PreCommit-YARN-Build/6249//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2958) RMStateStore seems to unnecessarily and wrongly store sequence number separately
[ https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265997#comment-14265997 ] Hudson commented on YARN-2958: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #65 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/65/]) YARN-2958. Made RMStateStore not update the last sequence number when updating the delegation token. Contributed by Varun Saxena. (zjshen: rev 562a701945be3a672f9cb5a52cc6db2c1589ba2b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreRMDTEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/RMDelegationTokenSecretManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/LeveldbRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/NullRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java RMStateStore seems to unnecessarily and wrongly store sequence number separately Key: YARN-2958 URL: https://issues.apache.org/jira/browse/YARN-2958 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Zhijie Shen Assignee: Varun Saxena Priority: Blocker Fix For: 2.7.0 Attachments: YARN-2958.001.patch, YARN-2958.002.patch, YARN-2958.003.patch, YARN-2958.004.patch It seems that RMStateStore updates last sequence number when storing or updating each individual DT, to recover the latest sequence number when RM restarting. First, the current logic seems to be problematic: {code} public synchronized void updateRMDelegationTokenAndSequenceNumber( RMDelegationTokenIdentifier rmDTIdentifier, Long renewDate, int latestSequenceNumber) { if(isFencedState()) { LOG.info(State store is in Fenced state. 
Can't update RM Delegation Token."); return; } try { updateRMDelegationTokenAndSequenceNumberInternal(rmDTIdentifier, renewDate, latestSequenceNumber); } catch (Exception e) { notifyStoreOperationFailed(e); } } {code} {code} @Override protected void updateStoredToken(RMDelegationTokenIdentifier id, long renewDate) { try { LOG.info("updating RMDelegation token with sequence number: " + id.getSequenceNumber()); rmContext.getStateStore().updateRMDelegationTokenAndSequenceNumber(id, renewDate, id.getSequenceNumber()); } catch (Exception e) { LOG.error("Error in updating persisted RMDelegationToken with sequence number: " + id.getSequenceNumber()); ExitUtil.terminate(1, e); } } {code} According to the code above, the last sequence number is updated in the store even when a DT is merely renewed, which is wrong. For example, we have the following sequence: 1. Get DT 1 (seq = 1) 2. Get DT 2 (seq = 2) 3. Renew DT 1 (seq = 1) 4. Restart RM The stored and then recovered last sequence number is 1, so the next DT created after the RM restarts will conflict with DT 2 on sequence number. Second, the aforementioned bug doesn't actually happen, because the recovered last sequence number is later overwritten by the correct one. {code} public void recover(RMState rmState) throws Exception { LOG.info("recovering RMDelegationTokenSecretManager.");
[jira] [Commented] (YARN-2881) Implement PlanFollower for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265991#comment-14265991 ] Hudson commented on YARN-2881: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #65 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/65/]) YARN-2881. [YARN-2574] Implement PlanFollower for FairScheduler. (Anubhav Dhoot via kasha) (kasha: rev 0c4b11267717eb451fa6ed4c586317f2db32fbd5) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestFairReservationSystem.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemTestUtil.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerPreemption.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractReservationSystem.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractSchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacitySchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestFairSchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/ReservationQueueConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerDynamicBehavior.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/FairSchedulerPlanFollower.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/PlanQueue.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java Implement PlanFollower for FairScheduler Key: YARN-2881 URL: https://issues.apache.org/jira/browse/YARN-2881 Project: Hadoop YARN Issue Type: Sub-task Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Fix For: 2.7.0 Attachments: YARN-2881.001.patch, YARN-2881.002.patch, YARN-2881.002.patch, YARN-2881.003.patch, YARN-2881.004.patch, YARN-2881.005.patch, YARN-2881.006.patch, YARN-2881.prelim.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2574) Add support for FairScheduler to the ReservationSystem
[ https://issues.apache.org/jira/browse/YARN-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265993#comment-14265993 ] Hudson commented on YARN-2574: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #65 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/65/]) YARN-2881. [YARN-2574] Implement PlanFollower for FairScheduler. (Anubhav Dhoot via kasha) (kasha: rev 0c4b11267717eb451fa6ed4c586317f2db32fbd5) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestFairReservationSystem.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemTestUtil.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerPreemption.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractReservationSystem.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractSchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacitySchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestFairSchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/ReservationQueueConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerDynamicBehavior.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/FairSchedulerPlanFollower.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/PlanQueue.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java Add support for FairScheduler to the ReservationSystem -- Key: YARN-2574 URL: https://issues.apache.org/jira/browse/YARN-2574 Project: Hadoop YARN Issue Type: New Feature Components: fairscheduler Reporter: Subru Krishnan Assignee: Anubhav Dhoot YARN-1051 introduces the ReservationSystem and the current implementation is based on CapacityScheduler. This JIRA proposes adding support for FairScheduler -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2360) Fair Scheduler: Display dynamic fair share for queues on the scheduler page
[ https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265992#comment-14265992 ] Hudson commented on YARN-2360: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #65 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/65/]) Move YARN-2360 from 2.6 to 2.7 in CHANGES.txt (kasha: rev 41d72cbd48e6df7be3d177eaf04d73e88cf38381) * hadoop-yarn-project/CHANGES.txt Fair Scheduler: Display dynamic fair share for queues on the scheduler page --- Key: YARN-2360 URL: https://issues.apache.org/jira/browse/YARN-2360 Project: Hadoop YARN Issue Type: New Feature Components: fairscheduler Reporter: Ashwin Shankar Assignee: Ashwin Shankar Fix For: 2.7.0 Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, Screen_Shot_v3.png, Screen_Shot_v4.png, Screen_Shot_v5.png, YARN-2360-v1.txt, YARN-2360-v2.txt, YARN-2360-v3.patch, YARN-2360-v4.patch, YARN-2360-v5.patch, yarn-2360-6.patch Based on the discussion in YARN-2026, we'd like to display dynamic fair share for queues on the scheduler page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2881) Implement PlanFollower for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266142#comment-14266142 ] Hudson commented on YARN-2881: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1997 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1997/]) YARN-2881. [YARN-2574] Implement PlanFollower for FairScheduler. (Anubhav Dhoot via kasha) (kasha: rev 0c4b11267717eb451fa6ed4c586317f2db32fbd5) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacitySchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractSchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemTestUtil.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/PlanQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerPreemption.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractReservationSystem.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/FairSchedulerPlanFollower.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/ReservationQueueConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestFairSchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestFairReservationSystem.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerDynamicBehavior.java 
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java Implement PlanFollower for FairScheduler Key: YARN-2881 URL: https://issues.apache.org/jira/browse/YARN-2881 Project: Hadoop YARN Issue Type: Sub-task Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Fix For: 2.7.0 Attachments: YARN-2881.001.patch, YARN-2881.002.patch, YARN-2881.002.patch, YARN-2881.003.patch, YARN-2881.004.patch, YARN-2881.005.patch, YARN-2881.006.patch, YARN-2881.prelim.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2958) RMStateStore seems to unnecessarily and wrongly store sequence number separately
[ https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266148#comment-14266148 ] Hudson commented on YARN-2958: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1997 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1997/]) YARN-2958. Made RMStateStore not update the last sequence number when updating the delegation token. Contributed by Varun Saxena. (zjshen: rev 562a701945be3a672f9cb5a52cc6db2c1589ba2b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreRMDTEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/LeveldbRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/NullRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/RMDelegationTokenSecretManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java RMStateStore seems to unnecessarily and wrongly store sequence number separately Key: YARN-2958 URL: https://issues.apache.org/jira/browse/YARN-2958 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Zhijie Shen Assignee: Varun Saxena Priority: Blocker Fix For: 2.7.0 Attachments: YARN-2958.001.patch, YARN-2958.002.patch, YARN-2958.003.patch, YARN-2958.004.patch It seems that RMStateStore updates last sequence number when storing or updating each individual DT, to recover the latest sequence number when RM restarting. First, the current logic seems to be problematic: {code} public synchronized void updateRMDelegationTokenAndSequenceNumber( RMDelegationTokenIdentifier rmDTIdentifier, Long renewDate, int latestSequenceNumber) { if(isFencedState()) { LOG.info(State store is in Fenced state. 
Can't update RM Delegation Token."); return; } try { updateRMDelegationTokenAndSequenceNumberInternal(rmDTIdentifier, renewDate, latestSequenceNumber); } catch (Exception e) { notifyStoreOperationFailed(e); } } {code} {code} @Override protected void updateStoredToken(RMDelegationTokenIdentifier id, long renewDate) { try { LOG.info("updating RMDelegation token with sequence number: " + id.getSequenceNumber()); rmContext.getStateStore().updateRMDelegationTokenAndSequenceNumber(id, renewDate, id.getSequenceNumber()); } catch (Exception e) { LOG.error("Error in updating persisted RMDelegationToken with sequence number: " + id.getSequenceNumber()); ExitUtil.terminate(1, e); } } {code} According to the code above, the last sequence number is updated in the store even when a DT is merely renewed, which is wrong. For example, we have the following sequence: 1. Get DT 1 (seq = 1) 2. Get DT 2 (seq = 2) 3. Renew DT 1 (seq = 1) 4. Restart RM The stored and then recovered last sequence number is 1, so the next DT created after the RM restarts will conflict with DT 2 on sequence number. Second, the aforementioned bug doesn't actually happen, because the recovered last sequence number is later overwritten by the correct one. {code} public void recover(RMState rmState) throws Exception { LOG.info("recovering RMDelegationTokenSecretManager."); //
[jira] [Commented] (YARN-2360) Fair Scheduler: Display dynamic fair share for queues on the scheduler page
[ https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266143#comment-14266143 ] Hudson commented on YARN-2360: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1997 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1997/]) Move YARN-2360 from 2.6 to 2.7 in CHANGES.txt (kasha: rev 41d72cbd48e6df7be3d177eaf04d73e88cf38381) * hadoop-yarn-project/CHANGES.txt Fair Scheduler: Display dynamic fair share for queues on the scheduler page --- Key: YARN-2360 URL: https://issues.apache.org/jira/browse/YARN-2360 Project: Hadoop YARN Issue Type: New Feature Components: fairscheduler Reporter: Ashwin Shankar Assignee: Ashwin Shankar Fix For: 2.7.0 Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, Screen_Shot_v3.png, Screen_Shot_v4.png, Screen_Shot_v5.png, YARN-2360-v1.txt, YARN-2360-v2.txt, YARN-2360-v3.patch, YARN-2360-v4.patch, YARN-2360-v5.patch, yarn-2360-6.patch Based on the discussion in YARN-2026, we'd like to display dynamic fair share for queues on the scheduler page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2571) RM to support YARN registry
[ https://issues.apache.org/jira/browse/YARN-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-2571: - Target Version/s: 2.7.0 (was: 2.6.0) RM to support YARN registry Key: YARN-2571 URL: https://issues.apache.org/jira/browse/YARN-2571 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-2571-001.patch, YARN-2571-002.patch, YARN-2571-003.patch, YARN-2571-005.patch, YARN-2571-007.patch, YARN-2571-008.patch, YARN-2571-009.patch The RM needs to (optionally) integrate with the YARN registry: # startup: create the /services and /users paths with system ACLs (yarn, hdfs principals) # app-launch: create the user directory /users/$username with the relevant permissions (CRD) for them to create subnodes. # attempt, container, app completion: remove service records with the matching persistence and ID -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2574) Add support for FairScheduler to the ReservationSystem
[ https://issues.apache.org/jira/browse/YARN-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266144#comment-14266144 ] Hudson commented on YARN-2574: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1997 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1997/]) YARN-2881. [YARN-2574] Implement PlanFollower for FairScheduler. (Anubhav Dhoot via kasha) (kasha: rev 0c4b11267717eb451fa6ed4c586317f2db32fbd5) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerDynamicBehavior.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractSchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemTestUtil.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestFairSchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacitySchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractReservationSystem.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/FairSchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestFairReservationSystem.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/PlanQueue.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerPreemption.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationConfiguration.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/ReservationQueueConfiguration.java Add support for FairScheduler to the ReservationSystem -- Key: YARN-2574 URL: https://issues.apache.org/jira/browse/YARN-2574 Project: Hadoop YARN Issue Type: New Feature Components: fairscheduler Reporter: Subru Krishnan Assignee: Anubhav Dhoot YARN-1051 introduces the ReservationSystem and the current implementation is based on CapacityScheduler. This JIRA proposes adding support for FairScheduler -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2958) RMStateStore seems to unnecessarily and wrongly store sequence number separately
[ https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266171#comment-14266171 ] Hudson commented on YARN-2958: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #62 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/62/]) YARN-2958. Made RMStateStore not update the last sequence number when updating the delegation token. Contributed by Varun Saxena. (zjshen: rev 562a701945be3a672f9cb5a52cc6db2c1589ba2b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/LeveldbRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/RMDelegationTokenSecretManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreRMDTEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/NullRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java RMStateStore seems to unnecessarily and wrongly store sequence number separately Key: YARN-2958 URL: https://issues.apache.org/jira/browse/YARN-2958 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Zhijie Shen Assignee: Varun Saxena Priority: Blocker Fix For: 2.7.0 Attachments: YARN-2958.001.patch, YARN-2958.002.patch, YARN-2958.003.patch, YARN-2958.004.patch It seems that RMStateStore updates last sequence number when storing or updating each individual DT, to recover the latest sequence number when RM restarting. First, the current logic seems to be problematic: {code} public synchronized void updateRMDelegationTokenAndSequenceNumber( RMDelegationTokenIdentifier rmDTIdentifier, Long renewDate, int latestSequenceNumber) { if(isFencedState()) { LOG.info(State store is in Fenced state. 
Can't update RM Delegation Token.); return; } try { updateRMDelegationTokenAndSequenceNumberInternal(rmDTIdentifier, renewDate, latestSequenceNumber); } catch (Exception e) { notifyStoreOperationFailed(e); } } {code} {code} @Override protected void updateStoredToken(RMDelegationTokenIdentifier id, long renewDate) { try { LOG.info("updating RMDelegation token with sequence number: " + id.getSequenceNumber()); rmContext.getStateStore().updateRMDelegationTokenAndSequenceNumber(id, renewDate, id.getSequenceNumber()); } catch (Exception e) { LOG.error("Error in updating persisted RMDelegationToken with sequence number: " + id.getSequenceNumber()); ExitUtil.terminate(1, e); } } {code} According to the code above, even when renewing a DT, the last sequence number is updated in the store, which is wrong. For example, take the following sequence: 1. Get DT 1 (seq = 1) 2. Get DT 2 (seq = 2) 3. Renew DT 1 (seq = 1) 4. Restart RM. The stored and then recovered last sequence number is 1, so the next DT created after the RM restarts will conflict with DT 2 on sequence number. Second, the aforementioned bug does not actually occur, because the recovered last sequence number has been overwritten by the correct one. {code} public void recover(RMState rmState) throws Exception { LOG.info("recovering RMDelegationTokenSecretManager.");
[jira] [Commented] (YARN-2360) Fair Scheduler: Display dynamic fair share for queues on the scheduler page
[ https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266166#comment-14266166 ] Hudson commented on YARN-2360: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #62 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/62/]) Move YARN-2360 from 2.6 to 2.7 in CHANGES.txt (kasha: rev 41d72cbd48e6df7be3d177eaf04d73e88cf38381) * hadoop-yarn-project/CHANGES.txt Fair Scheduler: Display dynamic fair share for queues on the scheduler page --- Key: YARN-2360 URL: https://issues.apache.org/jira/browse/YARN-2360 Project: Hadoop YARN Issue Type: New Feature Components: fairscheduler Reporter: Ashwin Shankar Assignee: Ashwin Shankar Fix For: 2.7.0 Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, Screen_Shot_v3.png, Screen_Shot_v4.png, Screen_Shot_v5.png, YARN-2360-v1.txt, YARN-2360-v2.txt, YARN-2360-v3.patch, YARN-2360-v4.patch, YARN-2360-v5.patch, yarn-2360-6.patch Based on the discussion in YARN-2026, we'd like to display dynamic fair share for queues on the scheduler page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2574) Add support for FairScheduler to the ReservationSystem
[ https://issues.apache.org/jira/browse/YARN-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266167#comment-14266167 ] Hudson commented on YARN-2574: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #62 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/62/]) YARN-2881. [YARN-2574] Implement PlanFollower for FairScheduler. (Anubhav Dhoot via kasha) (kasha: rev 0c4b11267717eb451fa6ed4c586317f2db32fbd5) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestFairSchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerPreemption.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerDynamicBehavior.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacitySchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/ReservationQueueConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestFairReservationSystem.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemTestUtil.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractReservationSystem.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/PlanQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java * hadoop-yarn-project/CHANGES.txt * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractSchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/FairSchedulerPlanFollower.java Add support for FairScheduler to the ReservationSystem -- Key: YARN-2574 URL: https://issues.apache.org/jira/browse/YARN-2574 Project: Hadoop YARN Issue Type: New Feature Components: fairscheduler Reporter: Subru Krishnan Assignee: Anubhav Dhoot YARN-1051 introduces the ReservationSystem and the current implementation is based on CapacityScheduler. This JIRA proposes adding support for FairScheduler -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2881) Implement PlanFollower for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266165#comment-14266165 ] Hudson commented on YARN-2881: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #62 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/62/]) YARN-2881. [YARN-2574] Implement PlanFollower for FairScheduler. (Anubhav Dhoot via kasha) (kasha: rev 0c4b11267717eb451fa6ed4c586317f2db32fbd5) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestFairSchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerPreemption.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerDynamicBehavior.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacitySchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/ReservationQueueConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestFairReservationSystem.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemTestUtil.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractReservationSystem.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/PlanQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java * hadoop-yarn-project/CHANGES.txt * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractSchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/FairSchedulerPlanFollower.java Implement PlanFollower for FairScheduler Key: YARN-2881 URL: https://issues.apache.org/jira/browse/YARN-2881 Project: Hadoop YARN Issue Type: Sub-task Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Fix For: 2.7.0 Attachments: YARN-2881.001.patch, YARN-2881.002.patch, YARN-2881.002.patch, YARN-2881.003.patch, YARN-2881.004.patch, YARN-2881.005.patch, YARN-2881.006.patch, YARN-2881.prelim.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2605) [RM HA] Rest api endpoints doing redirect incorrectly
[ https://issues.apache.org/jira/browse/YARN-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266109#comment-14266109 ] Steve Loughran commented on YARN-2605: -- Can it just send a 307 (resubmit same verb) response to the caller? That will be picked up by browsers and handled as a new GET, while REST clients (including curl, jersey, httpclient) will either GET or resubmit the original verb depending on their config. Sending a custom structured response won't work with those existing clients. [RM HA] Rest api endpoints doing redirect incorrectly - Key: YARN-2605 URL: https://issues.apache.org/jira/browse/YARN-2605 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: bc Wong Labels: newbie The standby RM's webui tries to do a redirect via meta-refresh. That is fine for pages designed to be viewed by web browsers. But the API endpoints shouldn't do that. Most programmatic HTTP clients do not do meta-refresh. I'd suggest HTTP 303, or return a well-defined error message (json or xml) stating the standby status and a link to the active RM. The standby RM is returning this today: {noformat} $ curl -i http://bcsec-1.ent.cloudera.com:8088/ws/v1/cluster/metrics HTTP/1.1 200 OK Cache-Control: no-cache Expires: Thu, 25 Sep 2014 18:34:53 GMT Date: Thu, 25 Sep 2014 18:34:53 GMT Pragma: no-cache Expires: Thu, 25 Sep 2014 18:34:53 GMT Date: Thu, 25 Sep 2014 18:34:53 GMT Pragma: no-cache Content-Type: text/plain; charset=UTF-8 Refresh: 3; url=http://bcsec-2.ent.cloudera.com:8088/ws/v1/cluster/metrics Content-Length: 117 Server: Jetty(6.1.26) This is standby RM. Redirecting to the current active RM: http://bcsec-2.ent.cloudera.com:8088/ws/v1/cluster/metrics {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
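The 307 approach suggested above can be sketched as follows. This is an illustrative servlet filter only, not the actual RM web app code; the filter name, the /ws/ prefix check, and the way the active RM address is obtained are assumptions.
{code}
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Illustrative only: answer REST calls on the standby RM with a 307 so that
// clients re-issue the same verb against the active RM.
public class StandbyRedirectFilter implements Filter {

  // Assumption: in real code the active RM web address would come from HA config.
  private final String activeRMWebBase = "http://active-rm:8088";

  @Override
  public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
      throws IOException, ServletException {
    HttpServletRequest httpReq = (HttpServletRequest) req;
    HttpServletResponse httpRes = (HttpServletResponse) res;
    if (httpReq.getRequestURI().startsWith("/ws/")) {
      // 307 Temporary Redirect: the client resubmits the original verb and body.
      httpRes.setStatus(307);
      httpRes.setHeader("Location", activeRMWebBase + httpReq.getRequestURI());
      return;
    }
    chain.doFilter(req, res); // non-API pages keep their existing behaviour
  }

  @Override
  public void init(FilterConfig filterConfig) {
  }

  @Override
  public void destroy() {
  }
}
{code}
With a 307, curl, Jersey, and similar clients that follow redirects re-issue the original verb (GET stays GET, PUT stays PUT), which is the behaviour the comment asks for, whereas a meta-refresh page is only honoured by browsers.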
[jira] [Commented] (YARN-2571) RM to support YARN registry
[ https://issues.apache.org/jira/browse/YARN-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266201#comment-14266201 ] Hadoop QA commented on YARN-2571: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12685837/YARN-2571-009.patch against trunk revision 4cd66f7. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 16 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6254//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6254//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6254//console This message is automatically generated. RM to support YARN registry Key: YARN-2571 URL: https://issues.apache.org/jira/browse/YARN-2571 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-2571-001.patch, YARN-2571-002.patch, YARN-2571-003.patch, YARN-2571-005.patch, YARN-2571-007.patch, YARN-2571-008.patch, YARN-2571-009.patch The RM needs to (optionally) integrate with the YARN registry: # startup: create the /services and /users paths with system ACLs (yarn, hdfs principals) # app-launch: create the user directory /users/$username with the relevant permissions (CRD) for them to create subnodes. # attempt, container, app completion: remove service records with the matching persistence and ID -- This message was sent by Atlassian JIRA (v6.3.4#6332)
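For readers unfamiliar with the registry layout described in the issue, the following sketch shows the intent of the startup and app-launch steps using the plain ZooKeeper client. It is illustrative only: the real implementation lives in hadoop-yarn-registry and is not shown here, and the connect string, the /registry prefix, and the open ACLs below are assumptions (the actual code applies system ACLs for the yarn and hdfs principals).
{code}
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class RegistryBootstrapSketch {

  public static void main(String[] args) throws Exception {
    // Assumption: connect string and session timeout would come from configuration.
    ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, event -> { });

    // Startup: create the root /services and /users paths (system ACLs elided here).
    for (String path : new String[] {"/registry", "/registry/services", "/registry/users"}) {
      ensure(zk, path);
    }

    // App launch: create /users/<username> so the user can create service subnodes.
    String user = "alice"; // would come from the submitted application's owner
    ensure(zk, "/registry/users/" + user);

    zk.close();
  }

  private static void ensure(ZooKeeper zk, String path)
      throws KeeperException, InterruptedException {
    if (zk.exists(path, false) == null) {
      zk.create(path, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    }
  }
}
{code}
The third step from the description (removing service records whose persistence matches a completed attempt, container, or application) would walk the same tree on the corresponding RM events; it is omitted above to keep the sketch short.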
[jira] [Commented] (YARN-2360) Fair Scheduler: Display dynamic fair share for queues on the scheduler page
[ https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266207#comment-14266207 ] Hudson commented on YARN-2360: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #66 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/66/]) Move YARN-2360 from 2.6 to 2.7 in CHANGES.txt (kasha: rev 41d72cbd48e6df7be3d177eaf04d73e88cf38381) * hadoop-yarn-project/CHANGES.txt Fair Scheduler: Display dynamic fair share for queues on the scheduler page --- Key: YARN-2360 URL: https://issues.apache.org/jira/browse/YARN-2360 Project: Hadoop YARN Issue Type: New Feature Components: fairscheduler Reporter: Ashwin Shankar Assignee: Ashwin Shankar Fix For: 2.7.0 Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, Screen_Shot_v3.png, Screen_Shot_v4.png, Screen_Shot_v5.png, YARN-2360-v1.txt, YARN-2360-v2.txt, YARN-2360-v3.patch, YARN-2360-v4.patch, YARN-2360-v5.patch, yarn-2360-6.patch Based on the discussion in YARN-2026, we'd like to display dynamic fair share for queues on the scheduler page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2574) Add support for FairScheduler to the ReservationSystem
[ https://issues.apache.org/jira/browse/YARN-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266208#comment-14266208 ] Hudson commented on YARN-2574: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #66 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/66/]) YARN-2881. [YARN-2574] Implement PlanFollower for FairScheduler. (Anubhav Dhoot via kasha) (kasha: rev 0c4b11267717eb451fa6ed4c586317f2db32fbd5) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/PlanQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemTestUtil.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerPreemption.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/ReservationQueueConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestFairSchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestFairReservationSystem.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractReservationSystem.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/FairSchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractSchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacitySchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerDynamicBehavior.java Add support for FairScheduler to the ReservationSystem -- Key: YARN-2574 URL: https://issues.apache.org/jira/browse/YARN-2574 Project: Hadoop YARN Issue Type: New Feature Components: fairscheduler Reporter: Subru Krishnan Assignee: Anubhav Dhoot YARN-1051 introduces the ReservationSystem and the current implementation is based on CapacityScheduler. This JIRA proposes adding support for FairScheduler -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2958) RMStateStore seems to unnecessarily and wrongly store sequence number separately
[ https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266212#comment-14266212 ] Hudson commented on YARN-2958: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #66 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/66/]) YARN-2958. Made RMStateStore not update the last sequence number when updating the delegation token. Contributed by Varun Saxena. (zjshen: rev 562a701945be3a672f9cb5a52cc6db2c1589ba2b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreRMDTEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/NullRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/RMDelegationTokenSecretManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/LeveldbRMStateStore.java RMStateStore seems to unnecessarily and wrongly store sequence number separately Key: YARN-2958 URL: https://issues.apache.org/jira/browse/YARN-2958 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Zhijie Shen Assignee: Varun Saxena Priority: Blocker Fix For: 2.7.0 Attachments: YARN-2958.001.patch, YARN-2958.002.patch, YARN-2958.003.patch, YARN-2958.004.patch It seems that RMStateStore updates last sequence number when storing or updating each individual DT, to recover the latest sequence number when RM restarting. First, the current logic seems to be problematic: {code} public synchronized void updateRMDelegationTokenAndSequenceNumber( RMDelegationTokenIdentifier rmDTIdentifier, Long renewDate, int latestSequenceNumber) { if(isFencedState()) { LOG.info(State store is in Fenced state. 
Can't update RM Delegation Token.); return; } try { updateRMDelegationTokenAndSequenceNumberInternal(rmDTIdentifier, renewDate, latestSequenceNumber); } catch (Exception e) { notifyStoreOperationFailed(e); } } {code} {code} @Override protected void updateStoredToken(RMDelegationTokenIdentifier id, long renewDate) { try { LOG.info("updating RMDelegation token with sequence number: " + id.getSequenceNumber()); rmContext.getStateStore().updateRMDelegationTokenAndSequenceNumber(id, renewDate, id.getSequenceNumber()); } catch (Exception e) { LOG.error("Error in updating persisted RMDelegationToken with sequence number: " + id.getSequenceNumber()); ExitUtil.terminate(1, e); } } {code} According to the code above, even when renewing a DT, the last sequence number is updated in the store, which is wrong. For example, take the following sequence: 1. Get DT 1 (seq = 1) 2. Get DT 2 (seq = 2) 3. Renew DT 1 (seq = 1) 4. Restart RM. The stored and then recovered last sequence number is 1, so the next DT created after the RM restarts will conflict with DT 2 on sequence number. Second, the aforementioned bug does not actually occur, because the recovered last sequence number has been overwritten by the correct one. {code} public void recover(RMState rmState) throws Exception { LOG.info(recovering
[jira] [Commented] (YARN-2881) Implement PlanFollower for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266206#comment-14266206 ] Hudson commented on YARN-2881: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #66 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/66/]) YARN-2881. [YARN-2574] Implement PlanFollower for FairScheduler. (Anubhav Dhoot via kasha) (kasha: rev 0c4b11267717eb451fa6ed4c586317f2db32fbd5) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/PlanQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemTestUtil.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerPreemption.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/ReservationQueueConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestFairSchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestFairReservationSystem.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractReservationSystem.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/FairSchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractSchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacitySchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerDynamicBehavior.java Implement PlanFollower for FairScheduler Key: YARN-2881 URL: https://issues.apache.org/jira/browse/YARN-2881 Project: Hadoop YARN Issue Type: Sub-task Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Fix For: 2.7.0 Attachments: YARN-2881.001.patch, YARN-2881.002.patch, YARN-2881.002.patch, YARN-2881.003.patch, YARN-2881.004.patch, YARN-2881.005.patch, YARN-2881.006.patch, YARN-2881.prelim.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2881) Implement PlanFollower for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266249#comment-14266249 ] Hudson commented on YARN-2881: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2016 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2016/]) YARN-2881. [YARN-2574] Implement PlanFollower for FairScheduler. (Anubhav Dhoot via kasha) (kasha: rev 0c4b11267717eb451fa6ed4c586317f2db32fbd5) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/ReservationQueueConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/PlanQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractReservationSystem.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerDynamicBehavior.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemTestUtil.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractSchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerPreemption.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestFairSchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/FairSchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacitySchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestFairReservationSystem.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java Implement PlanFollower for FairScheduler Key: YARN-2881 URL: https://issues.apache.org/jira/browse/YARN-2881 Project: Hadoop YARN Issue Type: Sub-task Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Fix For: 2.7.0 Attachments: YARN-2881.001.patch, YARN-2881.002.patch, YARN-2881.002.patch, YARN-2881.003.patch, YARN-2881.004.patch, YARN-2881.005.patch, YARN-2881.006.patch, YARN-2881.prelim.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2574) Add support for FairScheduler to the ReservationSystem
[ https://issues.apache.org/jira/browse/YARN-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266251#comment-14266251 ] Hudson commented on YARN-2574: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2016 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2016/]) YARN-2881. [YARN-2574] Implement PlanFollower for FairScheduler. (Anubhav Dhoot via kasha) (kasha: rev 0c4b11267717eb451fa6ed4c586317f2db32fbd5) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/ReservationQueueConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/PlanQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractReservationSystem.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerDynamicBehavior.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemTestUtil.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractSchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerPreemption.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestFairSchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/FairSchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacitySchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestFairReservationSystem.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java Add support for FairScheduler to the ReservationSystem -- Key: YARN-2574 URL: https://issues.apache.org/jira/browse/YARN-2574 Project: Hadoop YARN Issue Type: New Feature Components: fairscheduler Reporter: Subru Krishnan Assignee: Anubhav Dhoot YARN-1051 introduces the ReservationSystem and the current implementation is based on CapacityScheduler. This JIRA proposes adding support for FairScheduler -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2360) Fair Scheduler: Display dynamic fair share for queues on the scheduler page
[ https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266250#comment-14266250 ] Hudson commented on YARN-2360: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2016 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2016/]) Move YARN-2360 from 2.6 to 2.7 in CHANGES.txt (kasha: rev 41d72cbd48e6df7be3d177eaf04d73e88cf38381) * hadoop-yarn-project/CHANGES.txt Fair Scheduler: Display dynamic fair share for queues on the scheduler page --- Key: YARN-2360 URL: https://issues.apache.org/jira/browse/YARN-2360 Project: Hadoop YARN Issue Type: New Feature Components: fairscheduler Reporter: Ashwin Shankar Assignee: Ashwin Shankar Fix For: 2.7.0 Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, Screen_Shot_v3.png, Screen_Shot_v4.png, Screen_Shot_v5.png, YARN-2360-v1.txt, YARN-2360-v2.txt, YARN-2360-v3.patch, YARN-2360-v4.patch, YARN-2360-v5.patch, yarn-2360-6.patch Based on the discussion in YARN-2026, we'd like to display dynamic fair share for queues on the scheduler page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2958) RMStateStore seems to unnecessarily and wrongly store sequence number separately
[ https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266255#comment-14266255 ] Hudson commented on YARN-2958: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2016 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2016/]) YARN-2958. Made RMStateStore not update the last sequence number when updating the delegation token. Contributed by Varun Saxena. (zjshen: rev 562a701945be3a672f9cb5a52cc6db2c1589ba2b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/RMDelegationTokenSecretManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreRMDTEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/LeveldbRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/NullRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java RMStateStore seems to unnecessarily and wrongly store sequence number separately Key: YARN-2958 URL: https://issues.apache.org/jira/browse/YARN-2958 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Zhijie Shen Assignee: Varun Saxena Priority: Blocker Fix For: 2.7.0 Attachments: YARN-2958.001.patch, YARN-2958.002.patch, YARN-2958.003.patch, YARN-2958.004.patch It seems that RMStateStore updates last sequence number when storing or updating each individual DT, to recover the latest sequence number when RM restarting. First, the current logic seems to be problematic: {code} public synchronized void updateRMDelegationTokenAndSequenceNumber( RMDelegationTokenIdentifier rmDTIdentifier, Long renewDate, int latestSequenceNumber) { if(isFencedState()) { LOG.info(State store is in Fenced state. 
Can't update RM Delegation Token.); return; } try { updateRMDelegationTokenAndSequenceNumberInternal(rmDTIdentifier, renewDate, latestSequenceNumber); } catch (Exception e) { notifyStoreOperationFailed(e); } } {code} {code} @Override protected void updateStoredToken(RMDelegationTokenIdentifier id, long renewDate) { try { LOG.info("updating RMDelegation token with sequence number: " + id.getSequenceNumber()); rmContext.getStateStore().updateRMDelegationTokenAndSequenceNumber(id, renewDate, id.getSequenceNumber()); } catch (Exception e) { LOG.error("Error in updating persisted RMDelegationToken with sequence number: " + id.getSequenceNumber()); ExitUtil.terminate(1, e); } } {code} According to the code above, even when renewing a DT, the last sequence number is updated in the store, which is wrong. For example, take the following sequence: 1. Get DT 1 (seq = 1) 2. Get DT 2 (seq = 2) 3. Renew DT 1 (seq = 1) 4. Restart RM. The stored and then recovered last sequence number is 1, so the next DT created after the RM restarts will conflict with DT 2 on sequence number. Second, the aforementioned bug does not actually occur, because the recovered last sequence number has been overwritten by the correct one. {code} public void recover(RMState rmState) throws Exception { LOG.info("recovering RMDelegationTokenSecretManager.");
[jira] [Updated] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2637: -- Attachment: YARN-2637.27.patch Patch adds tests which fail when the null check for RMContext.getScheduler is not present in FiCaSchedulerApp. maximum-am-resource-percent could be violated when resource of AM is minimumAllocation Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Craig Welch Priority: Critical Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, YARN-2637.25.patch, YARN-2637.26.patch, YARN-2637.27.patch, YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch Currently, the number of AMs in a leaf queue is calculated in the following way: {code} max_am_resource = queue_max_capacity * maximum_am_resource_percent #max_am_number = max_am_resource / minimum_allocation #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor {code} And when a new application is submitted to the RM, it checks whether the app can be activated in the following way: {code} for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); i.hasNext(); ) { FiCaSchedulerApp application = i.next(); // Check queue limit if (getNumActiveApplications() >= getMaximumActiveApplications()) { break; } // Check user limit User user = getUser(application.getUser()); if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) { user.activateApplication(); activeApplications.add(application); i.remove(); LOG.info("Application " + application.getApplicationId() + " from user: " + application.getUser() + " activated in queue: " + getQueueName()); } } {code} For example, if a queue has capacity = 1G and max_am_resource_percent = 0.2, the maximum resource that AMs can use is 200M. Assuming minimum_allocation = 1M, up to 200 AMs can be launched, and if each AM actually uses 5M (> minimum_allocation), all apps can still be activated, occupying all of the queue's resources instead of only max_am_resource_percent of the queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
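To make the failure mode in the description concrete, here is a small, self-contained sketch contrasting the count-based activation limit with a limit based on the AMs' actual resource usage. It is not CapacityScheduler or FiCaSchedulerApp code; all names and numbers are invented for illustration.
{code}
import java.util.ArrayList;
import java.util.List;

public class AmLimitSketch {
  static final int QUEUE_CAPACITY_MB = 1024;          // 1G queue
  static final double MAX_AM_RESOURCE_PERCENT = 0.2;  // 200 MB for all AMs
  static final int MINIMUM_ALLOCATION_MB = 1;

  public static void main(String[] args) {
    int maxAmResourceMb = (int) (QUEUE_CAPACITY_MB * MAX_AM_RESOURCE_PERCENT); // 200 MB
    int maxAmNumber = maxAmResourceMb / MINIMUM_ALLOCATION_MB;                 // 200 apps

    // 200 pending apps whose AMs each actually need 5 MB (> minimum_allocation).
    List<Integer> pendingAmMb = new ArrayList<>();
    for (int i = 0; i < 200; i++) {
      pendingAmMb.add(5);
    }

    // Count-based check (the behaviour the description criticizes): all 200 apps
    // are activated, and their AMs consume 1000 MB, i.e. nearly the whole queue.
    int activatedByCount = 0, usedByCountMb = 0;
    for (int am : pendingAmMb) {
      if (activatedByCount >= maxAmNumber) break;
      activatedByCount++;
      usedByCountMb += am;
    }

    // Resource-based check (the direction of the fix): stop activating once the
    // AMs' combined resource would exceed max_am_resource.
    int activatedByResource = 0, usedByResourceMb = 0;
    for (int am : pendingAmMb) {
      if (usedByResourceMb + am > maxAmResourceMb) break;
      activatedByResource++;
      usedByResourceMb += am;
    }

    System.out.printf("count-based:    %d AMs, %d MB%n", activatedByCount, usedByCountMb);
    System.out.printf("resource-based: %d AMs, %d MB%n", activatedByResource, usedByResourceMb);
  }
}
{code}
With the numbers from the description (1G queue, 20% AM share, 1M minimum allocation, 5 MB per AM), the count-based check activates 200 AMs using 1000 MB, while the resource-based check stops at 40 AMs and 200 MB.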
[jira] [Commented] (YARN-2217) Shared cache client side changes
[ https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266602#comment-14266602 ] Chris Trezzo commented on YARN-2217: Will do. Thanks! Shared cache client side changes Key: YARN-2217 URL: https://issues.apache.org/jira/browse/YARN-2217 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch, YARN-2217-trunk-v3.patch, YARN-2217-trunk-v4.patch, YARN-2217-trunk-v5.patch, YARN-2217-trunk-v6.patch Implement the client side changes for the shared cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266692#comment-14266692 ] Ming Ma commented on YARN-914: -- Thanks, Junping. The timeout is definitely necessary. * Sounds like we need a new state for NM, called decommission_in_progress, used while the NM is draining its containers. When the RM considers the decommission complete, the node will be marked decommissioned. * To clarify my earlier comment ("all its map output are fetched or until all the applications the node touches have completed"), the question is when YARN can declare a node's state has been gracefully drained and thus the node gracefully decommissioned (admins can shut down the whole machine without any impact on jobs). For MR, the state could be running tasks/containers or mapper outputs. Say we have a timeout of 30 minutes for decommission: if it takes 3 minutes to finish the mappers on the node and another 5 minutes for the job to finish, then YARN can declare the node gracefully decommissioned in 8 minutes instead of waiting for 30 minutes. The RM knows all applications on any given NM, so if all applications on a node have completed, the RM can mark the node decommissioned. * Yes, I meant long running services. If YARN just kills the containers upon a decommission request, the impact could vary. Some services might not have state to drain, or maybe the services can handle the state migration on their own without YARN's help. For such services, maybe we can just use ResourceOption's timeout for that; set timeout to 0 and the NM will just kill the containers. * Given we don't plan to have applications checkpoint and migrate state, it doesn't seem necessary to have YARN notify applications upon decommission requests. Just to call it out. * It might be useful to have a new state called decommissioned_timeout, so that admins know whether the node was gracefully decommissioned or not. Thoughts? Support graceful decommission of nodemanager Key: YARN-914 URL: https://issues.apache.org/jira/browse/YARN-914 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Luke Lu Assignee: Junping Du When NMs are decommissioned for non-fault reasons (capacity change etc.), it's desirable to minimize the impact on running applications. Currently if a NM is decommissioned, all running containers on the NM need to be rescheduled on other NMs. Furthermore, for finished map tasks, if their map outputs are not fetched by the reducers of the job, these map tasks will need to be rerun as well. We propose to introduce a mechanism to optionally gracefully decommission a node manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
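The node lifecycle being discussed can be summarized in a small sketch. This is purely illustrative: DECOMMISSION_IN_PROGRESS and DECOMMISSIONED_TIMEOUT are the states proposed in the comment above, not existing YARN states, and the method below is not real RM code.
{code}
// Illustrative only: models the proposed graceful-decommission lifecycle.
public class GracefulDecommissionSketch {

  enum SketchNodeState {
    RUNNING,
    DECOMMISSION_IN_PROGRESS, // draining containers / waiting for apps to finish
    DECOMMISSIONED,           // drained and safe to shut down
    DECOMMISSIONED_TIMEOUT    // proposed: drained only because the timeout expired
  }

  /**
   * Decide the node state once either all applications that touched the node
   * have completed or the admin-supplied timeout has expired.
   */
  static SketchNodeState nextState(boolean allAppsOnNodeCompleted,
                                   long elapsedMillis,
                                   long timeoutMillis) {
    if (allAppsOnNodeCompleted) {
      return SketchNodeState.DECOMMISSIONED;          // e.g. done after 8 minutes
    }
    if (elapsedMillis >= timeoutMillis) {
      return SketchNodeState.DECOMMISSIONED_TIMEOUT;  // gave up after 30 minutes
    }
    return SketchNodeState.DECOMMISSION_IN_PROGRESS;  // keep draining
  }

  public static void main(String[] args) {
    System.out.println(nextState(true, 8 * 60_000L, 30 * 60_000L));
    System.out.println(nextState(false, 31 * 60_000L, 30 * 60_000L));
  }
}
{code}
This matches the example in the comment: a node whose applications finish after 8 minutes is declared decommissioned immediately, rather than waiting out the full 30-minute timeout.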
[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266691#comment-14266691 ] Hadoop QA commented on YARN-2637: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12690392/YARN-2637.27.patch against trunk revision 4cd66f7. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 8 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6256//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6256//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6256//console This message is automatically generated. maximum-am-resource-percent could be violated when resource of AM is minimumAllocation Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Craig Welch Priority: Critical Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, YARN-2637.25.patch, YARN-2637.26.patch, YARN-2637.27.patch, YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch Currently, the number of AMs in a leaf queue is calculated in the following way:
{code}
max_am_resource = queue_max_capacity * maximum_am_resource_percent
#max_am_number = max_am_resource / minimum_allocation
#max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
{code}
And when a new application is submitted to the RM, it checks whether the app can be activated in the following way:
{code}
for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); i.hasNext(); ) {
  FiCaSchedulerApp application = i.next();
  // Check queue limit
  if (getNumActiveApplications() >= getMaximumActiveApplications()) {
    break;
  }
  // Check user limit
  User user = getUser(application.getUser());
  if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) {
    user.activateApplication();
    activeApplications.add(application);
    i.remove();
    LOG.info("Application " + application.getApplicationId()
        + " from user: " + application.getUser()
        + " activated in queue: " + getQueueName());
  }
}
{code}
An example: if a queue has capacity = 1G and max_am_resource_percent = 0.2, the maximum resource that AMs can use is 200M. Assuming minimum_allocation = 1M, the number of AMs that can be launched is 200, and if a user uses 5M for each AM (> minimum_allocation), all apps can still be activated, and they will occupy all of the queue's resources instead of only max_am_resource_percent of the queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
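For illustration only: the kind of activation check the patches above are working toward would track the actual AM resource of running applications against the queue's AM limit, rather than deriving an AM count from minimum_allocation. The class and method names below are hypothetical, not taken from any attached patch.
{code}
// Hypothetical sketch: enforce maximum-am-resource-percent with the real AM size.
public class AmResourceLimitSketch {
  private final long queueMaxCapacityMb;      // e.g. 1024 (1G) in the example above
  private final double maxAmResourcePercent;  // e.g. 0.2
  private long amResourceUsedMb = 0;

  public AmResourceLimitSketch(long queueMaxCapacityMb, double maxAmResourcePercent) {
    this.queueMaxCapacityMb = queueMaxCapacityMb;
    this.maxAmResourcePercent = maxAmResourcePercent;
  }

  /** True if an AM needing amResourceMb still fits under the queue AM limit. */
  public synchronized boolean canActivate(long amResourceMb) {
    long amLimitMb = (long) (queueMaxCapacityMb * maxAmResourcePercent);
    return amResourceUsedMb + amResourceMb <= amLimitMb;
  }

  public synchronized void activate(long amResourceMb) {
    amResourceUsedMb += amResourceMb;
  }
}
{code}
With the numbers from the description (1G queue, 20% AM limit, 5M per AM), such a check stops activation after roughly 40 AMs instead of allowing 200.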
[jira] [Resolved] (YARN-2574) Add support for FairScheduler to the ReservationSystem
[ https://issues.apache.org/jira/browse/YARN-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot resolved YARN-2574. - Resolution: Fixed Fix Version/s: 2.7.0 Add support for FairScheduler to the ReservationSystem -- Key: YARN-2574 URL: https://issues.apache.org/jira/browse/YARN-2574 Project: Hadoop YARN Issue Type: New Feature Components: fairscheduler Reporter: Subru Krishnan Assignee: Anubhav Dhoot Fix For: 2.7.0 YARN-1051 introduces the ReservationSystem and the current implementation is based on CapacityScheduler. This JIRA proposes adding support for FairScheduler -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266707#comment-14266707 ] Wangda Tan commented on YARN-2637: -- Failed test should not relate to this patch. Could you check the findbugs warning? Besides the findbugs warning, +1. Thanks, maximum-am-resource-percent could be violated when resource of AM is minimumAllocation Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Craig Welch Priority: Critical Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, YARN-2637.25.patch, YARN-2637.26.patch, YARN-2637.27.patch, YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch Currently, number of AM in leaf queue will be calculated in following way: {code} max_am_resource = queue_max_capacity * maximum_am_resource_percent #max_am_number = max_am_resource / minimum_allocation #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor {code} And when submit new application to RM, it will check if an app can be activated in following way: {code} for (IteratorFiCaSchedulerApp i=pendingApplications.iterator(); i.hasNext(); ) { FiCaSchedulerApp application = i.next(); // Check queue limit if (getNumActiveApplications() = getMaximumActiveApplications()) { break; } // Check user limit User user = getUser(application.getUser()); if (user.getActiveApplications() getMaximumActiveApplicationsPerUser()) { user.activateApplication(); activeApplications.add(application); i.remove(); LOG.info(Application + application.getApplicationId() + from user: + application.getUser() + activated in queue: + getQueueName()); } } {code} An example is, If a queue has capacity = 1G, max_am_resource_percent = 0.2, the maximum resource that AM can use is 200M, assuming minimum_allocation=1M, #am can be launched is 200, and if user uses 5M for each AM ( minimum_allocation). All apps can still be activated, and it will occupy all resource of a queue instead of only a max_am_resource_percent of a queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2982) Use ReservationQueueConfiguration in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-2982: Issue Type: Bug (was: Sub-task) Parent: (was: YARN-2574) Use ReservationQueueConfiguration in CapacityScheduler -- Key: YARN-2982 URL: https://issues.apache.org/jira/browse/YARN-2982 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Reporter: Anubhav Dhoot ReservationQueueConfiguration is common to reservation irrespective of Scheduler. It would be good to have CapacityScheduler also support this -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2982) Use ReservationQueueConfiguration in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-2982: Component/s: (was: fairscheduler) Use ReservationQueueConfiguration in CapacityScheduler -- Key: YARN-2982 URL: https://issues.apache.org/jira/browse/YARN-2982 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager Reporter: Anubhav Dhoot ReservationQueueConfiguration is common to reservation irrespective of Scheduler. It would be good to have CapacityScheduler also support this -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266843#comment-14266843 ] Chen He commented on YARN-1680: --- Since label scheduling patches are continuously being checked into trunk, we need to consider a little bit more than just blacklisted nodes in the headroom and user limit calculation. It is possible that the app asks for some labeled nodes in its ResourceRequest but some of them have already been blacklisted by the cluster. availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory. -- Key: YARN-1680 URL: https://issues.apache.org/jira/browse/YARN-1680 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0, 2.3.0 Environment: SuSE 11 SP2 + Hadoop-2.3 Reporter: Rohith Assignee: Chen He Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, YARN-1680-v2.patch, YARN-1680.patch There are 4 NodeManagers with 8GB each, so total cluster capacity is 32GB. Cluster slow start is set to 1. A running job's reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4) became unstable (3 map tasks got killed), so the MRAppMaster blacklisted the unstable NodeManager (NM-4). All reducer tasks are running in the cluster now. The MRAppMaster does not preempt the reducers, because the headroom used for the reducer preemption calculation still includes the blacklisted node's memory. This makes jobs hang forever (the ResourceManager does not assign any new containers on blacklisted nodes, but the availableResource it returns still counts the whole cluster's free memory). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
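One way to picture the behaviour being asked for: subtract the free capacity of blacklisted nodes from the headroom reported to the AM. The sketch below is illustrative arithmetic only and does not use the actual scheduler or MRAppMaster classes.
{code}
import java.util.Map;
import java.util.Set;

// Illustrative only: headroom reported to an AM should not count free memory
// on nodes the application has blacklisted.
class BlacklistAwareHeadroomSketch {
  static long adjustedHeadroomMb(long rawHeadroomMb, Map<String, Long> freeMbPerNode,
      Set<String> blacklistedNodes) {
    long blacklistedFreeMb = 0;
    for (String node : blacklistedNodes) {
      blacklistedFreeMb += freeMbPerNode.getOrDefault(node, 0L);
    }
    return Math.max(0, rawHeadroomMb - blacklistedFreeMb);
  }
}
{code}
In the scenario from the description, most of the 3GB of free memory sits on the blacklisted NM-4, so the adjusted headroom drops toward zero and the AM can decide to preempt reducers instead of waiting forever.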
[jira] [Updated] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-2933: Attachment: YARN-2933-4.patch Thanks [~wangda] for review. I updated the patch based on the comments. Thanks, Mayank Capacity Scheduler preemption policy should only consider capacity without labels temporarily - Key: YARN-2933 URL: https://issues.apache.org/jira/browse/YARN-2933 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Mayank Bansal Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, YARN-2933-4.patch Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have preemption policy to support that. YARN-2498 is targeting to support preemption respect node labels, but we have some gaps in code base, like queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially need to refactor CS which we need spend some time carefully think about. For now, what immediately we can do is allow calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid regression like: A cluster has some nodes with labels and some not, assume queueA isn't satisfied for resource without label, but for now, preemption policy may preempt resource from nodes with labels for queueA, that is not correct. Again, it is just a short-term enhancement, YARN-2498 will consider preemption respecting node-labels for Capacity Scheduler which is our final target. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
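As a rough illustration of the short-term behaviour described above, the preemption calculation would simply ignore capacity that sits on labeled nodes. This is a standalone sketch with made-up names, not the ProportionalCapacityPreemptionPolicy code from the attached patches.
{code}
import java.util.Map;
import java.util.Set;

// Sketch: count only unlabeled nodes when computing the capacity that the
// preemption policy is allowed to reason about (short-term, until YARN-2498).
class UnlabeledCapacitySketch {
  static long preemptableCapacityMb(Map<String, Set<String>> labelsPerNode,
      Map<String, Long> capacityMbPerNode) {
    long total = 0;
    for (Map.Entry<String, Long> e : capacityMbPerNode.entrySet()) {
      Set<String> labels = labelsPerNode.get(e.getKey());
      if (labels == null || labels.isEmpty()) {
        total += e.getValue(); // unlabeled node: eligible for ideal_allocation math
      }
    }
    return total;
  }
}
{code}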
[jira] [Commented] (YARN-2230) Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code)
[ https://issues.apache.org/jira/browse/YARN-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266880#comment-14266880 ] Jian He commented on YARN-2230: --- +1 Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code) - Key: YARN-2230 URL: https://issues.apache.org/jira/browse/YARN-2230 Project: Hadoop YARN Issue Type: Bug Components: client, documentation, scheduler Affects Versions: 2.4.0 Reporter: Adam Kawa Assignee: Vijay Bhat Priority: Minor Attachments: YARN-2230.001.patch, YARN-2230.002.patch When a user requests more vcores than the allocation limit (e.g. mapreduce.map.cpu.vcores is larger than yarn.scheduler.maximum-allocation-vcores), then InvalidResourceRequestException is thrown - https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
{code}
if (resReq.getCapability().getVirtualCores() < 0 ||
    resReq.getCapability().getVirtualCores() > maximumResource.getVirtualCores()) {
  throw new InvalidResourceRequestException("Invalid resource request"
      + ", requested virtual cores < 0"
      + ", or requested virtual cores > max configured"
      + ", requestedVirtualCores=" + resReq.getCapability().getVirtualCores()
      + ", maxVirtualCores=" + maximumResource.getVirtualCores());
}
{code}
According to the documentation - yarn-default.xml http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml, the request should be capped to the allocation limit.
{code}
<property>
  <description>The maximum allocation for every container request at the RM, in terms of virtual CPU cores. Requests higher than this won't take effect, and will get capped to this value.</description>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>32</value>
</property>
{code}
This means that: * Either the documentation or the code should be corrected (unless this exception is handled elsewhere accordingly, but it looks like it is not). This behavior is confusing, because when such a job (with mapreduce.map.cpu.vcores larger than yarn.scheduler.maximum-allocation-vcores) is submitted, it does not make any progress. The warnings/exceptions are thrown on the scheduler (RM) side, e.g.
{code}
2014-06-29 00:34:51,469 WARN org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Invalid resource ask by application appattempt_1403993411503_0002_01
org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested virtual cores < 0, or requested virtual cores > max configured, requestedVirtualCores=32, maxVirtualCores=3
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:237)
at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateResourceRequests(RMServerUtils.java:80)
at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:420)
.
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
{code}
* IMHO, such an exception should be forwarded to the client. Otherwise, it is not obvious why a job does not make any progress. The same looks to apply to memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
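For reference, the capping behaviour that the yarn-default.xml description promises is a one-line clamp; the snippet below is only a sketch of that alternative semantics, not code from any patch attached here.
{code}
// Sketch of "cap instead of reject": a request above
// yarn.scheduler.maximum-allocation-vcores is clamped to the configured limit.
class VcoreCapSketch {
  static int capVirtualCores(int requestedVcores, int maxAllocationVcores) {
    if (requestedVcores < 0) {
      throw new IllegalArgumentException("requested virtual cores < 0");
    }
    return Math.min(requestedVcores, maxAllocationVcores);
  }
}
{code}
For example, capVirtualCores(32, 3) would return 3 rather than rejecting the request, which is what the documented wording suggests.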
[jira] [Commented] (YARN-2427) Add support for moving apps between queues in RM web services
[ https://issues.apache.org/jira/browse/YARN-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266900#comment-14266900 ] Hudson commented on YARN-2427: -- FAILURE: Integrated in Hadoop-trunk-Commit #6818 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6818/]) YARN-2427. Added the API of moving apps between queues in RM web services. Contributed by Varun Vasudev. (zjshen: rev 60103fca04dc713183e4ec9e12f961642e7d1001) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/JAXBContextResolver.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestFifoScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesAppsModification.java Add support for moving apps between queues in RM web services - Key: YARN-2427 URL: https://issues.apache.org/jira/browse/YARN-2427 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.7.0 Attachments: apache-yarn-2427.0.patch, apache-yarn-2427.1.patch, apache-yarn-2427.2.patch, apache-yarn-2427.3.patch, apache-yarn-2427.4.patch Support for moving apps from one queue to another is now present in CapacityScheduler and FairScheduler. We should expose the functionality via RM web services as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
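A hedged example of invoking the new endpoint from Java follows. The path and JSON body are written from this JIRA's summary and may not match the committed ResourceManagerRest.apt.vm exactly; the RM address and application id are placeholders.
{code}
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Assumed shape of the app-to-queue move exposed by the RM web services.
public class MoveAppQueueExample {
  public static void main(String[] args) throws Exception {
    String rm = "http://rm-host:8088";                 // placeholder RM address
    String appId = "application_0000000000000_0001";   // placeholder app id
    URL url = new URL(rm + "/ws/v1/cluster/apps/" + appId + "/queue");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("PUT");
    conn.setRequestProperty("Content-Type", "application/json");
    conn.setDoOutput(true);
    try (OutputStream out = conn.getOutputStream()) {
      out.write("{\"queue\":\"targetQueue\"}".getBytes("UTF-8"));
    }
    System.out.println("HTTP " + conn.getResponseCode());
  }
}
{code}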
[jira] [Commented] (YARN-2571) RM to support YARN registry
[ https://issues.apache.org/jira/browse/YARN-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266911#comment-14266911 ] Xuan Gong commented on YARN-2571: - Thanks for the patch. Overall looks fine. 1. Could we move the registry service from the always-on services to the active services? For example, if RM HA is enabled, only the active RM can start the registry service. 2. The Curator Framework is used here to do the ZK operations. I am not familiar with this. Does the Curator framework provide an automatic fencing mechanism when we write/delete the related znodes? For example, only the active RM can write data to the znodes. The standby RM should not be allowed to do anything. RM to support YARN registry Key: YARN-2571 URL: https://issues.apache.org/jira/browse/YARN-2571 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-2571-001.patch, YARN-2571-002.patch, YARN-2571-003.patch, YARN-2571-005.patch, YARN-2571-007.patch, YARN-2571-008.patch, YARN-2571-009.patch The RM needs to (optionally) integrate with the YARN registry: # startup: create the /services and /users paths with system ACLs (yarn, hdfs principals) # app-launch: create the user directory /users/$username with the relevant permissions (CRD) for them to create subnodes. # attempt, container, app completion: remove service records with the matching persistence and ID -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3003) Provide API for client to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266919#comment-14266919 ] Varun Saxena commented on YARN-3003: Thanks [~leftnoteasy] for your input. Yes, it's about getting labels to nodes. Separating it out into two APIs is cleaner and less confusing to the user. Basically, I wanted your input on whether we need this new API or not. I had a similar idea in mind regarding how to fix this issue. Will look at YARN-2943 as well, as [~Naganarasimha] suggested. Thanks. Provide API for client to retrieve label to node mapping Key: YARN-3003 URL: https://issues.apache.org/jira/browse/YARN-3003 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Reporter: Ted Yu Assignee: Varun Saxena Currently YarnClient#getNodeToLabels() returns the mapping from NodeId to the set of labels associated with the node. A client (such as Slider) may be interested in the label to node mapping - given a label, return the nodes with this label. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
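Until a dedicated API lands, a client can derive the label-to-node view by inverting the map that YarnClient#getNodeToLabels() already returns (per the description above); a minimal sketch of that inversion, written independently of the YARN record types, is below.
{code}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch: invert a node -> labels map into a label -> nodes map on the client side.
class LabelToNodesSketch {
  static <N> Map<String, Set<N>> labelsToNodes(Map<N, Set<String>> nodeToLabels) {
    Map<String, Set<N>> result = new HashMap<>();
    for (Map.Entry<N, Set<String>> e : nodeToLabels.entrySet()) {
      for (String label : e.getValue()) {
        result.computeIfAbsent(label, k -> new HashSet<>()).add(e.getKey());
      }
    }
    return result;
  }
}
{code}
A dedicated server-side API is still preferable for large clusters, since the client otherwise has to pull the full node-to-labels map just to answer one label query.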
[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state
[ https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266923#comment-14266923 ] Varun Saxena commented on YARN-2902: Kindly review this. It has been pending for a long time. Killing a container that is localizing can orphan resources in the DOWNLOADING state Key: YARN-2902 URL: https://issues.apache.org/jira/browse/YARN-2902 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: Varun Saxena Fix For: 2.7.0 Attachments: YARN-2902.002.patch, YARN-2902.patch If a container is in the process of localizing when it is stopped/killed then resources are left in the DOWNLOADING state. If no other container comes along and requests these resources, they linger around with no references but aren't cleaned up during normal cache cleanup scans, since the scan will never delete resources in the DOWNLOADING state even if their reference count is zero. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
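The failure mode in the description can be pictured with a small sketch of a cleanup pass that, unlike the current behaviour, also reclaims DOWNLOADING entries whose reference count has dropped to zero. The states and fields are simplified stand-ins, not the NodeManager's LocalizedResource implementation.
{code}
import java.util.Iterator;
import java.util.Map;

// Simplified stand-ins for the NM local-resource states relevant here.
enum ResState { DOWNLOADING, LOCALIZED }

class CachedResource {
  ResState state;
  int refCount;
}

// Sketch: a cache cleanup scan that also removes orphaned DOWNLOADING resources,
// instead of only ever considering LOCALIZED entries for deletion.
class CacheCleanupSketch {
  void clean(Map<String, CachedResource> cache) {
    Iterator<Map.Entry<String, CachedResource>> it = cache.entrySet().iterator();
    while (it.hasNext()) {
      CachedResource r = it.next().getValue();
      if (r.refCount == 0
          && (r.state == ResState.LOCALIZED || r.state == ResState.DOWNLOADING)) {
        it.remove(); // a real NM would also delete the partial files on disk and
                     // apply its cache-size limits before evicting LOCALIZED entries
      }
    }
  }
}
{code}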
[jira] [Commented] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now
[ https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266925#comment-14266925 ] Varun Saxena commented on YARN-2936: [~jianhe], sorry, I couldn't get you. What do you mean by core change? The newly added test code calls getProto and that is where the change is. YARNDelegationTokenIdentifier doesn't set proto.builder now --- Key: YARN-2936 URL: https://issues.apache.org/jira/browse/YARN-2936 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Varun Saxena Attachments: YARN-2936.001.patch, YARN-2936.002.patch, YARN-2936.003.patch, YARN-2936.004.patch, YARN-2936.005.patch After YARN-2743, the setters are removed from YARNDelegationTokenIdentifier, such that when constructing an object which extends YARNDelegationTokenIdentifier, proto.builder is not set at all. Later on, when we call getProto() on it, we will just get an empty proto object. It seems to do no harm to the production code path, as we will always call getBytes() before using proto to persist the DT in the state store, when generating the password. I think the setter is removed to avoid duplicating the setting of the fields when getBytes() is called. However, YARNDelegationTokenIdentifier doesn't work properly alone. YARNDelegationTokenIdentifier is tightly coupled with the logic in secretManager. It's vulnerable if something is changed at secretManager. For example, in the test case of YARN-2837, I spent time to figure out we need to execute getBytes() first to make sure the testing DTs can be properly put into the state store. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2978) ResourceManager crashes with NPE while getting queue info
[ https://issues.apache.org/jira/browse/YARN-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266881#comment-14266881 ] Hudson commented on YARN-2978: -- FAILURE: Integrated in Hadoop-trunk-Commit #6817 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6817/]) YARN-2978. Fixed potential NPE while getting queue info. Contributed by Varun Saxena (jianhe: rev dd57c2047bfd21910acc38c98153eedf1db75169) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java ResourceManager crashes with NPE while getting queue info - Key: YARN-2978 URL: https://issues.apache.org/jira/browse/YARN-2978 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.5.1 Reporter: Jason Tufo Assignee: Varun Saxena Priority: Critical Labels: capacityscheduler, resourcemanager Fix For: 2.7.0 Attachments: YARN-2978.001.patch, YARN-2978.002.patch, YARN-2978.003.patch, YARN-2978.004.patch java.lang.NullPointerException at org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto.isInitialized(YarnProtos.java:29625) at org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto$Builder.build(YarnProtos.java:29939) at org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.mergeLocalToProto(QueueInfoPBImpl.java:290) at org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.getProto(QueueInfoPBImpl.java:157) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.convertToProtoFormat(GetQueueInfoResponsePBImpl.java:128) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToBuilder(GetQueueInfoResponsePBImpl.java:104) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToProto(GetQueueInfoResponsePBImpl.java:111) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.getProto(GetQueueInfoResponsePBImpl.java:53) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueInfo(ApplicationClientProtocolPBServiceImpl.java:235) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:333) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2637: -- Attachment: YARN-2637.28.patch maximum-am-resource-percent could be violated when resource of AM is minimumAllocation Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Craig Welch Priority: Critical Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, YARN-2637.25.patch, YARN-2637.26.patch, YARN-2637.27.patch, YARN-2637.28.patch, YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch Currently, number of AM in leaf queue will be calculated in following way: {code} max_am_resource = queue_max_capacity * maximum_am_resource_percent #max_am_number = max_am_resource / minimum_allocation #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor {code} And when submit new application to RM, it will check if an app can be activated in following way: {code} for (IteratorFiCaSchedulerApp i=pendingApplications.iterator(); i.hasNext(); ) { FiCaSchedulerApp application = i.next(); // Check queue limit if (getNumActiveApplications() = getMaximumActiveApplications()) { break; } // Check user limit User user = getUser(application.getUser()); if (user.getActiveApplications() getMaximumActiveApplicationsPerUser()) { user.activateApplication(); activeApplications.add(application); i.remove(); LOG.info(Application + application.getApplicationId() + from user: + application.getUser() + activated in queue: + getQueueName()); } } {code} An example is, If a queue has capacity = 1G, max_am_resource_percent = 0.2, the maximum resource that AM can use is 200M, assuming minimum_allocation=1M, #am can be launched is 200, and if user uses 5M for each AM ( minimum_allocation). All apps can still be activated, and it will occupy all resource of a queue instead of only a max_am_resource_percent of a queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2979) Unsupported operation exception in message building (YarnProtos)
[ https://issues.apache.org/jira/browse/YARN-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266908#comment-14266908 ] Varun Saxena commented on YARN-2979: Resolving this as well as YARN-2978 is committed Unsupported operation exception in message building (YarnProtos) Key: YARN-2979 URL: https://issues.apache.org/jira/browse/YARN-2979 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager Affects Versions: 2.5.1 Reporter: Jason Tufo Assignee: Varun Saxena Fix For: 2.7.0 java.lang.UnsupportedOperationException at java.util.AbstractList.add(AbstractList.java:148) at java.util.AbstractList.add(AbstractList.java:108) at com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:330) at org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto$Builder.addAllApplications(YarnProtos.java:30702) at org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.addApplicationsToProto(QueueInfoPBImpl.java:227) at org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.mergeLocalToBuilder(QueueInfoPBImpl.java:282) at org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.mergeLocalToProto(QueueInfoPBImpl.java:289) at org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.getProto(QueueInfoPBImpl.java:157) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.convertToProtoFormat(GetQueueInfoResponsePBImpl.java:128) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToBuilder(GetQueueInfoResponsePBImpl.java:104) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToProto(GetQueueInfoResponsePBImpl.java:111) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.getProto(GetQueueInfoResponsePBImpl.java:53) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueInfo(ApplicationClientProtocolPBServiceImpl.java:235) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:333) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2979) Unsupported operation exception in message building (YarnProtos)
[ https://issues.apache.org/jira/browse/YARN-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena resolved YARN-2979. Resolution: Fixed Fix Version/s: 2.7.0 Unsupported operation exception in message building (YarnProtos) Key: YARN-2979 URL: https://issues.apache.org/jira/browse/YARN-2979 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager Affects Versions: 2.5.1 Reporter: Jason Tufo Assignee: Varun Saxena Fix For: 2.7.0 java.lang.UnsupportedOperationException at java.util.AbstractList.add(AbstractList.java:148) at java.util.AbstractList.add(AbstractList.java:108) at com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:330) at org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto$Builder.addAllApplications(YarnProtos.java:30702) at org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.addApplicationsToProto(QueueInfoPBImpl.java:227) at org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.mergeLocalToBuilder(QueueInfoPBImpl.java:282) at org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.mergeLocalToProto(QueueInfoPBImpl.java:289) at org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.getProto(QueueInfoPBImpl.java:157) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.convertToProtoFormat(GetQueueInfoResponsePBImpl.java:128) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToBuilder(GetQueueInfoResponsePBImpl.java:104) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToProto(GetQueueInfoResponsePBImpl.java:111) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.getProto(GetQueueInfoResponsePBImpl.java:53) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueInfo(ApplicationClientProtocolPBServiceImpl.java:235) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:333) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2978) ResourceManager crashes with NPE while getting queue info
[ https://issues.apache.org/jira/browse/YARN-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266920#comment-14266920 ] Varun Saxena commented on YARN-2978: Thanks [~jianhe] for the review and commit. ResourceManager crashes with NPE while getting queue info - Key: YARN-2978 URL: https://issues.apache.org/jira/browse/YARN-2978 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.5.1 Reporter: Jason Tufo Assignee: Varun Saxena Priority: Critical Labels: capacityscheduler, resourcemanager Fix For: 2.7.0 Attachments: YARN-2978.001.patch, YARN-2978.002.patch, YARN-2978.003.patch, YARN-2978.004.patch java.lang.NullPointerException at org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto.isInitialized(YarnProtos.java:29625) at org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto$Builder.build(YarnProtos.java:29939) at org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.mergeLocalToProto(QueueInfoPBImpl.java:290) at org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.getProto(QueueInfoPBImpl.java:157) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.convertToProtoFormat(GetQueueInfoResponsePBImpl.java:128) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToBuilder(GetQueueInfoResponsePBImpl.java:104) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToProto(GetQueueInfoResponsePBImpl.java:111) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.getProto(GetQueueInfoResponsePBImpl.java:53) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueInfo(ApplicationClientProtocolPBServiceImpl.java:235) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:333) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users
[ https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266985#comment-14266985 ] Hadoop QA commented on YARN-2423: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12690438/YARN-2423.006.patch against trunk revision 60103fc. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6259//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6259//console This message is automatically generated. TimelineClient should wrap all GET APIs to facilitate Java users Key: YARN-2423 URL: https://issues.apache.org/jira/browse/YARN-2423 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Robert Kanter Attachments: YARN-2423.004.patch, YARN-2423.005.patch, YARN-2423.006.patch, YARN-2423.patch, YARN-2423.patch, YARN-2423.patch TimelineClient provides the Java method to put timeline entities. It's also good to wrap over all GET APIs (both entity and domain), and deserialize the json response into Java POJO objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267068#comment-14267068 ] Andy Schlaikjer commented on YARN-1529: --- Any update on this? These new metrics look valuable. Add Localization overhead metrics to NM --- Key: YARN-1529 URL: https://issues.apache.org/jira/browse/YARN-1529 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: YARN-1529.v01.patch, YARN-1529.v02.patch, YARN-1529.v03.patch Users are often unaware of localization cost that their jobs incur. To measure effectiveness of localization caches it is necessary to expose the overhead in the form of metrics. We propose addition of the following metrics to NodeManagerMetrics. When a container is about to launch, its set of LocalResources has to be fetched from a central location, typically on HDFS, that results in a number of download requests for the files missing in caches. LocalizedFilesMissed: total files (requests) downloaded from DFS. Cache misses. LocalizedFilesCached: total localization requests that were served from local caches. Cache hits. LocalizedBytesMissed: total bytes downloaded from DFS due to cache misses. LocalizedBytesCached: total bytes satisfied from local caches. Localized(Files|Bytes)CachedRatio: percentage of localized (files|bytes) that were served out of cache: ratio = 100 * caches / (caches + misses) LocalizationDownloadNanos: total elapsed time in nanoseconds for a container to go from ResourceRequestTransition to LocalizedTransition -- This message was sent by Atlassian JIRA (v6.3.4#6332)
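The cache-ratio formula quoted in the description, ratio = 100 * caches / (caches + misses), is plain arithmetic and easy to sanity-check; the helper below is just that formula with a divide-by-zero guard, not the proposed NodeManagerMetrics code.
{code}
// ratio = 100 * caches / (caches + misses), guarding against an empty sample.
class LocalizationRatioSketch {
  static long cachedRatioPercent(long cachedFiles, long missedFiles) {
    long total = cachedFiles + missedFiles;
    return total == 0 ? 0 : (100 * cachedFiles) / total;
  }
}
{code}
For example, 30 requests served from the local cache and 10 downloaded from DFS give a 75% hit ratio.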
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267075#comment-14267075 ] Hadoop QA commented on YARN-1529: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621292/YARN-1529.v03.patch against trunk revision 788ee35. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6260//console This message is automatically generated. Add Localization overhead metrics to NM --- Key: YARN-1529 URL: https://issues.apache.org/jira/browse/YARN-1529 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: YARN-1529.v01.patch, YARN-1529.v02.patch, YARN-1529.v03.patch Users are often unaware of localization cost that their jobs incur. To measure effectiveness of localization caches it is necessary to expose the overhead in the form of metrics. We propose addition of the following metrics to NodeManagerMetrics. When a container is about to launch, its set of LocalResources has to be fetched from a central location, typically on HDFS, that results in a number of download requests for the files missing in caches. LocalizedFilesMissed: total files (requests) downloaded from DFS. Cache misses. LocalizedFilesCached: total localization requests that were served from local caches. Cache hits. LocalizedBytesMissed: total bytes downloaded from DFS due to cache misses. LocalizedBytesCached: total bytes satisfied from local caches. Localized(Files|Bytes)CachedRatio: percentage of localized (files|bytes) that were served out of cache: ratio = 100 * caches / (caches + misses) LocalizationDownloadNanos: total elapsed time in nanoseconds for a container to go from ResourceRequestTransition to LocalizedTransition -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267086#comment-14267086 ] Yi Liu commented on YARN-2637: -- {quote} Findbugs was the result of changing the ratio of sync to unsync accesses which hit the findbugs limits, but not the pattern itself, which looks fine, so added fb exclusion. {quote} Not exactly, in FairScheduler, it's a real issue, we need *synchronized* for _resolveReservationQueueName_. Already have a JIRA YARN-3010 to fix the findbugs... maximum-am-resource-percent could be violated when resource of AM is minimumAllocation Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Craig Welch Priority: Critical Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, YARN-2637.25.patch, YARN-2637.26.patch, YARN-2637.27.patch, YARN-2637.28.patch, YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch Currently, number of AM in leaf queue will be calculated in following way: {code} max_am_resource = queue_max_capacity * maximum_am_resource_percent #max_am_number = max_am_resource / minimum_allocation #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor {code} And when submit new application to RM, it will check if an app can be activated in following way: {code} for (IteratorFiCaSchedulerApp i=pendingApplications.iterator(); i.hasNext(); ) { FiCaSchedulerApp application = i.next(); // Check queue limit if (getNumActiveApplications() = getMaximumActiveApplications()) { break; } // Check user limit User user = getUser(application.getUser()); if (user.getActiveApplications() getMaximumActiveApplicationsPerUser()) { user.activateApplication(); activeApplications.add(application); i.remove(); LOG.info(Application + application.getApplicationId() + from user: + application.getUser() + activated in queue: + getQueueName()); } } {code} An example is, If a queue has capacity = 1G, max_am_resource_percent = 0.2, the maximum resource that AM can use is 200M, assuming minimum_allocation=1M, #am can be launched is 200, and if user uses 5M for each AM ( minimum_allocation). All apps can still be activated, and it will occupy all resource of a queue instead of only a max_am_resource_percent of a queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
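The warning being discussed is the usual inconsistent-synchronization pattern findbugs flags: a field guarded by a lock in most accesses but touched without it elsewhere. The sketch below is generic, with invented names, and is not the FairScheduler code; it only shows why making a method such as resolveReservationQueueName synchronized silences the warning and closes the race.
{code}
// Generic inconsistent-synchronization example (what findbugs typically reports
// as IS2_INCONSISTENT_SYNC): every access of the field must take the same lock.
class QueueNameResolverSketch {
  private String reservationQueueName;

  synchronized void update(String name) {
    reservationQueueName = name;
  }

  // If this read were unsynchronized, findbugs would flag the field; adding
  // synchronized here is the shape of the fix suggested for FairScheduler.
  synchronized String resolve() {
    return reservationQueueName;
  }
}
{code}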
[jira] [Commented] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now
[ https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267098#comment-14267098 ] Varun Saxena commented on YARN-2936: [~jianhe], changed the test. Kindly review. YARNDelegationTokenIdentifier doesn't set proto.builder now --- Key: YARN-2936 URL: https://issues.apache.org/jira/browse/YARN-2936 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Varun Saxena Attachments: YARN-2936.001.patch, YARN-2936.002.patch, YARN-2936.003.patch, YARN-2936.004.patch, YARN-2936.005.patch, YARN-2936.006.patch After YARN-2743, the setters are removed from YARNDelegationTokenIdentifier, such that when constructing a object which extends YARNDelegationTokenIdentifier, proto.builder is not set at all. Later on, when we call getProto() of it, we will just get an empty proto object. It seems to do no harm to the production code path, as we will always call getBytes() before using proto to persist the DT in the state store, when we generating the password. I think the setter is removed to avoid duplicating setting the fields why getBytes() is called. However, YARNDelegationTokenIdentifier doesn't work properly alone. YARNDelegationTokenIdentifier is tightly coupled with the logic in secretManager. It's vulnerable if something is changed at secretManager. For example, in the test case of YARN-2837, I spent time to figure out we need to execute getBytes() first to make sure the testing DTs can be properly put into the state store. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2213) Change proxy-user cookie log in AmIpFilter to DEBUG
[ https://issues.apache.org/jira/browse/YARN-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-2213: --- Attachment: YARN-2213.001.patch Change proxy-user cookie log in AmIpFilter to DEBUG --- Key: YARN-2213 URL: https://issues.apache.org/jira/browse/YARN-2213 Project: Hadoop YARN Issue Type: Task Reporter: Ted Yu Assignee: Varun Saxena Priority: Minor Fix For: 2.7.0 Attachments: YARN-2213.001.patch I saw a lot of the following lines in AppMaster log: {code} 14/06/24 17:12:36 WARN web.SliderAmIpFilter: Could not find proxy-user cookie, so user will not be set 14/06/24 17:12:39 WARN web.SliderAmIpFilter: Could not find proxy-user cookie, so user will not be set 14/06/24 17:12:39 WARN web.SliderAmIpFilter: Could not find proxy-user cookie, so user will not be set {code} For long running app, this would consume considerable log space. Log level should be changed to DEBUG. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
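The requested change is essentially a one-line log-level adjustment; a generic illustration with commons-logging is below. The real message lives in AmIpFilter / SliderAmIpFilter, so this is only the shape of the change, not the patch itself.
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// Shape of the fix: demote the per-request warning to DEBUG so long-running
// AMs do not fill their logs with it.
class ProxyUserCookieLoggingSketch {
  private static final Log LOG = LogFactory.getLog(ProxyUserCookieLoggingSketch.class);

  void onMissingProxyUserCookie() {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Could not find proxy-user cookie, so user will not be set");
    }
  }
}
{code}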
[jira] [Updated] (YARN-2428) LCE default banned user list should have yarn
[ https://issues.apache.org/jira/browse/YARN-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-2428: --- Attachment: (was: YARN-2428.patch) LCE default banned user list should have yarn - Key: YARN-2428 URL: https://issues.apache.org/jira/browse/YARN-2428 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Allen Wittenauer Assignee: Varun Saxena Priority: Trivial Labels: newbie Fix For: 2.7.0 When task-controller was retrofitted to YARN, the default banned user list didn't add yarn. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2213) Change proxy-user cookie log in AmIpFilter to DEBUG
[ https://issues.apache.org/jira/browse/YARN-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267100#comment-14267100 ] Ted Yu commented on YARN-2213: -- lgtm Change proxy-user cookie log in AmIpFilter to DEBUG --- Key: YARN-2213 URL: https://issues.apache.org/jira/browse/YARN-2213 Project: Hadoop YARN Issue Type: Task Reporter: Ted Yu Assignee: Varun Saxena Priority: Minor Attachments: YARN-2213.001.patch I saw a lot of the following lines in AppMaster log: {code} 14/06/24 17:12:36 WARN web.SliderAmIpFilter: Could not find proxy-user cookie, so user will not be set 14/06/24 17:12:39 WARN web.SliderAmIpFilter: Could not find proxy-user cookie, so user will not be set 14/06/24 17:12:39 WARN web.SliderAmIpFilter: Could not find proxy-user cookie, so user will not be set {code} For long running app, this would consume considerable log space. Log level should be changed to DEBUG. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state
[ https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266968#comment-14266968 ] Varun Saxena commented on YARN-2902: Thanks [~jlowe] Killing a container that is localizing can orphan resources in the DOWNLOADING state Key: YARN-2902 URL: https://issues.apache.org/jira/browse/YARN-2902 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: Varun Saxena Fix For: 2.7.0 Attachments: YARN-2902.002.patch, YARN-2902.patch If a container is in the process of localizing when it is stopped/killed then resources are left in the DOWNLOADING state. If no other container comes along and requests these resources they linger around with no reference counts but aren't cleaned up during normal cache cleanup scans since it will never delete resources in the DOWNLOADING state even if their reference count is zero. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267030#comment-14267030 ] Craig Welch commented on YARN-2637: --- Findbugs was the result of changing the ratio of sync to unsync accesses which hit the findbugs limits, but not the pattern itself, which looks fine, so added fb exclusion. TestFairScheduler passes on my box with the change so build server related / not a real issue. Was not originally planning to address the max am percent for user as that wasn't the issue we kept encountering but forgot to mention this / edit the jira to reflect. However, I'm going to see what the impact would be of adding that now then we can decide to include it or move to it's own jira. maximum-am-resource-percent could be violated when resource of AM is minimumAllocation Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Craig Welch Priority: Critical Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, YARN-2637.25.patch, YARN-2637.26.patch, YARN-2637.27.patch, YARN-2637.28.patch, YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch Currently, number of AM in leaf queue will be calculated in following way: {code} max_am_resource = queue_max_capacity * maximum_am_resource_percent #max_am_number = max_am_resource / minimum_allocation #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor {code} And when submit new application to RM, it will check if an app can be activated in following way: {code} for (IteratorFiCaSchedulerApp i=pendingApplications.iterator(); i.hasNext(); ) { FiCaSchedulerApp application = i.next(); // Check queue limit if (getNumActiveApplications() = getMaximumActiveApplications()) { break; } // Check user limit User user = getUser(application.getUser()); if (user.getActiveApplications() getMaximumActiveApplicationsPerUser()) { user.activateApplication(); activeApplications.add(application); i.remove(); LOG.info(Application + application.getApplicationId() + from user: + application.getUser() + activated in queue: + getQueueName()); } } {code} An example is, If a queue has capacity = 1G, max_am_resource_percent = 0.2, the maximum resource that AM can use is 200M, assuming minimum_allocation=1M, #am can be launched is 200, and if user uses 5M for each AM ( minimum_allocation). All apps can still be activated, and it will occupy all resource of a queue instead of only a max_am_resource_percent of a queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now
[ https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-2936: --- Attachment: YARN-2936.006.patch YARNDelegationTokenIdentifier doesn't set proto.builder now --- Key: YARN-2936 URL: https://issues.apache.org/jira/browse/YARN-2936 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Varun Saxena Attachments: YARN-2936.001.patch, YARN-2936.002.patch, YARN-2936.003.patch, YARN-2936.004.patch, YARN-2936.005.patch, YARN-2936.006.patch After YARN-2743, the setters are removed from YARNDelegationTokenIdentifier, such that when constructing a object which extends YARNDelegationTokenIdentifier, proto.builder is not set at all. Later on, when we call getProto() of it, we will just get an empty proto object. It seems to do no harm to the production code path, as we will always call getBytes() before using proto to persist the DT in the state store, when we generating the password. I think the setter is removed to avoid duplicating setting the fields why getBytes() is called. However, YARNDelegationTokenIdentifier doesn't work properly alone. YARNDelegationTokenIdentifier is tightly coupled with the logic in secretManager. It's vulnerable if something is changed at secretManager. For example, in the test case of YARN-2837, I spent time to figure out we need to execute getBytes() first to make sure the testing DTs can be properly put into the state store. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266981#comment-14266981 ] Hadoop QA commented on YARN-2933: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12690428/YARN-2933-4.patch against trunk revision dd57c20. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6257//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6257//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6257//console This message is automatically generated. Capacity Scheduler preemption policy should only consider capacity without labels temporarily - Key: YARN-2933 URL: https://issues.apache.org/jira/browse/YARN-2933 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Mayank Bansal Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, YARN-2933-4.patch Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have preemption policy to support that. YARN-2498 is targeting to support preemption respect node labels, but we have some gaps in code base, like queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially need to refactor CS which we need spend some time carefully think about. For now, what immediately we can do is allow calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid regression like: A cluster has some nodes with labels and some not, assume queueA isn't satisfied for resource without label, but for now, preemption policy may preempt resource from nodes with labels for queueA, that is not correct. Again, it is just a short-term enhancement, YARN-2498 will consider preemption respecting node-labels for Capacity Scheduler which is our final target. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state
[ https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266947#comment-14266947 ] Jason Lowe commented on YARN-2902: -- Sorry for the delay, Varun, as I was busy with end-of-year items and vacation. I'll try to get to this by the end of the week. Killing a container that is localizing can orphan resources in the DOWNLOADING state Key: YARN-2902 URL: https://issues.apache.org/jira/browse/YARN-2902 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: Varun Saxena Fix For: 2.7.0 Attachments: YARN-2902.002.patch, YARN-2902.patch If a container is in the process of localizing when it is stopped/killed then resources are left in the DOWNLOADING state. If no other container comes along and requests these resources they linger around with no reference counts but aren't cleaned up during normal cache cleanup scans since it will never delete resources in the DOWNLOADING state even if their reference count is zero. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266994#comment-14266994 ] Hadoop QA commented on YARN-2637: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12690430/YARN-2637.28.patch against trunk revision dd57c20. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 8 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6258//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6258//console This message is automatically generated. maximum-am-resource-percent could be violated when resource of AM is minimumAllocation Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Craig Welch Priority: Critical Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, YARN-2637.25.patch, YARN-2637.26.patch, YARN-2637.27.patch, YARN-2637.28.patch, YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch Currently, the number of AMs in a leaf queue is calculated in the following way:
{code}
max_am_resource = queue_max_capacity * maximum_am_resource_percent
#max_am_number = max_am_resource / minimum_allocation
#max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
{code}
And when a new application is submitted to the RM, it checks whether the app can be activated in the following way:
{code}
for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); i.hasNext();) {
  FiCaSchedulerApp application = i.next();
  // Check queue limit
  if (getNumActiveApplications() >= getMaximumActiveApplications()) {
    break;
  }
  // Check user limit
  User user = getUser(application.getUser());
  if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) {
    user.activateApplication();
    activeApplications.add(application);
    i.remove();
    LOG.info("Application " + application.getApplicationId() +
        " from user: " + application.getUser() +
        " activated in queue: " + getQueueName());
  }
}
{code}
An example: if a queue has capacity = 1G and max_am_resource_percent = 0.2, the maximum resource that AMs can use is 200M. Assuming minimum_allocation=1M, the number of AMs that can be launched is 200, and if each AM actually uses 5M (> minimum_allocation).
All apps can still be activated, and it will occupy all resource of a queue instead of only a max_am_resource_percent of a queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
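To make the limit in the description concrete, here is a minimal sketch of activating applications against the queue's AM resource rather than an AM count derived from minimum_allocation. It is an illustration only, simplified to memory, and {{getAMResource()}} is an assumed accessor; it is not necessarily what the attached patches do.
{code}
// Hypothetical activation check: stop activating once the sum of actual AM sizes
// would exceed queue_max_capacity * maximum_am_resource_percent.
int amLimitMB = (int) (queueMaxCapacity.getMemory() * maxAMResourcePercent);
int amUsedMB = 0;
for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); i.hasNext();) {
  FiCaSchedulerApp application = i.next();
  int amMB = application.getAMResource().getMemory();   // assumed accessor for the real AM size
  if (amUsedMB + amMB > amLimitMB) {
    break;  // activating this AM would violate maximum-am-resource-percent
  }
  amUsedMB += amMB;
  activeApplications.add(application);
  i.remove();
}
{code}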
[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267008#comment-14267008 ] Wangda Tan commented on YARN-2933: -- Hi [~mayank_bansal], Thanks for updating. In Proportional...Policy, some minor comments: 1. Nobody is using {{getNodeLabels}}; it should be removed. 2. {{setNodeLabels}} is too simple to be a method; I suggest removing it too. 3. {{getNonLabeledResources}} should be private. 4. {{isLabeledContainer}} could be written like
{code}
private boolean isLabeledContainer(RMContainer c) {
  return labels.containsKey(c.getAllocatedNode());
}
{code}
to avoid traversing all keys. I suggest removing this method since it's too simple; at the very least, it should be private. In the test, the current {{testIdealAllocationForLabels}} is not correct. In your test, queueA/B have a total guaranteed *NON_LABELED* resource of 100 and they used 100 *NON_LABELED* resource, but {{NodeLabelsManager.getResourceByLabel(no-label)}} is only 80 (non-labeled used/configured resource > NodeLabelsManager.getResourceByLabel(no-label)). One thing worth taking care of: if we don't do anything about how TestPro..Policy mocks queues and applications, all used/configured capacities are *NON_LABELED* capacity. I suggest writing the test like:
{code}
@Test
public void testIdealAllocationForLabels() {
  int[][] qData = new int[][] {
      //  /   A   B
      { 80, 40, 40 }, // abs
      { 80, 80, 80 }, // maxcap
      { 80, 80,  0 }, // used
      { 70, 20, 50 }, // pending
      {  0,  0,  0 }, // reserved
      {  5,  4,  1 }, // apps
      { -1,  1,  1 }, // req granularity
      {  2,  0,  0 }, // subqueues
  };
  setAMContainer = true;
  setLabelContainer = true;
  Map<NodeId, Set<String>> labels = new HashMap<NodeId, Set<String>>();
  NodeId node = NodeId.newInstance("node1", 0);
  Set<String> labelSet = new HashSet<String>();
  labelSet.add("x");
  labels.put(node, labelSet);
  when(lm.getNodeLabels()).thenReturn(labels);
  ProportionalCapacityPreemptionPolicy policy = buildPolicy(qData);
  // Subtracting Label X resources from cluster resources
  when(lm.getResourceByLabel(anyString(), any(Resource.class))).thenReturn(
      Resources.clone(Resource.newInstance(80, 0)));
  clusterResources.setMemory(100);
  policy.editSchedule();
  // By skipping AM Container and Labeled container, all other 18 containers
  // of appD will be preempted
  verify(mDisp, times(18)).handle(argThat(new IsPreemptionRequestFor(appD)));
  // By skipping AM Container and Labeled container, all other 18 containers
  // of appC will be preempted
  verify(mDisp, times(18)).handle(argThat(new IsPreemptionRequestFor(appC)));
  // rest 4 containers from appB will be preempted
  verify(mDisp, times(4)).handle(argThat(new IsPreemptionRequestFor(appB)));
  setAMContainer = false;
  setLabelContainer = false;
}
{code}
Now the configured *NON_LABELED* resource is 80; before entering policy.editSchedule, {{clusterResources.setMemory(100);}} makes clusterResource > non-labeled-resource, and in the computation it will only consider the cluster resource to be 80 after {{getNonLabeledResources}}. And could you take a look at the findbugs warning? Thoughts?
Capacity Scheduler preemption policy should only consider capacity without labels temporarily - Key: YARN-2933 URL: https://issues.apache.org/jira/browse/YARN-2933 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Mayank Bansal Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, YARN-2933-4.patch Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have preemption policy to support that. YARN-2498 is targeting to support preemption respect node labels, but we have some gaps in code base, like queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially need to refactor CS which we need spend some time carefully think about. For now, what immediately we can do is allow calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid regression like: A cluster has some nodes with labels and some not, assume queueA isn't satisfied for resource without label, but for now, preemption policy may preempt resource from nodes with labels for queueA, that is not correct. Again, it is just a short-term enhancement, YARN-2498 will consider preemption respecting node-labels for Capacity Scheduler which is our final target. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
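To complement the review notes above, here is a rough sketch of the selection-side check the test comments rely on ("By skipping AM Container and Labeled container"): skip AM containers and containers on labeled nodes when picking preemption candidates. The method name and surrounding wiring are assumptions, not the attached patch.
{code}
// Hypothetical candidate filter: only containers on non-labeled nodes, never the AM.
private boolean isPreemptionCandidate(RMContainer c, Map<NodeId, Set<String>> labels) {
  if (c.isAMContainer()) {
    return false;                              // AM containers are skipped
  }
  if (labels.containsKey(c.getAllocatedNode())) {
    return false;                              // container runs on a labeled node
  }
  return true;
}
{code}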
[jira] [Commented] (YARN-2996) Refine some fs operations in FileSystemRMStateStore to improve performance
[ https://issues.apache.org/jira/browse/YARN-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267065#comment-14267065 ] Yi Liu commented on YARN-2996: -- Yes, Zhijie {quote} Good catch! It seems that MemoryRMStateStore#storeOrUpdateAMRMTokenSecretManagerState needs to be fixed too. {quote} The {{.002}} patch already includes this fix. Refine some fs operations in FileSystemRMStateStore to improve performance -- Key: YARN-2996 URL: https://issues.apache.org/jira/browse/YARN-2996 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Yi Liu Assignee: Yi Liu Attachments: YARN-2996.001.patch, YARN-2996.002.patch In {{FileSystemRMStateStore}}, we can refine some fs operations to improve performance: *1.* Several places invoke {{fs.exists}} and then {{fs.getFileStatus}}; we can merge them to save one RPC call
{code}
if (fs.exists(versionNodePath)) {
  FileStatus status = fs.getFileStatus(versionNodePath);
{code}
*2.*
{code}
protected void updateFile(Path outputPath, byte[] data) throws Exception {
  Path newPath = new Path(outputPath.getParent(), outputPath.getName() + ".new");
  // use writeFile to make sure .new file is created atomically
  writeFile(newPath, data);
  replaceFile(newPath, outputPath);
}
{code}
The {{updateFile}} method is not efficient either: it writes the file to _output\_file_.tmp, renames it to _output\_file_.new, and then renames that to _output\_file_; we can eliminate one rename operation. Also there is one unnecessary import, which we can remove. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
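As a small sketch of point *1.* above, the helper below replaces the exists-then-getFileStatus pair with a single call, treating {{FileNotFoundException}} as "the path is absent"; the helper name is illustrative.
{code}
// One RPC instead of two: FileSystem.getFileStatus throws FileNotFoundException
// when the path does not exist, so the separate exists() probe can be dropped.
private FileStatus getFileStatusIfExists(FileSystem fs, Path path) throws IOException {
  try {
    return fs.getFileStatus(path);
  } catch (FileNotFoundException e) {
    return null;  // caller treats null the same as "versionNodePath does not exist"
  }
}
{code}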
[jira] [Commented] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now
[ https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267077#comment-14267077 ] Varun Saxena commented on YARN-2936: Oh you mean, the newly added test code passes even without the change. Will look at it and change test accordingly. YARNDelegationTokenIdentifier doesn't set proto.builder now --- Key: YARN-2936 URL: https://issues.apache.org/jira/browse/YARN-2936 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Varun Saxena Attachments: YARN-2936.001.patch, YARN-2936.002.patch, YARN-2936.003.patch, YARN-2936.004.patch, YARN-2936.005.patch After YARN-2743, the setters are removed from YARNDelegationTokenIdentifier, such that when constructing a object which extends YARNDelegationTokenIdentifier, proto.builder is not set at all. Later on, when we call getProto() of it, we will just get an empty proto object. It seems to do no harm to the production code path, as we will always call getBytes() before using proto to persist the DT in the state store, when we generating the password. I think the setter is removed to avoid duplicating setting the fields why getBytes() is called. However, YARNDelegationTokenIdentifier doesn't work properly alone. YARNDelegationTokenIdentifier is tightly coupled with the logic in secretManager. It's vulnerable if something is changed at secretManager. For example, in the test case of YARN-2837, I spent time to figure out we need to execute getBytes() first to make sure the testing DTs can be properly put into the state store. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
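As an illustration of the coupling described in this issue (based on the description above, not on the patches), the snippet below shows why callers currently have to invoke {{getBytes()}} before {{getProto()}}; the constructor arguments are arbitrary and the exact method visibility is assumed.
{code}
// Until getBytes() serializes the identifier, the proto builder has no fields set,
// so getProto() yields an effectively empty proto object.
RMDelegationTokenIdentifier id = new RMDelegationTokenIdentifier(
    new Text("owner"), new Text("renewer"), new Text("realUser"));
byte[] before = id.getProto().toByteArray();   // empty: fields not yet copied into the builder
id.getBytes();                                 // serializing populates the proto fields
byte[] after = id.getProto().toByteArray();    // now reflects owner/renewer/realUser
{code}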
[jira] [Updated] (YARN-2213) Change proxy-user cookie log in AmIpFilter to DEBUG
[ https://issues.apache.org/jira/browse/YARN-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-2213: --- Attachment: (was: YARN-2213.patch) Change proxy-user cookie log in AmIpFilter to DEBUG --- Key: YARN-2213 URL: https://issues.apache.org/jira/browse/YARN-2213 Project: Hadoop YARN Issue Type: Task Reporter: Ted Yu Assignee: Varun Saxena Priority: Minor Fix For: 2.7.0 I saw a lot of the following lines in AppMaster log: {code} 14/06/24 17:12:36 WARN web.SliderAmIpFilter: Could not find proxy-user cookie, so user will not be set 14/06/24 17:12:39 WARN web.SliderAmIpFilter: Could not find proxy-user cookie, so user will not be set 14/06/24 17:12:39 WARN web.SliderAmIpFilter: Could not find proxy-user cookie, so user will not be set {code} For long running app, this would consume considerable log space. Log level should be changed to DEBUG. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3003) Provide API for client to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267108#comment-14267108 ] Naganarasimha G R commented on YARN-3003: - Hi [~leftnoteasy], Yes, my idea was to transpose: since a given node can belong to multiple labels, getting the node-to-label mapping and then transposing it would be better, as a node-to-label mapping data structure already exists. Further, would it be better to support a label expression and return the list of nodes matching it, rather than taking a set of labels as input? Maybe [~yuzhih...@gmail.com] can describe more about the scenario or intended usage of this interface? Provide API for client to retrieve label to node mapping Key: YARN-3003 URL: https://issues.apache.org/jira/browse/YARN-3003 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Reporter: Ted Yu Assignee: Varun Saxena Currently YarnClient#getNodeToLabels() returns the mapping from NodeId to set of labels associated with the node. Client (such as Slider) may be interested in label to node mapping - given label, return the nodes with this label. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
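A minimal sketch of the transposition idea discussed above: invert the existing node-to-labels map into a labels-to-nodes map (a node may carry several labels, so one node can appear under multiple keys). Variable names are illustrative.
{code}
// Build label -> nodes from the existing NodeId -> labels mapping.
Map<String, Set<NodeId>> labelsToNodes = new HashMap<String, Set<NodeId>>();
for (Map.Entry<NodeId, Set<String>> entry : nodeToLabels.entrySet()) {
  for (String label : entry.getValue()) {
    Set<NodeId> nodes = labelsToNodes.get(label);
    if (nodes == null) {
      nodes = new HashSet<NodeId>();
      labelsToNodes.put(label, nodes);
    }
    nodes.add(entry.getKey());
  }
}
{code}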
[jira] [Commented] (YARN-2427) Add support for moving apps between queues in RM web services
[ https://issues.apache.org/jira/browse/YARN-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266891#comment-14266891 ] Zhijie Shen commented on YARN-2427: --- bq. If you feel strongly about it, I can remove it. I realize we have done the similar thing for app state endpoint. Let's keep to it for app queue. The new patch looks good to me. Will commit it. Add support for moving apps between queues in RM web services - Key: YARN-2427 URL: https://issues.apache.org/jira/browse/YARN-2427 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2427.0.patch, apache-yarn-2427.1.patch, apache-yarn-2427.2.patch, apache-yarn-2427.3.patch, apache-yarn-2427.4.patch Support for moving apps from one queue to another is now present in CapacityScheduler and FairScheduler. We should expose the functionality via RM web services as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users
[ https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated YARN-2423: Attachment: YARN-2423.006.patch 006 patch is rebased on latest trunk TimelineClient should wrap all GET APIs to facilitate Java users Key: YARN-2423 URL: https://issues.apache.org/jira/browse/YARN-2423 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Robert Kanter Attachments: YARN-2423.004.patch, YARN-2423.005.patch, YARN-2423.006.patch, YARN-2423.patch, YARN-2423.patch, YARN-2423.patch TimelineClient provides the Java method to put timeline entities. It's also good to wrap over all GET APIs (both entity and domain), and deserialize the json response into Java POJO objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266877#comment-14266877 ] Jason Lowe commented on YARN-1680: -- bq. It is possible that the app asks for some labeled nodes in its ResourceRequest but some of them have already been blocked listed by cluster. Yes, agreed. However it would be very useful to have a patch that just fixes the blacklisted node case in the interim since many clusters (most at this point) are not using labels. If it's easy to add label consideration into this then go for it. Otherwise I think it would be better to make incremental steps by fixing the existing issue of blacklisted nodes and address the label issue in a separate JIRA. availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory. -- Key: YARN-1680 URL: https://issues.apache.org/jira/browse/YARN-1680 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0, 2.3.0 Environment: SuSE 11 SP2 + Hadoop-2.3 Reporter: Rohith Assignee: Chen He Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, YARN-1680-v2.patch, YARN-1680.patch There are 4 NodeManagers with 8GB each.Total cluster capacity is 32GB.Cluster slow start is set to 1. Job is running reducer task occupied 29GB of cluster.One NodeManager(NM-4) is become unstable(3 Map got killed), MRAppMaster blacklisted unstable NodeManager(NM-4). All reducer task are running in cluster now. MRAppMaster does not preempt the reducers because for Reducer preemption calculation, headRoom is considering blacklisted nodes memory. This makes jobs to hang forever(ResourceManager does not assing any new containers on blacklisted nodes but returns availableResouce considers cluster free memory). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
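As a hedged sketch of the interim fix discussed above (blacklisted nodes only, no label handling), the snippet subtracts the free resources of the application's blacklisted nodes from the headroom before it is sent in the heartbeat; the {{blacklistedNodesForApp}} collection is an assumption.
{code}
// Hypothetical adjustment before reporting headroom to the AM: resources that are
// free only on nodes this app has blacklisted are not really available to it.
Resource usable = Resources.clone(headroom);
for (SchedulerNode node : blacklistedNodesForApp) {        // assumed collection
  Resources.subtractFrom(usable, node.getAvailableResource());
}
// clamp at zero so the AM never sees negative headroom
if (usable.getMemory() < 0) {
  usable.setMemory(0);
}
if (usable.getVirtualCores() < 0) {
  usable.setVirtualCores(0);
}
{code}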
[jira] [Updated] (YARN-3002) YARN documentation needs updating post-shell rewrite
[ https://issues.apache.org/jira/browse/YARN-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-3002: --- Attachment: YARN-3002-01.patch -01: * Make these documents consistent with and reference HADOOP-10908 appropriately. * Add quite a few missing commands and options. :( * Style fixes here and there * Alphabetize the subcommands YARN documentation needs updating post-shell rewrite Key: YARN-3002 URL: https://issues.apache.org/jira/browse/YARN-3002 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 3.0.0 Reporter: Allen Wittenauer Attachments: YARN-3002-00.patch, YARN-3002-01.patch After HADOOP-9902, the YARN documentation is out of date. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2556) Tool to measure the performance of the timeline server
[ https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266866#comment-14266866 ] Chen He commented on YARN-2556: --- Current benchmark only contains basic Timelineserver write / sec. Do we need to add more? Tool to measure the performance of the timeline server -- Key: YARN-2556 URL: https://issues.apache.org/jira/browse/YARN-2556 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: chang li Attachments: YARN-2556-WIP.patch, YARN-2556-WIP.patch, yarn2556.patch, yarn2556.patch, yarn2556_wip.patch We need to be able to understand the capacity model for the timeline server to give users the tools they need to deploy a timeline server with the correct capacity. I propose we create a mapreduce job that can measure timeline server write and read performance. Transactions per second, I/O for both read and write would be a good start. This could be done as an example or test job that could be tied into gridmix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
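For reference, a bare-bones write-throughput probe could look like the sketch below, assuming the standard {{TimelineClient}} Java API; entity type, ids and the entity count are arbitrary benchmark values, and this is not the attached tool.
{code}
// Put N small entities and report puts per second against the timeline server.
public static void main(String[] args) throws Exception {
  TimelineClient client = TimelineClient.createTimelineClient();
  client.init(new YarnConfiguration());
  client.start();
  try {
    int numEntities = 10000;
    long start = System.currentTimeMillis();
    for (int i = 0; i < numEntities; i++) {
      TimelineEntity entity = new TimelineEntity();
      entity.setEntityType("BENCHMARK");
      entity.setEntityId("entity_" + i);
      entity.setStartTime(System.currentTimeMillis());
      client.putEntities(entity);
    }
    long elapsedMs = System.currentTimeMillis() - start;
    System.out.println("writes/sec = " + (numEntities * 1000.0 / elapsedMs));
  } finally {
    client.stop();
  }
}
{code}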
[jira] [Commented] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now
[ https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266930#comment-14266930 ] Jian He commented on YARN-2936: --- sorry for confusion, I meant the core/production code (not test code) change. YARNDelegationTokenIdentifier doesn't set proto.builder now --- Key: YARN-2936 URL: https://issues.apache.org/jira/browse/YARN-2936 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Varun Saxena Attachments: YARN-2936.001.patch, YARN-2936.002.patch, YARN-2936.003.patch, YARN-2936.004.patch, YARN-2936.005.patch After YARN-2743, the setters are removed from YARNDelegationTokenIdentifier, such that when constructing a object which extends YARNDelegationTokenIdentifier, proto.builder is not set at all. Later on, when we call getProto() of it, we will just get an empty proto object. It seems to do no harm to the production code path, as we will always call getBytes() before using proto to persist the DT in the state store, when we generating the password. I think the setter is removed to avoid duplicating setting the fields why getBytes() is called. However, YARNDelegationTokenIdentifier doesn't work properly alone. YARNDelegationTokenIdentifier is tightly coupled with the logic in secretManager. It's vulnerable if something is changed at secretManager. For example, in the test case of YARN-2837, I spent time to figure out we need to execute getBytes() first to make sure the testing DTs can be properly put into the state store. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2848) (FICA) Applications should maintain an application specific 'cluster' resource to calculate headroom and userlimit
[ https://issues.apache.org/jira/browse/YARN-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266590#comment-14266590 ] Chen He commented on YARN-2848: --- I guess the label is provide by users or applications to choose what nodes to run. The Blacklist is detected by system that what nodes are not stable to run. The blacklisted nodes could be regarded as a special label or NOT label. However, we need extra synchronization process to keep the consistency of users/apps requests and unstable nodes before making scheduling decision. YARN-1680 could be a solution before we actually settle down the label scope and the synchronization overhead issue. (FICA) Applications should maintain an application specific 'cluster' resource to calculate headroom and userlimit -- Key: YARN-2848 URL: https://issues.apache.org/jira/browse/YARN-2848 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Craig Welch Assignee: Craig Welch Likely solutions to [YARN-1680] (properly handling node and rack blacklisting with cluster level node additions and removals) will entail managing an application-level slice of the cluster resource available to the application for use in accurately calculating the application headroom and user limit. There is an assumption that events which impact this resource will occur less frequently than the need to calculate headroom, userlimit, etc (which is a valid assumption given that occurs per-allocation heartbeat). Given that, the application should (with assistance from cluster-level code...) detect changes to the composition of the cluster (node addition, removal) and when those have occurred, calculate an application specific cluster resource by comparing cluster nodes to it's own blacklist (both rack and individual node). I think it makes sense to include nodelabel considerations into this calculation as it will be efficient to do both at the same time and the single resource value reflecting both constraints could then be used for efficient frequent headroom and userlimit calculations while remaining highly accurate. The application would need to be made aware of nodelabel changes it is interested in (the application or removal of labels of interest to the application to/from nodes). For this purpose, the application submissions's nodelabel expression would be used to determine the nodelabel impact on the resource used to calculate userlimit and headroom (Cases where the application elected to request resources not using the application level label expression are out of scope for this - but for the common usecase of an application which uses a particular expression throughout, userlimit and headroom would be accurate) This could also provide an overall mechanism for handling application-specific resource constraints which might be added in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
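A rough sketch of the application-specific 'cluster' resource described above, recomputed only when the node set or the app's blacklist changes and then reused for frequent headroom/userlimit calculations; all names here are hypothetical.
{code}
// Cached per-application view of the cluster, excluding blacklisted hosts/racks.
private Resource appClusterResource = Resources.createResource(0);

void recomputeAppClusterResource(Collection<SchedulerNode> nodes,
    Set<String> blacklistedHosts, Set<String> blacklistedRacks) {
  Resource total = Resources.createResource(0);
  for (SchedulerNode node : nodes) {
    if (blacklistedHosts.contains(node.getNodeName())
        || blacklistedRacks.contains(node.getRackName())) {
      continue;  // this application cannot be placed on these nodes
    }
    Resources.addTo(total, node.getTotalResource());
  }
  appClusterResource = total;  // consumed by headroom/userlimit code until the next change
}
{code}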
[jira] [Updated] (YARN-2556) Tool to measure the performance of the timeline server
[ https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated YARN-2556: -- Target Version/s: 2.7.0 (was: 2.6.0) Tool to measure the performance of the timeline server -- Key: YARN-2556 URL: https://issues.apache.org/jira/browse/YARN-2556 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: chang li Attachments: YARN-2556-WIP.patch, YARN-2556-WIP.patch, yarn2556.patch, yarn2556.patch, yarn2556_wip.patch We need to be able to understand the capacity model for the timeline server to give users the tools they need to deploy a timeline server with the correct capacity. I propose we create a mapreduce job that can measure timeline server write and read performance. Transactions per second, I/O for both read and write would be a good start. This could be done as an example or test job that could be tied into gridmix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2997) NM keeps sending finished containers to RM until app is finished
[ https://issues.apache.org/jira/browse/YARN-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267210#comment-14267210 ] Chengbing Liu commented on YARN-2997: - Once got an RESYNC, NM calls {{getNMContainerStatuses}}, which will loop over all containers in the NM context, remove those whose app is not in the NM context, finally report to RM. The method {{getNMContainerStatuses}} remains unchanged before and after this patch. The logic of removing containers from context is also unchanged. From a different viewpoint, {{pendingCompletedContainers}} contains the following: * completed containers, whose app is stopped, and the container is removed from the NM context. * completed containers, whose app is NOT stopped (which implies their apps are in the NM context), and the container is NOT removed from the NM context. The first kind will not be reported to RM since they are not in the NM context, so they will not be looped. The second kind will be reported to RM since they are in the NM context, and their apps must be in the NM context. Finally, the changes of this patch can be summarized as follows: * Does not send finished container statuses repeatedly for running application * Send completed container statuses again in case of lost heartbeat (normal heartbeat, not RESYNC) I hope this will clarify your doubts. NM keeps sending finished containers to RM until app is finished Key: YARN-2997 URL: https://issues.apache.org/jira/browse/YARN-2997 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Chengbing Liu Assignee: Chengbing Liu Attachments: YARN-2997.2.patch, YARN-2997.3.patch, YARN-2997.4.patch, YARN-2997.patch We have seen in RM log a lot of {quote} INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... {quote} It is caused by NM sending completed containers repeatedly until the app is finished. On the RM side, the container is already released, hence {{getRMContainer}} returns null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
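The bookkeeping described above might look roughly like the sketch below; {{pendingCompletedContainers}} is the map named in the comment, while the helper and callback names are assumptions rather than the exact patch.
{code}
// Completed containers are reported until the RM acknowledges them, then dropped,
// so statuses are not re-sent on every heartbeat for a still-running application
// but are retried if a heartbeat is lost.
private final Map<ContainerId, ContainerStatus> pendingCompletedContainers =
    new HashMap<ContainerId, ContainerStatus>();

List<ContainerStatus> getContainerStatusesForHeartbeat() {
  for (ContainerStatus status : getFinishedContainersInContext()) {   // assumed helper
    pendingCompletedContainers.put(status.getContainerId(), status);
  }
  // anything still pending is (re-)sent; a lost heartbeat just means it goes out again
  return new ArrayList<ContainerStatus>(pendingCompletedContainers.values());
}

void onHeartbeatAcknowledged() {            // assumed callback on a successful response
  pendingCompletedContainers.clear();
}
{code}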
[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267219#comment-14267219 ] Hadoop QA commented on YARN-2637: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12690467/YARN-2637.29.patch against trunk revision 788ee35. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 8 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6264//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6264//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6264//console This message is automatically generated. maximum-am-resource-percent could be violated when resource of AM is minimumAllocation Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Craig Welch Priority: Critical Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, YARN-2637.25.patch, YARN-2637.26.patch, YARN-2637.27.patch, YARN-2637.28.patch, YARN-2637.29.patch, YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch Currently, number of AM in leaf queue will be calculated in following way: {code} max_am_resource = queue_max_capacity * maximum_am_resource_percent #max_am_number = max_am_resource / minimum_allocation #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor {code} And when submit new application to RM, it will check if an app can be activated in following way: {code} for (IteratorFiCaSchedulerApp i=pendingApplications.iterator(); i.hasNext(); ) { FiCaSchedulerApp application = i.next(); // Check queue limit if (getNumActiveApplications() = getMaximumActiveApplications()) { break; } // Check user limit User user = getUser(application.getUser()); if (user.getActiveApplications() getMaximumActiveApplicationsPerUser()) { user.activateApplication(); activeApplications.add(application); i.remove(); LOG.info(Application + application.getApplicationId() + from user: + application.getUser() + activated in queue: + getQueueName()); } } {code} An example is, If a queue has capacity = 1G, max_am_resource_percent = 0.2, the maximum resource that AM can use is 200M, assuming minimum_allocation=1M, #am can be launched is 200, and if user uses 5M for each AM ( minimum_allocation). 
All apps can still be activated, and it will occupy all resource of a queue instead of only a max_am_resource_percent of a queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2428) LCE default banned user list should have yarn
[ https://issues.apache.org/jira/browse/YARN-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-2428: --- Attachment: YARN-2428.001.patch LCE default banned user list should have yarn - Key: YARN-2428 URL: https://issues.apache.org/jira/browse/YARN-2428 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Allen Wittenauer Assignee: Varun Saxena Priority: Trivial Labels: newbie Attachments: YARN-2428.001.patch When task-controller was retrofitted to YARN, the default banned user list didn't add yarn. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3003) Provide API for client to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267125#comment-14267125 ] Varun Saxena commented on YARN-3003: [~Naganarasimha], the thing you are looking at is a Bidirectional Map. I think Guava has such functionality. Will explore and update once I start working on this issue. Provide API for client to retrieve label to node mapping Key: YARN-3003 URL: https://issues.apache.org/jira/browse/YARN-3003 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Reporter: Ted Yu Assignee: Varun Saxena Currently YarnClient#getNodeToLabels() returns the mapping from NodeId to set of labels associated with the node. Client (such as Slider) may be interested in label to node mapping - given label, return the nodes with this label. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2213) Change proxy-user cookie log in AmIpFilter to DEBUG
[ https://issues.apache.org/jira/browse/YARN-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267159#comment-14267159 ] Hadoop QA commented on YARN-2213: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12690459/YARN-2213.001.patch against trunk revision 788ee35. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6262//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6262//console This message is automatically generated. Change proxy-user cookie log in AmIpFilter to DEBUG --- Key: YARN-2213 URL: https://issues.apache.org/jira/browse/YARN-2213 Project: Hadoop YARN Issue Type: Task Reporter: Ted Yu Assignee: Varun Saxena Priority: Minor Attachments: YARN-2213.001.patch I saw a lot of the following lines in AppMaster log: {code} 14/06/24 17:12:36 WARN web.SliderAmIpFilter: Could not find proxy-user cookie, so user will not be set 14/06/24 17:12:39 WARN web.SliderAmIpFilter: Could not find proxy-user cookie, so user will not be set 14/06/24 17:12:39 WARN web.SliderAmIpFilter: Could not find proxy-user cookie, so user will not be set {code} For long running app, this would consume considerable log space. Log level should be changed to DEBUG. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3011) NM dies because of the failure of resource localization
Wang Hao created YARN-3011: -- Summary: NM dies because of the failure of resource localization Key: YARN-3011 URL: https://issues.apache.org/jira/browse/YARN-3011 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.1 Reporter: Wang Hao NM dies because of IllegalArgumentException when localize resource. 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hadoop/share/lib/oozie/json-simple-1.1.jar, 1416997035456, FILE, null } 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hive/src/final_test_ooize/test_ooize_job1.sql/, 1419831474153, FILE, null } 2014-12-29 13:43:58,701 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread java.lang.IllegalArgumentException: Can not create a Path from an empty string at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127) at org.apache.hadoop.fs.Path.init(Path.java:135) at org.apache.hadoop.fs.Path.init(Path.java:94) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:420) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:758) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:672) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:614) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) 2014-12-29 13:43:58,701 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user hadoop 2014-12-29 13:43:58,702 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye.. 2014-12-29 13:43:58,704 INFO org.apache.hadoop.mapred.ShuffleHandler: Setting connection close header... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3011) NM dies because of the failure of resource localization
[ https://issues.apache.org/jira/browse/YARN-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena reassigned YARN-3011: -- Assignee: Varun Saxena NM dies because of the failure of resource localization --- Key: YARN-3011 URL: https://issues.apache.org/jira/browse/YARN-3011 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.1 Reporter: Wang Hao Assignee: Varun Saxena NM dies because of IllegalArgumentException when localize resource. 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hadoop/share/lib/oozie/json-simple-1.1.jar, 1416997035456, FILE, null } 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hive/src/final_test_ooize/test_ooize_job1.sql/, 1419831474153, FILE, null } 2014-12-29 13:43:58,701 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread java.lang.IllegalArgumentException: Can not create a Path from an empty string at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127) at org.apache.hadoop.fs.Path.init(Path.java:135) at org.apache.hadoop.fs.Path.init(Path.java:94) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:420) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:758) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:672) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:614) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) 2014-12-29 13:43:58,701 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user hadoop 2014-12-29 13:43:58,702 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye.. 2014-12-29 13:43:58,704 INFO org.apache.hadoop.mapred.ShuffleHandler: Setting connection close header... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2637: -- Attachment: YARN-2637.29.patch Take a go adding user am limit also (needs further verification/test), see test impact maximum-am-resource-percent could be violated when resource of AM is minimumAllocation Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Craig Welch Priority: Critical Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, YARN-2637.25.patch, YARN-2637.26.patch, YARN-2637.27.patch, YARN-2637.28.patch, YARN-2637.29.patch, YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch Currently, number of AM in leaf queue will be calculated in following way: {code} max_am_resource = queue_max_capacity * maximum_am_resource_percent #max_am_number = max_am_resource / minimum_allocation #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor {code} And when submit new application to RM, it will check if an app can be activated in following way: {code} for (IteratorFiCaSchedulerApp i=pendingApplications.iterator(); i.hasNext(); ) { FiCaSchedulerApp application = i.next(); // Check queue limit if (getNumActiveApplications() = getMaximumActiveApplications()) { break; } // Check user limit User user = getUser(application.getUser()); if (user.getActiveApplications() getMaximumActiveApplicationsPerUser()) { user.activateApplication(); activeApplications.add(application); i.remove(); LOG.info(Application + application.getApplicationId() + from user: + application.getUser() + activated in queue: + getQueueName()); } } {code} An example is, If a queue has capacity = 1G, max_am_resource_percent = 0.2, the maximum resource that AM can use is 200M, assuming minimum_allocation=1M, #am can be launched is 200, and if user uses 5M for each AM ( minimum_allocation). All apps can still be activated, and it will occupy all resource of a queue instead of only a max_am_resource_percent of a queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now
[ https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267153#comment-14267153 ] Hadoop QA commented on YARN-2936: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12690457/YARN-2936.006.patch against trunk revision 788ee35. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6261//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6261//console This message is automatically generated. YARNDelegationTokenIdentifier doesn't set proto.builder now --- Key: YARN-2936 URL: https://issues.apache.org/jira/browse/YARN-2936 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Varun Saxena Attachments: YARN-2936.001.patch, YARN-2936.002.patch, YARN-2936.003.patch, YARN-2936.004.patch, YARN-2936.005.patch, YARN-2936.006.patch After YARN-2743, the setters are removed from YARNDelegationTokenIdentifier, such that when constructing a object which extends YARNDelegationTokenIdentifier, proto.builder is not set at all. Later on, when we call getProto() of it, we will just get an empty proto object. It seems to do no harm to the production code path, as we will always call getBytes() before using proto to persist the DT in the state store, when we generating the password. I think the setter is removed to avoid duplicating setting the fields why getBytes() is called. However, YARNDelegationTokenIdentifier doesn't work properly alone. YARNDelegationTokenIdentifier is tightly coupled with the logic in secretManager. It's vulnerable if something is changed at secretManager. For example, in the test case of YARN-2837, I spent time to figure out we need to execute getBytes() first to make sure the testing DTs can be properly put into the state store. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3011) NM dies because of the failure of resource localization
[ https://issues.apache.org/jira/browse/YARN-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267246#comment-14267246 ] Wang Hao commented on YARN-3011: I submitted a job to Oozie. In my workflow.xml, the value of the script tag ends with '/' by mistake:
<workflow-app xmlns="uri:oozie:workflow:0.2" name="hive-wf">
  <start to="create_hive"/>
  <action name="create_hive">
    <hive xmlns="uri:oozie:hive-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>oozie.action.sharelib.for.hive</name>
          <value>hive2</value>
        </property>
        <property>
          <name>oozie.launcher.action.main.class</name>
          <value>org.apache.oozie.action.hadoop.Hive2Main</value>
        </property>
        <property>
          <name>mapreduce.job.queuename</name>
          <value>${queueName}</value>
        </property>
      </configuration>
      <script>test_ooize_job1.sql/</script>
      <param>hivevar:dbname=offline</param>
      <param>hivevar:partition_date=20141228</param>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Hive failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>
When the NM localized the resource, the file test_ooize_job1.sql/ caused an exception in the getPathForLocalization function of LocalResourcesTrackerImpl. In getPathForLocalization, when the Path is created, the second parameter ends up empty: {{Path localPath = new Path(rPath, req.getPath().getName());}} Finally, the exception causes the AsyncDispatcher to shut down the JVM. So I think we should handle this exception; otherwise it can cause lots of NMs to die. NM dies because of the failure of resource localization --- Key: YARN-3011 URL: https://issues.apache.org/jira/browse/YARN-3011 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.1 Reporter: Wang Hao Assignee: Varun Saxena NM dies because of IllegalArgumentException when localizing a resource.
2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hadoop/share/lib/oozie/json-simple-1.1.jar, 1416997035456, FILE, null } 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hive/src/final_test_ooize/test_ooize_job1.sql/, 1419831474153, FILE, null } 2014-12-29 13:43:58,701 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread java.lang.IllegalArgumentException: Can not create a Path from an empty string at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127) at org.apache.hadoop.fs.Path.init(Path.java:135) at org.apache.hadoop.fs.Path.init(Path.java:94) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:420) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:758) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:672) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:614) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) 2014-12-29 13:43:58,701 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user hadoop 2014-12-29 13:43:58,702 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye.. 2014-12-29 13:43:58,704 INFO org.apache.hadoop.mapred.ShuffleHandler: Setting connection close header... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
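Following the analysis in the comment above, a defensive check in the localization path could look like the sketch below: validate the resource name before constructing the Path and fail only that resource, instead of letting the IllegalArgumentException escape and stop the AsyncDispatcher. The failure event and surrounding wiring are assumptions, not the eventual fix.
{code}
// Hypothetical guard: a request like ".../test_ooize_job1.sql/" has an empty getName(),
// and new Path(parent, "") throws IllegalArgumentException inside the dispatcher thread.
String name = req.getPath().getName();
if (name.isEmpty()) {
  LOG.error("Invalid resource path (trailing '/'): " + req.getPath());
  // assumed event: mark just this resource as failed so the container fails cleanly
  tracker.handle(new ResourceFailedLocalizationEvent(req,
      "Resource path ends with '/': " + req.getPath()));
} else {
  Path localPath = new Path(rPath, name);
  // ... continue with normal localization
}
{code}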
[jira] [Commented] (YARN-2807) Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive
[ https://issues.apache.org/jira/browse/YARN-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267329#comment-14267329 ] Hadoop QA commented on YARN-2807: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12690300/YARN-2807.2.patch against trunk revision 788ee35. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common: org.apache.hadoop.security.ssl.TestReloadingX509TrustManager Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6265//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6265//console This message is automatically generated. Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive Key: YARN-2807 URL: https://issues.apache.org/jira/browse/YARN-2807 Project: Hadoop YARN Issue Type: Sub-task Components: documentation, resourcemanager Reporter: Wangda Tan Assignee: Masatake Iwasaki Priority: Minor Attachments: YARN-2807.1.patch, YARN-2807.2.patch Currently the help message of yarn rmadmin -transitionToActive is: {code} transitionToActive: incorrect number of arguments Usage: HAAdmin [-transitionToActive serviceId [--forceactive]] {code} But the --forceactive not works as expected. When transition RM state with --forceactive: {code} yarn rmadmin -transitionToActive rm2 --forceactive Automatic failover is enabled for org.apache.hadoop.yarn.client.RMHAServiceTarget@64c9f31e Refusing to manually manage HA state, since it may cause a split-brain scenario or other incorrect state. If you are very sure you know what you are doing, please specify the forcemanual flag. {code} As shown above, we still cannot transitionToActive when automatic failover is enabled with --forceactive. The option can work is: {{--forcemanual}}, there's no place in usage describes this option. I think we should fix this -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3009) TimelineWebServices always parses primary and secondary filters as numbers if first char is a number
[ https://issues.apache.org/jira/browse/YARN-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267275#comment-14267275 ] Naganarasimha G R commented on YARN-3009: - Hi [~cwensel] Took a look @ code and test cases. Seems like its not a issue, if the filter value is placed within double quotes then its expected to be read as a string, if not it will read as numerical object itself (refer {{TestTimelineWebServices.testPrimaryFilterNumericString() testPrimaryFilterNumericStringWithQuotes()}} ) May be you can share the URL which you are using to store and accessing the timeline entities through webservice, which can help in validating this issue further TimelineWebServices always parses primary and secondary filters as numbers if first char is a number Key: YARN-3009 URL: https://issues.apache.org/jira/browse/YARN-3009 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.4.0 Reporter: Chris K Wensel Assignee: Naganarasimha G R If you pass a filter value that starts with a number (7CCA...), the filter value will be parsed into the Number '7' causing the filter to fail the search. Should be noted the actual value as stored via a PUT operation is properly parsed and stored as a String. This manifests as a very hard to identify issue with DAGClient in Apache Tez and naming dags/vertices with alphanumeric guid values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2807) Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive
[ https://issues.apache.org/jira/browse/YARN-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-2807: Component/s: documentation Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive Key: YARN-2807 URL: https://issues.apache.org/jira/browse/YARN-2807 Project: Hadoop YARN Issue Type: Sub-task Components: documentation, resourcemanager Reporter: Wangda Tan Assignee: Masatake Iwasaki Priority: Minor Attachments: YARN-2807.1.patch, YARN-2807.2.patch Currently the help message of yarn rmadmin -transitionToActive is: {code} transitionToActive: incorrect number of arguments Usage: HAAdmin [-transitionToActive serviceId [--forceactive]] {code} But the --forceactive not works as expected. When transition RM state with --forceactive: {code} yarn rmadmin -transitionToActive rm2 --forceactive Automatic failover is enabled for org.apache.hadoop.yarn.client.RMHAServiceTarget@64c9f31e Refusing to manually manage HA state, since it may cause a split-brain scenario or other incorrect state. If you are very sure you know what you are doing, please specify the forcemanual flag. {code} As shown above, we still cannot transitionToActive when automatic failover is enabled with --forceactive. The option can work is: {{--forcemanual}}, there's no place in usage describes this option. I think we should fix this -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2996) Refine some fs operations in FileSystemRMStateStore to improve performance
[ https://issues.apache.org/jira/browse/YARN-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267322#comment-14267322 ] Zhijie Shen commented on YARN-2996: --- My bad. I mean MemoryRMStateStore#updateRMDelegationTokenState. It contains two other synchronized methods, but it's better to keep them atomic, and not interpolated by other operations. Refine some fs operations in FileSystemRMStateStore to improve performance -- Key: YARN-2996 URL: https://issues.apache.org/jira/browse/YARN-2996 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Yi Liu Assignee: Yi Liu Attachments: YARN-2996.001.patch, YARN-2996.002.patch In {{FileSystemRMStateStore}}, we can refine some fs operations to improve performance: *1.* There are several places invoke {{fs.exists}}, then {{fs.getFileStatus}}, we can merge them to save one RPC call {code} if (fs.exists(versionNodePath)) { FileStatus status = fs.getFileStatus(versionNodePath); {code} *2.* {code} protected void updateFile(Path outputPath, byte[] data) throws Exception { Path newPath = new Path(outputPath.getParent(), outputPath.getName() + .new); // use writeFile to make sure .new file is created atomically writeFile(newPath, data); replaceFile(newPath, outputPath); } {code} The {{updateFile}} is not good too, it write file to _output\_file_.tmp, then rename to _output\_file_.new, then rename it to _output\_file_, we can reduce one rename operation. Also there is one unnecessary import, we can remove it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2807) Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive
[ https://issues.apache.org/jira/browse/YARN-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267358#comment-14267358 ] Akira AJISAKA commented on YARN-2807: - Thanks [~iwasakims] for updating the patch. Mostly looks good to me. Minor comment: Would you remove trailing whitespaces in YarnCommands.apt.vm? Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive Key: YARN-2807 URL: https://issues.apache.org/jira/browse/YARN-2807 Project: Hadoop YARN Issue Type: Sub-task Components: documentation, resourcemanager Reporter: Wangda Tan Assignee: Masatake Iwasaki Priority: Minor Attachments: YARN-2807.1.patch, YARN-2807.2.patch Currently the help message of yarn rmadmin -transitionToActive is: {code} transitionToActive: incorrect number of arguments Usage: HAAdmin [-transitionToActive serviceId [--forceactive]] {code} But the --forceactive not works as expected. When transition RM state with --forceactive: {code} yarn rmadmin -transitionToActive rm2 --forceactive Automatic failover is enabled for org.apache.hadoop.yarn.client.RMHAServiceTarget@64c9f31e Refusing to manually manage HA state, since it may cause a split-brain scenario or other incorrect state. If you are very sure you know what you are doing, please specify the forcemanual flag. {code} As shown above, we still cannot transitionToActive when automatic failover is enabled with --forceactive. The option can work is: {{--forcemanual}}, there's no place in usage describes this option. I think we should fix this -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2996) Refine some fs operations in FileSystemRMStateStore to improve performance
[ https://issues.apache.org/jira/browse/YARN-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated YARN-2996: - Attachment: YARN-2996.003.patch OK, I see, update the patch. Thanks Zhijie. Refine some fs operations in FileSystemRMStateStore to improve performance -- Key: YARN-2996 URL: https://issues.apache.org/jira/browse/YARN-2996 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Yi Liu Assignee: Yi Liu Attachments: YARN-2996.001.patch, YARN-2996.002.patch, YARN-2996.003.patch In {{FileSystemRMStateStore}}, we can refine some fs operations to improve performance: *1.* There are several places invoke {{fs.exists}}, then {{fs.getFileStatus}}, we can merge them to save one RPC call {code} if (fs.exists(versionNodePath)) { FileStatus status = fs.getFileStatus(versionNodePath); {code} *2.* {code} protected void updateFile(Path outputPath, byte[] data) throws Exception { Path newPath = new Path(outputPath.getParent(), outputPath.getName() + .new); // use writeFile to make sure .new file is created atomically writeFile(newPath, data); replaceFile(newPath, outputPath); } {code} The {{updateFile}} is not good too, it write file to _output\_file_.tmp, then rename to _output\_file_.new, then rename it to _output\_file_, we can reduce one rename operation. Also there is one unnecessary import, we can remove it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)