[jira] [Commented] (YARN-2102) More generalized timeline ACLs
[ https://issues.apache.org/jira/browse/YARN-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107898#comment-14107898 ] Hadoop QA commented on YARN-2102: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12663830/YARN-2102.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 8 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages. See https://builds.apache.org/job/PreCommit-YARN-Build/4705//artifact/trunk/patchprocess/diffJavadocWarnings.txt for details. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4705//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/4705//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-applicationhistoryservice.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4705//console This message is automatically generated. 
> More generalized timeline ACLs > -- > > Key: YARN-2102 > URL: https://issues.apache.org/jira/browse/YARN-2102 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: GeneralizedTimelineACLs.pdf, YARN-2102.1.patch > > > We need to differentiate the access controls for read and write > operations, and we need to think about cross-entity access control. For > example, if we are executing a workflow of MR jobs that writes the > timeline data of this workflow, we don't want other users to pollute the > workflow's timeline data by putting something under it. -- This message was sent by Atlassian JIRA (v6.2#6252)
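The core of the proposal above is keeping read grants and write grants separate, so a user can be allowed to view timeline data without being allowed to put entities under it. A minimal illustrative sketch (class and method names are hypothetical, not taken from the patch or design document):

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of per-entity read/write ACLs: readers and writers are
// tracked separately, so a workflow owner can let other users view timeline
// data without letting them write under the same entity.
public class TimelineAclSketch {
    private final String owner;
    private final Set<String> readers = new HashSet<>();
    private final Set<String> writers = new HashSet<>();

    public TimelineAclSketch(String owner) {
        this.owner = owner;
    }

    public void allowRead(String user) { readers.add(user); }
    public void allowWrite(String user) { writers.add(user); }

    // The owner can always read and write; other users need an explicit grant.
    public boolean canRead(String user) {
        return owner.equals(user) || readers.contains(user);
    }

    public boolean canWrite(String user) {
        return owner.equals(user) || writers.contains(user);
    }
}
```

With this split, granting read access to a collaborator does not let them put anything under the workflow's entities, which is exactly the pollution scenario the description worries about.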
[jira] [Commented] (YARN-2385) Consider splitting getAppsinQueue to getRunningAppsInQueue + getPendingAppsInQueue
[ https://issues.apache.org/jira/browse/YARN-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107896#comment-14107896 ] Sunil G commented on YARN-2385: --- After checking the code, *AbstractYarnScheduler#killAllAppsInQueue* and *ClientRMService#getApplications* can be changed to use a combination of these APIs as needed. Currently the behavior differs between Fair and CS in these cases. A uniform decision can be derived, and then these two new APIs can be used in this context as needed. I feel that for *killAllAppsInQueue* and *getApplications*, both pending and running applications are needed. [~zjshen], [~wangda], [~subru], please suggest your thoughts. If you agree, I would like to take up this JIRA. > Consider splitting getAppsinQueue to getRunningAppsInQueue + > getPendingAppsInQueue > -- > > Key: YARN-2385 > URL: https://issues.apache.org/jira/browse/YARN-2385 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler >Reporter: Subramaniam Krishnan > Labels: abstractyarnscheduler > > Currently getAppsinQueue returns both pending & running apps. The purpose of > this JIRA is to explore splitting it into getRunningAppsInQueue + > getPendingAppsInQueue, which will provide more flexibility to callers. -- This message was sent by Atlassian JIRA (v6.2#6252)
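The shape of the proposed split can be sketched as follows: today's single method returns the union, and callers such as killAllAppsInQueue would simply keep using both halves. This is an illustrative sketch only; the class and method bodies are hypothetical, not the scheduler code:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of splitting getAppsInQueue: pending and running apps
// are exposed separately, and the existing behavior is the union of the two.
public class QueueAppsSketch {
    private final List<String> running = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();

    public void addRunning(String appId) { running.add(appId); }
    public void addPending(String appId) { pending.add(appId); }

    // Proposed new APIs: each returns only one subset.
    public List<String> getRunningAppsInQueue() { return new ArrayList<>(running); }
    public List<String> getPendingAppsInQueue() { return new ArrayList<>(pending); }

    // Current behavior: the union of both subsets, which is what callers
    // like killAllAppsInQueue and getApplications still need.
    public List<String> getAppsInQueue() {
        List<String> all = new ArrayList<>(running);
        all.addAll(pending);
        return all;
    }
}
```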
[jira] [Commented] (YARN-2182) Update ContainerId#toString() to avoid conflicts before and after RM restart
[ https://issues.apache.org/jira/browse/YARN-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107893#comment-14107893 ] Tsuyoshi OZAWA commented on YARN-2182: -- s/Changed to prefix epoch to ContainerId#toString()/Updated ContainerId#toString() to suffix epoch/ > Update ContainerId#toString() to avoid conflicts before and after RM restart > > > Key: YARN-2182 > URL: https://issues.apache.org/jira/browse/YARN-2182 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Tsuyoshi OZAWA >Assignee: Tsuyoshi OZAWA > Attachments: YARN-2182.1.patch > > > ContainerId#toString() doesn't include any information about the current cluster > id. This leads to conflicts between container ids. We can avoid the conflicts > without breaking backward compatibility by using the epoch introduced in > YARN-2052. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2182) Update ContainerId#toString() to avoid conflicts before and after RM restart
[ https://issues.apache.org/jira/browse/YARN-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2182: - Attachment: YARN-2182.1.patch Changed to prefix epoch to ContainerId#toString(). > Update ContainerId#toString() to avoid conflicts before and after RM restart > > > Key: YARN-2182 > URL: https://issues.apache.org/jira/browse/YARN-2182 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Tsuyoshi OZAWA >Assignee: Tsuyoshi OZAWA > Attachments: YARN-2182.1.patch > > > ContainerId#toString() doesn't include any information about the current cluster > id. This leads to conflicts between container ids. We can avoid the conflicts > without breaking backward compatibility by using the epoch introduced in > YARN-2052. -- This message was sent by Atlassian JIRA (v6.2#6252)
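The idea behind the patch can be sketched as follows: weave the RM restart epoch from YARN-2052 into the container id string so that ids minted before and after a restart cannot collide, while epoch 0 keeps the pre-restart format for backward compatibility. The exact placement and format below are hypothetical; the attached patch may differ (the earlier comment in this thread corrects "prefix" to "suffix"):

```java
// Hypothetical sketch: include the RM restart epoch in the container id
// string. Epoch 0 preserves the old format; a non-zero epoch is appended
// so ids from different RM incarnations are distinct.
public class ContainerIdStringSketch {
    public static String format(long epoch, long clusterTs, int appId,
                                int attemptId, long containerId) {
        String base = String.format("container_%d_%04d_%02d_%06d",
                clusterTs, appId, attemptId, containerId);
        if (epoch == 0) {
            return base;  // backward-compatible: unchanged before first restart
        }
        return base + "_e" + epoch;  // disambiguates ids after RM restarts
    }
}
```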
[jira] [Created] (YARN-2446) Using TimelineNamespace to shield the entities of a user
Zhijie Shen created YARN-2446: - Summary: Using TimelineNamespace to shield the entities of a user Key: YARN-2446 URL: https://issues.apache.org/jira/browse/YARN-2446 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Given YARN-2102 adds TimelineNamespace, we can make use of it to shield the entities, preventing them from being accessed or affected by other users' operations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2102) More generalized timeline ACLs
[ https://issues.apache.org/jira/browse/YARN-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2102: -- Attachment: YARN-2102.1.patch I divided the work into two halves. In this JIRA, I'd like to scope the work to defining the TimelineNamespace data model, reading from and writing to the timeline store, adding REST APIs for users to operate on the namespace, and a TimelineClient wrapper over the PUT method. In other words, this JIRA focuses on making the new TimelineNamespace work end to end. I'll create a follow-up JIRA to use TimelineNamespace to protect entities. > More generalized timeline ACLs > -- > > Key: YARN-2102 > URL: https://issues.apache.org/jira/browse/YARN-2102 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: GeneralizedTimelineACLs.pdf, YARN-2102.1.patch > > > We need to differentiate the access controls for read and write > operations, and we need to think about cross-entity access control. For > example, if we are executing a workflow of MR jobs that writes the > timeline data of this workflow, we don't want other users to pollute the > workflow's timeline data by putting something under it. -- This message was sent by Atlassian JIRA (v6.2#6252)
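As a rough picture of the data model being scoped here, a namespace would carry an id, an owner, and separate reader/writer ACL strings that the follow-up JIRA can use to shield entities. All field and method names below are illustrative assumptions, not taken from the attached patch:

```java
// Hypothetical sketch of a TimelineNamespace-style object: the owner is
// fixed at creation, while reader/writer ACLs can be set before the
// namespace is PUT to the timeline store via the new REST API.
public class TimelineNamespaceSketch {
    private final String id;
    private final String owner;
    private String readers = "";  // e.g. "user1,user2 group1" (assumed syntax)
    private String writers = "";

    public TimelineNamespaceSketch(String id, String owner) {
        this.id = id;
        this.owner = owner;
    }

    public String getId() { return id; }
    public String getOwner() { return owner; }
    public String getReaders() { return readers; }
    public String getWriters() { return writers; }
    public void setReaders(String readers) { this.readers = readers; }
    public void setWriters(String writers) { this.writers = writers; }
}
```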
[jira] [Updated] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1879: - Attachment: YARN-1879.9.patch Refreshed a patch. > Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol > --- > > Key: YARN-1879 > URL: https://issues.apache.org/jira/browse/YARN-1879 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Tsuyoshi OZAWA >Priority: Critical > Attachments: YARN-1879.1.patch, YARN-1879.1.patch, > YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.3.patch, > YARN-1879.4.patch, YARN-1879.5.patch, YARN-1879.6.patch, YARN-1879.7.patch, > YARN-1879.8.patch, YARN-1879.9.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2035) FileSystemApplicationHistoryStore blocks RM and AHS while NN is in safemode
[ https://issues.apache.org/jira/browse/YARN-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107859#comment-14107859 ] Jonathan Eagles commented on YARN-2035: --- [~zjshen], can you please review this new version of the patch? > FileSystemApplicationHistoryStore blocks RM and AHS while NN is in safemode > --- > > Key: YARN-2035 > URL: https://issues.apache.org/jira/browse/YARN-2035 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.4.1 >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: YARN-2035-v2.patch, YARN-2035-v3.patch, YARN-2035.patch > > > Small bug that prevents ResourceManager and ApplicationHistoryService from > coming up while Namenode is in safemode. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2035) FileSystemApplicationHistoryStore blocks RM and AHS while NN is in safemode
[ https://issues.apache.org/jira/browse/YARN-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107852#comment-14107852 ] Hadoop QA commented on YARN-2035: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12663820/YARN-2035-v3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4703//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4703//console This message is automatically generated. > FileSystemApplicationHistoryStore blocks RM and AHS while NN is in safemode > --- > > Key: YARN-2035 > URL: https://issues.apache.org/jira/browse/YARN-2035 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.4.1 >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: YARN-2035-v2.patch, YARN-2035-v3.patch, YARN-2035.patch > > > Small bug that prevents ResourceManager and ApplicationHistoryService from > coming up while Namenode is in safemode. 
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely
[ https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107851#comment-14107851 ] zhihai xu commented on YARN-1458: - The test failure is not related to my change; TestAMRestart passes in my local build:
{noformat}
Running org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 89.639 sec - in org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

Results :

Tests run: 5, Failures: 0, Errors: 0, Skipped: 0
{noformat}
> In Fair Scheduler, size based weight can cause update thread to hold lock > indefinitely > -- > > Key: YARN-1458 > URL: https://issues.apache.org/jira/browse/YARN-1458 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.2.0 > Environment: Centos 2.6.18-238.19.1.el5 X86_64 > hadoop2.2.0 >Reporter: qingwu.fu >Assignee: zhihai xu > Labels: patch > Fix For: 2.2.1 > > Attachments: YARN-1458.001.patch, YARN-1458.002.patch, > YARN-1458.003.patch, YARN-1458.004.patch, YARN-1458.patch > > Original Estimate: 408h > Remaining Estimate: 408h > > The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when > clients submitted lots of jobs; it is not easy to reproduce. We ran the test cluster > for days to reproduce it. 
The output of jstack command on resourcemanager pid: > {code} > "ResourceManager Event Processor" prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 > waiting for monitor entry [0x43aa9000] >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671) > - waiting to lock <0x00070026b6e0> (a > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440) > at java.lang.Thread.run(Thread.java:744) > …… > "FairSchedulerUpdateThread" daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 > runnable [0x433a2000] >java.lang.Thread.State: RUNNABLE > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545) > - locked <0x00070026b6e0> (a > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119) > at > 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282) > - locked <0x00070026b6e0> (a > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255) > at java.lang.Thread.run(Thread.java:744) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2035) FileSystemApplicationHistoryStore blocks RM and AHS while NN is in safemode
[ https://issues.apache.org/jira/browse/YARN-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-2035: -- Attachment: YARN-2035-v3.patch Addressed failing tests with last patch. > FileSystemApplicationHistoryStore blocks RM and AHS while NN is in safemode > --- > > Key: YARN-2035 > URL: https://issues.apache.org/jira/browse/YARN-2035 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.4.1 >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: YARN-2035-v2.patch, YARN-2035-v3.patch, YARN-2035.patch > > > Small bug that prevents ResourceManager and ApplicationHistoryService from > coming up while Namenode is in safemode. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1326) RM should log using RMStore at startup time
[ https://issues.apache.org/jira/browse/YARN-1326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107823#comment-14107823 ] Tsuyoshi OZAWA commented on YARN-1326: -- A patch is ready for review. [~kkambatl], could you check it? > RM should log using RMStore at startup time > --- > > Key: YARN-1326 > URL: https://issues.apache.org/jira/browse/YARN-1326 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.5.0 >Reporter: Tsuyoshi OZAWA >Assignee: Tsuyoshi OZAWA > Attachments: YARN-1326.1.patch, YARN-1326.2.patch, YARN-1326.3.patch, > YARN-1326.4.patch, demo.png > > Original Estimate: 3h > Remaining Estimate: 3h > > Currently there is no way to know which RMStore the RM uses. It's useful to log > the information at RM's startup time. -- This message was sent by Atlassian JIRA (v6.2#6252)
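The change being reviewed amounts to emitting one line at startup naming the configured state-store implementation. A minimal sketch under assumptions (the patch itself is not shown in this thread; class name, log wording, and the use of java.util.logging here are illustrative, not the patch's code):

```java
import java.util.logging.Logger;

// Hypothetical sketch: log which RMStateStore implementation the RM was
// configured with, once, during startup, so operators can tell from the
// RM log which store is in use.
public class RMStoreStartupLogSketch {
    private static final Logger LOG =
            Logger.getLogger(RMStoreStartupLogSketch.class.getName());

    // Separated out so the message itself is easy to verify.
    public static String startupMessage(Object store) {
        return "Using RMStateStore implementation: "
                + store.getClass().getName();
    }

    public static void logOnStartup(Object store) {
        LOG.info(startupMessage(store));
    }
}
```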
[jira] [Commented] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely
[ https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107822#comment-14107822 ] Hadoop QA commented on YARN-1458: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12663814/YARN-1458.004.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4702//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4702//console This message is automatically generated. 
> In Fair Scheduler, size based weight can cause update thread to hold lock > indefinitely > -- > > Key: YARN-1458 > URL: https://issues.apache.org/jira/browse/YARN-1458 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.2.0 > Environment: Centos 2.6.18-238.19.1.el5 X86_64 > hadoop2.2.0 >Reporter: qingwu.fu >Assignee: zhihai xu > Labels: patch > Fix For: 2.2.1 > > Attachments: YARN-1458.001.patch, YARN-1458.002.patch, > YARN-1458.003.patch, YARN-1458.004.patch, YARN-1458.patch > > Original Estimate: 408h > Remaining Estimate: 408h > > The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when > clients submit lots jobs, it is not easy to reapear. We run the test cluster > for days to reapear it. The output of jstack command on resourcemanager pid: > {code} > "ResourceManager Event Processor" prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 > waiting for monitor entry [0x43aa9000] >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671) > - waiting to lock <0x00070026b6e0> (a > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440) > at java.lang.Thread.run(Thread.java:744) > …… > "FairSchedulerUpdateThread" daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 > runnable [0x433a2000] >java.lang.Thread.State: RUNNABLE > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545) > - locked <0x00070026b6e0> (a > 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282) > - locked <0x00070026b6e0> (a > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255) > at java.lang.Thread.run(Thread.java:744) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2445) ATS does not reflect changes to uploaded TimelineEntity
[ https://issues.apache.org/jira/browse/YARN-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107808#comment-14107808 ] Billie Rinaldi commented on YARN-2445: -- ATS is only designed to support aggregation. In other words, each new primary filter or related entity is added to what is already there for the entity. You cannot remove previously put information. In this example, I would expect oldprop and newprop both to appear. > ATS does not reflect changes to uploaded TimelineEntity > --- > > Key: YARN-2445 > URL: https://issues.apache.org/jira/browse/YARN-2445 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Reporter: Marcelo Vanzin >Priority: Minor > Attachments: ats2.java > > > If you make a change to the TimelineEntity and send it to the ATS, that > change is not reflected in the stored data. > For example, in the attached code, an existing primary filter is removed and > a new one is added. When you retrieve the entity from the ATS, it only > contains the old value: > {noformat} > {"entities":[{"events":[],"entitytype":"test","entity":"testid-ad5380c0-090e-4982-8da8-21676fe4e9f4","starttime":1408746026958,"relatedentities":{},"primaryfilters":{"oldprop":["val"]},"otherinfo":{}}]} > {noformat} > Perhaps this is what the design wanted, but from an API user standpoint, it's > really confusing, since to upload events I have to upload the entity itself, > and the changes are not reflected. -- This message was sent by Atlassian JIRA (v6.2#6252)
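Billie's explanation boils down to set-union semantics on the stored entity: each put merges incoming primary filters into whatever the store already holds, so a filter can be added but never removed. An illustrative sketch of that merge rule (not the ATS store code; the method and types are assumptions for illustration):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of ATS aggregation semantics: a put can only add
// primary filters to an entity, never remove previously stored ones.
public class TimelineMergeSketch {
    public static Map<String, Set<Object>> mergePrimaryFilters(
            Map<String, Set<Object>> stored,
            Map<String, Set<Object>> incoming) {
        // Deep-copy the stored filters so the inputs are left untouched.
        Map<String, Set<Object>> result = new HashMap<>();
        for (Map.Entry<String, Set<Object>> e : stored.entrySet()) {
            result.put(e.getKey(), new HashSet<>(e.getValue()));
        }
        // Union in the incoming filters; nothing is ever deleted.
        for (Map.Entry<String, Set<Object>> e : incoming.entrySet()) {
            result.computeIfAbsent(e.getKey(), k -> new HashSet<>())
                  .addAll(e.getValue());
        }
        return result;
    }
}
```

Under these semantics, putting an entity whose filters contain only `newprop` against a store that already holds `oldprop` yields both filters, which matches what Billie expects in the scenario above.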
[jira] [Commented] (YARN-1326) RM should log using RMStore at startup time
[ https://issues.apache.org/jira/browse/YARN-1326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107806#comment-14107806 ] Hadoop QA commented on YARN-1326: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12663809/YARN-1326.4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4701//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4701//console This message is automatically generated. > RM should log using RMStore at startup time > --- > > Key: YARN-1326 > URL: https://issues.apache.org/jira/browse/YARN-1326 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.5.0 >Reporter: Tsuyoshi OZAWA >Assignee: Tsuyoshi OZAWA > Attachments: YARN-1326.1.patch, YARN-1326.2.patch, YARN-1326.3.patch, > YARN-1326.4.patch, demo.png > > Original Estimate: 3h > Remaining Estimate: 3h > > Currently there are no way to know which RMStore RM uses. 
It's useful to log > the information at RM's startup time. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely
[ https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107807#comment-14107807 ] zhihai xu commented on YARN-1458: - I uploaded a new patch "YARN-1458.004.patch" to fix the test failure. The failure is the following: the parent queue "root.parentB" has one vcore of steady fair share, but root.parentB has two child queues, root.parentB.childB1 and root.parentB.childB2, and one vcore can't be split between two child queues. The new patch calculates conservatively and assigns 0 vcores to each child queue. The old code assigned 1 vcore to each child queue, which exceeds the total resource limit. > In Fair Scheduler, size based weight can cause update thread to hold lock > indefinitely > -- > > Key: YARN-1458 > URL: https://issues.apache.org/jira/browse/YARN-1458 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.2.0 > Environment: Centos 2.6.18-238.19.1.el5 X86_64 > hadoop2.2.0 >Reporter: qingwu.fu >Assignee: zhihai xu > Labels: patch > Fix For: 2.2.1 > > Attachments: YARN-1458.001.patch, YARN-1458.002.patch, > YARN-1458.003.patch, YARN-1458.004.patch, YARN-1458.patch > > Original Estimate: 408h > Remaining Estimate: 408h > > The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when > clients submitted lots of jobs; it is not easy to reproduce. We ran the test cluster > for days to reproduce it. 
The output of jstack command on resourcemanager pid: > {code} > "ResourceManager Event Processor" prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 > waiting for monitor entry [0x43aa9000] >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671) > - waiting to lock <0x00070026b6e0> (a > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440) > at java.lang.Thread.run(Thread.java:744) > …… > "FairSchedulerUpdateThread" daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 > runnable [0x433a2000] >java.lang.Thread.State: RUNNABLE > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545) > - locked <0x00070026b6e0> (a > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119) > at > 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282) > - locked <0x00070026b6e0> (a > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255) > at java.lang.Thread.run(Thread.java:744) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
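The root.parentB scenario zhihai describes is simple integer share arithmetic. A sketch of the two behaviors (method names are hypothetical, not the patch's code): rounding each child's share up hands out more vcores than the parent has, while flooring conservatively never over-commits.

```java
// Illustrative arithmetic for the YARN-1458.004 fix described above:
// with 1 vcore of steady fair share and two equally weighted children,
// rounding up gives each child 1 vcore (2 total, over the limit), while
// the conservative floor gives each child 0 vcores.
public class FairShareSketch {
    // New, conservative behavior: integer floor, never over-commits.
    public static int conservativeShare(int parentShare, int numChildren) {
        return parentShare / numChildren;
    }

    // Old behavior, for contrast: rounds each child's share up.
    public static int roundedUpShare(int parentShare, int numChildren) {
        return (parentShare + numChildren - 1) / numChildren;
    }
}
```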
[jira] [Updated] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely
[ https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-1458: Attachment: YARN-1458.004.patch > In Fair Scheduler, size based weight can cause update thread to hold lock > indefinitely > -- > > Key: YARN-1458 > URL: https://issues.apache.org/jira/browse/YARN-1458 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.2.0 > Environment: Centos 2.6.18-238.19.1.el5 X86_64 > hadoop2.2.0 >Reporter: qingwu.fu >Assignee: zhihai xu > Labels: patch > Fix For: 2.2.1 > > Attachments: YARN-1458.001.patch, YARN-1458.002.patch, > YARN-1458.003.patch, YARN-1458.004.patch, YARN-1458.patch > > Original Estimate: 408h > Remaining Estimate: 408h > > The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when > clients submit lots jobs, it is not easy to reapear. We run the test cluster > for days to reapear it. The output of jstack command on resourcemanager pid: > {code} > "ResourceManager Event Processor" prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 > waiting for monitor entry [0x43aa9000] >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671) > - waiting to lock <0x00070026b6e0> (a > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440) > at java.lang.Thread.run(Thread.java:744) > …… > "FairSchedulerUpdateThread" daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 > runnable [0x433a2000] >java.lang.Thread.State: RUNNABLE > at > 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545) > - locked <0x00070026b6e0> (a > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282) > - locked <0x00070026b6e0> (a > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255) > at java.lang.Thread.run(Thread.java:744) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
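The two stack traces quoted above boil down to a single contention pattern: the update thread takes the scheduler monitor for an entire recompute pass (with size-based weight, the per-app weight computation inside the lock is no longer cheap), while the event-processor thread blocks on removeApplication waiting for the same monitor. A minimal self-contained sketch of that pattern follows; the class and method names are hypothetical stand-ins, not actual FairScheduler code.

```java
// Toy model of the contention in the jstack output above (hypothetical
// names, not FairScheduler source).
public class LockHoldSketch {
    private final Object schedulerLock = new Object();

    // Stands in for FairScheduler.update(): holds the scheduler monitor
    // while doing O(apps) weight work, as in the "locked" frames above.
    long updatePass(int numApps) {
        long total = 0;
        synchronized (schedulerLock) {
            for (int i = 0; i < numApps; i++) {
                total += getAppWeight(i);  // per-app work inside the lock
            }
        }
        return total;
    }

    // Stands in for getAppWeight(): cheap once, costly when called for
    // every app under the lock.
    private long getAppWeight(int appId) {
        return appId % 7 + 1;
    }

    // Stands in for removeApplication(): cannot proceed until the full
    // update pass above releases the monitor.
    boolean removeApplication() {
        synchronized (schedulerLock) {
            return true;
        }
    }
}
```

The lock hold time of updatePass grows with the number of apps, which is why the event processor can appear blocked "indefinitely" under heavy submission load.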
[jira] [Commented] (YARN-2395) FairScheduler: Preemption timeout should be configurable per queue
[ https://issues.apache.org/jira/browse/YARN-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107785#comment-14107785 ] Hadoop QA commented on YARN-2395: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12663799/YARN-2395-2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4700//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4700//console This message is automatically generated. > FairScheduler: Preemption timeout should be configurable per queue > -- > > Key: YARN-2395 > URL: https://issues.apache.org/jira/browse/YARN-2395 > Project: Hadoop YARN > Issue Type: New Feature > Components: fairscheduler >Reporter: Ashwin Shankar >Assignee: Wei Yan > Attachments: YARN-2395-1.patch, YARN-2395-2.patch > > > Currently in fair scheduler, the preemption logic considers fair share > starvation only at leaf queue level. This jira is created to implement it at > the parent queue as well. 
> It involves: > 1. Making the "check for fair share starvation" and "amount of resource to > preempt" logic recursive, so that they traverse the queue hierarchy from root to > leaf. > 2. Currently fairSharePreemptionTimeout is a global config. We could make it > configurable on a per-queue basis, so that we can specify different timeouts > for parent queues. -- This message was sent by Atlassian JIRA (v6.2#6252)
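Point 1 of the description, making the starvation and preemption-amount checks recursive over the queue hierarchy, amounts to a plain tree walk. A hedged sketch follows; QueueNode and resourceToPreempt are illustrative names only, not the FairScheduler API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative queue tree: each queue knows how far below its fair share
// it is, and a parent's preemption demand is recursively accumulated.
public class PreemptionSketch {
    static class QueueNode {
        final long starvedResources;            // 0 if not below fair share
        final List<QueueNode> children = new ArrayList<>();
        QueueNode(long starvedResources) { this.starvedResources = starvedResources; }
        QueueNode add(QueueNode child) { children.add(child); return this; }
    }

    // Recursive "amount of resource to preempt": a queue's demand is its
    // own starvation plus that of its whole subtree, walked root to leaf.
    static long resourceToPreempt(QueueNode q) {
        long total = q.starvedResources;
        for (QueueNode child : q.children) {
            total += resourceToPreempt(child);
        }
        return total;
    }
}
```

Point 2 would then hang a per-queue timeout off the same tree, falling back to the parent's (ultimately the global) value when a queue does not set one.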
[jira] [Updated] (YARN-1326) RM should log using RMStore at startup time
[ https://issues.apache.org/jira/browse/YARN-1326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1326: - Attachment: YARN-1326.4.patch Fixed failures of TestRMWebServices. > RM should log using RMStore at startup time > --- > > Key: YARN-1326 > URL: https://issues.apache.org/jira/browse/YARN-1326 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.5.0 >Reporter: Tsuyoshi OZAWA >Assignee: Tsuyoshi OZAWA > Attachments: YARN-1326.1.patch, YARN-1326.2.patch, YARN-1326.3.patch, > YARN-1326.4.patch, demo.png > > Original Estimate: 3h > Remaining Estimate: 3h > > Currently there is no way to know which RMStore the RM uses. It's useful to log > this information at the RM's startup time. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page
[ https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107754#comment-14107754 ] Hadoop QA commented on YARN-2360: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12663761/YARN-2360-v5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4699//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4699//console This message is automatically generated. 
> Fair Scheduler : Display dynamic fair share for queues on the scheduler page > > > Key: YARN-2360 > URL: https://issues.apache.org/jira/browse/YARN-2360 > Project: Hadoop YARN > Issue Type: New Feature > Components: fairscheduler >Reporter: Ashwin Shankar >Assignee: Ashwin Shankar > Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, > Screen_Shot_v3.png, Screen_Shot_v4.png, Screen_Shot_v5.png, YARN-2360-v1.txt, > YARN-2360-v2.txt, YARN-2360-v3.patch, YARN-2360-v4.patch, YARN-2360-v5.patch > > > Based on the discussion in YARN-2026, we'd like to display dynamic fair > share for queues on the scheduler page. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-321: - Assignee: (was: Yu Gao) > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu > Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, > Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java > > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is the > number of application types and V is the number of application versions) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific applications/versions can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for their specific UI and/or analytics. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2395) FairScheduler: Preemption timeout should be configurable per queue
[ https://issues.apache.org/jira/browse/YARN-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2395: -- Attachment: YARN-2395-2.patch Uploaded a new patch that addresses Karthik's latest comments and also adds a per-job preemption timeout configuration for min share. > FairScheduler: Preemption timeout should be configurable per queue > -- > > Key: YARN-2395 > URL: https://issues.apache.org/jira/browse/YARN-2395 > Project: Hadoop YARN > Issue Type: New Feature > Components: fairscheduler >Reporter: Ashwin Shankar >Assignee: Wei Yan > Attachments: YARN-2395-1.patch, YARN-2395-2.patch > > > Currently in fair scheduler, the preemption logic considers fair share > starvation only at leaf queue level. This jira is created to implement it at > the parent queue as well. > It involves: > 1. Making the "check for fair share starvation" and "amount of resource to > preempt" logic recursive, so that they traverse the queue hierarchy from root to > leaf. > 2. Currently fairSharePreemptionTimeout is a global config. We could make it > configurable on a per-queue basis, so that we can specify different timeouts > for parent queues. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely
[ https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107721#comment-14107721 ] Hadoop QA commented on YARN-1458: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12663743/YARN-1458.003.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerFairShare {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4698//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4698//console This message is automatically generated. 
> In Fair Scheduler, size based weight can cause update thread to hold lock > indefinitely > -- > > Key: YARN-1458 > URL: https://issues.apache.org/jira/browse/YARN-1458 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.2.0 > Environment: Centos 2.6.18-238.19.1.el5 X86_64 > hadoop2.2.0 >Reporter: qingwu.fu >Assignee: zhihai xu > Labels: patch > Fix For: 2.2.1 > > Attachments: YARN-1458.001.patch, YARN-1458.002.patch, > YARN-1458.003.patch, YARN-1458.patch > > Original Estimate: 408h > Remaining Estimate: 408h > > The ResourceManager$SchedulerEventDispatcher$EventProcessor is blocked when > clients submit lots of jobs; it is not easy to reproduce. We ran the test cluster > for days to reproduce it. The output of the jstack command on the resourcemanager pid: > {code} > "ResourceManager Event Processor" prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 > waiting for monitor entry [0x43aa9000] >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671) > - waiting to lock <0x00070026b6e0> (a > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440) > at java.lang.Thread.run(Thread.java:744) > …… > "FairSchedulerUpdateThread" daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 > runnable [0x433a2000] >java.lang.Thread.State: RUNNABLE > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545) > - locked <0x00070026b6e0> (a > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) > at > 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParent
[jira] [Assigned] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Gao reassigned YARN-321: --- Assignee: Yu Gao > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Yu Gao > Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, > Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java > > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is the > number of application types and V is the number of application versions) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific applications/versions can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for their specific UI and/or analytics. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2445) ATS does not reflect changes to uploaded TimelineEntity
[ https://issues.apache.org/jira/browse/YARN-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated YARN-2445: - Attachment: ats2.java > ATS does not reflect changes to uploaded TimelineEntity > --- > > Key: YARN-2445 > URL: https://issues.apache.org/jira/browse/YARN-2445 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Reporter: Marcelo Vanzin >Priority: Minor > Attachments: ats2.java > > > If you make a change to the TimelineEntity and send it to the ATS, that > change is not reflected in the stored data. > For example, in the attached code, an existing primary filter is removed and > a new one is added. When you retrieve the entity from the ATS, it only > contains the old value: > {noformat} > {"entities":[{"events":[],"entitytype":"test","entity":"testid-ad5380c0-090e-4982-8da8-21676fe4e9f4","starttime":1408746026958,"relatedentities":{},"primaryfilters":{"oldprop":["val"]},"otherinfo":{}}]} > {noformat} > Perhaps this is what the design wanted, but from an API user standpoint, it's > really confusing, since to upload events I have to upload the entity itself, > and the changes are not reflected. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2445) ATS does not reflect changes to uploaded TimelineEntity
Marcelo Vanzin created YARN-2445: Summary: ATS does not reflect changes to uploaded TimelineEntity Key: YARN-2445 URL: https://issues.apache.org/jira/browse/YARN-2445 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Marcelo Vanzin Priority: Minor Attachments: ats2.java If you make a change to the TimelineEntity and send it to the ATS, that change is not reflected in the stored data. For example, in the attached code, an existing primary filter is removed and a new one is added. When you retrieve the entity from the ATS, it only contains the old value: {noformat} {"entities":[{"events":[],"entitytype":"test","entity":"testid-ad5380c0-090e-4982-8da8-21676fe4e9f4","starttime":1408746026958,"relatedentities":{},"primaryfilters":{"oldprop":["val"]},"otherinfo":{}}]} {noformat} Perhaps this is what the design wanted, but from an API user standpoint, it's really confusing, since to upload events I have to upload the entity itself, and the changes are not reflected. -- This message was sent by Atlassian JIRA (v6.2#6252)
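One plausible explanation for the behavior reported above (a guess from the observed output, not a reading of the ATS source) is a store that merges an incoming entity's primary filters into the previously stored ones instead of replacing them. Under a merge-only put, a re-uploaded entity can add newprop but can never drop oldprop, which matches the retrieved JSON. A toy model of that merge:

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for the timeline store's primary-filter handling; not ATS
// code. Keys are filter names, values their (simplified) single values.
public class MergeSketch {
    static final Map<String, String> stored = new HashMap<>();

    // A merge-only put: existing keys absent from the incoming entity are
    // kept, so client-side removals are silently lost.
    static void put(Map<String, String> incomingFilters) {
        stored.putAll(incomingFilters);
    }
}
```

If this is the intended semantics, documenting puts as additive (merge, not overwrite) would remove the confusion described in the report.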
[jira] [Updated] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page
[ https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2360: -- Attachment: YARN-2360-v5.patch > Fair Scheduler : Display dynamic fair share for queues on the scheduler page > > > Key: YARN-2360 > URL: https://issues.apache.org/jira/browse/YARN-2360 > Project: Hadoop YARN > Issue Type: New Feature > Components: fairscheduler >Reporter: Ashwin Shankar >Assignee: Ashwin Shankar > Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, > Screen_Shot_v3.png, Screen_Shot_v4.png, Screen_Shot_v5.png, YARN-2360-v1.txt, > YARN-2360-v2.txt, YARN-2360-v3.patch, YARN-2360-v4.patch, YARN-2360-v5.patch > > > Based on the discussion in YARN-2026, we'd like to display dynamic fair > share for queues on the scheduler page. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page
[ https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2360: -- Attachment: Screen_Shot_v5.png > Fair Scheduler : Display dynamic fair share for queues on the scheduler page > > > Key: YARN-2360 > URL: https://issues.apache.org/jira/browse/YARN-2360 > Project: Hadoop YARN > Issue Type: New Feature > Components: fairscheduler >Reporter: Ashwin Shankar >Assignee: Ashwin Shankar > Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, > Screen_Shot_v3.png, Screen_Shot_v4.png, Screen_Shot_v5.png, YARN-2360-v1.txt, > YARN-2360-v2.txt, YARN-2360-v3.patch, YARN-2360-v4.patch, YARN-2360-v5.patch > > > Based on the discussion in YARN-2026, we'd like to display dynamic fair > share for queues on the scheduler page. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page
[ https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2360: -- Attachment: (was: YARN-2360-v5.patch) > Fair Scheduler : Display dynamic fair share for queues on the scheduler page > > > Key: YARN-2360 > URL: https://issues.apache.org/jira/browse/YARN-2360 > Project: Hadoop YARN > Issue Type: New Feature > Components: fairscheduler >Reporter: Ashwin Shankar >Assignee: Ashwin Shankar > Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, > Screen_Shot_v3.png, Screen_Shot_v4.png, Screen_Shot_v5.png, YARN-2360-v1.txt, > YARN-2360-v2.txt, YARN-2360-v3.patch, YARN-2360-v4.patch, YARN-2360-v5.patch > > > Based on the discussion in YARN-2026, we'd like to display dynamic fair > share for queues on the scheduler page. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2408) Resource Request REST API for YARN
[ https://issues.apache.org/jira/browse/YARN-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107563#comment-14107563 ] Hadoop QA commented on YARN-2408: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12663726/YARN-2408-3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4697//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4697//console This message is automatically generated. > Resource Request REST API for YARN > -- > > Key: YARN-2408 > URL: https://issues.apache.org/jira/browse/YARN-2408 > Project: Hadoop YARN > Issue Type: New Feature > Components: webapp >Reporter: Renan DelValle > Labels: features > Attachments: YARN-2408-3.patch > > > I’m proposing a new REST API for YARN which exposes a snapshot of the > Resource Requests that exist inside of the Scheduler. 
My motivation behind > this new feature is to allow external software to monitor the amount of > resources being requested, giving deeper insight into cluster > usage than is already provided. The API can also be used by external software > to detect a starved application and alert the appropriate users and/or sys > admins so that the problem may be remedied. > Here is the proposed API: > {code:xml} > > 96256 > 94 > > application_ > appattempt_ > default > 96256 > 94 > 3 > > > 1024 > 1 > /default-rack > 94 > true > 20 > > > 1024 > 1 > * > 94 > true > 20 > > > 1024 > 1 > master > 94 > true > 20 > > > > > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page
[ https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2360: -- Attachment: YARN-2360-v5.patch A new patch that adds a description to the fair scheduler .apt.vm file and also shows the description in the web UI when the mouse hovers over the "steady fair share" or "instantaneous fair share" label. > Fair Scheduler : Display dynamic fair share for queues on the scheduler page > > > Key: YARN-2360 > URL: https://issues.apache.org/jira/browse/YARN-2360 > Project: Hadoop YARN > Issue Type: New Feature > Components: fairscheduler >Reporter: Ashwin Shankar >Assignee: Ashwin Shankar > Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, > Screen_Shot_v3.png, Screen_Shot_v4.png, YARN-2360-v1.txt, YARN-2360-v2.txt, > YARN-2360-v3.patch, YARN-2360-v4.patch, YARN-2360-v5.patch > > > Based on the discussion in YARN-2026, we'd like to display dynamic fair > share for queues on the scheduler page. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page
[ https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2360: -- Attachment: (was: Screen_Shot_v5.png) > Fair Scheduler : Display dynamic fair share for queues on the scheduler page > > > Key: YARN-2360 > URL: https://issues.apache.org/jira/browse/YARN-2360 > Project: Hadoop YARN > Issue Type: New Feature > Components: fairscheduler >Reporter: Ashwin Shankar >Assignee: Ashwin Shankar > Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, > Screen_Shot_v3.png, Screen_Shot_v4.png, YARN-2360-v1.txt, YARN-2360-v2.txt, > YARN-2360-v3.patch, YARN-2360-v4.patch, YARN-2360-v5.patch > > > Based on the discussion in YARN-2026, we'd like to display dynamic fair > share for queues on the scheduler page. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page
[ https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2360: -- Attachment: Screen_Shot_v5.png > Fair Scheduler : Display dynamic fair share for queues on the scheduler page > > > Key: YARN-2360 > URL: https://issues.apache.org/jira/browse/YARN-2360 > Project: Hadoop YARN > Issue Type: New Feature > Components: fairscheduler >Reporter: Ashwin Shankar >Assignee: Ashwin Shankar > Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, > Screen_Shot_v3.png, Screen_Shot_v4.png, Screen_Shot_v5.png, YARN-2360-v1.txt, > YARN-2360-v2.txt, YARN-2360-v3.patch, YARN-2360-v4.patch > > > Based on the discussion in YARN-2026, we'd like to display dynamic fair > share for queues on the scheduler page. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page
[ https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107479#comment-14107479 ] Hadoop QA commented on YARN-2360: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12663715/YARN-2360-v4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterService {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4696//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4696//console This message is automatically generated. 
> Fair Scheduler : Display dynamic fair share for queues on the scheduler page > > > Key: YARN-2360 > URL: https://issues.apache.org/jira/browse/YARN-2360 > Project: Hadoop YARN > Issue Type: New Feature > Components: fairscheduler >Reporter: Ashwin Shankar >Assignee: Ashwin Shankar > Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, > Screen_Shot_v3.png, Screen_Shot_v4.png, YARN-2360-v1.txt, YARN-2360-v2.txt, > YARN-2360-v3.patch, YARN-2360-v4.patch > > > Based on the discussion in YARN-2026, we'd like to display dynamic fair > share for queues on the scheduler page. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely
[ https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107471#comment-14107471 ] zhihai xu commented on YARN-1458: - I uploaded a new patch "YARN-1458.003.patch" to resolve a merge conflict after rebasing to the latest code. > In Fair Scheduler, size based weight can cause update thread to hold lock > indefinitely > -- > > Key: YARN-1458 > URL: https://issues.apache.org/jira/browse/YARN-1458 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.2.0 > Environment: Centos 2.6.18-238.19.1.el5 X86_64 > hadoop2.2.0 >Reporter: qingwu.fu >Assignee: zhihai xu > Labels: patch > Fix For: 2.2.1 > > Attachments: YARN-1458.001.patch, YARN-1458.002.patch, > YARN-1458.003.patch, YARN-1458.patch > > Original Estimate: 408h > Remaining Estimate: 408h > > The ResourceManager$SchedulerEventDispatcher$EventProcessor is blocked when > clients submit lots of jobs; it is not easy to reproduce. We ran the test cluster > for days to reproduce it. 
The output of jstack command on resourcemanager pid: > {code} > "ResourceManager Event Processor" prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 > waiting for monitor entry [0x43aa9000] >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671) > - waiting to lock <0x00070026b6e0> (a > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440) > at java.lang.Thread.run(Thread.java:744) > …… > "FairSchedulerUpdateThread" daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 > runnable [0x433a2000] >java.lang.Thread.State: RUNNABLE > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545) > - locked <0x00070026b6e0> (a > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119) > at > 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282) > - locked <0x00070026b6e0> (a > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255) > at java.lang.Thread.run(Thread.java:744) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
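The trace above shows the classic pattern behind this bug: the update thread recomputes every queue's fair share while holding the FairScheduler monitor, and with size-based weight enabled each share computation calls back into the locked getAppWeight, starving event handlers such as removeApplication. As a rough illustration of why size-based weight makes the computation heavier, here is a sketch of a log-scaled weight function; the exact log1p/log 2 form and the base-weight parameter are assumptions for illustration, not taken from the patch:

```python
import math

def size_based_weight(demand_mb, base_weight=1.0):
    # With size-based weight enabled, an app's weight grows with the log of
    # its resource demand, so larger apps receive a larger fair share.
    # The log1p/log 2 form here is an assumed illustration of that scaling.
    if demand_mb > 0:
        return base_weight * (math.log1p(demand_mb) / math.log(2))
    return base_weight

# Weight is monotonic in demand: a 4 GB app outweighs a 1 GB app.
w_small = size_based_weight(1024)
w_large = size_based_weight(4096)
```

Because this function must be evaluated per application inside the share computation, the time spent under the scheduler lock grows with the number of running apps, which is why the update thread can hold the lock long enough to block the event processor.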
[jira] [Updated] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely
[ https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-1458: Attachment: YARN-1458.003.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2440) Cgroups should limit YARN containers to cores allocated in yarn-site.xml
[ https://issues.apache.org/jira/browse/YARN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107429#comment-14107429 ] Hadoop QA commented on YARN-2440: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12663704/apache-yarn-2440.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4694//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4694//console This message is automatically generated. > Cgroups should limit YARN containers to cores allocated in yarn-site.xml > > > Key: YARN-2440 > URL: https://issues.apache.org/jira/browse/YARN-2440 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: apache-yarn-2440.0.patch, apache-yarn-2440.1.patch, > screenshot-current-implementation.jpg > > > The current cgroups implementation does not limit YARN containers to the > cores allocated in yarn-site.xml. 
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2408) Resource Request REST API for YARN
[ https://issues.apache.org/jira/browse/YARN-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renan DelValle updated YARN-2408: - Attachment: YARN-2408-3.patch Bug fix > Resource Request REST API for YARN > -- > > Key: YARN-2408 > URL: https://issues.apache.org/jira/browse/YARN-2408 > Project: Hadoop YARN > Issue Type: New Feature > Components: webapp >Reporter: Renan DelValle > Labels: features > Attachments: YARN-2408-3.patch > > > I’m proposing a new REST API for YARN which exposes a snapshot of the > Resource Requests that exist inside of the Scheduler. My motivation behind > this new feature is to allow external software to monitor the amount of > resources being requested to gain more insightful information into cluster > usage than is already provided. The API can also be used by external software > to detect a starved application and alert the appropriate users and/or sys > admin so that the problem may be remedied. > Here is the proposed API: > {code:xml} > > 96256 > 94 > > application_ > appattempt_ > default > 96256 > 94 > 3 > > > 1024 > 1 > /default-rack > 94 > true > 20 > > > 1024 > 1 > * > 94 > true > 20 > > > 1024 > 1 > master > 94 > true > 20 > > > > > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
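The {code:xml} sample above lost its element markup during extraction; only the values (96256 MB, 94 vcores, per-request memory/priority fields, resource names such as /default-rack and *) survive. As a hedged illustration of how a consumer might read such a response, the snippet below parses a hypothetical reconstruction; every element name is a guess, not the schema actually proposed in the patch:

```python
import xml.etree.ElementTree as ET

# Hypothetical reconstruction: element names below are assumptions, since
# the original XML tags were stripped; only the values come from the JIRA.
sample = """
<resourceRequests>
  <totalMemoryMB>96256</totalMemoryMB>
  <totalVCores>94</totalVCores>
  <request>
    <memoryMB>1024</memoryMB>
    <vCores>1</vCores>
    <resourceName>/default-rack</resourceName>
    <numContainers>94</numContainers>
    <relaxLocality>true</relaxLocality>
    <priority>20</priority>
  </request>
</resourceRequests>
"""

root = ET.fromstring(sample)
total_memory = int(root.findtext("totalMemoryMB"))
# Collect (location, pending container count) pairs, the kind of signal an
# external monitor could use to detect a starved application.
pending = [(r.findtext("resourceName"), int(r.findtext("numContainers")))
           for r in root.findall("request")]
```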
[jira] [Updated] (YARN-2408) Resource Request REST API for YARN
[ https://issues.apache.org/jira/browse/YARN-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renan DelValle updated YARN-2408: - Attachment: (was: YARN-2408-2.patch) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely
[ https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107379#comment-14107379 ] Hadoop QA commented on YARN-1458: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12663617/YARN-1458.002.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4695//console This message is automatically generated. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page
[ https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107354#comment-14107354 ] Karthik Kambatla commented on YARN-2360: Agree with Ashwin - we should definitely describe them in the apt.vm file; defining them on the UI is also very useful. > Fair Scheduler : Display dynamic fair share for queues on the scheduler page > > > Key: YARN-2360 > URL: https://issues.apache.org/jira/browse/YARN-2360 > Project: Hadoop YARN > Issue Type: New Feature > Components: fairscheduler >Reporter: Ashwin Shankar >Assignee: Ashwin Shankar > Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, > Screen_Shot_v3.png, Screen_Shot_v4.png, YARN-2360-v1.txt, YARN-2360-v2.txt, > YARN-2360-v3.patch, YARN-2360-v4.patch > > > Based on the discussion in YARN-2026, we'd like to display dynamic fair > share for queues on the scheduler page. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page
[ https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107351#comment-14107351 ] Ashwin Shankar commented on YARN-2360: -- [~ywskycn], patch looks good. Should we mention what "Instantaneous" and "Steady" fair share mean in the fair scheduler doc, i.e. the apt.vm file, so that users know what they mean? I'm also torn on whether we should define these terms on the UI as part of the legend tooltip or in some other way. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2440) Cgroups should limit YARN containers to cores allocated in yarn-site.xml
[ https://issues.apache.org/jira/browse/YARN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107331#comment-14107331 ] Jason Lowe commented on YARN-2440: -- Sure for this JIRA we can go with a percent of total CPU to limit YARN. For something like YARN-160 we'd need the user to specify some kind of relationship between vcores and physical cores. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page
[ https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2360: -- Attachment: Screen_Shot_v4.png -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page
[ https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2360: -- Attachment: YARN-2360-v4.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2444) Primary filters added after first submission not indexed, cause exceptions in logs.
[ https://issues.apache.org/jira/browse/YARN-2444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107276#comment-14107276 ] Marcelo Vanzin commented on YARN-2444: -- Ah, I'm using leveldb if that makes a difference. > Primary filters added after first submission not indexed, cause exceptions in > logs. > --- > > Key: YARN-2444 > URL: https://issues.apache.org/jira/browse/YARN-2444 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 2.5.0 >Reporter: Marcelo Vanzin > Attachments: ats.java > > > See attached code for an example. The code creates an entity with a primary > filter, submits it to the ATS. After that, a new primary filter value is > added and the entity is resubmitted. At that point two things can be seen: > - Searching for the new primary filter value does not return the entity > - The following exception shows up in the logs: > {noformat} > 14/08/22 11:33:42 ERROR webapp.TimelineWebServices: Error when verifying > access for user dr.who (auth:SIMPLE) on the events of the timeline entity { > id: testid-48625678-9cbb-4e71-87de-93c50be51d1a, type: test } > org.apache.hadoop.yarn.exceptions.YarnException: Owner information of the > timeline entity { id: testid-48625678-9cbb-4e71-87de-93c50be51d1a, type: test > } is corrupted. 
> at > org.apache.hadoop.yarn.server.timeline.security.TimelineACLsManager.checkAccess(TimelineACLsManager.java:67) > at > org.apache.hadoop.yarn.server.timeline.webapp.TimelineWebServices.getEntities(TimelineWebServices.java:172) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60) > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2440) Cgroups should limit YARN containers to cores allocated in yarn-site.xml
[ https://issues.apache.org/jira/browse/YARN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107275#comment-14107275 ] Varun Vasudev commented on YARN-2440: - It might make things easier to go with [~sandyr]'s idea to add a config which expresses the % of the node's CPU that is used by YARN. [~jlowe], would that address your concerns? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2444) Primary filters added after first submission not indexed, cause exceptions in logs.
[ https://issues.apache.org/jira/browse/YARN-2444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107269#comment-14107269 ] Marcelo Vanzin commented on YARN-2444: -- The following search causes the problem described above: {noformat}/ws/v1/timeline/test?primaryFilter=prop2:val2{noformat} The following one works as expected: {noformat}/ws/v1/timeline/test?primaryFilter=prop1:val1{noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
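The two queries Marcelo contrasts differ only in the primary-filter pair: prop1:val1 was present at first submission and is indexed, while prop2:val2 was added on resubmission and is not. The URL shape follows directly from the REST path and the name:value filter convention shown in his comment; a small sketch of building it (the helper name is ours, not a YARN API):

```python
# Hypothetical helper that assembles a timeline query URL in the
# "primaryFilter=name:value" form used in the comment above.
def timeline_query(entity_type, filter_name, filter_value):
    return f"/ws/v1/timeline/{entity_type}?primaryFilter={filter_name}:{filter_value}"

# The failing query (filter added after first submission)...
broken = timeline_query("test", "prop2", "val2")
# ...and the working one (filter present from the start).
working = timeline_query("test", "prop1", "val1")
```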
[jira] [Created] (YARN-2444) Primary filters added after first submission not indexed, cause exceptions in logs.
Marcelo Vanzin created YARN-2444: Summary: Primary filters added after first submission not indexed, cause exceptions in logs. Key: YARN-2444 URL: https://issues.apache.org/jira/browse/YARN-2444 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.5.0 Reporter: Marcelo Vanzin Attachments: ats.java -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2444) Primary filters added after first submission not indexed, cause exceptions in logs.
[ https://issues.apache.org/jira/browse/YARN-2444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated YARN-2444: - Attachment: ats.java -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2440) Cgroups should limit YARN containers to cores allocated in yarn-site.xml
[ https://issues.apache.org/jira/browse/YARN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107248#comment-14107248 ] Sandy Ryza commented on YARN-2440: -- We removed it because it wasn't consistent with the vmem-pmem-ratio and was an unnecessary layer of indirection. If automatically configuring a node's vcore resource based on its physical characteristics is a goal, I wouldn't be opposed to adding something back in. For the purposes of this JIRA, might it be simpler to express a config in terms of the % of the node's CPU power that YARN gets? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page
[ https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107252#comment-14107252 ] Wei Yan commented on YARN-2360: --- Thanks, Karthik. Will update the patch with the changes, and also fix another problem in FairSchedulerQueueInfo. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page
[ https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107244#comment-14107244 ] Karthik Kambatla commented on YARN-2360: I would rename the legend to "Steady fairshare" and "Instantaneous fairshare". -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-810) Support CGroup ceiling enforcement on CPU
[ https://issues.apache.org/jira/browse/YARN-810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107221#comment-14107221 ] Wei Yan commented on YARN-810: -- bq. With your current implementation, on a machine with 4 cores(and 4 vcores), a container which requests 2 vcores will have cfs_period_us set to 4096 and cfs_quota_us set to 2048 which will end up limiting it to 50% of one CPU. Is my understanding wrong? Thanks, [~vvasudev]. I mentioned this problem after reading your YARN-2420 patch. I'll double check this problem, and will update the patch. > Support CGroup ceiling enforcement on CPU > - > > Key: YARN-810 > URL: https://issues.apache.org/jira/browse/YARN-810 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.1.0-beta, 2.0.5-alpha >Reporter: Chris Riccomini >Assignee: Sandy Ryza > Attachments: YARN-810.patch, YARN-810.patch > > > Problem statement: > YARN currently lets you define an NM's pcore count, and a pcore:vcore ratio. > Containers are then allowed to request vcores between the minimum and maximum > defined in the yarn-site.xml. > In the case where a single-threaded container requests 1 vcore, with a > pcore:vcore ratio of 1:4, the container is still allowed to use up to 100% of > the core it's using, provided that no other container is also using it. This > happens, even though the only guarantee that YARN/CGroups is making is that > the container will get "at least" 1/4th of the core. > If a second container then comes along, the second container can take > resources from the first, provided that the first container is still getting > at least its fair share (1/4th). > There are certain cases where this is desirable. There are also certain cases > where it might be desirable to have a hard limit on CPU usage, and not allow > the process to go above the specified resource requirement, even if it's > available. 
> Here's an RFC that describes the problem in more detail: > http://lwn.net/Articles/336127/ > Solution: > As it happens, when CFS is used in combination with CGroups, you can enforce > a ceiling using two files in cgroups: > {noformat} > cpu.cfs_quota_us > cpu.cfs_period_us > {noformat} > The usage of these two files is documented in more detail here: > https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-cpu.html > Testing: > I have tested YARN CGroups using the 2.0.5-alpha implementation. By default, > it behaves as described above (it is a soft cap, and allows containers to use > more than they asked for). I then tested CFS CPU quotas manually with YARN. > First, you can see that CFS is in use in the CGroup, based on the file names: > {noformat} > [criccomi@eat1-qa464 ~]$ sudo -u app ls -l /cgroup/cpu/hadoop-yarn/ > total 0 > -r--r--r-- 1 app app 0 Jun 13 16:46 cgroup.procs > drwxr-xr-x 2 app app 0 Jun 13 17:08 container_1371141151815_0004_01_02 > -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.cfs_period_us > -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.cfs_quota_us > -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.rt_period_us > -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.rt_runtime_us > -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.shares > -r--r--r-- 1 app app 0 Jun 13 16:46 cpu.stat > -rw-r--r-- 1 app app 0 Jun 13 16:46 notify_on_release > -rw-r--r-- 1 app app 0 Jun 13 16:46 tasks > [criccomi@eat1-qa464 ~]$ sudo -u app cat > /cgroup/cpu/hadoop-yarn/cpu.cfs_period_us > 10 > [criccomi@eat1-qa464 ~]$ sudo -u app cat > /cgroup/cpu/hadoop-yarn/cpu.cfs_quota_us > -1 > {noformat} > Oddly, it appears that the cfs_period_us is set to .1s, not 1s. > We can place processes in hard limits. I have process 4370 running YARN > container container_1371141151815_0003_01_03 on a host. By default, it's > running at ~300% cpu usage. > {noformat} > CPU > 4370 criccomi 20 0 1157m 551m 14m S 240.3 0.8 87:10.91 ... 
> {noformat} > When I set the CFS quota: > {noformat} > echo 1000 > > /cgroup/cpu/hadoop-yarn/container_1371141151815_0003_01_03/cpu.cfs_quota_us > CPU > 4370 criccomi 20 0 1157m 563m 14m S 1.0 0.8 90:08.39 ... > {noformat} > It drops to 1% usage, and you can see the box has room to spare: > {noformat} > Cpu(s): 2.4%us, 1.0%sy, 0.0%ni, 92.2%id, 4.2%wa, 0.0%hi, 0.1%si, > 0.0%st > {noformat} > Turning the quota back to -1: > {noformat} > echo -1 > > /cgroup/cpu/hadoop-yarn/container_1371141151815_0003_01_03/cpu.cfs_quota_us > {noformat} > Burns the cores again: > {noformat} > Cpu(s): 11.1%us, 1.7%sy, 0.0%ni, 83.9%id, 3.1%wa, 0.0%hi, 0.2%si, > 0.0%st >
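The echo-based test session quoted above can be expressed in a few lines of code. A minimal sketch, assuming a writable cgroup directory; the class and method names are illustrative, not YARN code, and a temp directory stands in for /cgroup/cpu/hadoop-yarn since the real paths require root and a mounted cgroup hierarchy:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of toggling a CFS ceiling by writing cpu.cfs_quota_us, mirroring
// the echo commands in the test session above. Names are illustrative;
// a temp dir stands in for the container's real cgroup directory.
public class CfsCeiling {
    static void setQuota(Path containerCgroup, long quotaUs) throws IOException {
        // A positive value caps the group; -1 removes the ceiling (soft cap again).
        Files.write(containerCgroup.resolve("cpu.cfs_quota_us"),
                Long.toString(quotaUs).getBytes());
    }

    static long getQuota(Path containerCgroup) throws IOException {
        return Long.parseLong(
                Files.readString(containerCgroup.resolve("cpu.cfs_quota_us")).trim());
    }

    public static void main(String[] args) throws IOException {
        Path cg = Files.createTempDirectory("container_cgroup"); // stand-in path
        setQuota(cg, 1000);
        System.out.println(getQuota(cg));  // 1000: hard limit in effect
        setQuota(cg, -1);
        System.out.println(getQuota(cg));  // -1: ceiling removed
    }
}
```

On a real node the path would be the container's directory under the cgroup cpu mount, and the write would need the NodeManager's privileges.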
[jira] [Updated] (YARN-2440) Cgroups should limit YARN containers to cores allocated in yarn-site.xml
[ https://issues.apache.org/jira/browse/YARN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2440: Attachment: apache-yarn-2440.1.patch Uploaded a new patch to address the issue raised by [~jlowe] on the max value of cfs_quota_us. I'll upload further versions once there's clarity on vcore to physical core mapping. > Cgroups should limit YARN containers to cores allocated in yarn-site.xml > > > Key: YARN-2440 > URL: https://issues.apache.org/jira/browse/YARN-2440 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: apache-yarn-2440.0.patch, apache-yarn-2440.1.patch, > screenshot-current-implementation.jpg > > > The current cgroups implementation does not limit YARN containers to the > cores allocated in yarn-site.xml. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page
[ https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2360: -- Attachment: Screen_Shot_v3.png > Fair Scheduler : Display dynamic fair share for queues on the scheduler page > > > Key: YARN-2360 > URL: https://issues.apache.org/jira/browse/YARN-2360 > Project: Hadoop YARN > Issue Type: New Feature > Components: fairscheduler >Reporter: Ashwin Shankar >Assignee: Ashwin Shankar > Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, > Screen_Shot_v3.png, YARN-2360-v1.txt, YARN-2360-v2.txt, YARN-2360-v3.patch > > > Based on the discussion in YARN-2026, we'd like to display dynamic fair > share for queues on the scheduler page. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page
[ https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2360: -- Attachment: YARN-2360-v3.patch Update a patch after YARN-2393. The Screen_Shot_v3.png is the fair scheduler web page. > Fair Scheduler : Display dynamic fair share for queues on the scheduler page > > > Key: YARN-2360 > URL: https://issues.apache.org/jira/browse/YARN-2360 > Project: Hadoop YARN > Issue Type: New Feature > Components: fairscheduler >Reporter: Ashwin Shankar >Assignee: Ashwin Shankar > Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, > Screen_Shot_v3.png, YARN-2360-v1.txt, YARN-2360-v2.txt, YARN-2360-v3.patch > > > Based on the discussion in YARN-2026, we'd like to display dynamic fair > share for queues on the scheduler page. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page
[ https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2360: -- Attachment: Screen_Shot_v3.png > Fair Scheduler : Display dynamic fair share for queues on the scheduler page > > > Key: YARN-2360 > URL: https://issues.apache.org/jira/browse/YARN-2360 > Project: Hadoop YARN > Issue Type: New Feature > Components: fairscheduler >Reporter: Ashwin Shankar >Assignee: Ashwin Shankar > Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, > YARN-2360-v1.txt, YARN-2360-v2.txt > > > Based on the discussion in YARN-2026, we'd like to display dynamic fair > share for queues on the scheduler page. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page
[ https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2360: -- Attachment: (was: Screen_Shot_v3.png) > Fair Scheduler : Display dynamic fair share for queues on the scheduler page > > > Key: YARN-2360 > URL: https://issues.apache.org/jira/browse/YARN-2360 > Project: Hadoop YARN > Issue Type: New Feature > Components: fairscheduler >Reporter: Ashwin Shankar >Assignee: Ashwin Shankar > Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, > YARN-2360-v1.txt, YARN-2360-v2.txt > > > Based on the discussion in YARN-2026, we'd like to display dynamic fair > share for queues on the scheduler page. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-810) Support CGroup ceiling enforcement on CPU
[ https://issues.apache.org/jira/browse/YARN-810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107172#comment-14107172 ] Varun Vasudev commented on YARN-810: [~ywskycn] thanks for letting me know! Some comments on your patch - 1. In CgroupsLCEResourcesHandler.java, you set cfs_period_us to nmShares and cfs_quota_us to cpuShares. From the RedHat documentation, cfs_period_us and cfs_quota_us operate on a CPU basis. From the documentation {quote} Note that the quota and period parameters operate on a CPU basis. To allow a process to fully utilize two CPUs, for example, set cpu.cfs_quota_us to 20 and cpu.cfs_period_us to 10. {quote} With your current implementation, on a machine with 4 cores (and 4 vcores), a container which requests 2 vcores will have cfs_period_us set to 4096 and cfs_quota_us set to 2048, which will end up limiting it to 50% of one CPU. Is my understanding wrong? 2. This is just nitpicking, but is it possible to change CpuEnforceCeilingEnabled (and its variants) to just CpuCeilingEnabled or CpuCeilingEnforced? > Support CGroup ceiling enforcement on CPU > - > > Key: YARN-810 > URL: https://issues.apache.org/jira/browse/YARN-810 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.1.0-beta, 2.0.5-alpha >Reporter: Chris Riccomini >Assignee: Sandy Ryza > Attachments: YARN-810.patch, YARN-810.patch > > > Problem statement: > YARN currently lets you define an NM's pcore count, and a pcore:vcore ratio. > Containers are then allowed to request vcores between the minimum and maximum > defined in the yarn-site.xml. > In the case where a single-threaded container requests 1 vcore, with a > pcore:vcore ratio of 1:4, the container is still allowed to use up to 100% of > the core it's using, provided that no other container is also using it. This > happens, even though the only guarantee that YARN/CGroups is making is that > the container will get "at least" 1/4th of the core. 
> If a second container then comes along, the second container can take > resources from the first, provided that the first container is still getting > at least its fair share (1/4th). > There are certain cases where this is desirable. There are also certain cases > where it might be desirable to have a hard limit on CPU usage, and not allow > the process to go above the specified resource requirement, even if it's > available. > Here's an RFC that describes the problem in more detail: > http://lwn.net/Articles/336127/ > Solution: > As it happens, when CFS is used in combination with CGroups, you can enforce > a ceiling using two files in cgroups: > {noformat} > cpu.cfs_quota_us > cpu.cfs_period_us > {noformat} > The usage of these two files is documented in more detail here: > https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-cpu.html > Testing: > I have tested YARN CGroups using the 2.0.5-alpha implementation. By default, > it behaves as described above (it is a soft cap, and allows containers to use > more than they asked for). I then tested CFS CPU quotas manually with YARN. 
> First, you can see that CFS is in use in the CGroup, based on the file names: > {noformat} > [criccomi@eat1-qa464 ~]$ sudo -u app ls -l /cgroup/cpu/hadoop-yarn/ > total 0 > -r--r--r-- 1 app app 0 Jun 13 16:46 cgroup.procs > drwxr-xr-x 2 app app 0 Jun 13 17:08 container_1371141151815_0004_01_02 > -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.cfs_period_us > -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.cfs_quota_us > -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.rt_period_us > -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.rt_runtime_us > -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.shares > -r--r--r-- 1 app app 0 Jun 13 16:46 cpu.stat > -rw-r--r-- 1 app app 0 Jun 13 16:46 notify_on_release > -rw-r--r-- 1 app app 0 Jun 13 16:46 tasks > [criccomi@eat1-qa464 ~]$ sudo -u app cat > /cgroup/cpu/hadoop-yarn/cpu.cfs_period_us > 10 > [criccomi@eat1-qa464 ~]$ sudo -u app cat > /cgroup/cpu/hadoop-yarn/cpu.cfs_quota_us > -1 > {noformat} > Oddly, it appears that the cfs_period_us is set to .1s, not 1s. > We can place processes in hard limits. I have process 4370 running YARN > container container_1371141151815_0003_01_03 on a host. By default, it's > running at ~300% cpu usage. > {noformat} > CPU > 4370 criccomi 20 0 1157m 551m 14m S 240.3 0.8 87:10.91 ... > {noformat} > When I set the CFS quota: > {noformat} > echo 1000 > > /cgroup/cpu/hadoop-yarn/container_1371141151815_0003_01_03/cpu.cfs_quota_us > CPU > 4370 criccomi 20 0 1157m 563m 14m S
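The per-CPU semantics Varun quotes from the Red Hat documentation reduce to simple arithmetic: a group may use (cfs_quota_us / cfs_period_us) CPUs' worth of time each period. A minimal sketch; the 100 ms period and the method name are assumptions for illustration, not the patch's actual code:

```java
// Quota/period arithmetic behind the Red Hat quote: to let a container
// fully use N CPUs, the quota must be N times the period.
public class CfsQuota {
    static final long PERIOD_US = 100_000; // 100 ms, an assumed CFS period

    // Quota that allows full use of `cpus` CPUs in each period.
    static long quotaForCpus(int cpus) {
        return cpus * PERIOD_US;
    }

    public static void main(String[] args) {
        System.out.println(quotaForCpus(2)); // 200000: two full CPUs

        // The implementation questioned above sets period=nmShares (4096) and
        // quota=cpuShares (2048): the ratio is 0.5, i.e. half of ONE CPU,
        // regardless of how many vcores the container requested.
        System.out.println(2048.0 / 4096.0); // 0.5
    }
}
```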
[jira] [Commented] (YARN-2440) Cgroups should limit YARN containers to cores allocated in yarn-site.xml
[ https://issues.apache.org/jira/browse/YARN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107141#comment-14107141 ] Jason Lowe commented on YARN-2440: -- Interesting. [~sandyr] could you comment? I'm wondering how we're going to support automatically setting a node's vcore value based on the node's physical characteristics without some kind of property to specify how to convert from physical core to vcore. > Cgroups should limit YARN containers to cores allocated in yarn-site.xml > > > Key: YARN-2440 > URL: https://issues.apache.org/jira/browse/YARN-2440 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: apache-yarn-2440.0.patch, > screenshot-current-implementation.jpg > > > The current cgroups implementation does not limit YARN containers to the > cores allocated in yarn-site.xml. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1104) NMs to support rolling logs of stdout & stderr
[ https://issues.apache.org/jira/browse/YARN-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1104: Parent Issue: YARN-2443 (was: YARN-896) > NMs to support rolling logs of stdout & stderr > -- > > Key: YARN-1104 > URL: https://issues.apache.org/jira/browse/YARN-1104 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.1.0-beta >Reporter: Steve Loughran >Assignee: Xuan Gong > > Currently NMs stream the stdout and stderr streams of a container to a file. > For longer lived processes those files need to be rotated so that the log > doesn't overflow -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2443) Log Handling for Long Running Service
Xuan Gong created YARN-2443: --- Summary: Log Handling for Long Running Service Key: YARN-2443 URL: https://issues.apache.org/jira/browse/YARN-2443 Project: Hadoop YARN Issue Type: Task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2440) Cgroups should limit YARN containers to cores allocated in yarn-site.xml
[ https://issues.apache.org/jira/browse/YARN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107101#comment-14107101 ] Varun Vasudev commented on YARN-2440: - There used to be a variable for that ratio but it was removed in YARN-782. > Cgroups should limit YARN containers to cores allocated in yarn-site.xml > > > Key: YARN-2440 > URL: https://issues.apache.org/jira/browse/YARN-2440 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: apache-yarn-2440.0.patch, > screenshot-current-implementation.jpg > > > The current cgroups implementation does not limit YARN containers to the > cores allocated in yarn-site.xml. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2440) Cgroups should limit YARN containers to cores allocated in yarn-site.xml
[ https://issues.apache.org/jira/browse/YARN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107093#comment-14107093 ] Jason Lowe commented on YARN-2440: -- bq. does it make sense to get the number of physical cores on the machine and derive the vcore to physical cpu ratio? Only if the user can specify the multiplier between a vcore and a physical CPU. Not all physical CPUs are created equal, and as I mentioned earlier, some sites will want to allow fractions of a physical CPU to be allocated. Otherwise we're limiting the number of containers to the number of physical cores, and not all tasks require a full core. > Cgroups should limit YARN containers to cores allocated in yarn-site.xml > > > Key: YARN-2440 > URL: https://issues.apache.org/jira/browse/YARN-2440 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: apache-yarn-2440.0.patch, > screenshot-current-implementation.jpg > > > The current cgroups implementation does not limit YARN containers to the > cores allocated in yarn-site.xml. -- This message was sent by Atlassian JIRA (v6.2#6252)
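The multiplier Jason describes would slot into the quota computation roughly as follows. This is a sketch under the assumption of a configurable vcore-to-physical-core ratio; the names and the 100 ms period are hypothetical, not an existing YARN property:

```java
// Sketch of a vcore-to-pcore multiplier: splitting each physical core into
// several vcores lets a 1-vcore container be capped at a fraction of a CPU.
// The ratio parameter is hypothetical; YARN has no such property here.
public class VcoreRatio {
    static final long PERIOD_US = 100_000; // assumed CFS period

    // CFS quota for a container, given how many vcores map onto one pcore.
    static long quotaUs(int containerVcores, int vcoresPerPcore) {
        return PERIOD_US * containerVcores / vcoresPerPcore;
    }

    public static void main(String[] args) {
        System.out.println(quotaUs(1, 4)); // 25000: a quarter of one CPU
        System.out.println(quotaUs(8, 4)); // 200000: two full CPUs
    }
}
```

With a ratio greater than 1, the number of containers is no longer bound by the physical core count, which is the fractional-allocation case Jason raises.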
[jira] [Commented] (YARN-2440) Cgroups should limit YARN containers to cores allocated in yarn-site.xml
[ https://issues.apache.org/jira/browse/YARN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107069#comment-14107069 ] Varun Vasudev commented on YARN-2440: - I'll update the patch to limit cfs_quota_us. > Cgroups should limit YARN containers to cores allocated in yarn-site.xml > > > Key: YARN-2440 > URL: https://issues.apache.org/jira/browse/YARN-2440 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: apache-yarn-2440.0.patch, > screenshot-current-implementation.jpg > > > The current cgroups implementation does not limit YARN containers to the > cores allocated in yarn-site.xml. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2440) Cgroups should limit YARN containers to cores allocated in yarn-site.xml
[ https://issues.apache.org/jira/browse/YARN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107068#comment-14107068 ] Varun Vasudev commented on YARN-2440: - [~jlowe] does it make sense to get the number of physical cores on the machine and derive the vcore to physical cpu ratio? > Cgroups should limit YARN containers to cores allocated in yarn-site.xml > > > Key: YARN-2440 > URL: https://issues.apache.org/jira/browse/YARN-2440 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: apache-yarn-2440.0.patch, > screenshot-current-implementation.jpg > > > The current cgroups implementation does not limit YARN containers to the > cores allocated in yarn-site.xml. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely
[ https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107066#comment-14107066 ] zhihai xu commented on YARN-1458: - [~shurong.mai], YARN-1458.patch will cause a regression. It won't work if all the weights and MinShares in the active queues are less than 1. The type conversion from double to int in computeShare loses precision. {code} private static int computeShare(Schedulable sched, double w2rRatio, ResourceType type) { double share = sched.getWeights().getWeight(type) * w2rRatio; share = Math.max(share, getResourceValue(sched.getMinShare(), type)); share = Math.min(share, getResourceValue(sched.getMaxShare(), type)); return (int) share; } {code} In the above code, the initial value of w2rRatio is 1.0. If the weight and MinShare are less than 1, computeShare will return 0. resourceUsedWithWeightToResourceRatio returns the sum of all these return values from computeShare (after the precision loss), so it will be zero if all the weights and MinShares in the active queues are less than 1. Then YARN-1458.patch will exit the loop early with an "rMax" value of 1.0. The "right" variable will then be less than "rMax" (1.0), and all queues' fair shares will be set to 0 in the following code. {code} for (Schedulable sched : schedulables) { setResourceValue(computeShare(sched, right, type), sched.getFairShare(), type); } {code} This is why TestFairScheduler fails at line 1049. testIsStarvedForFairShare configures queueA's weight as 0.25 and queueB's weight as 0.75, with a total node resource of 4 * 1024. It creates two applications: one is assigned to queueA and the other to queueB. After FairScheduler (update) calculates the fair shares, queueA's fair share should be 1 * 1024 and queueB's fair share should be 3 * 1024. 
but with YARN-1458.patch, both queueA's fair share and queueB's fair share are set to 0. This is because in this test there are two active queues, queueA and queueB; both weights are less than 1 (0.25 and 0.75), and MinShare (minResources) is not configured for either queue, so both use the default value (0). > In Fair Scheduler, size based weight can cause update thread to hold lock > indefinitely > -- > > Key: YARN-1458 > URL: https://issues.apache.org/jira/browse/YARN-1458 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.2.0 > Environment: Centos 2.6.18-238.19.1.el5 X86_64 > hadoop2.2.0 >Reporter: qingwu.fu >Assignee: zhihai xu > Labels: patch > Fix For: 2.2.1 > > Attachments: YARN-1458.001.patch, YARN-1458.002.patch, YARN-1458.patch > > Original Estimate: 408h > Remaining Estimate: 408h > > The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when > clients submitted lots of jobs; it is not easy to reproduce. We ran the test > cluster for days to reproduce it. 
The output of jstack command on resourcemanager pid: > {code} > "ResourceManager Event Processor" prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 > waiting for monitor entry [0x43aa9000] >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671) > - waiting to lock <0x00070026b6e0> (a > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440) > at java.lang.Thread.run(Thread.java:744) > …… > "FairSchedulerUpdateThread" daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 > runnable [0x433a2000] >java.lang.Thread.State: RUNNABLE > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545) > - locked <0x00070026b6e0> (a > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131) > at > org.apache.hadoop.yarn.server.resourcema
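The truncation zhihai points out is easy to reproduce in isolation. A minimal sketch of computeShare with the Schedulable plumbing replaced by plain double parameters (a simplification for illustration, not the real signature):

```java
// Reproduces the precision loss: with all weights below 1 and minShare 0,
// the (int) cast truncates every share to 0, so the sum used by the
// weight-to-resource ratio search is zero and the search exits early.
public class FairShareDemo {
    // Simplified computeShare: Schedulable lookups replaced by parameters.
    static int computeShare(double weight, double w2rRatio,
                            double minShare, double maxShare) {
        double share = weight * w2rRatio;
        share = Math.max(share, minShare);
        share = Math.min(share, maxShare);
        return (int) share; // anything in (0, 1) truncates to 0
    }

    public static void main(String[] args) {
        // testIsStarvedForFairShare: weights 0.25 and 0.75, default minShare 0
        int usedA = computeShare(0.25, 1.0, 0, Integer.MAX_VALUE);
        int usedB = computeShare(0.75, 1.0, 0, Integer.MAX_VALUE);
        System.out.println(usedA + usedB); // 0: both queues end up with share 0
    }
}
```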
[jira] [Commented] (YARN-2440) Cgroups should limit YARN containers to cores allocated in yarn-site.xml
[ https://issues.apache.org/jira/browse/YARN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107057#comment-14107057 ] Jason Lowe commented on YARN-2440: -- I think cfs_quota_us has a maximum value of 100, so we may have an issue if vcores>10. I don't see how this takes into account the mapping of vcores to actual CPUs. It's not safe to assume 1 vcore == 1 physical CPU, as some sites will map multiple vcores to a physical core to allow fractions of a physical CPU to be allocated or to account for varying CPU performance across a heterogeneous cluster. > Cgroups should limit YARN containers to cores allocated in yarn-site.xml > > > Key: YARN-2440 > URL: https://issues.apache.org/jira/browse/YARN-2440 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: apache-yarn-2440.0.patch, > screenshot-current-implementation.jpg > > > The current cgroups implementation does not limit YARN containers to the > cores allocated in yarn-site.xml. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2441) NPE in nodemanager after restart
[ https://issues.apache.org/jira/browse/YARN-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishan Shetty updated YARN-2441: Priority: Major (was: Minor) > NPE in nodemanager after restart > > > Key: YARN-2441 > URL: https://issues.apache.org/jira/browse/YARN-2441 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.5.0 >Reporter: Nishan Shetty > > {code} > 2014-08-22 16:43:19,640 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Blocking new container-requests as container manager rpc server is still > starting. > 2014-08-22 16:43:19,658 INFO org.apache.hadoop.ipc.Server: IPC Server > Responder: starting > 2014-08-22 16:43:19,675 INFO org.apache.hadoop.ipc.Server: IPC Server > listener on 45026: starting > 2014-08-22 16:43:20,029 INFO > org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: > Updating node address : host-10-18-40-95:45026 > 2014-08-22 16:43:20,029 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > ContainerManager started at /10.18.40.95:45026 > 2014-08-22 16:43:20,030 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > ContainerManager bound to host-10-18-40-95/10.18.40.95:45026 > 2014-08-22 16:43:20,073 INFO org.apache.hadoop.ipc.CallQueueManager: Using > callQueue class java.util.concurrent.LinkedBlockingQueue > 2014-08-22 16:43:20,098 INFO org.apache.hadoop.ipc.Server: Starting Socket > Reader #1 for port 45027 > 2014-08-22 16:43:20,158 INFO > org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding > protocol org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB > to the server > 2014-08-22 16:43:20,178 INFO org.apache.hadoop.ipc.Server: IPC Server > Responder: starting > 2014-08-22 16:43:20,192 INFO org.apache.hadoop.ipc.Server: IPC Server > listener on 45027: starting > 2014-08-22 16:43:20,210 INFO 
org.apache.hadoop.ipc.Server: Socket Reader #1 > for port 45026: readAndProcess from client 10.18.40.84 threw exception > [java.lang.NullPointerException] > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:167) > at > org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:43) > at > org.apache.hadoop.security.token.SecretManager.retriableRetrievePassword(SecretManager.java:91) > at > org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.getPassword(SaslRpcServer.java:278) > at > org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.handle(SaslRpcServer.java:305) > at > com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:585) > at > com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244) > at > org.apache.hadoop.ipc.Server$Connection.processSaslToken(Server.java:1384) > at > org.apache.hadoop.ipc.Server$Connection.processSaslMessage(Server.java:1361) > at org.apache.hadoop.ipc.Server$Connection.saslProcess(Server.java:1275) > at > org.apache.hadoop.ipc.Server$Connection.saslReadAndProcess(Server.java:1238) > at > org.apache.hadoop.ipc.Server$Connection.processRpcOutOfBandRequest(Server.java:1878) > at > org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1755) > at > org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1519) > at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:750) > at > org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:624) > at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:595) > 2014-08-22 16:43:20,227 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 > for port 45026: readAndProcess from client 10.18.40.84 threw exception > [java.lang.NullPointerException] > java.lang.NullPointerException > at > 
org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:167) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2393) FairScheduler: Add the notion of steady fair share
[ https://issues.apache.org/jira/browse/YARN-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107050#comment-14107050 ] Wei Yan commented on YARN-2393: --- Thanks, [~kasha], [~ashwinshankar77]. Will post a patch for the YARN-2360 for the UI. > FairScheduler: Add the notion of steady fair share > -- > > Key: YARN-2393 > URL: https://issues.apache.org/jira/browse/YARN-2393 > Project: Hadoop YARN > Issue Type: New Feature > Components: fairscheduler >Reporter: Ashwin Shankar >Assignee: Wei Yan > Fix For: 2.6.0 > > Attachments: YARN-2393-1.patch, YARN-2393-2.patch, YARN-2393-3.patch, > yarn-2393-4.patch > > > Static fair share is a fair share allocation considering all(active/inactive) > queues.It would be shown on the UI for better predictability of finish time > of applications. > We would compute static fair share only when needed, like on queue creation, > node added/removed. Please see YARN-2026 for discussions on this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2440) Cgroups should limit YARN containers to cores allocated in yarn-site.xml
[ https://issues.apache.org/jira/browse/YARN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107025#comment-14107025 ] Wei Yan commented on YARN-2440: --- [~vvasudev], I misunderstood this jira. Will post comment later. > Cgroups should limit YARN containers to cores allocated in yarn-site.xml > > > Key: YARN-2440 > URL: https://issues.apache.org/jira/browse/YARN-2440 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: apache-yarn-2440.0.patch, > screenshot-current-implementation.jpg > > > The current cgroups implementation does not limit YARN containers to the > cores allocated in yarn-site.xml. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2431) NM restart: cgroup is not removed for reacquired containers
[ https://issues.apache.org/jira/browse/YARN-2431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107008#comment-14107008 ] Jason Lowe commented on YARN-2431: -- Release audit problems are unrelated, see HDFS-6905. > NM restart: cgroup is not removed for reacquired containers > --- > > Key: YARN-2431 > URL: https://issues.apache.org/jira/browse/YARN-2431 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Jason Lowe > Attachments: YARN-2431.patch > > > The cgroup for a reacquired container is not being removed when the container > exits. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2440) Cgroups should limit YARN containers to cores allocated in yarn-site.xml
[ https://issues.apache.org/jira/browse/YARN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107012#comment-14107012 ] Varun Vasudev commented on YARN-2440: - [~ywskycn] this patch doesn't limit containers to the container_vcores/NM_vcores ratio. What it does do is limit the overall YARN usage to yarn.nodemanager.resource.cpu-vcores. If you have 4 cores on a machine and set yarn.nodemanager.resource.cpu-vcores to 2, we don't restrict the YARN containers to 2 cores. The containers can create threads and use up as many cores as they want, which defeats the purpose of setting yarn.nodemanager.resource.cpu-vcores. > Cgroups should limit YARN containers to cores allocated in yarn-site.xml > > > Key: YARN-2440 > URL: https://issues.apache.org/jira/browse/YARN-2440 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: apache-yarn-2440.0.patch, > screenshot-current-implementation.jpg > > > The current cgroups implementation does not limit YARN containers to the > cores allocated in yarn-site.xml. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2440) Cgroups should limit YARN containers to cores allocated in yarn-site.xml
[ https://issues.apache.org/jira/browse/YARN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107005#comment-14107005 ] Wei Yan commented on YARN-2440: --- [~vvasudev], for general cases, we shouldn't strictly limit cfs_quota_us. We always want to let co-located containers share the CPU resource in a proportional way, rather than strictly follow the container_vcores/NM_vcores ratio. We have a runnable patch in YARN-810. I'll check with Sandy about reviewing it. > Cgroups should limit YARN containers to cores allocated in yarn-site.xml > > > Key: YARN-2440 > URL: https://issues.apache.org/jira/browse/YARN-2440 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: apache-yarn-2440.0.patch, > screenshot-current-implementation.jpg > > > The current cgroups implementation does not limit YARN containers to the > cores allocated in yarn-site.xml. -- This message was sent by Atlassian JIRA (v6.2#6252)
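The proportional behavior Wei describes comes from cpu.shares rather than cfs_quota_us: busy co-located containers divide the CPU in the ratio of their share counts, and idle CPU flows to whoever wants it, so no hard ceiling is imposed. A minimal sketch of that arithmetic (the 1024-shares-per-vcore weight and the function names are assumptions for illustration):

```python
# Sketch: proportional CPU sharing via cpu.shares instead of cfs_quota_us.
# Each container's share count is proportional to its requested vcores;
# spare CPU is not withheld, so containers can burst above their ratio
# whenever the machine is otherwise idle.

SHARES_PER_VCORE = 1024  # assumed weight (the cgroup default share value)

def container_cpu_shares(container_vcores):
    return SHARES_PER_VCORE * max(container_vcores, 1)

def proportional_split(requests_vcores, total_cores):
    """Expected CPU (in cores) each container gets when ALL are CPU-bound."""
    shares = [container_cpu_shares(v) for v in requests_vcores]
    total = sum(shares)
    return [total_cores * s / total for s in shares]

# Two busy containers (1 vcore and 3 vcores) on a 4-core box split the
# machine 1:3; if one goes idle, the other may use the whole box.
print(proportional_split([1, 3], 4))  # [1.0, 3.0]
```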
[jira] [Commented] (YARN-2393) FairScheduler: Add the notion of steady fair share
[ https://issues.apache.org/jira/browse/YARN-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107007#comment-14107007 ] Hudson commented on YARN-2393: -- FAILURE: Integrated in Hadoop-trunk-Commit #6097 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6097/]) YARN-2393. FairScheduler: Add the notion of steady fair share. (Wei Yan via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1619845) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueueMetrics.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/Schedulable.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/SchedulingPolicy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/ComputeFairShares.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/DominantResourceFairnessPolicy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FifoPolicy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FakeSchedulable.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerFairShare.java > FairScheduler: Add the notion of steady fair share > -- > > Key: YARN-2393 > URL: https://issues.apache.org/jira/browse/YARN-2393 > Project: Hadoop YARN > Issue Type: New Feature > Components: fairscheduler >Reporter: Ashwin Shankar >Assignee: Wei Yan > Attachments: YARN-2393-1.patch, 
YARN-2393-2.patch, YARN-2393-3.patch, > yarn-2393-4.patch > > > Static fair share is a fair share allocation considering all(active/inactive) > queues.It would be shown on the UI for better predictability of finish time > of applications. > We would compute static fair share only when needed, like on queue creation, > node added/removed. Please see YARN-2026 for discussions on this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-810) Support CGroup ceiling enforcement on CPU
[ https://issues.apache.org/jira/browse/YARN-810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107000#comment-14107000 ] Wei Yan commented on YARN-810: -- [~vvasudev], thanks for the offer. I'm still working on this. > Support CGroup ceiling enforcement on CPU > - > > Key: YARN-810 > URL: https://issues.apache.org/jira/browse/YARN-810 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.1.0-beta, 2.0.5-alpha >Reporter: Chris Riccomini >Assignee: Sandy Ryza > Attachments: YARN-810.patch, YARN-810.patch > > > Problem statement: > YARN currently lets you define an NM's pcore count, and a pcore:vcore ratio. > Containers are then allowed to request vcores between the minimum and maximum > defined in the yarn-site.xml. > In the case where a single-threaded container requests 1 vcore, with a > pcore:vcore ratio of 1:4, the container is still allowed to use up to 100% of > the core it's using, provided that no other container is also using it. This > happens, even though the only guarantee that YARN/CGroups is making is that > the container will get "at least" 1/4th of the core. > If a second container then comes along, the second container can take > resources from the first, provided that the first container is still getting > at least its fair share (1/4th). > There are certain cases where this is desirable. There are also certain cases > where it might be desirable to have a hard limit on CPU usage, and not allow > the process to go above the specified resource requirement, even if it's > available. 
> Here's an RFC that describes the problem in more detail: > http://lwn.net/Articles/336127/ > Solution: > As it happens, when CFS is used in combination with CGroups, you can enforce > a ceiling using two files in cgroups: > {noformat} > cpu.cfs_quota_us > cpu.cfs_period_us > {noformat} > The usage of these two files is documented in more detail here: > https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-cpu.html > Testing: > I have tested YARN CGroups using the 2.0.5-alpha implementation. By default, > it behaves as described above (it is a soft cap, and allows containers to use > more than they asked for). I then tested CFS CPU quotas manually with YARN. > First, you can see that CFS is in use in the CGroup, based on the file names: > {noformat} > [criccomi@eat1-qa464 ~]$ sudo -u app ls -l /cgroup/cpu/hadoop-yarn/ > total 0 > -r--r--r-- 1 app app 0 Jun 13 16:46 cgroup.procs > drwxr-xr-x 2 app app 0 Jun 13 17:08 container_1371141151815_0004_01_02 > -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.cfs_period_us > -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.cfs_quota_us > -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.rt_period_us > -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.rt_runtime_us > -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.shares > -r--r--r-- 1 app app 0 Jun 13 16:46 cpu.stat > -rw-r--r-- 1 app app 0 Jun 13 16:46 notify_on_release > -rw-r--r-- 1 app app 0 Jun 13 16:46 tasks > [criccomi@eat1-qa464 ~]$ sudo -u app cat > /cgroup/cpu/hadoop-yarn/cpu.cfs_period_us > 10 > [criccomi@eat1-qa464 ~]$ sudo -u app cat > /cgroup/cpu/hadoop-yarn/cpu.cfs_quota_us > -1 > {noformat} > Oddly, it appears that the cfs_period_us is set to .1s, not 1s. > We can place processes in hard limits. I have process 4370 running YARN > container container_1371141151815_0003_01_03 on a host. By default, it's > running at ~300% cpu usage. > {noformat} > CPU > 4370 criccomi 20 0 1157m 551m 14m S 240.3 0.8 87:10.91 ... 
> {noformat} > When I set the CFS quote: > {noformat} > echo 1000 > > /cgroup/cpu/hadoop-yarn/container_1371141151815_0003_01_03/cpu.cfs_quota_us > CPU > 4370 criccomi 20 0 1157m 563m 14m S 1.0 0.8 90:08.39 ... > {noformat} > It drops to 1% usage, and you can see the box has room to spare: > {noformat} > Cpu(s): 2.4%us, 1.0%sy, 0.0%ni, 92.2%id, 4.2%wa, 0.0%hi, 0.1%si, > 0.0%st > {noformat} > Turning the quota back to -1: > {noformat} > echo -1 > > /cgroup/cpu/hadoop-yarn/container_1371141151815_0003_01_03/cpu.cfs_quota_us > {noformat} > Burns the cores again: > {noformat} > Cpu(s): 11.1%us, 1.7%sy, 0.0%ni, 83.9%id, 3.1%wa, 0.0%hi, 0.2%si, > 0.0%st > CPU > 4370 criccomi 20 0 1157m 563m 14m S 253.9 0.8 89:32.31 ... > {noformat} > On my dev box, I was testing CGroups by running a python process eight times, > to burn through all the cores, since it was doing as described above (giving > extra CPU to the process, even with a cpu.shares limit). T
[jira] [Commented] (YARN-2441) NPE in nodemanager after restart
[ https://issues.apache.org/jira/browse/YARN-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106999#comment-14106999 ] Jason Lowe commented on YARN-2441: -- Ah, then this seems like a case where a client (likely an AM) is connecting to the NM before the NM has finished registering with the RM to get the secret keys. Trying to block new container requests at the app level probably isn't going to work in practice because the SASL layer in RPC doesn't let the connection get to the point where the app can try to reject the request. IMHO we should remove the "blocking client requests" code and instead do a delayed server start, sorta like the delay added by YARN-1337 when NM recovery is enabled. Ideally the RPC layer would support the ability to bind to a server socket but not start accepting requests until later. That would allow us to register with the RM knowing what our client port is but without trying to let clients through that port until we're really ready. Shorter term fix might be to have the secret manager throw an exception that can be retried by clients if the master key isn't set yet. > NPE in nodemanager after restart > > > Key: YARN-2441 > URL: https://issues.apache.org/jira/browse/YARN-2441 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.5.0 >Reporter: Nishan Shetty >Priority: Minor > > {code} > 2014-08-22 16:43:19,640 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Blocking new container-requests as container manager rpc server is still > starting. 
> 2014-08-22 16:43:19,658 INFO org.apache.hadoop.ipc.Server: IPC Server > Responder: starting > 2014-08-22 16:43:19,675 INFO org.apache.hadoop.ipc.Server: IPC Server > listener on 45026: starting > 2014-08-22 16:43:20,029 INFO > org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: > Updating node address : host-10-18-40-95:45026 > 2014-08-22 16:43:20,029 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > ContainerManager started at /10.18.40.95:45026 > 2014-08-22 16:43:20,030 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > ContainerManager bound to host-10-18-40-95/10.18.40.95:45026 > 2014-08-22 16:43:20,073 INFO org.apache.hadoop.ipc.CallQueueManager: Using > callQueue class java.util.concurrent.LinkedBlockingQueue > 2014-08-22 16:43:20,098 INFO org.apache.hadoop.ipc.Server: Starting Socket > Reader #1 for port 45027 > 2014-08-22 16:43:20,158 INFO > org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding > protocol org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB > to the server > 2014-08-22 16:43:20,178 INFO org.apache.hadoop.ipc.Server: IPC Server > Responder: starting > 2014-08-22 16:43:20,192 INFO org.apache.hadoop.ipc.Server: IPC Server > listener on 45027: starting > 2014-08-22 16:43:20,210 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 > for port 45026: readAndProcess from client 10.18.40.84 threw exception > [java.lang.NullPointerException] > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:167) > at > org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:43) > at > org.apache.hadoop.security.token.SecretManager.retriableRetrievePassword(SecretManager.java:91) > at > 
org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.getPassword(SaslRpcServer.java:278) > at > org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.handle(SaslRpcServer.java:305) > at > com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:585) > at > com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244) > at > org.apache.hadoop.ipc.Server$Connection.processSaslToken(Server.java:1384) > at > org.apache.hadoop.ipc.Server$Connection.processSaslMessage(Server.java:1361) > at org.apache.hadoop.ipc.Server$Connection.saslProcess(Server.java:1275) > at > org.apache.hadoop.ipc.Server$Connection.saslReadAndProcess(Server.java:1238) > at > org.apache.hadoop.ipc.Server$Connection.processRpcOutOfBandRequest(Server.java:1878) > at > org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1755) > at > org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1519) > at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:750) > at > org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:624) > at or
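The "bind early, accept later" idea Jason sketches above can be illustrated with plain sockets. This is a generic sketch of the pattern, not Hadoop's IPC layer; the function names are invented for illustration.

```python
# Sketch: bind a server socket immediately (so the chosen port is known and
# could be registered with the RM), but defer listen() until the server is
# actually ready to serve clients. Connections attempted before listen()
# are refused at the OS level rather than reaching half-initialized code.
import socket

def bind_only(host="127.0.0.1", port=0):
    """Bind without listening; port 0 lets the OS pick one we can advertise."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, port))
    return sock, sock.getsockname()[1]

def start_accepting(sock, backlog=128):
    """Called once initialization (e.g. fetching secret keys) completes."""
    sock.listen(backlog)

sock, port = bind_only()
# ... register `port`, fetch master keys, recover state ...
start_accepting(sock)  # only now do client connections get through
sock.close()
```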
[jira] [Commented] (YARN-160) nodemanagers should obtain cpu/memory values from underlying OS
[ https://issues.apache.org/jira/browse/YARN-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106995#comment-14106995 ] Varun Vasudev commented on YARN-160: [~djp] {quote} Both physical id and core id are not guaranteed to be present in /proc/cpuinfo (please see below for my local VM's info). We may use the processor number instead in case these ids are 0 (like we did on Windows). Again, this weakens my confidence that this automatic way of getting CPU/memory resources should happen by default (not sure if there are any cross-platform issues). Maybe a safer way here is to keep the previous default behavior (with some static setting) with an extra config to enable this. We can wait for this feature to become more stable before changing the default behavior. {noformat} processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 70 model name : Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz stepping: 1 cpu MHz : 2295.265 cache size : 6144 KB fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc up arch_perfmon pebs bts xtopology tsc_reliable nonstop_tsc aperfmperf unfair_spinlock pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi ept vpid fsgsbase smep bogomips: 4590.53 clflush size: 64 cache_alignment : 64 address sizes : 40 bits physical, 48 bits virtual power management: {noformat} {quote} In the example you gave, where we have processors listed but no physical id or core id entries, numProcessors will be set to the number of entries and numCores will be set to 1. From the diff - {noformat} + numCores = 1; {noformat} There is also a test case to ensure this behaviour.
In addition, cluster administrators can decide whether the NodeManager should report numProcessors or numCores by toggling yarn.nodemanager.resource.count-logical-processors-as-vcores, which by default is true. In the VM example, by default the NodeManager will report vcores as the number of processor entries in /proc/cpuinfo. If yarn.nodemanager.resource.count-logical-processors-as-vcores is set to false, the NodeManager will report vcores as 1 (if there are no physical id or core id entries). > nodemanagers should obtain cpu/memory values from underlying OS > --- > > Key: YARN-160 > URL: https://issues.apache.org/jira/browse/YARN-160 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.0.3-alpha >Reporter: Alejandro Abdelnur >Assignee: Varun Vasudev > Fix For: 2.6.0 > > Attachments: apache-yarn-160.0.patch, apache-yarn-160.1.patch > > > As mentioned in YARN-2 > *NM memory and CPU configs* > Currently these values are coming from the config of the NM, we should be > able to obtain those values from the OS (ie, in the case of Linux from > /proc/meminfo & /proc/cpuinfo). As this is highly OS dependent we should have > an interface that obtains this information. In addition implementations of > this interface should be able to specify a mem/cpu offset (amount of mem/cpu > not to be avail as YARN resource), this would allow to reserve mem/cpu for > the OS and other services outside of YARN containers. -- This message was sent by Atlassian JIRA (v6.2#6252)
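The counting and fallback behavior described above can be sketched as a small parser. This is hypothetical code for illustration; the actual patch's Java logic is only paraphrased here, and the function name is invented.

```python
# Sketch: count logical processors and physical cores from /proc/cpuinfo
# text, falling back to numCores = 1 when there are no physical id /
# core id entries (as in the VM example quoted above).

def count_cpus(cpuinfo_text):
    processors = 0
    cores = set()
    current_socket = "0"  # default if no "physical id" precedes "core id"
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "processor":
            processors += 1
        elif key == "physical id":
            current_socket = value.strip()
        elif key == "core id":
            # a physical core is identified by (socket, core id)
            cores.add((current_socket, value.strip()))
    num_cores = len(cores) if cores else 1  # fallback described above
    return processors, num_cores

vm_sample = "processor : 0\nvendor_id : GenuineIntel\n"
print(count_cpus(vm_sample))  # (1, 1)
```

On the VM-style input (processor entries only), this yields one processor and the numCores = 1 fallback, matching the two reporting modes the config flag toggles.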
[jira] [Updated] (YARN-2393) FairScheduler: Add the notion of steady fair share
[ https://issues.apache.org/jira/browse/YARN-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2393: --- Issue Type: New Feature (was: Improvement) > FairScheduler: Add the notion of steady fair share > -- > > Key: YARN-2393 > URL: https://issues.apache.org/jira/browse/YARN-2393 > Project: Hadoop YARN > Issue Type: New Feature > Components: fairscheduler >Reporter: Ashwin Shankar >Assignee: Wei Yan > Attachments: YARN-2393-1.patch, YARN-2393-2.patch, YARN-2393-3.patch, > yarn-2393-4.patch > > > Static fair share is a fair share allocation considering all(active/inactive) > queues.It would be shown on the UI for better predictability of finish time > of applications. > We would compute static fair share only when needed, like on queue creation, > node added/removed. Please see YARN-2026 for discussions on this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2393) FairScheduler: Add the notion of steady fair share
[ https://issues.apache.org/jira/browse/YARN-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2393: --- Summary: FairScheduler: Add the notion of steady fair share (was: FairScheduler: Implement steady fair share) > FairScheduler: Add the notion of steady fair share > -- > > Key: YARN-2393 > URL: https://issues.apache.org/jira/browse/YARN-2393 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Ashwin Shankar >Assignee: Wei Yan > Attachments: YARN-2393-1.patch, YARN-2393-2.patch, YARN-2393-3.patch, > yarn-2393-4.patch > > > Static fair share is a fair share allocation considering all(active/inactive) > queues.It would be shown on the UI for better predictability of finish time > of applications. > We would compute static fair share only when needed, like on queue creation, > node added/removed. Please see YARN-2026 for discussions on this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2393) FairScheduler: Implement steady fair share
[ https://issues.apache.org/jira/browse/YARN-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106991#comment-14106991 ] Karthik Kambatla commented on YARN-2393: Committing this. > FairScheduler: Implement steady fair share > -- > > Key: YARN-2393 > URL: https://issues.apache.org/jira/browse/YARN-2393 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Ashwin Shankar >Assignee: Wei Yan > Attachments: YARN-2393-1.patch, YARN-2393-2.patch, YARN-2393-3.patch, > yarn-2393-4.patch > > > Static fair share is a fair share allocation considering all(active/inactive) > queues.It would be shown on the UI for better predictability of finish time > of applications. > We would compute static fair share only when needed, like on queue creation, > node added/removed. Please see YARN-2026 for discussions on this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2393) FairScheduler: Implement steady fair share
[ https://issues.apache.org/jira/browse/YARN-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106990#comment-14106990 ] Karthik Kambatla commented on YARN-2393: One of the reasons we (Sandy and I) wanted to make the fair share used for scheduling instantaneous was to address the case where the maxAMResource becomes so small, when there are multiple queues, that we can't run any applications at all. I think it is better to leave it as is. In case anyone runs into issues with maxAMResource (in testing), we can consider preempting AMs as an alternative. > FairScheduler: Implement steady fair share > -- > > Key: YARN-2393 > URL: https://issues.apache.org/jira/browse/YARN-2393 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Ashwin Shankar >Assignee: Wei Yan > Attachments: YARN-2393-1.patch, YARN-2393-2.patch, YARN-2393-3.patch, > yarn-2393-4.patch > > > Static fair share is a fair share allocation considering all(active/inactive) > queues.It would be shown on the UI for better predictability of finish time > of applications. > We would compute static fair share only when needed, like on queue creation, > node added/removed. Please see YARN-2026 for discussions on this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2441) NPE in nodemanager after restart
[ https://issues.apache.org/jira/browse/YARN-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106987#comment-14106987 ] Nishan Shetty commented on YARN-2441: - [~jlowe] Sorry, I mentioned the wrong Affects Version. It's branch-2. Work-preserving NM restart is not enabled; it's just a plain restart. > NPE in nodemanager after restart > > > Key: YARN-2441 > URL: https://issues.apache.org/jira/browse/YARN-2441 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.5.0 >Reporter: Nishan Shetty >Priority: Minor > > {code} > 2014-08-22 16:43:19,640 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Blocking new container-requests as container manager rpc server is still > starting. > 2014-08-22 16:43:19,658 INFO org.apache.hadoop.ipc.Server: IPC Server > Responder: starting > 2014-08-22 16:43:19,675 INFO org.apache.hadoop.ipc.Server: IPC Server > listener on 45026: starting > 2014-08-22 16:43:20,029 INFO > org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: > Updating node address : host-10-18-40-95:45026 > 2014-08-22 16:43:20,029 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > ContainerManager started at /10.18.40.95:45026 > 2014-08-22 16:43:20,030 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > ContainerManager bound to host-10-18-40-95/10.18.40.95:45026 > 2014-08-22 16:43:20,073 INFO org.apache.hadoop.ipc.CallQueueManager: Using > callQueue class java.util.concurrent.LinkedBlockingQueue > 2014-08-22 16:43:20,098 INFO org.apache.hadoop.ipc.Server: Starting Socket > Reader #1 for port 45027 > 2014-08-22 16:43:20,158 INFO > org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding > protocol org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB > to the server > 2014-08-22 16:43:20,178 INFO org.apache.hadoop.ipc.Server: IPC 
Server > Responder: starting > 2014-08-22 16:43:20,192 INFO org.apache.hadoop.ipc.Server: IPC Server > listener on 45027: starting > 2014-08-22 16:43:20,210 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 > for port 45026: readAndProcess from client 10.18.40.84 threw exception > [java.lang.NullPointerException] > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:167) > at > org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:43) > at > org.apache.hadoop.security.token.SecretManager.retriableRetrievePassword(SecretManager.java:91) > at > org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.getPassword(SaslRpcServer.java:278) > at > org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.handle(SaslRpcServer.java:305) > at > com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:585) > at > com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244) > at > org.apache.hadoop.ipc.Server$Connection.processSaslToken(Server.java:1384) > at > org.apache.hadoop.ipc.Server$Connection.processSaslMessage(Server.java:1361) > at org.apache.hadoop.ipc.Server$Connection.saslProcess(Server.java:1275) > at > org.apache.hadoop.ipc.Server$Connection.saslReadAndProcess(Server.java:1238) > at > org.apache.hadoop.ipc.Server$Connection.processRpcOutOfBandRequest(Server.java:1878) > at > org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1755) > at > org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1519) > at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:750) > at > org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:624) > at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:595) > 2014-08-22 16:43:20,227 INFO org.apache.hadoop.ipc.Server: Socket 
Reader #1 > for port 45026: readAndProcess from client 10.18.40.84 threw exception > [java.lang.NullPointerException] > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:167) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2442) ResourceManager JMX UI does not give HA State
[ https://issues.apache.org/jira/browse/YARN-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishan Shetty updated YARN-2442: Affects Version/s: (was: 3.0.0) 2.5.0 > ResourceManager JMX UI does not give HA State > - > > Key: YARN-2442 > URL: https://issues.apache.org/jira/browse/YARN-2442 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: Nishan Shetty >Priority: Trivial > > ResourceManager JMX UI can show the haState (INITIALIZING, ACTIVE, STANDBY, > STOPPED) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2393) FairScheduler: Implement steady fair share
[ https://issues.apache.org/jira/browse/YARN-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2393: --- Summary: FairScheduler: Implement steady fair share (was: Fair Scheduler : Implement steady fair share) > FairScheduler: Implement steady fair share > -- > > Key: YARN-2393 > URL: https://issues.apache.org/jira/browse/YARN-2393 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Ashwin Shankar >Assignee: Wei Yan > Attachments: YARN-2393-1.patch, YARN-2393-2.patch, YARN-2393-3.patch, > yarn-2393-4.patch > > > Static fair share is a fair share allocation considering all(active/inactive) > queues.It would be shown on the UI for better predictability of finish time > of applications. > We would compute static fair share only when needed, like on queue creation, > node added/removed. Please see YARN-2026 for discussions on this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2441) NPE in nodemanager after restart
[ https://issues.apache.org/jira/browse/YARN-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishan Shetty updated YARN-2441: Affects Version/s: (was: 3.0.0) 2.5.0 > NPE in nodemanager after restart > > > Key: YARN-2441 > URL: https://issues.apache.org/jira/browse/YARN-2441 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.5.0 >Reporter: Nishan Shetty >Priority: Minor > > {code} > 2014-08-22 16:43:19,640 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Blocking new container-requests as container manager rpc server is still > starting. > 2014-08-22 16:43:19,658 INFO org.apache.hadoop.ipc.Server: IPC Server > Responder: starting > 2014-08-22 16:43:19,675 INFO org.apache.hadoop.ipc.Server: IPC Server > listener on 45026: starting > 2014-08-22 16:43:20,029 INFO > org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: > Updating node address : host-10-18-40-95:45026 > 2014-08-22 16:43:20,029 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > ContainerManager started at /10.18.40.95:45026 > 2014-08-22 16:43:20,030 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > ContainerManager bound to host-10-18-40-95/10.18.40.95:45026 > 2014-08-22 16:43:20,073 INFO org.apache.hadoop.ipc.CallQueueManager: Using > callQueue class java.util.concurrent.LinkedBlockingQueue > 2014-08-22 16:43:20,098 INFO org.apache.hadoop.ipc.Server: Starting Socket > Reader #1 for port 45027 > 2014-08-22 16:43:20,158 INFO > org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding > protocol org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB > to the server > 2014-08-22 16:43:20,178 INFO org.apache.hadoop.ipc.Server: IPC Server > Responder: starting > 2014-08-22 16:43:20,192 INFO org.apache.hadoop.ipc.Server: IPC Server > listener on 45027: starting > 2014-08-22 
16:43:20,210 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 > for port 45026: readAndProcess from client 10.18.40.84 threw exception > [java.lang.NullPointerException] > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:167) > at > org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:43) > at > org.apache.hadoop.security.token.SecretManager.retriableRetrievePassword(SecretManager.java:91) > at > org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.getPassword(SaslRpcServer.java:278) > at > org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.handle(SaslRpcServer.java:305) > at > com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:585) > at > com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244) > at > org.apache.hadoop.ipc.Server$Connection.processSaslToken(Server.java:1384) > at > org.apache.hadoop.ipc.Server$Connection.processSaslMessage(Server.java:1361) > at org.apache.hadoop.ipc.Server$Connection.saslProcess(Server.java:1275) > at > org.apache.hadoop.ipc.Server$Connection.saslReadAndProcess(Server.java:1238) > at > org.apache.hadoop.ipc.Server$Connection.processRpcOutOfBandRequest(Server.java:1878) > at > org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1755) > at > org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1519) > at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:750) > at > org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:624) > at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:595) > 2014-08-22 16:43:20,227 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 > for port 45026: readAndProcess from client 10.18.40.84 threw exception > [java.lang.NullPointerException] > 
java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:167) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
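The stack trace above shows the NM dereferencing state in NMTokenSecretManagerInNM.retrievePassword before its master keys are repopulated after restart. As a hedged sketch only (the class, fields, and exception below are illustrative stand-ins, not the actual Hadoop code or the eventual fix), the defensive pattern is to reject the token with an "invalid token" error instead of hitting the NPE:

```java
// Hedged sketch of the failure mode: a secret manager whose master-key
// table may still be null right after a restart. Names are illustrative,
// not the real NMTokenSecretManagerInNM implementation.
import java.util.HashMap;
import java.util.Map;

class TokenSecretSketch {
    static class InvalidTokenException extends Exception {
        InvalidTokenException(String msg) { super(msg); }
    }

    // null until the NM re-registers with the RM and receives its keys
    private Map<Integer, byte[]> masterKeys = null;

    byte[] retrievePassword(int keyId) throws InvalidTokenException {
        // Without this guard, masterKeys.get(keyId) throws the NPE seen
        // in the log; failing with InvalidToken lets the client retry.
        if (masterKeys == null) {
            throw new InvalidTokenException(
                "NM not yet ready to serve tokens (no master keys)");
        }
        byte[] password = masterKeys.get(keyId);
        if (password == null) {
            throw new InvalidTokenException("Unknown master key id " + keyId);
        }
        return password;
    }

    void setMasterKey(int keyId, byte[] key) {
        if (masterKeys == null) {
            masterKeys = new HashMap<>();
        }
        masterKeys.put(keyId, key);
    }
}
```

This models why the request from 10.18.40.84 fails only in the window between the RPC server starting and key state being restored.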
[jira] [Commented] (YARN-2436) [post-HADOOP-9902] yarn application help doesn't work
[ https://issues.apache.org/jira/browse/YARN-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106935#comment-14106935 ] Hudson commented on YARN-2436: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1871 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1871/]) YARN-2436. [post-HADOOP-9902] yarn application help doesn't work (aw: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1619603) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/bin/yarn > [post-HADOOP-9902] yarn application help doesn't work > - > > Key: YARN-2436 > URL: https://issues.apache.org/jira/browse/YARN-2436 > Project: Hadoop YARN > Issue Type: Bug > Components: scripts >Reporter: Allen Wittenauer >Assignee: Allen Wittenauer > Labels: newbie > Fix For: 3.0.0 > > Attachments: YARN-2436.patch > > > The previous version of the yarn command plays games with the command stack > for some commands. The new code needs to duplicate this wackiness. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2434) RM should not recover containers from previously failed attempt when AM restart is not enabled
[ https://issues.apache.org/jira/browse/YARN-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106932#comment-14106932 ] Hudson commented on YARN-2434: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1871 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1871/]) YARN-2434. RM should not recover containers from previously failed attempt when AM restart is not enabled. Contributed by Jian He (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1619614) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java > RM should not recover containers from previously failed attempt when AM > restart is not enabled > -- > > Key: YARN-2434 > URL: https://issues.apache.org/jira/browse/YARN-2434 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Fix For: 3.0.0, 2.6.0 > > Attachments: YARN-2434.1.patch > > > If container-preserving AM restart is not enabled and AM failed during RM > restart, RM on restart should not recover containers from previously failed > attempt. -- This message was sent by Atlassian JIRA (v6.2#6252)
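The fix described above adds a condition during RM container recovery: containers from an earlier, failed attempt are only recovered when work-preserving AM restart keeps containers alive across attempts. A hedged sketch of that decision (method and parameter names are illustrative, not the actual AbstractYarnScheduler code):

```java
// Hedged sketch of the YARN-2434 recovery guard; names are illustrative.
class RecoveryGuard {
    static boolean shouldRecoverContainer(boolean keepContainersAcrossAttempts,
                                          int containerAttemptId,
                                          int currentAttemptId) {
        // A container belonging to a previous (failed) attempt is only
        // worth recovering if the app opted into container-preserving
        // AM restart; otherwise the new attempt starts fresh.
        if (containerAttemptId != currentAttemptId) {
            return keepContainersAcrossAttempts;
        }
        // Containers of the current attempt are always recovered.
        return true;
    }
}
```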
[jira] [Commented] (YARN-2441) NPE in nodemanager after restart
[ https://issues.apache.org/jira/browse/YARN-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106902#comment-14106902 ] Jason Lowe commented on YARN-2441: -- Was this truly running trunk as the Affected Versions field indicates or was this some other version of Hadoop? Also was this a work-preserving NM restart scenario (i.e.: yarn.nodemanager.recovery.enabled=true) or a typical NM startup? > NPE in nodemanager after restart > > > Key: YARN-2441 > URL: https://issues.apache.org/jira/browse/YARN-2441 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0 >Reporter: Nishan Shetty >Priority: Minor > > {code} > 2014-08-22 16:43:19,640 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Blocking new container-requests as container manager rpc server is still > starting. > 2014-08-22 16:43:19,658 INFO org.apache.hadoop.ipc.Server: IPC Server > Responder: starting > 2014-08-22 16:43:19,675 INFO org.apache.hadoop.ipc.Server: IPC Server > listener on 45026: starting > 2014-08-22 16:43:20,029 INFO > org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: > Updating node address : host-10-18-40-95:45026 > 2014-08-22 16:43:20,029 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > ContainerManager started at /10.18.40.95:45026 > 2014-08-22 16:43:20,030 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > ContainerManager bound to host-10-18-40-95/10.18.40.95:45026 > 2014-08-22 16:43:20,073 INFO org.apache.hadoop.ipc.CallQueueManager: Using > callQueue class java.util.concurrent.LinkedBlockingQueue > 2014-08-22 16:43:20,098 INFO org.apache.hadoop.ipc.Server: Starting Socket > Reader #1 for port 45027 > 2014-08-22 16:43:20,158 INFO > org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding > protocol 
org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB > to the server > 2014-08-22 16:43:20,178 INFO org.apache.hadoop.ipc.Server: IPC Server > Responder: starting > 2014-08-22 16:43:20,192 INFO org.apache.hadoop.ipc.Server: IPC Server > listener on 45027: starting > 2014-08-22 16:43:20,210 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 > for port 45026: readAndProcess from client 10.18.40.84 threw exception > [java.lang.NullPointerException] > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:167) > at > org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:43) > at > org.apache.hadoop.security.token.SecretManager.retriableRetrievePassword(SecretManager.java:91) > at > org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.getPassword(SaslRpcServer.java:278) > at > org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.handle(SaslRpcServer.java:305) > at > com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:585) > at > com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244) > at > org.apache.hadoop.ipc.Server$Connection.processSaslToken(Server.java:1384) > at > org.apache.hadoop.ipc.Server$Connection.processSaslMessage(Server.java:1361) > at org.apache.hadoop.ipc.Server$Connection.saslProcess(Server.java:1275) > at > org.apache.hadoop.ipc.Server$Connection.saslReadAndProcess(Server.java:1238) > at > org.apache.hadoop.ipc.Server$Connection.processRpcOutOfBandRequest(Server.java:1878) > at > org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1755) > at > org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1519) > at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:750) > at > 
org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:624) > at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:595) > 2014-08-22 16:43:20,227 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 > for port 45026: readAndProcess from client 10.18.40.84 threw exception > [java.lang.NullPointerException] > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:167) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2440) Cgroups should limit YARN containers to cores allocated in yarn-site.xml
[ https://issues.apache.org/jira/browse/YARN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106893#comment-14106893 ] Varun Vasudev commented on YARN-2440: - [~nroberts] there's already a ticket for your request - YARN-810. That's next on my todo list. I've left a comment there asking if I can take it over. > Cgroups should limit YARN containers to cores allocated in yarn-site.xml > > > Key: YARN-2440 > URL: https://issues.apache.org/jira/browse/YARN-2440 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: apache-yarn-2440.0.patch, > screenshot-current-implementation.jpg > > > The current cgroups implementation does not limit YARN containers to the > cores allocated in yarn-site.xml. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2440) Cgroups should limit YARN containers to cores allocated in yarn-site.xml
[ https://issues.apache.org/jira/browse/YARN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106889#comment-14106889 ] Nathan Roberts commented on YARN-2440: -- Thanks Varun for the patch. I'm wondering if it would be possible to make this configurable at the system level and per-app. For example, I'd like an application to be able to specify that it wants to run with strict container limits (to verify SLA's for example), but in general I don't want these limits in place (why not let a container use additional CPU if it's available?). > Cgroups should limit YARN containers to cores allocated in yarn-site.xml > > > Key: YARN-2440 > URL: https://issues.apache.org/jira/browse/YARN-2440 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: apache-yarn-2440.0.patch, > screenshot-current-implementation.jpg > > > The current cgroups implementation does not limit YARN containers to the > cores allocated in yarn-site.xml. -- This message was sent by Atlassian JIRA (v6.2#6252)
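The thread above debates strict versus opportunistic CPU enforcement under cgroups. As a hedged illustration of the NodeManager settings involved (property names should be verified against the Hadoop release in use, and the per-application override Nathan asks about is tracked separately in YARN-810, so it is not shown):

```xml
<!-- Hedged sketch: cgroups-based CPU isolation in yarn-site.xml.
     Verify property names against your Hadoop release. -->
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler</value>
</property>
<property>
  <!-- Hard-cap containers at their allocated vcores instead of letting
       them borrow idle CPU: the strict mode debated in this thread. -->
  <name>yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage</name>
  <value>true</value>
</property>
```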
[jira] [Commented] (YARN-2434) RM should not recover containers from previously failed attempt when AM restart is not enabled
[ https://issues.apache.org/jira/browse/YARN-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106860#comment-14106860 ] Hudson commented on YARN-2434: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1845 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1845/]) YARN-2434. RM should not recover containers from previously failed attempt when AM restart is not enabled. Contributed by Jian He (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1619614) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java > RM should not recover containers from previously failed attempt when AM > restart is not enabled > -- > > Key: YARN-2434 > URL: https://issues.apache.org/jira/browse/YARN-2434 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Fix For: 3.0.0, 2.6.0 > > Attachments: YARN-2434.1.patch > > > If container-preserving AM restart is not enabled and AM failed during RM > restart, RM on restart should not recover containers from previously failed > attempt. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2436) [post-HADOOP-9902] yarn application help doesn't work
[ https://issues.apache.org/jira/browse/YARN-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106863#comment-14106863 ] Hudson commented on YARN-2436: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1845 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1845/]) YARN-2436. [post-HADOOP-9902] yarn application help doesn't work (aw: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1619603) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/bin/yarn > [post-HADOOP-9902] yarn application help doesn't work > - > > Key: YARN-2436 > URL: https://issues.apache.org/jira/browse/YARN-2436 > Project: Hadoop YARN > Issue Type: Bug > Components: scripts >Reporter: Allen Wittenauer >Assignee: Allen Wittenauer > Labels: newbie > Fix For: 3.0.0 > > Attachments: YARN-2436.patch > > > The previous version of the yarn command plays games with the command stack > for some commands. The new code needs to duplicate this wackiness. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (YARN-2345) yarn rmadmin -report
[ https://issues.apache.org/jira/browse/YARN-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106816#comment-14106816 ] Allen Wittenauer edited comment on YARN-2345 at 8/22/14 1:28 PM: - [~leftnoteasy], this is to bring consistency between HDFS and YARN. hdfs dfsadmin -report has existed for a very long time while YARN doesn't have one. From a user perspective, it's irrelevant what is happening on the inside, just that YARN is "weird" if the equivalent is "yarn node -all -list". was (Author: aw): [~wangda], this is to bring consistency between HDFS and YARN. hdfs dfsadmin -report has existed for a very long time while YARN doesn't have one. From a user perspective, it's irrelevant what is happening on the inside, just that YARN is "weird" if the equivalent is "yarn node -all -list". > yarn rmadmin -report > > > Key: YARN-2345 > URL: https://issues.apache.org/jira/browse/YARN-2345 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager >Reporter: Allen Wittenauer >Assignee: Hao Gao > Labels: newbie > Attachments: YARN-2345.1.patch > > > It would be good to have an equivalent of hdfs dfsadmin -report in YARN. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2345) yarn rmadmin -report
[ https://issues.apache.org/jira/browse/YARN-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106816#comment-14106816 ] Allen Wittenauer commented on YARN-2345: [~wangda], this is to bring consistency between HDFS and YARN. hdfs dfsadmin -report has existed for a very long time while the RM doesn't have one. From a user perspective, it's irrelevant what is happening on the inside, just that YARN is "weird" if the equivalent is "yarn node -all -list". > yarn rmadmin -report > > > Key: YARN-2345 > URL: https://issues.apache.org/jira/browse/YARN-2345 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager >Reporter: Allen Wittenauer >Assignee: Hao Gao > Labels: newbie > Attachments: YARN-2345.1.patch > > > It would be good to have an equivalent of hdfs dfsadmin -report in YARN. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (YARN-2345) yarn rmadmin -report
[ https://issues.apache.org/jira/browse/YARN-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106816#comment-14106816 ] Allen Wittenauer edited comment on YARN-2345 at 8/22/14 1:26 PM: - [~wangda], this is to bring consistency between HDFS and YARN. hdfs dfsadmin -report has existed for a very long time while YARN doesn't have one. From a user perspective, it's irrelevant what is happening on the inside, just that YARN is "weird" if the equivalent is "yarn node -all -list". was (Author: aw): [~wangda], this is to bring consistency between HDFS and YARN. hdfs dfsadmin -report has existed for a very long time while the RM doesn't have one. From a user perspective, it's irrelevant what is happening on the inside, just that YARN is "weird" if the equivalent is "yarn node -all -list". > yarn rmadmin -report > > > Key: YARN-2345 > URL: https://issues.apache.org/jira/browse/YARN-2345 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager >Reporter: Allen Wittenauer >Assignee: Hao Gao > Labels: newbie > Attachments: YARN-2345.1.patch > > > It would be good to have an equivalent of hdfs dfsadmin -report in YARN. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2442) ResourceManager JMX UI does not give HA State
Nishan Shetty created YARN-2442: --- Summary: ResourceManager JMX UI does not give HA State Key: YARN-2442 URL: https://issues.apache.org/jira/browse/YARN-2442 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 3.0.0 Reporter: Nishan Shetty Priority: Trivial The ResourceManager JMX output should expose the haState (INITIALIZING, ACTIVE, STANDBY, STOPPED) -- This message was sent by Atlassian JIRA (v6.2#6252)
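Once the RM exposes HA state over JMX, a client could read it from the HTTP /jmx servlet. As a hedged illustration (the bean name, the "HAState" attribute, and the sample payload below are assumptions about what such output could look like, not the shipped result of this ticket), a small stdlib-only helper that pulls the attribute out of a JMX JSON response:

```java
// Hedged sketch: extract a hypothetical "HAState" attribute from a JMX
// JSON payload. Real output would come from http://<rm-host>:8088/jmx
// and must be checked against the actual release.
class JmxHaState {
    static String haState(String jmxJson) {
        // Naive stdlib-only extraction: locate "HAState": "<value>".
        int key = jmxJson.indexOf("\"HAState\"");
        if (key < 0) return null;
        int colon = jmxJson.indexOf(':', key);
        if (colon < 0) return null;
        int open = jmxJson.indexOf('"', colon + 1);
        int close = jmxJson.indexOf('"', open + 1);
        if (open < 0 || close < 0) return null;
        return jmxJson.substring(open + 1, close);
    }
}
```

A real client would use a JSON parser; the point is only that a single attribute on an RM MBean would make HA state scriptable.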
[jira] [Created] (YARN-2441) NPE in nodemanager after restart
Nishan Shetty created YARN-2441: --- Summary: NPE in nodemanager after restart Key: YARN-2441 URL: https://issues.apache.org/jira/browse/YARN-2441 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0 Reporter: Nishan Shetty Priority: Minor {code} 2014-08-22 16:43:19,640 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Blocking new container-requests as container manager rpc server is still starting. 2014-08-22 16:43:19,658 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting 2014-08-22 16:43:19,675 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 45026: starting 2014-08-22 16:43:20,029 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: Updating node address : host-10-18-40-95:45026 2014-08-22 16:43:20,029 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: ContainerManager started at /10.18.40.95:45026 2014-08-22 16:43:20,030 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: ContainerManager bound to host-10-18-40-95/10.18.40.95:45026 2014-08-22 16:43:20,073 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue 2014-08-22 16:43:20,098 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 45027 2014-08-22 16:43:20,158 INFO org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB to the server 2014-08-22 16:43:20,178 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting 2014-08-22 16:43:20,192 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 45027: starting 2014-08-22 16:43:20,210 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 45026: readAndProcess from client 10.18.40.84 threw exception [java.lang.NullPointerException] java.lang.NullPointerException at 
org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:167) at org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:43) at org.apache.hadoop.security.token.SecretManager.retriableRetrievePassword(SecretManager.java:91) at org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.getPassword(SaslRpcServer.java:278) at org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.handle(SaslRpcServer.java:305) at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:585) at com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244) at org.apache.hadoop.ipc.Server$Connection.processSaslToken(Server.java:1384) at org.apache.hadoop.ipc.Server$Connection.processSaslMessage(Server.java:1361) at org.apache.hadoop.ipc.Server$Connection.saslProcess(Server.java:1275) at org.apache.hadoop.ipc.Server$Connection.saslReadAndProcess(Server.java:1238) at org.apache.hadoop.ipc.Server$Connection.processRpcOutOfBandRequest(Server.java:1878) at org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1755) at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1519) at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:750) at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:624) at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:595) 2014-08-22 16:43:20,227 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 45026: readAndProcess from client 10.18.40.84 threw exception [java.lang.NullPointerException] java.lang.NullPointerException at org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:167) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2434) RM should not recover containers from previously failed attempt when AM restart is not enabled
[ https://issues.apache.org/jira/browse/YARN-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106717#comment-14106717 ] Hudson commented on YARN-2434: -- FAILURE: Integrated in Hadoop-Yarn-trunk #654 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/654/]) YARN-2434. RM should not recover containers from previously failed attempt when AM restart is not enabled. Contributed by Jian He (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1619614) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java > RM should not recover containers from previously failed attempt when AM > restart is not enabled > -- > > Key: YARN-2434 > URL: https://issues.apache.org/jira/browse/YARN-2434 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Fix For: 3.0.0, 2.6.0 > > Attachments: YARN-2434.1.patch > > > If container-preserving AM restart is not enabled and AM failed during RM > restart, RM on restart should not recover containers from previously failed > attempt. -- This message was sent by Atlassian JIRA (v6.2#6252)