[jira] [Commented] (YARN-396) Rationalize AllocateResponse in RM scheduler API

2014-03-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13940073#comment-13940073
 ] 

ASF GitHub Bot commented on YARN-396:
-

Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/151#issuecomment-38011309
  
@tgravescs FWIW I think something like this is the case, yes. The change 
happened in https://issues.apache.org/jira/browse/YARN-396, which landed in 
the first(?) YARN TLP release with Hadoop 2.1. And CDH 4.4 was the first 
release I see that picked up this change. I assume it was useful/necessary to 
float this 'alpha' API ahead.

I also would have thought it's possible that the `yarn` profile works with this 
release, but I do not know. Just making sure: has that been tried?

Otherwise, yes, it looks like a question of supporting another intermediate 
flavor of YARN here, since the API changed in breaking ways several times between 
0.23.x and 2.2.


 Rationalize AllocateResponse in RM scheduler API
 

 Key: YARN-396
 URL: https://issues.apache.org/jira/browse/YARN-396
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Zhijie Shen
  Labels: incompatible
 Fix For: 2.1.0-beta

 Attachments: YARN-396_1.patch, YARN-396_2.patch, YARN-396_3.patch, 
 YARN-396_4.patch, YARN-396_5.patch


 AllocateResponse contains an AMResponse and a cluster node count; the AMResponse 
 holds the rest of the data. Unless there is a good reason for this two-level 
 object structure, there should be only one of AMResponse or AllocateResponse.
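
The flattening being proposed can be sketched as follows (illustrative field names only, not the real org.apache.hadoop.yarn.api classes):

```java
import java.util.ArrayList;
import java.util.List;

// Illustration of the YARN-396 rationalization, with made-up field names:
// before, callers had to unwrap a nested AMResponse; afterwards one flat
// AllocateResponse carries everything.
class FlattenSketch {
    // pre-2.1 shape (sketch): two objects for one RPC response
    static class AMResponse {
        List<String> allocatedContainers = new ArrayList<>();
    }
    static class NestedAllocateResponse {
        AMResponse amResponse = new AMResponse();
        int numClusterNodes;
    }

    // post-YARN-396 shape (sketch): the inner object's fields are merged in
    static class AllocateResponse {
        List<String> allocatedContainers = new ArrayList<>();
        int numClusterNodes;
    }
}
```

This is the breaking change the Spark discussion above is working around: callers compiled against the nested shape do not link against the flat one.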



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-4434) NodeManager Disk Checker parameter documentation is not correct

2015-12-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046547#comment-15046547
 ] 

ASF GitHub Bot commented on YARN-4434:
--

GitHub user bwtakacy opened a pull request:

https://github.com/apache/hadoop/pull/62

YARN-4434.NodeManager Disk Checker parameter documentation is not correct

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bwtakacy/hadoop feature/YARN-4434

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hadoop/pull/62.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #62


commit d1fabcaa9000f50b839f54e53868fa3ee921fa80
Author: Takashi Ohnishi 
Date:   2015-12-08T08:02:14Z

YARN-4434.NodeManager Disk Checker parameter documentation is not correct




> NodeManager Disk Checker parameter documentation is not correct
> ---
>
> Key: YARN-4434
> URL: https://issues.apache.org/jira/browse/YARN-4434
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Takashi Ohnishi
>Priority: Minor
>
> In the description of 
> yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage,
>  it says
> {noformat}
> The default value is 100 i.e. the entire disk can be used.
> {noformat}
> But, in yarn-default.xml and source code, the default value is 90.
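
For reference, the property can be pinned explicitly in yarn-site.xml; a sketch using the 90 percent value the report identifies as the real default (the float form of the value is an assumption, following yarn-default.xml's style):

```xml
<!-- yarn-site.xml sketch: set the NodeManager disk utilization cut-off
     explicitly. 90 is the real default per this report, despite the
     documentation claiming 100. -->
<property>
  <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
  <value>90.0</value>
</property>
```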





[jira] [Commented] (YARN-4434) NodeManager Disk Checker parameter documentation is not correct

2015-12-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048226#comment-15048226
 ] 

ASF GitHub Bot commented on YARN-4434:
--

Github user aajisaka commented on the pull request:

https://github.com/apache/hadoop/pull/62#issuecomment-163142634
  
I've committed the patch (B), so would you close this pull request?


> NodeManager Disk Checker parameter documentation is not correct
> ---
>
> Key: YARN-4434
> URL: https://issues.apache.org/jira/browse/YARN-4434
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, nodemanager
>Affects Versions: 2.6.0, 2.7.1
>Reporter: Takashi Ohnishi
>Assignee: Weiwei Yang
>Priority: Minor
> Fix For: 2.8.0, 2.6.3, 2.7.3
>
> Attachments: YARN-4434.001.patch, YARN-4434.branch-2.6.patch
>
>
> In the description of 
> yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage,
>  it says
> {noformat}
> The default value is 100 i.e. the entire disk can be used.
> {noformat}
> But, in yarn-default.xml and source code, the default value is 90.





[jira] [Commented] (YARN-4434) NodeManager Disk Checker parameter documentation is not correct

2015-12-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048225#comment-15048225
 ] 

ASF GitHub Bot commented on YARN-4434:
--

Github user aajisaka commented on the pull request:

https://github.com/apache/hadoop/pull/62#issuecomment-163142562
  
Thank you for the pull request. I reviewed this patch (A) and another patch 
on the YARN-4434 JIRA (B), and decided to commit patch (B) because it also 
replaces "i.e. the entire disk" with "i.e. 90% of the disk".


> NodeManager Disk Checker parameter documentation is not correct
> ---
>
> Key: YARN-4434
> URL: https://issues.apache.org/jira/browse/YARN-4434
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, nodemanager
>Affects Versions: 2.6.0, 2.7.1
>Reporter: Takashi Ohnishi
>Assignee: Weiwei Yang
>Priority: Minor
> Fix For: 2.8.0, 2.6.3, 2.7.3
>
> Attachments: YARN-4434.001.patch, YARN-4434.branch-2.6.patch
>
>
> In the description of 
> yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage,
>  it says
> {noformat}
> The default value is 100 i.e. the entire disk can be used.
> {noformat}
> But, in yarn-default.xml and source code, the default value is 90.





[jira] [Commented] (YARN-3305) AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation

2015-12-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046033#comment-15046033
 ] 

ASF GitHub Bot commented on YARN-3305:
--

Github user smarella commented on the pull request:


https://github.com/apache/hadoop/commit/968425e9f7b850ff9c2ab8ca37a64c3fdbe77dbf#commitcomment-14835835
  
In 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java:
In 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
 on line 397:
YARN-3305 seems to break the Myriad builds (see below). As indicated in 
YARN-3305, https://issues.apache.org/jira/browse/YARN-3996 is trying to fix the 
problem. YARN-3996 is currently work-in-progress, but we need to back-port some 
of it immediately to unblock Myriad. Once YARN-3996 is fully fixed, it needs to 
be fully back-ported. I've asked Sarjeet to raise two bugs for this (one to 
provide an immediate fix and a second to back-port YARN-3996 once it's resolved).

Currently, the latest Myriad builds are affected by this change. The 
problem is that the AM container resources are normalized during app 
submission with ```scheduler.getMinimumResourceCapability()```, which comes 
from yarn.scheduler.minimum-allocation-{mb,vcores,disks}. These values are set 
to "0" by Myriad. Hence, the AM container allocated will be of zero size and is 
killed by the NM as soon as it is launched.

The reason why Myriad needs 
yarn.scheduler.minimum-allocation-{mb,vcores,disks} to be 0 is discussed in 
https://issues.apache.org/jira/browse/MYRIAD-139.
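
The zero-size AM container described above falls out of the normalization step. A minimal sketch of that rounding logic, assuming a positive allocation increment (simplified; the real code lives in YARN's ResourceCalculator, and these names are illustrative):

```java
// Simplified stand-in for YARN's resource normalization (illustrative names):
// clamp the request to the configured minimum, round up to a multiple of the
// increment, and cap at the maximum. With a minimum of 0, as Myriad configures,
// a zero-sized request passes through unchanged.
class NormalizeSketch {
    static int normalizeMemory(int requestedMb, int minMb, int maxMb, int stepMb) {
        int clamped = Math.max(requestedMb, minMb);               // enforce the floor
        int rounded = ((clamped + stepMb - 1) / stepMb) * stepMb; // round up to increment
        return Math.min(rounded, maxMb);                          // enforce the ceiling
    }
}
```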


> AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is 
> less than minimumAllocation
> 
>
> Key: YARN-3305
> URL: https://issues.apache.org/jira/browse/YARN-3305
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.6.0
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3305.patch, 0001-YARN-3305.patch, 
> 0002-YARN-3305.patch, 0003-YARN-3305.patch
>
>
> For any given ResourceRequest, {{CS#allocate}} normalizes the request to 
> minimumAllocation if the requested memory is less than minimumAllocation.
> But the AM-used resource is updated with the actual ResourceRequest made by the 
> user. This can result in AM container allocation exceeding the Max 
> ApplicationMaster Resource.
> This is because AM-Used is updated with the actual ResourceRequest made by the 
> user while activating the applications, but during allocation of the container, 
> the ResourceRequest is normalized to minimumAllocation.





[jira] [Commented] (YARN-4434) NodeManager Disk Checker parameter documentation is not correct

2015-12-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15050079#comment-15050079
 ] 

ASF GitHub Bot commented on YARN-4434:
--

Github user bwtakacy commented on the pull request:

https://github.com/apache/hadoop/pull/62#issuecomment-163488815
  
OK.
I will close this PR.

Thanks!



> NodeManager Disk Checker parameter documentation is not correct
> ---
>
> Key: YARN-4434
> URL: https://issues.apache.org/jira/browse/YARN-4434
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, nodemanager
>Affects Versions: 2.6.0, 2.7.1
>Reporter: Takashi Ohnishi
>Assignee: Weiwei Yang
>Priority: Minor
> Fix For: 2.8.0, 2.6.3, 2.7.3
>
> Attachments: YARN-4434.001.patch, YARN-4434.branch-2.6.patch
>
>
> In the description of 
> yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage,
>  it says
> {noformat}
> The default value is 100 i.e. the entire disk can be used.
> {noformat}
> But, in yarn-default.xml and source code, the default value is 90.





[jira] [Commented] (YARN-4434) NodeManager Disk Checker parameter documentation is not correct

2015-12-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15050080#comment-15050080
 ] 

ASF GitHub Bot commented on YARN-4434:
--

Github user bwtakacy closed the pull request at:

https://github.com/apache/hadoop/pull/62


> NodeManager Disk Checker parameter documentation is not correct
> ---
>
> Key: YARN-4434
> URL: https://issues.apache.org/jira/browse/YARN-4434
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, nodemanager
>Affects Versions: 2.6.0, 2.7.1
>Reporter: Takashi Ohnishi
>Assignee: Weiwei Yang
>Priority: Minor
> Fix For: 2.8.0, 2.6.3, 2.7.3
>
> Attachments: YARN-4434.001.patch, YARN-4434.branch-2.6.patch
>
>
> In the description of 
> yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage,
>  it says
> {noformat}
> The default value is 100 i.e. the entire disk can be used.
> {noformat}
> But, in yarn-default.xml and source code, the default value is 90.





[jira] [Commented] (YARN-2571) RM to support YARN registry

2016-01-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15089647#comment-15089647
 ] 

ASF GitHub Bot commented on YARN-2571:
--

GitHub user steveloughran opened a pull request:

https://github.com/apache/hadoop/pull/66

YARN-2571 RM to support YARN registry



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/steveloughran/hadoop 
YARN-913/YARN-2571-RM-on-trunk

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hadoop/pull/66.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #66


commit 25a56da6037bb50d3ce5bbcc3001914e51ea2457
Author: Steve Loughran 
Date:   2015-11-11T20:20:23Z

YARN-2571 RM setup of registry, reapplied to trunk




> RM to support YARN registry 
> 
>
> Key: YARN-2571
> URL: https://issues.apache.org/jira/browse/YARN-2571
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>  Labels: BB2015-05-TBR
> Attachments: YARN-2571-001.patch, YARN-2571-002.patch, 
> YARN-2571-003.patch, YARN-2571-005.patch, YARN-2571-007.patch, 
> YARN-2571-008.patch, YARN-2571-009.patch, YARN-2571-010.patch
>
>
> The RM needs to (optionally) integrate with the YARN registry:
> # startup: create the /services and /users paths with system ACLs (yarn, hdfs 
> principals)
> # app-launch: create the user directory /users/$username with the relevant 
> permissions (CRD) for them to create subnodes.
> # attempt, container, app completion: remove service records with the 
> matching persistence and ID





[jira] [Commented] (YARN-4567) javadoc failing on java 8

2016-01-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15089809#comment-15089809
 ] 

ASF GitHub Bot commented on YARN-4567:
--

GitHub user steveloughran opened a pull request:

https://github.com/apache/hadoop/pull/67

YARN-4567 javadoc failing on java 8



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/steveloughran/hadoop 
stevel/patches/YARN-4567-java8

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hadoop/pull/67.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #67


commit eda776bdb3153f4821d4b7e05554ff8d1a0f0ab8
Author: Steve Loughran 
Date:   2016-01-08T19:42:37Z

YARN-4567 javadoc failing on java 8




> javadoc failing on java 8
> -
>
> Key: YARN-4567
> URL: https://issues.apache.org/jira/browse/YARN-4567
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: build
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
>
> Javadocs on Java 8 failing in the Yarn bit of the build. Jenkins is sad.
> {code}[ERROR] 
> /Users/stevel/Projects/Hortonworks/Projects/hadoop-trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java:1009:
>  error: exception not thrown: java.lang.Exception
> [ERROR] * @throws Exception
> [ERROR] ^
> [ERROR] 
> [ERROR] Command line was: 
> /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/bin/javadoc 
> @options @packages
> [ERROR] 
> {code}
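
For context, this is JDK 8's doclint rejecting a stale tag; the kind of change that clears it can be sketched as follows (hypothetical method, not the actual ResourceManager code):

```java
// Illustration of the doclint rule behind this failure (hypothetical method):
// JDK 8's javadoc rejects an `@throws` tag naming a checked exception the
// method neither throws nor declares. The cure is to delete the stale tag,
// or to document an exception the method can actually raise, as here.
class DoclintExample {
    /**
     * Parses a port number from text.
     *
     * @param s the text to parse
     * @return the parsed port
     * @throws NumberFormatException if {@code s} is not a number
     */
    static int parsePort(String s) {
        return Integer.parseInt(s);
    }
}
```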





[jira] [Commented] (YARN-679) add an entry point that can start any Yarn service

2016-01-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15090020#comment-15090020
 ] 

ASF GitHub Bot commented on YARN-679:
-

GitHub user steveloughran opened a pull request:

https://github.com/apache/hadoop/pull/68

YARN-679 service launcher

Pull-request version of YARN-679; initially the 005 patch plus corrections 
of javadocs and checkstyles

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/steveloughran/hadoop stevel/YARN-679-launcher

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hadoop/pull/68.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #68


commit 8190fcbea75d203a43052339736ea2a412d44f16
Author: Steve Loughran 
Date:   2014-06-03T17:09:26Z

YARN-679: launcher code move

commit 5216a290371eb9050bf1fc98cd82aeea05f2f9d5
Author: Steve Loughran 
Date:   2014-06-03T18:43:41Z

YARN-679 service launcher adapting to changes in ExitUtil; passing params 
down as a list


commit a8ea0b26cb101dbfc47bb8349bfa3510d0701efe
Author: Steve Loughran 
Date:   2014-06-04T09:46:45Z

YARN-679 add javadocs & better launching for service-launcher

commit dcb4599ca9ed1feadff2d0149819640740405201
Author: Steve Loughran 
Date:   2014-06-04T10:54:44Z

YARN-679 move IRQ escalation into its own class for cleanliness and 
testability; lots of javadocs

commit bdd41f632deeb60a0b309e891755630d93956280
Author: Steve Loughran 
Date:   2014-06-04T13:19:53Z

YARN-679 initial TestInterruptHandling test

commit ff422b3dd70a9a39d7668b063811acee285fcbba
Author: Steve Loughran 
Date:   2014-06-04T14:26:33Z

YARN-679 TestInterruptHandling

commit 1d35197f8a8d80d3ca9aa4691b7f086686fcb454
Author: Steve Loughran 
Date:   2014-06-04T14:40:13Z

YARN-679 TestInterruptHandling final test -that blocking service stops 
don't stop shutdown from kicking in


commit ddbdfae3f7e2ce79f3c0138bc5c855bde8094c2f
Author: Steve Loughran 
Date:   2014-06-04T15:41:19Z

YARN-679: service exception handling improvements during creation, 
sprintf-formatted exception creation

commit db0a2ef4e8a46bfab6db4ec7a89cde70779432c8
Author: Steve Loughran 
Date:   2014-06-04T15:56:11Z

YARN-679 service instantiation failures

commit 2a95da1a320811c381b93c14125d56e2d21798c1
Author: Steve Loughran 
Date:   2014-06-04T17:41:54Z

YARN-679 lots more on exception handling and error code propagation, 
including making ServiceStateException have an exit code and propagate any 
inner one

commit 6fc00fa46e47d1ae6039d2e6d16b8bfb61c87ea1
Author: Steve Loughran 
Date:   2014-06-04T19:12:13Z

YARN-679 move test services into their own package; test for stop in 
runnable

commit 4dfed85a0a86440784583c41d2249d6c1106889d
Author: Steve Loughran 
Date:   2014-06-04T20:00:20Z

YARN-679 conf arg passdown validated

commit 6c12bb43a1d4554e7e196db7f9994562bd899fee
Author: Steve Loughran 
Date:   2014-06-05T10:03:34Z

YARN-679 test for service launch

commit 803250fb6810e7bd2373c53c9c07d9548c9eb71d
Author: Steve Loughran 
Date:   2014-06-05T12:46:26Z

YARN-679 test for bindArgs operations

commit f21f0fe6bdd8b1815080bdead225572c93430a24
Author: Steve Loughran 
Date:   2014-06-05T13:05:30Z

YARN-679 add AbstractLaunchedService base class for launched services, 
tests to verify that a subclass of this rejects arguments -but doesn't reject 
--conf args as they are stripped

commit a7056381a61fac239f71c7ecb8c40b74c4330864
Author: Steve Loughran 
Date:   2014-06-05T13:24:24Z

YARN-679 exception throwing/catching in execute

commit 24b74787dc52ca41dd6fde9db6c1dddb471ba1b8
Author: Steve Loughran 
Date:   2014-06-05T14:13:35Z

YARN-679 verify that constructor inits are handled

commit 554e317a0f2ef6ea353887e0cc501e0d27eb9a27
Author: Steve Loughran 
Date:   2014-06-05T14:28:14Z

services that only have a (String) constructor are handled by giving them 
their classname as a name

commit 49e457785752c1c27137ce1bb448b00f565cff20
Author: Steve Loughran 
Date:   2014-06-05T14:31:43Z

YARN-679 optimise imports

commit 62984ff26819965e86eb8727d1c5d8b73cd7fce9
Author: Steve Loughran 
Date:   2014-06-05T16:45:29Z

YARN-679 inner Launching logic with assertions and checks that Throwables 
get picked up and wrapped

commit ad8b79023536ebb1975613abbeb592f8826c06b2
Author: Steve Loughran 
Date:   

[jira] [Commented] (YARN-3477) TimelineClientImpl swallows exceptions

2015-11-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15012190#comment-15012190
 ] 

ASF GitHub Bot commented on YARN-3477:
--

GitHub user steveloughran opened a pull request:

https://github.com/apache/hadoop/pull/47

YARN-3477 timeline diagnostics

YARN-3477 timeline diagnostics: add more details on why things are failing, 
including stack traces (at debug level sometimes)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/steveloughran/hadoop 
stevel/YARN-3477-timeline-diagnostics

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hadoop/pull/47.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #47


commit 5278ac3de77e866e6528b5b6fb6f8d294c541a5f
Author: Steve Loughran 
Date:   2015-04-23T14:18:26Z

YARN-3477 TimelineClientImpl swallows exceptions

commit 7a3701b66ef415ff8c5f9fdeec4ebe292d0eab75
Author: Steve Loughran 
Date:   2015-04-24T12:03:09Z

YARN-3477 patch 002
# rethrowing runtime exception on timeout, but including the IOE as an 
inner exception
# using constant strings in the error messages
# clean up tests to (a) use those constant strings in tests, (b) throw the 
original exception on any mismatch, plus other improvements

commit 43b6b1fc126bff5b4be95bdd2fbab3bf686edde5
Author: Steve Loughran 
Date:   2015-11-18T22:25:18Z

YARN-3277 make sure there's spaces; chop line > 80 chars wide




> TimelineClientImpl swallows exceptions
> --
>
> Key: YARN-3477
> URL: https://issues.apache.org/jira/browse/YARN-3477
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: YARN-3477-001.patch, YARN-3477-002.patch
>
>
> If the timeline client fails more than the retry count, the original exception is 
> not thrown. Instead some runtime exception is raised saying "retries run out".
> # the failing exception should be rethrown, ideally via 
> NetUtils.wrapException to include the URL of the failing endpoint
> # Otherwise, the raised RTE should (a) state that URL and (b) set the 
> original fault as the inner cause
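
The pattern the two numbered points above describe can be sketched as a retry wrapper that keeps the last failure as the cause (illustrative names, not TimelineClientImpl's actual code):

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Sketch of the fix YARN-3477 asks for (illustrative API): when retries run
// out, throw a RuntimeException that names the failing URL and chains the
// last IOException as its cause, instead of swallowing it.
class RetrySketch {
    static <T> T retry(String url, int maxRetries, Callable<T> op) {
        IOException last = null;
        for (int i = 0; i <= maxRetries; i++) {
            try {
                return op.call();
            } catch (IOException e) {
                last = e;                         // remember the real failure
            } catch (Exception e) {
                throw new RuntimeException(e);    // non-I/O faults fail fast
            }
        }
        // state the URL and set the original fault as the inner cause
        throw new RuntimeException("Retries exhausted talking to " + url, last);
    }
}
```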





[jira] [Commented] (YARN-4387) Fix FairScheduler log message

2015-11-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023962#comment-15023962
 ] 

ASF GitHub Bot commented on YARN-4387:
--

GitHub user vesense opened a pull request:

https://github.com/apache/hadoop/pull/57

[YARN-4387] Fix FairScheduler log message



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vesense/hadoop patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hadoop/pull/57.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #57


commit 26e1ab545ce0f16508e97237e5750ac9b4602069
Author: Xin Wang 
Date:   2015-11-24T08:09:46Z

Fix FairScheduler log message




> Fix FairScheduler log message
> -
>
> Key: YARN-4387
> URL: https://issues.apache.org/jira/browse/YARN-4387
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: Xin Wang
>Priority: Minor
>






[jira] [Commented] (YARN-4387) Fix FairScheduler log message

2015-11-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023978#comment-15023978
 ] 

ASF GitHub Bot commented on YARN-4387:
--

Github user vesense closed the pull request at:

https://github.com/apache/hadoop/pull/57


> Fix FairScheduler log message
> -
>
> Key: YARN-4387
> URL: https://issues.apache.org/jira/browse/YARN-4387
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: Xin Wang
>Priority: Minor
>






[jira] [Commented] (YARN-4387) Fix FairScheduler log message

2015-11-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023977#comment-15023977
 ] 

ASF GitHub Bot commented on YARN-4387:
--

Github user vesense commented on the pull request:

https://github.com/apache/hadoop/pull/57#issuecomment-159192789
  
Reported the issue to JIRA: https://issues.apache.org/jira/browse/YARN-4387
So I am closing this PR.


> Fix FairScheduler log message
> -
>
> Key: YARN-4387
> URL: https://issues.apache.org/jira/browse/YARN-4387
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: Xin Wang
>Priority: Minor
>






[jira] [Commented] (YARN-4567) javadoc failing on java 8

2016-01-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15093370#comment-15093370
 ] 

ASF GitHub Bot commented on YARN-4567:
--

Github user asfgit closed the pull request at:

https://github.com/apache/hadoop/pull/67


> javadoc failing on java 8
> -
>
> Key: YARN-4567
> URL: https://issues.apache.org/jira/browse/YARN-4567
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: build
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
> Attachments: 67.patch
>
>
> Javadocs on Java 8 failing in the Yarn bit of the build. Jenkins is sad.
> {code}[ERROR] 
> /Users/stevel/Projects/Hortonworks/Projects/hadoop-trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java:1009:
>  error: exception not thrown: java.lang.Exception
> [ERROR] * @throws Exception
> [ERROR] ^
> [ERROR] 
> [ERROR] Command line was: 
> /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/bin/javadoc 
> @options @packages
> [ERROR] 
> {code}





[jira] [Commented] (YARN-1564) add some basic workflow YARN services

2016-01-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15089577#comment-15089577
 ] 

ASF GitHub Bot commented on YARN-1564:
--

GitHub user steveloughran opened a pull request:

https://github.com/apache/hadoop/pull/65

YARN-1564 add some basic workflow YARN services

YARN-1564 add some basic workflow YARN services

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/steveloughran/hadoop 
stevel/YARN-1564-workflow-services

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hadoop/pull/65.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #65


commit d9f0172abfd904bad80e9e245fc792560a874834
Author: Steve Loughran 
Date:   2014-06-03T15:46:54Z

YARN-1564 - Patch-001

commit b0101e65cb1e6ed4171212f78c6d75ac855ccb5c
Author: Steve Loughran 
Date:   2014-12-04T15:30:52Z

YARN-1564 sync up with slider tweaks, primarily the disabling of tests on 
windows which only work if the relevant external commands are on the path

commit 8f262578c67edc4e6cbdef3e9bc338dabcd2ecf4
Author: Steve Loughran 
Date:   2015-05-05T12:56:14Z

YARN-1564 review and update workflow services, including making original 
CompositeService a ServiceParent; moving to Java 7 and SLF4J everywhere

commit 38df053d60467b587c0c7ec03f3368a84bf1cfea
Author: Steve Loughran 
Date:   2015-05-06T14:06:43Z

YARN-1564 -pick up enhancements/fixes from Slider-0.70-incubating versions 
of these classes

commit bf608f92a1c791c28fabf84dc2cca97daf35e1cf
Author: Steve Loughran 
Date:   2016-01-08T17:35:52Z

YARN-1564 turn off the findbugs warnings that are very much wrong




> add some basic workflow YARN services
> -
>
> Key: YARN-1564
> URL: https://issues.apache.org/jira/browse/YARN-1564
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api
>Affects Versions: 2.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Attachments: YARN-1564-001.patch, YARN-1564-002.patch
>
>   Original Estimate: 24h
>  Time Spent: 48h
>  Remaining Estimate: 0h
>
> I've been using some alternative composite services to help build workflows 
> of process execution in a YARN AM.
> They and their tests could be moved in YARN for the use by others -this would 
> make it easier to build aggregate services in an AM





[jira] [Commented] (YARN-4563) ContainerMetrics deadlocks

2016-02-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15126383#comment-15126383
 ] 

ASF GitHub Bot commented on YARN-4563:
--

GitHub user steveloughran opened a pull request:

https://github.com/apache/hadoop/pull/72

YARN-4563

Attempt to document YARN security, including HADOOP_TOKEN_FILE_LOCATION 
propagation

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/steveloughran/hadoop 
HADOOP-12649-security/YARN-4653-yarn

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hadoop/pull/72.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #72


commit 73baa11ff74201faa56ce1fc18941bdad43263fe
Author: Steve Loughran 
Date:   2016-01-28T20:04:14Z

YARN-4653 document YARN security: first pass

commit 778f623f7c436a975a1020d8a1eea55b67a630bf
Author: Steve Loughran 
Date:   2016-01-29T20:04:53Z

YARN-4653 document YARN security: more, though more is needed

commit 6b4ce5fa7ed6a83e471d14994bd26aa01bf37552
Author: Steve Loughran 
Date:   2016-02-01T15:20:35Z

YARN-4653 document YARN security with instructions on propagating oozie 
credentials




> ContainerMetrics deadlocks
> --
>
> Key: YARN-4563
> URL: https://issues.apache.org/jira/browse/YARN-4563
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
> Environment: HDP 2.3.2 (Hadoop 2.7.1 + patches)
>Reporter: Akira AJISAKA
>Priority: Blocker
> Attachments: 0001-YARN-4563.patch, jstack.log
>
>
> On one of our environments, some NodeManagers' webapps are not working. I found 
> a deadlock in the thread dump.
> {noformat}
> Found one Java-level deadlock:
> =
> "1193752357@qtp-907815246-22238":
>   waiting to lock monitor 0x05e20a18 (object 0xf6afa048, a 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics),
>   which is held by "2107307914@qtp-907815246-19994"
> "2107307914@qtp-907815246-19994":
>   waiting to lock monitor 0x01a000a8 (object 0xd4f1e1f8, a 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl),
>   which is held by "Timer for 'NodeManager' metrics system"
> "Timer for 'NodeManager' metrics system":
>   waiting to lock monitor 0x027ade88 (object 0xf6582df0, a 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics),
>   which is held by "1530638165@qtp-907815246-19992"
> "1530638165@qtp-907815246-19992":
>   waiting to lock monitor 0x01a000a8 (object 0xd4f1e1f8, a 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl),
>   which is held by "Timer for 'NodeManager' metrics system"
> {noformat}
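
The cycle in the dump is a classic lock-order inversion: two monitors taken in opposite orders by different threads. A minimal sketch of the usual fix, imposing one global acquisition order (illustrative locks, not the actual YARN patch):

```java
// Sketch of the deadlock pattern from the jstack above and its usual fix
// (illustrative, not YARN's classes). The deadlock needs two threads taking
// the same pair of monitors in opposite orders; making every code path take
// them in one fixed order removes the cycle.
class LockOrdering {
    private final Object metricsSystem = new Object();     // stands in for MetricsSystemImpl
    private final Object containerMetrics = new Object();  // stands in for ContainerMetrics

    void snapshotMetrics() {
        synchronized (metricsSystem) {          // always metricsSystem first...
            synchronized (containerMetrics) {   // ...then containerMetrics
                // copy metric values out
            }
        }
    }

    void unregisterContainer() {
        synchronized (metricsSystem) {          // same order here, so no cycle
            synchronized (containerMetrics) {
                // drop the metrics source
            }
        }
    }
}
```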





[jira] [Commented] (YARN-679) add an entry point that can start any Yarn service

2016-06-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15355358#comment-15355358
 ] 

ASF GitHub Bot commented on YARN-679:
-

Github user steveloughran closed the pull request at:

https://github.com/apache/hadoop/pull/68


> add an entry point that can start any Yarn service
> --
>
> Key: YARN-679
> URL: https://issues.apache.org/jira/browse/YARN-679
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: YARN-679-001.patch, YARN-679-002.patch, 
> YARN-679-002.patch, YARN-679-003.patch, YARN-679-004.patch, 
> YARN-679-005.patch, YARN-679-006.patch, YARN-679-007.patch, 
> YARN-679-008.patch, org.apache.hadoop.servic...mon 3.0.0-SNAPSHOT API).pdf
>
>  Time Spent: 72h
>  Remaining Estimate: 0h
>
> There's no need to write separate .main classes for every Yarn service, given 
> that the startup mechanism should be identical: create, init, start, wait for 
> stopped, with an interrupt handler to trigger a clean shutdown on a control-C 
> interrupt.
> Provide one that takes any classname and a list of config files/options.




-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6061) Add a customized uncaughtexceptionhandler for critical threads in RM

2017-01-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15847821#comment-15847821
 ] 

ASF GitHub Bot commented on YARN-6061:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/182#discussion_r98808165
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
 ---
@@ -819,21 +820,29 @@ public void handle(RMFatalEvent event) {
 }
   }
 
-  public void handleTransitionToStandBy() {
-if (rmContext.isHAEnabled()) {
-  try {
-// Transition to standby and reinit active services
-LOG.info("Transitioning RM to Standby mode");
-transitionToStandby(true);
-EmbeddedElector elector = rmContext.getLeaderElectorService();
-if (elector != null) {
-  elector.rejoinElection();
+  /**
+   * Transition to standby in a new thread.
+   */
+  public void handleTransitionToStandByInNewThread() {
+new Thread() {
--- End diff --

Instead of using an anonymous class, can we define this as a separate 
Thread and name it for easier debugging? 


> Add a customized uncaughtexceptionhandler for critical threads in RM
> 
>
> Key: YARN-6061
> URL: https://issues.apache.org/jira/browse/YARN-6061
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6061.001.patch, YARN-6061.002.patch, 
> YARN-6061.003.patch, YARN-6061.004.patch
>
>
> There are several threads in the fair scheduler. A thread will quit when 
> there is a runtime exception inside it. We should bring down the RM when that 
> happens; otherwise, there may be some weird behavior in the RM.






[jira] [Commented] (YARN-6061) Add a customized uncaughtexceptionhandler for critical threads in RM

2017-01-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15847820#comment-15847820
 ] 

ASF GitHub Bot commented on YARN-6061:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/182#discussion_r98808465
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java
 ---
@@ -349,4 +356,94 @@ static String getRefreshURL(String url) {
 }
 return redirectUrl;
   }
+
+  /**
+   * Throw {@link RuntimeException} inside a thread of
+   * {@link ResourceManager} with HA enabled and check if the
+   * {@link ResourceManager} is transited to standby state.
+   *
+   * @throws InterruptedException if any
+   */
+  @Test
+  public void testUncaughtExceptionHandlerWithHAEnabled()
--- End diff --

Nice tests!


> Add a customized uncaughtexceptionhandler for critical threads in RM
> 
>
> Key: YARN-6061
> URL: https://issues.apache.org/jira/browse/YARN-6061
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6061.001.patch, YARN-6061.002.patch, 
> YARN-6061.003.patch, YARN-6061.004.patch
>
>
> There are several threads in the fair scheduler. A thread will quit when 
> there is a runtime exception inside it. We should bring down the RM when that 
> happens; otherwise, there may be some weird behavior in the RM.






[jira] [Commented] (YARN-6061) Add a customized uncaughtexceptionhandler for critical threads in RM

2017-02-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15848632#comment-15848632
 ] 

ASF GitHub Bot commented on YARN-6061:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/182#discussion_r98942361
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
 ---
@@ -824,25 +824,29 @@ public void handle(RMFatalEvent event) {
* Transition to standby in a new thread.
*/
   public void handleTransitionToStandByInNewThread() {
-new Thread() {
-  @Override
-  public void run() {
-if (rmContext.isHAEnabled()) {
-  try {
-// Transition to standby and reinit active services
-LOG.info("Transitioning RM to Standby mode");
-transitionToStandby(true);
-EmbeddedElector elector = rmContext.getLeaderElectorService();
-if (elector != null) {
-  elector.rejoinElection();
-}
-  } catch (Exception e) {
-LOG.fatal("Failed to transition RM to Standby mode.", e);
-ExitUtil.terminate(1, e);
+Thread standByTransitionThread = new Thread(new 
StandByTransitionThread());
+standByTransitionThread.setName("StandByTransitionThread Handler");
+standByTransitionThread.start();
+  }
+
+  private class StandByTransitionThread implements Runnable {
--- End diff --

Naming the Runnable a Thread sounds confusing. Can we change it to 
TransitionToStandbyRunnable or some such? 


> Add a customized uncaughtexceptionhandler for critical threads in RM
> 
>
> Key: YARN-6061
> URL: https://issues.apache.org/jira/browse/YARN-6061
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6061.001.patch, YARN-6061.002.patch, 
> YARN-6061.003.patch, YARN-6061.004.patch, YARN-6061.005.patch
>
>
> There are several threads in the fair scheduler. A thread will quit when 
> there is a runtime exception inside it. We should bring down the RM when that 
> happens; otherwise, there may be some weird behavior in the RM.






[jira] [Commented] (YARN-6061) Add a customized uncaughtexceptionhandler for critical threads in RM

2017-02-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15848631#comment-15848631
 ] 

ASF GitHub Bot commented on YARN-6061:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/182#discussion_r98942179
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
 ---
@@ -824,25 +824,29 @@ public void handle(RMFatalEvent event) {
* Transition to standby in a new thread.
*/
   public void handleTransitionToStandByInNewThread() {
-new Thread() {
-  @Override
-  public void run() {
-if (rmContext.isHAEnabled()) {
-  try {
-// Transition to standby and reinit active services
-LOG.info("Transitioning RM to Standby mode");
-transitionToStandby(true);
-EmbeddedElector elector = rmContext.getLeaderElectorService();
-if (elector != null) {
-  elector.rejoinElection();
-}
-  } catch (Exception e) {
-LOG.fatal("Failed to transition RM to Standby mode.", e);
-ExitUtil.terminate(1, e);
+Thread standByTransitionThread = new Thread(new 
StandByTransitionThread());
--- End diff --

Also, would it make sense to create an instance of the Runnable on 
transition to active, and start a new thread on an as-needed basis? If all 
threads use a single instance of the Runnable, maybe it is easier to 
coordinate?


> Add a customized uncaughtexceptionhandler for critical threads in RM
> 
>
> Key: YARN-6061
> URL: https://issues.apache.org/jira/browse/YARN-6061
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6061.001.patch, YARN-6061.002.patch, 
> YARN-6061.003.patch, YARN-6061.004.patch, YARN-6061.005.patch
>
>
> There are several threads in the fair scheduler. A thread will quit when 
> there is a runtime exception inside it. We should bring down the RM when that 
> happens; otherwise, there may be some weird behavior in the RM.






[jira] [Commented] (YARN-6061) Add a customized uncaughtexceptionhandler for critical threads in RM

2017-02-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15848630#comment-15848630
 ] 

ASF GitHub Bot commented on YARN-6061:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/182#discussion_r98941743
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
 ---
@@ -824,25 +824,29 @@ public void handle(RMFatalEvent event) {
* Transition to standby in a new thread.
*/
   public void handleTransitionToStandByInNewThread() {
-new Thread() {
-  @Override
-  public void run() {
-if (rmContext.isHAEnabled()) {
-  try {
-// Transition to standby and reinit active services
-LOG.info("Transitioning RM to Standby mode");
-transitionToStandby(true);
-EmbeddedElector elector = rmContext.getLeaderElectorService();
-if (elector != null) {
-  elector.rejoinElection();
-}
-  } catch (Exception e) {
-LOG.fatal("Failed to transition RM to Standby mode.", e);
-ExitUtil.terminate(1, e);
+Thread standByTransitionThread = new Thread(new 
StandByTransitionThread());
--- End diff --

Sorry for not identifying this earlier. We should make this thread-safe in 
case this is triggered by two critical threads failing at the same time. 




> Add a customized uncaughtexceptionhandler for critical threads in RM
> 
>
> Key: YARN-6061
> URL: https://issues.apache.org/jira/browse/YARN-6061
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6061.001.patch, YARN-6061.002.patch, 
> YARN-6061.003.patch, YARN-6061.004.patch, YARN-6061.005.patch
>
>
> There are several threads in the fair scheduler. A thread will quit when 
> there is a runtime exception inside it. We should bring down the RM when that 
> happens; otherwise, there may be some weird behavior in the RM.






[jira] [Commented] (YARN-6151) FS Preemption doesn't filter out queues which cannot be preempted

2017-02-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856917#comment-15856917
 ] 

ASF GitHub Bot commented on YARN-6151:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/188#discussion_r99943541
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
 ---
@@ -2036,10 +2036,10 @@ public void testPreemptionIsNotDelayedToNextRound() 
throws Exception {
 .getLeafQueue("queueA.queueA2", false), clock.getTime());
 assertEquals(3277, toPreempt.getMemorySize());
 
-// verify if the 3 containers required by queueA2 are preempted in the 
same
+// verify if the 4 containers required by queueA2 are preempted in the 
same
 // round
 scheduler.preemptResources(toPreempt);
-assertEquals(3, 
scheduler.getSchedulerApp(app1).getPreemptionContainers()
+assertEquals(4, 
scheduler.getSchedulerApp(app1).getPreemptionContainers()
 .size());
   }
--- End diff --

Can we add a new test that verifies the exact scenario in the JIRA 
description? 


> FS Preemption doesn't filter out queues which cannot be preempted
> -
>
> Key: YARN-6151
> URL: https://issues.apache.org/jira/browse/YARN-6151
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.8.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6151.branch-2.8.001.patch
>
>
> This is a preemption bug that happens before 2.8.0 and is also described in 
> YARN-3405.
> Queue hierarchy described as below:
> {noformat}
>   root
>/ \
>queue-1  queue-2   
>   /  \
> queue-1-1 queue-1-2
> {noformat}
> Assume the cluster resource is 100 and all queues have the same weights.
> # queue-1-1 and queue-2 have apps. Each gets 50 usage and 50 fair share.
> # When queue-1-2 becomes active, it should preempt 25 from queue-1-1, but 
> this doesn't happen because preemption proceeds top-down: queue-2 can be 
> chosen as the preemption candidate as long as queue-2 is less needy than 
> queue-1, and since queue-2 doesn't exceed its fair share, preemption won't 
> happen.
> We need to filter out queue-2 since it isn't a valid candidate.






[jira] [Commented] (YARN-6151) FS Preemption doesn't filter out queues which cannot be preempted

2017-02-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856915#comment-15856915
 ] 

ASF GitHub Bot commented on YARN-6151:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/188#discussion_r99942869
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java
 ---
@@ -236,6 +236,29 @@ public void setFairSharePreemptionThreshold(float 
fairSharePreemptionThreshold)
   }
 
   /**
+   * Recursively check if the queue can be preempted based on whether the
+   * resource usage is greater than fair share.
+   *
+   * @return true if the queue can be preempted
+   */
+  public boolean canBePreempted() {
+assert parent != null;
--- End diff --

Why is this necessary? 


> FS Preemption doesn't filter out queues which cannot be preempted
> -
>
> Key: YARN-6151
> URL: https://issues.apache.org/jira/browse/YARN-6151
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.8.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6151.branch-2.8.001.patch
>
>
> This is a preemption bug that happens before 2.8.0 and is also described in 
> YARN-3405.
> Queue hierarchy described as below:
> {noformat}
>   root
>/ \
>queue-1  queue-2   
>   /  \
> queue-1-1 queue-1-2
> {noformat}
> Assume the cluster resource is 100 and all queues have the same weights.
> # queue-1-1 and queue-2 have apps. Each gets 50 usage and 50 fair share.
> # When queue-1-2 becomes active, it should preempt 25 from queue-1-1, but 
> this doesn't happen because preemption proceeds top-down: queue-2 can be 
> chosen as the preemption candidate as long as queue-2 is less needy than 
> queue-1, and since queue-2 doesn't exceed its fair share, preemption won't 
> happen.
> We need to filter out queue-2 since it isn't a valid candidate.






[jira] [Commented] (YARN-6061) Add a customized uncaughtexceptionhandler for critical threads in RM

2017-02-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856953#comment-15856953
 ] 

ASF GitHub Bot commented on YARN-6061:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/182#discussion_r99948742
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMCriticalThreadUncaughtExceptionHandler.java
 ---
@@ -0,0 +1,60 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.yarn.server.resourcemanager;
+
+import java.lang.Thread.UncaughtExceptionHandler;
+
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.classification.InterfaceAudience.Public;
+import org.apache.hadoop.classification.InterfaceStability.Evolving;
+import org.apache.hadoop.yarn.conf.HAUtil;
+
+/**
+ * This class either shutdowns {@link ResourceManager} or makes
--- End diff --

- s/shutdowns/shuts down
- s/makes RM transition/ transitions the RM
- s/if any uncaught exception.../if a critical thread throws an uncaught 
exception. 


> Add a customized uncaughtexceptionhandler for critical threads in RM
> 
>
> Key: YARN-6061
> URL: https://issues.apache.org/jira/browse/YARN-6061
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6061.001.patch, YARN-6061.002.patch, 
> YARN-6061.003.patch, YARN-6061.004.patch, YARN-6061.005.patch, 
> YARN-6061.006.patch, YARN-6061.007.patch
>
>
> There are several threads in the fair scheduler. A thread will quit when 
> there is a runtime exception inside it. We should bring down the RM when that 
> happens; otherwise, there may be some weird behavior in the RM.






[jira] [Commented] (YARN-6061) Add a customized uncaughtexceptionhandler for critical threads in RM

2017-02-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856954#comment-15856954
 ] 

ASF GitHub Bot commented on YARN-6061:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/182#discussion_r99949581
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
 ---
@@ -819,19 +824,39 @@ public void handle(RMFatalEvent event) {
 }
   }
 
-  public void handleTransitionToStandBy() {
-if (rmContext.isHAEnabled()) {
-  try {
-// Transition to standby and reinit active services
-LOG.info("Transitioning RM to Standby mode");
-transitionToStandby(true);
-EmbeddedElector elector = rmContext.getLeaderElectorService();
-if (elector != null) {
-  elector.rejoinElection();
+  /**
+   * Transition to standby in a new thread.
+   */
+  public void handleTransitionToStandByInNewThread() {
+Thread standByTransitionThread =
+new Thread(activeServices.standByTransitionRunnable);
+standByTransitionThread.setName("StandByTransitionThread");
+standByTransitionThread.start();
+  }
+
+  private class StandByTransitionRunnable implements Runnable {
--- End diff --

Let us add javadoc for this class, and include details on how we use the 
same runnable.


> Add a customized uncaughtexceptionhandler for critical threads in RM
> 
>
> Key: YARN-6061
> URL: https://issues.apache.org/jira/browse/YARN-6061
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6061.001.patch, YARN-6061.002.patch, 
> YARN-6061.003.patch, YARN-6061.004.patch, YARN-6061.005.patch, 
> YARN-6061.006.patch, YARN-6061.007.patch
>
>
> There are several threads in the fair scheduler. A thread will quit when 
> there is a runtime exception inside it. We should bring down the RM when that 
> happens; otherwise, there may be some weird behavior in the RM.






[jira] [Commented] (YARN-6061) Add a customized uncaughtexceptionhandler for critical threads in RM

2017-02-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856952#comment-15856952
 ] 

ASF GitHub Bot commented on YARN-6061:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/182#discussion_r99949771
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
 ---
@@ -819,19 +824,39 @@ public void handle(RMFatalEvent event) {
 }
   }
 
-  public void handleTransitionToStandBy() {
-if (rmContext.isHAEnabled()) {
-  try {
-// Transition to standby and reinit active services
-LOG.info("Transitioning RM to Standby mode");
-transitionToStandby(true);
-EmbeddedElector elector = rmContext.getLeaderElectorService();
-if (elector != null) {
-  elector.rejoinElection();
+  /**
+   * Transition to standby in a new thread.
+   */
+  public void handleTransitionToStandByInNewThread() {
+Thread standByTransitionThread =
+new Thread(activeServices.standByTransitionRunnable);
+standByTransitionThread.setName("StandByTransitionThread");
+standByTransitionThread.start();
+  }
+
+  private class StandByTransitionRunnable implements Runnable {
+private AtomicBoolean hasRun = new AtomicBoolean(false);
+
+@Override
+public void run() {
+  // Prevent from running again if it has run.
--- End diff --

Add more detail here: "Run this only once, even if multiple threads end up 
triggering this simultaneously."


> Add a customized uncaughtexceptionhandler for critical threads in RM
> 
>
> Key: YARN-6061
> URL: https://issues.apache.org/jira/browse/YARN-6061
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6061.001.patch, YARN-6061.002.patch, 
> YARN-6061.003.patch, YARN-6061.004.patch, YARN-6061.005.patch, 
> YARN-6061.006.patch, YARN-6061.007.patch
>
>
> There are several threads in the fair scheduler. A thread will quit when 
> there is a runtime exception inside it. We should bring down the RM when that 
> happens; otherwise, there may be some weird behavior in the RM.






[jira] [Commented] (YARN-4212) FairScheduler: Parent queues is not allowed to be 'Fair' policy if its children have the "drf" policy

2017-02-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857004#comment-15857004
 ] 

ASF GitHub Bot commented on YARN-4212:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/181#discussion_r99953469
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/SchedulingPolicy.java
 ---
@@ -191,4 +164,12 @@ public abstract boolean checkIfUsageOverFairShare(
   public abstract Resource getHeadroom(Resource queueFairShare,
   Resource queueUsage, Resource maxAvailable);
 
+  /**
+   * Check whether the policy of a child queue are allowed.
--- End diff --

s/are/is


> FairScheduler: Parent queues is not allowed to be 'Fair' policy if its 
> children have the "drf" policy
> -
>
> Key: YARN-4212
> URL: https://issues.apache.org/jira/browse/YARN-4212
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Yufei Gu
>  Labels: fairscheduler
> Attachments: YARN-4212.002.patch, YARN-4212.003.patch, 
> YARN-4212.004.patch, YARN-4212.005.patch, YARN-4212.006.patch, 
> YARN-4212.007.patch, YARN-4212.008.patch, YARN-4212.1.patch
>
>
> The Fair Scheduler, while performing a {{recomputeShares()}} during an 
> {{update()}} call, uses the parent queue's policy to distribute shares to its 
> children.
> If the parent queue's policy is 'fair', it only computes weights for memory 
> and sets the vcores fair share of its children to 0.
> Assuming a situation where we have 1 parent queue with policy 'fair' and 
> multiple leaf queues with policy 'drf', any app submitted to the child queues 
> with a vcore requirement > 1 will always be above its fair share, since 
> during the recomputeShares process the child queues were all assigned 0 for 
> their fair share of vcores.






[jira] [Commented] (YARN-4212) FairScheduler: Parent queues is not allowed to be 'Fair' policy if its children have the "drf" policy

2017-02-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857003#comment-15857003
 ] 

ASF GitHub Bot commented on YARN-4212:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/181#discussion_r99954456
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java
 ---
@@ -91,20 +91,22 @@ public FSQueue(String name, FairScheduler scheduler, 
FSParentQueue parent) {
 this.queueEntity = new PrivilegedEntity(EntityType.QUEUE, name);
 this.metrics = FSQueueMetrics.forQueue(getName(), parent, true, 
scheduler.getConf());
 this.parent = parent;
+
setPolicy(scheduler.getAllocationConfiguration().getSchedulingPolicy(name));
 reinit(false);
   }
 
   /**
* Initialize a queue by setting its queue-specific properties and its
-   * metrics.
+   * metrics. This function don't set the policy for queues since there is
--- End diff --

s/function/method - there is one other instance of this in the javadoc

s/don't/does not

Instead of saying there is different logic, can we call out what method 
does that for easier code navigability? And, it might be worth mentioning why 
that logic is separated, either here or at the other method.


> FairScheduler: Parent queues is not allowed to be 'Fair' policy if its 
> children have the "drf" policy
> -
>
> Key: YARN-4212
> URL: https://issues.apache.org/jira/browse/YARN-4212
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Yufei Gu
>  Labels: fairscheduler
> Attachments: YARN-4212.002.patch, YARN-4212.003.patch, 
> YARN-4212.004.patch, YARN-4212.005.patch, YARN-4212.006.patch, 
> YARN-4212.007.patch, YARN-4212.008.patch, YARN-4212.1.patch
>
>
> The Fair Scheduler, while performing a {{recomputeShares()}} during an 
> {{update()}} call, uses the parent queue's policy to distribute shares to its 
> children.
> If the parent queue's policy is 'fair', it only computes weights for memory 
> and sets the vcores fair share of its children to 0.
> Assuming a situation where we have 1 parent queue with policy 'fair' and 
> multiple leaf queues with policy 'drf', any app submitted to the child queues 
> with a vcore requirement > 1 will always be above its fair share, since 
> during the recomputeShares process the child queues were all assigned 0 for 
> their fair share of vcores.






[jira] [Commented] (YARN-4212) FairScheduler: Parent queues is not allowed to be 'Fair' policy if its children have the "drf" policy

2017-02-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857007#comment-15857007
 ] 

ASF GitHub Bot commented on YARN-4212:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/181#discussion_r99955463
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestSchedulingPolicy.java
 ---
@@ -79,66 +79,6 @@ public void testParseSchedulingPolicy()
   }
 
   /**
-   * Trivial tests that make sure
-   * {@link SchedulingPolicy#isApplicableTo(SchedulingPolicy, byte)} works 
as
-   * expected for the possible values of depth
-   * 
-   * @throws AllocationConfigurationException
-   */
-  @Test(timeout = 1000)
-  public void testIsApplicableTo() throws AllocationConfigurationException 
{
--- End diff --

Are all the cases in this test covered by other tests added here? If not, 
can we keep the test, maybe rename it, and capture the cases that are not 
covered? 


> FairScheduler: Parent queues is not allowed to be 'Fair' policy if its 
> children have the "drf" policy
> -
>
> Key: YARN-4212
> URL: https://issues.apache.org/jira/browse/YARN-4212
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Yufei Gu
>  Labels: fairscheduler
> Attachments: YARN-4212.002.patch, YARN-4212.003.patch, 
> YARN-4212.004.patch, YARN-4212.005.patch, YARN-4212.006.patch, 
> YARN-4212.007.patch, YARN-4212.008.patch, YARN-4212.1.patch
>
>
> The Fair Scheduler, while performing a {{recomputeShares()}} during an 
> {{update()}} call, uses the parent queue's policy to distribute shares to its 
> children.
> If the parent queue's policy is 'fair', it only computes the weight for memory and 
> sets the vcores fair share of its children to 0.
> Assuming a situation where we have 1 parent queue with policy 'fair' and 
> multiple leaf queues with policy 'drf', any app submitted to the child queues 
> with a vcore requirement > 1 will always be above fair share, since during the 
> recomputeShare process the child queues were all assigned 0 for fair-share 
> vcores.






[jira] [Commented] (YARN-4212) FairScheduler: Parent queues is not allowed to be 'Fair' policy if its children have the "drf" policy

2017-02-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857001#comment-15857001
 ] 

ASF GitHub Bot commented on YARN-4212:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/181#discussion_r99953850
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
 ---
@@ -175,7 +179,13 @@ public boolean checkIfUsageOverFairShare(Resource 
usage, Resource fairShare) {
   }
 
   @Override
-  public byte getApplicableDepth() {
-return SchedulingPolicy.DEPTH_ANY;
+  public boolean isChildPolicyAllowed(SchedulingPolicy childPolicy) {
+if (childPolicy instanceof DominantResourceFairnessPolicy) {
+  LOG.info("Queue policies can't be " + 
DominantResourceFairnessPolicy.NAME
--- End diff --

s/policies/policy
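The override under review can be illustrated with a stripped-down sketch. The class names mimic the real SchedulingPolicy hierarchy but this is not the actual Hadoop code: a 'fair' parent rejects a 'drf' child via an instanceof check and logs why.

```java
public class PolicyCheckSketch {
  // Simplified stand-ins for the real SchedulingPolicy classes.
  static class SchedulingPolicy {
    boolean isChildPolicyAllowed(SchedulingPolicy child) { return true; }
    String getName() { return "base"; }
  }

  static class DrfPolicy extends SchedulingPolicy {
    @Override
    String getName() { return "drf"; }
  }

  static class FairPolicy extends SchedulingPolicy {
    @Override
    boolean isChildPolicyAllowed(SchedulingPolicy child) {
      // 'fair' parents reject 'drf' children; any other child policy is fine.
      if (child instanceof DrfPolicy) {
        System.out.println("Queue policy can't be " + child.getName()
            + " if the parent policy is " + getName() + ".");
        return false;
      }
      return true;
    }
    @Override
    String getName() { return "fair"; }
  }

  public static void main(String[] args) {
    SchedulingPolicy parent = new FairPolicy();
    System.out.println(parent.isChildPolicyAllowed(new DrfPolicy()));   // false
    System.out.println(parent.isChildPolicyAllowed(new FairPolicy()));  // true
  }
}
```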








[jira] [Commented] (YARN-4212) FairScheduler: Parent queues is not allowed to be 'Fair' policy if its children have the "drf" policy

2017-02-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857002#comment-15857002
 ] 

ASF GitHub Bot commented on YARN-4212:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/181#discussion_r99953234
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
 ---
@@ -21,10 +21,13 @@
 import java.util.Collection;
 import java.util.Comparator;
 
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
 import org.apache.hadoop.classification.InterfaceAudience.Private;
 import org.apache.hadoop.classification.InterfaceStability.Unstable;
 import org.apache.hadoop.yarn.api.records.Resource;
 import org.apache.hadoop.yarn.server.resourcemanager.resource.ResourceType;
+import 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationConfigurationException;
--- End diff --

Unused import. 








[jira] [Commented] (YARN-4212) FairScheduler: Parent queues is not allowed to be 'Fair' policy if its children have the "drf" policy

2017-02-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857000#comment-15857000
 ] 

ASF GitHub Bot commented on YARN-4212:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/181#discussion_r99955335
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
 ---
@@ -5096,4 +5097,178 @@ public void testUpdateDemand() throws IOException {
 Resources.equals(bQueue.getDemand(), maxResource));
   }
 
+  @Test
+  public void testSchedulingPolicyViolation() throws IOException {
--- End diff --

TestFairScheduler is awfully long. Can we please add these methods 
elsewhere? TestSchedulingPolicy and TestQueueManager are potential candidates. 








[jira] [Commented] (YARN-4212) FairScheduler: Parent queues is not allowed to be 'Fair' policy if its children have the "drf" policy

2017-02-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857005#comment-15857005
 ] 

ASF GitHub Bot commented on YARN-4212:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/181#discussion_r99954759
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java
 ---
@@ -463,4 +461,33 @@ boolean fitsInMaxShare(Resource additionalResource) {
 }
 return true;
   }
+
+  /**
+   * Recursively check policies for queues in pre-order. Get queue policies
+   * from the allocation file instead of properties of {@link FSQueue} 
objects.
+   * Set the policy for current queue if there is no policy violation for 
its
+   * children.
+   *
+   * @param queueConf allocation configuration
+   * @return true if no policy violation and successfully set policies
+   * for queues; false otherwise
+   */
+  public boolean verifyAndSetPolicyFromConf(AllocationConfiguration 
queueConf) {
--- End diff --

It might be worthwhile to point out the intended caller for this method. 
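The pre-order verification described in the javadoc above can be sketched as follows. This is an illustrative model, not the real FSQueue: a queue's policy is applied only if none of its children's configured policies violate the parent/child rule, and the whole pass fails otherwise.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class VerifyPoliciesSketch {
  final String name;
  String policy;  // remains null if verification fails
  final List<VerifyPoliciesSketch> children = new ArrayList<>();

  VerifyPoliciesSketch(String name) {
    this.name = name;
  }

  // conf maps queue name -> configured policy ("fair" or "drf" here).
  boolean verifyAndSetPolicyFromConf(Map<String, String> conf) {
    String mine = conf.getOrDefault(name, "fair");
    for (VerifyPoliciesSketch child : children) {
      // Simplified rule from this patch: a 'fair' parent rejects 'drf' children.
      if (mine.equals("fair")
          && conf.getOrDefault(child.name, "fair").equals("drf")) {
        return false;
      }
    }
    this.policy = mine;  // pre-order: set self before recursing
    for (VerifyPoliciesSketch child : children) {
      if (!child.verifyAndSetPolicyFromConf(conf)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    VerifyPoliciesSketch root = new VerifyPoliciesSketch("root");
    root.children.add(new VerifyPoliciesSketch("root.a"));
    // A 'fair' parent with a 'drf' child fails verification.
    System.out.println(
        root.verifyAndSetPolicyFromConf(Map.of("root", "fair", "root.a", "drf")));
  }
}
```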








[jira] [Commented] (YARN-4212) FairScheduler: Parent queues is not allowed to be 'Fair' policy if its children have the "drf" policy

2017-02-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857006#comment-15857006
 ] 

ASF GitHub Bot commented on YARN-4212:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/181#discussion_r99954044
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
 ---
@@ -175,7 +179,13 @@ public boolean checkIfUsageOverFairShare(Resource 
usage, Resource fairShare) {
   }
 
   @Override
-  public byte getApplicableDepth() {
-return SchedulingPolicy.DEPTH_ANY;
+  public boolean isChildPolicyAllowed(SchedulingPolicy childPolicy) {
+if (childPolicy instanceof DominantResourceFairnessPolicy) {
+  LOG.info("Queue policies can't be " + 
DominantResourceFairnessPolicy.NAME
+  + " if the parent policy is " + getName() + ". Please choose "
+  + "other policies for child queues instead.");
--- End diff --

IMO, we should either (1) not say anything about other policies or (2) list 
the policies that are allowed. 








[jira] [Commented] (YARN-4212) FairScheduler: Parent queues is not allowed to be 'Fair' policy if its children have the "drf" policy

2017-02-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857008#comment-15857008
 ] 

ASF GitHub Bot commented on YARN-4212:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/181#discussion_r99953540
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/SchedulingPolicy.java
 ---
@@ -191,4 +164,12 @@ public abstract boolean checkIfUsageOverFairShare(
   public abstract Resource getHeadroom(Resource queueFairShare,
   Resource queueUsage, Resource maxAvailable);
 
+  /**
+   * Check whether the policy of a child queue is allowed.
+   *
+   * @param childPolicy the policy of child queue
+   */
+  public boolean isChildPolicyAllowed(SchedulingPolicy childPolicy) {
--- End diff --

I like that we are adding a non-abstract method. 








[jira] [Commented] (YARN-6061) Add a customized uncaughtexceptionhandler for critical threads in RM

2017-02-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856951#comment-15856951
 ] 

ASF GitHub Bot commented on YARN-6061:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/182#discussion_r99949838
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
 ---
@@ -819,19 +824,39 @@ public void handle(RMFatalEvent event) {
 }
   }
 
-  public void handleTransitionToStandBy() {
-if (rmContext.isHAEnabled()) {
-  try {
-// Transition to standby and reinit active services
-LOG.info("Transitioning RM to Standby mode");
-transitionToStandby(true);
-EmbeddedElector elector = rmContext.getLeaderElectorService();
-if (elector != null) {
-  elector.rejoinElection();
+  /**
+   * Transition to standby in a new thread.
+   */
+  public void handleTransitionToStandByInNewThread() {
+Thread standByTransitionThread =
+new Thread(activeServices.standByTransitionRunnable);
+standByTransitionThread.setName("StandByTransitionThread");
+standByTransitionThread.start();
+  }
+
+  private class StandByTransitionRunnable implements Runnable {
+private AtomicBoolean hasRun = new AtomicBoolean(false);
--- End diff --

Maybe rename this to hasAlreadyRun? And again, add some javadoc here too? 
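The run-once guard being discussed can be sketched in isolation. Names here are illustrative; the real runnable performs the transition-to-standby work instead of incrementing a counter.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Minimal sketch of a Runnable that performs its work at most once,
// guarded by an AtomicBoolean as in the patch above.
public class RunOnceSketch implements Runnable {
  private final AtomicBoolean hasAlreadyRun = new AtomicBoolean(false);
  private int transitions = 0;

  @Override
  public void run() {
    // compareAndSet succeeds for exactly one caller, so concurrent triggers
    // (e.g. several failing threads) cause a single transition.
    if (!hasAlreadyRun.compareAndSet(false, true)) {
      return;
    }
    transitions++;  // stand-in for the actual transition-to-standby work
  }

  int getTransitions() {
    return transitions;
  }

  public static void main(String[] args) {
    RunOnceSketch r = new RunOnceSketch();
    r.run();
    r.run();  // second invocation is a no-op
    System.out.println(r.getTransitions());  // 1
  }
}
```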


> Add a customized uncaughtexceptionhandler for critical threads in RM
> 
>
> Key: YARN-6061
> URL: https://issues.apache.org/jira/browse/YARN-6061
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6061.001.patch, YARN-6061.002.patch, 
> YARN-6061.003.patch, YARN-6061.004.patch, YARN-6061.005.patch, 
> YARN-6061.006.patch, YARN-6061.007.patch
>
>
> There are several threads in fair scheduler. The thread will quit when there 
> is a runtime exception inside it. We should bring down the RM when that 
> happens. Otherwise, there may be some weird behavior in RM. 






[jira] [Commented] (YARN-6151) FS Preemption doesn't filter out queues which cannot be preempted

2017-02-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856921#comment-15856921
 ] 

ASF GitHub Bot commented on YARN-6151:
--

Github user flyrain commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/188#discussion_r99944929
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java
 ---
@@ -236,6 +236,29 @@ public void setFairSharePreemptionThreshold(float 
fairSharePreemptionThreshold)
   }
 
   /**
+   * Recursively check if the queue can be preempted based on whether the
+   * resource usage is greater than fair share.
+   *
+   * @return true if the queue can be preempted
+   */
+  public boolean canBePreempted() {
--- End diff --

It should be, but allowPreemptionFrom is introduced after 2.8.x. 


> FS Preemption doesn't filter out queues which cannot be preempted
> -
>
> Key: YARN-6151
> URL: https://issues.apache.org/jira/browse/YARN-6151
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.8.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6151.branch-2.8.001.patch
>
>
> This is a preemption bug that exists before 2.8.0 and is also described in 
> YARN-3405.
> The queue hierarchy is described below:
> {noformat}
>        root
>       /    \
>  queue-1   queue-2
>   /    \
> queue-1-1 queue-1-2
> {noformat}
> Assume the cluster resource is 100 and all queues have the same weight.
> # queue-1-1 and queue-2 have apps. Each gets 50 usage and 50 fair share. 
> # When queue-1-2 becomes active, it should preempt 25 from queue-1-1, but 
> this doesn't happen because preemption works top-down: queue-2 can be chosen as 
> the preemption candidate as long as queue-2 is less needy than queue-1, and 
> since queue-2 doesn't exceed its fair share, no preemption happens.
> We need to filter out queue-2 since it isn't a valid candidate.
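The numbers in the scenario follow from equal-weight fair sharing; a small sketch of the arithmetic (illustrative only, not scheduler code):

```java
public class FairShareMath {
  // With equal weights, a parent's fair share splits evenly among its
  // active children.
  static int childFairShare(int parentFairShare, int activeChildren) {
    return parentFairShare / activeChildren;
  }

  public static void main(String[] args) {
    int cluster = 100;
    int q1 = childFairShare(cluster, 2);   // queue-1 -> 50
    int q2 = childFairShare(cluster, 2);   // queue-2 -> 50
    int q11 = childFairShare(q1, 2);       // queue-1-1 -> 25 once queue-1-2 is active
    // queue-1-1 uses 50 > 25: over fair share, a valid preemption source.
    // queue-2 uses 50 == 50: at fair share, so it must be filtered out.
    System.out.println(50 > q11);  // true
    System.out.println(50 > q2);   // false
  }
}
```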






[jira] [Commented] (YARN-6151) FS Preemption doesn't filter out queues which cannot be preempted

2017-02-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856916#comment-15856916
 ] 

ASF GitHub Bot commented on YARN-6151:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/188#discussion_r99943676
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java
 ---
@@ -236,6 +236,29 @@ public void setFairSharePreemptionThreshold(float 
fairSharePreemptionThreshold)
   }
 
   /**
+   * Recursively check if the queue can be preempted based on whether the
+   * resource usage is greater than fair share.
+   *
+   * @return true if the queue can be preempted
+   */
+  public boolean canBePreempted() {
--- End diff --

Should the check of the allowPreemptionFrom flag also be part of this 
method? 








[jira] [Commented] (YARN-6151) FS Preemption doesn't filter out queues which cannot be preempted

2017-02-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857120#comment-15857120
 ] 

ASF GitHub Bot commented on YARN-6151:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/188#discussion_r99967724
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java
 ---
@@ -236,6 +236,29 @@ public void setFairSharePreemptionThreshold(float 
fairSharePreemptionThreshold)
   }
 
   /**
+   * Recursively check if the queue can be preempted based on whether the
+   * resource usage is greater than fair share.
+   *
+   * @return true if the queue can be preempted
+   */
+  public boolean canBePreempted() {
--- End diff --

Aah, I keep forgetting branch-2.8 was cut years ago. :(


> FS Preemption doesn't filter out queues which cannot be preempted
> -
>
> Key: YARN-6151
> URL: https://issues.apache.org/jira/browse/YARN-6151
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.8.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6151.branch-2.8.001.patch, 
> YARN-6151.branch-2.8.002.patch
>
>
> This is preemption bug happens before 2.8.0, which also described in 
> YARN-3405.
> Queue hierarchy described as below:
> {noformat}
>   root
>/ \
>queue-1  queue-2   
>   /  \
> queue-1-1 queue-1-2
> {noformat}
> Assume cluster resource is 100 and all queues have same weights.
> # queue-1-1 and queue-2 has apps. Each get 50 usage and 50 fairshare. 
> # When queue-1-2 is active, supposedly it will preempt 25 from queue-1-1, but 
> this doesn't happen because preemption happens top-down, queue-2 could be the 
> preemption candidate as long as queue-2 is less needy than queue-1, and 
> queue-2 doesn't exceed the fair share which means preemption won't happen.
> We need to filter out queue-2 since it isn't a valid candidate.






[jira] [Commented] (YARN-6151) FS Preemption doesn't filter out queues which cannot be preempted

2017-02-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857129#comment-15857129
 ] 

ASF GitHub Bot commented on YARN-6151:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/188#discussion_r99968314
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java
 ---
@@ -242,7 +244,9 @@ public void setFairSharePreemptionThreshold(float 
fairSharePreemptionThreshold)
* @return true if the queue can be preempted
*/
   public boolean canBePreempted() {
-assert parent != null;
+Preconditions.checkNotNull(parent, "Parent queue can't be null since"
--- End diff --

Maybe, we could make this message more clear. "Parent queue is null. Looks 
like we are checking if root can be preempted."

Alternatively, can we make the if check (parent != null && ...)? That way, 
else would capture the null case and things should work fine? 
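The two alternatives can be put side by side in a sketch. The JDK's Objects.requireNonNull stands in here for Guava's Preconditions.checkNotNull, and the method shapes are illustrative, not the real FSQueue signature.

```java
import java.util.Objects;

public class NullParentSketch {
  // Option 1: fail fast with a clear message when called on the root.
  static boolean canBePreemptedStrict(Object parent, boolean overFairShare) {
    Objects.requireNonNull(parent,
        "Parent queue is null. Looks like we are checking if root can be preempted.");
    return overFairShare;
  }

  // Option 2: fold the null case into the condition, so calling this on the
  // root simply returns false instead of throwing.
  static boolean canBePreemptedLenient(Object parent, boolean overFairShare) {
    return parent != null && overFairShare;
  }

  public static void main(String[] args) {
    System.out.println(canBePreemptedLenient(null, true));          // false
    System.out.println(canBePreemptedLenient(new Object(), true));  // true
  }
}
```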








[jira] [Commented] (YARN-6151) FS Preemption doesn't filter out queues which cannot be preempted

2017-02-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857130#comment-15857130
 ] 

ASF GitHub Bot commented on YARN-6151:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/188#discussion_r99967958
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
 ---
@@ -2043,6 +2043,78 @@ public void testPreemptionIsNotDelayedToNextRound() 
throws Exception {
 .size());
   }
 
+  @Test
+  public void testPreemptionFilterOutNonPreemptableQueues() throws 
Exception {
--- End diff --

Can we add this test to TestFairSchedulerPreemption instead? 








[jira] [Commented] (YARN-4212) FairScheduler: Parent queues is not allowed to be 'Fair' policy if its children have the "drf" policy

2017-02-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15858673#comment-15858673
 ] 

ASF GitHub Bot commented on YARN-4212:
--

Github user flyrain commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/181#discussion_r100194505
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestSchedulingPolicy.java
 ---
@@ -79,66 +79,6 @@ public void testParseSchedulingPolicy()
   }
 
   /**
-   * Trivial tests that make sure
-   * {@link SchedulingPolicy#isApplicableTo(SchedulingPolicy, byte)} works 
as
-   * expected for the possible values of depth
-   * 
-   * @throws AllocationConfigurationException
-   */
-  @Test(timeout = 1000)
-  public void testIsApplicableTo() throws AllocationConfigurationException 
{
--- End diff --

Add a new test case to check if fifo policy is only for leaf queues.


> FairScheduler: Parent queues is not allowed to be 'Fair' policy if its 
> children have the "drf" policy
> -
>
> Key: YARN-4212
> URL: https://issues.apache.org/jira/browse/YARN-4212
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Yufei Gu
>  Labels: fairscheduler
> Attachments: YARN-4212.002.patch, YARN-4212.003.patch, 
> YARN-4212.004.patch, YARN-4212.005.patch, YARN-4212.006.patch, 
> YARN-4212.007.patch, YARN-4212.008.patch, YARN-4212.1.patch
>
>
> The Fair Scheduler, while performing a {{recomputeShares()}} during an 
> {{update()}} call, uses the parent queue's policy to distribute shares to its 
> children.
> If the parent queue's policy is 'fair', it only computes weights for memory and 
> sets the vcores fair share of its children to 0.
> In a situation where we have one parent queue with policy 'fair' and 
> multiple leaf queues with policy 'drf', any app submitted to the child queues 
> with a vcore requirement > 1 will always be above its fair share, since during 
> the recomputeShares process the child queues were all assigned 0 for fair-share 
> vcores.
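To make the failure mode concrete, here is a minimal, self-contained model (hypothetical class and method names, not the actual Hadoop FairScheduler code) of a 'fair'-policy parent computing its children's fair shares: memory is split, but the vcore dimension is never touched.

```java
import java.util.Arrays;

// Toy model of the bug, NOT the real FairScheduler code: a resource is
// [memory, vcores]; a 'fair'-policy parent distributes only memory, so
// every child ends up with a vcore fair share of 0.
public class FairParentSketch {

  static int[][] fairPolicyShares(int[] clusterResource, int numChildren) {
    int[][] shares = new int[numChildren][2];
    for (int i = 0; i < numChildren; i++) {
      shares[i][0] = clusterResource[0] / numChildren; // memory split equally
      shares[i][1] = 0;                                // vcores never computed
    }
    return shares;
  }

  public static void main(String[] args) {
    int[][] shares = fairPolicyShares(new int[]{100, 10}, 2);
    // Each child: 50 memory, 0 vcores. Any 'drf' child app that needs even
    // one vcore is therefore always "above fair share".
    System.out.println(Arrays.deepToString(shares)); // [[50, 0], [50, 0]]
  }
}
```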






[jira] [Commented] (YARN-4212) FairScheduler: Parent queues is not allowed to be 'Fair' policy if its children have the "drf" policy

2017-02-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15858672#comment-15858672
 ] 

ASF GitHub Bot commented on YARN-4212:
--

Github user flyrain commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/181#discussion_r100194434
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java
 ---
@@ -463,4 +461,33 @@ boolean fitsInMaxShare(Resource additionalResource) {
 }
 return true;
   }
+
+  /**
+   * Recursively check policies for queues in pre-order. Get queue policies
+   * from the allocation file instead of properties of {@link FSQueue} 
objects.
+   * Set the policy for current queue if there is no policy violation for 
its
+   * children.
+   *
+   * @param queueConf allocation configuration
+   * @return true if no policy violation and successfully set polices
+   * for queues; false otherwise
+   */
+  public boolean verifyAndSetPolicyFromConf(AllocationConfiguration 
queueConf) {
--- End diff --

Fixed.
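The pre-order verification described in the quoted javadoc can be sketched as follows (hypothetical names, not the actual FSQueue code): walk the queue tree top-down and reject any configuration where a parent's policy does not allow a child's policy.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the pre-order check idea: verify every parent/child policy pair
// before committing policies; 'fifo' allows no children, 'fair' rejects 'drf'.
public class PolicyVerifySketch {

  static class Queue {
    final String policy;
    final List<Queue> children = new ArrayList<>();
    Queue(String policy) { this.policy = policy; }
  }

  static boolean childAllowed(String parent, String child) {
    if (parent.equals("fifo")) return false;                    // fifo: leaf-only
    if (parent.equals("fair") && child.equals("drf")) return false;
    return true;
  }

  /** Pre-order traversal; the real patch also sets the policy when valid. */
  static boolean verify(Queue q) {
    for (Queue child : q.children) {
      if (!childAllowed(q.policy, child.policy) || !verify(child)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    Queue root = new Queue("fair");
    root.children.add(new Queue("drf"));
    System.out.println(verify(root)); // false: 'fair' parent, 'drf' child
  }
}
```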





[jira] [Commented] (YARN-6151) FS Preemption doesn't filter out queues which cannot be preempted

2017-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15854851#comment-15854851
 ] 

ASF GitHub Bot commented on YARN-6151:
--

GitHub user flyrain opened a pull request:

https://github.com/apache/hadoop/pull/188

YARN-6151. FS Preemption doesn't filter out queues which cannot be preempted.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/flyrain/hadoop branch-2.8

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hadoop/pull/188.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #188


commit 5691fdff2b400571bb4669daf6e257910db9
Author: Yufei Gu 
Date:   2017-02-06T20:57:05Z

YARN-6151. FS Preemption doesn't filter out queues which cannot be 
preempted.




> FS Preemption doesn't filter out queues which cannot be preempted
> -
>
> Key: YARN-6151
> URL: https://issues.apache.org/jira/browse/YARN-6151
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.8.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6151.branch-2.8.001.patch
>
>
> This is a preemption bug that exists before 2.8.0 and is also described in 
> YARN-3405.
> The queue hierarchy is as below:
> {noformat}
>   root
>/ \
>queue-1  queue-2   
>   /  \
> queue-1-1 queue-1-2
> {noformat}
> Assume the cluster resource is 100 and all queues have the same weight.
> # queue-1-1 and queue-2 have apps. Each gets 50 usage and 50 fair share.
> # When queue-1-2 becomes active, it should preempt 25 from queue-1-1, but
> this doesn't happen because preemption works top-down: queue-2 could be
> chosen as the preemption candidate as long as queue-2 is less needy than
> queue-1, and since queue-2 doesn't exceed its fair share, preemption won't happen.
> We need to filter out queue-2 since it isn't a valid candidate.
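The fix's core idea can be sketched in a few lines (hypothetical names, not the actual FSPreemptionThread code): only queues whose usage exceeds their fair share are valid preemption candidates, so a queue sitting exactly at its fair share, like queue-2 above, is filtered out before the top-down search.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: filter preemption candidates to queues that are over fair share.
public class CandidateFilterSketch {

  static class Queue {
    final String name; final int usage; final int fairShare;
    Queue(String name, int usage, int fairShare) {
      this.name = name; this.usage = usage; this.fairShare = fairShare;
    }
  }

  static List<Queue> validCandidates(List<Queue> queues) {
    List<Queue> out = new ArrayList<>();
    for (Queue q : queues) {
      if (q.usage > q.fairShare) {  // over fair share => can donate resources
        out.add(q);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Queue> queues = new ArrayList<>();
    // After queue-1-2 becomes active, queue-1-1's fair share drops to 25,
    // while queue-2 remains exactly at its fair share of 50.
    queues.add(new Queue("queue-1-1", 50, 25));
    queues.add(new Queue("queue-2", 50, 50));
    List<Queue> candidates = validCandidates(queues);
    System.out.println(candidates.get(0).name); // queue-1-1 is the only candidate
  }
}
```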






[jira] (YARN-6061) Add a customized uncaughtexceptionhandler for critical threads in RM

2017-01-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15847477#comment-15847477
 ] 

ASF GitHub Bot commented on YARN-6061:
--

Github user flyrain commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/182#discussion_r98765429
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMCriticalThreadUncaughtExceptionHandler.java
 ---
@@ -0,0 +1,75 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.yarn.server.resourcemanager;
+
+import org.apache.hadoop.util.ExitUtil;
+import org.apache.hadoop.yarn.event.AsyncDispatcher;
+import org.junit.Test;
+
+import static org.junit.Assert.assertSame;
+import static org.mockito.Mockito.spy;
+import static org.mockito.Mockito.verify;
+
+/**
+ * This class is to test {@link RMCriticalThreadUncaughtExceptionHandler}.
+ */
+public class TestRMCriticalThreadUncaughtExceptionHandler {
+  /**
+   * Throw {@link RuntimeException} inside thread and
+   * check {@link RMCriticalThreadUncaughtExceptionHandler} instance.
+   *
+   * Used {@link ExitUtil} class to avoid jvm exit through
+   * {@code System.exit(-1)}.
+   *
+   * @throws InterruptedException if any
+   */
+  @Test
+  public void testUncaughtExceptionHandlerWithError()
+  throws InterruptedException {
+ExitUtil.disableSystemExit();
+
+// Create a MockRM and start it
+ResourceManager resourceManager = new MockRM();
--- End diff --

Add one unit test to check if RM transitions to standby.


> Add a customized uncaughtexceptionhandler for critical threads in RM
> 
>
> Key: YARN-6061
> URL: https://issues.apache.org/jira/browse/YARN-6061
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6061.001.patch, YARN-6061.002.patch, 
> YARN-6061.003.patch
>
>
> There are several critical threads in the fair scheduler. A thread quits when 
> a runtime exception occurs inside it. We should bring down the RM when that 
> happens; otherwise, the RM may exhibit unexpected behavior. 
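The mechanism being reviewed can be sketched like this (hypothetical names, not the actual RMCriticalThreadUncaughtExceptionHandler): install an UncaughtExceptionHandler on critical threads so the RM is shut down, or transitioned to standby, instead of limping on with a dead thread.

```java
// Sketch: an uncaught-exception handler that flags a shutdown when a
// critical thread dies; real code would call something like
// ExitUtil.terminate(-1) or trigger an HA transition instead.
public class CriticalThreadSketch {

  static volatile boolean shutdownRequested = false;

  static final Thread.UncaughtExceptionHandler HANDLER = (thread, error) -> {
    System.err.println("Critical thread " + thread.getName() + " died: " + error);
    shutdownRequested = true;
  };

  public static void main(String[] args) throws InterruptedException {
    Thread critical = new Thread(
        () -> { throw new RuntimeException("boom"); }, "fs-update-thread");
    critical.setUncaughtExceptionHandler(HANDLER);
    critical.start();
    critical.join(); // the handler runs on the dying thread before it terminates
    System.out.println("shutdownRequested = " + shutdownRequested); // true
  }
}
```

Note the design point from the quoted test: disabling `System.exit` (as Hadoop's `ExitUtil.disableSystemExit()` does) lets a unit test observe the handler without killing the JVM.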






[jira] [Commented] (YARN-5830) Avoid preempting AM containers

2017-01-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15836438#comment-15836438
 ] 

ASF GitHub Bot commented on YARN-5830:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/180#discussion_r97617498
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSPreemptionThread.java
 ---
@@ -119,39 +116,81 @@ public void run() {
 continue;
   }
 
-  // Figure out list of containers to consider
-  List<RMContainer> containersToCheck =
-  node.getCopiedListOfRunningContainers();
-  containersToCheck.removeAll(node.getContainersForPreemption());
-
-  // Initialize potential with unallocated resources
-  Resource potential = Resources.clone(node.getUnallocatedResource());
-  for (RMContainer container : containersToCheck) {
-FSAppAttempt app =
-scheduler.getSchedulerApp(container.getApplicationAttemptId());
-
-if (app.canContainerBePreempted(container)) {
-  // Flag container for preemption
-  containers.add(container);
-  Resources.addTo(potential, container.getAllocatedResource());
+  int maxAMContainers = bestContainers == null ?
+  Integer.MAX_VALUE : bestContainers.numAMContainers;
+  PreemptableContainers preemptableContainers =
+  identifyContainersToPreemptOnNode(requestCapability, node,
+  maxAMContainers);
+  if (preemptableContainers != null) {
+if (preemptableContainers.numAMContainers == 0) {
+  return preemptableContainers;
+} else {
+  bestContainers = preemptableContainers;
 }
+  }
+}
 
-// Check if we have already identified enough containers
-if (Resources.fitsIn(requestCapability, potential)) {
-  // Mark the containers as being considered for preemption on the 
node.
-  // Make sure the containers are subsequently removed by calling
-  // FSSchedulerNode#removeContainerForPreemption.
-  node.addContainersForPreemption(containers);
-  return containers;
-} else {
-  // TODO (YARN-5829): Unreserve the node for the starved app.
+return bestContainers;
+  }
+
+  /**
+   * Identify containers to preempt on a given node. Try to find a list 
with
+   * least AM containers to avoid preempt AM containers. This method 
returns a
--- End diff --

s/preempt/preempting


> Avoid preempting AM containers
> --
>
> Key: YARN-5830
> URL: https://issues.apache.org/jira/browse/YARN-5830
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Karthik Kambatla
>Assignee: Yufei Gu
> Attachments: YARN-5830.001.patch, YARN-5830.002.patch, 
> YARN-5830.003.patch, YARN-5830.004.patch, YARN-5830.005.patch, 
> YARN-5830.006.patch
>
>
> While considering containers for preemption, avoid AM containers unless 
> absolutely necessary. 
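The selection logic in the quoted diff can be sketched as follows (hypothetical names, not the real FSPreemptionThread code): per node, count the AM containers among the containers that would be preempted, keep the candidate set with the fewest AM containers, and return immediately when a set with zero AM containers is found.

```java
import java.util.List;

// Sketch: prefer preemption candidates that touch the fewest AM containers.
public class AmAwareSelectionSketch {

  static class Candidate {
    final String node; final int numAMContainers;
    Candidate(String node, int numAMContainers) {
      this.node = node; this.numAMContainers = numAMContainers;
    }
  }

  static Candidate best(List<Candidate> perNode) {
    Candidate best = null;
    for (Candidate c : perNode) {
      if (c.numAMContainers == 0) {
        return c;                       // cannot do better: no AMs preempted
      }
      if (best == null || c.numAMContainers < best.numAMContainers) {
        best = c;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    List<Candidate> nodes = List.of(new Candidate("n1", 2),
                                    new Candidate("n2", 1),
                                    new Candidate("n3", 3));
    System.out.println(best(nodes).node); // n2: fewest AM containers
  }
}
```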






[jira] [Commented] (YARN-4212) FairScheduler: Parent queues is not allowed to be 'Fair' policy if its children have the "drf" policy

2017-01-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15836571#comment-15836571
 ] 

ASF GitHub Bot commented on YARN-4212:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/181#discussion_r97623212
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
 ---
@@ -45,7 +45,7 @@
 /**
  * Maintains a list of queues as well as scheduling parameters for each 
queue,
  * such as guaranteed share allocations, from the fair scheduler config 
file.
- * 
+ *
--- End diff --

Should we drop this line altogether? 





[jira] [Commented] (YARN-4212) FairScheduler: Parent queues is not allowed to be 'Fair' policy if its children have the "drf" policy

2017-01-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15836573#comment-15836573
 ] 

ASF GitHub Bot commented on YARN-4212:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/181#discussion_r97641986
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FifoPolicy.java
 ---
@@ -132,4 +133,12 @@ public Resource getHeadroom(Resource queueFairShare,
   public byte getApplicableDepth() {
 return SchedulingPolicy.DEPTH_LEAF;
   }
+
+  @Override
+  public boolean isChildPolicyAllowed(SchedulingPolicy childPolicy)
+  throws AllocationConfigurationException {
+throw new AllocationConfigurationException(getName() + " policy is 
only for"
--- End diff --

return false





[jira] [Commented] (YARN-4212) FairScheduler: Parent queues is not allowed to be 'Fair' policy if its children have the "drf" policy

2017-01-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15836575#comment-15836575
 ] 

ASF GitHub Bot commented on YARN-4212:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/181#discussion_r97641888
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
 ---
@@ -178,4 +179,15 @@ public boolean checkIfUsageOverFairShare(Resource 
usage, Resource fairShare) {
   public byte getApplicableDepth() {
 return SchedulingPolicy.DEPTH_ANY;
   }
+
+  @Override
+  public boolean isChildPolicyAllowed(SchedulingPolicy childPolicy)
+  throws AllocationConfigurationException {
+if (childPolicy instanceof DominantResourceFairnessPolicy) {
--- End diff --

Based on my earlier comment, this should be `return !(childPolicy 
instanceof DominantResourceFairnessPolicy)`
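The reviewer's suggested shape, a boolean-returning check with the caller deciding whether to raise a configuration error, can be sketched with mock classes (not the real Hadoop policy classes):

```java
// Sketch: isChildPolicyAllowed returns a boolean instead of throwing.
public class ChildPolicySketch {

  interface SchedulingPolicy {
    default boolean isChildPolicyAllowed(SchedulingPolicy childPolicy) {
      return true;
    }
  }

  static class DominantResourceFairnessPolicy implements SchedulingPolicy {}

  static class FairSharePolicy implements SchedulingPolicy {
    @Override
    public boolean isChildPolicyAllowed(SchedulingPolicy childPolicy) {
      // A 'fair' parent cannot compute correct vcore shares for 'drf' children.
      return !(childPolicy instanceof DominantResourceFairnessPolicy);
    }
  }

  static class FifoPolicy implements SchedulingPolicy {
    @Override
    public boolean isChildPolicyAllowed(SchedulingPolicy childPolicy) {
      return false; // fifo applies to leaf queues only, so no child is allowed
    }
  }

  public static void main(String[] args) {
    SchedulingPolicy fair = new FairSharePolicy();
    System.out.println(fair.isChildPolicyAllowed(new DominantResourceFairnessPolicy())); // false
    System.out.println(new FifoPolicy().isChildPolicyAllowed(fair));                     // false
    System.out.println(fair.isChildPolicyAllowed(new FairSharePolicy()));                // true
  }
}
```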





[jira] [Commented] (YARN-4212) FairScheduler: Parent queues is not allowed to be 'Fair' policy if its children have the "drf" policy

2017-01-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15836577#comment-15836577
 ] 

ASF GitHub Bot commented on YARN-4212:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/181#discussion_r97640737
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
 ---
@@ -272,6 +272,16 @@ private FSQueue createNewQueues(FSQueueType queueType,
   FSParentQueue newParent = null;
   String queueName = i.next();
 
+  // Check if child policy is allowed
--- End diff --

Should this check be in setPolicy or the FSQueue constructor instead? 

For instance, FSLeafQueue#setPolicy already checks if the level is 
appropriate. This brings up another point - do we need this check of 
parent-child policies AND the depth? Should we get rid of depth either in this 
JIRA or a follow-up? 





[jira] [Commented] (YARN-4212) FairScheduler: Parent queues is not allowed to be 'Fair' policy if its children have the "drf" policy

2017-01-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15836578#comment-15836578
 ] 

ASF GitHub Bot commented on YARN-4212:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/181#discussion_r97642235
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java
 ---
@@ -305,4 +305,24 @@ public void recoverContainer(Resource clusterResource,
 // TODO Auto-generated method stub
 
   }
+
+  /**
+   * Recursively check policies for queues in pre-order. Get queue policies
+   * from the allocation file instead of properties of {@link FSQueue} 
objects.
+   *
+   * @param queueConf allocation configuration
+   * @throws AllocationConfigurationException if there is any policy 
violation
+   */
+  public void checkPoliciesFromConf(AllocationConfiguration queueConf)
--- End diff --

Can this be verifyAndSetPolicyFromConf, and part of FSQueue instead of 
FSParentQueue? 

In that case, we will not need a separate call to setPolicy. 





[jira] [Commented] (YARN-5830) Avoid preempting AM containers

2017-01-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15836439#comment-15836439
 ] 

ASF GitHub Bot commented on YARN-5830:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/180#discussion_r97618432
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerPreemption.java
 ---
@@ -285,4 +309,43 @@ public void testNoPreemptionFromDisallowedQueue() 
throws Exception {
 submitApps("root.nonpreemptable.child-1", "root.preemptable.child-1");
 verifyNoPreemption();
   }
+
+  /**
+   * Set the number of AM containers for each node.
+   *
+   * @param numAMContainersPerNode number of AM containers per node
+   */
+  private void setNumAmContainersPerNode(int numAMContainersPerNode) {
--- End diff --

Nit: I would prefer setNumAMContainersPerNode.





[jira] [Commented] (YARN-4212) FairScheduler: Parent queues is not allowed to be 'Fair' policy if its children have the "drf" policy

2017-01-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15836576#comment-15836576
 ] 

ASF GitHub Bot commented on YARN-4212:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/181#discussion_r97641755
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/SchedulingPolicy.java
 ---
@@ -191,4 +191,13 @@ public abstract boolean checkIfUsageOverFairShare(
   public abstract Resource getHeadroom(Resource queueFairShare,
   Resource queueUsage, Resource maxAvailable);
 
+  /**
+   * Check whether the policy of a child queue are allowed.
+   *
+   * @param childPolicy the policy of child queue
+   */
+  public boolean isChildPolicyAllowed(SchedulingPolicy childPolicy)
+  throws AllocationConfigurationException {
--- End diff --

It makes sense for this method to be boolean. I am not sure we should be 
throwing an exception here. The caller can decide that based on the return 
value. 





[jira] [Commented] (YARN-4212) FairScheduler: Parent queues is not allowed to be 'Fair' policy if its children have the "drf" policy

2017-01-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15836572#comment-15836572
 ] 

ASF GitHub Bot commented on YARN-4212:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/181#discussion_r97641557
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
 ---
@@ -282,6 +292,13 @@ private FSQueue createNewQueues(FSQueueType queueType,
 queue = newParent;
   }
 
+  try {
+policy.initialize(scheduler.getClusterResource());
+queue.setPolicy(policy);
--- End diff --

setPolicy should likely be called in the constructor. 





[jira] [Commented] (YARN-4212) FairScheduler: Parent queues is not allowed to be 'Fair' policy if its children have the "drf" policy

2017-01-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15836574#comment-15836574
 ] 

ASF GitHub Bot commented on YARN-4212:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/181#discussion_r97641519
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
 ---
@@ -282,6 +292,13 @@ private FSQueue createNewQueues(FSQueueType queueType,
 queue = newParent;
   }
 
+  try {
+policy.initialize(scheduler.getClusterResource());
--- End diff --

Initialize should likely go to setPolicy





[jira] [Commented] (YARN-4212) FairScheduler: Parent queues is not allowed to be 'Fair' policy if its children have the "drf" policy

2017-01-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15839013#comment-15839013
 ] 

ASF GitHub Bot commented on YARN-4212:
--

Github user flyrain commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/181#discussion_r97920043
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java
 ---
@@ -305,4 +305,24 @@ public void recoverContainer(Resource clusterResource,
 // TODO Auto-generated method stub
 
   }
+
+  /**
+   * Recursively check policies for queues in pre-order. Get queue policies
+   * from the allocation file instead of properties of {@link FSQueue} 
objects.
+   *
+   * @param queueConf allocation configuration
+   * @throws AllocationConfigurationException if there is any policy 
violation
+   */
+  public void checkPoliciesFromConf(AllocationConfiguration queueConf)
--- End diff --

Good idea! 
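The pre-order check described in the javadoc above can be sketched with simplified stand-in types (the real patch works on FSQueue and AllocationConfiguration; the constraint below, a 'fair' parent with a 'drf' child, is the one this JIRA targets):

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in for a queue hierarchy read from the allocation file.
class QueueNode {
    final String name;
    final String policy;                       // "fair", "drf", or "fifo"
    final List<QueueNode> children = new ArrayList<>();
    QueueNode(String name, String policy) { this.name = name; this.policy = policy; }
}

public class PolicyCheck {
    /** Recursively check queues in pre-order; throws on the first violation. */
    static void checkPolicies(QueueNode queue) {
        for (QueueNode child : queue.children) {
            if ("fair".equals(queue.policy) && "drf".equals(child.policy)) {
                throw new IllegalStateException("Queue " + child.name
                    + " has drf policy under fair parent " + queue.name);
            }
            checkPolicies(child);              // parent checked before its subtree
        }
    }

    public static void main(String[] args) {
        QueueNode root = new QueueNode("root", "fair");
        root.children.add(new QueueNode("root.a", "drf"));
        try {
            checkPolicies(root);               // violates the constraint
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```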





[jira] [Commented] (YARN-4212) FairScheduler: Parent queues is not allowed to be 'Fair' policy if its children have the "drf" policy

2017-01-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15839007#comment-15839007
 ] 

ASF GitHub Bot commented on YARN-4212:
--

Github user flyrain commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/181#discussion_r97919802
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
 ---
@@ -272,6 +272,16 @@ private FSQueue createNewQueues(FSQueueType queueType,
   FSParentQueue newParent = null;
   String queueName = i.next();
 
+  // Check if child policy is allowed
--- End diff --

Yes, my original thought was to do that in another JIRA. The depth check and 
the parent-child policy check are not the same. It might be a good idea to 
combine them, since the depth check only prevents the FIFO policy from being 
used on a non-leaf queue. The current implementation seems a bit heavy. I can 
do it in this JIRA.





[jira] [Commented] (YARN-4212) FairScheduler: Parent queues is not allowed to be 'Fair' policy if its children have the "drf" policy

2017-01-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15839009#comment-15839009
 ] 

ASF GitHub Bot commented on YARN-4212:
--

Github user flyrain commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/181#discussion_r97919979
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
 ---
@@ -282,6 +292,13 @@ private FSQueue createNewQueues(FSQueueType queueType,
 queue = newParent;
   }
 
+  try {
+policy.initialize(scheduler.getClusterResource());
--- End diff --

My first thought was that this is similar to the depth checking, and I planned 
to refactor it in the next JIRA. Another question on my mind: do we need to 
initialize the policy every time we set it?





[jira] [Commented] (YARN-4212) FairScheduler: Parent queues is not allowed to be 'Fair' policy if its children have the "drf" policy

2017-01-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15836926#comment-15836926
 ] 

ASF GitHub Bot commented on YARN-4212:
--

Github user flyrain commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/181#discussion_r97685895
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
 ---
@@ -45,7 +45,7 @@
 /**
  * Maintains a list of queues as well as scheduling parameters for each 
queue,
  * such as guaranteed share allocations, from the fair scheduler config 
file.
- * 
+ *
--- End diff --

Sure.





[jira] [Commented] (YARN-6163) FS Preemption is a trickle for severely starved applications

2017-02-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15869392#comment-15869392
 ] 

ASF GitHub Bot commented on YARN-6163:
--

Github user kambatla closed the pull request at:

https://github.com/apache/hadoop/pull/192


> FS Preemption is a trickle for severely starved applications
> 
>
> Key: YARN-6163
> URL: https://issues.apache.org/jira/browse/YARN-6163
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 2.9.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Fix For: 2.9.0
>
> Attachments: YARN-6163.004.patch, YARN-6163.005.patch, 
> YARN-6163.006.patch, yarn-6163-1.patch, yarn-6163-2.patch
>
>
> With the current logic, only one RR is considered per instance of marking an 
> application starved. This marking happens only in the update call that runs 
> every 500ms. Due to this, an application that is severely starved takes 
> forever to reach fair share based on preemptions.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6163) FS Preemption is a trickle for severely starved applications

2017-02-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15869391#comment-15869391
 ] 

ASF GitHub Bot commented on YARN-6163:
--

Github user kambatla commented on the issue:

https://github.com/apache/hadoop/pull/192
  
Committed this to trunk and branch-2.





[jira] [Commented] (YARN-6194) Cluster capacity in SchedulingPolicy is updated only on allocation file reload

2017-02-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872805#comment-15872805
 ] 

ASF GitHub Bot commented on YARN-6194:
--

GitHub user flyrain opened a pull request:

https://github.com/apache/hadoop/pull/196

YARN-6194. Cluster capacity in SchedulingPolicy is updated only on 
allocation file reload

This patch passes the ClusterNodeTracker instead of ClusterResource into 
the DRF policy.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/flyrain/hadoop yarn-6194

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hadoop/pull/196.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #196


commit 78c6303b15d5531fc0bd22331a0720803ad5c416
Author: Yufei Gu 
Date:   2017-02-17T23:57:11Z

YARN-6194. Cluster capacity in SchedulingPolicy is updated only on 
allocation file reload.




> Cluster capacity in SchedulingPolicy is updated only on allocation file reload
> --
>
> Key: YARN-6194
> URL: https://issues.apache.org/jira/browse/YARN-6194
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Yufei Gu
>
> Some of the {{SchedulingPolicy}} methods need cluster capacity which is set 
> using {{#initialize}} today. However, {{initialize()}} is called only on 
> allocation reload. If nodes are added between reloads, the cluster capacity 
> is not considered until the next reload.






[jira] [Commented] (YARN-6194) Cluster capacity in SchedulingPolicy is updated only on allocation file reload

2017-02-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15879096#comment-15879096
 ] 

ASF GitHub Bot commented on YARN-6194:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/196#discussion_r102555178
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/TestDominantResourceFairnessPolicy.java
 ---
@@ -40,7 +43,10 @@
   private Comparator createComparator(int clusterMem,
   int clusterCpu) {
 DominantResourceFairnessPolicy policy = new 
DominantResourceFairnessPolicy();
-policy.initialize(BuilderUtils.newResource(clusterMem, clusterCpu));
+FSContext fsContext = mock(FSContext.class);
+when(fsContext.getClusterResource()).
+thenReturn(BuilderUtils.newResource(clusterMem, clusterCpu));
--- End diff --

Let us use Resources.create instead of BuilderUtils.newResource. 





[jira] [Commented] (YARN-6194) Cluster capacity in SchedulingPolicy is updated only on allocation file reload

2017-02-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15879094#comment-15879094
 ] 

ASF GitHub Bot commented on YARN-6194:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/196#discussion_r102556457
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
 ---
@@ -3369,7 +3369,48 @@ public void testBasicDRFWithQueues() throws 
Exception {
 scheduler.handle(updateEvent);
 Assert.assertEquals(1, app2.getLiveContainers().size());
   }
-  
+
+  @Test
+  public void testDRFWithClusterResourceChanges() throws Exception {
--- End diff --

Using a "real" scheduler with mock nodes seems excessive for this. Can we 
just mock the scheduler and context? Also, this might be a better fit in 
TestDRF than in TestFairScheduler.





[jira] [Commented] (YARN-6194) Cluster capacity in SchedulingPolicy is updated only on allocation file reload

2017-02-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15879093#comment-15879093
 ] 

ASF GitHub Bot commented on YARN-6194:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/196#discussion_r102554581
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSContext.java
 ---
@@ -27,28 +29,37 @@
   private boolean preemptionEnabled = false;
   private float preemptionUtilizationThreshold;
   private FSStarvedApps starvedApps;
+  private FairScheduler scheduler;
+
+  FSContext(FairScheduler scheduler) {
+this.scheduler = scheduler;
+  }
 
-  public boolean isPreemptionEnabled() {
+  boolean isPreemptionEnabled() {
 return preemptionEnabled;
   }
 
-  public void setPreemptionEnabled() {
+  void setPreemptionEnabled() {
 this.preemptionEnabled = true;
 if (starvedApps == null) {
   starvedApps = new FSStarvedApps();
 }
   }
 
-  public FSStarvedApps getStarvedApps() {
+  FSStarvedApps getStarvedApps() {
 return starvedApps;
   }
 
-  public float getPreemptionUtilizationThreshold() {
+  float getPreemptionUtilizationThreshold() {
 return preemptionUtilizationThreshold;
   }
 
-  public void setPreemptionUtilizationThreshold(
+  void setPreemptionUtilizationThreshold(
   float preemptionUtilizationThreshold) {
 this.preemptionUtilizationThreshold = preemptionUtilizationThreshold;
   }
+
+  public Resource getClusterResource() {
+return scheduler.getClusterResource();
--- End diff --

This looks okay for now, but this allows a scheduling policy to modify the 
overall cluster's resources. Making a copy could be expensive, as this is 
called on every compare call. 
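One way to get safety without a per-compare copy, sketched here with hypothetical stand-in types (not Hadoop's `Resource`), is to hand policies a read-only view that is built once and always reflects the live value:

```java
// Mutable resource owned by the scheduler (stand-in for Hadoop's Resource).
class MutableResource {
    private long memory;
    private int vcores;
    MutableResource(long memory, int vcores) { this.memory = memory; this.vcores = vcores; }
    long getMemory() { return memory; }
    int getVcores() { return vcores; }
    void setMemory(long memory) { this.memory = memory; }
}

// Read-only interface: policies can read current values but cannot mutate them.
interface ResourceView {
    long getMemory();
    int getVcores();
}

public class FSContextSketch {
    private final MutableResource clusterResource;
    private final ResourceView view;

    FSContextSketch(MutableResource clusterResource) {
        this.clusterResource = clusterResource;
        // Built once: no allocation per compare call, yet every read goes
        // through to the live cluster capacity.
        this.view = new ResourceView() {
            public long getMemory() { return FSContextSketch.this.clusterResource.getMemory(); }
            public int getVcores() { return FSContextSketch.this.clusterResource.getVcores(); }
        };
    }

    ResourceView getClusterResource() { return view; }

    public static void main(String[] args) {
        MutableResource cluster = new MutableResource(4096, 8);
        FSContextSketch ctx = new FSContextSketch(cluster);
        cluster.setMemory(8192);                                   // nodes added
        System.out.println(ctx.getClusterResource().getMemory()); // prints 8192
    }
}
```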





[jira] [Commented] (YARN-6194) Cluster capacity in SchedulingPolicy is updated only on allocation file reload

2017-02-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15879122#comment-15879122
 ] 

ASF GitHub Bot commented on YARN-6194:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/196#discussion_r102559501
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/SchedulingPolicy.java
 ---
@@ -92,7 +92,7 @@ public static SchedulingPolicy parse(String policy)
 return getInstance(clazz);
   }
   
-  public void initialize(Resource clusterCapacity) {}
+  public void initialize(FSContext fsContext) {}
--- End diff --

Since this method is in an @Public class, let us add a new method and 
deprecate the old method.

Let us also add javadoc for both methods. 





[jira] [Commented] (YARN-6194) Cluster capacity in SchedulingPolicy is updated only on allocation file reload

2017-02-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15879240#comment-15879240
 ] 

ASF GitHub Bot commented on YARN-6194:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/196#discussion_r102573113
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/SchedulingPolicy.java
 ---
@@ -91,7 +91,23 @@ public static SchedulingPolicy parse(String policy)
 }
 return getInstance(clazz);
   }
-  
+
+  /**
+   * Initialize the scheduling policy with cluster resources. Deprecated 
since
--- End diff --

Want to add a @deprecated tag in the javadoc and point to what to use? 





[jira] [Commented] (YARN-6194) Cluster capacity in SchedulingPolicy is updated only on allocation file reload

2017-02-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15879239#comment-15879239
 ] 

ASF GitHub Bot commented on YARN-6194:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/196#discussion_r102573406
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/SchedulingPolicy.java
 ---
@@ -91,7 +91,23 @@ public static SchedulingPolicy parse(String policy)
 }
 return getInstance(clazz);
   }
-  
+
+  /**
+   * Initialize the scheduling policy with cluster resources. Deprecated 
since
+   * it doesn't track cluster resource changes.
+   *
+   * @param clusterCapacity cluster resources
+   */
+  @Deprecated
+  public void initialize(Resource clusterCapacity) {}
+
+  /**
+   * Initialize the scheduling policy with a {@link FSContext} object 
which can
--- End diff --

In the future, different policies could use different information from the 
FSContext. Maybe, instead of referring to it as the only thing, say something 
like "FSContext, which has a pointer to the cluster resources among other 
information". 
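The javadoc shape being asked for might look like the sketch below; the method bodies and the `Resource`/`FSContext` stubs are placeholders, and only the `@Deprecated`/`@deprecated` pairing with a `{@link}` pointer to the replacement is the point.

```java
class Resource {}      // stub for illustration only
class FSContext {}     // stub for illustration only

public abstract class SchedulingPolicy {

  /**
   * Initialize the scheduling policy with cluster resources.
   *
   * @param clusterCapacity cluster resources
   * @deprecated Cluster resource changes between allocation-file reloads are
   *             not tracked; use {@link #initialize(FSContext)} instead.
   */
  @Deprecated
  public void initialize(Resource clusterCapacity) {}

  /**
   * Initialize the scheduling policy with a {@link FSContext} object, which
   * has a pointer to the cluster resources among other information.
   *
   * @param fsContext an FSContext object
   */
  public void initialize(FSContext fsContext) {}
}
```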





[jira] [Commented] (YARN-6194) Cluster capacity in SchedulingPolicy is updated only on allocation file reload

2017-02-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15879189#comment-15879189
 ] 

ASF GitHub Bot commented on YARN-6194:
--

Github user flyrain commented on the issue:

https://github.com/apache/hadoop/pull/196
  
Thanks Karthik for the review. Pushed a new commit addressing your comments.





[jira] [Commented] (YARN-6042) Fairscheduler: Dump scheduler state in log

2017-02-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866465#comment-15866465
 ] 

ASF GitHub Bot commented on YARN-6042:
--

GitHub user flyrain opened a pull request:

https://github.com/apache/hadoop/pull/193

YARN-6042. Fairscheduler: Dump scheduler state in log.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/flyrain/hadoop yarn-6042

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hadoop/pull/193.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #193


commit de6c969973d5b50aa2b461be1560b1e9b80cc9dc
Author: Yufei Gu 
Date:   2017-01-13T01:35:17Z

YARN-6042. Fairscheduler: Dump scheduler state in log.




> Fairscheduler: Dump scheduler state in log
> --
>
> Key: YARN-6042
> URL: https://issues.apache.org/jira/browse/YARN-6042
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6042.001.patch, YARN-6042.002.patch
>
>
> To improve the debugging of scheduler issues it would be a big improvement to 
> be able to dump the scheduler state into a log on request. 
> Dumping the scheduler state at a point in time would allow debugging of a 
> scheduler that is not hung (deadlocked) but also not assigning containers. 
> Currently we do not have a proper overview of what state the scheduler and 
> the queues are in, and we have to make assumptions or guess.
> The scheduler and queue state needed would include (not exhaustive):
> - instantaneous and steady fair share (app / queue)
> - AM share and resources
> - weight
> - app demand
> - application run state (runnable/non runnable)
> - last time at fair/min share
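As a rough illustration (names are hypothetical, not the actual YARN-6042 patch), such a dump could render each queue's state as one line of a point-in-time snapshot string that is then written to a dedicated log:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-queue snapshot holding a subset of the fields listed above.
class QueueState {
    final String name;
    final double fairShare, weight, demand;
    QueueState(String name, double fairShare, double weight, double demand) {
        this.name = name; this.fairShare = fairShare;
        this.weight = weight; this.demand = demand;
    }
}

public class StateDumper {
    /** Render one point-in-time snapshot as a multi-line string. */
    static String dump(List<QueueState> queues) {
        StringBuilder sb = new StringBuilder("Scheduler state:\n");
        for (QueueState q : queues) {
            sb.append(String.format("  %s fairShare=%.1f weight=%.1f demand=%.1f%n",
                q.name, q.fairShare, q.weight, q.demand));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<QueueState> queues = new ArrayList<>();
        queues.add(new QueueState("root.default", 2048, 1.0, 4096));
        System.out.print(dump(queues));   // one line per queue
    }
}
```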






[jira] [Commented] (YARN-6061) Add a customized uncaughtexceptionhandler for critical threads in RM

2017-02-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15860658#comment-15860658
 ] 

ASF GitHub Bot commented on YARN-6061:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/182#discussion_r100468005
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMCriticalThreadUncaughtExceptionHandler.java
 ---
@@ -0,0 +1,60 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.yarn.server.resourcemanager;
+
+import java.lang.Thread.UncaughtExceptionHandler;
+
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.classification.InterfaceAudience.Public;
+import org.apache.hadoop.classification.InterfaceStability.Evolving;
+import org.apache.hadoop.yarn.conf.HAUtil;
+
+/**
+ * This class either shuts down {@link ResourceManager} or transitions the
+ * {@link ResourceManager} to standby state if a critical thread throws an
+ * uncaught exception. It is intended to be installed by calling
+ * {@code setUncaughtExceptionHandler(Thread.UncaughtExceptionHandler)}
+ * in the thread entry point or after creation of threads.
+ */
+@Public
--- End diff --

We don't need to expose this outside YARN at all. This should be @Private. 
Let us remove @Evolving altogether. 
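For reference, installing such a handler follows the standard `Thread.setUncaughtExceptionHandler` pattern the javadoc mentions. A minimal self-contained sketch; the handler body is a stand-in, not the RM implementation:

```java
import java.lang.Thread.UncaughtExceptionHandler;

public class HandlerDemo {
    // Stand-in for the RM handler in the patch: record the failure.
    // A real handler would shut down the RM or transition it to standby.
    static class CriticalThreadHandler implements UncaughtExceptionHandler {
        volatile Throwable seen;
        @Override
        public void uncaughtException(Thread t, Throwable e) {
            seen = e;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CriticalThreadHandler handler = new CriticalThreadHandler();
        Thread critical = new Thread(() -> {
            throw new IllegalStateException("boom");  // simulated critical failure
        });
        critical.setUncaughtExceptionHandler(handler);  // install after creation
        critical.start();
        critical.join();
        System.out.println(handler.seen.getMessage());  // prints "boom"
    }
}
```

Without a handler, an unchecked exception in a critical thread is only printed by the default handler and the thread silently dies, which is exactly the "weird behavior" the issue describes.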


> Add a customized uncaughtexceptionhandler for critical threads in RM
> 
>
> Key: YARN-6061
> URL: https://issues.apache.org/jira/browse/YARN-6061
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6061.001.patch, YARN-6061.002.patch, 
> YARN-6061.003.patch, YARN-6061.004.patch, YARN-6061.005.patch, 
> YARN-6061.006.patch, YARN-6061.007.patch, YARN-6061.008.patch
>
>
> There are several threads in the fair scheduler. A thread will quit when there 
> is a runtime exception inside it. We should bring down the RM when that 
> happens; otherwise, there may be some weird behavior in the RM. 






[jira] [Commented] (YARN-6163) FS Preemption is a trickle for severely starved applications

2017-02-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15862041#comment-15862041
 ] 

ASF GitHub Bot commented on YARN-6163:
--

Github user templedf commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/192#discussion_r100640868
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
 ---
@@ -1106,6 +,97 @@ boolean isStarvedForFairShare() {
 return !Resources.isNone(fairshareStarvation);
   }
 
+  /**
+   * Helper method for {@link #getStarvedResourceRequests()}:
+   * Given a map of visited {@link ResourceRequest}s, it checks if
+   * {@link ResourceRequest} 'rr' has already been visited. The map is 
updated
+   * to reflect visiting 'rr'.
+   */
+  private static boolean checkAndMarkRRVisited(
+  Map<Priority, List<Resource>> visitedRRs, ResourceRequest rr) {
+Priority priority = rr.getPriority();
+Resource capability = rr.getCapability();
+if (visitedRRs.containsKey(priority)) {
+  List<Resource> rrList = visitedRRs.get(priority);
+  if (rrList.contains(capability)) {
+return true;
+  } else {
+rrList.add(capability);
+return false;
+  }
+} else {
+  List<Resource> newRRList = new ArrayList<>();
+  newRRList.add(capability);
+  visitedRRs.put(priority, newRRList);
+  return false;
+}
+  }
+
+  /**
+   * Fetch a list of RRs corresponding to the extent the app is starved
+   * (fairshare and minshare). This method considers the number of 
containers
+   * in a RR and also only one locality-level (the first encountered
+   * resourceName).
+   *
+   * @return list of {@link ResourceRequest}s corresponding to the amount 
of
+   * starvation.
+   */
+  List<ResourceRequest> getStarvedResourceRequests() {
+List<ResourceRequest> ret = new ArrayList<>();
+Map<Priority, List<Resource>> visitedRRs = new HashMap<>();
+
+Resource pending = getStarvation();
+for (ResourceRequest rr : appSchedulingInfo.getAllResourceRequests()) {
+  if (Resources.isNone(pending)) {
+break;
+  }
+  if (checkAndMarkRRVisited(visitedRRs, rr)) {
+continue;
+  }
+
+  // Compute the number of containers of this capability that fit in 
the
+  // pending amount
+  int ratio = (int) Math.floor(
--- End diff --

This is super obtuse logic.  Please document it thoroughly.
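The logic in question computes how many containers of the request's capability fit into the remaining starvation. A toy illustration with the default (memory-only) resource calculator and hypothetical numbers:

```java
public class RatioDemo {
    // Stand-in for Resources.ratio with the default (memory-only) calculator,
    // followed by the floor: how many whole containers of size 'capabilityMb'
    // fit in the 'pendingMb' of starvation.
    static int containersThatFit(long pendingMb, long capabilityMb) {
        return (int) Math.floor((double) pendingMb / capabilityMb);
    }

    public static void main(String[] args) {
        // App is starved by 10 GB; each requested container is 3 GB.
        System.out.println(containersThatFit(10240, 3072));  // 3
        // Starvation smaller than one container: this RR contributes nothing.
        System.out.println(containersThatFit(2048, 3072));   // 0
    }
}
```

The `ratio == 0` branch in the patch corresponds to the second case: the remaining starvation cannot fit even one container of that capability, so the request is skipped.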


> FS Preemption is a trickle for severely starved applications
> 
>
> Key: YARN-6163
> URL: https://issues.apache.org/jira/browse/YARN-6163
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 2.9.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-6163-1.patch
>
>
> With the current logic, only one RR is considered each time an application is 
> marked starved. This marking happens only on the update call that runs 
> every 500ms. Due to this, an application that is severely starved takes 
> forever to reach fairshare based on preemptions.






[jira] [Commented] (YARN-6163) FS Preemption is a trickle for severely starved applications

2017-02-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15862034#comment-15862034
 ] 

ASF GitHub Bot commented on YARN-6163:
--

Github user templedf commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/192#discussion_r100639340
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
 ---
@@ -1106,6 +,97 @@ boolean isStarvedForFairShare() {
 return !Resources.isNone(fairshareStarvation);
   }
 
+  /**
+   * Helper method for {@link #getStarvedResourceRequests()}:
+   * Given a map of visited {@link ResourceRequest}s, it checks if
+   * {@link ResourceRequest} 'rr' has already been visited. The map is 
updated
+   * to reflect visiting 'rr'.
+   */
+  private static boolean checkAndMarkRRVisited(
+  Map<Priority, List<Resource>> visitedRRs, ResourceRequest rr) {
+Priority priority = rr.getPriority();
+Resource capability = rr.getCapability();
+if (visitedRRs.containsKey(priority)) {
+  List<Resource> rrList = visitedRRs.get(priority);
+  if (rrList.contains(capability)) {
--- End diff --

Can we assume that resource requests are this unique?


> FS Preemption is a trickle for severely starved applications
> 
>
> Key: YARN-6163
> URL: https://issues.apache.org/jira/browse/YARN-6163
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 2.9.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-6163-1.patch
>
>
> With the current logic, only one RR is considered each time an application is 
> marked starved. This marking happens only on the update call that runs 
> every 500ms. Due to this, an application that is severely starved takes 
> forever to reach fairshare based on preemptions.






[jira] [Commented] (YARN-6163) FS Preemption is a trickle for severely starved applications

2017-02-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15862036#comment-15862036
 ] 

ASF GitHub Bot commented on YARN-6163:
--

Github user templedf commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/192#discussion_r100633456
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerConfiguration.java
 ---
@@ -114,12 +114,24 @@
   protected static final String PREEMPTION_THRESHOLD =
   CONF_PREFIX + "preemption.cluster-utilization-threshold";
   protected static final float DEFAULT_PREEMPTION_THRESHOLD = 0.8f;
-  
-  protected static final String PREEMPTION_INTERVAL = CONF_PREFIX + 
"preemptionInterval";
-  protected static final int DEFAULT_PREEMPTION_INTERVAL = 5000;
+
   protected static final String WAIT_TIME_BEFORE_KILL = CONF_PREFIX + 
"waitTimeBeforeKill";
   protected static final int DEFAULT_WAIT_TIME_BEFORE_KILL = 15000;
 
+  /**
+   * Configurable delay before an app's starvation is considered after it 
is
+   * identified. This is to give the scheduler enough time to
+   * allocate containers post preemption. This delay is added to the
+   * {@link #WAIT_TIME_BEFORE_KILL} and enough heartbeats.
+   *
+   * This is intended as a backdoor on production clusters, and hence
+   * intentionally not documented.
+   */
+  protected static final String WAIT_TIME_BEFORE_NEXT_STARVATION_CHECK =
--- End diff --

The name and description should include the units.


> FS Preemption is a trickle for severely starved applications
> 
>
> Key: YARN-6163
> URL: https://issues.apache.org/jira/browse/YARN-6163
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 2.9.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-6163-1.patch
>
>
> With the current logic, only one RR is considered each time an application is 
> marked starved. This marking happens only on the update call that runs 
> every 500ms. Due to this, an application that is severely starved takes 
> forever to reach fairshare based on preemptions.






[jira] [Commented] (YARN-6163) FS Preemption is a trickle for severely starved applications

2017-02-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15862033#comment-15862033
 ] 

ASF GitHub Bot commented on YARN-6163:
--

Github user templedf commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/192#discussion_r100641581
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
 ---
@@ -1106,6 +,97 @@ boolean isStarvedForFairShare() {
 return !Resources.isNone(fairshareStarvation);
   }
 
+  /**
+   * Helper method for {@link #getStarvedResourceRequests()}:
+   * Given a map of visited {@link ResourceRequest}s, it checks if
+   * {@link ResourceRequest} 'rr' has already been visited. The map is 
updated
+   * to reflect visiting 'rr'.
+   */
+  private static boolean checkAndMarkRRVisited(
+  Map<Priority, List<Resource>> visitedRRs, ResourceRequest rr) {
+Priority priority = rr.getPriority();
+Resource capability = rr.getCapability();
+if (visitedRRs.containsKey(priority)) {
+  List<Resource> rrList = visitedRRs.get(priority);
+  if (rrList.contains(capability)) {
+return true;
+  } else {
+rrList.add(capability);
+return false;
+  }
+} else {
+  List<Resource> newRRList = new ArrayList<>();
+  newRRList.add(capability);
+  visitedRRs.put(priority, newRRList);
+  return false;
+}
+  }
+
+  /**
+   * Fetch a list of RRs corresponding to the extent the app is starved
+   * (fairshare and minshare). This method considers the number of 
containers
+   * in a RR and also only one locality-level (the first encountered
+   * resourceName).
+   *
+   * @return list of {@link ResourceRequest}s corresponding to the amount 
of
+   * starvation.
+   */
+  List<ResourceRequest> getStarvedResourceRequests() {
+List<ResourceRequest> ret = new ArrayList<>();
+Map<Priority, List<Resource>> visitedRRs = new HashMap<>();
+
+Resource pending = getStarvation();
+for (ResourceRequest rr : appSchedulingInfo.getAllResourceRequests()) {
+  if (Resources.isNone(pending)) {
+break;
+  }
+  if (checkAndMarkRRVisited(visitedRRs, rr)) {
+continue;
+  }
+
+  // Compute the number of containers of this capability that fit in 
the
+  // pending amount
+  int ratio = (int) Math.floor(
+  Resources.ratio(scheduler.getResourceCalculator(),
+  pending, rr.getCapability()));
+  if (ratio == 0) {
+continue;
+  }
+
+  // If the RR is only partially being satisfied, include only the
+  // partial number of containers.
+  if (ratio < rr.getNumContainers()) {
+rr = ResourceRequest.newInstance(
+rr.getPriority(), rr.getResourceName(), rr.getCapability(), 
ratio);
+  }
+  ret.add(rr);
+  Resources.subtractFromNonNegative(pending,
--- End diff --

Here you're subtracting rr * floor(pending / rr), which isn't what you want 
if ratio > numContainers.  Should just subtract rr * numContainers in that case.
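Concretely, with hypothetical numbers, the over-subtraction the comment describes looks like this:

```java
public class SubtractDemo {
    // MB over-counted when the code subtracts capability * ratio instead of
    // capability * min(ratio, numContainers), as the review points out.
    static long overCountedMb(long pendingMb, long capabilityMb, int numContainers) {
        int ratio = (int) Math.floor((double) pendingMb / capabilityMb);
        long subtractedByCode = capabilityMb * ratio;
        long actuallyRequested = capabilityMb * Math.min(ratio, numContainers);
        return subtractedByCode - actuallyRequested;
    }

    public static void main(String[] args) {
        // 10 GB of starvation, 1 GB containers, but the RR only asks for 4.
        // ratio = 10, yet only 4 containers are added to the returned list,
        // so 6 GB of starvation is silently dropped from 'pending'.
        System.out.println(overCountedMb(10240, 1024, 4));  // 6144
    }
}
```

When `ratio <= numContainers` the two quantities coincide and the subtraction is correct; the discrepancy only appears for requests smaller than the remaining starvation.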


> FS Preemption is a trickle for severely starved applications
> 
>
> Key: YARN-6163
> URL: https://issues.apache.org/jira/browse/YARN-6163
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 2.9.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-6163-1.patch
>
>
> With the current logic, only one RR is considered each time an application is 
> marked starved. This marking happens only on the update call that runs 
> every 500ms. Due to this, an application that is severely starved takes 
> forever to reach fairshare based on preemptions.






[jira] [Commented] (YARN-6163) FS Preemption is a trickle for severely starved applications

2017-02-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15862035#comment-15862035
 ] 

ASF GitHub Bot commented on YARN-6163:
--

Github user templedf commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/192#discussion_r100639608
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
 ---
@@ -1106,6 +,97 @@ boolean isStarvedForFairShare() {
 return !Resources.isNone(fairshareStarvation);
   }
 
+  /**
+   * Helper method for {@link #getStarvedResourceRequests()}:
+   * Given a map of visited {@link ResourceRequest}s, it checks if
+   * {@link ResourceRequest} 'rr' has already been visited. The map is 
updated
+   * to reflect visiting 'rr'.
+   */
+  private static boolean checkAndMarkRRVisited(
+  Map<Priority, List<Resource>> visitedRRs, ResourceRequest rr) {
+Priority priority = rr.getPriority();
+Resource capability = rr.getCapability();
+if (visitedRRs.containsKey(priority)) {
+  List<Resource> rrList = visitedRRs.get(priority);
+  if (rrList.contains(capability)) {
+return true;
+  } else {
+rrList.add(capability);
+return false;
+  }
+} else {
+  List<Resource> newRRList = new ArrayList<>();
+  newRRList.add(capability);
+  visitedRRs.put(priority, newRRList);
+  return false;
+}
+  }
+
+  /**
+   * Fetch a list of RRs corresponding to the extent the app is starved
+   * (fairshare and minshare). This method considers the number of 
containers
+   * in a RR and also only one locality-level (the first encountered
+   * resourceName).
+   *
+   * @return list of {@link ResourceRequest}s corresponding to the amount 
of
+   * starvation.
+   */
+  List<ResourceRequest> getStarvedResourceRequests() {
+List<ResourceRequest> ret = new ArrayList<>();
+Map<Priority, List<Resource>> visitedRRs = new HashMap<>();
+
+Resource pending = getStarvation();
+for (ResourceRequest rr : appSchedulingInfo.getAllResourceRequests()) {
+  if (Resources.isNone(pending)) {
--- End diff --

Why is this check inside the loop?


> FS Preemption is a trickle for severely starved applications
> 
>
> Key: YARN-6163
> URL: https://issues.apache.org/jira/browse/YARN-6163
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 2.9.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-6163-1.patch
>
>
> With the current logic, only one RR is considered each time an application is 
> marked starved. This marking happens only on the update call that runs 
> every 500ms. Due to this, an application that is severely starved takes 
> forever to reach fairshare based on preemptions.






[jira] [Commented] (YARN-6163) FS Preemption is a trickle for severely starved applications

2017-02-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15862030#comment-15862030
 ] 

ASF GitHub Bot commented on YARN-6163:
--

Github user templedf commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/192#discussion_r100643061
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
 ---
@@ -220,6 +220,62 @@ public void updateInternal(boolean checkStarvation) {
   }
 
   /**
+   * Compute the extent of fairshare starvation for a set of apps.
+   *
+   * @param appsWithDemand apps to compute fairshare starvation for
+   * @return aggregate fairshare starvation for all apps
+   */
+  private Resource updateStarvedAppsFairshare(
+  TreeSet<FSAppAttempt> appsWithDemand) {
+Resource fairShareStarvation = Resources.clone(none());
+// Fetch apps with unmet demand sorted by fairshare starvation
+for (FSAppAttempt app : appsWithDemand) {
+  Resource appStarvation = app.fairShareStarvation();
+  if (!Resources.isNone(appStarvation))  {
+context.getStarvedApps().addStarvedApp(app);
+Resources.addTo(fairShareStarvation, appStarvation);
+  } else {
+break;
+  }
+}
+return fairShareStarvation;
+  }
+
+  /**
+   * Distribute minshare starvation to a set of apps
+   * @param appsWithDemand set of apps
+   * @param minShareStarvation minshare starvation to distribute
+   */
+  private void updateStarvedAppsMinshare(
+  TreeSet<FSAppAttempt> appsWithDemand, Resource minShareStarvation) {
+// Keep adding apps to the starved list until the unmet demand goes 
over
+// the remaining minshare
+for (FSAppAttempt app : appsWithDemand) {
+  if (!Resources.isNone(minShareStarvation())) {
--- End diff --

Does the value of minShareStarvation() really change in the loop?


> FS Preemption is a trickle for severely starved applications
> 
>
> Key: YARN-6163
> URL: https://issues.apache.org/jira/browse/YARN-6163
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 2.9.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-6163-1.patch
>
>
> With the current logic, only one RR is considered each time an application is 
> marked starved. This marking happens only on the update call that runs 
> every 500ms. Due to this, an application that is severely starved takes 
> forever to reach fairshare based on preemptions.






[jira] [Commented] (YARN-6163) FS Preemption is a trickle for severely starved applications

2017-02-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15862031#comment-15862031
 ] 

ASF GitHub Bot commented on YARN-6163:
--

Github user templedf commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/192#discussion_r100641784
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
 ---
@@ -1106,6 +,97 @@ boolean isStarvedForFairShare() {
 return !Resources.isNone(fairshareStarvation);
   }
 
+  /**
+   * Helper method for {@link #getStarvedResourceRequests()}:
+   * Given a map of visited {@link ResourceRequest}s, it checks if
+   * {@link ResourceRequest} 'rr' has already been visited. The map is 
updated
+   * to reflect visiting 'rr'.
+   */
+  private static boolean checkAndMarkRRVisited(
+  Map<Priority, List<Resource>> visitedRRs, ResourceRequest rr) {
+Priority priority = rr.getPriority();
+Resource capability = rr.getCapability();
+if (visitedRRs.containsKey(priority)) {
+  List<Resource> rrList = visitedRRs.get(priority);
+  if (rrList.contains(capability)) {
+return true;
+  } else {
+rrList.add(capability);
+return false;
+  }
+} else {
+  List<Resource> newRRList = new ArrayList<>();
+  newRRList.add(capability);
+  visitedRRs.put(priority, newRRList);
+  return false;
+}
+  }
+
+  /**
+   * Fetch a list of RRs corresponding to the extent the app is starved
+   * (fairshare and minshare). This method considers the number of 
containers
+   * in a RR and also only one locality-level (the first encountered
+   * resourceName).
+   *
+   * @return list of {@link ResourceRequest}s corresponding to the amount 
of
+   * starvation.
+   */
+  List<ResourceRequest> getStarvedResourceRequests() {
+List<ResourceRequest> ret = new ArrayList<>();
+Map<Priority, List<Resource>> visitedRRs = new HashMap<>();
+
+Resource pending = getStarvation();
+for (ResourceRequest rr : appSchedulingInfo.getAllResourceRequests()) {
+  if (Resources.isNone(pending)) {
+break;
+  }
+  if (checkAndMarkRRVisited(visitedRRs, rr)) {
+continue;
+  }
+
+  // Compute the number of containers of this capability that fit in 
the
+  // pending amount
+  int ratio = (int) Math.floor(
+  Resources.ratio(scheduler.getResourceCalculator(),
+  pending, rr.getCapability()));
+  if (ratio == 0) {
+continue;
+  }
+
+  // If the RR is only partially being satisfied, include only the
+  // partial number of containers.
+  if (ratio < rr.getNumContainers()) {
+rr = ResourceRequest.newInstance(
+rr.getPriority(), rr.getResourceName(), rr.getCapability(), 
ratio);
+  }
+  ret.add(rr);
+  Resources.subtractFromNonNegative(pending,
+  Resources.multiply(rr.getCapability(), ratio));
+}
+
+return ret;
+  }
+
+  /**
+   * Notify this app that preemption has been triggered to make room for
+   * outstanding demand. The app should not be considered starved until 
after
+   * the specified delay.
+   *
+   * @param delayBeforeNextStarvationCheck duration to wait
+   */
+  void preemptionTriggered(long delayBeforeNextStarvationCheck) {
+nextStarvationCheck =
+scheduler.getClock().getTime() + delayBeforeNextStarvationCheck;
+  }
+
+  /**
+   * Whether this app's starvation should be considered.
+   */
+  boolean shouldCheckForStarvation() {
+long now = scheduler.getClock().getTime();
+return now > nextStarvationCheck;
--- End diff --

\>= ?


> FS Preemption is a trickle for severely starved applications
> 
>
> Key: YARN-6163
> URL: https://issues.apache.org/jira/browse/YARN-6163
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 2.9.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-6163-1.patch
>
>
> With the current logic, only one RR is considered each time an application is 
> marked starved. This marking happens only on the update call that runs 
> every 500ms. Due to this, an application that is severely starved takes 
> forever to reach fairshare based on preemptions.





[jira] [Commented] (YARN-6163) FS Preemption is a trickle for severely starved applications

2017-02-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15862040#comment-15862040
 ] 

ASF GitHub Bot commented on YARN-6163:
--

Github user templedf commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/192#discussion_r100647513
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
 ---
@@ -220,6 +220,62 @@ public void updateInternal(boolean checkStarvation) {
   }
 
   /**
+   * Compute the extent of fairshare starvation for a set of apps.
+   *
+   * @param appsWithDemand apps to compute fairshare starvation for
+   * @return aggregate fairshare starvation for all apps
+   */
+  private Resource updateStarvedAppsFairshare(
+  TreeSet<FSAppAttempt> appsWithDemand) {
+Resource fairShareStarvation = Resources.clone(none());
+// Fetch apps with unmet demand sorted by fairshare starvation
+for (FSAppAttempt app : appsWithDemand) {
+  Resource appStarvation = app.fairShareStarvation();
+  if (!Resources.isNone(appStarvation))  {
+context.getStarvedApps().addStarvedApp(app);
+Resources.addTo(fairShareStarvation, appStarvation);
+  } else {
+break;
+  }
+}
+return fairShareStarvation;
+  }
+
+  /**
+   * Distribute minshare starvation to a set of apps
+   * @param appsWithDemand set of apps
+   * @param minShareStarvation minshare starvation to distribute
+   */
+  private void updateStarvedAppsMinshare(
+  TreeSet<FSAppAttempt> appsWithDemand, Resource minShareStarvation) {
+// Keep adding apps to the starved list until the unmet demand goes 
over
+// the remaining minshare
+for (FSAppAttempt app : appsWithDemand) {
+  if (!Resources.isNone(minShareStarvation())) {
+Resource appMinShare =  app.getPendingDemand();
+Resources.subtractFromNonNegative(
+appMinShare, app.getFairshareStarvation());
+
+if (Resources.greaterThan(policy.getResourceCalculator(),
+scheduler.getClusterResource(),
+appMinShare, minShareStarvation)) {
+  Resources.subtractFromNonNegative(
+  appMinShare, minShareStarvation);
+  minShareStarvation = none();
+} else {
+  Resources.subtractFrom(minShareStarvation, appMinShare);
+}
+app.setMinshareStarvation(appMinShare);
+context.getStarvedApps().addStarvedApp(app);
+  } else {
+// Reset minshare starvation in case we had set it in a previous
+// iteration
+app.resetMinshareStarvation();
--- End diff --

Why wouldn't we want to do this in every iteration after the first?


> FS Preemption is a trickle for severely starved applications
> 
>
> Key: YARN-6163
> URL: https://issues.apache.org/jira/browse/YARN-6163
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 2.9.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-6163-1.patch
>
>
> With the current logic, only one RR is considered each time an application is 
> marked starved. This marking happens only on the update call that runs 
> every 500ms. Due to this, an application that is severely starved takes 
> forever to reach fairshare based on preemptions.






[jira] [Commented] (YARN-6163) FS Preemption is a trickle for severely starved applications

2017-02-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15862032#comment-15862032
 ] 

ASF GitHub Bot commented on YARN-6163:
--

Github user templedf commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/192#discussion_r100642853
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
 ---
@@ -220,6 +220,62 @@ public void updateInternal(boolean checkStarvation) {
   }
 
   /**
+   * Compute the extent of fairshare starvation for a set of apps.
+   *
+   * @param appsWithDemand apps to compute fairshare starvation for
+   * @return aggregate fairshare starvation for all apps
+   */
+  private Resource updateStarvedAppsFairshare(
+  TreeSet<FSAppAttempt> appsWithDemand) {
+Resource fairShareStarvation = Resources.clone(none());
+// Fetch apps with unmet demand sorted by fairshare starvation
+for (FSAppAttempt app : appsWithDemand) {
+  Resource appStarvation = app.fairShareStarvation();
+  if (!Resources.isNone(appStarvation))  {
+context.getStarvedApps().addStarvedApp(app);
+Resources.addTo(fairShareStarvation, appStarvation);
+  } else {
+break;
+  }
+}
+return fairShareStarvation;
+  }
+
+  /**
+   * Distribute minshare starvation to a set of apps
+   * @param appsWithDemand set of apps
+   * @param minShareStarvation minshare starvation to distribute
+   */
+  private void updateStarvedAppsMinshare(
+  TreeSet<FSAppAttempt> appsWithDemand, Resource minShareStarvation) {
+// Keep adding apps to the starved list until the unmet demand goes 
over
+// the remaining minshare
+for (FSAppAttempt app : appsWithDemand) {
+  if (!Resources.isNone(minShareStarvation())) {
+Resource appMinShare =  app.getPendingDemand();
+Resources.subtractFromNonNegative(
+appMinShare, app.getFairshareStarvation());
+
+if (Resources.greaterThan(policy.getResourceCalculator(),
+scheduler.getClusterResource(),
+appMinShare, minShareStarvation)) {
+  Resources.subtractFromNonNegative(
+  appMinShare, minShareStarvation);
+  minShareStarvation = none();
--- End diff --

Not a fan of modifying an arg.


> FS Preemption is a trickle for severely starved applications
> 
>
> Key: YARN-6163
> URL: https://issues.apache.org/jira/browse/YARN-6163
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 2.9.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-6163-1.patch
>
>
> With the current logic, only one RR is considered each time an application is 
> marked starved. This marking happens only on the update call that runs 
> every 500ms. Due to this, an application that is severely starved takes 
> forever to reach fairshare based on preemptions.






[jira] [Commented] (YARN-6163) FS Preemption is a trickle for severely starved applications

2017-02-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15862039#comment-15862039
 ] 

ASF GitHub Bot commented on YARN-6163:
--

Github user templedf commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/192#discussion_r100646588
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
 ---
@@ -220,6 +220,62 @@ public void updateInternal(boolean checkStarvation) {
   }
 
   /**
+   * Compute the extent of fairshare starvation for a set of apps.
+   *
+   * @param appsWithDemand apps to compute fairshare starvation for
+   * @return aggregate fairshare starvation for all apps
+   */
+  private Resource updateStarvedAppsFairshare(
+      TreeSet<FSAppAttempt> appsWithDemand) {
+    Resource fairShareStarvation = Resources.clone(none());
+    // Fetch apps with unmet demand sorted by fairshare starvation
+    for (FSAppAttempt app : appsWithDemand) {
+      Resource appStarvation = app.fairShareStarvation();
+      if (!Resources.isNone(appStarvation))  {
+        context.getStarvedApps().addStarvedApp(app);
+        Resources.addTo(fairShareStarvation, appStarvation);
+      } else {
+        break;
+      }
+    }
+    return fairShareStarvation;
+  }
+
+  /**
+   * Distribute minshare starvation to a set of apps
+   * @param appsWithDemand set of apps
+   * @param minShareStarvation minshare starvation to distribute
+   */
+  private void updateStarvedAppsMinshare(
+      TreeSet<FSAppAttempt> appsWithDemand, Resource minShareStarvation) {
+    // Keep adding apps to the starved list until the unmet demand goes over
+    // the remaining minshare
+    for (FSAppAttempt app : appsWithDemand) {
+      if (!Resources.isNone(minShareStarvation())) {
+        Resource appMinShare =  app.getPendingDemand();
--- End diff --

Extra space after the equals








[jira] [Commented] (YARN-6163) FS Preemption is a trickle for severely starved applications

2017-02-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15862038#comment-15862038
 ] 

ASF GitHub Bot commented on YARN-6163:
--

Github user templedf commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/192#discussion_r100648444
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerWithMockPreemption.java
 ---
@@ -21,6 +21,8 @@
 import java.util.Set;
 
 public class FairSchedulerWithMockPreemption extends FairScheduler {
+  static final long DELAY_FOR_NEXT_STARVATION_CHECK = 10 * 60 * 1000;
--- End diff --

Name should include a unit








[jira] [Commented] (YARN-6163) FS Preemption is a trickle for severely starved applications

2017-02-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15862037#comment-15862037
 ] 

ASF GitHub Bot commented on YARN-6163:
--

Github user templedf commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/192#discussion_r100632402
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
 ---
@@ -163,6 +164,9 @@ public void serviceInit(Configuration conf) throws Exception {
 nmExpireInterval =
 conf.getInt(YarnConfiguration.RM_NM_EXPIRY_INTERVAL_MS,
   YarnConfiguration.DEFAULT_RM_NM_EXPIRY_INTERVAL_MS);
+nmHeartbeatInterval =
--- End diff --

Is it supported to have different NMs heartbeat at different intervals? I kinda assume so, though I doubt it's a good idea.








[jira] [Commented] (YARN-6175) Negative vcore for resource needed to preempt

2017-02-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866573#comment-15866573
 ] 

ASF GitHub Bot commented on YARN-6175:
--

GitHub user flyrain opened a pull request:

https://github.com/apache/hadoop/pull/194

YARN-6175. Negative vcore for resource needed to preempt.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/flyrain/hadoop yarn-6175

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hadoop/pull/194.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #194


commit 1151efe4afb5390c43e61f1b578d5c2772dd9017
Author: Yufei Gu 
Date:   2017-02-14T20:22:18Z

YARN-6175. Negative vcore for resource needed to preempt.




> Negative vcore for resource needed to preempt
> -
>
> Key: YARN-6175
> URL: https://issues.apache.org/jira/browse/YARN-6175
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.8.0, 3.0.0-alpha2
>Reporter: Yufei Gu
>Assignee: Yufei Gu
>
> Both old preemption code (2.8 and before) and new preemption code could have 
> negative vcores while calculating resources needed to preempt.
> For old preemption, you can find following messages in RM logs:
> {code}
> Should preempt  
> {code}
> The related code is in method {{resourceDeficit()}}. 
> For new preemption code, there are no messages in RM logs, the related code 
> is in method {{fairShareStarvation()}}. 
> The negative value isn't only a display issue, but also may cause missing 
> necessary preemption. 






[jira] [Commented] (YARN-6163) FS Preemption is a trickle for severely starved applications

2017-02-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866936#comment-15866936
 ] 

ASF GitHub Bot commented on YARN-6163:
--

Github user templedf commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/192#discussion_r101172131
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/VisitedResourceRequestTracker.java
 ---
@@ -0,0 +1,124 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair;
+
+import org.apache.hadoop.yarn.api.records.Priority;
+import org.apache.hadoop.yarn.api.records.Resource;
+import org.apache.hadoop.yarn.api.records.ResourceRequest;
+import 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker;
+
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+/**
+ * Applications place {@link ResourceRequest}s at multiple levels. This is a
+ * helper class that allows tracking if a {@link ResourceRequest} has been
+ * visited at a different locality level.
+ *
+ * This is implemented for {@link FSAppAttempt#getStarvedResourceRequests()}.
+ * The implementation is not thread-safe.
+ */
+class VisitedResourceRequestTracker {
+  private final Map<Priority, Map<Resource, TrackerPerPriorityResource>> map =
+  new HashMap<>();
+  private final ClusterNodeTracker<FSSchedulerNode> nodeTracker;
+
+  VisitedResourceRequestTracker(
+  ClusterNodeTracker<FSSchedulerNode> nodeTracker) {
+this.nodeTracker = nodeTracker;
+  }
+
+  /**
+   * Check if the {@link ResourceRequest} is visited before, and track it.
+   * @param rr {@link ResourceRequest} to visit
+   * @return true if this is the first visit of rr across all
+   * locality levels, false otherwise
+   */
+  boolean visit(ResourceRequest rr) {
+Priority priority = rr.getPriority();
+Resource capability = rr.getCapability();
+
+Map<Resource, TrackerPerPriorityResource> subMap = map.get(priority);
+if (subMap == null) {
+  subMap = new HashMap<>();
+  map.put(priority, subMap);
+}
+
+TrackerPerPriorityResource tracker = subMap.get(capability);
+if (tracker == null) {
+  tracker = new TrackerPerPriorityResource();
+  subMap.put(capability, tracker);
+}
+
+return tracker.visit(rr.getResourceName());
+  }
+
+  private class TrackerPerPriorityResource {
+private Set racksWithNodesVisited = new HashSet<>();
+private Set racksVisted = new HashSet<>();
+private boolean anyVisited;
+
+private boolean visitAny() {
+  if (racksVisted.isEmpty() && racksWithNodesVisited.isEmpty()) {
+anyVisited = true;
+  }
+  return anyVisited;
+}
+
+private boolean visitRack(String rackName) {
+  if (anyVisited || racksWithNodesVisited.contains(rackName)) {
+return false;
+  } else {
+racksVisted.add(rackName);
+return true;
+  }
+}
+
+private boolean visitNode(String rackName) {
--- End diff --

Maybe add Javadoc on these guys to explain the return value? I know it's already explained above, but doesn't hurt to be clear.
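Something like the following (our wording and class name, not the patch author's) would make the contract explicit at each method; `Set#add` conveniently already returns whether the element was new:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the Javadoc the review asks for, on a
// stripped-down rack tracker.
class RackVisitTracker {
    private final Set<String> racksVisited = new HashSet<>();

    /**
     * Record a visit to the rack-level request for {@code rackName}.
     *
     * @param rackName the rack the request names
     * @return true if this is the first visit to the rack, false if the
     *         rack has already been covered
     */
    boolean visitRack(String rackName) {
        return racksVisited.add(rackName);  // Set#add is true on first insert
    }
}
```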


> FS Preemption is a trickle for severely starved applications
> 
>
> Key: YARN-6163
> URL: https://issues.apache.org/jira/browse/YARN-6163
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 2.9.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: 

[jira] [Commented] (YARN-6163) FS Preemption is a trickle for severely starved applications

2017-02-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866940#comment-15866940
 ] 

ASF GitHub Bot commented on YARN-6163:
--

Github user templedf commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/192#discussion_r101171109
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/VisitedResourceRequestTracker.java
 ---
@@ -0,0 +1,124 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair;
+
+import org.apache.hadoop.yarn.api.records.Priority;
+import org.apache.hadoop.yarn.api.records.Resource;
+import org.apache.hadoop.yarn.api.records.ResourceRequest;
+import 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker;
+
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+/**
+ * Applications place {@link ResourceRequest}s at multiple levels. This is a
+ * helper class that allows tracking if a {@link ResourceRequest} has been
+ * visited at a different locality level.
+ *
+ * This is implemented for {@link FSAppAttempt#getStarvedResourceRequests()}.
+ * The implementation is not thread-safe.
+ */
+class VisitedResourceRequestTracker {
+  private final Map<Priority, Map<Resource, TrackerPerPriorityResource>> map =
+  new HashMap<>();
+  private final ClusterNodeTracker<FSSchedulerNode> nodeTracker;
+
+  VisitedResourceRequestTracker(
+  ClusterNodeTracker<FSSchedulerNode> nodeTracker) {
+this.nodeTracker = nodeTracker;
+  }
+
+  /**
+   * Check if the {@link ResourceRequest} is visited before, and track it.
+   * @param rr {@link ResourceRequest} to visit
+   * @return true if this is the first visit of rr across all
+   * locality levels, false otherwise
+   */
+  boolean visit(ResourceRequest rr) {
+Priority priority = rr.getPriority();
+Resource capability = rr.getCapability();
+
+Map<Resource, TrackerPerPriorityResource> subMap = map.get(priority);
--- End diff --

Maybe use a Guava MultiMap?
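A Guava `Multimap` (or `Table`, which takes two keys directly) is one option; the two-level lookup can also be flattened with plain JDK `Map.computeIfAbsent`, which removes both null checks. A sketch with simplified key types (`Integer` for `Priority`, `String` for the capability — our stand-ins, not the patch's):

```java
import java.util.HashMap;
import java.util.Map;

class TwoLevelTracker {
    private final Map<Integer, Map<String, Integer>> map = new HashMap<>();

    // Returns how many times this (priority, capability) pair has been
    // seen, creating the inner map on first use without an explicit
    // null-check-and-put dance.
    int visit(int priority, String capability) {
        return map.computeIfAbsent(priority, p -> new HashMap<>())
                  .merge(capability, 1, Integer::sum);
    }
}
```

The same `computeIfAbsent` call would collapse the quoted `subMap == null` and `tracker == null` branches to one line each while keeping the outer map type unchanged.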


> FS Preemption is a trickle for severely starved applications
> 
>
> Key: YARN-6163
> URL: https://issues.apache.org/jira/browse/YARN-6163
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 2.9.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-6163-1.patch, yarn-6163-2.patch
>
>
> With current logic, only one RR is considered per each instance of marking an 
> application starved. This marking happens only on the update call that runs 
> every 500ms.  Due to this, an application that is severely starved takes 
> forever to reach fairshare based on preemptions.






[jira] [Commented] (YARN-6163) FS Preemption is a trickle for severely starved applications

2017-02-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866841#comment-15866841
 ] 

ASF GitHub Bot commented on YARN-6163:
--

Github user templedf commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/192#discussion_r101159725
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
 ---
@@ -1147,24 +1147,32 @@ private static boolean checkAndMarkRRVisited(
* starvation.
*/
   List<ResourceRequest> getStarvedResourceRequests() {
+// List of RRs we build in this method to return
 List<ResourceRequest> ret = new ArrayList<>();
+
+// Track visited RRs to avoid the same RR at multiple locality levels
 Map visitedRRs = new HashMap<>();
 
+// Start with current starvation and track the pending amount
 Resource pending = getStarvation();
 for (ResourceRequest rr : appSchedulingInfo.getAllResourceRequests()) {
   if (Resources.isNone(pending)) {
+// Found enough RRs to match the starvation
 break;
   }
+
+  // See if we have already seen this RR
   if (checkAndMarkRRVisited(visitedRRs, rr)) {
 continue;
   }
 
-  // Compute the number of containers of this capability that fit in the
-  // pending amount
+  // A RR can have multiple containers of a capability. We need to
+  // compute the number of containers that fit in "pending".
   int ratio = (int) Math.floor(
--- End diff --

Given that ratio is the number of containers that fit in "pending," ratio is probably a bad name. That was a good chunk of my initial confusion.
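For scalar resources, the computation being named here — how many whole containers of a given capability the outstanding starvation covers — reduces to a floor division, which is why a name like `containersThatFit` would describe it better than `ratio`. A toy sketch with `int` in place of `Resource` (our simplification, not the patch's code):

```java
class ContainerFit {
    // Number of whole containers of size `capability` covered by `pending`.
    // Scalar stand-in for the Math.floor computation in the quoted diff.
    static int containersThatFit(int pending, int capability) {
        if (capability <= 0 || pending <= 0) {
            return 0;  // guard: empty capability or nothing pending
        }
        return pending / capability;  // integer division == floor for non-negatives
    }
}
```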








[jira] [Commented] (YARN-6163) FS Preemption is a trickle for severely starved applications

2017-02-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866842#comment-15866842
 ] 

ASF GitHub Bot commented on YARN-6163:
--

Github user templedf commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/192#discussion_r101160719
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
 ---
@@ -247,23 +247,23 @@ private Resource updateStarvedAppsFairshare(
* @param minShareStarvation minshare starvation to distribute
*/
   private void updateStarvedAppsMinshare(
-  TreeSet<FSAppAttempt> appsWithDemand, Resource minShareStarvation) {
+  TreeSet<FSAppAttempt> appsWithDemand, final Resource minShareStarvation) {
--- End diff --

I would make both or neither final








[jira] [Commented] (YARN-6163) FS Preemption is a trickle for severely starved applications

2017-02-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866847#comment-15866847
 ] 

ASF GitHub Bot commented on YARN-6163:
--

Github user templedf commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/192#discussion_r101161807
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestVisitedResourceRequestTracker.java
 ---
@@ -0,0 +1,101 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at*
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair;
+
+import org.apache.hadoop.yarn.api.records.Priority;
+import org.apache.hadoop.yarn.api.records.ResourceRequest;
+import org.apache.hadoop.yarn.server.resourcemanager.MockNodes;
+import org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNode;
+import 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker;
+import org.apache.hadoop.yarn.util.resource.Resources;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.util.List;
+
+public class TestVisitedResourceRequestTracker {
+  private final ClusterNodeTracker<FSSchedulerNode>
+  nodeTracker = new ClusterNodeTracker<>();
+  private final ResourceRequest
+  anyRequest, rackRequest, node1Request, node2Request;
+
+  public TestVisitedResourceRequestTracker() {
+List<RMNode> rmNodes =
+MockNodes.newNodes(1, 2, Resources.createResource(8192, 8));
+
+FSSchedulerNode node1 = new FSSchedulerNode(rmNodes.get(0), false);
+nodeTracker.addNode(node1);
+node1Request = createRR(node1.getNodeName(), 1);
+
+FSSchedulerNode node2 = new FSSchedulerNode(rmNodes.get(1), false);
+node2Request = createRR(node2.getNodeName(), 1);
+nodeTracker.addNode(node2);
+
+anyRequest = createRR(ResourceRequest.ANY, 2);
+rackRequest = createRR(node1.getRackName(), 2);
+  }
+
+  private ResourceRequest createRR(String resourceName, int count) {
+return ResourceRequest.newInstance(
+Priority.UNDEFINED, resourceName, Resources.none(), count);
+  }
+
+  @Test
+  public void testVisitAnyRequestFirst() {
+VisitedResourceRequestTracker tracker =
+new VisitedResourceRequestTracker(nodeTracker);
+
+// Visit ANY request first
+Assert.assertTrue(tracker.visit(anyRequest));
--- End diff --

assertTrue() without a message is evil








[jira] [Commented] (YARN-6163) FS Preemption is a trickle for severely starved applications

2017-02-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866848#comment-15866848
 ] 

ASF GitHub Bot commented on YARN-6163:
--

Github user templedf commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/192#discussion_r101161043
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/VisitedResourceRequestTracker.java
 ---
@@ -0,0 +1,124 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair;
+
+import org.apache.hadoop.yarn.api.records.Priority;
+import org.apache.hadoop.yarn.api.records.Resource;
+import org.apache.hadoop.yarn.api.records.ResourceRequest;
+import 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker;
+
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+/**
+ * Applications place {@link ResourceRequest}s at multiple levels. This is a
+ * helper class that allows tracking if a {@link ResourceRequest} has been
+ * visited at a different locality level.
+ *
+ * This is implemented for {@link FSAppAttempt#getStarvedResourceRequests()}.
+ * The implementation is not thread-safe.
--- End diff --

I love you.








[jira] [Commented] (YARN-6163) FS Preemption is a trickle for severely starved applications

2017-02-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866938#comment-15866938
 ] 

ASF GitHub Bot commented on YARN-6163:
--

Github user templedf commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/192#discussion_r101171746
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/VisitedResourceRequestTracker.java
 ---
@@ -0,0 +1,124 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair;
+
+import org.apache.hadoop.yarn.api.records.Priority;
+import org.apache.hadoop.yarn.api.records.Resource;
+import org.apache.hadoop.yarn.api.records.ResourceRequest;
+import 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker;
+
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+/**
+ * Applications place {@link ResourceRequest}s at multiple levels. This is a
+ * helper class that allows tracking if a {@link ResourceRequest} has been
+ * visited at a different locality level.
+ *
+ * This is implemented for {@link FSAppAttempt#getStarvedResourceRequests()}.
+ * The implementation is not thread-safe.
+ */
+class VisitedResourceRequestTracker {
+  private final Map<Priority, Map<Resource, TrackerPerPriorityResource>> map =
+  new HashMap<>();
+  private final ClusterNodeTracker<FSSchedulerNode> nodeTracker;
+
+  VisitedResourceRequestTracker(
+  ClusterNodeTracker<FSSchedulerNode> nodeTracker) {
+this.nodeTracker = nodeTracker;
+  }
+
+  /**
+   * Check if the {@link ResourceRequest} is visited before, and track it.
+   * @param rr {@link ResourceRequest} to visit
+   * @return true if this is the first visit of rr across all
+   * locality levels, false otherwise
+   */
+  boolean visit(ResourceRequest rr) {
+Priority priority = rr.getPriority();
+Resource capability = rr.getCapability();
+
+Map<Resource, TrackerPerPriorityResource> subMap = map.get(priority);
+if (subMap == null) {
+  subMap = new HashMap<>();
+  map.put(priority, subMap);
+}
+
+TrackerPerPriorityResource tracker = subMap.get(capability);
+if (tracker == null) {
+  tracker = new TrackerPerPriorityResource();
+  subMap.put(capability, tracker);
+}
+
+return tracker.visit(rr.getResourceName());
+  }
+
+  private class TrackerPerPriorityResource {
+private Set racksWithNodesVisited = new HashSet<>();
+private Set racksVisted = new HashSet<>();
+private boolean anyVisited;
+
+private boolean visitAny() {
+  if (racksVisted.isEmpty() && racksWithNodesVisited.isEmpty()) {
+anyVisited = true;
+  }
+  return anyVisited;
+}
+
+private boolean visitRack(String rackName) {
+  if (anyVisited || racksWithNodesVisited.contains(rackName)) {
+return false;
+  } else {
+racksVisted.add(rackName);
+return true;
+  }
+}
+
+private boolean visitNode(String rackName) {
+  if (anyVisited || racksVisted.contains(rackName)) {
+return false;
+  } else {
+racksWithNodesVisited.add(rackName);
+return true;
+  }
+}
+
+private boolean visit(String resourceName) {
+  if (resourceName.equals(ResourceRequest.ANY)) {
+return visitAny();
+  }
+
+  List nodes =
+  nodeTracker.getNodesByResourceName(resourceName);
+  switch (nodes.size()) {
+case 0:
+  // Log error
+  return false;
+case 1:
+  // Node
+  

[jira] [Commented] (YARN-6163) FS Preemption is a trickle for severely starved applications

2017-02-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866937#comment-15866937
 ] 

ASF GitHub Bot commented on YARN-6163:
--

Github user templedf commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/192#discussion_r101171862
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/VisitedResourceRequestTracker.java
 ---
@@ -0,0 +1,124 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair;
+
+import org.apache.hadoop.yarn.api.records.Priority;
+import org.apache.hadoop.yarn.api.records.Resource;
+import org.apache.hadoop.yarn.api.records.ResourceRequest;
+import org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker;
+
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+/**
+ * Applications place {@link ResourceRequest}s at multiple levels. This is a
+ * helper class that allows tracking if a {@link ResourceRequest} has been
+ * visited at a different locality level.
+ *
+ * This is implemented for {@link FSAppAttempt#getStarvedResourceRequests()}.
+ * The implementation is not thread-safe.
+ */
+class VisitedResourceRequestTracker {
+  private final Map<Priority, Map<Resource, TrackerPerPriorityResource>> map =
+      new HashMap<>();
+  private final ClusterNodeTracker<FSSchedulerNode> nodeTracker;
+
+  VisitedResourceRequestTracker(
+      ClusterNodeTracker<FSSchedulerNode> nodeTracker) {
+    this.nodeTracker = nodeTracker;
+  }
+
+  /**
+   * Check if the {@link ResourceRequest} is visited before, and track it.
+   * @param rr {@link ResourceRequest} to visit
+   * @return true if rr this is the first visit across all
+   * locality levels, false otherwise
+   */
+  boolean visit(ResourceRequest rr) {
+    Priority priority = rr.getPriority();
+    Resource capability = rr.getCapability();
+
+    Map<Resource, TrackerPerPriorityResource> subMap = map.get(priority);
+    if (subMap == null) {
+      subMap = new HashMap<>();
+      map.put(priority, subMap);
+    }
+
+    TrackerPerPriorityResource tracker = subMap.get(capability);
+    if (tracker == null) {
+      tracker = new TrackerPerPriorityResource();
+      subMap.put(capability, tracker);
+    }
+
+    return tracker.visit(rr.getResourceName());
+  }
+
+  private class TrackerPerPriorityResource {
+    private Set<String> racksWithNodesVisited = new HashSet<>();
+    private Set<String> racksVisited = new HashSet<>();
+    private boolean anyVisited;
+
+    private boolean visitAny() {
+      if (racksVisited.isEmpty() && racksWithNodesVisited.isEmpty()) {
+        anyVisited = true;
+      }
+      return anyVisited;
+    }
+
+    private boolean visitRack(String rackName) {
+      if (anyVisited || racksWithNodesVisited.contains(rackName)) {
+        return false;
+      } else {
+        racksVisited.add(rackName);
+        return true;
+      }
+    }
+
+    private boolean visitNode(String rackName) {
+      if (anyVisited || racksVisited.contains(rackName)) {
+        return false;
+      } else {
+        racksWithNodesVisited.add(rackName);
+        return true;
+      }
+    }
+
+    private boolean visit(String resourceName) {
+      if (resourceName.equals(ResourceRequest.ANY)) {
+        return visitAny();
+      }
+
+      List<FSSchedulerNode> nodes =
+          nodeTracker.getNodesByResourceName(resourceName);
+      switch (nodes.size()) {
--- End diff --

I don't love this as a switch.


> FS Preemption is a trickle for severely starved applications
> 

[jira] [Commented] (YARN-6163) FS Preemption is a trickle for severely starved applications

2017-02-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866939#comment-15866939
 ] 

ASF GitHub Bot commented on YARN-6163:
--

Github user templedf commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/192#discussion_r101172059
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/VisitedResourceRequestTracker.java
 ---
@@ -0,0 +1,124 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair;
+
+import org.apache.hadoop.yarn.api.records.Priority;
+import org.apache.hadoop.yarn.api.records.Resource;
+import org.apache.hadoop.yarn.api.records.ResourceRequest;
+import org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker;
+
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+/**
+ * Applications place {@link ResourceRequest}s at multiple levels. This is a
+ * helper class that allows tracking if a {@link ResourceRequest} has been
+ * visited at a different locality level.
+ *
+ * This is implemented for {@link FSAppAttempt#getStarvedResourceRequests()}.
+ * The implementation is not thread-safe.
+ */
+class VisitedResourceRequestTracker {
+  private final Map<Priority, Map<Resource, TrackerPerPriorityResource>> map =
+      new HashMap<>();
+  private final ClusterNodeTracker<FSSchedulerNode> nodeTracker;
+
+  VisitedResourceRequestTracker(
+      ClusterNodeTracker<FSSchedulerNode> nodeTracker) {
+    this.nodeTracker = nodeTracker;
+  }
+
+  /**
+   * Check if the {@link ResourceRequest} is visited before, and track it.
+   * @param rr {@link ResourceRequest} to visit
+   * @return true if rr this is the first visit across all
--- End diff --

Extraneous "this"


> FS Preemption is a trickle for severely starved applications
> 
>
> Key: YARN-6163
> URL: https://issues.apache.org/jira/browse/YARN-6163
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 2.9.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-6163-1.patch, yarn-6163-2.patch
>
>
> With the current logic, only one RR is considered each time an application is 
> marked starved. This marking happens only in the update call that runs every 
> 500ms. Due to this, a severely starved application takes forever to reach its 
> fair share through preemption.
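The trickle described in the JIRA summary can be put into rough numbers. The sketch below is illustrative only: the container counts, the one-match-per-pass assumption, and the helper name `millisToSatisfy` are invented for the example; only the 500ms update interval comes from the description.

```java
// Back-of-the-envelope sketch: if each 500ms update pass satisfies only a
// fixed number of containers, a large starvation drains very slowly.
public class TrickleSketch {
    static long millisToSatisfy(int starvedContainers, int containersPerUpdate,
                                long updateIntervalMs) {
        // ceil-divide: number of update passes needed, times the interval
        long passes = (starvedContainers + containersPerUpdate - 1) / containersPerUpdate;
        return passes * updateIntervalMs;
    }

    public static void main(String[] args) {
        // 200 containers short, one matched per pass, 500ms between passes
        System.out.println(millisToSatisfy(200, 1, 500));   // prints 100000 (100 seconds)
        // Matching many RRs per pass shrinks this dramatically
        System.out.println(millisToSatisfy(200, 50, 500));  // prints 2000
    }
}
```

The point of the patch is exactly to move from the first regime toward the second.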



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6163) FS Preemption is a trickle for severely starved applications

2017-02-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15867224#comment-15867224
 ] 

ASF GitHub Bot commented on YARN-6163:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/192#discussion_r101201549
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestVisitedResourceRequestTracker.java
 ---
@@ -0,0 +1,101 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair;
+
+import org.apache.hadoop.yarn.api.records.Priority;
+import org.apache.hadoop.yarn.api.records.ResourceRequest;
+import org.apache.hadoop.yarn.server.resourcemanager.MockNodes;
+import org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNode;
+import org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker;
+import org.apache.hadoop.yarn.util.resource.Resources;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.util.List;
+
+public class TestVisitedResourceRequestTracker {
+  private final ClusterNodeTracker<FSSchedulerNode> nodeTracker =
+      new ClusterNodeTracker<>();
+  private final ResourceRequest
+      anyRequest, rackRequest, node1Request, node2Request;
+
+  public TestVisitedResourceRequestTracker() {
+    List<RMNode> rmNodes =
+        MockNodes.newNodes(1, 2, Resources.createResource(8192, 8));
+
+    FSSchedulerNode node1 = new FSSchedulerNode(rmNodes.get(0), false);
+    nodeTracker.addNode(node1);
+    node1Request = createRR(node1.getNodeName(), 1);
+
+    FSSchedulerNode node2 = new FSSchedulerNode(rmNodes.get(1), false);
+    node2Request = createRR(node2.getNodeName(), 1);
+    nodeTracker.addNode(node2);
+
+    anyRequest = createRR(ResourceRequest.ANY, 2);
+    rackRequest = createRR(node1.getRackName(), 2);
+  }
+
+  private ResourceRequest createRR(String resourceName, int count) {
+    return ResourceRequest.newInstance(
+        Priority.UNDEFINED, resourceName, Resources.none(), count);
+  }
+
+  @Test
+  public void testVisitAnyRequestFirst() {
+    VisitedResourceRequestTracker tracker =
+        new VisitedResourceRequestTracker(nodeTracker);
+
+    // Visit ANY request first
+    Assert.assertTrue(tracker.visit(anyRequest));
--- End diff --

I usually insist on a message. In this case, I found it hard to come up 
with a message that adds value without looking at the code. And, the comments 
in the code call out the expectations clearly. 

If you absolutely insist, I can add the messages. 
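To illustrate the trade-off being discussed, here is a minimal, JUnit-free sketch of message-bearing assertions. The `assertTrue` helper below is a hypothetical stand-in for `org.junit.Assert.assertTrue(String, boolean)`, and the messages are invented for the example.

```java
// Demonstrates why a message on an assertion pays off: when the assertion
// fails, the test log explains itself without a trip to the source code.
public class AssertMessageDemo {
    // Hypothetical stand-in for org.junit.Assert.assertTrue(String, boolean)
    static void assertTrue(String message, boolean condition) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    static String run() {
        StringBuilder log = new StringBuilder();
        // A passing assertion: the message costs nothing
        assertTrue("first visit of the ANY request should succeed", true);
        log.append("ok");
        try {
            // A failing assertion: the message makes the failure self-explanatory
            assertTrue("second visit of the same RR should be rejected", false);
        } catch (AssertionError e) {
            log.append("; failed: ").append(e.getMessage());
        }
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

When the in-test comments already state the expectation, as argued above, the message mostly duplicates them; the value shows up only in failure logs read without the source at hand.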


> FS Preemption is a trickle for severely starved applications
> 
>
> Key: YARN-6163
> URL: https://issues.apache.org/jira/browse/YARN-6163
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 2.9.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-6163-1.patch, yarn-6163-2.patch
>
>
> With the current logic, only one RR is considered each time an application is 
> marked starved. This marking happens only in the update call that runs every 
> 500ms. Due to this, a severely starved application takes forever to reach its 
> fair share through preemption.






[jira] [Commented] (YARN-6163) FS Preemption is a trickle for severely starved applications

2017-02-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15867180#comment-15867180
 ] 

ASF GitHub Bot commented on YARN-6163:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/192#discussion_r101199897
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/VisitedResourceRequestTracker.java
 ---
@@ -0,0 +1,124 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair;
+
+import org.apache.hadoop.yarn.api.records.Priority;
+import org.apache.hadoop.yarn.api.records.Resource;
+import org.apache.hadoop.yarn.api.records.ResourceRequest;
+import org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker;
+
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+/**
+ * Applications place {@link ResourceRequest}s at multiple levels. This is a
+ * helper class that allows tracking if a {@link ResourceRequest} has been
+ * visited at a different locality level.
+ *
+ * This is implemented for {@link FSAppAttempt#getStarvedResourceRequests()}.
+ * The implementation is not thread-safe.
+ */
+class VisitedResourceRequestTracker {
+  private final Map<Priority, Map<Resource, TrackerPerPriorityResource>> map =
+      new HashMap<>();
+  private final ClusterNodeTracker<FSSchedulerNode> nodeTracker;
+
+  VisitedResourceRequestTracker(
+      ClusterNodeTracker<FSSchedulerNode> nodeTracker) {
+    this.nodeTracker = nodeTracker;
+  }
+
+  /**
+   * Check if the {@link ResourceRequest} is visited before, and track it.
+   * @param rr {@link ResourceRequest} to visit
+   * @return true if rr this is the first visit across all
--- End diff --

Fixed. 


> FS Preemption is a trickle for severely starved applications
> 
>
> Key: YARN-6163
> URL: https://issues.apache.org/jira/browse/YARN-6163
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 2.9.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-6163-1.patch, yarn-6163-2.patch
>
>
> With the current logic, only one RR is considered each time an application is 
> marked starved. This marking happens only in the update call that runs every 
> 500ms. Due to this, a severely starved application takes forever to reach its 
> fair share through preemption.






[jira] [Commented] (YARN-6163) FS Preemption is a trickle for severely starved applications

2017-02-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15867178#comment-15867178
 ] 

ASF GitHub Bot commented on YARN-6163:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/192#discussion_r101199879
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
 ---
@@ -247,23 +247,23 @@ private Resource updateStarvedAppsFairshare(
* @param minShareStarvation minshare starvation to distribute
*/
   private void updateStarvedAppsMinshare(
-      TreeSet<FSAppAttempt> appsWithDemand, Resource minShareStarvation) {
+      TreeSet<FSAppAttempt> appsWithDemand, final Resource minShareStarvation) {
--- End diff --

Fixed


> FS Preemption is a trickle for severely starved applications
> 
>
> Key: YARN-6163
> URL: https://issues.apache.org/jira/browse/YARN-6163
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 2.9.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-6163-1.patch, yarn-6163-2.patch
>
>
> With the current logic, only one RR is considered each time an application is 
> marked starved. This marking happens only in the update call that runs every 
> 500ms. Due to this, a severely starved application takes forever to reach its 
> fair share through preemption.






[jira] [Commented] (YARN-6163) FS Preemption is a trickle for severely starved applications

2017-02-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15867177#comment-15867177
 ] 

ASF GitHub Bot commented on YARN-6163:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/192#discussion_r101199875
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
 ---
@@ -1147,24 +1147,32 @@ private static boolean checkAndMarkRRVisited(
* starvation.
*/
   List<ResourceRequest> getStarvedResourceRequests() {
+    // List of RRs we build in this method to return
     List<ResourceRequest> ret = new ArrayList<>();
+
+    // Track visited RRs to avoid the same RR at multiple locality levels
     Map visitedRRs = new HashMap<>();
 
+    // Start with current starvation and track the pending amount
     Resource pending = getStarvation();
     for (ResourceRequest rr : appSchedulingInfo.getAllResourceRequests()) {
       if (Resources.isNone(pending)) {
+        // Found enough RRs to match the starvation
         break;
       }
+
+      // See if we have already seen this RR
       if (checkAndMarkRRVisited(visitedRRs, rr)) {
         continue;
       }
 
-      // Compute the number of containers of this capability that fit in the
-      // pending amount
+      // A RR can have multiple containers of a capability. We need to
+      // compute the number of containers that fit in "pending".
       int ratio = (int) Math.floor(
--- End diff --

Fixed.
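The `Math.floor(...)` expression is truncated in the quoted diff. As a hedged reconstruction of the idea only (the real code presumably goes through Hadoop's resource calculator, e.g. `Resources.ratio`; the method and parameter names below are invented), a container fits only if every resource dimension fits, so the ratio is the minimum over dimensions before flooring:

```java
// Sketch of "how many containers of this capability fit in pending":
// take the per-dimension ratios, keep the smallest, and floor it.
public class ContainerRatio {
    static int containersThatFit(long pendingMem, int pendingVcores,
                                 long capMem, int capVcores) {
        // A container fits only if both memory and vcores fit,
        // so the binding dimension is the one with the smaller ratio.
        double memRatio = (double) pendingMem / capMem;
        double vcoreRatio = (double) pendingVcores / capVcores;
        return (int) Math.floor(Math.min(memRatio, vcoreRatio));
    }

    public static void main(String[] args) {
        // 10 GB / 7 vcores pending; each container asks for 2 GB / 2 vcores.
        // Memory ratio is 5.0, vcore ratio is 3.5, so 3 containers fit.
        System.out.println(containersThatFit(10240, 7, 2048, 2)); // prints 3
    }
}
```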


> FS Preemption is a trickle for severely starved applications
> 
>
> Key: YARN-6163
> URL: https://issues.apache.org/jira/browse/YARN-6163
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 2.9.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-6163-1.patch, yarn-6163-2.patch
>
>
> With the current logic, only one RR is considered each time an application is 
> marked starved. This marking happens only in the update call that runs every 
> 500ms. Due to this, a severely starved application takes forever to reach its 
> fair share through preemption.






[jira] [Commented] (YARN-6163) FS Preemption is a trickle for severely starved applications

2017-02-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15867215#comment-15867215
 ] 

ASF GitHub Bot commented on YARN-6163:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/192#discussion_r101201228
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/VisitedResourceRequestTracker.java
 ---
@@ -0,0 +1,124 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair;
+
+import org.apache.hadoop.yarn.api.records.Priority;
+import org.apache.hadoop.yarn.api.records.Resource;
+import org.apache.hadoop.yarn.api.records.ResourceRequest;
+import org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker;
+
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+/**
+ * Applications place {@link ResourceRequest}s at multiple levels. This is a
+ * helper class that allows tracking if a {@link ResourceRequest} has been
+ * visited at a different locality level.
+ *
+ * This is implemented for {@link FSAppAttempt#getStarvedResourceRequests()}.
+ * The implementation is not thread-safe.
+ */
+class VisitedResourceRequestTracker {
+  private final Map<Priority, Map<Resource, TrackerPerPriorityResource>> map =
+      new HashMap<>();
+  private final ClusterNodeTracker<FSSchedulerNode> nodeTracker;
+
+  VisitedResourceRequestTracker(
+      ClusterNodeTracker<FSSchedulerNode> nodeTracker) {
+    this.nodeTracker = nodeTracker;
+  }
+
+  /**
+   * Check if the {@link ResourceRequest} is visited before, and track it.
+   * @param rr {@link ResourceRequest} to visit
+   * @return true if rr this is the first visit across all
+   * locality levels, false otherwise
+   */
+  boolean visit(ResourceRequest rr) {
+    Priority priority = rr.getPriority();
+    Resource capability = rr.getCapability();
+
+    Map<Resource, TrackerPerPriorityResource> subMap = map.get(priority);
+    if (subMap == null) {
+      subMap = new HashMap<>();
+      map.put(priority, subMap);
+    }
+
+    TrackerPerPriorityResource tracker = subMap.get(capability);
+    if (tracker == null) {
+      tracker = new TrackerPerPriorityResource();
+      subMap.put(capability, tracker);
+    }
+
+    return tracker.visit(rr.getResourceName());
+  }
+
+  private class TrackerPerPriorityResource {
+    private Set<String> racksWithNodesVisited = new HashSet<>();
+    private Set<String> racksVisited = new HashSet<>();
+    private boolean anyVisited;
+
+    private boolean visitAny() {
+      if (racksVisited.isEmpty() && racksWithNodesVisited.isEmpty()) {
+        anyVisited = true;
+      }
+      return anyVisited;
+    }
+
+    private boolean visitRack(String rackName) {
+      if (anyVisited || racksWithNodesVisited.contains(rackName)) {
+        return false;
+      } else {
+        racksVisited.add(rackName);
+        return true;
+      }
+    }
+
+    private boolean visitNode(String rackName) {
+      if (anyVisited || racksVisited.contains(rackName)) {
+        return false;
+      } else {
+        racksWithNodesVisited.add(rackName);
+        return true;
+      }
+    }
+
+    private boolean visit(String resourceName) {
+      if (resourceName.equals(ResourceRequest.ANY)) {
+        return visitAny();
+      }
+
+      List<FSSchedulerNode> nodes =
+          nodeTracker.getNodesByResourceName(resourceName);
+      switch (nodes.size()) {
--- End diff --

To me, this is one of those rare cases where you could actually use a switch. 
In any case, I changed it to an if.
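The if-based rewrite mentioned above is not shown in the thread. Below is a self-contained sketch of what it could look like, with the node lookup replaced by a caller-supplied `matchCount` and the visited-sets inlined; the method names mirror the diff, but the harness (`ANY`, `matchCount`, `rackOfName`, `main`) is invented for the example.

```java
import java.util.HashSet;
import java.util.Set;

// Simplified stand-in for TrackerPerPriorityResource: the three-way
// nodes.size() dispatch rewritten as if/else guards.
public class VisitDispatchSketch {
    static final String ANY = "*";  // stand-in for ResourceRequest.ANY
    private final Set<String> racksWithNodesVisited = new HashSet<>();
    private final Set<String> racksVisited = new HashSet<>();
    private boolean anyVisited;

    private boolean visitAny() {
        // ANY is a first visit only if no rack or node level was seen yet
        if (racksVisited.isEmpty() && racksWithNodesVisited.isEmpty()) {
            anyVisited = true;
        }
        return anyVisited;
    }

    private boolean visitRack(String rack) {
        if (anyVisited || racksWithNodesVisited.contains(rack)) {
            return false;
        }
        racksVisited.add(rack);
        return true;
    }

    private boolean visitNode(String rack) {
        if (anyVisited || racksVisited.contains(rack)) {
            return false;
        }
        racksWithNodesVisited.add(rack);
        return true;
    }

    // matchCount stands in for nodeTracker.getNodesByResourceName(name).size()
    boolean visit(String name, int matchCount, String rackOfName) {
        if (ANY.equals(name)) {
            return visitAny();
        }
        if (matchCount == 0) {
            return false;                  // unknown name; real code would log an error
        } else if (matchCount == 1) {
            return visitNode(rackOfName);  // exactly one match: a node
        } else {
            return visitRack(rackOfName);  // several matches: a rack
        }
    }

    public static void main(String[] args) {
        VisitDispatchSketch t = new VisitDispatchSketch();
        System.out.println(t.visit("node1", 1, "rack1"));  // true: first visit on rack1
        System.out.println(t.visit("rack1", 2, "rack1"));  // false: a node on rack1 was visited
        System.out.println(t.visit("*", 0, null));         // false: lower levels already visited
    }
}
```

With this structure the empty/one/many cases read as ordinary guards, which is where the review landed.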
