[
https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16484674#comment-16484674
]
Jason Lowe commented on YARN-8292:
----------------------------------
Thanks for updating the patch!
Why does isAnyMajorResourceZeroOrNegative explicitly use a floating point zero
constant and force the implicit conversion of the getMemorySize() result from a
long to a float? This is done in a few other places in
DefaultResourceCalculator and they all seem like wasteful conversions to me.
The logger that was added to AbstractPreemptableResourceCalculator is not used.
Also I'm curious why commons logging was used here instead of SLF4J.
stepFactor is a constant that should be precomputed in the
AbstractPreemptableResourceCalculator constructor rather than computing it from
scratch each time.
Do we really want to use Resources.lessThanOrEqual(rc, totGuarant, unassigned,
none) here? For DRF that requires computing shares in each resource dimension
for both resources which is relatively expensive. I think
Resources.fitsIn(unassigned, none) is more along what what is called for here
(although fitsIn does some unit checking and conversions we don't want either).
Really what we want is something like a isAnyMajorResourceRequested() which
returns true if any resource dimension is > 0. Not a fan of the proposed
method name, but hopefully it gets across what I'm talking about here. Of
course if we're going to always componentwiseMax unassigned with
Resources.none() to make sure no resource dimension in unassigned can ever go
negative then the check can be simplified to if
(Resources.none().equals(unassigned)).
Similar "do we really want a full DRF comparison here" comment for the
Resources.greaterThan(rc, clusterResource, toObtainByPartition,
Resources.none()) check and the Resources.lessThan check that occurs a bit
later.
The comment says:
{code}
* When true:
* stop preempt container when any resource type < 0 for to-
* preempt.
{code}
but the code will stop preempting if any resource dimension <= 0 since it does:
{code}
if (conservativeDRF) {
doPreempt = !Resources.isAnyMajorResourceZeroOrNegative(rc,
toObtainByPartition);
{code}
I agree with Eric that this essentially means conservativeDRF is badly broken
if there is a resource dimension that is not requested by every container, and
that raises the question of whether it makes sense to make conservativeDRF the
default.
It would be good to cleanup the unused imports as flagged by checkstyle.
> Fix the dominant resource preemption cannot happen when some of the resource
> vector becomes negative
> ----------------------------------------------------------------------------------------------------
>
> Key: YARN-8292
> URL: https://issues.apache.org/jira/browse/YARN-8292
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Reporter: Sumana Sathish
> Assignee: Wangda Tan
> Priority: Critical
> Attachments: YARN-8292.001.patch, YARN-8292.002.patch,
> YARN-8292.003.patch, YARN-8292.004.patch, YARN-8292.005.patch,
> YARN-8292.006.patch
>
>
> This is an example of the problem:
>
> {code}
> // guaranteed, max, used, pending
> "root(=[30:18:6 30:18:6 12:12:6 1:1:1]);" + //root
> "-a(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // a
> "-b(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // b
> "-c(=[10:6:2 10:6:2 0:0:0 1:1:1])"; // c
> {code}
> There're 3 resource types. Total resource of the cluster is 30:18:6
> For both of a/b, there're 3 containers running, each of container is 2:2:1.
> Queue c uses 0 resource, and have 1:1:1 pending resource.
> Under existing logic, preemption cannot happen.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]