[jira] [Commented] (YARN-4280) CapacityScheduler reservations may not prevent indefinite postponement on a busy cluster

Jason Lowe (JIRA) Wed, 27 Jul 2016 12:22:48 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15396200#comment-15396200
 ]


Jason Lowe commented on YARN-4280:
----------------------------------

Thanks for updating the patch, Kuhu!

The copy constructor for ResourceLimits doesn't copy all of the fields, 
specifically allowPreempt is missing.  Intentional?  If so, maybe an alternate 
method that generates a new ResourceLimit should be used rather than what looks 
like a copy constructor?  Otherwise it's going to be confusing to others that 
assume it really is a full-blown copy of all fields or whether or not the copy 
constructor should be updated when a new field is added to the class.

Nit: Adding a "_SKIPPED" suffix to every enum is somewhat redundant when the 
enum type is SkippedType.  These could be simplified to NONE, QUEUE_LIMIT, and 
OTHER.

Similar copy constructor comment for CSAssignment since increaseAllocation and 
containersToKill are not copied.

The debug log statement in ParentQueue needs to be wrapped with a log debug 
check otherwise we'll always do the expensive string construction even when we 
won't log it.

Do we realy want the folllowing to execute when the assignments skipped type is 
OTHER_SKIPPED or the assignment is not the null assigment?  Seems like if there 
is a real assignment of resources we should return it without overriding the 
skip type.  Or am I missing a scenario where we need to set QUEUE_LIMIT_SKIPPED 
even when there's a real assignment in a sibling queue?
{code}
    if(!skippedType.equals(assignment.getSkippedType())) {
      //Make a copy of the assignment to avoid changing when
      // returned assignment == NULL_ASSIGNMENT
      CSAssignment csAssignment = new CSAssignment(assignment);
      csAssignment.setSkippedType(skippedType);
      return csAssignment;
    }
{code}

Why does MockNodes have a new static method that creates a Resource?  Seems to 
have nothing to do with the rest of the class and creates a maintenance burden 
when the Resource class gets updated in the future. Anyone who needs a Resource 
should be calling Resource.newInstance.




> CapacityScheduler reservations may not prevent indefinite postponement on a 
> busy cluster
> ----------------------------------------------------------------------------------------
>
>                 Key: YARN-4280
>                 URL: https://issues.apache.org/jira/browse/YARN-4280
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>    Affects Versions: 2.6.1, 2.8.0, 2.7.1
>            Reporter: Kuhu Shukla
>            Assignee: Kuhu Shukla
>         Attachments: YARN-4280-branch-2.009.patch, 
> YARN-4280-branch-2.8.001.patch, YARN-4280-branch-2.8.002.patch, 
> YARN-4280-branch-2.8.003.patch, YARN-4280.001.patch, YARN-4280.002.patch, 
> YARN-4280.003.patch, YARN-4280.004.patch, YARN-4280.005.patch, 
> YARN-4280.006.patch, YARN-4280.007.patch, YARN-4280.008.patch, 
> YARN-4280.008_.patch, YARN-4280.009.patch
>
>
> Consider the following scenario:
> There are 2 queues A(25% of the total capacity) and B(75%), both can run at 
> total cluster capacity. There are 2 applications, appX that runs on Queue A, 
> always asking for 1G containers(non-AM) and appY runs on Queue B asking for 2 
> GB containers.
> The user limit is high enough for the application to reach 100% of the 
> cluster resource. 
> appX is running at total cluster capacity, full with 1G containers releasing 
> only one container at a time. appY comes in with a request of 2GB container 
> but only 1 GB is free. Ideally, since appY is in the underserved queue, it 
> has higher priority and should reserve for its 2 GB request. Since this 
> request puts the alloc+reserve above total capacity of the cluster, 
> reservation is not made. appX comes in with a 1GB request and since 1GB is 
> still available, the request is allocated. 
> This can continue indefinitely causing priority inversion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (YARN-4280) CapacityScheduler reservations may not prevent indefinite postponement on a busy cluster

Reply via email to