[ 
https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960541#comment-13960541
 ] 

Jason Lowe commented on YARN-1769:
----------------------------------

The patch no longer applies cleanly after YARN-1512.  Other comments on the 
patch:

- Nit: In LeafQueue.assignToQueue we could cache Resource.add(usedResources, 
required) in a local when we're computing potentialNewCapacity so we don't have 
to recompute it as part of the potentialNewWithoutReservedCapacity computation
- LeafQueue.assignToQueue and LeafQueue.assignToUser don't seem to need the new 
priority argument, and therefore LeafQueue.checkLimitsToReserve wouldn't seem 
to need it either once those others are updated.
- Should FiCaSchedulerApp getAppToUnreserve really be called 
getNodeIdToUnreserve or geNodeToUnreserve, since it's returning a node ID 
rather than an app?
- In LeafQueue.findNodeToUnreserve, isn't it kinda bad if the app thinks it has 
reservations on the node but the scheduler doesn't know about it?  Wondering if 
the bookkeeping is messed up at that point therefore something a bit more than 
debug is an appropriate log level and if further fixup is needed.
- LeafQueue.findNodeToUnreserve is adjusting the headroom when it unreserves, 
but I don't see other unreservations doing a similar calculation.  Wondering if 
this fixup is something that should have been in completedContainer or needs to 
be done elsewhere?  I could easily be missing something here but asking just in 
case other unreservation situations also need to have the headroom fixed.
- LeafQueue.assignContainer uses the much more expensive 
scheduler.getConfiguration().getReservationContinueLook() when it should be 
able to use the reservationsContinueLooking member instead.
- LeafQueue.getReservationContinueLooking should be package private
- Nit: LeafQueue.assignContainer has some reformatting of the log message after 
the "// Inform the node" comment which was clearer to read/maintain before 
since the label and the value were always on a line by themselves.  Same goes 
for the "Reserved container" log towards the end of the method.
- Ultra-Nit: ParentQueue.setupQueueConfig's log message should have the 
reservationsContinueLooking on the previous line to match the style of other 
label/value pairs in the log message.
- ParentQueue.getReservationContinueLooking should be package private.

> CapacityScheduler:  Improve reservations
> ----------------------------------------
>
>                 Key: YARN-1769
>                 URL: https://issues.apache.org/jira/browse/YARN-1769
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacityscheduler
>    Affects Versions: 2.3.0
>            Reporter: Thomas Graves
>            Assignee: Thomas Graves
>         Attachments: YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, 
> YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch
>
>
> Currently the CapacityScheduler uses reservations in order to handle requests 
> for large containers and the fact there might not currently be enough space 
> available on a single host.
> The current algorithm for reservations is to reserve as many containers as 
> currently required and then it will start to reserve more above that after a 
> certain number of re-reservations (currently biased against larger 
> containers).  Anytime it hits the limit of number reserved it stops looking 
> at any other nodes. This results in potentially missing nodes that have 
> enough space to fullfill the request.   
> The other place for improvement is currently reservations count against your 
> queue capacity.  If you have reservations you could hit the various limits 
> which would then stop you from looking further at that node.  
> The above 2 cases can cause an application requesting a larger container to 
> take a long time to gets it resources.  
> We could improve upon both of those by simply continuing to look at incoming 
> nodes to see if we could potentially swap out a reservation for an actual 
> allocation. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to