wilfred-s commented on a change in pull request #173:
URL: https://github.com/apache/incubator-yunikorn-core/pull/173#discussion_r444944133



##########
File path: pkg/scheduler/scheduling_application.go
##########
@@ -418,12 +431,34 @@ func (sa *SchedulingApplication) tryAllocate(headRoom *resources.Resource, ctx *
        for _, request := range sa.sortedRequests {
                // resource must fit in headroom otherwise skip the request
                if !resources.FitIn(headRoom, request.AllocatedResource) {
+                       // if the queue (or any of its parents) has a max capacity defined,
+                       // get the max headroom, this represents the configured queue quota.
+                       // if the queue quota is enough, but the headroom is not, this usually means
+                       // the cluster needs to scale up to meet its capacity.
+                       maxHeadRoom := sa.queue.getMaxHeadRoom()
+                       if resources.FitIn(maxHeadRoom, request.AllocatedResource) {
+                               sa.updateContainerSchedulingStateIfNeeded(request,
+                                       si.UpdateContainerSchedulingStateRequest_FAILED,
+                                       "failed to schedule the request because partition resource is not enough")
+                       }
+                       // skip the request
                        continue
                }
                if nodeIterator := ctx.getNodeIterator(); nodeIterator != nil {
                        alloc := sa.tryNodes(request, nodeIterator)
-                       // have a candidate return it
-                       if alloc != nil {
+                       if alloc == nil {
+                               // we have enough headroom, but we could not find a node for this request,
+                               // this can happen when none of the nodes qualifies for this request
+                               // by satisfying both conditions:
+                               //   1) node has enough resources;
+                               //   2) node satisfies all placement constraints of the request (e.g. predicates)
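
Editor's note: the key distinction in the hunk above is between the current headroom (what is free right now under the queue and parent limits) and the max headroom (the configured quota, ignoring current usage). A minimal, self-contained sketch of that check, assuming only the `FitIn` and `NewResourceFromMap` helpers from this repository's `resources` package; the resource values are invented for illustration:

```
// Sketch only, not part of the PR: values are made up, only FitIn and
// NewResourceFromMap from pkg/common/resources are assumed to exist.
package main

import (
	"fmt"

	"github.com/apache/incubator-yunikorn-core/pkg/common/resources"
)

func main() {
	// headroom: what is currently free under the queue (and parent) limits
	headRoom := resources.NewResourceFromMap(
		map[string]resources.Quantity{"vcore": 1000, "memory": 1024})
	// max headroom: the configured quota, ignoring what is currently used
	maxHeadRoom := resources.NewResourceFromMap(
		map[string]resources.Quantity{"vcore": 8000, "memory": 16384})
	// the pending ask
	ask := resources.NewResourceFromMap(
		map[string]resources.Quantity{"vcore": 4000, "memory": 4096})

	// the ask does not fit the current headroom...
	fmt.Println(resources.FitIn(headRoom, ask)) // false
	// ...but it does fit the quota: this is the case the new code flags as
	// FAILED so the shim can trigger a cluster scale up.
	fmt.Println(resources.FitIn(maxHeadRoom, ask)) // true
}
```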

Review comment:
       In this case we might not have been able to allocate or reserve due to affinity. Scaling the cluster up will not help. This is the case that, in earlier discussions, we tried to prevent from causing the cluster to scale up.
   
   In a smaller cluster we might have already reserved all nodes and thus returned no allocation. Those reservations would have already triggered a scale up.
   I think we need to be conservative in this case and only scale up on a reservation:
   ```
        if alloc.result == reserved {
                sa.updateContainerSchedulingState(...)
        }
        return alloc
   ```
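   
   Editor's note: to make the suggestion above concrete, here is a hedged sketch of how it could slot into the loop body of `tryAllocate`. It is a fragment, not a compilable unit; the `reserved` result value and the helper names are taken from the snippets quoted in this thread and may not match the final code:
   ```
	if nodeIterator := ctx.getNodeIterator(); nodeIterator != nil {
		alloc := sa.tryNodes(request, nodeIterator)
		if alloc != nil {
			// only ask the shim to scale up when the best we could do was
			// reserve: a reservation means no node currently has room, so a
			// new node helps. A pure affinity/predicate mismatch would not
			// be fixed by adding another (identical) node.
			if alloc.result == reserved {
				sa.updateContainerSchedulingStateIfNeeded(request,
					si.UpdateContainerSchedulingStateRequest_FAILED,
					"request could only be reserved, waiting for the cluster to scale up")
			}
			// have a candidate (allocation or reservation): return it
			return alloc
		}
	}
   ```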




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

