yangwwei commented on a change in pull request #173:
URL: https://github.com/apache/incubator-yunikorn-core/pull/173#discussion_r447400394



##########
File path: pkg/scheduler/scheduling_application.go
##########
@@ -418,12 +431,34 @@ func (sa *SchedulingApplication) tryAllocate(headRoom *resources.Resource, ctx *
        for _, request := range sa.sortedRequests {
                // resource must fit in headroom otherwise skip the request
                if !resources.FitIn(headRoom, request.AllocatedResource) {
+                       // if the queue (or any of its parents) has a max capacity defined,
+                       // get the max headroom, this represents the configured queue quota.
+                       // if the queue quota is enough but the headroom is not, usually this means
+                       // the cluster needs to scale up to meet its capacity.
+                       maxHeadRoom := sa.queue.getMaxHeadRoom()
+                       if resources.FitIn(maxHeadRoom, request.AllocatedResource) {
+                               sa.updateContainerSchedulingStateIfNeeded(request,
+                                       si.UpdateContainerSchedulingStateRequest_FAILED,
+                                       "failed to schedule the request because partition resource is not enough")
+                       }
+                       // skip the request
                        continue
                }
                if nodeIterator := ctx.getNodeIterator(); nodeIterator != nil {
                        alloc := sa.tryNodes(request, nodeIterator)
-                       // have a candidate return it
-                       if alloc != nil {
+                       if alloc == nil {
+                               // we have enough headroom, but we could not find a node for this request,
+                               // this can happen when none of the nodes is qualified for this request,
+                               // i.e. no node satisfies both conditions:
+                               //   1) node has enough resources;
+                               //   2) node satisfies all placement constraints of the request (e.g. predicates)

Review comment:
       There are a few issues if we only trigger auto-scaling based on reservations:
   - The number of compute nodes might be 0, which means there are no reservations at all in the initial phase.
   - On a small cluster, this approach will dramatically slow down the pace of auto-scaling. If we have lots of pods pending, it may only trigger auto-scaling one node at a time (see the sketch below).
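
A minimal, self-contained Go sketch of the alternative (hypothetical names throughout; `Resource`, `fitIn` and `reportScaleUpCandidates` are simplified stand-ins for `resources.Resource`, `resources.FitIn` and the shim notification, not the YuniKorn code): reporting every pending request that fits the configured quota (max headroom) but not the current headroom surfaces the whole backlog in one pass, even on a cluster with 0 nodes, instead of one reservation (one node) at a time.

```go
package main

import "fmt"

// Resource is a simplified stand-in for resources.Resource (hypothetical).
type Resource struct {
	Memory int64
	VCore  int64
}

// fitIn mimics resources.FitIn: does "larger" have room for "smaller"?
func fitIn(larger, smaller Resource) bool {
	return smaller.Memory <= larger.Memory && smaller.VCore <= larger.VCore
}

// reportScaleUpCandidates counts the pending requests that fit the configured
// quota (maxHeadRoom) but not the current headroom. In a real scheduler each
// such request would be reported to the shim so the autoscaler can size the
// scale-up from the full backlog, not one reservation at a time.
func reportScaleUpCandidates(headRoom, maxHeadRoom Resource, pending []Resource) int {
	candidates := 0
	for _, ask := range pending {
		if !fitIn(headRoom, ask) && fitIn(maxHeadRoom, ask) {
			candidates++
		}
	}
	return candidates
}

func main() {
	// Empty cluster: no nodes means no reservations are possible, but the
	// queue quota still has room, so all pending asks are surfaced at once.
	headRoom := Resource{Memory: 0, VCore: 0}
	maxHeadRoom := Resource{Memory: 64000, VCore: 32}
	pending := []Resource{
		{Memory: 4000, VCore: 2},
		{Memory: 4000, VCore: 2},
		{Memory: 8000, VCore: 4},
	}
	fmt.Printf("pending requests to report for scale-up: %d\n",
		reportScaleUpCandidates(headRoom, maxHeadRoom, pending))
}
```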




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

