pbacsko commented on code in PR #1043:
URL: https://github.com/apache/yunikorn-core/pull/1043#discussion_r2480506675
##########
pkg/scheduler/objects/application.go:
##########
@@ -1460,6 +1469,171 @@ func (sa *Application) tryNodesNoReserve(ask *Allocation, iterator NodeIterator,
// Try all the nodes for a request. The resultType is an allocation or reservation of a node.
// New allocations can only be reserved after a delay.
+func (sa *Application) tryNodesInParallel(ask *Allocation, iterator NodeIterator, tryNodesThreadCount int) *AllocationResult { //nolint:funlen
+ var nodeToReserve *Node
+ scoreReserved := math.Inf(1)
+ allocKey := ask.GetAllocationKey()
+ reserved := sa.reservations[allocKey]
+ var allocResult *AllocationResult
+ var predicateErrors map[string]int
+
+ var mu sync.Mutex
+
+ // Channel to signal completion
+ done := make(chan struct{})
+ defer close(done)
+
+ // Function to process each batch
+ processBatch := func(batch []*Node) {
+ var wg sync.WaitGroup
+ semaphore := make(chan struct{}, tryNodesThreadCount)
+ candidateNodes := make([]*Node, len(batch))
+ errors := make([]error, len(batch))
+
+ for idx, node := range batch {
+ wg.Add(1)
+ semaphore <- struct{}{}
+ go func(idx int, node *Node) {
+ defer wg.Done()
+ defer func() { <-semaphore }()
+ dryRunResult, err := sa.tryNodeDryRun(node, ask)
+
+ mu.Lock()
+ defer mu.Unlock()
+ if err != nil {
+ errors[idx] = err
+ } else if dryRunResult != nil {
+ candidateNodes[idx] = node
+ }
+ }(idx, node)
+ }
Review Comment:
Interesting, but I do have a concern with this approach. What if, in a large cluster (e.g. 5000 nodes), we have unschedulable pods? In that case we'd create 5000 goroutines for a single request in every scheduling cycle. With 10 unschedulable pods, that's 50,000 goroutines per cycle, and with cycles running roughly 10 times per second, around 500k goroutines per second overall.
Goroutines are cheap, but not free.
This might be an extreme case, but we have to think about extremes, even if they're less common.
I'd definitely think about some sort of pooling solution: essentially worker goroutines which are always running and waiting for asks to evaluate. It shouldn't be hard to implement; a rough sketch follows.
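Something along these lines (a minimal sketch only, assuming it lives next to `Node` and `Allocation` in the `objects` package, which already imports `sync`; the `nodeEvalPool` and `evalRequest` names are made up for illustration, not existing code):

```go
// evalRequest asks the pool to evaluate one node for one ask (illustrative type).
type evalRequest struct {
	node   *Node
	ask    *Allocation
	result chan<- *Node // receives the node if it is a viable candidate, nil otherwise
}

// nodeEvalPool owns a fixed set of worker goroutines started once at
// scheduler startup and reused across cycles, instead of spawning one
// goroutine per node per ask.
type nodeEvalPool struct {
	requests chan evalRequest
	wg       sync.WaitGroup
}

func newNodeEvalPool(workers int, tryNode func(*Node, *Allocation) bool) *nodeEvalPool {
	p := &nodeEvalPool{requests: make(chan evalRequest)}
	for i := 0; i < workers; i++ {
		p.wg.Add(1)
		go func() {
			defer p.wg.Done()
			// Workers block here between cycles; no goroutines are created
			// or destroyed on the scheduling hot path.
			for req := range p.requests {
				if tryNode(req.node, req.ask) {
					req.result <- req.node
				} else {
					req.result <- nil
				}
			}
		}()
	}
	return p
}

// stop closes the request channel and waits for the workers to drain.
func (p *nodeEvalPool) stop() {
	close(p.requests)
	p.wg.Wait()
}
```

The caller would submit one `evalRequest` per node with a result channel buffered to the batch size, so workers never block on send; the goroutine count then stays constant no matter how many unschedulable asks are retried per cycle.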
Anyway, I do have a simple test case which checks performance in the shim under `pkg/shim/scheduling_perf_test.go`. It's called `BenchmarkSchedulingThroughPut()`. This could be modified to submit unschedulable pods (e.g. ones with a node selector that never matches) to see how it affects performance; a sketch of such a pod is below.
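For the never-matching pods, something like this would do (a sketch only, assuming the test builds pods via `k8s.io/api/core/v1` and `k8s.io/apimachinery/pkg/apis/meta/v1`; the `unschedulablePod` helper and the label key are hypothetical):

```go
// unschedulablePod builds a pod whose nodeSelector no node can satisfy,
// so the scheduler re-evaluates it against every node in every cycle.
func unschedulablePod(name string) *v1.Pod {
	return &v1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: name},
		Spec: v1.PodSpec{
			SchedulerName: "yunikorn", // route the pod to the YuniKorn scheduler
			NodeSelector: map[string]string{
				"no-such-label": "never-matches", // no node carries this label
			},
			Containers: []v1.Container{
				{Name: "sleep", Image: "alpine:latest"},
			},
		},
	}
}
```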