[ 
https://issues.apache.org/jira/browse/YUNIKORN-2270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796508#comment-17796508
 ] 

Weiwei Yang commented on YUNIKORN-2270:
---------------------------------------

Based on my investigation, I think the issue is caused by these lines of code: 
https://github.com/apache/yunikorn-core/blob/620687afe10638d3e191edffbc81959985abbbb4/pkg/scheduler/objects/preemption.go#L576-L587.
In my case, all GPUs are in use and there are 300 pods pending in other 
queues, so the headroom for GPU is always 0. As a result, the scheduler never 
reaches the reserve code and instead takes the branch that logs: "Preempting 
allocations for ask, but not reserving yet as queue is still above capacity". 
The asks in queue a are therefore marked as having triggered preemption, but 
they are unable to get the preempted resources.
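
To illustrate the branch I am describing, here is a minimal sketch. It is not 
the actual yunikorn-core code: the names decide, reserveNode and preemptOnly 
are hypothetical, and headroom is simplified to "cluster GPUs minus allocated 
GPUs", which is an assumption made only for this example.

```go
package main

import "fmt"

// decision is a hypothetical label for what happens to an ask after
// preemption victims have been selected.
type decision string

const (
	reserveNode decision = "preempt and reserve"
	preemptOnly decision = "preempt without reserving"
)

// decide mimics the behaviour described in this comment: the ask is only
// reserved when the headroom for the resource covers the ask. Otherwise
// preemption is triggered but nothing holds the freed resources for the ask.
func decide(headroomGPU, askGPU int) decision {
	if headroomGPU >= askGPU {
		return reserveNode
	}
	return preemptOnly
}

func main() {
	// Scenario from this issue: 300 GPUs in the cluster, all allocated to
	// pods in root.b, and each pending pod in root.a asks for 1 GPU.
	clusterGPU, allocatedGPU := 300, 300
	headroom := clusterGPU - allocatedGPU // simplified headroom, always 0 here
	fmt.Println(decide(headroom, 1))
}
```

With a headroom of 0 the sketch always takes the "preempt without reserving" 
path, which matches what I see: victims are preempted, but the freed GPUs are 
not held for the asks in queue a.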


> GPU Preemption is not triggered as expected when all available GPUs are used
> ----------------------------------------------------------------------------
>
>                 Key: YUNIKORN-2270
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2270
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: core - scheduler
>            Reporter: Weiwei Yang
>            Priority: Major
>
> I am testing an important preemption scenario for GPUs. The scenario is 
> designed as follows.
> The queue structure is pretty simple:
> {code}
> root.a (min=100, max=300)
> root.b (min=0, max=300)
> {code}
> The cluster has a total of 300 GPUs available, with no autoscaling. Steps to 
> reproduce:
> 1. Create 600 pods in the root.b queue, each requesting 1 GPU. This consumes 
> all 300 GPUs available in the cluster and leaves 300 pods pending.
> 2. Create 100 pods in the root.a queue, each requesting 1 GPU. The 
> expectation is that queue a will preempt 100 GPUs from queue b to reach its 
> guarantee.
> Observation: only a small number of pods in queue a started on resources 
> preempted from queue b, and the result is not stable. Queue a could not 
> reach its guaranteed resources.



