pbacsko commented on code in PR #933:
URL: https://github.com/apache/yunikorn-core/pull/933#discussion_r1707022863
##########
pkg/scheduler/objects/queue.go:
##########
@@ -1017,17 +1016,18 @@ func (sq *Queue) IncAllocatedResource(alloc
*resources.Resource, nodeReported bo
sq.Lock()
defer sq.Unlock()
// all OK update this queue
- sq.allocatedResource = newAllocated
+ sq.allocatedResource = resources.Add(sq.allocatedResource, alloc)
sq.updateAllocatedResourceMetrics()
return nil
}
+// allocatedResFits adds the passed in resource to the allocatedResource of
the queue and checks if it still fits in the
+// queues' maximum. If the resource fits it returns true otherwise false.
// small helper method to access sq.maxResource+sq.allocatedResource and avoid
Clone() call
-func (sq *Queue) allocatedResFits(res *resources.Resource) (bool,
*resources.Resource) {
+func (sq *Queue) allocatedResFits(res *resources.Resource) bool {
sq.RLock()
defer sq.RUnlock()
- newAllocated := resources.Add(sq.allocatedResource, res)
Review Comment:
I'm thinking about this...
This is the error message which Craig reproduced:
```
2024-07-26T14:42:30.364Z DPANIC core.scheduler.application
objects/application.go:1496 queue update failed unexpectedly {"error":
"allocation (map[pods:1]) puts queue 'root' over maximum allocation
(map[[devices.kubevirt.io/kvm:0](http://devices.kubevirt.io/kvm:0)
[devices.kubevirt.io/tun:0](http://devices.kubevirt.io/tun:0)
[devices.kubevirt.io/vhost-net:0](http://devices.kubevirt.io/vhost-net:0)
ephemeral-storage:744815809367 hugepages-1Gi:0 hugepages-2Mi:0
memory:1115240587264 [nvidia.com/gpu:0](http://nvidia.com/gpu:0) pods:660
vcore:320000]), current usage (map[memory:314572800
[nvidia.com/gpu:1](http://nvidia.com/gpu:1) pods:23 vcore:200])"}
```
It says `[nvidia.com/gpu:0]` and this is printed from
`sq.allocatedResource`, which means that this resource type is actually
present. That is, if we want to prevent this bug, this change won't solve it
because both `AddOnlyExisting()` and `FitInMaxUnder()` will consider the gpu
type. Am I missing something here?
I haven't run any code locally, just trying to have a good understanding by
simply looking at it :)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]