pbacsko commented on code in PR #957:
URL: https://github.com/apache/yunikorn-core/pull/957#discussion_r1744077986
##########
pkg/scheduler/objects/allocation.go:
##########
@@ -565,3 +567,39 @@ func (a *Allocation) setUserQuotaCheckPassed() {
a.askEvents.SendRequestFitsInUserQuota(a.allocationKey,
a.applicationID, a.allocatedResource)
}
}
+
+func (a *Allocation) setPreemptionPreConditionsCheckPassed() {
+ a.Lock()
+ defer a.Unlock()
+ if a.preemptionPreConditionsCheckFailed {
+ a.preemptionPreConditionsCheckFailed = false
+
a.askEvents.SendPreemptionPreConditionsCheckPassed(a.allocationKey,
a.applicationID, a.allocatedResource)
+ }
+}
+
+func (a *Allocation) setPreemptionPreConditionsCheckFailed() {
+ a.Lock()
+ defer a.Unlock()
+ if !a.preemptionPreConditionsCheckFailed {
+ a.preemptionPreConditionsCheckFailed = true
+
a.askEvents.SendPreemptionPreConditionsCheckFailed(a.allocationKey,
a.applicationID, a.allocatedResource)
+ }
+}
+
+func (a *Allocation) setPreemptionQueueGuaranteesCheckPassed() {
+ a.Lock()
+ defer a.Unlock()
+ if a.preemptionQueueGuaranteesCheckFailed {
+ a.preemptionQueueGuaranteesCheckFailed = false
+
a.askEvents.SendPreemptionQueueGuaranteesCheckPassed(a.allocationKey,
a.applicationID, a.allocatedResource)
+ }
+}
+
+func (a *Allocation) setPreemptionQueueGuaranteesCheckFailed() {
+ a.Lock()
+ defer a.Unlock()
+ if !a.preemptionQueueGuaranteesCheckFailed {
+ a.preemptionQueueGuaranteesCheckFailed = true
+
a.askEvents.SendPreemptionQueueGuaranteesCheckFailed(a.allocationKey,
a.applicationID, a.allocatedResource)
+ }
+}
Review Comment:
I think an acceptable approach is recording stuff like the allocation log
for each request. That looks reasonable to me. We can have a type like
`AllocationLog` and we define fixed causes about why the preemption failed:
* "Preconditions checks failed" (`preemptor.CheckPreconditions()` false)
* "No guarantee to free up resources" (`p.checkPreemptionQueueGuarantees()`
false)
* "Preemption shortfall"
(`p.ask.GetAllocatedResource().StrictlyGreaterThanOnlyExisting()` false)
Every time something fails, you increase a counter for it. You also count
the number of total attempts.
So I'm thinking sth like:
```
// Mutable fields which need protection
allocated bool
allocLog map[string]*AllocationLogEntry
preemptionLog map[string]*AllocationLogEntry // new field
preemptionAttempts int64 // new field
preemptionTriggered bool
preemptCheckTime time.Time
...
var (
ErrPreemptionPrecondFailed = errors.New("Preconditions checks failed")
ErrPreemptionNoGuarantee = errors.New("No guarantee to free up resources")
ErrPreemptionShortfall = errors.New("Preemption shortfall")
)
```
Then in `application.go`:
```
if fullIterator != nil {
request.IncPreemptionAttempts()
if result, err :=
sa.tryPreemption(headRoom, preemptionDelay, request, fullIterator, false); err
!= nil {
// preemption occurred, and
possibly reservation
return result
}
request.LogPreemptionFailure(err)
}
}
```
Obviously you need to make sure that problems propagate back so you can log
it. Then this whole thing can be exposed on the REST API just like the
allocation log.
It's obviously not as detailed as a separate log entry printed to the
stdout, but I see this as a good compromise.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]