wilfred-s commented on code in PR #458:
URL: https://github.com/apache/yunikorn-core/pull/458#discussion_r1036719716


##########
pkg/scheduler/objects/application.go:
##########
@@ -1093,6 +1097,9 @@ func (sa *Application) tryReservedAllocate(headRoom 
*resources.Resource, nodeIte
 
                // Do we need a specific node?
                if ask.GetRequiredNode() != "" {
+                       if ask.HasTriggeredPreemption() || 
time.Since(ask.GetLastPreemptionAttempt()) < sa.preemptionAttemptInterval {

Review Comment:
   I think @craigcondit was right: once we triggered preemption the kills will 
happen and cannot be reversed. We do not re-trigger *ever* in any case not even 
after a short period of time.
   The idea is to find a victim list and if we have a list:
   -  mark the ask as preemption triggered 
   - kill all victims
   - mark all victims as preempted
   
   Next scheduling cycle if the ask is still there and has the preemption 
triggered flag skip it and do not re-trigger. The node is reserved for the ask 
so it will get scheduled on that node. It could take up to the grace time for 
the pod to shutdown, which could be loooong in scheduling times.



##########
pkg/scheduler/objects/allocation_ask.go:
##########
@@ -245,3 +247,31 @@ func (aa *AllocationAsk) GetAllocationLog() 
[]*AllocationLogEntry {
        }
        return res
 }
+
+// SetTriggeredPreemption sets whether preemption has been triggered for this 
ask
+func (aa *AllocationAsk) SetTriggeredPreemption(triggered bool) {
+       aa.Lock()
+       defer aa.Unlock()
+       aa.preemptionTriggered = triggered
+}
+
+// HasTriggeredPreemption returns whether this ask has triggered preemption
+func (aa *AllocationAsk) HasTriggeredPreemption() bool {
+       aa.RLock()
+       defer aa.RUnlock()
+       return aa.preemptionTriggered
+}
+
+// SetLastPreemptionAttempt sets the time which tells when the last preemption 
attempt was performed for this ask.
+func (aa *AllocationAsk) SetLastPreemptionAttempt(attempt time.Time) {

Review Comment:
   I would expect that the method would generate the time by calling 
`time.Now()` instead of passing in the pre-created object.
   
   If this was done to allow resetting the time during tests we should add a 
method that allows clearing both time and the flag in one call. There could 
even be an argument for not having the boolean: if the time is set the 
preemption has been triggered.



##########
pkg/scheduler/objects/allocation.go:
##########
@@ -356,6 +357,20 @@ func (a *Allocation) GetAllocatedResource() 
*resources.Resource {
        return a.allocatedResource
 }
 
+// SetPreempted marks the allocation as preempted.
+func (a *Allocation) SetPreempted(preempted bool) {

Review Comment:
   Should we log something at a `Debug` or even `Error` level if we try to set 
this flag when already set? I would consider it a big problem.



##########
pkg/scheduler/objects/allocation.go:
##########
@@ -356,6 +357,20 @@ func (a *Allocation) GetAllocatedResource() 
*resources.Resource {
        return a.allocatedResource
 }
 
+// SetPreempted marks the allocation as preempted.
+func (a *Allocation) SetPreempted(preempted bool) {

Review Comment:
   I would highly doubt that we would ever unset this flag.
   When we trigger preemption we ask the shim to kill the pod. That is not 
going to be undone.



##########
pkg/scheduler/objects/application.go:
##########
@@ -1742,3 +1754,10 @@ func (sa *Application) 
SetTimedOutPlaceholder(taskGroupName string, timedOut int
                sa.placeholderData[taskGroupName].TimedOut = timedOut
        }
 }
+
+// test only
+func (sa *Application) SetPreemptionAttemptInterval(interval time.Duration) {

Review Comment:
   See earlier comment I think this is not needed at all.



##########
pkg/scheduler/objects/application.go:
##########
@@ -1128,6 +1135,7 @@ func (sa *Application) tryReservedAllocate(headRoom 
*resources.Resource, nodeIte
 }
 
 func (sa *Application) tryPreemption(reserve *reservation, ask *AllocationAsk) 
bool {
+       ask.SetLastPreemptionAttempt(time.Now())

Review Comment:
   We should not set this unless we have really preempted otherwise it looks 
like we did but we really did not. See comment above.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to