Hi Bill,
The logic supporting gang scheduling is also used to resume jobs which
have been suspended for a higher priority job. All of the data
structures in that module (src/slurmctld/gang.c) are designed to
support preemption based upon job partition and there is no logic
present to support preemption based upon QOS. It certainly could be
added at some time, but is completely absent today.
It would be great if you would make the documentation changes to
reflect the current behavior. If you send a patch, we'll get it into
the next release.
Thanks,
Moe Jette
Quoting [email protected]:
There are inconsistencies in both the documentation and operation of
slurm regarding the combination of PreemptType=preempt/qos and
PreemptMode=suspend,gang. Some documentation says
PreemptType=preempt/qos isn't compatible with PreemptMode=SUSPEND and
other documentation shows it as a valid combination. The system does
not allow designating PreemptType=preempt/qos and
PreemptMode=suspend,gang in slurm.conf, but sacctmgr allows modifying
a qos to set PreemptMode=suspend when the system is configured with
PreemptType=preempt/qos.
slurm.conf man for PreemptType=preempt/qos:
This is not compatible with PreemptMode=OFF or PreemptMode=SUSPEND
(i.e. preempted jobs must be removed from the resources).
sacctmgr man in SPECIFICATIONS FOR QOS:
PreemptMode
Mechanism used to preempt jobs of this QOS if the clusters PreemptType
is configured to preempt/qos. The default preemption mechanism is
specified by the cluster-wide PreemptMode configuration parameter.
Possible values are "Cluster" (meaning use cluster default), "Cancel",
"Checkpoint", "Requeue" and "Suspend".
preempt.html page:
preempt/qos indicates that jobs from one Quality Of Service (QOS)
can preempt jobs from a lower QOS. These jobs can be in the same
partition or different partitions. PreemptMode must be set to
CANCEL, CHECKPOINT, SUSPEND or REQUEUE.
I ran some experiments to see how slurm would respond. First I designated
PreemptType=preempt/qos with PreemptMode=suspend,gang. When I started
slurm with these options I saw the following message:
slurmd: fatal: PreemptType and PreemptMode values incompatible
I changed those options so that slurm would start and issued the following
command:
'sacctmgr modify qos where name=lowpri set preemptmode=suspend'
This modification was accepted, and when I issued 'sacctmgr show qos' it
did display the PreemptMode of 'suspend'.
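To make the mismatch concrete, here is a minimal Python sketch of the two
behaviours observed above. It is only a model and not Slurm source code: the
function names slurm_conf_accepts and sacctmgr_accepts_qos_mode are
hypothetical, and the rules encode nothing beyond what the experiments and
the man page excerpts show.

```python
# Hypothetical model of the two checks described in this message.
# This is NOT Slurm code; names and structure are assumptions for illustration.

# Per-QOS values listed in the sacctmgr man page excerpt quoted above.
QOS_PREEMPT_MODES = {"CLUSTER", "CANCEL", "CHECKPOINT", "REQUEUE", "SUSPEND"}


def slurm_conf_accepts(preempt_type: str, preempt_mode: str) -> bool:
    """Model the startup check: preempt/qos combined with a cluster-wide
    PreemptMode containing SUSPEND was rejected with a fatal error."""
    modes = {m.strip().upper() for m in preempt_mode.split(",")}
    if preempt_type.lower() == "preempt/qos" and "SUSPEND" in modes:
        return False
    return True


def sacctmgr_accepts_qos_mode(preempt_type: str, qos_mode: str) -> bool:
    """Model the sacctmgr behaviour: the per-QOS PreemptMode was accepted
    without regard to the cluster PreemptType (preempt_type is ignored)."""
    return qos_mode.strip().upper() in QOS_PREEMPT_MODES


if __name__ == "__main__":
    # Rejected at startup in the experiment above.
    print(slurm_conf_accepts("preempt/qos", "suspend,gang"))
    # Accepted by sacctmgr in the experiment above.
    print(sacctmgr_accepts_qos_mode("preempt/qos", "suspend"))
```

The two functions disagree for the same PreemptType and mode, which is the
inconsistency I would like to resolve one way or the other.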
I am willing to make changes to make this consistent, but need to know
whether the intent is to support PreemptType=preempt/qos and
PreemptMode=suspend,gang or not. If not, I need to know the
reasoning so I can update documentation and logic accordingly.
Best Regards,
Bill