There are inconsistencies in both the documentation and the behavior of Slurm
regarding the combination of PreemptType=preempt/qos and
PreemptMode=suspend,gang. Some documentation says PreemptType=preempt/qos
isn't compatible with PreemptMode=SUSPEND, while other documentation shows it
as a valid combination. The system does not allow PreemptType=preempt/qos and
PreemptMode=suspend,gang to be set together in slurm.conf, but sacctmgr allows
modifying a QOS to set PreemptMode=suspend when the system is configured with
PreemptType=preempt/qos.
slurm.conf man page for PreemptType=preempt/qos:
    This is not compatible with PreemptMode=OFF or PreemptMode=SUSPEND
    (i.e. preempted jobs must be removed from the resources).
sacctmgr man page, in SPECIFICATIONS FOR QOS:
    PreemptMode
        Mechanism used to preempt jobs of this QOS if the clusters
        PreemptType is configured to preempt/qos. The default preemption
        mechanism is specified by the cluster-wide PreemptMode configuration
        parameter. Possible values are "Cluster" (meaning use cluster
        default), "Cancel", "Checkpoint", "Requeue" and "Suspend".
preempt.html page:
    preempt/qos indicates that jobs from one Quality Of Service (QOS)
    can preempt jobs from a lower QOS. These jobs can be in the same
    partition or different partitions. PreemptMode must be set to CANCEL,
    CHECKPOINT, SUSPEND or REQUEUE.
I ran some experiments to see how Slurm would respond. First I set
PreemptType=preempt/qos with PreemptMode=suspend,gang. When I started
Slurm with these options I saw the following message:
slurmd: fatal: PreemptType and PreemptMode values incompatible
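For reference, the combination that triggers this message amounts to the
following two slurm.conf lines. This is a minimal fragment only; every other
setting in the file is omitted here.

```
# slurm.conf fragment (all other settings omitted)
PreemptType=preempt/qos
PreemptMode=suspend,gang
```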
I changed those options so that Slurm would start and issued the following
command:
'sacctmgr modify qos where name=lowpri set preemptmode=suspend'
This modification was accepted, and when I issued 'sacctmgr show qos' it did
display a PreemptMode of 'suspend'.
I am willing to make changes to make this consistent, but need to know
whether the intent is to support PreemptType=preempt/qos and
PreemptMode=suspend,gang or not. If not, I need to know the
reasoning so I can update documentation and logic accordingly.
Best Regards,
Bill