Hi Bill,

The logic supporting gang scheduling is also used to resume jobs that have been suspended in favor of a higher-priority job. All of the data structures in that module (src/slurmctld/gang.c) are designed to support preemption based upon job partition, and there is no logic present to support preemption based upon QOS. It certainly could be added at some time, but it is completely absent today.

It would be great if you would make the documentation changes to reflect the current behavior. If you send a patch, we'll get it into the next release.

Thanks,
Moe Jette


Quoting [email protected]:

There are inconsistencies in both the documentation and operation of slurm
regarding the combination of PreemptType=preempt/qos and
PreemptMode=suspend,gang.  Some documentation says PreemptType=preempt/qos
isn't compatible with PreemptMode=SUSPEND and other documentation shows it
as a valid combination.  The system does not allow designating
PreemptType=preempt/qos and PreemptMode=suspend,gang in slurm.conf, but
sacctmgr allows modifying a qos to set PreemptMode=suspend when the system
is configured with PreemptType=preempt/qos.

slurm.conf man for PreemptType=preempt/qos:

                    This is not compatible with PreemptMode=OFF or
                    PreemptMode=SUSPEND (i.e. preempted jobs must be removed
                    from the resources).

sacctmgr man in SPECIFICATIONS FOR QOS:

PreemptMode
              Mechanism used to preempt jobs of this QOS if the clusters
              PreemptType is configured to preempt/qos.  The default
              preemption mechanism is specified by the cluster-wide
              PreemptMode configuration parameter.  Possible values are
              "Cluster" (meaning use cluster default), "Cancel",
              "Checkpoint", "Requeue" and "Suspend".

preempt.html page:

        preempt/qos indicates that jobs from one Quality Of Service (QOS)
        can preempt jobs from a lower QOS. These jobs can be in the same
        partition or different partitions. PreemptMode must be set to
        CANCEL, CHECKPOINT, SUSPEND or REQUEUE.

I ran some experiments to see how slurm would respond.  First I set
PreemptType=preempt/qos with PreemptMode=suspend,gang.  When I started
slurm with these options I saw the following message:

slurmd: fatal: PreemptType and PreemptMode values incompatible

I changed those options so that slurm would start and issued the following
command:

'sacctmgr modify qos where name=lowpri set preemptmode=suspend'

This modification was accepted, and when I issued 'sacctmgr show qos' it
displayed a PreemptMode of 'suspend'.

I am willing to make changes to make this consistent, but I need to know
whether the intent is to support PreemptType=preempt/qos with
PreemptMode=suspend,gang.  If not, I need to know the reasoning so I can
update the documentation and logic accordingly.

Best Regards,
Bill




