I am running Slurm 2.6.0 with PreemptType=preempt/partition_prio and the
following partition setup:
PartitionName=DEFAULT Shared=NO State=UP Default=NO Priority=5000 MaxNodes=32 MaxTime=5760
PartitionName=ncfs Nodes=compute-[0-3]-[0-31] PreemptMode=REQUEUE Priority=10000 AllowGroups=ncfs,itgroup
PartitionName=batch Nodes=compute-[0-3]-[0-31] Default=YES PreemptMode=REQUEUE
PartitionName=runatrisk Nodes=compute-[0-3]-[0-31] MaxTime=1440 PreemptMode=CANCEL Priority=1000
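To double-check how these partition lines combine, here is a small Python sketch. It is a toy model, not Slurm's own parser. It merges the DEFAULT line into every partition defined after it and then lists which partitions could preempt which, under the simplifying assumption that preempt/partition_prio lets jobs in a partition with a strictly higher Priority preempt jobs in a lower one, and that the victim partition's PreemptMode decides what happens to its jobs. Node overlap is ignored (all three partitions share compute-[0-3]-[0-31] anyway). The names CONF, effective_partitions and preemption_pairs are made up for illustration.

```python
# Toy model of the partition lines quoted above; NOT Slurm's own config parser.
CONF = """\
PartitionName=DEFAULT Shared=NO State=UP Default=NO Priority=5000 MaxNodes=32 MaxTime=5760
PartitionName=ncfs Nodes=compute-[0-3]-[0-31] PreemptMode=REQUEUE Priority=10000 AllowGroups=ncfs,itgroup
PartitionName=batch Nodes=compute-[0-3]-[0-31] Default=YES PreemptMode=REQUEUE
PartitionName=runatrisk Nodes=compute-[0-3]-[0-31] MaxTime=1440 PreemptMode=CANCEL Priority=1000
"""


def effective_partitions(conf: str) -> dict[str, dict[str, str]]:
    """Return each partition's settings after merging in the DEFAULT line."""
    defaults: dict[str, str] = {}
    parts: dict[str, dict[str, str]] = {}
    for line in conf.splitlines():
        if not line.strip():
            continue
        # Split "Key=Value" tokens on the first '=' only.
        fields = dict(tok.split("=", 1) for tok in line.split())
        name = fields.pop("PartitionName")
        if name == "DEFAULT":
            defaults.update(fields)  # applies to partitions defined after it
        else:
            parts[name] = {**defaults, **fields}  # explicit values override defaults
    return parts


def preemption_pairs(parts: dict[str, dict[str, str]]) -> list[tuple[str, str]]:
    """(preemptor, victim) pairs, assuming strictly higher Priority may preempt."""
    return sorted(
        (pred, victim)
        for pred, pp in parts.items()
        for victim, vp in parts.items()
        if int(pp["Priority"]) > int(vp["Priority"])
    )


if __name__ == "__main__":
    parts = effective_partitions(CONF)
    for name, p in parts.items():
        print(f"{name}: Priority={p['Priority']} PreemptMode={p['PreemptMode']} MaxTime={p['MaxTime']}")
    for pred, victim in preemption_pairs(parts):
        print(f"{pred} -> {victim} (victim PreemptMode={parts[victim]['PreemptMode']})")
```

Under these assumptions batch inherits Priority=5000 from the DEFAULT line, so the ordering is ncfs (10000) > batch (5000) > runatrisk (1000), and runatrisk jobs are the ones cancelled when ncfs needs room.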
Preemption is working, but something is not quite right. Below is a real
example that illustrates the odd behavior; I would appreciate any help from
the community. If you need more information about my configuration, please
let me know.
The job I submit to the runatrisk partition is a 32-node MPI job with 8 tasks
per node that sleeps for an hour.
The job I submit to the ncfs partition is a 32-node MPI job with 8 tasks per
node that just prints "Hello".
The ncfs partition has a higher priority than runatrisk, which triggers the
preemption.
I submit four jobs to runatrisk to fill up the nodes on compute-0-[0-31] and
compute-1-[0-31], the only place where a 32-node job can run.
[marcin@ht0 mpi-test]$ squeue
JOBID PARTITION NAME     USER   ST TIME NODES NODELIST(REASON)
20817 ncfs      ncfs_hel marcin PD 0:00    32 (Resources)
20813 runatrisk runatris marcin CG 2:06    32 compute-0-[0-31]
20816 runatrisk runatris marcin R  0:28    32 compute-1-[0-31]
20815 runatrisk runatris marcin R  0:29    32 compute-1-[0-31]
20814 runatrisk runatris marcin R  0:30    32 compute-0-[0-31]
>> Below is the strange behavior: Slurm decided to kill two of the runatrisk
>> jobs, one on compute-0* and the other on compute-1*. In the end the ncfs
>> job runs on compute-0*, so it should not have killed the runatrisk job on
>> compute-1*. Does anyone know why this happens?
JOBID PARTITION NAME     USER   ST TIME NODES NODELIST(REASON)
20817 ncfs      ncfs_hel marcin PD 0:00    32 (Resources)
20815 runatrisk runatris marcin CG 1:09    32 compute-1-[0-31]
20813 runatrisk runatris marcin CG 2:06     5 compute-0-[0-1,9,17,25]
20816 runatrisk runatris marcin R  1:10    32 compute-1-[0-31]
20814 runatrisk runatris marcin R  1:12    32 compute-0-[0-31]
[marcin@ht0 mpi-test]$ squeue
JOBID PARTITION NAME     USER   ST TIME NODES NODELIST(REASON)
20817 ncfs      ncfs_hel marcin R  0:01    32 compute-0-[0-31]
20815 runatrisk runatris marcin CG 1:09     5 compute-1-[0-1,9,17,25]
20816 runatrisk runatris marcin R  1:33    32 compute-1-[0-31]
20814 runatrisk runatris marcin R  1:35    32 compute-0-[0-31]
[marcin@ht0 mpi-test]$ squeue
JOBID PARTITION NAME     USER   ST TIME NODES NODELIST(REASON)
20816 runatrisk runatris marcin R  3:29    32 compute-1-[0-31]
20814 runatrisk runatris marcin R  3:31    32 compute-0-[0-31]
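To make the expected outcome concrete, here is a hypothetical toy model in Python. It is NOT how Slurm selects preemption victims; it only restates the expectation. It takes the four runatrisk jobs and their node groups from the first squeue listing and assumes, as the listings suggest (two 32-node jobs share compute-0-[0-31], and two share compute-1-[0-31]), that each group can hold two of these 32-node, 8-tasks-per-node jobs at once. The names RUNNING, SLOTS_PER_GROUP and minimal_victims are made up for illustration.

```python
# Hypothetical toy model of the expected outcome; NOT Slurm's victim selection.
# Job IDs and node groups come from the first squeue listing above.
RUNNING = {
    20813: "compute-0", 20814: "compute-0",
    20815: "compute-1", 20816: "compute-1",
}
# Assumption inferred from the listings (two 32-node jobs share each group):
# a group can hold two of these 32-node, 8-tasks-per-node jobs at once.
SLOTS_PER_GROUP = 2


def minimal_victims(running: dict[int, str], slots_per_group: int,
                    slots_needed: int = 1) -> list[int]:
    """Fewest jobs to preempt so that a single group gains `slots_needed` free slots."""
    best = None
    for group in sorted(set(running.values())):
        jobs = sorted(j for j, g in running.items() if g == group)
        free = slots_per_group - len(jobs)
        # Lowest job IDs are picked first; the choice within a group is arbitrary here.
        victims = jobs[:max(0, slots_needed - free)]
        if best is None or len(victims) < len(best):
            best = victims
    return best if best is not None else []


if __name__ == "__main__":
    print(minimal_victims(RUNNING, SLOTS_PER_GROUP))  # [20813]: one job on compute-0
```

Under these assumptions a single preemption on one group is enough for the 32-node ncfs job, whereas the listings show both 20813 (compute-0) and 20815 (compute-1) going to CG.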
Thanks,
Marcin