I am running Slurm 2.6.0 with PreemptType=preempt/partition_prio and the 
following partition setup:

PartitionName=DEFAULT Shared=NO State=UP Default=NO Priority=5000 MaxNodes=32 MaxTime=5760

PartitionName=ncfs Nodes=compute-[0-3]-[0-31] PreemptMode=REQUEUE Priority=10000 AllowGroups=ncfs,itgroup
PartitionName=batch Nodes=compute-[0-3]-[0-31] Default=YES PreemptMode=REQUEUE
PartitionName=runatrisk Nodes=compute-[0-3]-[0-31] MaxTime=1440 PreemptMode=CANCEL Priority=1000
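
To make the effective priorities explicit, here is a small Python sketch that resolves the DEFAULT line into each partition and lists which partition can preempt which. It is not Slurm's own logic: the helper names (effective_partitions, preemption_pairs) are made up for illustration, and it assumes, as I understand preempt/partition_prio, that a higher-Priority partition may preempt a lower one on shared nodes and that the PreemptMode of the preempted partition is the one that applies.

```python
# Sketch only: a tiny parser for the partition lines above, not Slurm's own code.
PARTITION_LINES = [
    "PartitionName=DEFAULT Shared=NO State=UP Default=NO Priority=5000 MaxNodes=32 MaxTime=5760",
    "PartitionName=ncfs Nodes=compute-[0-3]-[0-31] PreemptMode=REQUEUE Priority=10000 AllowGroups=ncfs,itgroup",
    "PartitionName=batch Nodes=compute-[0-3]-[0-31] Default=YES PreemptMode=REQUEUE",
    "PartitionName=runatrisk Nodes=compute-[0-3]-[0-31] MaxTime=1440 PreemptMode=CANCEL Priority=1000",
]


def effective_partitions(lines):
    """Apply the PartitionName=DEFAULT values to every partition that follows."""
    defaults = {}
    partitions = {}
    for line in lines:
        fields = dict(token.split("=", 1) for token in line.split())
        name = fields.pop("PartitionName")
        if name == "DEFAULT":
            defaults.update(fields)
        else:
            # Explicit values on the partition line override the defaults.
            partitions[name] = {**defaults, **fields}
    return partitions


def preemption_pairs(partitions):
    """Return (preemptor, preempted, mode) for every higher/lower priority pair.

    Assumption: the mode shown is the PreemptMode of the preempted partition.
    """
    pairs = []
    for hi_name, hi in partitions.items():
        for lo_name, lo in partitions.items():
            if int(hi["Priority"]) > int(lo["Priority"]):
                pairs.append((hi_name, lo_name, lo.get("PreemptMode", "unset")))
    return pairs


parts = effective_partitions(PARTITION_LINES)
for preemptor, preempted, mode in preemption_pairs(parts):
    print(f"{preemptor} can preempt {preempted} (mode on {preempted}: {mode})")
```

Under those assumptions, batch inherits Priority=5000 from the DEFAULT line, so the ordering is ncfs > batch > runatrisk, and runatrisk jobs are cancelled rather than requeued.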

Preemption is functioning but something is not quite right. I have an actual 
example below that illustrates the odd behavior and would appreciate any help 
from the community. If you need additional information regarding my config 
please let me know.
The job I submit to the runatrisk partition is a 32-node MPI job with 8 tasks 
per node that sleeps for an hour.
The job I submit to the ncfs partition is a 32-node MPI job with 8 tasks per 
node that just prints "Hello".
The ncfs partition has a higher priority than runatrisk, which triggers the 
preemption.
I submit 4 jobs to runatrisk to fill up the nodes compute-0-[0-31] and 
compute-1-[0-31], the only places where a 32-node job can run.
[marcin@ht0 mpi-test]$ squeue
             JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
             20817      ncfs ncfs_hel   marcin  PD       0:00     32 (Resources)
             20813 runatrisk runatris   marcin  CG       2:06     32 compute-0-[0-31]
             20816 runatrisk runatris   marcin   R       0:28     32 compute-1-[0-31]
             20815 runatrisk runatris   marcin   R       0:29     32 compute-1-[0-31]
             20814 runatrisk runatris   marcin   R       0:30     32 compute-0-[0-31]
>> Below is the strange behavior: Slurm killed two of the runatrisk jobs, one 
>> on compute-0* and the other on compute-1*. In the end the ncfs job runs on 
>> compute-0*, so the runatrisk job on compute-1* should not have been killed. 
>> Does anyone know why this happens?
             JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
             20817      ncfs ncfs_hel   marcin  PD       0:00     32 (Resources)
             20815 runatrisk runatris   marcin  CG       1:09     32 compute-1-[0-31]
             20813 runatrisk runatris   marcin  CG       2:06      5 compute-0-[0-1,9,17,25]
             20816 runatrisk runatris   marcin   R       1:10     32 compute-1-[0-31]
             20814 runatrisk runatris   marcin   R       1:12     32 compute-0-[0-31]

[marcin@ht0 mpi-test]$ squeue
             JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
             20817      ncfs ncfs_hel   marcin   R       0:01     32 compute-0-[0-31]
             20815 runatrisk runatris   marcin  CG       1:09      5 compute-1-[0-1,9,17,25]
             20816 runatrisk runatris   marcin   R       1:33     32 compute-1-[0-31]
             20814 runatrisk runatris   marcin   R       1:35     32 compute-0-[0-31]
[marcin@ht0 mpi-test]$ squeue
             JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
             20816 runatrisk runatris   marcin   R       3:29     32 compute-1-[0-31]
             20814 runatrisk runatris   marcin   R       3:31     32 compute-0-[0-31]
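
To spell out why the second kill looks unnecessary, here is a rough Python sketch that expands the node lists from the squeue output above and checks how many nodes of each killed job overlap with the nodes where job 20817 finally ran. The expand_nodelist helper is my own and only handles a single trailing bracket group, which is all these listings need; it is not a replacement for Slurm's hostlist handling.

```python
import re


def expand_nodelist(nodelist):
    """Expand a simple list such as 'compute-0-[0-1,9]' into a set of node names.

    Only one trailing bracket group is supported (an assumption of this sketch).
    """
    match = re.fullmatch(r"(.*)\[([^\]]+)\]", nodelist)
    if not match:
        return {nodelist}
    prefix, body = match.groups()
    nodes = set()
    for part in body.split(","):
        if "-" in part:
            start, end = part.split("-")
            nodes.update(f"{prefix}{i}" for i in range(int(start), int(end) + 1))
        else:
            nodes.add(f"{prefix}{part}")
    return nodes


# Final placement of the ncfs job (20817), taken from the squeue output.
ncfs_job = expand_nodelist("compute-0-[0-31]")

# The two runatrisk jobs that went into the CG state, with their original nodes.
preempted = {
    20813: expand_nodelist("compute-0-[0-31]"),
    20815: expand_nodelist("compute-1-[0-31]"),
}

for job_id, nodes in preempted.items():
    overlap = len(nodes & ncfs_job)
    print(f"job {job_id}: {overlap} of {len(nodes)} nodes overlap with job 20817")
```

Job 20813 covers exactly the nodes that 20817 ended up on, while job 20815 shares none of them, which is the behavior I am asking about.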

Thanks,
Marcin
