Alan,
I believe you need "Shared=NO" on both partitions when using
PreemptMode=CANCEL or REQUEUE. PreemptMode=SUSPEND seems to work
fine with SelectType=select/linear, but not with
SelectType=select/cons_res. I'll make a note of this bug in the
select/cons_res plugin, but I'm not sure when it will get fixed.
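As a rough, untested sketch, this is how that suggestion might look when applied to the settings quoted below. It assumes PreemptMode=CANCEL (REQUEUE being the other mode mentioned) and leaves every other line of slurm.conf unchanged:

```
# Assumption: CANCEL chosen for illustration in place of SUSPEND,GANG
PreemptType=preempt/partition_prio
PreemptMode=CANCEL
# Both partitions now use Shared=NO (batch previously had Shared=Force:1)
PartitionName=batch Nodes=noma Default=NO DefaultTime=INFINITE MaxTime=INFINITE State=UP Priority=10 Shared=NO
PartitionName=interactive Nodes=noma Default=NO MaxTime=INFINITE State=UP Priority=20 Shared=NO
```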
Moe Jette
Quoting Alan Orth <[email protected]>:
I'm having problems getting basic partition-based preemption working.
For testing purposes I've set up a cluster with 4 CPUs and two
partitions with different priorities, but I can't figure out how to
get jobs in the higher-priority partition to preempt jobs in the
lower-priority partition.
First, ask for 4 CPUs in the batch partition:
$ salloc -n4 -p batch openssl speed
salloc: Granted job allocation 68
Doing md2 for 3s on 16 size blocks: 305643 md2's in 2.97s
Second, ask for 4 CPUs, in the interactive partition:
$ salloc -n4 -p interactive openssl speed
salloc: Pending job allocation 71
salloc: job 71 queued and waiting for resources
With PreemptMode=SUSPEND the high-priority job waits until the
low-priority job finishes (as shown above). With PreemptMode=CANCEL or
REQUEUE, the low-priority job allocation is "revoked", but the job
keeps running!
Have I misread or misunderstood something about preemption between
partitions?
Thanks!
Here are the relevant configuration options I've set:
From slurm.conf:
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_CPU
PreemptMode=SUSPEND,GANG
PreemptType=preempt/partition_prio
NodeName=noma CoresPerSocket=4 ThreadsPerCore=1 Sockets=1 State=UNKNOWN
PartitionName=batch Nodes=noma Default=NO DefaultTime=INFINITE MaxTime=INFINITE State=UP Priority=10 Shared=Force:1
PartitionName=interactive Nodes=noma Default=NO MaxTime=INFINITE State=UP Priority=20 Shared=NO
--
Alan Orth
[email protected]
http://alaninkenya.org
http://mjanja.co.ke
"You cannot simultaneously prevent and prepare for war." -Albert Einstein