Ah! Thanks for getting back to me on this. You are indeed right: I now see that this does in fact work if the job to be preempted/requeued is submitted with sbatch. I admit I was confused about the differences between the job submission tools (srun, salloc, sbatch)...
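In case it helps anyone searching the archives later, here's roughly what I'm doing now. The script below is just an illustrative sketch: the filename and the --requeue flag are my own additions, and "low" is the test partition from my earlier mail.

```shell
#!/bin/bash
#SBATCH -n 4
#SBATCH -p low
#SBATCH --requeue

# Launch the workload with srun so its processes run under slurmd on the
# compute nodes, where SLURM's proctrack plugin can find (and kill) them
# when the job is preempted and requeued.
srun openssl speed
```

Submitted with e.g. "sbatch test.sh", this runs as a batch job, so when a job lands in the "high" partition it gets requeued instead of being left running with a revoked allocation like my salloc jobs were.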
Thanks again,

Alan

On Tue, Aug 2, 2011 at 8:34 PM, <[email protected]> wrote:
> Only batch jobs can be requeued; your salloc job would need to be killed.
>
> When the allocation is killed, that should kill all of the processes on the
> compute nodes (as identified using SLURM's Proctrack plugin), but the salloc
> command (running on a login node) would not be killed. If your salloc
> command isn't spawning anything on the compute nodes using srun, there would
> be no processes to kill.
>
>
> Quoting Alan Orth <[email protected]>:
>
>> Ok, either I'm missing something, or I've hit a bug... In a test
>> cluster with 4 CPUs and two partitions, "low" and "high":
>>
>> $ salloc -n4 -p low openssl speed
>> salloc: Granted job allocation 14
>>
>> $ salloc -n4 -p high openssl speed
>> salloc: Pending job allocation 15
>> salloc: job 15 queued and waiting for resources
>> salloc: job 15 has been allocated resources
>> salloc: Granted job allocation 15
>>
>> After submitting the high-priority job I see this printed in the
>> slurmctld log file:
>>
>> "preempted job 14 had to be killed"
>>
>> But the "killed" job keeps running (even though its allocation is
>> revoked). What gives?
>>
>> $ scontrol show partitions
>> PartitionName=low
>>   AllocNodes=ALL AllowGroups=ALL Default=NO
>>   DefaultTime=NONE DisableRootJobs=NO Hidden=NO
>>   MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1
>>   Nodes=noma
>>   Priority=10 RootOnly=NO Shared=NO PreemptMode=REQUEUE
>>   State=UP TotalCPUs=4 TotalNodes=1
>>
>> PartitionName=high
>>   AllocNodes=ALL AllowGroups=ALL Default=NO
>>   DefaultTime=NONE DisableRootJobs=NO Hidden=NO
>>   MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1
>>   Nodes=noma
>>   Priority=20 RootOnly=NO Shared=NO PreemptMode=REQUEUE
>>   State=UP TotalCPUs=4 TotalNodes=1
>>
>> From slurm.conf:
>> PreemptMode=REQUEUE
>> PreemptType=preempt/partition_prio
>> ProctrackType=proctrack/pgid
>> SchedulerType=sched/backfill
>> SelectType=select/linear
>>
>> Thanks!
>>
>> On 07/15/2011 06:07 PM, [email protected] wrote:
>>>
>>> Quoting Alan Orth <[email protected]>:
>>>
>>>> Moe,
>>>>
>>>> Thanks for the quick response. I've just updated my configuration to
>>>> include some of your tips, but I'm still having problems. I can
>>>> confirm that the same behavior happens with the linear select plugin:
>>>> with "PreemptMode=REQUEUE" the resource allocation is revoked from the
>>>> job in the lower-priority partition, but the job continues to run
>>>> (consuming CPU resources).
>>>
>>> Is the job state CG (completing)? If so, then the problem isn't in the
>>> preemption logic, but in the configuration or communications (i.e. the
>>> slurmd daemon on the compute nodes isn't doing what slurmctld (the
>>> SLURM control daemon) is telling it to do). Alternately, it might not be
>>> finding all of the processes due to the ProctrackType configuration. If
>>> that's the case, the SlurmdLogFile and SlurmctldLogFile should help to
>>> diagnose the problem. Running "scontrol show config" will show you what
>>> all of these values are.
>>>
>>>> With "PreemptMode=SUSPEND,GANG" the job submitted in
>>>> the higher-priority partition simply waits until there are free slots.
>>>> The behavior doesn't seem to change with either select/linear or
>>>> select/cons_res.
>>>>
>>>> Again, the relevant slurm.conf sections from my SLURM 2.2.7 test
>>>> installation (on Ubuntu 11.04):
>>>>
>>>> SchedulerType=sched/backfill
>>>> SelectType=select/linear
>>>> PreemptType=preempt/partition_prio
>>>> NodeName=noma CoresPerSocket=4 ThreadsPerCore=1 Sockets=1 State=UNKNOWN
>>>> PartitionName=batch Nodes=noma Default=NO DefaultTime=INFINITE
>>>> MaxTime=INFINITE State=UP Priority=10 Shared=NO
>>>> PartitionName=interactive Nodes=noma Default=NO MaxTime=INFINITE
>>>> State=UP Priority=20 Shared=NO
>>>>
>>>> Regarding my previous use of the "Shared=Force:1" option in the
>>>> low-priority partition, I had specified it because the
>>>> documentation[1] mentions: "By default the max_share value is 4. In
>>>> order to preempt jobs (and not gang schedule them), always set
>>>> max_share to 1."
>>>
>>> That is the correct configuration for preempting a job by suspending
>>> it, but not if you want its resources to be relinquished before
>>> starting another job on the same resources (i.e. with PreemptMode=Cancel
>>> or Requeue). In the latter case, you need Shared=NO.
>>>
>>>> Cheers and thanks,
>>>>
>>>> Alan
>>>>
>>>> [1] https://computing.llnl.gov/linux/slurm/preempt.html
>>>>
>>>> On Thu, Jul 14, 2011 at 6:15 PM, <[email protected]> wrote:
>>>>>
>>>>> Alan,
>>>>>
>>>>> I believe that you need "Shared=NO" for both partitions with preemption
>>>>> modes PreemptMode=CANCEL or REQUEUE. For PreemptMode=SUSPEND, it seems
>>>>> to work fine for SelectType=select/linear, but not for
>>>>> SelectType=select/cons_res. I'll make a note of this bug in the
>>>>> select/cons_res plugin, but I'm not sure when it will get fixed.
>>>>>
>>>>> Moe Jette
>>>>>
>>>>>
>>>>> Quoting Alan Orth <[email protected]>:
>>>>>
>>>>>> I'm having problems getting basic partition-based preemption working.
>>>>>> For testing purposes I've set up a cluster with 4 CPUs and two
>>>>>> partitions (each with a different priority).
>>>>>> I can't figure out how to
>>>>>> get the higher-priority partition to preempt the lower-priority
>>>>>> one. This test configuration has 4 CPU slots.
>>>>>>
>>>>>> First, ask for 4 CPUs in the batch partition:
>>>>>>
>>>>>> $ salloc -n4 -p batch openssl speed
>>>>>> salloc: Granted job allocation 68
>>>>>> Doing md2 for 3s on 16 size blocks: 305643 md2's in 2.97s
>>>>>>
>>>>>> Second, ask for 4 CPUs in the interactive partition:
>>>>>>
>>>>>> $ salloc -n4 -p interactive openssl speed
>>>>>> salloc: Pending job allocation 71
>>>>>> salloc: job 71 queued and waiting for resources
>>>>>>
>>>>>> With PreemptMode=SUSPEND it waits until the low-priority job
>>>>>> finishes (as shown above). With PreemptMode=CANCEL or REQUEUE, the
>>>>>> low-priority job's allocation is "revoked", but the job keeps running!
>>>>>> Have I misread or misunderstood something about preemption between
>>>>>> partitions?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> Here are the relevant configuration options I've set, from slurm.conf:
>>>>>>
>>>>>> SchedulerType=sched/backfill
>>>>>> SelectType=select/cons_res
>>>>>> SelectTypeParameters=CR_CPU
>>>>>> PreemptMode=SUSPEND,GANG
>>>>>> PreemptType=preempt/partition_prio
>>>>>> NodeName=noma CoresPerSocket=4 ThreadsPerCore=1 Sockets=1 State=UNKNOWN
>>>>>> PartitionName=batch Nodes=noma Default=NO DefaultTime=INFINITE
>>>>>> MaxTime=INFINITE State=UP Priority=10 Shared=Force:1
>>>>>> PartitionName=interactive Nodes=noma Default=NO MaxTime=INFINITE
>>>>>> State=UP Priority=20 Shared=NO
>>>>>>
>>>>>> --
>>>>>> Alan Orth
>>>>>> [email protected]
>>>>>> http://alaninkenya.org
>>>>>> http://mjanja.co.ke
>>>>>> "You cannot simultaneously prevent and prepare for war." -Albert Einstein
>>>>>
>>>>
>>>> --
>>>> Alan Orth
>>>> [email protected]
>>>> http://alaninkenya.org
>>>> http://mjanja.co.ke
>>>> "You cannot simultaneously prevent and prepare for war." -Albert Einstein
>>>
>>
>> --
>> Alan Orth
>> [email protected]
>> http://alaninkenya.org
>> "I have always wished for my computer to be as easy to use as my
>> telephone; my wish has come true because I can no longer figure out how
>> to use my telephone." -Bjarne Stroustrup, inventor of C++
>

--
Alan Orth
[email protected]
http://alaninkenya.org
http://mjanja.co.ke
"In heaven all the interesting people are missing." -Friedrich Nietzsche
