Ah! Thanks for getting back to me on this. You are indeed right: I now see that this does in fact work if the job to be preempted/requeued is submitted with sbatch. I admit I was confused about the differences between the job submission tools (srun, salloc, sbatch)...
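In case it helps anyone searching the archives later, here's roughly what I'm doing now. The script below is just an illustrative sketch: the filename and the --requeue flag are my own additions, and "low" is the test partition from my earlier mail.

```shell
#!/bin/bash
#SBATCH -n 4
#SBATCH -p low
#SBATCH --requeue

# Launch the workload with srun so its processes run under slurmd on the
# compute nodes, where SLURM's proctrack plugin can find (and kill) them
# when the job is preempted and requeued.
srun openssl speed
```

Submitted with e.g. "sbatch test.sh", this runs as a batch job, so when a job lands in the "high" partition it gets requeued instead of being left running with a revoked allocation like my salloc jobs were.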
Thanks again,

Alan

On Tue, Aug 2, 2011 at 8:34 PM, <[email protected]> wrote:
> Only batch jobs can be requeued; your salloc job would need to be killed.
>
> When the allocation is killed, that should kill all of the processes on the
> compute nodes (as identified using SLURM's Proctrack plugin), but the salloc
> command (running on a login node) would not be killed. If your salloc
> command isn't spawning anything on the compute nodes using srun, there would
> be no processes to kill.
>
>
> Quoting Alan Orth <[email protected]>:
>
>> Ok, either I'm missing something, or I've hit a bug... In a test
>> cluster with 4 CPUs and two partitions, "low" and "high":
>>
>> $ salloc -n4 -p low openssl speed
>> salloc: Granted job allocation 14
>>
>> $ salloc -n4 -p high openssl speed
>> salloc: Pending job allocation 15
>> salloc: job 15 queued and waiting for resources
>> salloc: job 15 has been allocated resources
>> salloc: Granted job allocation 15
>>
>> After submitting the high-priority job I see this printed in the
>> slurmctld log file:
>>
>> "preempted job 14 had to be killed"
>>
>> But the "killed" job keeps running (even though its allocation is
>> revoked). What gives?
>>
>> $ scontrol show partitions
>> PartitionName=low
>>   AllocNodes=ALL AllowGroups=ALL Default=NO
>>   DefaultTime=NONE DisableRootJobs=NO Hidden=NO
>>   MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1
>>   Nodes=noma
>>   Priority=10 RootOnly=NO Shared=NO PreemptMode=REQUEUE
>>   State=UP TotalCPUs=4 TotalNodes=1
>>
>> PartitionName=high
>>   AllocNodes=ALL AllowGroups=ALL Default=NO
>>   DefaultTime=NONE DisableRootJobs=NO Hidden=NO
>>   MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1
>>   Nodes=noma
>>   Priority=20 RootOnly=NO Shared=NO PreemptMode=REQUEUE
>>   State=UP TotalCPUs=4 TotalNodes=1
>>
>> From slurm.conf:
>> PreemptMode=REQUEUE
>> PreemptType=preempt/partition_prio
>> ProctrackType=proctrack/pgid
>> SchedulerType=sched/backfill
>> SelectType=select/linear
>>
>> Thanks!
>>
>> On 07/15/2011 06:07 PM, [email protected] wrote:
>>>
>>> Quoting Alan Orth <[email protected]>:
>>>
>>>> Moe,
>>>>
>>>> Thanks for the quick response. I've just updated my configuration to
>>>> include some of your tips, but I'm still having problems. I can
>>>> confirm that the same behavior happens with the linear select plugin:
>>>> with "PreemptMode=REQUEUE" the resource allocation is revoked from the
>>>> job in the lower-priority partition, but the job continues to run
>>>> (consuming CPU resources).
>>>
>>> Is the job state CG (completing)? If so, then the problem isn't in the
>>> preemption logic, but in the configuration or communications (i.e. the
>>> slurmd daemon on the compute nodes isn't doing what slurmctld (the
>>> SLURM control daemon) is telling it to do). Alternately, it might not be
>>> finding all of the processes due to the ProctrackType configuration. If
>>> that's the case, the SlurmdLogFile and SlurmctldLogFile should help to
>>> diagnose the problem. Running "scontrol show config" will show you what
>>> all of these values are.
>>>
>>>> With "PreemptMode=SUSPEND,GANG" the job submitted in
>>>> the higher-priority partition simply waits until there are free slots.
>>>> The behavior doesn't seem to change with either select/linear or
>>>> select/cons_res.
>>>>
>>>> Again, the relevant slurm.conf sections from my SLURM 2.2.7 test
>>>> installation (on Ubuntu 11.04):
>>>>
>>>> SchedulerType=sched/backfill
>>>> SelectType=select/linear
>>>> PreemptType=preempt/partition_prio
>>>> NodeName=noma CoresPerSocket=4 ThreadsPerCore=1 Sockets=1 State=UNKNOWN
>>>> PartitionName=batch Nodes=noma Default=NO DefaultTime=INFINITE
>>>> MaxTime=INFINITE State=UP Priority=10 Shared=NO
>>>> PartitionName=interactive Nodes=noma Default=NO MaxTime=INFINITE
>>>> State=UP Priority=20 Shared=NO
>>>>
>>>> Regarding my previous use of the "Shared=Force:1" option in the
>>>> low-priority partition, I had specified it because the
>>>> documentation[1] mentions: "By default the max_share value is 4. In
>>>> order to preempt jobs (and not gang schedule them), always set
>>>> max_share to 1."
>>>
>>> That is the correct configuration for preempting a job by suspending
>>> it, but not if you want its resources to be relinquished before
>>> starting another job on the same resources (i.e. with PreemptMode=Cancel
>>> or Requeue). In the latter case, you need Shared=NO.
>>>
>>>> Cheers and thanks,
>>>>
>>>> Alan
>>>>
>>>> [1] https://computing.llnl.gov/linux/slurm/preempt.html
>>>>
>>>> On Thu, Jul 14, 2011 at 6:15 PM, <[email protected]> wrote:
>>>>>
>>>>> Alan,
>>>>>
>>>>> I believe that you need "Shared=NO" for both partitions with preemption
>>>>> modes PreemptMode=CANCEL or REQUEUE. For PreemptMode=SUSPEND, it seems
>>>>> to work fine for SelectType=select/linear, but not for
>>>>> SelectType=select/cons_res. I'll make a note of this bug in the
>>>>> select/cons_res plugin, but I'm not sure when it will get fixed.
>>>>>
>>>>> Moe Jette
>>>>>
>>>>>
>>>>> Quoting Alan Orth <[email protected]>:
>>>>>
>>>>>> I'm having problems getting basic partition-based preemption working.
>>>>>> For testing purposes I've set up a cluster with 4 CPUs and two
>>>>>> partitions (each with a different priority).
>>>>>> I can't figure out how to
>>>>>> get the higher-priority partition to preempt the lower-priority
>>>>>> one. This test configuration has 4 CPU slots.
>>>>>>
>>>>>> First, ask for 4 CPUs in the batch partition:
>>>>>>
>>>>>> $ salloc -n4 -p batch openssl speed
>>>>>> salloc: Granted job allocation 68
>>>>>> Doing md2 for 3s on 16 size blocks: 305643 md2's in 2.97s
>>>>>>
>>>>>> Second, ask for 4 CPUs in the interactive partition:
>>>>>>
>>>>>> $ salloc -n4 -p interactive openssl speed
>>>>>> salloc: Pending job allocation 71
>>>>>> salloc: job 71 queued and waiting for resources
>>>>>>
>>>>>> With PreemptMode=SUSPEND it waits until the low-priority job
>>>>>> finishes (as shown above). With PreemptMode=CANCEL or REQUEUE, the
>>>>>> low-priority job's allocation is "revoked", but the job keeps running!
>>>>>> Have I misread or misunderstood something about preemption between
>>>>>> partitions?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> Here are the relevant configuration options I've set, from slurm.conf:
>>>>>>
>>>>>> SchedulerType=sched/backfill
>>>>>> SelectType=select/cons_res
>>>>>> SelectTypeParameters=CR_CPU
>>>>>> PreemptMode=SUSPEND,GANG
>>>>>> PreemptType=preempt/partition_prio
>>>>>> NodeName=noma CoresPerSocket=4 ThreadsPerCore=1 Sockets=1 State=UNKNOWN
>>>>>> PartitionName=batch Nodes=noma Default=NO DefaultTime=INFINITE
>>>>>> MaxTime=INFINITE State=UP Priority=10 Shared=Force:1
>>>>>> PartitionName=interactive Nodes=noma Default=NO MaxTime=INFINITE
>>>>>> State=UP Priority=20 Shared=NO
>>>>>>
>>>>>> --
>>>>>> Alan Orth
>>>>>> [email protected]
>>>>>> http://alaninkenya.org
>>>>>> http://mjanja.co.ke
>>>>>> "You cannot simultaneously prevent and prepare for war." -Albert Einstein
>>>>>
>>>>
>>>> --
>>>> Alan Orth
>>>> [email protected]
>>>> http://alaninkenya.org
>>>> http://mjanja.co.ke
>>>> "You cannot simultaneously prevent and prepare for war." -Albert Einstein
>>>
>>
>> --
>> Alan Orth
>> [email protected]
>> http://alaninkenya.org
>> "I have always wished for my computer to be as easy to use as my
>> telephone; my wish has come true because I can no longer figure out how
>> to use my telephone." -Bjarne Stroustrup, inventor of C++
>

--
Alan Orth
[email protected]
http://alaninkenya.org
http://mjanja.co.ke
"In heaven all the interesting people are missing." -Friedrich Nietzsche
