Hi,

I am opening this new thread about a couple of problems with 2.2.5:

1. (Minor) In slurm.conf, if you specify QOS as the preemption type and
"suspend,gang" as the mode, you get an error, but this combination is
supposed to work, isn't it?
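
For reference, a minimal sketch of the slurm.conf lines that point 1 refers
to, assuming the standard PreemptType and PreemptMode parameters (an
illustration of the combination, not a copy of the actual file):

```
# Assumed illustration: QOS-based preemption combined with suspend and gang scheduling
PreemptType=preempt/qos
PreemptMode=SUSPEND,GANG
```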

2. In 2.2.5, QOS suspend preemption does not behave correctly when
using SelectType=select/linear:

         3  ib-default  mdrun_mpi  user2  R  1:21  2  16  node-ib-[1-2]  node-ib-[1-2]
         2  ib-default  mdrun_mpi  user1  R  1:40  2  16  node-ib-[1-2]  node-ib-[1-2]

Both jobs run at the same time (overlapping) and neither is
suspended (S). The partition config is as follows:

PartitionName=ib-default Nodes=node-ib-[1-2] Default=YES Shared=FORCE:1 MaxTime=INFINITE State=UP AllowGroups=ALL

QOS is:
Name       Priority   Preempt    PreemptMode
---------- ---------- ---------- -----------
normal     50                    suspend
chem       90         normal     cluster

User1 is associated with "normal" and user2 with "chem", so user2 is supposed
to preempt user1 with suspend.
If I use QOS PreemptMode=requeue, the job is killed but not requeued. The
only mode that seems to work is QOS PreemptMode=cancel.

Even worse, if I use SelectType=select/cons_res, none of the preemption modes
work and the jobs remain in the queue waiting. This may be related to the
bug:

http://groups.google.com/group/slurm-devel/browse_thread/thread/c1103aceb989a511

Is there any solution to this behavior? Am I doing something wrong? I would
really like to be able to preempt other queues using suspend.

Thanks,
Daniel


2011/5/13 Daniel Adriano Silva M <[email protected]>

> Hi,
>
> So I found the first problem with enabling QOS and suspend. Although the online
> manual says QOS is compatible with suspend (
> https://computing.llnl.gov/linux/slurm/preempt.html), in practice it
> does not work, since slurmd outputs:
>
> scontrol: fatal: PreemptType and PreemptMode values incompatible
>
> Is this a bug in 2.2.5-1, or is it a true limitation, in which case
> the online reference is wrong?
>
> Thanks,
> Daniel
>
> 2011/5/12 Jette, Moe <[email protected]>
>
>> Danny is correct with respect to partition-based preemption rules.
>>
>> You can specify the preemption mechanism used for each partition,
>> but they are enforced using a simple ordering by priority. QOS-based
>> preemption is much more flexible, but requires the use of a slurmdbd
>> (database daemon and database).
>> ________________________________________
>> From: [email protected] [[email protected]] On
>> Behalf Of Danny Auble [[email protected]]
>> Sent: Thursday, May 12, 2011 7:53 AM
>> To: [email protected]
>> Subject: Re: [slurm-dev] Complex configuration of partition preemption?
>>
>> Hey Daniel,
>>
>> > Hi,
>> >
>> > Two Questions:
>> > 1. I would like to ask whether it is possible to create the following
>> > kind of configuration:
>> >
>> > PARTITION-1 priority=90
>> > PARTITION-2 priority=50
>> > PARTITION-3 priority=10
>> >
>> > What I want here is that PARTITION-1 can preempt either PARTITION-2
>> > or PARTITION-3, but PARTITION-2 must not be able to preempt PARTITION-3.
>> >
>> > Any clues on how to get this setup? Previously I tried to use something
>> > like:
>> >
>> > PARTITION-3 PreemptMode=suspend
>> >
>> > But that of course means PARTITION-3 can be preempted by either of the
>> > other two partitions.
>>
>> I don't think you can do this with the partition priority, but this kind
>> of thing will work with QOS.  You might give that a try.  It will give you a
>> bunch more flexibility as well.
>>
>> >
>> > 2. Is there any sacctmgr way to limit the total consumed wall-time? Some
>> > new users will be added to our cluster, but we want to add these new
>> > accounts with a limit on total wall-time usage, so that when the
>> > user/group reaches the limit it is prevented from running more jobs.
>> > Is it possible to do this with slurm?
>>
>> GrpWall.  If you use that on a user association the group will just be
>> that one user, so all their jobs will aggregate together.  If you use it on
>> an account all the user accounts inside that account and all its subaccounts
>> will be aggregated.
>>
>> You have to use the priority/multifactor plugin to make it work.  See man
>> sacctmgr.
>>
>> Danny
>>
>> >
>> >
>> > Thanks,
>> > Daniel
>> >
>>
>>
>>
>
