...along those lines, we have a similar setup: some nodes give priority
access to certain users, and a general-purpose partition serves the hoi
polloi.  Not entirely unlike your goal, just with different names.

As Lyn indicates, you can't do this by mixing a partition specification and
a QOS on the command line; that doesn't work.  However, you can specify
multiple partitions and get a "first available" result (e.g. sbatch
--partition=limited,high,default).  Partition QOSs might be your answer to
enable this.  You could set up partitions "limited", "high", and "default",
each with its own QOS, with the limits enforced in the QOS:

    PartitionName=limited QOS=limited ...
    PartitionName=high QOS=high ...
    PartitionName=default QOS=default ...

Then set up the "limited" QOS with its job limit:

    sacctmgr create qos limited MaxJobsPerUser=3 ...

and so on for the others.
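
On the submit side, here is a minimal sketch of how the job would be sent to all three partitions at once.  The helper name build_sbatch_cmd and the script name job.sh are made up for illustration, and the partition names are just the ones from the example above.  It only prints the command rather than running it:

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): print the sbatch command for a
# job script, given the partitions to try. With several partitions listed,
# Slurm starts the job in whichever one can run it first.
build_sbatch_cmd() {
    script=$1
    shift
    # Join the remaining arguments with commas, e.g. limited,high,default
    partitions=$(IFS=,; printf '%s' "$*")
    printf 'sbatch --partition=%s %s\n' "$partitions" "$script"
}

# job.sh is a placeholder for the real batch script.
build_sbatch_cmd job.sh limited high default
```

Since each partition carries its own QOS, the job is held to whatever limits apply to the partition it ends up running in, and there is no need to pass --qos at submit time.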

It's roughly what we do.  Hope this helps.

M




On Fri, Apr 29, 2016 at 11:51 AM, Lyn Gerner <[email protected]>
wrote:

> Hi Matt,
>
> The current sbatch functionality can get you part of the way there. You
> can submit to --partition=<limited>,<default> and the job will run
> wherever it can first.  In your particular case (at NCCS), your --qos=high
> is only available in the default partition, so specifying it would render
> the job unable to run in the <limited> partition.
>
> Regards,
> Lyn
>
> On Fri, Apr 29, 2016 at 7:56 AM, Thompson, Matt[SCIENCE SYSTEMS AND
> APPLICATIONS INC] <[email protected]> wrote:
>
>>
>> SLURM Devs,
>>
>> This is probably a FAQ whose answer is "nope" but my search-fu has failed
>> me. We recently had a need to think about something. This is going to be a
>> generic experiment because I don't want to have to remember all the details
>> of the real names of qos, etc.
>>
>> Namely, on our cluster, let's say we have three ways to run:
>>
>>   1. --partition=limited
>>   2. --qos=high
>>   3. Default
>>
>> Number one is a partition that not many can submit to, is a dedicated
>> chunk of the cluster, but one can only run 3 jobs in it.
>>
>> Number two is a qos with a high priority in the "general" "default"
>> partition of the machine. This might have a limit on number of jobs (let's
>> say 6, though I don't know if there is a limit) so people don't abuse it.
>>
>> Number three is when you just sbatch and get whatever the default is.
>>
>>
>> Obviously, #1 is the gold standard (run until you limit out), #2 is
>> next best, and #3 is least attractive.
>>
>> Now, we have a situation where an experiment needs to run, say 12 jobs
>> that take 3 hours each. If we had our druthers, we'd submit all 12 to #1
>> and all 12 would launch at once. Can't do that. You get only 3 in. So now
>> go to #2, only get 6 in (assuming the general cluster partition isn't
>> full). If you limit out of #2, then fall over to #3.
>>
>> I think you get what I want. I'd love to have a single sbatch call that
>> says:
>>
>>   Take this job and submit such that it runs under #1,  #2,  #3, and
>>   whatever can take it first wins.
>>
>> In our case, I can see 3 perhaps getting in right away into #1, a few
>> more a bit later in #2 and then the next ones maybe when #1 is free again,
>> or perhaps #3... I know the --constraint has a nice OR operator, but I'm
>> not sure anything else does.
>>
>>
>> Now, one way we can think to do this (since I don't know if you can do
>> the above) is to submit 12 jobs to *each* queue-config possibility and then
>> underneath, have a lockfile-managed script that holds a MasterList of all
>> the possible jobs. If someone manages to get an allocation, that one pops a
>> job off the MasterList, now there are 11 left, and so on.
>>
>> Once the MasterList is empty (aka all jobs run or running), you could
>> then clean up all the queued jobs that never will run anything useful (and
>> if they get an allocation, the empty MasterList would just return the
>> allocation immediately).
>>
>> We have experience with this lock and masterlist (for other purposes), so
>> we can do it, but as I said, it'd be nice if we could make one big meta
>> sbatch call. Because it's nice to only have 12 jobs in the queue instead of
>> 36 :)
>>
>> Matt
>> --
>> Matt Thompson, SSAI, Sr Scientific Programmer/Analyst
>> NASA GSFC,    Global Modeling and Assimilation Office
>> Code 610.1,  8800 Greenbelt Rd,  Greenbelt,  MD 20771
>> Phone: 301-614-6712                 Fax: 301-614-6246
>> http://science.gsfc.nasa.gov/sed/bio/matthew.thompson
>>
>
>
