Keep in mind that if you use GrpNodes without whole-node allocations and
multiple jobs using the QOS land on the same node, that node gets counted
multiple times against the limit. I would suggest using GrpCPU instead if
you are not always using whole-node allocations.
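To illustrate the counting behaviour described above, here is a minimal
sketch in Python. It is not Slurm code: the job names, node names and the
per-job summation are assumptions made purely for illustration.

```python
# Hypothetical model, not Slurm code: each job lists the nodes it runs on.
jobs = {
    "job1": {"node01"},
    "job2": {"node01"},            # shares node01 with job1 (not a whole-node allocation)
    "job3": {"node02", "node03"},
}

# Per-job counting, as described above: a shared node is counted once per job.
grp_nodes_usage = sum(len(nodes) for nodes in jobs.values())

# Number of distinct physical nodes actually in use.
distinct_nodes = len(set().union(*jobs.values()))

print(grp_nodes_usage, distinct_nodes)
```

In this model the usage charged against the limit is 4 even though only 3
distinct nodes are busy, which is why a CPU-based limit may fit better when
nodes are shared.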
Danny
On 10/30/2014 12:38 PM, Brown George Andrew wrote:
Hi,
To be more specific, I want to limit the number of nodes or cores that can
run jobs with a wall time of up to one week, with all other nodes defaulting
to 1 day. So I'd set, say, Q1 to have a MaxWallDurationPerJob of 24 hours and
make it the default, then add another QOS with MaxWallDurationPerJob set to
one week and GrpNodes set to N. Where I previously had two partitions, I
would now have a single partition with a max wall time of 1 week and two
QOSes. In this case I'd want users to be able to do exactly what you
highlight as a bug.
For completeness I would also set the DenyOnLimit flag.
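As a rough sketch of how the two-QOS setup above would behave, the following
Python model may help. It is a simplified illustration, not how Slurm is
implemented: the QOS class, the submit function, the name "long" and the
value 10 standing in for N are all hypothetical, and the handling of
DenyOnLimit is reduced to rejecting a job whose wall time exceeds the QOS
maximum.

```python
from dataclasses import dataclass
from typing import Optional

DAY = 24 * 60  # one day in minutes


@dataclass
class QOS:
    name: str
    max_wall: int                    # minutes, stands in for MaxWallDurationPerJob
    grp_nodes: Optional[int] = None  # None means no GrpNodes limit
    deny_on_limit: bool = True       # stands in for the DenyOnLimit flag
    nodes_in_use: int = 0


def submit(qos: QOS, wall: int, nodes: int) -> str:
    """Return 'rejected', 'pending' or 'running' for a new job (toy model)."""
    if wall > qos.max_wall:
        # With DenyOnLimit the job is refused at submission time;
        # without it, this model simply leaves the job pending.
        return "rejected" if qos.deny_on_limit else "pending"
    if qos.grp_nodes is not None and qos.nodes_in_use + nodes > qos.grp_nodes:
        return "pending"  # would exceed the group-wide node limit
    qos.nodes_in_use += nodes
    return "running"


q1 = QOS("Q1", max_wall=DAY)                            # default QOS, 1 day
long_qos = QOS("long", max_wall=7 * DAY, grp_nodes=10)  # 1 week, N = 10 assumed

results = [
    submit(q1, 2 * DAY, 1),        # too long for Q1
    submit(long_qos, 5 * DAY, 8),  # fits within GrpNodes
    submit(long_qos, 5 * DAY, 4),  # 8 + 4 exceeds GrpNodes
    submit(q1, DAY, 50),           # Q1 has no node limit in this model
]
print(results)
```

The point of the sketch is that week-long jobs are confined to N nodes in
total, while day-long jobs in the default QOS remain free to use the rest of
the single partition.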
In your case perhaps the MaxNodes setting in sacctmgr may help? Features
were also added in 14.03 which allow you to more finely control which
accounts get used with partitions as well as QOSes; this may be of interest.
Kind regards,
George
________________________________________
From: Tingyang Xu [[email protected]]
Sent: 30 October 2014 19:42
To: slurm-dev
Subject: [slurm-dev] Re: Non static partition definition
Hello George,
We have the same issue now. I think the QOS solution has a bug. For
example, assume that Partition B allows two QOSes, Q1 and Q2, and you set
GrpNodes=10 on both Q1 and Q2. Users can then actually use 20 nodes if they
submit jobs to both Q1 and Q2.
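A minimal Python sketch of the scenario above, assuming (as described) that
each QOS keeps its own independent GrpNodes counter. The function and
variable names are hypothetical and this is not Slurm code.

```python
# Hypothetical model: GrpNodes is tracked per QOS, so each QOS has its own counter.
grp_nodes_limit = {"Q1": 10, "Q2": 10}
nodes_in_use = {"Q1": 0, "Q2": 0}


def try_start(qos: str, nodes: int) -> bool:
    """Start a job only if it fits within its own QOS's GrpNodes limit."""
    if nodes_in_use[qos] + nodes > grp_nodes_limit[qos]:
        return False
    nodes_in_use[qos] += nodes
    return True


started_q1 = try_start("Q1", 10)  # fills Q1
started_q2 = try_start("Q2", 10)  # fills Q2, independently of Q1
blocked = try_start("Q1", 1)      # Q1 has no room left

total_nodes = sum(nodes_in_use.values())
print(started_q1, started_q2, blocked, total_nodes)
```

Each QOS stays within its own limit of 10, yet the partition as a whole ends
up with 20 nodes in use.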
Best,
Tingyang Xu
-----Original Message-----
From: Brown George Andrew
Sent: Thursday, October 30, 2014 2:36 PM
To: slurm-dev
Subject: [slurm-dev] Re: Non static partition definition
Thanks for the quick replies!
Indeed, a QOS seems like what I want here. Sorry, I was stuck thinking in
terms of partitions and clearly had some tunnel vision.
Cheers,
George
________________________________________
From: [email protected] [[email protected]]
Sent: 30 October 2014 19:08
To: slurm-dev
Subject: [slurm-dev] Re: Non static partition definition
In addition to a QOS, an advanced reservation may also satisfy your needs:
http://slurm.schedmd.com/reservations.html
Quoting Ryan Cox <[email protected]>:
George,
Wouldn't a QOS with GrpNodes=10 accomplish that?
Ryan
On 10/30/2014 11:47 AM, Brown George Andrew wrote:
Hi,
I would like to have a partition of N nodes without statically
defining which nodes should belong to a partition and I'm trying to
work out the best way to achieve this.
Currently I have partitions which span across all the nodes in my
cluster with differing settings, but I would like some of these to
only occupy a subset of the cluster. I could, say, define partition A
which can use all nodes while partition B may only access nodes
01-10. But I would like to avoid partition B being reduced in size in
the event of maintenance or hardware failure.
I'm thinking the way to do this would be via a plugin. I would keep
all partitions spanning all nodes in the cluster but upon
submission check how many nodes are in use on the requested
partition. If there were say already 10 nodes in use in partition B
the job should be queued. However, things then get a bit more
complex when deciding when Slurm should de-queue and run the job.
Is there a native method to do this in slurm? Essentially I would
like something like the MaxNodes option that exists for partitions
today but have it limit the total number of nodes used by jobs
submitted to that partition rather than just a limit per job.
Many thanks,
George
--
Morris "Moe" Jette
CTO, SchedMD LLC