Title: Re: [slurm-dev] Re: Dynamic, small partition?
I believe GrpNodes=20 is what you're looking for.
From the sacctmgr man page (https://slurm.schedmd.com/sacctmgr.html):

GrpNodes=
Maximum number of nodes running jobs are able to be allocated in aggregate for this association and all associations which are children of this association.
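For example, something like this should set it (a sketch - the account name "myaccount" is just a placeholder for whichever association you attach the limit to):

```shell
# Cap the aggregate node count for an association at 20
sacctmgr modify account myaccount set GrpNodes=20
```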

Hope that helps,
--Dani_L.

On 24/07/2017 15:02, Steffen Grunewald wrote:
On Wed, 2017-07-19 at 09:52:37 -0600, Nicholas McCollum wrote:
You could try a QoS with
Flags=DenyOnLimit,OverPartQOS,PartitionTimeLimit Priority=<number> 

Depending on how you have your accounting set up, you could tweak some
of the GrpTRES, MaxTRES, MaxTRESPU and MaxJobsPU to try to limit
resource usage down to your 20 node limit.  I'm not sure, off the top
of my head, how to define a hard limit for max nodes that a QoS can
use.  

You could use accounting to prevent unauthorized users from submitting
to that QoS.
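As a rough sketch of that (names are placeholders; GrpTRES=node=N should act as a hard aggregate node cap for the QoS, at least on accounting-enabled setups):

```shell
# Create a QOS that rejects jobs exceeding its limits and caps
# aggregate node usage at 20
sacctmgr add qos special Flags=DenyOnLimit GrpTRES=node=20
# Grant only selected users access to it
sacctmgr modify user alice set qos+=special
```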

If the QoS isn't going to be used often, you could have only one job
running in that QoS at a time, and use job_submit.lua to set the
MaxNodes submitted at job submission to 20.

-- inside slurm_job_submit(job_desc, part_list, submit_uid);
-- guard against a nil qos field before matching
if job_desc.qos and string.match(job_desc.qos, "special") then
  job_desc.max_nodes = 20
end


Just a couple of ideas for you; there's probably a far better way to do
it!
Thanks for hurling such a long list of ideas at me.

Since there's already a too long list of QoSes, I decided to have a look
at the job_submit.lua path first.

Only to discover that our installation doesn't have that plugin enabled,
and that it would take me the better part of a few days to find out how
to overcome this.

Since our micro version is no longer available in the source archives,
I'd have to upgrade to the latest micro release first, which adds
unwanted complexity and the risk of having to scrub my vacation.

I'm not too fluent in Lua, and there seems to be no safe platform to
debug a job_submit.lua script.
Nevertheless, I'm looking into this.

Your suggestion would limit the number of nodes assigned to a single
job - that can be achieved by MaxNodes=20 in the Partition definition.
Actually I don't want to trim a job's size, neither at submission time
nor later.
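(For completeness, that partition-level per-job cap would be a one-liner in slurm.conf - node names, times and priority below are made-up placeholders:)

```
# slurm.conf: caps any single job in this partition at 20 nodes
PartitionName=special Nodes=node[001-100] MaxNodes=20 MaxTime=14-00:00:00 Priority=10
```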

I still doubt that a job_submit.lua script can fully handle my task:
It's not about limiting a job's resources, it's about limiting jobs
being allocated and run in a particular partition. So what I could do
at submit time: modify the "uhold" state of the job according to the
current usage of the special partition, and let the user retry. Would
I "see" this job again (in job_submit.lua)?
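A rough sketch of the reject-at-submit idea (untested; the partition name is a placeholder, and special_partition_busy() is a hypothetical helper that would have to be implemented by inspecting current allocations - it is not a Slurm API):

```lua
-- job_submit.lua sketch: refuse jobs aimed at the special partition
-- while it is "full"; the user gets a message and must resubmit later.
function slurm_job_submit(job_desc, part_list, submit_uid)
  if job_desc.partition == "special" and special_partition_busy() then
    slurm.log_user("special partition is full, please retry later")
    return slurm.ERROR  -- job is rejected, not held
  end
  return slurm.SUCCESS
end
```

Note that a rejected job never enters the queue, so the plugin would only "see" it again if the user resubmits.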
Documentation of what this plugin can do, and what it cannot, is rather
sparse, and this spontaneous request is only one item on my list. 

Perhaps I'm looking for the wrong kind of tool - what I'd need is a 
plugin that decides whether a job is eligible for allocation at all,
and so far I've been unsuccessful writing search queries to find such
a plugin. Does it exist?

Thanks,
 Steffen

-- 
Nicholas McCollum
HPC Systems Administrator
Alabama Supercomputer Authority

On Wed, 2017-07-19 at 09:12 -0600, Steffen Grunewald wrote:
Is it possible to define, and use, a subset of all available nodes in a
partition *without* explicitly setting aside a number of nodes (in a
static Nodes=... definition)?
Let's say, starting with a 100-node cluster, I want to make 20 nodes
available for jobs needing an extended MaxTime and Priority (compared
with the defaults) - and once these 20 nodes have been allocated, no
more nodes will be available to jobs submitted to this particular
partition, but the 20 nodes may cover a changing subset of all nodes
over time (as the partition will not be in use very often)?

Can this be done with Slurm's built-in functionality, 15.08.8 or later?
Any pointers are welcome...

Thanks,
 S
