Hello George,
We have the same issue now. I think the QOS solution has a bug. For example, assume that partition B allows two QOSes, Q1 and Q2, and that you set GrpNodes=10 on both. Users can then actually use 20 nodes in partition B by submitting some jobs to Q1 and others to Q2, because each QOS enforces its limit separately.
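
To make the arithmetic concrete, here is a toy Python model. It is not Slurm code: the `QosLimit` class and its `try_start` method are hypothetical names, and the sketch assumes each QOS counts its GrpNodes usage independently, as described above.

```python
from dataclasses import dataclass


@dataclass
class QosLimit:
    """Hypothetical stand-in for a QOS with a GrpNodes-style cap."""
    name: str
    grp_nodes: int         # max nodes usable by all running jobs in this QOS
    nodes_in_use: int = 0

    def try_start(self, nodes: int) -> bool:
        """Start a job if it fits under this QOS's own cap; else leave it queued."""
        if self.nodes_in_use + nodes > self.grp_nodes:
            return False
        self.nodes_in_use += nodes
        return True


# Partition B allows both QOSes, each capped at 10 nodes.
q1 = QosLimit("Q1", grp_nodes=10)
q2 = QosLimit("Q2", grp_nodes=10)

started_q1 = q1.try_start(10)   # fits under Q1's cap
started_q2 = q2.try_start(10)   # fits under Q2's cap, which is counted separately
blocked_q1 = q1.try_start(1)    # Q1 is full, so this job would have to wait

# Total nodes in use on partition B across both QOSes.
partition_b_total = q1.nodes_in_use + q2.nodes_in_use
print(partition_b_total)  # prints 20
```

Each QOS correctly refuses an eleventh node, yet the partition as a whole ends up with 20 nodes busy, which is the gap between a per-QOS limit and a per-partition limit.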

Best,
Tingyang Xu

-----Original Message-----
From: Brown George Andrew
Sent: Thursday, October 30, 2014 2:36 PM
To: slurm-dev
Subject: [slurm-dev] Re: Non static partition definition


Thanks for the quick replies!

Indeed, a QOS seems like what I want here. Sorry, I was stuck thinking in terms of partitions and clearly had some tunnel vision.

Cheers,
George
________________________________________
From: [email protected] [[email protected]]
Sent: 30 October 2014 19:08
To: slurm-dev
Subject: [slurm-dev] Re: Non static partition definition

In addition to a QOS, an advanced reservation may also satisfy your needs:
http://slurm.schedmd.com/reservations.html

Quoting Ryan Cox <[email protected]>:

George,

Wouldn't a QOS with GrpNodes=10 accomplish that?

Ryan

On 10/30/2014 11:47 AM, Brown George Andrew wrote:
Hi,

I would like to have a partition of N nodes without statically
defining which nodes should belong to a partition and I'm trying to
work out the best way to achieve this.

Currently I have partitions which span across all the nodes in my
cluster with differing settings, but I would like some of these to
only occupy a subset of the cluster. I could, say, define partition A
which can use all nodes while partition B may only access nodes
01-10. But I would like to avoid partition B being reduced in size in
the event of maintenance or hardware failure.

I'm thinking the way to do this would be via a plugin. I would keep
all partitions spanning all nodes in the cluster but upon
submission check how many nodes are in use on the requested
partition. If there were say already 10 nodes in use in partition B
the job should be queued. However things then get a bit more
complex as to when slurm should de-queue and then run the job.
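
The check described above can be sketched in a few lines of Python. This is a minimal illustration only, not a Slurm plugin and not any Slurm API: the `PartitionCap` class and its methods are hypothetical, and it assumes a strict first-in, first-out queue, which is far simpler than real priority and backfill scheduling.

```python
from collections import deque


class PartitionCap:
    """Hypothetical partition-wide node cap (not a Slurm API)."""

    def __init__(self, max_total_nodes: int) -> None:
        self.max_total_nodes = max_total_nodes
        self.running = {}        # job_id -> nodes held by that running job
        self.pending = deque()   # (job_id, nodes) in submission order

    def nodes_in_use(self) -> int:
        return sum(self.running.values())

    def submit(self, job_id: str, nodes: int) -> None:
        """Queue the job, then start whatever fits under the cap."""
        self.pending.append((job_id, nodes))
        self._start_pending()

    def finish(self, job_id: str) -> None:
        """Release a finished job's nodes and re-check the queue."""
        self.running.pop(job_id)
        self._start_pending()

    def _start_pending(self) -> None:
        # Strict FIFO: stop at the first queued job that does not fit.
        while self.pending:
            job_id, nodes = self.pending[0]
            if self.nodes_in_use() + nodes > self.max_total_nodes:
                break
            self.pending.popleft()
            self.running[job_id] = nodes


cap = PartitionCap(max_total_nodes=10)
cap.submit("job1", 6)
cap.submit("job2", 6)   # would exceed 10 nodes, so it stays pending
cap.submit("job3", 4)   # would fit, but strict FIFO keeps it behind job2
cap.finish("job1")      # frees 6 nodes; job2 and then job3 can start
```

The de-queue question shows up in `_start_pending`: the queue has to be re-evaluated every time a job finishes, and the choice between strict FIFO and letting smaller jobs jump ahead changes which jobs run.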

Is there a native method to do this in slurm? Essentially I would
like something like the MaxNodes option that exists for partitions
today but have it limit the total number of nodes used by jobs
submitted to that partition rather than just a limit per job.

Many thanks,
George


--
Morris "Moe" Jette
CTO, SchedMD LLC
