Ah, I hadn't considered that. Yes, we'd want GrpCPU in that case. Thanks for 
pointing this out, Danny. 

Kind regards,
George

> On 30 Oct 2014, at 21:35, Danny Auble <[email protected]> wrote:
> 
> 
> Keep in mind that if you use GrpNodes without whole-node allocations, and 
> multiple jobs using the QOS land on the same node, that node gets counted 
> multiple times in the limit. I would suggest using GrpCPU instead if you are 
> not always using whole-node allocations.
> 
> Danny
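
A minimal sketch of the counting behaviour described above. This is not Slurm's accounting code: the job list, the node name and the helpers grp_nodes_usage, distinct_nodes and grp_cpu_usage are all made up for illustration. It assumes each running job adds its own node count to the GrpNodes tally, even when another job in the same QOS already occupies that node.

```python
from collections import namedtuple

# Hypothetical running jobs in one QOS. cpus_per_node maps each node the job
# runs on to the number of CPUs the job holds there.
Job = namedtuple("Job", ["name", "cpus_per_node"])

jobs = [
    Job("job1", {"node01": 4}),
    Job("job2", {"node01": 4}),  # shares node01 with job1
    Job("job3", {"node01": 8}),  # also lands on node01
]


def grp_nodes_usage(jobs):
    """Sum of per-job node counts, so a shared node is counted once per job."""
    return sum(len(job.cpus_per_node) for job in jobs)


def distinct_nodes(jobs):
    """Number of physical nodes the jobs actually occupy."""
    return len({node for job in jobs for node in job.cpus_per_node})


def grp_cpu_usage(jobs):
    """Total CPUs held, which does not depend on how jobs are packed."""
    return sum(sum(job.cpus_per_node.values()) for job in jobs)


print("nodes charged against GrpNodes:", grp_nodes_usage(jobs))
print("distinct nodes in use:", distinct_nodes(jobs))
print("CPUs charged against GrpCPU:", grp_cpu_usage(jobs))
```

Under that assumption, three jobs packed onto a single node are charged as three nodes, while the CPU tally reflects what is really in use.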
> 
>> On 10/30/2014 12:38 PM, Brown George Andrew wrote:
>> Hi,
>> 
>> To be more specific, I want to limit the number of nodes or cores that can 
>> run jobs with a wall time of up to one week, with everything else 
>> defaulting to 1 day. So I'd set, say, Q1 to have a MaxWallDurationPerJob of 
>> 24 hours and make it the default, then add another QOS with a 
>> MaxWallDurationPerJob of one week and GrpNodes set to N. Where I previously 
>> had two partitions, I would now have a single partition with a max wall 
>> time of 1 week and two QOSes. In this case I'd want users to be able to do 
>> exactly what you highlight as a bug.
>> 
>> For completeness I would also set the DenyOnLimit flag.
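
A rough sketch of the two-QOS plan above, not a Slurm configuration. The name "Q2" for the week-long QOS, the dict layout and the submit helper are hypothetical. It assumes that a job over the QOS wall-time limit is refused at submission when DenyOnLimit is set, and is otherwise left pending.

```python
from datetime import timedelta

# Hypothetical QOS table: a default one-day QOS and a one-week QOS.
QOS = {
    "Q1": {"max_wall": timedelta(days=1), "deny_on_limit": True},
    "Q2": {"max_wall": timedelta(days=7), "deny_on_limit": True},
}
DEFAULT_QOS = "Q1"


def submit(wall_time, qos=None):
    """Return 'accepted', 'rejected' or 'pending' for a requested wall time."""
    rule = QOS[qos or DEFAULT_QOS]
    if wall_time <= rule["max_wall"]:
        return "accepted"
    # Over the limit: with DenyOnLimit the job is refused at submission,
    # otherwise it is assumed to wait in the queue without starting.
    return "rejected" if rule["deny_on_limit"] else "pending"


print(submit(timedelta(hours=12)))          # default QOS, within one day
print(submit(timedelta(days=3)))            # default QOS, over one day
print(submit(timedelta(days=3), qos="Q2"))  # week-long QOS
```

The GrpNodes=N cap on the week-long QOS is left out here; it limits how much of the cluster long jobs can hold in aggregate rather than what a single job may request.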
>> 
>> In your case, perhaps the MaxNodes setting in sacctmgr may help? In 14.03, 
>> features were added that allow finer control over which accounts get used 
>> with partitions as well as QOSes; this may be of interest.
>> 
>> Kind regards,
>> George
>> ________________________________________
>> From: Tingyang Xu [[email protected]]
>> Sent: 30 October 2014 19:42
>> To: slurm-dev
>> Subject: [slurm-dev] Re: Non static partition definition
>> 
>> Hello George,
>> We have the same issue now. I think the QOS solution has a bug. For
>> example, assume that Partition B allows two QOSes, Q1 and Q2, and you set
>> GrpNodes=10 on both Q1 and Q2. Users can then actually use 20 nodes if
>> they submit jobs to Q1 and Q2, respectively.
>> 
>> Best,
>> Tingyang Xu
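
A small sketch of the scenario above, assuming each QOS keeps its own independent node counter. The admit helper and the request list are invented for illustration and do not reflect how the scheduler is implemented.

```python
# Hypothetical per-QOS limits matching the example: GrpNodes=10 on Q1 and Q2.
GRP_NODES = {"Q1": 10, "Q2": 10}


def admit(requests, limits):
    """Start whole-node jobs in order while each QOS stays under its own limit.

    requests: list of (qos, node_count) tuples.
    Returns the nodes in use per QOS after all requests are considered.
    """
    in_use = {qos: 0 for qos in limits}
    for qos, nodes in requests:
        # Only this QOS's own counter is checked; nothing ties the counters
        # of different QOSes on the same partition together.
        if in_use[qos] + nodes <= limits[qos]:
            in_use[qos] += nodes
        # Otherwise the job would stay pending.
    return in_use


requests = [("Q1", 6), ("Q2", 10), ("Q1", 4), ("Q1", 1)]
usage = admit(requests, GRP_NODES)
print(usage, "total:", sum(usage.values()))
```

Each QOS stops at its own 10 nodes, yet the partition as a whole ends up with 20 nodes in use.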
>> 
>> -----Original Message-----
>> From: Brown George Andrew
>> Sent: Thursday, October 30, 2014 2:36 PM
>> To: slurm-dev
>> Subject: [slurm-dev] Re: Non static partition definition
>> 
>> 
>> Thanks for the quick replies!
>> 
>> Indeed a QOS seems like what I want here. Sorry I was stuck thinking in
>> partitions and clearly was having some tunnel vision.
>> 
>> Cheers,
>> George
>> ________________________________________
>> From: [email protected] [[email protected]]
>> Sent: 30 October 2014 19:08
>> To: slurm-dev
>> Subject: [slurm-dev] Re: Non static partition definition
>> 
>> In addition to a QOS, an advanced reservation may also satisfy your needs:
>> http://slurm.schedmd.com/reservations.html
>> 
>> Quoting Ryan Cox <[email protected]>:
>> 
>>> George,
>>> 
>>> Wouldn't a QOS with GrpNodes=10 accomplish that?
>>> 
>>> Ryan
>>> 
>>>> On 10/30/2014 11:47 AM, Brown George Andrew wrote:
>>>> Hi,
>>>> 
>>>> I would like to have a partition of N nodes without statically
>>>> defining which nodes should belong to a partition and I'm trying to
>>>> work out the best way to achieve this.
>>>> 
>>>> Currently I have partitions which span across all the nodes in my
>>>> cluster with differing settings, but I would like some of these to
>>>> only occupy a subset of the cluster. I could say define partition A
>>>> which can use all nodes but partition B may only access nodes
>>>> 01-10. But I would like to avoid partition B being reduced in size in
>>>> the event of maintenance or hardware failure.
>>>> 
>>>> I'm thinking the way to do this would be via a plugin. I would keep
>>>> all partitions spanning all nodes in the cluster but upon
>>>> submission check how many nodes are in use on the requested
>>>> partition. If there were say already 10 nodes in use in partition B
>>>> the job should be queued. However things then get a bit more
>>>> complex as to when slurm should de-queue and then run the job.
>>>> 
>>>> Is there a native method to do this in slurm? Essentially I would
>>>> like something like the MaxNodes option that exists for partitions
>>>> today but have it limit the total number of nodes used by jobs
>>>> submitted to that partition rather than just a limit per job.
>>>> 
>>>> Many thanks,
>>>> George
>> 
>> --
>> Morris "Moe" Jette
>> CTO, SchedMD LLC
