Ah, I hadn't considered that. Yes, we'd want GrpCPU in that case. Thanks for pointing this out, Danny.
Kind regards,
George

Sent from my iPhone

> On 30 Oct 2014, at 21:35, Danny Auble <[email protected]> wrote:
>
> Keep in mind that if you use GrpNodes and aren't doing whole node
> allocations, and multiple jobs using the QOS land on the same node, that
> node gets counted multiple times in the limit. I would suggest using GrpCPU
> instead if not always using whole node allocations.
>
> Danny
>
>> On 10/30/2014 12:38 PM, Brown George Andrew wrote:
>> Hi,
>>
>> To go into more specifics, I'm wanting to be able to limit the number of
>> nodes or cores providing the ability to run jobs with a wall time up to one
>> week, all other nodes defaulting to 1 day. So I'd set, say, Q1 to have a
>> MaxWallDurationPerJob of 24 hours and set it as the default, then add another
>> QOS with MaxWallDurationPerJob as a week and GrpNodes to N. Where this was
>> previously two partitions I would now have a single partition with a max
>> wall time of 1 week and two QOSes. In this case I'd want users to be able to
>> do exactly what you highlight as a bug.
>>
>> For completeness I would also set the DenyOnLimit flag.
>>
>> In your case perhaps the MaxNodes setting in sacctmgr may help? In 14.03
>> features were added which now allow you to more finely control which
>> accounts get used with partitions as well as QOSes; this may be of interest.
>>
>> Kind regards,
>> George
>> ________________________________________
>> From: Tingyang Xu [[email protected]]
>> Sent: 30 October 2014 19:42
>> To: slurm-dev
>> Subject: [slurm-dev] Re: Non static partition definition
>>
>> Hello George,
>> We do have the same issue now. I think the QOS solution has a bug. For
>> example, assume that Partition B allows two QOSes, Q1 and Q2. Then you set up
>> GrpNodes=10 on both Q1 and Q2. Then, the users can actually use 20 nodes if
>> they submit jobs to Q1 and Q2, respectively.
>>
>> Best,
>> Tingyang Xu
>>
>> -----Original Message-----
>> From: Brown George Andrew
>> Sent: Thursday, October 30, 2014 2:36 PM
>> To: slurm-dev
>> Subject: [slurm-dev] Re: Non static partition definition
>>
>>
>> Thanks for the quick replies!
>>
>> Indeed a QOS seems like what I want here. Sorry, I was stuck thinking in
>> partitions and clearly was having some tunnel vision.
>>
>> Cheers,
>> George
>> ________________________________________
>> From: [email protected] [[email protected]]
>> Sent: 30 October 2014 19:08
>> To: slurm-dev
>> Subject: [slurm-dev] Re: Non static partition definition
>>
>> In addition to a QOS, an advanced reservation may also satisfy your needs:
>> http://slurm.schedmd.com/reservations.html
>>
>> Quoting Ryan Cox <[email protected]>:
>>
>>> George,
>>>
>>> Wouldn't a QOS with GrpNodes=10 accomplish that?
>>>
>>> Ryan
>>>
>>>> On 10/30/2014 11:47 AM, Brown George Andrew wrote:
>>>> Hi,
>>>>
>>>> I would like to have a partition of N nodes without statically
>>>> defining which nodes should belong to a partition and I'm trying to
>>>> work out the best way to achieve this.
>>>>
>>>> Currently I have partitions which span across all the nodes in my
>>>> cluster with differing settings, but I would like some of these to
>>>> only occupy a subset of the cluster. I could, say, define partition A
>>>> which can use all nodes but partition B may only access nodes
>>>> 01-10. But I would like to avoid partition B being reduced in size in
>>>> the event of maintenance or hardware failure.
>>>>
>>>> I'm thinking the way to do this would be via a plugin. I would keep
>>>> all partitions spanning all nodes in the cluster but upon
>>>> submission check how many nodes are in use on the requested
>>>> partition. If there were, say, already 10 nodes in use in partition B
>>>> the job should be queued. However things then get a bit more
>>>> complex as to when slurm should de-queue and then run the job.
>>>>
>>>> Is there a native method to do this in slurm? Essentially I would
>>>> like something like the MaxNodes option that exists for partitions
>>>> today but have it limit the total number of nodes used by jobs
>>>> submitted to that partition rather than just a limit per job.
>>>>
>>>> Many thanks,
>>>> George
>>
>> --
>> Morris "Moe" Jette
>> CTO, SchedMD LLC
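
Danny's point about GrpNodes counting a shared node once per job can be illustrated with a small model. This is only a sketch of the behaviour as he describes it, not Slurm's actual accounting code. The Job class, the node names and the CPU counts are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical job record; not a Slurm data structure."""
    name: str
    nodes: tuple        # made-up node names the job runs on
    cpus_per_node: int  # CPUs the job uses on each of those nodes


def grp_nodes_usage(jobs):
    """Node count summed per job, so a shared node is counted once per job."""
    return sum(len(job.nodes) for job in jobs)


def distinct_nodes(jobs):
    """Number of physical nodes actually touched by the jobs."""
    return len({node for job in jobs for node in job.nodes})


def grp_cpu_usage(jobs):
    """CPU count summed per job; sharing a node does not inflate it."""
    return sum(len(job.nodes) * job.cpus_per_node for job in jobs)


# Three jobs in the same QOS, all without whole node allocations.
jobs = [
    Job("a", ("node01",), 4),
    Job("b", ("node01",), 4),
    Job("c", ("node01", "node02"), 8),
]

print("usage against a GrpNodes limit:", grp_nodes_usage(jobs))
print("distinct nodes in use:         ", distinct_nodes(jobs))
print("usage against a GrpCPU limit:  ", grp_cpu_usage(jobs))
```

In this model the three jobs occupy two physical nodes but count as four against a node-based limit, while the CPU-based count matches what the jobs really use. That is the reason GrpCPU is the safer limit when nodes are shared.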
