Daniel,

Could you provide more information on the project's needs?
A QOS could be configured with a generous priority and limits so that the project cannot dominate the partition. Reservations could be used too, but you'd need to define at a minimum a start time and duration, and when not in use the hardware would be idle and unavailable to other users.

John DeSantis

2015-11-19 13:31 GMT-05:00 Daniel Letai <[email protected]>:
>
> The other issue is how to define the "public" partition. It would also have
> to float, with lower priority, or else how would you achieve exclusivity of
> "special" on the 5-node float?
>
> --Dani_L.
>
>
> On 11/19/2015 06:10 PM, Paul Edmon wrote:
>>
>>
>> Yeah, I guess QoS won't really work for overflow. I was more thinking of
>> the QoS as a way to create a floating partition of 5 nodes with the rest
>> being in the public queue. They would send jobs to the QoS to hit that and
>> then when it is full they would submit to public as normal. That's at least
>> my thinking, but it's less seamless to the users as they will have to
>> consciously monitor what is going on.
>>
>> -Paul Edmon-
>>
>> On 11/19/2015 10:50 AM, Daniel Letai wrote:
>>>
>>>
>>> Can you elaborate a little? I'm not sure what kind of QoS will help, nor
>>> how to implement one that will satisfy the requirements.
>>>
>>> On 11/19/2015 04:52 PM, Paul Edmon wrote:
>>>>
>>>>
>>>> You might consider a QoS for this. It may not do everything you want
>>>> but it will give you the flexibility.
>>>>
>>>> -Paul Edmon-
>>>>
>>>> On 11/19/2015 04:49 AM, Daniel Letai wrote:
>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> Suppose I have a 100 node cluster with ~5% nodes down at any given time
>>>>> (maintenance/hw failure/...).
>>>>>
>>>>> One of the projects requires exclusive use of 5 nodes, and to be able to
>>>>> use the entire cluster when available (when other projects aren't running).
>>>>>
>>>>> I can do this easily if I maintain a static list of the exclusive nodes
>>>>> in slurm.conf:
>>>>>
>>>>> PartitionName=public Nodes=tux0[01-95] Default=YES
>>>>> PartitionName=special Nodes=tux[001-100] Default=NO
>>>>>
>>>>> And allowing only that project to use partition special.
>>>>>
>>>>> However, due to the downtime of 5%, I'd like to maintain a dynamic
>>>>> exclusive 5 nodes.
>>>>> Any suggestions?
>>>>>
>>>>> The project is serial and deployed as array of single node jobs, so I
>>>>> can run it even when the other 95 nodes are full.
>>>>>
>>>>> Thanks,
>>>>> --Dani_L.
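
For reference, the two options raised at the top of this thread (a QOS with a priority and a limit, or a reservation) might be set up roughly as below. This is only a sketch, not a tested configuration. The QOS name "special", the account "special_proj", the reservation name "special_res", the script "job.sh" and the priority value are all made up for illustration. It also assumes Slurm 15.08 or later, where the node limit is written GrpTRES=node=5 (earlier releases used GrpNodes=5), and that slurm.conf has AccountingStorageEnforce set to include limits and qos so the QOS limit is actually enforced.

```
# Option 1: a QOS with a raised priority, capped at 5 nodes in total.
# "special" and "special_proj" are hypothetical names.
sacctmgr add qos special
sacctmgr modify qos special set Priority=1000 GrpTRES=node=5
sacctmgr modify account where name=special_proj set qos+=special

# The project then submits its array of single-node jobs against the QOS.
# "job.sh" is a placeholder script name.
sbatch --qos=special --array=1-100 job.sh

# Option 2: a reservation. A start time and a duration are mandatory,
# and the reserved nodes sit idle whenever the project is not using them.
scontrol create reservation ReservationName=special_res \
    StartTime=now Duration=infinite NodeCnt=5 Accounts=special_proj
```

As Paul notes, the QOS route is not seamless: once the 5-node limit is reached, users have to submit further work to the public partition themselves.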
