Daniel,

> That's correct - exclusive use means the project must always have at least 5
> nodes available to it, at all times, even if it means those nodes will be
> idle some of the time.

A reservation could fulfill this without an issue, but I think there is a
better approach to this particular workflow.

> OTOH, if some of the other nodes are idle for whatever reason (no one else
> is using the cluster), let the project use any (up to all) available nodes.

So, the immediate concern I see with this is the "--reservation=" flag being
supplied in the submission script(s) without checking the reserved nodes
first: if the reservation is currently active and jobs are already running
against it, the new job(s) would be placed into a pending state. Basically,
this goes back to what Paul said about users needing to monitor the system.

> The project is run automatically based on some data as it becomes available
> to a dispatching app. Optimally it should be preemptable on the other nodes
> but not on the exclusive ones, and must not preempt other jobs, but the
> entire preemption issue is of secondary importance.
>
> A reservation is somewhat better than hardcoded nodelist as in my first
> post, but its major drawback is that on reservation "renewal" there might
> not be enough (or any) nodes available and the project will not have enough
> nodes (since it can't preempt - unless somehow it can preempt, but only on
> those 5 nodes in the "new" reservation?).

Thanks, I have a better idea of the project's needs now, and it doesn't look
as if you'll need to use preemption unless there are time constraints.

If I were in your place, I think the best course of action would be to have
two separate partitions (since idle hardware isn't a concern). One would be
the "public" partition, and the other would be the "special"/project
partition. And, to accommodate unforeseen hardware and node issues, assign a
few extra nodes to the "special" partition - 10 for example - and place the
rest of the nodes into the "public" partition.
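
A rough slurm.conf sketch of that split, assuming the tux001-tux100 node
names from the original post and the example figure of 10 nodes; the exact
ranges are hypothetical and would need to match the real hardware:

```
# Assumed ranges: tux091-tux100 (10 nodes) for the project, the rest public.
PartitionName=public  Nodes=tux0[01-90]  Default=YES
PartitionName=special Nodes=tux[091-100] Default=NO
```
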
You would then need to provide some kind of access control on the "special"
partition, either an "AllowQOS" or "AllowAccounts" parameter, so that only
the project jobs can run on it. The "public" partition would not have any
access control on it.

The last part would be tailoring the submission script(s) to first attempt
to use the "special" partition and, if the job cannot be dispatched
immediately, fall back to the "public" partition and run there:

"--partition=special,public"

I believe this method would allow the project the best use of the system
resources without needing to utilize a reservation or preemption
(currently).

HTH!
John DeSantis

2015-11-21 11:29 GMT-05:00 Daniel Letai:
>
> John,
>
> That's correct - exclusive use means the project must always have at least 5
> nodes available to it, at all times, even if it means those nodes will be
> idle some of the time.
>
> OTOH, if some of the other nodes are idle for whatever reason (no one else
> is using the cluster), let the project use any (up to all) available nodes.
>
> The project is run automatically based on some data as it becomes available
> to a dispatching app. Optimally it should be preemptable on the other nodes
> but not on the exclusive ones, and must not preempt other jobs, but the
> entire preemption issue is of secondary importance.
>
> A reservation is somewhat better than hardcoded nodelist as in my first
> post, but its major drawback is that on reservation "renewal" there might
> not be enough (or any) nodes available and the project will not have enough
> nodes (since it can't preempt - unless somehow it can preempt, but only on
> those 5 nodes in the "new" reservation?).
>
> --Dani_L.
>
>
> On 11/19/2015 10:35 PM, John Desantis wrote:
>>
>> Daniel,
>>
>> Could you provide more information on the project's needs?
>>
>> A QOS could be configured with a generous priority and limits so that
>> the project cannot dominate the partition; Reservations could be used
>> too, but you'd need to define at a minimum a start time and duration -
>> and when not in use the hardware would be idle and unavailable to
>> other users.
>>
>> John DeSantis
>>
>>
>> 2015-11-19 13:31 GMT-05:00 Daniel Letai:
>>>
>>> The other issue is how to define the "public" partition. It would also
>>> have to float, with lower priority, or else how would you achieve
>>> exclusivity of "special" on the 5-node float?
>>>
>>> --Dani_L.
>>>
>>>
>>> On 11/19/2015 06:10 PM, Paul Edmon wrote:
>>>>
>>>>
>>>> Yeah, I guess QoS won't really work for overflow. I was more thinking
>>>> of the QoS as a way to create a floating partition of 5 nodes with the
>>>> rest being in the public queue. They would send jobs to the QoS to hit
>>>> that and then when it is full they would submit to public as normal.
>>>> That's at least my thinking, but it's less seamless to the users as
>>>> they will have to consciously monitor what is going on.
>>>>
>>>> -Paul Edmon-
>>>>
>>>> On 11/19/2015 10:50 AM, Daniel Letai wrote:
>>>>>
>>>>>
>>>>> Can you elaborate a little? I'm not sure what kind of QoS will help,
>>>>> nor how to implement one that will satisfy the requirements.
>>>>>
>>>>> On 11/19/2015 04:52 PM, Paul Edmon wrote:
>>>>>>
>>>>>>
>>>>>> You might consider a QoS for this. It may not do everything you want
>>>>>> but it will give you the flexibility.
>>>>>>
>>>>>> -Paul Edmon-
>>>>>>
>>>>>> On 11/19/2015 04:49 AM, Daniel Letai wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Suppose I have a 100 node cluster with ~5% nodes down at any given
>>>>>>> time (maintenance/hw failure/...).
>>>>>>>
>>>>>>> One of the projects requires exclusive use of 5 nodes, and be able to
>>>>>>> use entire cluster when available (when other projects aren't
>>>>>>> running).
>>>>>>>
>>>>>>> I can do this easily if I maintain a static list of the exclusive
>>>>>>> nodes in slurm.conf:
>>>>>>>
>>>>>>> PartitionName=public Nodes=tux0[01-95] Default=YES
>>>>>>> PartitionName=special Nodes=tux[001-100] Default=NO
>>>>>>>
>>>>>>> And allowing only that project to use partition special.
>>>>>>>
>>>>>>> However, due to the downtime of 5%, I'd like to maintain a dynamic
>>>>>>> exclusive 5 nodes.
>>>>>>> Any suggestions?
>>>>>>>
>>>>>>> The project is serial and deployed as array of single node jobs, so I
>>>>>>> can run it even when the other 95 nodes are full.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> --Dani_L.
