Ryan,

I believe this is the default behavior of reservations unless the flag "static_alloc" is specified.
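
For reference, a reservation along these lines could be requested roughly as follows. This is only a sketch: the reservation name `special_float`, the account `special_proj`, and the 30-day duration are placeholders made up for illustration, not values from this thread. The script only prints the `scontrol` command so it can be reviewed before being run on a real cluster.

```shell
#!/bin/sh
# Sketch only: the names and duration below are hypothetical placeholders.
resv_name="special_float"     # hypothetical reservation name
account="special_proj"        # hypothetical Slurm account for the project
node_count=5                  # the project needs 5 nodes at all times
duration="30-00:00:00"        # days-hours:minutes:seconds, chosen arbitrarily

# No Flags=STATIC_ALLOC here, so (as noted above) Slurm should be free to
# pick a different node if one of the reserved nodes goes down.
cmd="scontrol create reservation ReservationName=${resv_name} StartTime=now Duration=${duration} NodeCnt=${node_count} Accounts=${account}"

# Dry run: print the command instead of executing it.
echo "${cmd}"
```

Jobs would then be pointed at the reservation at submit time, e.g. with `sbatch --reservation=special_float`, again assuming that placeholder name.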
John DeSantis

2015-11-21 22:13 GMT-05:00 Novosielski, Ryan <[email protected]>:
> I could have sworn that I just heard it was possible to create a floating
> reservation for any number of nodes and that you could also cause it to
> replace nodes if any went missing with the "replace" flag. Is that not all
> in the current release?
>
> --
> ____ *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
> || \\UTGERS      |---------------------*O*---------------------
> ||_// Biomedical | Ryan Novosielski - Senior Technologist
> || \\ and Health | [email protected] 973/972.0922 (2x0922)
> || \\ Sciences   | OIRT/High Perf & Res Comp - MSB C630, Newark
>      `'
>
> On Nov 21, 2015, at 11:30, Daniel Letai <[email protected]> wrote:
>
> John,
>
> That's correct - exclusive use means the project must always have at
> least 5 nodes available to it, at all times, even if it means those
> nodes will be idle some of the time.
>
> OTOH, if some of the other nodes are idle for whatever reason (no one
> else is using the cluster), let the project use any (up to all)
> available nodes.
>
> The project is run automatically based on some data as it becomes
> available to a dispatching app. Optimally it should be preemptable on
> the other nodes but not on the exclusive ones, and must not preempt
> other jobs, but the entire preemption issue is of secondary importance.
>
> A reservation is somewhat better than a hardcoded nodelist as in my first
> post, but its major drawback is that on reservation "renewal" there
> might not be enough (or any) nodes available and the project will not
> have enough nodes (since it can't preempt - unless somehow it can
> preempt, but only on those 5 nodes in the "new" reservation?).
>
> --Dani_L.
>
> On 11/19/2015 10:35 PM, John Desantis wrote:
> > Daniel,
> >
> > Could you provide more information on the project's needs?
> >
> > A QOS could be configured with a generous priority and limits so that
> > the project cannot dominate the partition; reservations could be used
> > too, but you'd need to define at a minimum a start time and duration -
> > and when not in use the hardware would be idle and unavailable to
> > other users.
> >
> > John DeSantis
> >
> > 2015-11-19 13:31 GMT-05:00 Daniel Letai <[email protected]>:
> > The other issue is how to define the "public" partition. It would also have
> > to float, with lower priority, or else how would you achieve exclusivity of
> > "special" on the 5-node float?
> >
> > --Dani_L.
> >
> > On 11/19/2015 06:10 PM, Paul Edmon wrote:
> >
> > Yeah, I guess QoS won't really work for overflow. I was more thinking of
> > the QoS as a way to create a floating partition of 5 nodes with the rest
> > being in the public queue. They would send jobs to the QoS to hit that and
> > then when it is full they would submit to public as normal. That's at least
> > my thinking, but it's less seamless to the users as they will have to
> > consciously monitor what is going on.
> >
> > -Paul Edmon-
> >
> > On 11/19/2015 10:50 AM, Daniel Letai wrote:
> >
> > Can you elaborate a little? I'm not sure what kind of QoS will help, nor
> > how to implement one that will satisfy the requirements.
> >
> > On 11/19/2015 04:52 PM, Paul Edmon wrote:
> >
> > You might consider a QoS for this. It may not do everything you want
> > but it will give you the flexibility.
> >
> > -Paul Edmon-
> >
> > On 11/19/2015 04:49 AM, Daniel Letai wrote:
> >
> > Hi,
> >
> > Suppose I have a 100 node cluster with ~5% nodes down at any given time
> > (maintenance/hw failure/...).
> >
> > One of the projects requires exclusive use of 5 nodes, and must be able to
> > use the entire cluster when available (when other projects aren't running).
> >
> > I can do this easily if I maintain a static list of the exclusive nodes
> > in slurm.conf:
> >
> > PartitionName=public Nodes=tux0[01-95] Default=YES
> > PartitionName=special Nodes=tux[001-100] Default=NO
> >
> > And allowing only that project to use partition special.
> >
> > However, due to the downtime of 5%, I'd like to maintain a dynamic
> > exclusive 5 nodes.
> > Any suggestions?
> >
> > The project is serial and deployed as an array of single-node jobs, so I
> > can run it even when the other 95 nodes are full.
> >
> > Thanks,
> > --Dani_L.
