Daniel,

> Would it preclude the job from using all idle nodes if there are nodes
> available both in the special and in the public partitions?

The sbatch man page states that when several partitions are listed, the
one offering the earliest start will take the job(s), although partitions
with a higher priority are considered first.  I haven't tried running
jobs with multiple partitions specified, so I cannot confirm or deny the
behavior empirically.
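
For what it's worth, `sbatch --test-only` should let you check this
empirically without queueing anything: it validates a request and
reports the scheduler's estimate of when and where it would start.  A
minimal sketch, assuming partitions named "special" and "public" as
proposed further down; the exact message format differs between Slurm
versions.

```shell
#!/usr/bin/env bash
# Ask Slurm where a one-node, one-task job would start, without submitting it.
# "special" and "public" are the partition names proposed in this thread.
partitions="special,public"

if command -v sbatch >/dev/null 2>&1; then
    # --test-only validates the request and prints an estimated start time
    # and placement on stderr; no job is created.
    sbatch --test-only --partition="$partitions" --nodes=1 --ntasks=1 \
        --wrap="hostname" 2>&1
else
    echo "sbatch not found; run this on a Slurm submit host" >&2
fi
```

Repeating it while the two partitions are in different states (or with
the list reversed) should show which partition the scheduler actually
picks.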

> E.g., if I'm deploying two array jobs of 4 elements each from the special
> project, the first array can use all the available nodes in special, and
> the second would run 1 element on special. Would it then use public for
> the other 3 elements (provided public has some idle nodes)?

As long as the special partition has idle nodes, I'd assume that the
"special" partition would take as many jobs as possible and then
dispatch the remaining ones to idle nodes in "public".
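
A minimal array job along those lines might look like the following; the
job name and the four-element array are illustrative.  Each element is
scheduled as its own job, so elements can land in different partitions,
and Slurm's `SLURM_JOB_PARTITION` variable records where each one ran.

```shell
#!/bin/bash
#SBATCH --job-name=special-array
#SBATCH --partition=special,public
#SBATCH --array=0-3
#SBATCH --nodes=1
#SBATCH --ntasks=1

# Slurm sets these for each array element; the defaults only apply when the
# script is run by hand outside Slurm.
task_id="${SLURM_ARRAY_TASK_ID:-0}"
partition="${SLURM_JOB_PARTITION:-none}"

echo "array element ${task_id} running in partition ${partition} on $(hostname)"
```

After submission, `squeue --jobs=<jobid> --format="%i %P %N"` should list
each element with its partition (%P) and node list (%N).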

John DeSantis


2015-11-23 2:21 GMT-05:00 dani <[email protected]>:
> John,
>
>
> On 22/11/2015 17:15, John Desantis wrote:
>
> Daniel,
>
> That's correct - exclusive use means the project must always have at least 5
> nodes available to it, at all times, even if it means those nodes will be
> idle some of the time.
>
> A reservation could fulfill this without an issue, but I think there
> is a better approach to this particular workflow.
>
> OTOH, if some of the other nodes are idle for whatever reason (no one else
> is using the cluster), let the project use any (up to all) available nodes.
>
> So, the immediate concern I see with this is if the "--reservation="
> flag was supplied in submission script(s) without checking the
> reserved nodes first;  if the reservation is currently active, and
> there are jobs already running against it, the job(s) would be placed
> into a pending state.  Basically, this goes back to what Paul said
> about users needing to monitor the system.
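
If a submission script does use `--reservation`, part of that check can
be automated.  A minimal sketch, assuming a hypothetical reservation
named "special_res" and that `scontrol show reservation` prints a
`State=` field (as recent Slurm versions do; older ones may not).  It
only checks that the reservation is active, not whether its nodes are
already busy, so that part still needs monitoring.

```shell
#!/usr/bin/env bash
# Hypothetical reservation name; replace with the real one.
RESERVATION="special_res"

# Print the State= value (e.g. ACTIVE or INACTIVE) from
# "scontrol show reservation" output read on stdin.
reservation_state() {
    tr ' ' '\n' | sed -n 's/^State=//p' | head -n 1
}

# Print "--reservation=NAME" only when the reservation is currently active,
# so jobs are not submitted against a reservation that has not started.
reservation_flag() {
    local state
    state="$(scontrol show reservation "$RESERVATION" 2>/dev/null | reservation_state)"
    if [ "$state" = "ACTIVE" ]; then
        echo "--reservation=$RESERVATION"
    fi
}
```

Usage: `sbatch $(reservation_flag) job.sh`, where the substitution is
deliberately unquoted so that an empty result adds no argument.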
>
> The project is run automatically based on some data as it becomes available
> to a dispatching app. Optimally it should be preemptable on the other nodes
> but not on the exclusive ones, and must not preempt other jobs, but the
> entire preemption issue is of secondary importance.
>
> A reservation is somewhat better than a hardcoded nodelist as in my first
> post, but its major drawback is that on reservation "renewal" there might
> not be enough (or any) nodes available and the project will not have enough
> nodes (since it can't preempt - unless somehow it can preempt, but only on
> those 5 nodes in the "new" reservation?).
>
> Thanks, I have a better idea of the project's needs now, and it
> doesn't look as if you'll need to use preemption unless there are time
> constraints.
>
> If I were in your place, I think the best course of action would be to
> have two separate partitions (since idle hardware isn't a concern).
> One would be the "public" partition, and the other would be the
> "special"/project partition.  And, in order to accommodate unforeseen
> hardware and node issues, assign a few extra nodes to the "special"
> partition (10, for example) and place the rest of the nodes into the
> "public" partition.
>
> You would then need to provide some kind of access control on the
> "special" partition, either an "AllowQOS" or an "AllowAccounts" parameter,
> so that only the project jobs can run on it.  The "public" partition
> would not have any access control on it.
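
A slurm.conf sketch of that split, assuming the tux[001-100] naming from
the original post and a hypothetical account called "special_proj".
Note that the sbatch man page says the order of names in a `--partition`
list does not decide placement: the partition offering the earliest
start wins, with higher-priority partitions considered first.  Giving
"special" the higher partition priority is therefore what makes it the
preferred choice.  On Slurm 16.05 and later that parameter is
`PriorityTier`; earlier releases use `Priority`.  With partition-priority
preemption enabled, the same setting also drives preemption, which this
project wants to avoid.

```
# 10 nodes for the project (spares included), the remaining 90 for everyone.
PartitionName=special Nodes=tux[001-010] Default=NO  AllowAccounts=special_proj PriorityTier=10
PartitionName=public  Nodes=tux[011-100] Default=YES PriorityTier=1
```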
>
> The last part would be tailoring the submission script(s) to first
> attempt to use the "special" partition and if the job cannot be
> dispatched immediately, fall back to the "public" partition and run
> there:
>
> "--partition=special,public"
>
> I believe this method would allow the project the best use of the system
> resources without needing to utilize a reservation or preemption
> (currently).
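
Since the project is submitted by a dispatching application, the
two-partition request can live in a small wrapper around sbatch.  A
sketch, assuming a hypothetical job script `special_job.sh` and account
`special_proj`; with `DRY_RUN=1` (the default here) it prints the sbatch
command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical names; adjust to the real account and job script.
ACCOUNT="special_proj"
JOB_SCRIPT="special_job.sh"
DRY_RUN="${DRY_RUN:-1}"

# Submit an array of N single-node elements that may start in either
# partition. --parsable makes sbatch print just the job ID.
submit_array() {
    local n="$1"
    local -a cmd
    cmd=(sbatch --parsable --account="$ACCOUNT"
         --partition=special,public
         --array="0-$((n - 1))" --nodes=1 --ntasks=1 "$JOB_SCRIPT")
    if [ "$DRY_RUN" = "1" ]; then
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

For example, `DRY_RUN=0 submit_array 4` would submit four elements and
print the array's job ID for the dispatcher to record.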
>
> Would it preclude the job from using all idle nodes if there are nodes
> available both in the special and in the public partitions?
>
> E.g., if I'm deploying two array jobs of 4 elements each from the special
> project, the first array can use all the available nodes in special, and
> the second would run 1 element on special. Would it then use public for
> the other 3 elements (provided public has some idle nodes)?
>
> HTH!
>
> John DeSantis
>
> Thanks for your input, it's very helpful :)
> --Dani_L.
>
>
> 2015-11-21 11:29 GMT-05:00 Daniel Letai <[email protected]>:
>
> John,
>
> That's correct - exclusive use means the project must always have at least 5
> nodes available to it, at all times, even if it means those nodes will be
> idle some of the time.
>
> OTOH, if some of the other nodes are idle for whatever reason (no one else
> is using the cluster), let the project use any (up to all) available nodes.
>
> The project is run automatically based on some data as it becomes available
> to a dispatching app. Optimally it should be preemptable on the other nodes
> but not on the exclusive ones, and must not preempt other jobs, but the
> entire preemption issue is of secondary importance.
>
> A reservation is somewhat better than a hardcoded nodelist as in my first
> post, but its major drawback is that on reservation "renewal" there might
> not be enough (or any) nodes available and the project will not have enough
> nodes (since it can't preempt - unless somehow it can preempt, but only on
> those 5 nodes in the "new" reservation?).
>
> --Dani_L.
>
>
> On 11/19/2015 10:35 PM, John Desantis wrote:
>
> Daniel,
>
> Could you provide more information on the project's needs?
>
> A QOS could be configured with a generous priority and with limits so
> that the project cannot dominate the partition.  Reservations could be
> used too, but you'd need to define at a minimum a start time and
> duration, and when not in use the hardware would be idle and
> unavailable to other users.
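
An admin-side sketch of both options, with hypothetical names and limits
(the QOS "special_qos", a 20-node cap, the account "special_proj" and a
7-day reservation).  By default it only prints the commands; option
spellings can vary between releases, so check them against the local
sacctmgr and scontrol man pages before running it with `DRY_RUN=0`.

```shell
#!/usr/bin/env bash
# Prints the commands unless DRY_RUN=0 is set in the environment.
DRY_RUN="${DRY_RUN:-1}"
ACCOUNT="special_proj"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# QOS with a raised priority (only counts if PriorityWeightQOS is non-zero)
# and a cap on the nodes its running jobs may hold at once (GrpTRES needs
# Slurm 15.08 or later; older releases used GrpNodes).
run sacctmgr -i add qos special_qos Priority=1000 GrpTRES=node=20
run sacctmgr -i modify account where name="$ACCOUNT" set QOS+=special_qos

# Reservation: a start time and a duration are the minimum. NodeCnt lets
# Slurm pick the nodes instead of naming them.
run scontrol create reservation ReservationName=special_res StartTime=now \
    Duration=7-00:00:00 NodeCnt=5 Accounts="$ACCOUNT"
```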
>
> John DeSantis
>
>
> 2015-11-19 13:31 GMT-05:00 Daniel Letai <[email protected]>:
>
> The other issue is how to define the "public" partition. It would also
> have to float, with lower priority, or else how would you achieve
> exclusivity of "special" on the 5-node float?
>
> --Dani_L.
>
>
> On 11/19/2015 06:10 PM, Paul Edmon wrote:
>
> Yeah, I guess QoS won't really work for overflow.  I was more thinking
> of the QoS as a way to create a floating partition of 5 nodes with the
> rest being in the public queue.  They would send jobs to the QoS to hit
> that and then when it is full they would submit to public as normal.
> That's at least my thinking, but it's less seamless to the users as
> they will have to consciously monitor what is going on.
>
> -Paul Edmon-
>
> On 11/19/2015 10:50 AM, Daniel Letai wrote:
>
> Can you elaborate a little? I'm not sure what kind of QoS will help,
> nor how to implement one that will satisfy the requirements.
>
> On 11/19/2015 04:52 PM, Paul Edmon wrote:
>
> You might consider a QoS for this.  It may not do everything you want
> but it will give you the flexibility.
>
> -Paul Edmon-
>
> On 11/19/2015 04:49 AM, Daniel Letai wrote:
>
> Hi,
>
> Suppose I have a 100-node cluster with ~5% of the nodes down at any
> given time (maintenance/hw failure/...).
>
> One of the projects requires exclusive use of 5 nodes, and must be able
> to use the entire cluster when available (when other projects aren't
> running).
>
> I can do this easily if I maintain a static list of the exclusive nodes
> in slurm.conf:
>
> PartitionName=public Nodes=tux[001-095] Default=YES
> PartitionName=special Nodes=tux[001-100] Default=NO
>
> And allowing only that project to use partition special.
>
> However, due to the 5% downtime, I'd like to maintain a dynamic set of
> 5 exclusive nodes.
> Any suggestions?
>
> The project is serial and deployed as array of single node jobs, so I
> can run it even when the other 95 nodes are full.
>
> Thanks,
> --Dani_L.
>
>
