Title: Re: [slurm-dev] Re: A floating exclusive partition
John,

On 22/11/2015 17:15, John Desantis wrote:
Daniel,

That's correct - exclusive use means the project must always have at least 5
nodes available to it, at all times, even if it means those nodes will be
idle some of the time.
A reservation could fulfill this without an issue, but I think there
is a better approach to this particular workflow.

OTOH, if some of the other nodes are idle for whatever reason (no one else
is using the cluster), let the project use any (up to all) available nodes.
So, the immediate concern I see with this is that the "--reservation="
flag might be supplied in the submission script(s) without first checking
the reserved nodes. If the reservation is currently active and there
are jobs already running against it, the new job(s) would be left in a
pending state.  Basically, this goes back to what Paul said about users
needing to monitor the system.

The project is run automatically by a dispatching app, based on data as it
becomes available. Ideally it should be preemptible on the other nodes
but not on the exclusive ones, and it must not preempt other jobs; however,
the entire preemption issue is of secondary importance.

A reservation is somewhat better than the hard-coded node list in my first
post, but its major drawback is that on reservation "renewal" there might
not be enough (or any) nodes available, and the project would not have enough
nodes (since it can't preempt - unless it could somehow preempt, but only on
the 5 nodes in the "new" reservation?).
Thanks, I have a better idea of the project's needs now, and it
doesn't look as if you'll need to use preemption unless there are time
constraints.

If I were in your position, I think the best course of action would be to
have two separate partitions (since idle hardware isn't a concern).
One would be the "public" partition, and the other would be the
"special"/project partition.  To accommodate unforeseen hardware and
node failures, assign a few extra nodes to the "special" partition
(10, for example) and place the rest of the nodes in the "public"
partition.

You would then need to provide some kind of access control on the
"special" partition, either an "AllowQOS" or an "AllowAccounts" parameter,
so that only the project's jobs can run on it.  The "public" partition
would not have any access control on it.
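A minimal slurm.conf sketch of this split. It assumes the tux[001-100] node names from the original post and the 10-node example above. It also assumes a hypothetical account called "special_project" (or, alternatively, a hypothetical QOS called "special_qos"); substitute the project's real names:

```
# Sketch only: 10 nodes for the project, the remaining 90 open to everyone.
# "special_project" and "special_qos" are hypothetical names.
PartitionName=public  Nodes=tux[001-090] Default=YES
PartitionName=special Nodes=tux[091-100] Default=NO AllowAccounts=special_project

# Alternative: restrict the partition by QOS instead of by account.
#PartitionName=special Nodes=tux[091-100] Default=NO AllowQos=special_qos
```

Because the two node lists do not overlap, jobs from other accounts cannot land on tux[091-100]. The project can still spill over onto "public" when it lists both partitions.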

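The last part would be tailoring the submission script(s) to list both
partitions, so that a job that cannot be dispatched immediately on the
"special" partition can fall back to the "public" partition and run
there.  Note that Slurm does not strictly try the listed partitions in
order: it uses whichever one offers the earliest start, considering
higher-priority partitions first.  So if "special" should be preferred,
give it the higher partition priority: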

"--partition=special,public"

I believe this method would give the project the best use of the system
resources without needing a reservation or preemption (for now).
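One way to make "special" the preferred partition when a job lists both is a higher partition priority tier. This sketch makes several assumptions. It assumes Slurm 16.05 or later, where the partition parameter is PriorityTier (earlier releases had a single partition "Priority" parameter). It reuses the original post's node names with a hypothetical 10/90 split and a hypothetical account "special_project". Preemption is left at preempt/none, so the tier affects only scheduling order and never preempts running jobs:

```
# Sketch only: assumes Slurm 16.05+ (PriorityTier); node ranges and the
# "special_project" account are hypothetical.
# Preemption stays off, so the higher tier only changes which partition
# is considered first; it never preempts running jobs.
PreemptType=preempt/none
PartitionName=public  Nodes=tux[001-090] Default=YES PriorityTier=1
PartitionName=special Nodes=tux[091-100] Default=NO  PriorityTier=10 AllowAccounts=special_project
```

Jobs would still be submitted with "--partition=special,public"; the tier only decides which partition Slurm considers first.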
Would this preclude the job from using all idle nodes if there are nodes available in both the special and the public partitions?

E.g., if I'm deploying two 4-element array jobs (array[4]) from the special project, the first array can use all the available nodes in special, and the second would run 1 element on special. Would it then use public for the other 3 elements (provided public has some idle nodes)?

HTH!

John DeSantis
Thanks for your input, it's very helpful :)
--Dani_L.

2015-11-21 11:29 GMT-05:00 Daniel Letai <[email protected]>:
John,

That's correct - exclusive use means the project must always have at least 5
nodes available to it, at all times, even if it means those nodes will be
idle some of the time.

OTOH, if some of the other nodes are idle for whatever reason (no one else
is using the cluster), let the project use any (up to all) available nodes.

The project is run automatically by a dispatching app, based on data as it
becomes available. Ideally it should be preemptible on the other nodes
but not on the exclusive ones, and it must not preempt other jobs; however,
the entire preemption issue is of secondary importance.

A reservation is somewhat better than the hard-coded node list in my first
post, but its major drawback is that on reservation "renewal" there might
not be enough (or any) nodes available, and the project would not have enough
nodes (since it can't preempt - unless it could somehow preempt, but only on
the 5 nodes in the "new" reservation?).

--Dani_L.


On 11/19/2015 10:35 PM, John Desantis wrote:
Daniel,

Could you provide more information on the project's needs?

A QOS could be configured with a generous priority and limits so that
the project cannot dominate the partition.  Reservations could be used
too, but you'd need to define, at a minimum, a start time and duration,
and when not in use the reserved hardware would sit idle and be
unavailable to other users.
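On the slurm.conf side, a QOS priority only affects scheduling if the multifactor priority plugin gives QOS a weight. QOS limits are only enforced when accounting enforcement covers them. A minimal sketch follows. It assumes accounting through slurmdbd is already set up, and the weight value is an arbitrary example. The QOS itself, with its priority and limits, would still be created with sacctmgr:

```
# Sketch only: the weight below is an arbitrary example value.
# Use multifactor priority so the QOS priority contributes to job priority.
PriorityType=priority/multifactor
PriorityWeightQOS=10000
# Enforce association and QOS limits (requires slurmdbd accounting).
AccountingStorageEnforce=associations,limits,qos
```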

John DeSantis


2015-11-19 13:31 GMT-05:00 Daniel Letai <[email protected]>:
The other issue is how to define the "public" partition. It would also have
to float, with lower priority, or else how would you achieve exclusivity of
"special" on the 5-node float?

--Dani_L.


On 11/19/2015 06:10 PM, Paul Edmon wrote:

Yeah, I guess QoS won't really work for overflow.  I was thinking more of
the QoS as a way to create a floating partition of 5 nodes, with the rest
being in the public queue.  They would send jobs to the QoS to hit that,
and when it is full they would submit to public as normal.  That's at least
my thinking, but it's less seamless for the users, as they would have to
consciously monitor what is going on.

-Paul Edmon-

On 11/19/2015 10:50 AM, Daniel Letai wrote:

Can you elaborate a little? I'm not sure what kind of QoS will help, nor
how to implement one that will satisfy the requirements.

On 11/19/2015 04:52 PM, Paul Edmon wrote:

You might consider a QoS for this.  It may not do everything you want
but it will give you the flexibility.

-Paul Edmon-

On 11/19/2015 04:49 AM, Daniel Letai wrote:

Hi,

Suppose I have a 100-node cluster with ~5% of nodes down at any given
time (maintenance/hw failure/...).

One of the projects requires exclusive use of 5 nodes, and must also be
able to use the entire cluster when available (when other projects
aren't running).

I can do this easily if I maintain a static list of the exclusive nodes
in slurm.conf:

PartitionName=public Nodes=tux0[01-95] Default=YES
PartitionName=special Nodes=tux[001-100] Default=NO

And allow only that project to use the special partition.

However, because of the ~5% downtime, I'd like to maintain a dynamic set
of 5 exclusive nodes.
Any suggestions?

The project is serial and deployed as array of single node jobs, so I
can run it even when the other 95 nodes are full.

Thanks,
--Dani_L.
