Daniel,

Could you provide more information on the project's needs?

A QOS could be configured with a generous priority and limits so that
the project cannot dominate the partition; reservations could be used
too, but you'd need to define, at a minimum, a start time and duration,
and when not in use the hardware would be idle and unavailable to
other users.

John DeSantis
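The idle-hardware trade-off between a reservation and a high-priority QOS can be illustrated with a toy Python model. This is a sketch under stated assumptions, not how Slurm's scheduler works internally: every job needs one node, 95 nodes are healthy, and the project gets a hypothetical 5-node share. The function name `snapshot` and its parameters are made up for this illustration.

```python
def snapshot(up_nodes: int, special_demand: int, public_demand: int,
             limit: int = 5, reservation: bool = True) -> tuple[int, int, int]:
    """Return (special, public, idle) node counts for one toy scheduling snapshot.

    Hypothetical model: every job needs one node, and the project's demand
    beyond `limit` is ignored to keep the comparison simple.
    """
    # The project gets at most `limit` nodes in both cases.
    special = min(special_demand, limit, up_nodes)
    if reservation:
        # Reserved nodes stay off-limits to public jobs even when the project
        # leaves some of them unused.
        public = min(public_demand, max(0, up_nodes - limit))
    else:
        # High-priority QOS: the project is served first, and public jobs may
        # use every node the project is not actually using.
        public = min(public_demand, up_nodes - special)
    idle = up_nodes - special - public
    return special, public, idle


# The project only has 2 jobs queued while public demand is high.
print(snapshot(95, 2, 200, reservation=True))   # (2, 90, 3)
print(snapshot(95, 2, 200, reservation=False))  # (2, 93, 0)
```

In this model the QOS only changes the order in which queued jobs are served. Unless preemption is configured, a project job submitted while the cluster is full still waits for running public jobs to finish.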


2015-11-19 13:31 GMT-05:00 Daniel Letai <[email protected]>:
>
> The other issue is how to define the "public" partition. It would also have
> to float, with lower priority, or else how would you achieve exclusivity of
> "special" on the 5-node float?
>
> --Dani_L.
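The point about a floating "public" partition can be put as simple arithmetic: if the 5 exclusive nodes are always chosen from whatever is healthy, the public partition's ceiling has to shrink as nodes go down, whereas a static public partition stays at 95 nodes. A minimal sketch, with a hypothetical helper name:

```python
def public_cap(total_nodes: int, down_nodes: int, exclusive: int = 5) -> int:
    """Largest node count the public partition may hold while still leaving
    `exclusive` healthy nodes for the special project (hypothetical helper)."""
    return max(0, total_nodes - down_nodes - exclusive)


# Prints 95, 90 and 87 for 0, 5 and 8 down nodes on a 100-node cluster.
for down in (0, 5, 8):
    print(f"{down} down -> public may hold at most {public_cap(100, down)} nodes")
```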
>
>
> On 11/19/2015 06:10 PM, Paul Edmon wrote:
>>
>>
>> Yeah, I guess QoS won't really work for overflow.  I was thinking more of
>> the QoS as a way to create a floating partition of 5 nodes, with the rest
>> being in the public queue.  Users would send jobs to the QoS to use that
>> pool, and when it is full they would submit to public as normal.  That's
>> my thinking, at least, but it's less seamless for the users, as they will
>> have to consciously monitor what is going on.
>>
>> -Paul Edmon-
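The manual overflow step described above amounts to a small decision the user (or a wrapper script) makes before each submission. A sketch, assuming the number of nodes the project currently uses under the QOS is obtained elsewhere (for example by counting jobs in `squeue --qos=special`). The QOS name `special` and the helper `submit_args` are hypothetical:

```python
def submit_args(special_nodes_in_use: int, qos_limit: int = 5) -> list[str]:
    """Choose sbatch options for the next single-node job (hypothetical helper).

    Assumes a QOS named 'special' capped at `qos_limit` nodes and a partition
    named 'public'; both names are illustrative.
    """
    if special_nodes_in_use < qos_limit:
        # There is still room in the floating 5-node pool.
        return ["--qos=special"]
    # The pool is full: fall back to the ordinary public partition.
    return ["--partition=public"]


print(submit_args(3))  # ['--qos=special']
print(submit_args(5))  # ['--partition=public']
```

Without such a check, a job that would exceed the QOS limit is simply left pending rather than redirected, which is why the users would have to monitor the pool themselves.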
>>
>> On 11/19/2015 10:50 AM, Daniel Letai wrote:
>>>
>>>
>>> Can you elaborate a little? I'm not sure what kind of QoS will help, nor
>>> how to implement one that will satisfy the requirements.
>>>
>>> On 11/19/2015 04:52 PM, Paul Edmon wrote:
>>>>
>>>>
>>>> You might consider a QoS for this.  It may not do everything you want
>>>> but it will give you the flexibility.
>>>>
>>>> -Paul Edmon-
>>>>
>>>> On 11/19/2015 04:49 AM, Daniel Letai wrote:
>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> Suppose I have a 100-node cluster with ~5% of nodes down at any given time
>>>>> (maintenance/hw failure/...).
>>>>>
>>>>> One of the projects requires exclusive use of 5 nodes and must also be able to
>>>>> use the entire cluster when it is available (when other projects aren't running).
>>>>>
>>>>> I can do this easily if I maintain a static list of the exclusive nodes
>>>>> in slurm.conf:
>>>>>
>>>>> PartitionName=public Nodes=tux0[01-95] Default=YES
>>>>> PartitionName=special Nodes=tux[001-100] Default=NO
>>>>>
>>>>> and allow only that project to use partition special.
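To see what the static configuration above actually reserves, the two node lists can be expanded and compared. This minimal Python sketch handles only a single bracketed numeric range (Slurm itself can expand hostlists with `scontrol show hostnames`); the `expand` helper is hypothetical:

```python
import re


def expand(hostlist: str) -> list[str]:
    """Expand a hostlist with one bracketed numeric range, e.g. 'tux0[01-95]'.

    Simplified stand-in for Slurm's own parser: it does not handle commas,
    multiple brackets or nested lists.
    """
    m = re.fullmatch(r"(.*)\[(\d+)-(\d+)\]", hostlist)
    if m is None:
        return [hostlist]
    prefix, lo, hi = m.groups()
    width = len(lo)  # keep zero padding, so '01' means width 2
    return [f"{prefix}{i:0{width}d}" for i in range(int(lo), int(hi) + 1)]


public = set(expand("tux0[01-95]"))
special = set(expand("tux[001-100]"))
assert public < special  # special is a strict superset of public
print(sorted(special - public))
# ['tux096', 'tux097', 'tux098', 'tux099', 'tux100']
```

Only tux096 to tux100 are exclusive to the project in this layout, so whenever one of those five is down, the project has fewer than 5 exclusive nodes.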
>>>>>
>>>>> However, because ~5% of the nodes are down at any given time, I'd like the
>>>>> 5 exclusive nodes to be chosen dynamically rather than fixed.
>>>>> Any suggestions?
>>>>>
>>>>> The project is serial and deployed as an array of single-node jobs, so I
>>>>> can run it even when the other 95 nodes are full.
>>>>>
>>>>> Thanks,
>>>>> --Dani_L.
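One way to approximate a "dynamic exclusive 5 nodes" is to recompute the public partition's node list from the current set of down nodes and push it with `scontrol update`, leaving the special partition at its full tux[001-100] definition. The sketch below only builds the node lists and the command string; it assumes the down-node set is gathered elsewhere (for example from `sinfo`), and the helper `split_nodes` and its selection policy (take the last healthy nodes) are hypothetical:

```python
def split_nodes(all_nodes: list[str], down: set[str], exclusive: int = 5):
    """Split healthy nodes into (public, reserved) and build an update command.

    Hypothetical helper: picks the last `exclusive` healthy nodes for the
    project and gives every other healthy node to the public partition.
    """
    healthy = [n for n in all_nodes if n not in down]
    cut = max(0, len(healthy) - exclusive)
    public, reserved = healthy[:cut], healthy[cut:]
    # Slurm accepts a comma-separated node list; the special partition keeps
    # its full tux[001-100] definition, so only public needs updating.
    command = f"scontrol update PartitionName=public Nodes={','.join(public)}"
    return public, reserved, command


nodes = [f"tux{i:03d}" for i in range(1, 101)]
public, reserved, command = split_nodes(nodes, down={"tux042", "tux097"})
print(len(public), reserved)
# 93 ['tux095', 'tux096', 'tux098', 'tux099', 'tux100']
```

Taking the last healthy nodes keeps the exclusive set close to the original tux096 to tux100 layout. A real script would have to rerun whenever node states change, and this sketch does not handle a newly chosen exclusive node that is still busy with public jobs; those jobs would presumably need to finish before the project could use it.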
