[slurm-dev] Re: SLURM : how to have a round-robin across nodes based on load average?

Loris Bennett Wed, 25 Nov 2015 23:59:42 -0800

"Sean Johnson" <[email protected]> writes:

>>> Hmm interesting for your use. How do you automatically power them down/up?
>>
>> In slurm.conf you can point the following variables:
>>
>> SuspendProgram
>> ResumeProgram
>>
>> to appropriate scripts.  Our nodes have an IPMI interface, so we can
>> use this to power nodes off or on.
>
> Do you end up developing a side cost of increased hard drive replacement? This
> seems like an interesting idea, but my first thought goes to the notion that
> the spin up / spin down cycle is what ages spinning disk hard drives the most.


Our nodes are diskless, so this is not an issue. 
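
For anyone wondering what a script behind SuspendProgram might look like,
here is a minimal sketch rather than our actual setup. It assumes that each
node's BMC is reachable as <node>-ipmi, that the IPMI user is "admin", and
that the password sits in /etc/slurm/ipmi.pass; all three are placeholders
to adapt to your site. A ResumeProgram script would be the same with "on"
instead of "off".

```shell
#!/bin/bash
# Hypothetical SuspendProgram sketch. slurm.conf would point at it, e.g.
#   SuspendProgram=/usr/local/sbin/node_suspend.sh   (path is an assumption)
#   SuspendTime=600                                  (idle seconds, example value)

# Assumed site-specific settings, overridable from the environment.
IPMI_USER="${IPMI_USER:-admin}"
IPMI_PASSFILE="${IPMI_PASSFILE:-/etc/slurm/ipmi.pass}"

# Map a node name to its BMC hostname (naming scheme is an assumption).
bmc_host() {
    printf '%s-ipmi\n' "$1"
}

# Power one node "on" or "off" via its BMC.
# With DRY_RUN set, the ipmitool command is printed instead of executed.
power_node() {
    ${DRY_RUN:+echo} ipmitool -I lanplus -H "$(bmc_host "$1")" \
        -U "$IPMI_USER" -f "$IPMI_PASSFILE" chassis power "$2"
}

# Slurm passes the affected nodes as one hostlist expression
# (e.g. node[01-04]); "scontrol show hostnames" expands it to one per line.
if [ "$#" -gt 0 ]; then
    for node in $(scontrol show hostnames "$1"); do
        power_node "$node" off
    done
fi
```

Running it by hand with DRY_RUN=1 and a hostlist argument prints the
commands without touching any hardware, which is a convenient way to check
the BMC naming before letting slurmctld call it.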

> On 19 Nov 2015, at 2:06, Loris Bennett wrote:
>
>> cips bmkg <[email protected]> writes:
>>
>>> Re: [slurm-dev] Re: Fwd: SLURM : how to have a round-robin across nodes
>>> based on load average?
>>>
>>> Hmm interesting for your use. How do you automatically power them down/up?
>>
>> In slurm.conf you can point the following variables:
>>
>> SuspendProgram
>> ResumeProgram
>>
>> to appropriate scripts.  Our nodes have an IPMI interface, so we can
>> use this to power nodes off or on.
>>
>>> Fact of the matter is we almost always need our nodes: our HPC is at full
>>> capacity 99% of the time.
>>
>> Well, if your cluster is full, you don't need to worry about spreading
>> the workload in a round-robin fashion, do you?  However, if there are
>> times, such as weekends or public holidays, during which your occupation
>> rate drops, then it makes sense to have the workload consolidated, so
>> that you can switch off unneeded nodes.
>>
>> Cheers,
>>
>> Loris
>>
>>> On Thu, Nov 19, 2015 at 2:40 PM, Loris Bennett <[email protected]>
>>> wrote:
>>>
>>>  cips bmkg <[email protected]> writes:
>>>
>>>  > Re: [slurm-dev] Re: Fwd: SLURM : how to have a round-robin across
>>>  > nodes based on load average?
>>>  >
>>>  > Hi,
>>>  >
>>>  > If you generate a lot of mono-core sequential tasks, the regular SLURM
>>>  > allocation would pile them up on the first node, followed by the
>>>  > second, etc.
>>>  >
>>>  > The last node would (almost) never be used.
>>>
>>>  Why is this a bad thing? When we have nodes that are empty, we power
>>>  them down. When they are needed, they get powered on again. If you
>>>  spread the workload evenly over all nodes, this can never happen and you
>>>  waste a lot of energy.
>>
>> -- 
>> This signature is currently under construction.

-- 
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin         Email [email protected]
