cips bmkg <[email protected]> writes:

> Re: [slurm-dev] Re: Fwd: SLURM : how to have a round-robin across
> nodes based on load average?
>
> Hmm interesting for your use. How do you automatically power them down/up?

In slurm.conf you can point the following parameters:

SuspendProgram
ResumeProgram

to appropriate scripts.  Our nodes have an IPMI interface, so we can
use this to power nodes off or on.
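
To make that a bit more concrete, here is a minimal sketch of what such a
script might look like. It is not our actual script. Everything
site-specific in it is an assumption: the "-ipmi" suffix for the BMC
hostnames, the IPMI user, the password file, and the idea of installing
the same file under two names (say, node_suspend and node_resume) and
pointing SuspendProgram and ResumeProgram at those. Slurm hands the
program a hostlist expression such as node[01-03] as its single
argument, which "scontrol show hostnames" expands.

```python
#!/usr/bin/env python3
"""Sketch of a script for SuspendProgram / ResumeProgram in slurm.conf."""
import os
import subprocess
import sys

# Assumptions for illustration only; adjust to your site.
BMC_SUFFIX = "-ipmi"                             # BMC of node01 is node01-ipmi
IPMI_USER = "admin"                              # placeholder user name
IPMI_PASSWORD_FILE = "/etc/slurm/ipmi_password"  # placeholder path


def expand_hostlist(hostlist):
    """Expand e.g. 'node[01-03]' into individual node names via scontrol."""
    result = subprocess.run(
        ["scontrol", "show", "hostnames", hostlist],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.split()


def ipmi_command(node, action):
    """Build the ipmitool call that powers one node 'on', 'off' or 'soft'."""
    if action not in ("on", "off", "soft"):
        raise ValueError(f"unsupported action: {action}")
    return ["ipmitool", "-I", "lanplus", "-H", node + BMC_SUFFIX,
            "-U", IPMI_USER, "-f", IPMI_PASSWORD_FILE,
            "chassis", "power", action]


def main(argv):
    # Decide the action from the name the script was installed under.
    action = "on" if "resume" in os.path.basename(argv[0]) else "off"
    for node in expand_hostlist(argv[1]):
        subprocess.run(ipmi_command(node, action), check=True)


# Slurm calls the script with exactly one argument, the hostlist.
if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv)
```

Note that "chassis power off" is a hard power-off; "chassis power soft"
asks the OS for a clean shutdown instead, which may be preferable
depending on your setup. How long a node has to be idle before it is
suspended is controlled separately by SuspendTime in slurm.conf.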

> Fact of the matter is we almost always need our nodes: our HPC is at full
> capacity 99% of the time.

Well, if your cluster is full, you don't need to worry about spreading
the workload in a round-robin fashion, do you?  However, if there are
times, such as weekends or public holidays, during which your occupation
rate drops, then it makes sense to have the workload consolidated, so
that you can switch off unneeded nodes.
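
As a rough illustration with made-up numbers (10 nodes with 16 cores
each, only single-core jobs), the following sketch counts how many nodes
are left completely idle under the two placement strategies. The function
name and the "pack"/"spread" labels are mine, not Slurm terminology.

```python
import math


def nodes_in_use(jobs, nodes, cores_per_node, policy):
    """Number of nodes running at least one single-core job."""
    # Jobs beyond the cluster's capacity simply wait in the queue.
    jobs = min(jobs, nodes * cores_per_node)
    if policy == "pack":      # fill one node before starting the next
        return math.ceil(jobs / cores_per_node)
    if policy == "spread":    # round-robin across all nodes
        return min(jobs, nodes)
    raise ValueError(f"unknown policy: {policy}")


nodes, cores = 10, 16
for jobs in (40, 160):
    packed = nodes_in_use(jobs, nodes, cores, "pack")
    spread = nodes_in_use(jobs, nodes, cores, "spread")
    print(f"{jobs} jobs: pack leaves {nodes - packed} nodes idle, "
          f"spread leaves {nodes - spread}")
```

With 40 jobs, packing leaves 7 nodes idle that could be switched off,
whereas spreading leaves none. With 160 jobs the cluster is full and
the two strategies come out the same.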

Cheers,

Loris

> On Thu, Nov 19, 2015 at 2:40 PM, Loris Bennett <[email protected]>
> wrote:
>
>     cips bmkg <[email protected]> writes:
>     
>     > Re: [slurm-dev] Re: Fwd: SLURM : how to have a round-robin across
>     > nodes based on load average?
>     >
>     > Hi,
>     >
>     > If you generate a lot of mono-core sequential tasks, the regular SLURM
>     > allocation would pile them up on the first node, followed by the
>     > second, etc.
>     >
>     > The last node would (almost) never be used.
>     
>     Why is this a bad thing? When we have nodes that are empty, we power
>     them down. When they are needed, they get powered on again. If you
>     spread the workload evenly over all nodes, this can never happen and you
>     waste a lot of energy.

-- 
This signature is currently under construction.
