"Sean Johnson" <[email protected]> writes:

>>> Hmm interesting for your use. How do you automatically power them down/up?
>>
>> In slurm.conf you can point the following variables:
>>
>>   SuspendProgram
>>   ResumeProgram
>>
>> to appropriate scripts. Our nodes have an IPMI interface, so we can
>> use this to power nodes off or on.
>
> Do you end up developing a side cost of increased hard drive replacement? This
> seems like an interesting idea, but my first thought goes to the notion that
> the spin up / spin down cycle is what ages spinning disk hard drives the most.

Our nodes are diskless, so this is not an issue.

> On 19 Nov 2015, at 2:06, Loris Bennett wrote:
>
>> cips bmkg <[email protected]> writes:
>>
>>> Re: [slurm-dev] Re: Fwd: SLURM : how to have a round-robin across nodes
>>> based on load average?
>>>
>>> Hmm interesting for your use. How do you automatically power them down/up?
>>
>> In slurm.conf you can point the following variables:
>>
>>   SuspendProgram
>>   ResumeProgram
>>
>> to appropriate scripts. Our nodes have an IPMI interface, so we can
>> use this to power nodes off or on.
>>
>>> Fact of the matter is we almost always need our nodes : our HPC is a full
>>> capacity 99% of the time.
>>
>> Well, if your cluster is full, you don't need to worry about spreading
>> the workload in a round-robin fashion, do you? However, if there are
>> times, such as weekends or public holidays, during which your occupation
>> rate drops, then it makes sense to have the workload consolidated, so
>> that you can switch off unneeded nodes.
>>
>> Cheers,
>>
>> Loris
>>
>>> On Thu, Nov 19, 2015 at 2:40 PM, Loris Bennett <[email protected]>
>>> wrote:
>>>
>>> cips bmkg <[email protected]> writes:
>>>
>>> > Re: [slurm-dev] Re: Fwd: SLURM : how to have a round-robin across
>>> > nodes based on load average?
>>> >
>>> > Hi,
>>> >
>>> > If you generate a lot of mono-core sequential tasks, the regular SLURM
>>> > allocation would pile them up into the first node, following with
>>> > second , etc...
>>> >
>>> > The last node would (almost) never be used.
>>>
>>> Why is this a bad thing? When we have nodes that are empty, we power
>>> them down. When they are needed, they get powered on again. If you
>>> spread the workload evenly over all nodes, this can never happen and you
>>> waste a lot of energy.
>>
>> --
>> This signature is currently under construction.

-- 
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin         Email [email protected]
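
For readers who want to try the SuspendProgram / ResumeProgram approach
described above, here is a minimal sketch of what such a script might look
like. It is an illustration under stated assumptions, not the script used on
the cluster discussed in this thread. Slurm calls both programs with a single
argument, the affected nodes as a hostlist expression, which
`scontrol show hostnames` can expand. Everything else is hypothetical: the
"<node>-ipmi" naming of the management interfaces, the user name, the
password file path, and the idea of installing one file under two names
(node_suspend and node_resume) so that the script name selects the action.

```python
#!/usr/bin/env python3
"""Hypothetical power helper usable as both SuspendProgram and ResumeProgram."""
import os
import subprocess
import sys

# Assumptions for illustration only; adjust to the local site.
BMC_SUFFIX = "-ipmi"                        # BMC of node01 is node01-ipmi
IPMI_USER = "admin"                         # hypothetical IPMI account
IPMI_PASSWORD_FILE = "/etc/slurm/ipmi.pass"  # hypothetical password file


def action_for_script(script_path):
    """Pick the power action from the name the script was invoked under."""
    name = os.path.basename(script_path)
    return "on" if "resume" in name else "off"


def expand_hostlist(hostlist):
    """Expand a hostlist expression such as 'node[01-03]' into node names."""
    result = subprocess.run(
        ["scontrol", "show", "hostnames", hostlist],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.split()


def ipmi_command(node, action):
    """Build the ipmitool call that switches one node 'on' or 'off'."""
    if action not in ("on", "off"):
        raise ValueError(f"unsupported power action: {action!r}")
    return [
        "ipmitool", "-I", "lanplus",
        "-H", node + BMC_SUFFIX,
        "-U", IPMI_USER,
        "-f", IPMI_PASSWORD_FILE,
        "chassis", "power", action,
    ]


def main(argv):
    action = action_for_script(argv[0])
    failures = 0
    for node in expand_hostlist(argv[1]):
        # Keep going if one BMC is unreachable so the other nodes still switch.
        if subprocess.run(ipmi_command(node, action)).returncode != 0:
            print(f"power {action} failed for {node}", file=sys.stderr)
            failures += 1
    return 1 if failures else 0


# Only act when Slurm (or a person) supplies the hostlist argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

With the file installed as, say, /usr/local/sbin/node_suspend and symlinked
to /usr/local/sbin/node_resume (both paths hypothetical), slurm.conf would
point SuspendProgram and ResumeProgram at those two names. How long a node
must sit idle before it is suspended is controlled by SuspendTime.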
