On Thu, Jun 18 2015, Gerben Roest <[email protected]> wrote:

> Daniel Letai schreef op 17-6-2015 om 16:22:
>
>> Since I'm using select/cr_cons and using CR_CPU_Memory, I thought I'd
>> assign as default the relative amount of memory per core,
>> old - DefMemPerCPU = 8000
>> new - DefMemPerCPU = 20000
>> 
>> However, those values are part of the partition, not node, definition.
>
> Can it be some kind of feature request for next Slurm releases? I know
> of a (very) heterogeneous cluster that would benefit from making
> DefMemPerCPU part of the node definition instead of partition def. I
> don't want to bother the users with lots of different partitions for
> each node type.
>

We solved it a bit differently. Under the assumption that a program
should specify how much memory it requires, instead of getting an
arbitrary amount depending on which node/partition it runs on, we set
DefMemPerCPU = 50 (the value is in MB) for the entire cluster(s).

If users want memory, they should ask for it - or their programs will be
swapped out/killed (same for cpu/time/gres). We have nodes with 256G and
16 cores, and we don't want a default of 16G per core: most programs
won't use it (they usually request 1 - 10G), and other programs which
require 200G would have to wait in the queue because the memory is
nominally allocated. And when a 200G program does run, the jobs relying
on the default value won't run alongside it, because they reserve more
than they need (and more than is available).
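
To put rough numbers on this, here is a small Python sketch of the
arithmetic. It does not talk to Slurm at all; the function name, the
count of 4 small jobs and the 2G request are made up for illustration,
only the 256G/16 cores node and the 200G job come from our setup.

```python
# Illustrative arithmetic only; this does not query Slurm.
NODE_MEM_GB = 256   # node memory, from the example above
NODE_CORES = 16     # cores per node, from the example above


def free_mem_gb(single_core_jobs, mem_per_job_gb):
    """Memory left on the node after reserving mem_per_job_gb per single-core job."""
    return NODE_MEM_GB - single_core_jobs * mem_per_job_gb


# A default of 256/16 = 16G per core: 4 small jobs that never asked for
# memory still reserve 64G, so a 200G job has to wait.
default_per_core_gb = NODE_MEM_GB // NODE_CORES
print(free_mem_gb(4, default_per_core_gb))  # 192

# The same 4 jobs explicitly requesting a hypothetical 2G each leave
# enough room for the 200G job.
print(free_mem_gb(4, 2))  # 248
```

With the per-core default the 200G job is blocked by memory nobody is
using; with explicit small requests it fits next to the other jobs.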


    Yair.
