Hi Dani,

We use cons_res with CR_Core_Memory.  The way we deal with it is to set
DefMemPerCPU to the value that fills the memory on our smallest-memory
nodes when every CPU is in use, so the same default works on all of the
nodes.  We have all nodes in one partition, “all”, which is the default
one.  We have nodes with 128GB, 386GB, and 1.5TB.  We tell the users this
default memory amount (4GB) and have them request more if they need it.
This frees up memory on the big-memory nodes that would otherwise be
wasted, so big-memory jobs can start sooner.  It works great.
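
As a rough sketch of that arithmetic, here it is applied to the two node
types from Daniel's original message.  The MB figures are round
approximations (not real RealMemory values), each core is treated as one
CPU, and the function names are made up for illustration:

```python
# Node classes taken from Daniel's description; mem_mb values are rounded
# approximations of 64GB and 128GB, not measured RealMemory settings.
node_classes = {
    "old": {"sockets": 2, "cores_per_socket": 4, "mem_mb": 64000},
    "new": {"sockets": 2, "cores_per_socket": 6, "mem_mb": 128000},
}


def mem_per_cpu(node):
    """Memory per CPU in MB (rounded down), counting each core as one CPU."""
    cpus = node["sockets"] * node["cores_per_socket"]
    return node["mem_mb"] // cpus


def cluster_default(classes):
    """Largest per-CPU default that does not oversubscribe any node class."""
    return min(mem_per_cpu(node) for node in classes.values())


per_class = {name: mem_per_cpu(node) for name, node in node_classes.items()}
default_mem_per_cpu = cluster_default(node_classes)

print(per_class)
print("DefMemPerCPU candidate:", default_mem_per_cpu)
```

The smallest per-CPU figure is the one that can go on a single partition
covering every node; jobs that need more ask for it explicitly.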

Chris



On 6/18/15, 5:56 AM, "Daniel Letai" wrote:

>
>
>
>After considering this solution further, it's not ideal - only one of the
>partitions will be used, but many jobs require the entire cluster to run.
>
>I think I liked the old behavior (automatic memory over-subscription)
>better.
>For now, I'll probably have to define a low common denominator (the
>suggested 50 sounds about right) and force users to specify --mem.
>
>On 06/17/2015 08:38 PM, Trey Dockendorf wrote:
>
>
>We also have a heterogeneous environment, with basically two classes of
>nodes in terms of memory per CPU: 2GB/CPU and 4GB/CPU.  We use
>"background" partitions, which access the entire cluster and allow for
>opportunistic utilization of otherwise idle CPUs.  We found we had to
>create one of these partitions for each class of memory/CPU.
>
>
>PartitionName=DEFAULT Nodes=<long nodelist> DefMemPerCPU=1900
>MaxMemPerCPU=2000
>
>
>
>PartitionName=background Nodes=<long nodelist> Priority=10
>AllowQOS=background MaxNodes=1 MaxTime=96:00:00 State=UP
>
>PartitionName=background-4g Nodes=<long nodelist> Priority=10
>AllowQOS=background MaxNodes=1 DefMemPerCPU=3900 MaxMemPerCPU=4000
>MaxTime=96:00:00 State=UP
>
>
>
>The background partition contains all 2GB/CPU nodes and background-4g
>contains all 4GB/CPU.  A user can submit to either by doing something
>like "sbatch --partition=background,background-4g --qos=background".
>
>
>There may be a better and/or more clever way of handling such partitions
>in a heterogeneous environment, but the above method has served us well.
>
>
>
>- Trey
>
>
>=============================
>
>
>Trey Dockendorf 
>Systems Analyst I 
>Texas A&M University
>Academy for Advanced Telecommunications and Learning Technologies
>Phone: (979)458-2396
>
>
>
>
>On Wed, Jun 17, 2015 at 9:22 AM, Daniel Letai wrote:
>
>
>Currently I have 2 types of nodes:
>old = 2 sockets, 4 cores per socket, 64GB mem
>new = 2 sockets, 6 cores per socket, 128GB mem
>
>Since I'm using select/cons_res with CR_CPU_Memory, I thought I'd
>assign the relative amount of memory per core as the default:
>old - DefMemPerCPU = 8000
>new - DefMemPerCPU = 20000
>
>However, those values are part of the partition, not node, definition.
>
>How can I assign those defaults to the cluster, yet define a single
>global partition to allow jobs to utilize the entire cluster?
>Assume tux[001-100]=old, tux[101-200]=new
>
>I assume something like
>PartitionName=Default Nodes=tux[001-100] DefMemPerCPU=8000
>PartitionName=Default Nodes=tux[101-200] DefMemPerCPU=20000
>PartitionName=compute Nodes=tux[001-200] Default=yes State=up
>
>will not work.
>
>What is the correct way to represent/use this cluster?
>The other option I could think of was to set DefMemPerCPU=1 for the
>entire cluster and force users to always use --mem, but I'm hoping to
>avoid this kind of solution.
>