Thanks, I wasn't aware of the ability to submit to several partitions at once.

On 17/06/2015 20:38, Trey Dockendorf wrote:
We also have a heterogeneous environment, with basically two classes of
nodes in terms of memory per CPU: 2GB/CPU and 4GB/CPU.  We use
"background" partitions which access the entire cluster and allow for
opportunistic utilization of otherwise idle CPUs.  We found we had to
create one of these partitions for each class of memory/CPU.

PartitionName=DEFAULT Nodes=<long nodelist> DefMemPerCPU=1900 MaxMemPerCPU=2000

PartitionName=background Nodes=<long nodelist> Priority=10 AllowQOS=background MaxNodes=1 MaxTime=96:00:00 State=UP
PartitionName=background-4g Nodes=<long nodelist> Priority=10 AllowQOS=background MaxNodes=1 DefMemPerCPU=3900 MaxMemPerCPU=4000 MaxTime=96:00:00 State=UP

The background partition contains all 2GB/CPU nodes and background-4g
contains all 4GB/CPU nodes.  A user can submit to both at once with
something like "sbatch --partition=background,background-4g --qos=background".
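
For anyone who prefers batch-script directives over command-line flags, the
same request can be sketched as below. The partition and QOS names are the
ones from the config above; the task count, time limit, and the echo payload
are hypothetical placeholders. When several partitions are listed, Slurm
starts the job in whichever one can run it first.

```shell
#!/bin/bash
# Request both background partitions; names come from the config above.
#SBATCH --partition=background,background-4g
#SBATCH --qos=background
# Hypothetical resource request, kept within the MaxTime=96:00:00 limit.
#SBATCH --ntasks=1
#SBATCH --time=01:00:00

# SLURM_JOB_PARTITION is set by Slurm inside a running job; fall back to
# "unknown" so the script also runs outside of Slurm.
partition="${SLURM_JOB_PARTITION:-unknown}"
echo "Job started in partition: ${partition}"
```

Memory is deliberately not requested here, so the job picks up the
DefMemPerCPU of whichever partition it lands in.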

There may be a better and/or more clever way of handling such partitions
in a heterogeneous environment, but the above method has served us well.

- Trey

=============================

Trey Dockendorf
Systems Analyst I
Texas A&M University
Academy for Advanced Telecommunications and Learning Technologies
Phone: (979)458-2396
Email: [email protected]
Jabber: [email protected]

On Wed, Jun 17, 2015 at 9:22 AM, Daniel Letai <[email protected]> wrote:


    Currently I have 2 types of nodes:
    old = 2 sockets, 4 cores per socket, 64GB mem
    new = 2 sockets, 6 cores per socket, 128GB mem

    Since I'm using select/cons_res with CR_CPU_Memory, I thought
    I'd assign as default the relative amount of memory per core,
    old - DefMemPerCPU = 8000
    new - DefMemPerCPU = 20000

    However, those values are part of the partition, not node, definition.

    How can I assign those defaults to the cluster, yet define a single
    global partition to allow jobs to utilize the entire cluster?
    Assume tux[001-100]=old, tux[101-200]=new

    I assume something like
    PartitionName=Default Nodes=tux[001-100] DefMemPerCPU=8000
    PartitionName=Default Nodes=tux[101-200] DefMemPerCPU=20000
    PartitionName=compute Nodes=tux[101-200] Default=yes State=up

    will not work.

    What is the correct way to represent/use this cluster?
    The other option I could think of was to set DefMemPerCPU=1 for the
    entire cluster, and force users to always use --mem, but I'm hoping
    to avoid this kind of solution.


