[slurm-dev] Re: cgroup setup and cpuset issues

Danny Auble Wed, 10 Jun 2015 07:38:58 -0700

I would make sure the application is an actual parallel application. How are you verifying it is only running on 1 cpu? Could you send what the user is running?
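One way to answer the verification question is to look at the CPU list the kernel reports for a running task. The following is a minimal sketch, not part of Slurm: it assumes a Linux node where /proc/<pid>/status exposes a Cpus_allowed_list field, and the helper names (parse_cpu_list, cpus_allowed, read_cpus_allowed) are made up for illustration.

```python
def parse_cpu_list(text):
    """Expand a kernel CPU list such as "0-3,8" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def cpus_allowed(status_text):
    """Pull the Cpus_allowed_list field out of the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Cpus_allowed_list:"):
            return parse_cpu_list(line.split(":", 1)[1])
    raise ValueError("no Cpus_allowed_list field found")


def read_cpus_allowed(pid="self"):
    """Assumption: Linux procfs is mounted at /proc; pid is a number or "self"."""
    with open("/proc/%s/status" % pid) as f:
        return cpus_allowed(f.read())
```

Calling read_cpus_allowed() from inside the job step, or passing it the pid of the user's process on the compute node, shows which CPUs that task may use. A single CPU in the list would suggest the task is pinned to one core; the full allocation would point back at the application itself.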

On a sort of related note, since you are using cgroups in the task plugin you might also try


ProctrackType=proctrack/cgroup

Since you are using slurmdbd accounting you probably don't need JobCompType=jobcomp/filetxt either.

With respect to task affinity, I wouldn't expect it to matter here, as we advise most people to use it since it will bind tasks to specific cpus, usually increasing performance.


Danny

On 06/10/15 07:27, Ryan Cox wrote:

Jackie,

You probably want to try TaskAffinity=no. I remember that we had some weird behavior when we had it set to yes. Task affinity is used to pin tasks to certain cpus but the cgroup already limits them to the allocated set of cpus, so it seems redundant.


Ryan

On 06/09/2015 06:51 PM, Jacqueline Scoggins wrote:

cgroup setup and cpuset issues

First round of testing cgroups and noticed that no matter how many cpus are requested (-n x) the user's job is only running on one cpu.


Current configuration:

slurm.conf  -

SlurmUser=slurm

SlurmdUser=root

SlurmctldPort=6817

SlurmdPort=6818

AuthType=auth/munge

CryptoType=crypto/munge

CompleteWait=0

StateSaveLocation=/tmp

SlurmdSpoolDir=/tmp/slurmd

SlurmctldPidFile=/var/run/slurmctld.pid

SlurmdPidFile=/var/run/slurmd.pid

SwitchType=switch/none

MpiDefault=none

CacheGroups=0

KillOnBadExit=1

JobRequeue=0

ReturnToService=1

TreeWidth=4096

MaxJobCount=100000

TaskPlugin=task/cgroup

TopologyPlugin=topology/tree

MessageTimeout=60

SlurmctldTimeout=300

SlurmdTimeout=300

InactiveLimit=0

MinJobAge=300

KillWait=30

Waittime=0

SchedulerType=sched/backfill

SchedulerParameters=bf_continue

SelectType=select/cons_res

SelectTypeParameters=CR_CPU_Memory

ProctrackType=proctrack/linuxproc

FastSchedule=0

PriorityType=priority/multifactor

PriorityDecayHalfLife=14-0

PriorityUsageResetPeriod=None

PriorityWeightFairshare=1000000

PriorityWeightQOS=1000000000

PriorityWeightAge=1000

PriorityWeightPartition=0

PriorityWeightJobSize=1000

PriorityMaxAge=06:00:00

PriorityFlags=Ticket_based

SlurmctldDebug=4

SlurmctldLogFile=/local/slurm/log/slurmctld.log

SlurmdDebug=4

SlurmdLogFile=/local/slurm/log/slurmd.log

JobCompType=jobcomp/filetxt

JobCompLoc=/var/spool/slurm/jobs/complete

JobAcctGatherType=jobacct_gather/linux

JobAcctGatherFrequency=10

AccountingStorageType=accounting_storage/slurmdbd

AccountingStorageEnforce=associations,limits,qos

AccountingStorageHost=phoenix.scs.lbl.gov

HealthCheckProgram=/usr/sbin/nhc

HealthCheckInterval=300

NodeName=n0[000-018] NodeAddr=10.0.17.[0-18] CPUs=8 Sockets=2 CoresPerSocket=4 Feature=lr_phi


PartitionName=c_shared Nodes=n0[000-008] Shared=yes

PartitionName=regular Nodes=n0[009-018] Shared=Exclusive


cgroup.conf


CgroupAutomount=yes

CgroupReleaseAgentDir="/etc/slurm/cgroup"

ConstrainCores=yes

TaskAffinity=yes

ConstrainRAMSpace=no

Is there something I am missing? I tried it with only TaskAffinity, without ConstrainRAMSpace=no, but that did not make any difference.



Slurm version = 14.03.8

OS = SL 6.6


Please advise if I need to configure something else to make it work.


Thanks in advance for your assistance.


Jackie Scoggins


--
Ryan Cox
Operations Director
Fulton Supercomputing Lab
Brigham Young University
