Jackie,
You probably want to try TaskAffinity=no in cgroup.conf. I remember that
we saw some weird behavior when we had it set to yes. Task affinity pins
tasks to specific CPUs, but the cgroup already limits them to the
allocated set of CPUs, so it seems redundant.
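
A minimal sketch of that change, assuming the rest of your cgroup.conf stays exactly as you posted it (only the TaskAffinity line differs):

```
# cgroup.conf: same settings as in the original post,
# with TaskAffinity switched from yes to no
CgroupAutomount=yes
CgroupReleaseAgentDir="/etc/slurm/cgroup"
ConstrainCores=yes
TaskAffinity=no
ConstrainRAMSpace=no
```

With ConstrainCores=yes still in place, the cgroup should keep each job confined to its allocated CPUs.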
Ryan
On 06/09/2015 06:51 PM, Jacqueline Scoggins wrote:
cgroup setup and cpuset issues
In my first round of testing cgroups, I noticed that no matter how many
CPUs are requested (-n x), the user's job runs on only one CPU.
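
One way to confirm which CPUs a task is actually allowed to use is to read the Cpus_allowed_list field from /proc/self/status inside the job. The following is a minimal Python sketch, assuming a Linux compute node with /proc mounted; the helper names parse_cpu_list and allowed_cpus are hypothetical and not part of Slurm.

```python
def parse_cpu_list(text):
    """Expand a kernel-style CPU list such as '0-3,8' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def allowed_cpus(status_path="/proc/self/status"):
    """Return the CPUs this process may run on, read from Cpus_allowed_list."""
    try:
        with open(status_path) as fh:
            for line in fh:
                if line.startswith("Cpus_allowed_list:"):
                    return parse_cpu_list(line.split(":", 1)[1])
    except (IOError, OSError):
        # No readable /proc status file (for example, not a Linux host).
        pass
    return []


if __name__ == "__main__":
    cpus = allowed_cpus()
    print("%d CPU(s) allowed: %s" % (len(cpus), cpus))
```

Saved under a hypothetical name such as check_cpus.py and launched with srun -n 4 python check_cpus.py, each task prints the CPU set it is allowed to run on, which shows whether the tasks are being confined to a single CPU.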
Current configuration:
slurm.conf -
SlurmUser=slurm
SlurmdUser=root
SlurmctldPort=6817
SlurmdPort=6818
AuthType=auth/munge
CryptoType=crypto/munge
CompleteWait=0
StateSaveLocation=/tmp
SlurmdSpoolDir=/tmp/slurmd
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmdPidFile=/var/run/slurmd.pid
SwitchType=switch/none
MpiDefault=none
CacheGroups=0
KillOnBadExit=1
JobRequeue=0
ReturnToService=1
TreeWidth=4096
MaxJobCount=100000
TaskPlugin=task/cgroup
TopologyPlugin=topology/tree
MessageTimeout=60
SlurmctldTimeout=300
SlurmdTimeout=300
InactiveLimit=0
MinJobAge=300
KillWait=30
Waittime=0
SchedulerType=sched/backfill
SchedulerParameters=bf_continue
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory
ProctrackType=proctrack/linuxproc
FastSchedule=0
PriorityType=priority/multifactor
PriorityDecayHalfLife=14-0
PriorityUsageResetPeriod=None
PriorityWeightFairshare=1000000
PriorityWeightQOS=1000000000
PriorityWeightAge=1000
PriorityWeightPartition=0
PriorityWeightJobSize=1000
PriorityMaxAge=06:00:00
PriorityFlags=Ticket_based
SlurmctldDebug=4
SlurmctldLogFile=/local/slurm/log/slurmctld.log
SlurmdDebug=4
SlurmdLogFile=/local/slurm/log/slurmd.log
JobCompType=jobcomp/filetxt
JobCompLoc=/var/spool/slurm/jobs/complete
JobAcctGatherType=jobacct_gather/linux
JobAcctGatherFrequency=10
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageEnforce=associations,limits,qos
AccountingStorageHost=phoenix.scs.lbl.gov
HealthCheckProgram=/usr/sbin/nhc
HealthCheckInterval=300
NodeName=n0[000-018] NodeAddr=10.0.17.[0-18] CPUs=8 Sockets=2 CoresPerSocket=4 Feature=lr_phi
PartitionName=c_shared Nodes=n0[000-008] Shared=yes
PartitionName=regular Nodes=n0[009-018] Shared=Exclusive
cgroup.conf
CgroupAutomount=yes
CgroupReleaseAgentDir="/etc/slurm/cgroup"
ConstrainCores=yes
TaskAffinity=yes
ConstrainRAMSpace=no
Is there something I am missing? I also tried it with only TaskAffinity
set, without ConstrainRAMSpace=no, but that did not make any difference.
Slurm version = 14.03.8
OS = SL 6.6
Please advise if I need to configure something else to make it work.
Thanks in advance for your assistance.
Jackie Scoggins
--
Ryan Cox
Operations Director
Fulton Supercomputing Lab
Brigham Young University