I would first make sure the application is actually a parallel application.
How are you verifying that it is only running on one CPU? Could you send what
the user is running?
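One way to check is to have every task report which CPUs it is allowed to use and which CPU it last ran on. The script below is a minimal sketch, not a Slurm tool. The function names are hypothetical, and it assumes Linux (`os.sched_getaffinity` and `/proc`) and that it is launched with srun, so that SLURM_PROCID is set.

```python
import os


def parse_cpu_list(text):
    """Expand a Linux CPU list such as "0-3,6" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def allowed_cpus_from_status(path="/proc/self/status"):
    """Read Cpus_allowed_list: the affinity mask, which the kernel keeps inside the cpuset."""
    with open(path) as fh:
        for line in fh:
            if line.startswith("Cpus_allowed_list:"):
                return parse_cpu_list(line.split(":", 1)[1])
    return []


def current_cpu(path="/proc/self/stat"):
    """Return the CPU this process last ran on (field 39 of /proc/<pid>/stat)."""
    with open(path) as fh:
        stat = fh.read()
    # The command name (field 2) may contain spaces, so split after its closing paren.
    fields = stat[stat.rindex(")") + 2:].split()
    # fields[0] is field 3 (state), so field 39 sits at index 36.
    return int(fields[36])


if __name__ == "__main__":
    rank = os.environ.get("SLURM_PROCID", "?")
    print(f"rank {rank}: affinity={sorted(os.sched_getaffinity(0))} "
          f"status={allowed_cpus_from_status()} last_cpu={current_cpu()}")
```

For example, `srun -n 4 python3 check_cpus.py` (script name hypothetical) should print four ranks. If they all report the same single allowed CPU, that suggests a binding problem. If they report distinct CPUs, the question goes back to whether the application itself runs in parallel.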
On a related note, since you are already using cgroups in the task
plugin, you might also try
ProctrackType=proctrack/cgroup
Since you are using slurmdbd accounting, you probably don't need
JobCompType=jobcomp/filetxt either.
With respect to task affinity, I wouldn't expect it to matter here. We
advise most people to use it, since binding tasks to specific CPUs
usually increases performance.
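To illustrate what binding tasks to specific CPUs means, here is a hypothetical block distribution: each task gets its own contiguous slice of the allocated CPUs and can then restrict itself with `os.sched_setaffinity`. This sketches the idea only. It is not how the task/affinity or task/cgroup plugins actually compute their masks, and the function names are made up for illustration.

```python
import os


def block_binding(allocated, ntasks, cpus_per_task=1):
    """Give each task rank a distinct, contiguous slice of the allocated CPUs."""
    allocated = sorted(allocated)
    needed = ntasks * cpus_per_task
    if needed > len(allocated):
        raise ValueError(f"need {needed} CPUs, only {len(allocated)} allocated")
    return {
        rank: set(allocated[rank * cpus_per_task:(rank + 1) * cpus_per_task])
        for rank in range(ntasks)
    }


def pin_self(cpus):
    """Restrict the calling process to the given CPUs (Linux only)."""
    os.sched_setaffinity(0, cpus)


if __name__ == "__main__":
    # Example: a node with CPUs=8, as in the NodeName line, running 4 tasks x 2 CPUs.
    for rank, cpus in block_binding(range(8), ntasks=4, cpus_per_task=2).items():
        print(f"rank {rank} -> CPUs {sorted(cpus)}")
```

Because no two tasks share a CPU, they do not compete for the same core or migrate between cores, which is where the usual performance gain comes from.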
Danny
On 06/10/15 07:27, Ryan Cox wrote:
Jackie,
You probably want to try TaskAffinity=no. I remember that we had some
weird behavior when we had it set to yes. Task affinity is used to
pin tasks to certain cpus but the cgroup already limits them to the
allocated set of cpus, so it seems redundant.
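The redundancy argument can be put in terms of sets: the kernel keeps a task's affinity mask inside its cgroup cpuset, so the CPUs a task can use are the intersection of the two. The helpers below are hypothetical and only model that reasoning. They do not query a real cgroup.

```python
def effective_cpus(cgroup_cpuset, affinity_mask):
    """CPUs a task may actually run on: the affinity mask limited to the cpuset."""
    allowed = set(cgroup_cpuset) & set(affinity_mask)
    if not allowed:
        # sched_setaffinity() rejects a mask with no CPUs permitted by the cpuset (EINVAL).
        raise ValueError("affinity mask has no CPUs inside the cgroup cpuset")
    return allowed


def affinity_is_redundant(cgroup_cpuset, affinity_mask):
    """True when the affinity mask does not narrow what the cgroup already allows."""
    return effective_cpus(cgroup_cpuset, affinity_mask) == set(cgroup_cpuset)


if __name__ == "__main__":
    job_cpuset = {0, 1, 2, 3}
    print(affinity_is_redundant(job_cpuset, range(8)))  # mask covers the whole allocation
    print(affinity_is_redundant(job_cpuset, {2}))       # mask pins to one CPU
```

In this model, affinity only adds something when it narrows the set, for example by pinning each of several tasks to its own CPU within the job's cpuset.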
Ryan
On 06/09/2015 06:51 PM, Jacqueline Scoggins wrote:
cgroup setup and cpuset issues
In our first round of testing cgroups, we noticed that no matter how many
CPUs are requested (-n x), the user's job runs on only one CPU.
Current configuration:
slurm.conf -
SlurmUser=slurm
SlurmdUser=root
SlurmctldPort=6817
SlurmdPort=6818
AuthType=auth/munge
CryptoType=crypto/munge
CompleteWait=0
StateSaveLocation=/tmp
SlurmdSpoolDir=/tmp/slurmd
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmdPidFile=/var/run/slurmd.pid
SwitchType=switch/none
MpiDefault=none
CacheGroups=0
KillOnBadExit=1
JobRequeue=0
ReturnToService=1
TreeWidth=4096
MaxJobCount=100000
TaskPlugin=task/cgroup
TopologyPlugin=topology/tree
MessageTimeout=60
SlurmctldTimeout=300
SlurmdTimeout=300
InactiveLimit=0
MinJobAge=300
KillWait=30
Waittime=0
SchedulerType=sched/backfill
SchedulerParameters=bf_continue
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory
ProctrackType=proctrack/linuxproc
FastSchedule=0
PriorityType=priority/multifactor
PriorityDecayHalfLife=14-0
PriorityUsageResetPeriod=None
PriorityWeightFairshare=1000000
PriorityWeightQOS=1000000000
PriorityWeightAge=1000
PriorityWeightPartition=0
PriorityWeightJobSize=1000
PriorityMaxAge=06:00:00
PriorityFlags=Ticket_based
SlurmctldDebug=4
SlurmctldLogFile=/local/slurm/log/slurmctld.log
SlurmdDebug=4
SlurmdLogFile=/local/slurm/log/slurmd.log
JobCompType=jobcomp/filetxt
JobCompLoc=/var/spool/slurm/jobs/complete
JobAcctGatherType=jobacct_gather/linux
JobAcctGatherFrequency=10
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageEnforce=associations,limits,qos
AccountingStorageHost=phoenix.scs.lbl.gov
HealthCheckProgram=/usr/sbin/nhc
HealthCheckInterval=300
NodeName=n0[000-018] NodeAddr=10.0.17.[0-18] CPUs=8 Sockets=2 CoresPerSocket=4 Feature=lr_phi
PartitionName=c_shared Nodes=n0[000-008] Shared=yes
PartitionName=regular Nodes=n0[009-018] Shared=Exclusive
cgroup.conf -
CgroupAutomount=yes
CgroupReleaseAgentDir="/etc/slurm/cgroup"
ConstrainCores=yes
TaskAffinity=yes
ConstrainRAMSpace=no
Is there something I am missing? I also tried it with only TaskAffinity,
without ConstrainRAMSpace=no, but that did not make any difference.
Slurm version = 14.03.8
OS = SL 6.6
Please advise if I need to configure something else to make this work.
Thanks in advance for your assistance.
Jackie Scoggins
--
Ryan Cox
Operations Director
Fulton Supercomputing Lab
Brigham Young University