Jackie,

You probably want to try TaskAffinity=no. I remember that we saw some weird behavior when we had it set to yes. Task affinity pins tasks to specific CPUs, but the cgroup already limits them to the allocated set of CPUs, so it seems redundant.
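
As a rough sketch of that change, assuming the rest of your cgroup.conf stays exactly as you posted it, only the TaskAffinity line differs:

```
# cgroup.conf (other values copied unchanged from the original message)
CgroupAutomount=yes
CgroupReleaseAgentDir="/etc/slurm/cgroup"
ConstrainCores=yes
# Changed from yes: rely on the cpuset cgroup alone to confine tasks
TaskAffinity=no
ConstrainRAMSpace=no
```

The slurmd daemons on the compute nodes will most likely need to reread the file before the new setting takes effect.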

Ryan

On 06/09/2015 06:51 PM, Jacqueline Scoggins wrote:
cgroup setup and cpuset issues
This is my first round of testing cgroups, and I noticed that no matter how many CPUs are requested (-n x), the user's job only runs on one CPU.

Current configuration:

slurm.conf  -

SlurmUser=slurm

SlurmdUser=root

SlurmctldPort=6817

SlurmdPort=6818

AuthType=auth/munge

CryptoType=crypto/munge

CompleteWait=0

StateSaveLocation=/tmp

SlurmdSpoolDir=/tmp/slurmd

SlurmctldPidFile=/var/run/slurmctld.pid

SlurmdPidFile=/var/run/slurmd.pid

SwitchType=switch/none

MpiDefault=none

CacheGroups=0

KillOnBadExit=1

JobRequeue=0

ReturnToService=1

TreeWidth=4096

MaxJobCount=100000

TaskPlugin=task/cgroup

TopologyPlugin=topology/tree

MessageTimeout=60

SlurmctldTimeout=300

SlurmdTimeout=300

InactiveLimit=0

MinJobAge=300

KillWait=30

Waittime=0

SchedulerType=sched/backfill

SchedulerParameters=bf_continue

SelectType=select/cons_res

SelectTypeParameters=CR_CPU_Memory

ProctrackType=proctrack/linuxproc

FastSchedule=0

PriorityType=priority/multifactor

PriorityDecayHalfLife=14-0

PriorityUsageResetPeriod=None

PriorityWeightFairshare=1000000

PriorityWeightQOS=1000000000

PriorityWeightAge=1000

PriorityWeightPartition=0

PriorityWeightJobSize=1000

PriorityMaxAge=06:00:00

PriorityFlags=Ticket_based

SlurmctldDebug=4

SlurmctldLogFile=/local/slurm/log/slurmctld.log

SlurmdDebug=4

SlurmdLogFile=/local/slurm/log/slurmd.log

JobCompType=jobcomp/filetxt

JobCompLoc=/var/spool/slurm/jobs/complete

JobAcctGatherType=jobacct_gather/linux

JobAcctGatherFrequency=10

AccountingStorageType=accounting_storage/slurmdbd

AccountingStorageEnforce=associations,limits,qos

AccountingStorageHost=phoenix.scs.lbl.gov

HealthCheckProgram=/usr/sbin/nhc

HealthCheckInterval=300

NodeName=n0[000-018] NodeAddr=10.0.17.[0-18] CPUs=8 Sockets=2 CoresPerSocket=4 Feature=lr_phi

PartitionName=c_shared Nodes=n0[000-008] Shared=yes

PartitionName=regular Nodes=n0[009-018] Shared=Exclusive


cgroup.conf


CgroupAutomount=yes

CgroupReleaseAgentDir="/etc/slurm/cgroup"

ConstrainCores=yes

TaskAffinity=yes

ConstrainRAMSpace=no


Is there something I am missing? I tried it with only TaskAffinity set, without ConstrainRAMSpace=no, but that did not make any difference.


Slurm version = 14.03.8

OS = SL 6.6


Please advise if I need to configure something else to make it work.


Thanks in advance for your assistance.


Jackie Scoggins


--
Ryan Cox
Operations Director
Fulton Supercomputing Lab
Brigham Young University
