Hi Slurm community, I have what is hopefully an easy question regarding CPU/partition configuration in slurm.conf.
BACKGROUND: We are running Slurm 16.05.6 built on Ubuntu 14.04 LTS (because 14.04 works with our current bcfg2 XML configuration management servers). Each node has two 12-core Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz processors. Running 'cat /proc/cpuinfo' reports 48 processors because each core consists of two hardware threads. I want to make sure that we are defining our CPUs and available cores to Slurm appropriately; what Slurm considers a CPU and what a process considers a thread can all get mixed up in the semantics.

PROBLEM: Most users run R. R is single-threaded, so when someone submits a job it will take one thread and leave the other thread on the core idle. So although a user thinks there are 48 cores available, in actuality only the 24 physical cores are available to them. If, however, they are running an app that can use multiple threads (Julia?), then things are different. We had been getting by up to this point, until a user tried to run a numpy workload in his Python 3.5 app, which has resulted in all kinds of CPU overload and memory swapping. He's using a job array of size 32, running one task per array element, and on one node, for example, 12 of his Python apps are running but all 48 CPUs are utilized. The load average is 300.0+. Sometimes memory is swapping and sometimes not. Before getting into his submit script, I wanted to make sure we are configuring slurm.conf appropriately for our nodes; then I can make sure he's making the right allocations in his submit scripts.

SLURM.CONF: Below is our slurm.conf. I assume the definition of our nodes and partitions at the bottom is the most suspect part. Can anyone advise on the best way to configure these nodes for CPU utilization? We are using consumable resources for CPUs, but not for memory at this time. I'll also include the SLURM_ environment variables at the bottom (obtained by simply running 'srun env'), in case that helps too. It's interesting to me that SLURM_CPUS_ON_NODE=2. Is that correct? It doesn't seem right.
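While we sort out slurm.conf, this is the kind of submit-script sketch I'm considering suggesting to him, to keep numpy inside its allocation. Everything here is hypothetical (the script name, and the assumption that his numpy is backed by OpenBLAS or MKL and so honors the usual *_NUM_THREADS variables):

```shell
#!/bin/sh
#SBATCH --array=1-32
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1

# Cap the BLAS/OpenMP thread pools at the number of CPUs Slurm actually
# allocated, instead of the 48 logical CPUs numpy can see on the node.
# SLURM_CPUS_PER_TASK is only set inside a job, so default to 1 here.
NCPUS=${SLURM_CPUS_PER_TASK:-1}
export OMP_NUM_THREADS=$NCPUS
export OPENBLAS_NUM_THREADS=$NCPUS
export MKL_NUM_THREADS=$NCPUS

echo "threads capped at $OMP_NUM_THREADS"
# the real script would then run something like:
#   srun python3.5 his_app.py "$SLURM_ARRAY_TASK_ID"
```

If that's the right direction, then with 12 array tasks landing on one node he should see 12 busy CPUs rather than 48, but I'd like to get the node definitions right first.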
ClusterName=marzano
ControlMachine=lunchbox
ControlAddr=xxxxxxxxx
#BackupController=
#BackupAddr=
#
SlurmUser=slurm
#SlurmdUser=root
SlurmctldPort=6817
SlurmdPort=6818
AuthType=auth/munge
#JobCredentialPrivateKey=
#JobCredentialPublicCertificate=
StateSaveLocation=/slurm.state
SlurmdSpoolDir=/tmp/slurmd
SwitchType=switch/none
MpiDefault=none
SlurmctldPidFile=/var/run/slurm/slurmctld.pid
SlurmdPidFile=/var/run/slurm/slurmd.pid
ProctrackType=proctrack/pgid
#PluginDir=
#FirstJobId=
ReturnToService=2
#MaxJobCount=
#PlugStackConfig=/etc/slurm/plugstack.conf
#PropagatePrioProcess=
#PropagateResourceLimits=
#PropagateResourceLimitsExcept=
#Prolog=
#Epilog=
#SrunProlog=
#SrunEpilog=
#TaskProlog=
#TaskEpilog=
#TaskPlugin=
#TrackWCKey=no
#TreeWidth=50
#TmpFS=
#UsePAM=
#MailProg=/s/slurm/bin/smail
MailProg=/workspace/statlab/bin/smailwrap
#
# TIMERS
SlurmctldTimeout=300
SlurmdTimeout=300
InactiveLimit=0
MinJobAge=300
KillWait=30
Waittime=0
#
# SCHEDULING
SchedulerType=sched/backfill
#SchedulerAuth=
#SchedulerPort=
#SchedulerRootFilter=
SelectType=select/cons_res
SelectTypeParameters=CR_Core
FastSchedule=1
#DefMemPerNode = UNLIMITED
#MaxMemPerNode = UNLIMITED
#DefMemPerCPU = UNLIMITED
MaxMemPerCPU=2600
PriorityType=priority/multifactor
PriorityDecayHalfLife=14-0
#PriorityUsageResetPeriod=14-0
PriorityWeightFairshare=100000
PriorityWeightAge=1000
PriorityWeightPartition=10000
PriorityWeightJobSize=1000
PriorityMaxAge=7-0
PriorityFavorSmall=NO
#
# LOGGING
SlurmctldDebug=6
SlurmctldLogFile=/var/log/slurmctld/slurmctld.log
SlurmdDebug=6
SlurmdLogFile=/var/log/slurmd/slurmd.log
JobCompType=jobcomp/none
#JobCompLoc=
#
# ACCOUNTING
JobAcctGatherType=jobacct_gather/linux
JobAcctGatherFrequency=30
#
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageEnforce=limits,qos
AccountingStorageHost=lunchbox
AccountingStorageLoc=slurm_acct_db
AccountingStoragePass=auth/munge
AccountingStorageUser=slurm
#
# COMPUTE NODES
NodeName=marzano0[1-8] CPUs=48 Sockets=2 CoresPerSocket=12 ThreadsPerCore=2 RealMemory=128827 State=UNKNOWN
PartitionName=long Nodes=marzano0[1-4,7-8] Default=NO MaxTime=14-0 State=UP
PartitionName=short Nodes=marzano0[5-6] Default=YES MaxTime=4-0 State=UP

SLURM_ environment variables (from 'srun env'):

SLURM_PRIO_PROCESS=0
SRUN_DEBUG=3
SLURM_UMASK=0002
SLURM_CLUSTER_NAME=marzano
SLURM_SUBMIT_DIR=/workspace/software/cyana-3.97
SLURM_SUBMIT_HOST=lunchbox
SLURM_JOB_NAME=env
SLURM_JOB_CPUS_PER_NODE=2
SLURM_NTASKS=1
SLURM_NPROCS=1
SLURM_DISTRIBUTION=cyclic
SLURM_JOB_ID=21223
SLURM_JOBID=21223
SLURM_STEP_ID=0
SLURM_STEPID=0
SLURM_NNODES=1
SLURM_JOB_NUM_NODES=1
SLURM_NODELIST=marzano05
SLURM_JOB_PARTITION=short
SLURM_TASKS_PER_NODE=1
SLURM_SRUN_COMM_PORT=49261
SLURM_JOB_ACCOUNT=mikec
SLURM_JOB_QOS=normal
SLURM_STEP_NODELIST=marzano05
SLURM_JOB_NODELIST=marzano05
SLURM_STEP_NUM_NODES=1
SLURM_STEP_NUM_TASKS=1
SLURM_STEP_TASKS_PER_NODE=1
SLURM_STEP_LAUNCHER_PORT=49261
SLURM_SRUN_COMM_HOST=xxxxxxxxx
SLURM_TOPOLOGY_ADDR=marzano05
SLURM_TOPOLOGY_ADDR_PATTERN=node
TMPDIR=/tmp
SLURM_CPUS_ON_NODE=2
SLURM_TASK_PID=23727
SLURM_NODEID=0
SLURM_PROCID=0
SLURM_LOCALID=0
SLURM_LAUNCH_NODE_IPADDR=xxxxxxxx
SLURM_GTIDS=0
SLURM_CHECKPOINT_IMAGE_DIR=/var/slurm/checkpoint
SLURM_JOB_UID=3691
SLURM_JOB_USER=mikec
SLURMD_NODENAME=marzano05
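For reference, here is the arithmetic I'm using to interpret those numbers. My possibly wrong understanding is that with SelectTypeParameters=CR_Core, Slurm allocates whole cores, and each core counts as two CPUs because ThreadsPerCore=2, which would explain SLURM_CPUS_ON_NODE=2 for a one-task srun. Please correct me if this mental model is off:

```shell
#!/bin/sh
# Sanity arithmetic against the NodeName definition above.
SOCKETS=2
CORES_PER_SOCKET=12
THREADS_PER_CORE=2

TOTAL_CPUS=$((SOCKETS * CORES_PER_SOCKET * THREADS_PER_CORE))  # what CPUs=48 declares
PHYSICAL_CORES=$((SOCKETS * CORES_PER_SOCKET))                 # what single-threaded R can really use
CPUS_FOR_ONE_TASK=$THREADS_PER_CORE                            # one core allocated under CR_Core

echo "total=$TOTAL_CPUS physical=$PHYSICAL_CORES per-task=$CPUS_FOR_ONE_TASK"
```

If that holds, SLURM_CPUS_ON_NODE=2 is expected rather than a misconfiguration, but I'd appreciate confirmation before I tune his submit scripts around it.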