Hi all,

A user reported a problem when submitting GPU jobs.

The particular machine has 3 nvidia GPUs:
0.  GeForce GTX 580
1.  GeForce 210
2.  GeForce GTX 580
The two 580s are for GPGPU, while the 210 only drives a display. I
don't want jobs running on the 210, so my /etc/slurm/gres.conf contains this:
Name=gpu File=/dev/nvidia0
Name=gpu File=/dev/nvidia2
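
For what it's worth, this is roughly how I matched the device files to the
cards; I'm assuming here that the /dev/nvidiaN numbering follows the same
order that nvidia-smi reports, which is what I see on this machine:

# list the GPUs as the driver enumerates them
nvidia-smi -L
# GPU 0: GeForce GTX 580 (...)
# GPU 1: GeForce 210 (...)
# GPU 2: GeForce GTX 580 (...)

# the device files referenced above in gres.conf
ls -l /dev/nvidia0 /dev/nvidia1 /dev/nvidia2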

The first GPU job submitted runs on the first 580 (id 0), but a second job
submitted while the first is still busy/allocated will not run on the second
580. Instead the code runs on the GeForce 210, which 1) has low
performance and 2) pisses off the user of the display.
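
For context, the job scripts look more or less like this (the program name
below is just a placeholder, not the real script):

#!/bin/bash
#SBATCH --job-name=SimName
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1        # ask Slurm for one GPU

# run the CUDA code on whatever GPU Slurm allocates
./my_cuda_program           # placeholder for the actual binary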

What I think happens is that the first job is submitted correctly and
runs on the first GTX. But the second job does not get a GPU allocated by
Slurm. Since the GeForce 210 is not controlled (or "hidden") by Slurm, the
code can see it and thus runs there...
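
To check this, I suppose I could add something like the following at the top
of the job script (assuming Slurm exports CUDA_VISIBLE_DEVICES for the GPUs
it allocates; an empty/unset variable for the second job would confirm that
it got no GPU and simply fell back to whatever device it can see):

# show what Slurm actually handed to this job
echo "CUDA_VISIBLE_DEVICES = ${CUDA_VISIBLE_DEVICES:-<unset>}"
scontrol show job "$SLURM_JOB_ID" | grep -i gres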

What could be wrong?? I'm attaching the config file I'm using and the logs.

Thanks a lot.

Regards

Nicolas
# slurm.conf file generated by configurator.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
DebugFlags=NO_CONF_HASH

ControlMachine=NODENAME
#ControlAddr=
#BackupController=
#BackupAddr=
#
AuthType=auth/munge
CacheGroups=0
#CheckpointType=checkpoint/none
CryptoType=crypto/munge
#DisableRootJobs=NO
#EnforcePartLimits=NO
#Epilog=
#PrologSlurmctld=
#FirstJobId=1
#MaxJobId=999999
GresTypes=gpu
#GroupUpdateForce=0
#GroupUpdateTime=600
#JobCheckpointDir=/var/slurm/checkpoint
#JobCredentialPrivateKey=
#JobCredentialPublicCertificate=
#JobFileAppend=0
#JobRequeue=1
#JobSubmitPlugins=1
#KillOnBadExit=0
#Licenses=foo*4,bar
#MailProg=/bin/mail
#MaxJobCount=5000
#MaxStepCount=40000
#MaxTasksPerNode=128
MpiDefault=none
#MpiParams=ports=#-#
#PluginDir=
#PlugStackConfig=
#PrivateData=jobs
ProctrackType=proctrack/pgid
#Prolog=
#PrologSlurmctld=
#PropagatePrioProcess=0
#PropagateResourceLimits=
#PropagateResourceLimitsExcept=
ReturnToService=1
#SallocDefaultCommand=
SlurmctldPidFile=/var/run/slurm/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurm/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/tmp/slurm/slurmd
SlurmUser=slurm
#SrunEpilog=
#SrunProlog=
StateSaveLocation=/var/tmp/slurm
SwitchType=switch/none
#TaskEpilog=
TaskPlugin=task/none
#TaskPluginParam=
#TaskProlog=
#TopologyPlugin=topology/tree
#TmpFs=/tmp
#TrackWCKey=no
#TreeWidth=
#UnkillableStepProgram=
#UsePAM=0
#
#
# TIMERS
#BatchStartTimeout=10
#CompleteWait=0
#EpilogMsgTime=2000
#GetEnvTimeout=2
#HealthCheckInterval=0
#HealthCheckProgram=
InactiveLimit=0
KillWait=30
#MessageTimeout=10
#ResvOverRun=0
MinJobAge=300
#OverTimeLimit=0
SlurmctldTimeout=120
SlurmdTimeout=300
#UnkillableStepTimeout=60
#VSizeFactor=0
Waittime=0
#
#
# SCHEDULING
#DefMemPerCPU=0
FastSchedule=1
#MaxMemPerCPU=0
#SchedulerRootFilter=1
#SchedulerTimeSlice=30
SchedulerType=sched/backfill
SchedulerPort=7321
SelectType=select/cons_res
#SelectTypeParameters=
#
#
# JOB PRIORITY
#PriorityType=priority/basic
#PriorityDecayHalfLife=
#PriorityCalcPeriod=
#PriorityFavorSmall=
#PriorityMaxAge=
#PriorityUsageResetPeriod=
#PriorityWeightAge=
#PriorityWeightFairshare=
#PriorityWeightJobSize=
#PriorityWeightPartition=
#PriorityWeightQOS=
#
#
# LOGGING AND ACCOUNTING
#AccountingStorageEnforce=0
#AccountingStorageHost=
#AccountingStorageLoc=
#AccountingStoragePass=
#AccountingStoragePort=
AccountingStorageType=accounting_storage/none
#AccountingStorageUser=
AccountingStoreJobComment=YES
ClusterName=NODENAME
#DebugFlags=
#JobCompHost=
#JobCompLoc=
#JobCompPass=
#JobCompPort=
JobCompType=jobcomp/none
#JobCompUser=
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=7
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdDebug=3
SlurmdLogFile=/var/log/slurm/slurmd.log
SlurmSchedLogFile=/var/log/slurm/slurmsched.log
SlurmSchedLogLevel=7
#
#
# POWER SAVE SUPPORT FOR IDLE NODES (optional)
#SuspendProgram=
#ResumeProgram=
#SuspendTimeout=
#ResumeTimeout=
#ResumeRate=
#SuspendExcNodes=
#SuspendExcParts=
#SuspendRate=
#SuspendTime=
#
#
# COMPUTE NODES
NodeName=NODENAME RealMemory=16082 Sockets=2 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN Gres=gpu:2
PartitionName=gpu Nodes=NODENAME Default=YES MaxTime=INFINITE State=UP


[2012-01-13T11:48:05] Launching batch job 1763 for UID 1000
[2012-01-13T11:48:05] [1763] sending REQUEST_COMPLETE_BATCH_SCRIPT, error:0
[2012-01-13T11:48:05] [1763] done with job
[2012-01-13T11:48:05] debug2: Processing RPC: REQUEST_SUBMIT_BATCH_JOB from 
uid=1000
[2012-01-13T11:48:05] debug3: JobDesc: user_id=1000 job_id=-1 partition=gpu 
name=SimName
[2012-01-13T11:48:05] debug3:    cpus=1-4294967294 pn_min_cpus=-1
[2012-01-13T11:48:05] debug3:    -N min-[max]: 1-[4294967294]:65534:65534:65534
[2012-01-13T11:48:05] debug3:    pn_min_memory_job=-1 pn_min_tmp_disk=-1
[2012-01-13T11:48:05] debug3:    immediate=0 features=(null) reservation=(null)
[2012-01-13T11:48:05] debug3:    req_nodes=(null) exc_nodes=(null) gres=gpu:1
[2012-01-13T11:48:05] debug3:    time_limit=-1--1 priority=-1 contiguous=0 
shared=-1
[2012-01-13T11:48:05] debug3:    kill_on_node_fail=-1 script=#!/bin/bash

#SBATCH --job-name=SimName
...
[2012-01-13T11:48:05] debug3:    
argv="/home/me/test/output/slurm_20120113_11h48.sh"
[2012-01-13T11:48:05] debug3:    
environment=MANPATH=/usr/lib64/mpi/mpi-openmpi-gcc/usr/share/man:/home/me/.gentoo/java-config-2/current-user-vm/man:/usr/local/share/man:/usr/share/man:/usr/share/binutils-data/x86_64-pc-linux-gnu/2.21.1/man:/usr/share/gcc-data/x86_64-pc-linux-gnu/4.5.3/man:/etc/java-config/system-vm/man/:/opt/intel/composerxe-2011.4.191/man/en_US:/opt/cuda/man,VTK_DIR=/usr/lib64/vtk-5.8,KDE_MULTIHEAD=false,...
[2012-01-13T11:48:05] debug3:    stdin=/dev/null 
stdout=/home/me/test/output/out_%j.log stderr=/home/me/test/output/err_%j.log
[2012-01-13T11:48:05] debug3:    work_dir=/home/me/test 
alloc_node:sid=NODENAME:26341
[2012-01-13T11:48:05] debug3:    resp_host=(null) alloc_resp_port=0  
other_port=0
[2012-01-13T11:48:05] debug3:    dependency=(null) account=(null) qos=(null) 
comment=(null)
[2012-01-13T11:48:05] debug3:    mail_type=0 mail_user=(null) nice=55534 
num_tasks=4294967294 open_mode=0 overcommit=-1 acctg_freq=-1
[2012-01-13T11:48:05] debug3:    network=(null) begin=Unknown cpus_per_task=-1 
requeue=-1 licenses=(null)
[2012-01-13T11:48:05] debug3:    end_time=Unknown signal=0@0 wait_all_nodes=-1
[2012-01-13T11:48:05] debug3:    ntasks_per_node=-1 ntasks_per_socket=-1 
ntasks_per_core=-1
[2012-01-13T11:48:05] debug3:    cpus_bind=65534:(null) mem_bind=65534:(null) 
plane_size:65534
[2012-01-13T11:48:05] debug2: found 1 usable nodes from config containing 
NODENAME
[2012-01-13T11:48:05] debug3: _pick_best_nodes: job 1763 idle_nodes 0 
share_nodes 1
[2012-01-13T11:48:05] debug2: sched: JobId=1763 allocated resources: 
NodeList=(null)
[2012-01-13T11:48:05] _slurm_rpc_submit_batch_job JobId=1763 usec=536
[2012-01-13T11:48:05] debug:  sched: Running job scheduler
[2012-01-13T11:48:05] debug2: found 1 usable nodes from config containing 
NODENAME
[2012-01-13T11:48:05] debug3: _pick_best_nodes: job 1763 idle_nodes 0 
share_nodes 1
[2012-01-13T11:48:05] debug3: dist_task: best_fit : using node[0]:socket[1] : 3 
cores available
[2012-01-13T11:48:05] debug3: cons_res: _add_job_to_res: job 1763 act 0
[2012-01-13T11:48:05] debug3: cons_res: adding job 1763 to part gpu row 0
[2012-01-13T11:48:05] debug3: sched: JobId=1763 initiated
[2012-01-13T11:48:05] sched: Allocate JobId=1763 NodeList=NODENAME #CPUs=1
[2012-01-13T11:48:05] debug2: Spawning RPC agent for msg_type 4005
[2012-01-13T11:48:05] debug2: got 1 threads to send out
[2012-01-13T11:48:05] debug2: Tree head got back 0 looking for 1
[2012-01-13T11:48:05] debug3: Tree sending to NODENAME
[2012-01-13T11:48:05] debug3: Writing job id 1763 to header record of job_state 
file
[2012-01-13T11:48:05] debug2: Tree head got back 1
[2012-01-13T11:48:05] debug2: Tree head got them all
[2012-01-13T11:48:05] debug2: node_did_resp NODENAME
[2012-01-13T11:48:05] debug2: Processing RPC: REQUEST_COMPLETE_BATCH_SCRIPT 
from uid=0 JobId=1763
[2012-01-13T11:48:05] completing job 1763
[2012-01-13T11:48:05] debug3: cons_res: _rm_job_from_res: job 1763 action 0
[2012-01-13T11:48:05] debug3: cons_res: removed job 1763 from part gpu row 0
[2012-01-13T11:48:05] debug2: Spawning RPC agent for msg_type 6011
[2012-01-13T11:48:05] sched: job_complete for JobId=1763 successful
[2012-01-13T11:48:05] debug2: _slurm_rpc_complete_batch_script JobId=1763 
usec=213
[2012-01-13T11:48:05] debug2: got 1 threads to send out
[2012-01-13T11:48:05] debug2: Tree head got back 0 looking for 1
[2012-01-13T11:48:05] debug3: Tree sending to NODENAME
[2012-01-13T11:48:05] debug2: Tree head got back 1
[2012-01-13T11:48:05] debug2: Tree head got them all
[2012-01-13T11:48:05] debug2: node_did_resp NODENAME
[2012-01-13T11:48:05] debug:  sched: Running job scheduler
[2012-01-13T11:48:07] debug3: Writing job id 1763 to header record of job_state 
file
[2012-01-13T11:48:07] debug3: Processing RPC: REQUEST_NODE_INFO from uid=1005
[2012-01-13T11:48:07] debug3: _slurm_rpc_dump_nodes, size=153 usec=112
[2012-01-13T11:48:07] debug3: Processing RPC: REQUEST_JOB_INFO from uid=1005
[2012-01-13T11:48:08] debug3: Processing RPC: REQUEST_NODE_INFO from uid=1000
[2012-01-13T11:48:08] debug3: _slurm_rpc_dump_nodes, size=153 usec=100
[2012-01-13T11:48:08] debug3: Processing RPC: REQUEST_JOB_INFO from uid=1000

[2012-01-13T11:48:05] sched: JobId=1763 allocated resources: NodeList=(null)
[2012-01-13T11:48:05] sched: Running job scheduler
[2012-01-13T11:48:05] sched: JobId=1763 initiated
[2012-01-13T11:48:05] sched: Allocate JobId=1763 NodeList=NODENAME #CPUs=1
[2012-01-13T11:48:05] sched: job_complete for JobId=1763 successful
[2012-01-13T11:48:05] sched: Running job scheduler
[2012-01-13T11:48:52] sched: Running job scheduler
