Anything in the slurmctld log?
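
In case it helps, here is a rough sketch for pulling the gres-related lines out of that log. The helper name `gres_log_lines` is made up for illustration, and I'm assuming `scontrol show config` reports the log path as `SlurmctldLogFile = <path>` on your install; adjust if yours differs.

```shell
# Hypothetical helper (not part of Slurm): print the last 20 lines of a
# log file that mention "gres", case-insensitively.
gres_log_lines() {
    grep -i 'gres' "$1" | tail -n 20
}

# Example use on the controller host, assuming scontrol is on PATH and
# SlurmctldLogFile is set in your configuration:
#   gres_log_lines "$(scontrol show config | awk '/^SlurmctldLogFile/ {print $3}')"
```

It would be worth looking both just after slurmctld starts and just after the failed salloc, since either point may log something about the memdir gres.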

On Tue, 2015-01-06 at 15:03 -0800, [email protected] wrote:
> We're trying to follow http://slurm.schedmd.com/gres.html to schedule
> requested use of /dev/shm using the term 'memdir' without success so far.
>
> In slurm.conf we have:
> GresTypes=memdir
> And
> NodeName=DEFAULT Sockets=2 CoresPerSocket=10 ThreadsPerCore=1 RealMemory=131072 Gres=memdir:64
> And it may matter that we have:
> FastSchedule=2
>
> And each node has (The autogenerated bit is from Bright Cluster Manager):
>
> cat /etc/slurm/gres.conf
> # This section of this file was automatically generated by cmd. Do not edit manually!
> # BEGIN AUTOGENERATED SECTION -- DO NOT REMOVE
> Name=gpu
> Name=mic
> # END AUTOGENERATED SECTION -- DO NOT REMOVE
> Name=memdir Count=64
>
> (we will need to vary both these later to customize the resource available on
> different nodes)
>
> (we'd like to try using 64G instead of 64 but just want it working first)
>
> The resource seems to be set for a node:
>
> scontrol show node c001
> NodeName=c001 Arch=x86_64 CoresPerSocket=10
>    CPUAlloc=0 CPUErr=0 CPUTot=20 CPULoad=0.08 Features=(null)
>    Gres=memdir:64
>    NodeAddr=c001 NodeHostName=c001 Version=14.03.0
>    OS=Linux RealMemory=131072 AllocMem=0 Sockets=2 Boards=1
>    State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1
>    BootTime=2014-12-23T12:03:22 SlurmdStartTime=2014-12-23T01:06:05
>    CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
>    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>
> And seems to be available to use in principle:
>
> salloc --gres=help
> Valid gres options are:
> memdir[:count]
>
> But is not useable in practice:
>
> salloc --gres=memdir:16
> salloc: error: Job submit/allocate failed: Invalid generic resource (gres) specification
>
> Can anyone see where we are going wrong?
>
> Gareth Williams
>
> ps. At some point we will also want to schedule gpus.
