We're trying to follow http://slurm.schedmd.com/gres.html to schedule requested
use of /dev/shm as a generic resource (gres) named 'memdir', so far without
success.

In slurm.conf we have:
 GresTypes=memdir
And
 NodeName=DEFAULT Sockets=2 CoresPerSocket=10 ThreadsPerCore=1 RealMemory=131072 Gres=memdir:64
And it may matter that we have:
 FastSchedule=2

And each node has the following (the autogenerated section is from Bright Cluster Manager):
> cat /etc/slurm/gres.conf
# This section of this file was automatically generated by cmd. Do not edit manually!
# BEGIN AUTOGENERATED SECTION -- DO NOT REMOVE
Name=gpu
Name=mic
# END AUTOGENERATED SECTION   -- DO NOT REMOVE
Name=memdir Count=64

(We will need to vary both the Gres value in slurm.conf and the Count in
gres.conf later, to customize the resource available on different nodes.)
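
As a rough illustration of the per-node variation we have in mind, the
settings might end up looking something like the sketch below. The node
ranges c[001-008] and c[009-016] and the count of 128 are hypothetical,
chosen only for the example, and this is untested.

```
# slurm.conf (sketch; node ranges and the 128 count are hypothetical)
GresTypes=memdir
NodeName=c[001-008] Gres=memdir:64
NodeName=c[009-016] Gres=memdir:128

# /etc/slurm/gres.conf on the hypothetical nodes c001..c008
Name=memdir Count=64

# /etc/slurm/gres.conf on the hypothetical nodes c009..c016
Name=memdir Count=128
```

The idea is that the count in each node's gres.conf matches the Gres value
given for that node in slurm.conf.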

(We'd like to try using 64G instead of 64, but just want it working first.)

The resource seems to be set for a node:
> scontrol show node c001
NodeName=c001 Arch=x86_64 CoresPerSocket=10
   CPUAlloc=0 CPUErr=0 CPUTot=20 CPULoad=0.08 Features=(null)
   Gres=memdir:64
   NodeAddr=c001 NodeHostName=c001 Version=14.03.0
   OS=Linux RealMemory=131072 AllocMem=0 Sockets=2 Boards=1
   State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1
   BootTime=2014-12-23T12:03:22 SlurmdStartTime=2014-12-23T01:06:05
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

And it seems to be available to use in principle:
> salloc --gres=help
Valid gres options are:
memdir[:count]

But it is not usable in practice:
> salloc --gres=memdir:16
salloc: error: Job submit/allocate failed: Invalid generic resource (gres) specification

Can anyone see where we are going wrong?

Gareth Williams

PS. At some point we will also want to schedule GPUs.
