Anything in the slurmctld log?
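
In case it helps, here is a rough sketch of how one might pull the gres-related lines out of that log right after a failed salloc. The helper name gres_log_lines and the default path /var/log/slurmctld are only assumptions for illustration; the real location is whatever SlurmctldLogFile is set to in your slurm.conf.

```shell
# Hypothetical helper: show the most recent gres-related lines in a slurmctld log.
# The default path below is an assumption; use the SlurmctldLogFile value
# from slurm.conf on the controller instead.
gres_log_lines() {
    log_file="${1:-/var/log/slurmctld}"
    # Case-insensitive match so both "gres" and "Gres" entries are shown.
    grep -i 'gres' "$log_file" | tail -n 20
}
```

Running it on the controller immediately after the failing request should show whatever slurmctld logged about the gres specification, if anything.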


On Tue, 2015-01-06 at 15:03 -0800, [email protected] wrote:
> We're trying to follow http://slurm.schedmd.com/gres.html to schedule 
> requested use of /dev/shm using the term 'memdir' without success so far.
> 
> In slurm.conf we have:
>  GresTypes=memdir
> And
>  NodeName=DEFAULT Sockets=2 CoresPerSocket=10 ThreadsPerCore=1 RealMemory=131072 Gres=memdir:64
> And it may matter that we have:
>  FastSchedule=2
> 
> And each node has (The autogenerated bit is from Bright Cluster Manager):
> > cat /etc/slurm/gres.conf
> # This section of this file was automatically generated by cmd. Do not edit manually!
> # BEGIN AUTOGENERATED SECTION -- DO NOT REMOVE
> Name=gpu
> Name=mic
> # END AUTOGENERATED SECTION   -- DO NOT REMOVE
> Name=memdir Count=64
> 
> (we will need to vary both these later to customize the resource available on 
> different nodes)
> 
> (we'd like to try using 64G instead of 64 but just want it working first)
> 
> The resource seems to be set for a node:
> > scontrol show node c001
> NodeName=c001 Arch=x86_64 CoresPerSocket=10
>    CPUAlloc=0 CPUErr=0 CPUTot=20 CPULoad=0.08 Features=(null)
>    Gres=memdir:64
>    NodeAddr=c001 NodeHostName=c001 Version=14.03.0
>    OS=Linux RealMemory=131072 AllocMem=0 Sockets=2 Boards=1
>    State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1
>    BootTime=2014-12-23T12:03:22 SlurmdStartTime=2014-12-23T01:06:05
>    CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
>    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
> 
> And seems to be available to use in principle:
> > salloc --gres=help
> Valid gres options are:
> memdir[:count]
> 
> But is not useable in practice:
> > salloc --gres=memdir:16
> salloc: error: Job submit/allocate failed: Invalid generic resource (gres) specification
> 
> Can anyone see where we are going wrong?
> 
> Gareth Williams
> 
> ps. At some point we will also want to schedule gpus.
