Not sure why it's complaining about plugins. Maybe the config files on the nodes are a bit messed up?
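
One rough way to check that is to compare checksums of gres.conf between a node that logs the error and one that doesn't. This is only a sketch: report_mismatches is a made-up helper name, and the usage loop assumes passwordless ssh to the nodes and the /etc/slurm/gres.conf path from your message.

```shell
# Hypothetical helper: reads "node checksum path" lines on stdin and prints
# every path whose checksum is not the same on all nodes.
report_mismatches() {
    awk '{
             path = $3
             if (!(path in first)) first[path] = $2      # remember first checksum seen
             else if (first[path] != $2) bad[path] = 1   # a later node differs
         }
         END { for (p in bad) print "MISMATCH " p }'
}

# Usage on the cluster (assumes passwordless ssh; c001 and c007 are the
# nodes named in the quoted messages):
# for n in c001 c007; do
#     ssh "$n" md5sum /etc/slurm/gres.conf | sed "s/^/$n /"
# done | report_mismatches
```

No output means the file is identical on the nodes you listed, which would rule this theory out.
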

Final suggestion before waiting for SchedMD help would be to set
DebugFlags=gres in slurm.conf, restart slurmctld and run
scontrol reconfigure.

Just to prove it does work:

[franco@charlie1 ~]$ salloc --gres=help
Valid gres options are:
Perseus[:count]
Galaxy[:count]
SRME[:count]
[franco@charlie1 ~]$ salloc --gres=Galaxy:3 -p d1
salloc: Pending job allocation 98683
salloc: job 98683 queued and waiting for resources

On Tue, 2015-01-06 at 20:00 -0800, [email protected] wrote:
> > -----Original Message-----
> > From: Franco Broi [mailto:[email protected]]
> > Sent: Wednesday, 7 January 2015 11:47 AM
> > To: slurm-dev
> > Subject: [slurm-dev] Re: gres without plugin
> >
> >
> > Anything in the slurmctld log?
>
> For only a couple of nodes I'm getting messages every 5 minutes like:
> [2015-01-07T14:52:03.211] error: gres_plugin_node_config_unpack: no plugin configured to unpack data type memdir from node c007
>
> Gareth
>
> >
> > On Tue, 2015-01-06 at 15:03 -0800, [email protected] wrote:
> > > We're trying to follow http://slurm.schedmd.com/gres.html to schedule
> > > requested use of /dev/shm using the term 'memdir' without success so
> > > far.
> > >
> > > In slurm.conf we have:
> > > GresTypes=memdir
> > > And
> > > NodeName=DEFAULT Sockets=2 CoresPerSocket=10 ThreadsPerCore=1 RealMemory=131072 Gres=memdir:64
> > > And it may matter that we have:
> > > FastSchedule=2
> > >
> > > And each node has (The autogenerated bit is from Bright Cluster Manager):
> > > > cat /etc/slurm/gres.conf
> > > # This section of this file was automatically generated by cmd. Do not edit manually!
> > > # BEGIN AUTOGENERATED SECTION -- DO NOT REMOVE
> > > Name=gpu
> > > Name=mic
> > > # END AUTOGENERATED SECTION -- DO NOT REMOVE
> > > Name=memdir Count=64
> > >
> > > (we will need to vary both these later to customize the resource
> > > available on different nodes)
> > >
> > > (we'd like to try using 64G instead of 64 but just want it working
> > > first)
> > >
> > > The resource seems to be set for a node:
> > > > scontrol show node c001
> > > NodeName=c001 Arch=x86_64 CoresPerSocket=10
> > >    CPUAlloc=0 CPUErr=0 CPUTot=20 CPULoad=0.08 Features=(null)
> > >    Gres=memdir:64
> > >    NodeAddr=c001 NodeHostName=c001 Version=14.03.0
> > >    OS=Linux RealMemory=131072 AllocMem=0 Sockets=2 Boards=1
> > >    State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1
> > >    BootTime=2014-12-23T12:03:22 SlurmdStartTime=2014-12-23T01:06:05
> > >    CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
> > >    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
> > >
> > > And seems to be available to use in principle:
> > > > salloc --gres=help
> > > Valid gres options are:
> > > memdir[:count]
> > >
> > > But is not usable in practice:
> > > > salloc --gres=memdir:16
> > > salloc: error: Job submit/allocate failed: Invalid generic resource
> > > (gres) specification
> > >
> > > Can anyone see where we are going wrong?
> > >
> > > Gareth Williams
> > >
> > > ps. At some point we will also want to schedule gpus.
