Not sure why it's complaining about plugins, maybe the config files on
the nodes are a bit messed up?
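
One way to look for that kind of mismatch, as a rough sketch only: compare the
names defined in a node's gres.conf against the GresTypes list in slurm.conf.
The function name check_gres_names is made up for illustration, and the parsing
assumes the keys are written exactly as "GresTypes=" and "Name=" (Slurm itself
is more lenient about case).

```shell
# Hypothetical helper (not part of Slurm): print every Name= entry in
# gres.conf that does not appear in the GresTypes list of slurm.conf.
# Usage: check_gres_names SLURM_CONF GRES_CONF
check_gres_names() {
    slurm_conf=$1
    gres_conf=$2

    # GresTypes is a comma-separated list, e.g. GresTypes=memdir
    types=$(awk -F= '/^[[:space:]]*GresTypes=/ { print $2 }' "$slurm_conf" | tr ',' '\n')

    # Collect the value of each whitespace-separated Name=... field,
    # skipping comment lines (fields such as NodeName= do not match).
    names=$(awk '!/^[[:space:]]*#/ {
                for (i = 1; i <= NF; i++)
                    if ($i ~ /^Name=/) print substr($i, 6)
            }' "$gres_conf" | sort -u)

    for name in $names; do
        # -x: whole-line match, -F: treat the name as a fixed string
        if ! printf '%s\n' "$types" | grep -qxF "$name"; then
            echo "$name"
        fi
    done
}
```

Run on a compute node it would look something like
"check_gres_names /etc/slurm/slurm.conf /etc/slurm/gres.conf" (the slurm.conf
path is an assumption). Any name it prints is defined in gres.conf but not
listed in GresTypes in the slurm.conf it was given.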

Final suggestion before waiting for SchedMD help would be to set
DebugFlags=gres in slurm.conf, restart slurmctld and then run
scontrol reconfigure.

Just to prove it does work:

[franco@charlie1 ~]$ salloc --gres=help
Valid gres options are:
Perseus[:count]
Galaxy[:count]
SRME[:count]

[franco@charlie1 ~]$ salloc --gres=Galaxy:3 -p d1
salloc: Pending job allocation 98683
salloc: job 98683 queued and waiting for resources



On Tue, 2015-01-06 at 20:00 -0800, [email protected] wrote:
> > -----Original Message-----
> > From: Franco Broi [mailto:[email protected]]
> > Sent: Wednesday, 7 January 2015 11:47 AM
> > To: slurm-dev
> > Subject: [slurm-dev] Re: gres without plugin
> > 
> > 
> > 
> > Anything in the slurmctld log?
> 
> For only a couple of nodes I'm getting messages every 5 minutes like:
> [2015-01-07T14:52:03.211] error: gres_plugin_node_config_unpack: no plugin configured to unpack data type memdir from node c007
> 
> Gareth
> 
> > 
> > 
> > On Tue, 2015-01-06 at 15:03 -0800, [email protected] wrote:
> > > We're trying to follow http://slurm.schedmd.com/gres.html to schedule
> > > requested use of /dev/shm using the term 'memdir' without success so
> > > far.
> > >
> > > In slurm.conf we have:
> > >  GresTypes=memdir
> > > And
> > >  NodeName=DEFAULT Sockets=2 CoresPerSocket=10 ThreadsPerCore=1 RealMemory=131072 Gres=memdir:64
> > > And it may matter that we have:
> > >  FastSchedule=2
> > >
> > > And each node has (the autogenerated bit is from Bright Cluster
> > > Manager):
> > > > cat /etc/slurm/gres.conf
> > > # This section of this file was automatically generated by cmd. Do not edit manually!
> > > # BEGIN AUTOGENERATED SECTION -- DO NOT REMOVE
> > > Name=gpu
> > > Name=mic
> > > # END AUTOGENERATED SECTION   -- DO NOT REMOVE
> > > Name=memdir Count=64
> > >
> > > (we will need to vary both these later to customize the resource
> > > available on different nodes)
> > >
> > > (we'd like to try using 64G instead of 64 but just want it working
> > > first)
> > >
> > > The resource seems to be set for a node:
> > > > scontrol show node c001
> > > NodeName=c001 Arch=x86_64 CoresPerSocket=10
> > >    CPUAlloc=0 CPUErr=0 CPUTot=20 CPULoad=0.08 Features=(null)
> > >    Gres=memdir:64
> > >    NodeAddr=c001 NodeHostName=c001 Version=14.03.0
> > >    OS=Linux RealMemory=131072 AllocMem=0 Sockets=2 Boards=1
> > >    State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1
> > >    BootTime=2014-12-23T12:03:22 SlurmdStartTime=2014-12-23T01:06:05
> > >    CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
> > >    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
> > >
> > > And seems to be available to use in principle:
> > > > salloc --gres=help
> > > Valid gres options are:
> > > memdir[:count]
> > >
> > > But is not useable in practice:
> > > > salloc --gres=memdir:16
> > > salloc: error: Job submit/allocate failed: Invalid generic resource
> > > (gres) specification
> > >
> > > Can anyone see where we are going wrong?
> > >
> > > Gareth Williams
> > >
> > > ps. At some point we will also want to schedule gpus.
