How did you install? My guess is that it isn't a full install, as Moe said. I would remove the PluginDir option, since it will default to the location configured at build time. Given that you pointed to /usr/lib64 as the plugin location on one of your nodes, I'm surprised it didn't work.
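To make the mismatch concrete, here is a small runnable sketch (the mock directory is only a stand-in for a real install; the paths mirror the ones in this thread): a plugin sitting in /usr/lib64 is invisible when PluginDir points at /usr/lib64/slurm.

```shell
# Mock install tree standing in for a real node; paths mirror the thread.
mock=$(mktemp -d)
mkdir -p "$mock/usr/lib64"
touch "$mock/usr/lib64/select_cons_res.so"   # where the .so actually is

plugindir="$mock/usr/lib64/slurm"            # what slurm.conf points at
if [ -e "$plugindir/select_cons_res.so" ]; then
    msg="plugin found in PluginDir"
else
    msg="PluginDir mismatch: select_cons_res.so is not in PluginDir"
fi
echo "$msg"
rm -rf "$mock"
```

Either the PluginDir line has to point at the directory that actually holds the .so files, or the plugins have to be installed into the directory the line names.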
On May 7, 2015 8:13:35 PM PDT, David Lin <[email protected]> wrote:
> Hi Danny,
> No, that doesn't work:
>
> starting slurmd: slurmd: error: Couldn't find the specified plugin name for select/cons_res looking at all files
> slurmd: error: cannot find select plugin for select/cons_res
> slurmd: fatal: Can't find plugin for select/cons_res
>
> David
>
> On 05/07/2015 07:39 PM, Danny Auble wrote:
>> What happens if you set
>>
>> PluginDir=/usr/lib64
>>
>> On May 7, 2015 6:10:19 PM PDT, David Lin <[email protected]> wrote:
>>
>> Hi Moe,
>> I do have the Slurm plugins installed, and I do see the file /usr/lib64/select_cons_res.so
>> my slurm.conf also has PluginDir=/usr/lib64/slurm
>> I've pasted my full slurm.conf below just in case.
>>
>> Thanks!
>> David
>>
>>
>> # slurm.conf file generated by configurator.html.
>> # Put this file on all nodes of your cluster.
>> # See the slurm.conf man page for more information.
>> #
>> ControlMachine=rsg-master
>> ControlAddr=171.64.74.213
>> #BackupController=
>> #BackupAddr=
>> #
>> AuthType=auth/munge
>> CacheGroups=0
>> #CheckpointType=checkpoint/none
>> CryptoType=crypto/munge
>> #DisableRootJobs=NO
>> #EnforcePartLimits=NO
>> #Epilog=
>> #EpilogSlurmctld=
>> #FirstJobId=1
>> #MaxJobId=999999
>> #GresTypes=
>> #GroupUpdateForce=0
>> #GroupUpdateTime=600
>> #JobCheckpointDir=/var/slurm/checkpoint
>> #JobCredentialPrivateKey=
>> #JobCredentialPublicCertificate=
>> #JobFileAppend=0
>> #JobRequeue=1
>> #JobSubmitPlugins=1
>> #KillOnBadExit=0
>> #LaunchType=launch/slurm
>> #Licenses=foo*4,bar
>> #MailProg=/bin/mail
>> #MaxJobCount=5000
>> #MaxStepCount=40000
>> #MaxTasksPerNode=128
>> MpiDefault=none
>> #MpiParams=ports=#-#
>> PluginDir=/usr/lib64/slurm
>> #PlugStackConfig=
>> #PrivateData=jobs
>> ProctrackType=proctrack/pgid
>> #Prolog=
>> #PrologFlags=
>> #PrologSlurmctld=
>> #PropagatePrioProcess=0
>> #PropagateResourceLimits=
>> #PropagateResourceLimitsExcept=
>> #RebootProgram=
>> ReturnToService=2
>> #SallocDefaultCommand=
>> SlurmctldPidFile=/var/run/slurmctld.pid
>> SlurmctldPort=6817
>> SlurmdPidFile=/var/run/slurmd.pid
>> SlurmdPort=6818
>> SlurmdSpoolDir=/var/spool/slurmd
>> SlurmUser=slurm
>> #SlurmdUser=root
>> #SrunEpilog=
>> #SrunProlog=
>> StateSaveLocation=/var/spool
>> SwitchType=switch/none
>> #TaskEpilog=
>> TaskPlugin=task/none
>> #TaskPluginParam=
>> #TaskProlog=
>> #TopologyPlugin=topology/tree
>> #TmpFS=/tmp
>> #TrackWCKey=no
>> #TreeWidth=
>> #UnkillableStepProgram=
>> #UsePAM=0
>> #
>> #
>> # TIMERS
>> #BatchStartTimeout=10
>> #CompleteWait=0
>> #EpilogMsgTime=2000
>> #GetEnvTimeout=2
>> #HealthCheckInterval=0
>> #HealthCheckProgram=
>> InactiveLimit=0
>> KillWait=30
>> #MessageTimeout=10
>> #ResvOverRun=0
>> MinJobAge=300
>> #OverTimeLimit=0
>> SlurmctldTimeout=120
>> SlurmdTimeout=300
>> #UnkillableStepTimeout=60
>> #VSizeFactor=0
>> Waittime=0
>> #
>> #
>> # SCHEDULING
>> #DefMemPerCPU=0
>> FastSchedule=0
>> #MaxMemPerCPU=0
>> #SchedulerRootFilter=1
>> #SchedulerTimeSlice=30
>> SchedulerType=sched/backfill
>> SchedulerPort=7321
>> SelectType=select/cons_res
>> SelectTypeParameters=CR_Core_Memory
>> #
>> #
>> # JOB PRIORITY
>> #PriorityFlags=
>> #PriorityType=priority/basic
>> #PriorityDecayHalfLife=
>> #PriorityCalcPeriod=
>> #PriorityFavorSmall=
>> #PriorityMaxAge=
>> #PriorityUsageResetPeriod=
>> #PriorityWeightAge=
>> #PriorityWeightFairshare=
>> #PriorityWeightJobSize=
>> #PriorityWeightPartition=
>> #PriorityWeightQOS=
>> #
>> #
>> # LOGGING AND ACCOUNTING
>> #AccountingStorageEnforce=0
>> #AccountingStorageHost=
>> #AccountingStorageLoc=
>> #AccountingStoragePass=
>> #AccountingStoragePort=
>> AccountingStorageType=accounting_storage/none
>> #AccountingStorageUser=
>> AccountingStoreJobComment=YES
>> ClusterName=cluster
>> #DebugFlags=
>> #JobCompHost=
>> #JobCompLoc=
>> #JobCompPass=
>> #JobCompPort=
>> JobCompType=jobcomp/none
>> #JobCompUser=
>> #JobContainerType=job_container/none
>> JobAcctGatherFrequency=30
>> JobAcctGatherType=jobacct_gather/none
>> SlurmctldDebug=9
>> SlurmctldLogFile=/var/log/slurmctld.log
>> SlurmdDebug=9
>> SlurmdLogFile=/var/log/slurmd.log
>> #SlurmSchedLogFile=
>> #SlurmSchedLogLevel=
>> #
>> #
>> # POWER SAVE SUPPORT FOR IDLE NODES (optional)
>> #SuspendProgram=
>> #ResumeProgram=
>> #SuspendTimeout=
>> #ResumeTimeout=
>> #ResumeRate=
>> #SuspendExcNodes=
>> #SuspendExcParts=
>> #SuspendRate=
>> #SuspendTime=
>> #
>> #
>> # COMPUTE NODES
>> NodeName=rsg[4-7] State=UNKNOWN CPUs=24 Sockets=2 CoresPerSocket=6 ThreadsPerCore=2
>> NodeName=rsg[12-15] State=UNKNOWN CPUs=24 Sockets=2 CoresPerSocket=6 ThreadsPerCore=2
>> NodeName=rsg[16-31] State=UNKNOWN CPUs=32 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2
>>
>> On 05/07/2015 05:59 PM, Moe Jette wrote:
>>
>> It looks like you didn't install the RPM with Slurm plugins.
>>
>> Quoting David Lin <[email protected]>:
>>
>> Hello, I am having some issues with the select/cons_res mode of Slurm. When I try to execute a job such as srun -N 2 -n 2 hostname, I get this:
>>
>> $ srun -N 2 -n 2 -q RHEL6 hostname
>> srun: error: slurm_receive_msg: Zero Bytes were transmitted or received
>> srun: error: Unable to allocate resources: Zero Bytes were transmitted or received
>>
>> and in the slurmctld log, I see this:
>>
>> [2015-05-07T16:52:43.264] error: we don't have select plugin type 102
>> [2015-05-07T16:52:43.264] error: select_g_select_jobinfo_unpack: unpack error
>> [2015-05-07T16:52:43.264] error: Malformed RPC of type REQUEST_RESOURCE_ALLOCATION(4001) received
>> [2015-05-07T16:52:43.264] error: slurm_receive_msg: Header lengths are longer than data received
>> [2015-05-07T16:52:43.274] error: slurm_receive_msg: Header lengths are longer than data received
>>
>> All of the nodes as well as the controller running slurmctld have the exact same slurm.conf, and I've included the relevant section below.
>> # SCHEDULING
>> #DefMemPerCPU=0
>> FastSchedule=0
>> #MaxMemPerCPU=0
>> #SchedulerRootFilter=1
>> #SchedulerTimeSlice=30
>> SchedulerType=sched/backfill
>> SchedulerPort=7321
>> SelectType=select/cons_res
>> SelectTypeParameters=CR_Core_Memory
>>
>> Is there some configuration I'm missing?
>>
>> Thank you!
>> David
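The failure mode running through this whole thread can be sketched in a few lines: the loader derives the shared-object name from the SelectType value (select/cons_res becomes select_cons_res.so) and looks for it under PluginDir, so those two conf lines are the ones that have to agree with the installed files. The conf fragment below is copied from the thread; the sed/tr pipeline is only a diagnostic sketch of that naming convention, not slurmd's actual loader.

```shell
# The two relevant lines from David's slurm.conf.
conf=$(mktemp)
cat > "$conf" <<'EOF'
PluginDir=/usr/lib64/slurm
SelectType=select/cons_res
EOF

seltype=$(sed -n 's/^SelectType=//p' "$conf")
plugdir=$(sed -n 's/^PluginDir=//p' "$conf")
# Plugin naming convention: replace '/' with '_' and append '.so'.
expected="$plugdir/$(printf '%s' "$seltype" | tr '/' '_').so"
echo "slurmd will look for: $expected"
rm -f "$conf"
```

If that path does not exist on a node, slurmd fails exactly as in David's log; if it exists on some nodes but not others, the controller and clients disagree on the plugin, which matches the "we don't have select plugin type 102" unpack errors.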
