How did you install? My guess is that it isn't a full install, as Moe said. I would remove the PluginDir option, since it will default to the location configured at build time. Given that you pointed to /usr/lib64 as the plugin location on one of your nodes, I'm surprised it didn't work.
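To make the mismatch concrete, here is a small runnable sketch (the mock directory is only a stand-in for a real install; the paths mirror the ones in this thread): a plugin sitting in /usr/lib64 is invisible when PluginDir points at /usr/lib64/slurm.

```shell
# Mock install tree standing in for a real node; paths mirror the thread.
mock=$(mktemp -d)
mkdir -p "$mock/usr/lib64"
touch "$mock/usr/lib64/select_cons_res.so"   # where the .so actually is

plugindir="$mock/usr/lib64/slurm"            # what slurm.conf points at
if [ -e "$plugindir/select_cons_res.so" ]; then
    msg="plugin found in PluginDir"
else
    msg="PluginDir mismatch: select_cons_res.so is not in PluginDir"
fi
echo "$msg"
rm -rf "$mock"
```

Either the PluginDir line has to point at the directory that actually holds the .so files, or the plugins have to be installed into the directory the line names.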
On May 7, 2015 8:13:35 PM PDT, David Lin <[email protected]> wrote:
> Hi Danny,
> No, that doesn't work:
>
> starting slurmd: slurmd: error: Couldn't find the specified plugin name for select/cons_res looking at all files
> slurmd: error: cannot find select plugin for select/cons_res
> slurmd: fatal: Can't find plugin for select/cons_res
>
> David
>
> On 05/07/2015 07:39 PM, Danny Auble wrote:
>> What happens if you set
>>
>> PluginDir=/usr/lib64
>>
>> On May 7, 2015 6:10:19 PM PDT, David Lin <[email protected]> wrote:
>>
>> Hi Moe,
>> I do have the Slurm plugins installed, and I do see the file /usr/lib64/select_cons_res.so
>> my slurm.conf also has PluginDir=/usr/lib64/slurm
>> I've pasted my full slurm.conf below just in case.
>>
>> Thanks!
>> David
>>
>>
>> # slurm.conf file generated by configurator.html.
>> # Put this file on all nodes of your cluster.
>> # See the slurm.conf man page for more information.
>> #
>> ControlMachine=rsg-master
>> ControlAddr=171.64.74.213
>> #BackupController=
>> #BackupAddr=
>> #
>> AuthType=auth/munge
>> CacheGroups=0
>> #CheckpointType=checkpoint/none
>> CryptoType=crypto/munge
>> #DisableRootJobs=NO
>> #EnforcePartLimits=NO
>> #Epilog=
>> #EpilogSlurmctld=
>> #FirstJobId=1
>> #MaxJobId=999999
>> #GresTypes=
>> #GroupUpdateForce=0
>> #GroupUpdateTime=600
>> #JobCheckpointDir=/var/slurm/checkpoint
>> #JobCredentialPrivateKey=
>> #JobCredentialPublicCertificate=
>> #JobFileAppend=0
>> #JobRequeue=1
>> #JobSubmitPlugins=1
>> #KillOnBadExit=0
>> #LaunchType=launch/slurm
>> #Licenses=foo*4,bar
>> #MailProg=/bin/mail
>> #MaxJobCount=5000
>> #MaxStepCount=40000
>> #MaxTasksPerNode=128
>> MpiDefault=none
>> #MpiParams=ports=#-#
>> PluginDir=/usr/lib64/slurm
>> #PlugStackConfig=
>> #PrivateData=jobs
>> ProctrackType=proctrack/pgid
>> #Prolog=
>> #PrologFlags=
>> #PrologSlurmctld=
>> #PropagatePrioProcess=0
>> #PropagateResourceLimits=
>> #PropagateResourceLimitsExcept=
>> #RebootProgram=
>> ReturnToService=2
>> #SallocDefaultCommand=
>> SlurmctldPidFile=/var/run/slurmctld.pid
>> SlurmctldPort=6817
>> SlurmdPidFile=/var/run/slurmd.pid
>> SlurmdPort=6818
>> SlurmdSpoolDir=/var/spool/slurmd
>> SlurmUser=slurm
>> #SlurmdUser=root
>> #SrunEpilog=
>> #SrunProlog=
>> StateSaveLocation=/var/spool
>> SwitchType=switch/none
>> #TaskEpilog=
>> TaskPlugin=task/none
>> #TaskPluginParam=
>> #TaskProlog=
>> #TopologyPlugin=topology/tree
>> #TmpFS=/tmp
>> #TrackWCKey=no
>> #TreeWidth=
>> #UnkillableStepProgram=
>> #UsePAM=0
>> #
>> #
>> # TIMERS
>> #BatchStartTimeout=10
>> #CompleteWait=0
>> #EpilogMsgTime=2000
>> #GetEnvTimeout=2
>> #HealthCheckInterval=0
>> #HealthCheckProgram=
>> InactiveLimit=0
>> KillWait=30
>> #MessageTimeout=10
>> #ResvOverRun=0
>> MinJobAge=300
>> #OverTimeLimit=0
>> SlurmctldTimeout=120
>> SlurmdTimeout=300
>> #UnkillableStepTimeout=60
>> #VSizeFactor=0
>> Waittime=0
>> #
>> #
>> # SCHEDULING
>> #DefMemPerCPU=0
>> FastSchedule=0
>> #MaxMemPerCPU=0
>> #SchedulerRootFilter=1
>> #SchedulerTimeSlice=30
>> SchedulerType=sched/backfill
>> SchedulerPort=7321
>> SelectType=select/cons_res
>> SelectTypeParameters=CR_Core_Memory
>> #
>> #
>> # JOB PRIORITY
>> #PriorityFlags=
>> #PriorityType=priority/basic
>> #PriorityDecayHalfLife=
>> #PriorityCalcPeriod=
>> #PriorityFavorSmall=
>> #PriorityMaxAge=
>> #PriorityUsageResetPeriod=
>> #PriorityWeightAge=
>> #PriorityWeightFairshare=
>> #PriorityWeightJobSize=
>> #PriorityWeightPartition=
>> #PriorityWeightQOS=
>> #
>> #
>> # LOGGING AND ACCOUNTING
>> #AccountingStorageEnforce=0
>> #AccountingStorageHost=
>> #AccountingStorageLoc=
>> #AccountingStoragePass=
>> #AccountingStoragePort=
>> AccountingStorageType=accounting_storage/none
>> #AccountingStorageUser=
>> AccountingStoreJobComment=YES
>> ClusterName=cluster
>> #DebugFlags=
>> #JobCompHost=
>> #JobCompLoc=
>> #JobCompPass=
>> #JobCompPort=
>> JobCompType=jobcomp/none
>> #JobCompUser=
>> #JobContainerType=job_container/none
>> JobAcctGatherFrequency=30
>> JobAcctGatherType=jobacct_gather/none
>> SlurmctldDebug=9
>> SlurmctldLogFile=/var/log/slurmctld.log
>> SlurmdDebug=9
>> SlurmdLogFile=/var/log/slurmd.log
>> #SlurmSchedLogFile=
>> #SlurmSchedLogLevel=
>> #
>> #
>> # POWER SAVE SUPPORT FOR IDLE NODES (optional)
>> #SuspendProgram=
>> #ResumeProgram=
>> #SuspendTimeout=
>> #ResumeTimeout=
>> #ResumeRate=
>> #SuspendExcNodes=
>> #SuspendExcParts=
>> #SuspendRate=
>> #SuspendTime=
>> #
>> #
>> # COMPUTE NODES
>> NodeName=rsg[4-7] State=UNKNOWN CPUs=24 Sockets=2 CoresPerSocket=6 ThreadsPerCore=2
>> NodeName=rsg[12-15] State=UNKNOWN CPUs=24 Sockets=2 CoresPerSocket=6 ThreadsPerCore=2
>> NodeName=rsg[16-31] State=UNKNOWN CPUs=32 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2
>>
>> On 05/07/2015 05:59 PM, Moe Jette wrote:
>>
>> It looks like you didn't install the RPM with Slurm plugins.
>>
>> Quoting David Lin <[email protected]>:
>>
>> Hello, I am having some issues with the select/cons_res mode of Slurm. When I try to execute a job such as srun -N 2 -n 2 hostname, I get this:
>>
>> $ srun -N 2 -n 2 -q RHEL6 hostname
>> srun: error: slurm_receive_msg: Zero Bytes were transmitted or received
>> srun: error: Unable to allocate resources: Zero Bytes were transmitted or received
>>
>> and in the slurmctld log, I see this:
>>
>> [2015-05-07T16:52:43.264] error: we don't have select plugin type 102
>> [2015-05-07T16:52:43.264] error: select_g_select_jobinfo_unpack: unpack error
>> [2015-05-07T16:52:43.264] error: Malformed RPC of type REQUEST_RESOURCE_ALLOCATION(4001) received
>> [2015-05-07T16:52:43.264] error: slurm_receive_msg: Header lengths are longer than data received
>> [2015-05-07T16:52:43.274] error: slurm_receive_msg: Header lengths are longer than data received
>>
>> All of the nodes as well as the controller running slurmctld have the exact same slurm.conf, and I've included the relevant section below.
>> # SCHEDULING
>> #DefMemPerCPU=0
>> FastSchedule=0
>> #MaxMemPerCPU=0
>> #SchedulerRootFilter=1
>> #SchedulerTimeSlice=30
>> SchedulerType=sched/backfill
>> SchedulerPort=7321
>> SelectType=select/cons_res
>> SelectTypeParameters=CR_Core_Memory
>>
>> Is there some configuration I'm missing?
>>
>> Thank you!
>> David
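The failure mode running through this whole thread can be sketched in a few lines: the loader derives the shared-object name from the SelectType value (select/cons_res becomes select_cons_res.so) and looks for it under PluginDir, so those two conf lines are the ones that have to agree with the installed files. The conf fragment below is copied from the thread; the sed/tr pipeline is only a diagnostic sketch of that naming convention, not slurmd's actual loader.

```shell
# The two relevant lines from David's slurm.conf.
conf=$(mktemp)
cat > "$conf" <<'EOF'
PluginDir=/usr/lib64/slurm
SelectType=select/cons_res
EOF

seltype=$(sed -n 's/^SelectType=//p' "$conf")
plugdir=$(sed -n 's/^PluginDir=//p' "$conf")
# Plugin naming convention: replace '/' with '_' and append '.so'.
expected="$plugdir/$(printf '%s' "$seltype" | tr '/' '_').so"
echo "slurmd will look for: $expected"
rm -f "$conf"
```

If that path does not exist on a node, slurmd fails exactly as in David's log; if it exists on some nodes but not others, the controller and clients disagree on the plugin, which matches the "we don't have select plugin type 102" unpack errors.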
