Hi Danny,
No, that doesn't work:

starting slurmd: slurmd: error: Couldn't find the specified plugin name for select/cons_res looking at all files
slurmd: error: cannot find select plugin for select/cons_res
slurmd: fatal: Can't find plugin for select/cons_res
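A quick way to check where the plugin file actually lives relative to PluginDir (a diagnostic sketch using the two paths mentioned in this thread; slurmd can only load select/cons_res if select_cons_res.so sits directly in the configured PluginDir):

```shell
# Diagnostic sketch: the two candidate directories below are the ones
# mentioned in this thread; adjust for your install.
for dir in /usr/lib64 /usr/lib64/slurm; do
    f="$dir/select_cons_res.so"
    if [ -f "$f" ]; then
        echo "present: $f"
    else
        echo "absent:  $f"
    fi
done
```

If the file is present in only one of the two directories, PluginDir has to point at that directory for the plugin to be found.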

David

On 05/07/2015 07:39 PM, Danny Auble wrote:
What happens if you set

PluginDir=/usr/lib64



On May 7, 2015 6:10:19 PM PDT, David Lin <[email protected]> wrote:

    Hi Moe,
    I do have the Slurm plugins installed, and I do see the file
    /usr/lib64/select_cons_res.so
    my slurm.conf also has PluginDir=/usr/lib64/slurm
    I've pasted my full slurm.conf below just in case.

    Thanks!
    David


    # slurm.conf file generated by configurator.html.
    # Put this file on all nodes of your cluster.
    # See the slurm.conf man page for more information.
    #
    ControlMachine=rsg-master
    ControlAddr=171.64.74.213
    #BackupController=
    #BackupAddr=
    #
    AuthType=auth/munge
    CacheGroups=0
    #CheckpointType=checkpoint/none
    CryptoType=crypto/munge
    #DisableRootJobs=NO
    #EnforcePartLimits=NO
    #Epilog=
    #EpilogSlurmctld=
    #FirstJobId=1
    #MaxJobId=999999
    #GresTypes=
    #GroupUpdateForce=0
    #GroupUpdateTime=600
    #JobCheckpointDir=/var/slurm/checkpoint
    #JobCredentialPrivateKey=
    #JobCredentialPublicCertificate=
    #JobFileAppend=0
    #JobRequeue=1
    #JobSubmitPlugins=1
    #KillOnBadExit=0
    #LaunchType=launch/slurm
    #Licenses=foo*4,bar
    #MailProg=/bin/mail
    #MaxJobCount=5000
    #MaxStepCount=40000
    #MaxTasksPerNode=128
    MpiDefault=none
    #MpiParams=ports=#-#
    PluginDir=/usr/lib64/slurm
    #PlugStackConfig=
    #PrivateData=jobs
    ProctrackType=proctrack/pgid
    #Prolog=
    #PrologFlags=
    #PrologSlurmctld=
    #PropagatePrioProcess=0
    #PropagateResourceLimits=
    #PropagateResourceLimitsExcept=
    #RebootProgram=
    ReturnToService=2
    #SallocDefaultCommand=
    SlurmctldPidFile=/var/run/slurmctld.pid
    SlurmctldPort=6817
    SlurmdPidFile=/var/run/slurmd.pid
    SlurmdPort=6818
    SlurmdSpoolDir=/var/spool/slurmd
    SlurmUser=slurm
    #SlurmdUser=root
    #SrunEpilog=
    #SrunProlog=
    StateSaveLocation=/var/spool
    SwitchType=switch/none
    #TaskEpilog=
    TaskPlugin=task/none
    #TaskPluginParam=
    #TaskProlog=
    #TopologyPlugin=topology/tree
    #TmpFS=/tmp
    #TrackWCKey=no
    #TreeWidth=
    #UnkillableStepProgram=
    #UsePAM=0
    #
    #
    # TIMERS
    #BatchStartTimeout=10
    #CompleteWait=0
    #EpilogMsgTime=2000
    #GetEnvTimeout=2
    #HealthCheckInterval=0
    #HealthCheckProgram=
    InactiveLimit=0
    KillWait=30
    #MessageTimeout=10
    #ResvOverRun=0
    MinJobAge=300
    #OverTimeLimit=0
    SlurmctldTimeout=120
    SlurmdTimeout=300
    #UnkillableStepTimeout=60
    #VSizeFactor=0
    Waittime=0
    #
    #
    # SCHEDULING
    #DefMemPerCPU=0
    FastSchedule=0
    #MaxMemPerCPU=0
    #SchedulerRootFilter=1
    #SchedulerTimeSlice=30
    SchedulerType=sched/backfill
    SchedulerPort=7321
    SelectType=select/cons_res
    SelectTypeParameters=CR_Core_Memory
    #
    #
    # JOB PRIORITY
    #PriorityFlags=
    #PriorityType=priority/basic
    #PriorityDecayHalfLife=
    #PriorityCalcPeriod=
    #PriorityFavorSmall=
    #PriorityMaxAge=
    #PriorityUsageResetPeriod=
    #PriorityWeightAge=
    #PriorityWeightFairshare=
    #PriorityWeightJobSize=
    #PriorityWeightPartition=
    #PriorityWeightQOS=
    #
    #
    # LOGGING AND ACCOUNTING
    #AccountingStorageEnforce=0
    #AccountingStorageHost=
    #AccountingStorageLoc=
    #AccountingStoragePass=
    #AccountingStoragePort=
    AccountingStorageType=accounting_storage/none
    #AccountingStorageUser=
    AccountingStoreJobComment=YES
    ClusterName=cluster
    #DebugFlags=
    #JobCompHost=
    #JobCompLoc=
    #JobCompPass=
    #JobCompPort=
    JobCompType=jobcomp/none
    #JobCompUser=
    #JobContainerType=job_container/none
    JobAcctGatherFrequency=30
    JobAcctGatherType=jobacct_gather/none
    SlurmctldDebug=9
    SlurmctldLogFile=/var/log/slurmctld.log
    SlurmdDebug=9
    SlurmdLogFile=/var/log/slurmd.log
    #SlurmSchedLogFile=
    #SlurmSchedLogLevel=
    #
    #
    # POWER SAVE SUPPORT FOR IDLE NODES (optional)
    #SuspendProgram=
    #ResumeProgram=
    #SuspendTimeout=
    #ResumeTimeout=
    #ResumeRate=
    #SuspendExcNodes=
    #SuspendExcParts=
    #SuspendRate=
    #SuspendTime=
    #
    #
    # COMPUTE NODES
    NodeName=rsg[4-7]   State=UNKNOWN CPUs=24 Sockets=2 CoresPerSocket=6 ThreadsPerCore=2
    NodeName=rsg[12-15] State=UNKNOWN CPUs=24 Sockets=2 CoresPerSocket=6 ThreadsPerCore=2
    NodeName=rsg[16-31] State=UNKNOWN CPUs=32 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2





    On 05/07/2015 05:59 PM, Moe Jette wrote:

        It looks like you didn't install the RPM with Slurm plugins.
        Quoting David Lin <[email protected]>:

            Hello, I am having some issues with the select/cons_res mode
            of Slurm. When I try to execute a job such as srun -N 2 -n 2
            hostname, I get this:

            $ srun -N 2 -n 2 -q RHEL6 hostname
            srun: error: slurm_receive_msg: Zero Bytes were transmitted or received
            srun: error: Unable to allocate resources: Zero Bytes were transmitted or received

            and in the slurmctld log, I see this:

            [2015-05-07T16:52:43.264] error: we don't have select plugin type 102
            [2015-05-07T16:52:43.264] error: select_g_select_jobinfo_unpack: unpack error
            [2015-05-07T16:52:43.264] error: Malformed RPC of type REQUEST_RESOURCE_ALLOCATION(4001) received
            [2015-05-07T16:52:43.264] error: slurm_receive_msg: Header lengths are longer than data received
            [2015-05-07T16:52:43.274] error: slurm_receive_msg: Header lengths are longer than data received

            All of the nodes, as well as the controller running slurmctld,
            have the exact same slurm.conf, and I've included the relevant
            section below.

            # SCHEDULING
            #DefMemPerCPU=0
            FastSchedule=0
            #MaxMemPerCPU=0
            #SchedulerRootFilter=1
            #SchedulerTimeSlice=30
            SchedulerType=sched/backfill
            SchedulerPort=7321
            SelectType=select/cons_res
            SelectTypeParameters=CR_Core_Memory

            Is there some configuration I'm missing? Thank you!
            David

