No, we have it set exclusively to 6817, and SlurmdPort two lines later to 6818.
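
For reference, the relevant lines in our slurm.conf are just the two
single-port settings (a minimal excerpt; surrounding settings omitted):

    SlurmctldPort=6817
    SlurmdPort=6818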

Is it isolating work to processors based on the incoming port?

AC

On 06/12/2013 01:00 PM, Lyn Gerner wrote:
Alan, are you using the port range option on SlurmctldPort (e.g., SlurmctldPort=6817-6818) in slurm.conf <http://slurm.schedmd.com/slurm.conf.html>?


On Wed, Jun 12, 2013 at 9:55 AM, Alan V. Cowles <[email protected]> wrote:


    Under the Data Objects section on the following page
    http://slurm.schedmd.com/selectplugins.html we find the statement:

    "Slurmctld is a multi-threaded program with independent read and write
    locks on each data structure type."

    Which is what led me to believe it's there, and that we perhaps
    missed a configuration option.
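
    For what it's worth, here is my mental model of that statement: one
    independent POSIX read/write lock per data structure type. (A
    hypothetical C sketch for illustration; the lock names and helpers
    are mine, not Slurm's actual source.)

        /* One independent read/write lock per data structure type, so a
         * thread reading node state need not block a thread writing job
         * state. Hypothetical sketch only, not Slurm's actual source. */
        #include <pthread.h>

        typedef enum { CONF_LOCK, JOB_LOCK, NODE_LOCK, PART_LOCK,
                       LOCK_COUNT } lock_type_t;

        static pthread_rwlock_t locks[LOCK_COUNT] = {
            PTHREAD_RWLOCK_INITIALIZER, PTHREAD_RWLOCK_INITIALIZER,
            PTHREAD_RWLOCK_INITIALIZER, PTHREAD_RWLOCK_INITIALIZER
        };

        /* Shared lock: many reader threads may hold it concurrently. */
        static void lock_read(lock_type_t t)
        {
            pthread_rwlock_rdlock(&locks[t]);
        }

        /* Exclusive lock: waits for all readers and writers to release. */
        static void lock_write(lock_type_t t)
        {
            pthread_rwlock_wrlock(&locks[t]);
        }

        static void unlock_type(lock_type_t t)
        {
            pthread_rwlock_unlock(&locks[t]);
        }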

    AC



    On 06/12/2013 12:43 PM, Paul Edmon wrote:
    > I'm also interested in this as I've only ever seen one slurmctld and
    > only at 100%.  It would be good if making slurm multithreaded was on
    > the path for the future.  I know we will have 100,000s of jobs in
    > flight for our config so it would be good to have something that can
    > take that load.
    >
    > -Paul Edmon-
    >
    > On 06/12/2013 12:30 PM, Alan V. Cowles wrote:
    >> Hey Guys,
    >>
    >> I've seen a few references to the slurmctld as a multithreaded
    >> process but it doesn't seem that way.
    >>
    >> We had a user submit 18000 jobs to our cluster (512 slots); it
    >> shows all 512 slots fully loaded with those jobs running and about
    >> 9800 jobs currently pending, but her submission started throwing
    >> errors around job 16500.
    >>
    >> Submitted batch job 16589
    >> Submitted batch job 16590
    >> Submitted batch job 16591
    >> sbatch: error: Slurm temporarily unable to accept job, sleeping and
    >> retrying.
    >> sbatch: error: Batch job submission failed: Resource temporarily
    >> unavailable.
    >>
    >> The thing we noticed at the time on our master host is that
    >> slurmctld was pegged at 100% on one CPU quite regularly and had
    >> paged 16GB of virtual memory, while all other CPUs were completely
    >> idle.
    >>
    >> We wondered whether the control daemon pegging out is what led to
    >> the submission failures, as we haven't found any limits set
    >> anywhere for any specific job or user, and whether perhaps we
    >> missed a configuration option when we did our original install.
    >>
    >> Any thoughts or ideas? We're running Slurm 2.5.4 on RHEL6.
    >>
    >> AC
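
One configuration limit that might explain the "Resource temporarily
unavailable" errors quoted above (an assumption on my part, not confirmed
anywhere in this thread): slurm.conf's MaxJobCount, which caps the number
of jobs slurmctld keeps in memory at once and defaults to 10000, close to
the roughly 10300 jobs (512 running plus about 9800 pending) in the queue
when submissions began to fail. A hypothetical change would be:

    # slurm.conf -- raise the cap on jobs held in slurmctld's memory.
    # The default is 10000; the value below is purely illustrative and
    # should be sized to the expected number of jobs in flight.
    MaxJobCount=50000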


