Alan, are you using the port range option on SlurmctldPort (e.g.,
SlurmctldPort=6817-6818) in slurm.conf
(http://slurm.schedmd.com/slurm.conf.html)?
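For context, the option I mean looks like this in slurm.conf (the exact port numbers below are just an illustration; 6817 is the usual single-port default):

```
# slurm.conf fragment (illustrative port values)
# Listing a range instead of a single port lets slurmctld open
# several listening sockets, spreading the load of incoming
# sbatch/srun RPCs during large submission bursts.
SlurmctldPort=6817-6820
```

Note that every port in the range needs to be reachable from the compute and submit hosts, firewalls included.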


On Wed, Jun 12, 2013 at 9:55 AM, Alan V. Cowles <[email protected]> wrote:

>
> Under the Data Objects section on the following page
> http://slurm.schedmd.com/selectplugins.html we find the statement:
>
> "Slurmctld is a multi-threaded program with independent read and write
> locks on each data structure type."
>
> Which is what led me to believe it's there, and that we perhaps missed a
> configuration option.
>
> AC
>
>
>
> On 06/12/2013 12:43 PM, Paul Edmon wrote:
> > I'm also interested in this, as I've only ever seen one slurmctld
> > process, and only at 100%.  It would be good if making slurm
> > multithreaded were on the roadmap.  I know we will have 100,000s of jobs
> > in flight for our config, so it would be good to have something that can
> > take that load.
> >
> > -Paul Edmon-
> >
> > On 06/12/2013 12:30 PM, Alan V. Cowles wrote:
> >> Hey Guys,
> >>
> >> I've seen a few references to the slurmctld as a multithreaded process
> >> but it doesn't seem that way.
> >>
> >> We had a user submit 18000 jobs to our cluster (512 slots); it shows
> >> all 512 slots fully loaded, shows those jobs running, and shows about
> >> 9800 currently pending, but her submissions started throwing errors
> >> around job 16500.
> >>
> >> Submitted batch job 16589
> >> Submitted batch job 16590
> >> Submitted batch job 16591
> >> sbatch: error: Slurm temporarily unable to accept job, sleeping and
> >> retrying.
> >> sbatch: error: Batch job submission failed: Resource temporarily
> >> unavailable.
> >>
> >> The thing we noticed at the time on our master host is that slurmctld
> >> was regularly pegged at 100% on one CPU and had paged 16GB of virtual
> >> memory, while all the other CPUs were completely idle.
> >>
> >> We wondered whether the control daemon pegging out is what led to the
> >> submission failures, since we haven't found any limits set anywhere for
> >> any specific job or user, and whether we perhaps missed a configure
> >> option when we did our original install.
> >>
> >> Any thoughts or ideas? We're running Slurm 2.5.4 on RHEL6.
> >>
> >> AC
>
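To actually confirm whether the running slurmctld has multiple threads, one quick check (assuming a Linux master host with pidof available) is the NLWP column from ps:

```shell
# Print the number of kernel threads (NLWP) for a process.
# On the master host you would point this at slurmctld:
#   ps -o nlwp= -p "$(pidof slurmctld)"
# Here we use the current shell's PID ($$) so the line is runnable
# anywhere; a multi-threaded daemon will report a value well above 1.
ps -o nlwp= -p "$$"
```

A slurmctld pegged at 100% on one core is still consistent with a multi-threaded daemon in which one hot thread dominates, so a thread count above 1 wouldn't by itself rule out the bottleneck you're seeing.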
