Worth giving it a try, I'd say.

On Jun 12, 2013, at 10:54 AM, Alan V. Cowles <[email protected]> wrote:
> Our machine running the daemon is actually a beefy machine we acquired for
> another purpose that later fell through, so we decided to use it here; it has
> 16 physical cores. If we set a port range of, say, 8 ports (6817-6824) and made
> SlurmdPort 6825, would that make a significant difference?
>
> AC
>
> On 06/12/2013 01:52 PM, Ralph Castain wrote:
>> Not isolating, but blocking. If you have more ports, I believe it will add
>> more threads to listen on those ports. Each RPC received blocks until it
>> completes, so having more ports should improve throughput.
>>
>>
>> On Jun 12, 2013, at 10:03 AM, "Alan V. Cowles" <[email protected]> wrote:
>>
>>> No, we have it set exclusively to 6817, and SlurmdPort two lines later to 6818.
>>>
>>> Is it isolating to processors based on incoming port?
>>>
>>> AC
>>>
>>> On 06/12/2013 01:00 PM, Lyn Gerner wrote:
>>>> Alan, are you using the port range option on SlurmctldPort (e.g.,
>>>> SlurmctldPort=6817-6818) in slurm.conf?
>>>>
>>>>
>>>> On Wed, Jun 12, 2013 at 9:55 AM, Alan V. Cowles <[email protected]>
>>>> wrote:
>>>>
>>>> Under the Data Objects section on the following page
>>>> http://slurm.schedmd.com/selectplugins.html we find the statement:
>>>>
>>>> "Slurmctld is a multi-threaded program with independent read and write
>>>> locks on each data structure type."
>>>>
>>>> Which is what led me to believe it's there, and that we perhaps missed a
>>>> configuration option.
>>>>
>>>> AC
>>>>
>>>>
>>>> On 06/12/2013 12:43 PM, Paul Edmon wrote:
>>>> > I'm also interested in this, as I've only ever seen one slurmctld and
>>>> > only at 100%. It would be good if making Slurm multithreaded was on the
>>>> > path for the future. I know we will have 100,000s of jobs in flight
>>>> > for our config, so it would be good to have something that can take that
>>>> > load.
>>>> >
>>>> > -Paul Edmon-
>>>> >
>>>> > On 06/12/2013 12:30 PM, Alan V. Cowles wrote:
>>>> >> Hey Guys,
>>>> >>
>>>> >> I've seen a few references to slurmctld as a multithreaded process,
>>>> >> but it doesn't seem that way.
>>>> >>
>>>> >> We had a user submit 18000 jobs to our cluster (512 slots). It shows
>>>> >> 512 slots fully loaded, shows those jobs running, and shows about 9800
>>>> >> currently pending, but her submission threw errors around job 16500:
>>>> >>
>>>> >> Submitted batch job 16589
>>>> >> Submitted batch job 16590
>>>> >> Submitted batch job 16591
>>>> >> sbatch: error: Slurm temporarily unable to accept job, sleeping and
>>>> >> retrying.
>>>> >> sbatch: error: Batch job submission failed: Resource temporarily
>>>> >> unavailable.
>>>> >>
>>>> >> The thing we noticed at this time on our master host is that slurmctld
>>>> >> was pegging one CPU at 100% quite regularly and had paged 16GB of
>>>> >> virtual memory, while all other CPUs were completely idle.
>>>> >>
>>>> >> We wondered if the pegging out of the control daemon is what led to the
>>>> >> submission failure, as we haven't found any limits set anywhere for any
>>>> >> specific job or user, and wondered if perhaps we missed a configure
>>>> >> option for this when we did our original install.
>>>> >>
>>>> >> Any thoughts or ideas? We're running Slurm 2.5.4 on RHEL6.
>>>> >>
>>>> >> AC
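For anyone following the thread, the port-range change being discussed would look something like this in slurm.conf. This is a sketch based only on the numbers mentioned above (6817-6824 for the controller, 6825 for slurmd), not a tested configuration; check the slurm.conf man page for your Slurm version before applying it:

```
# slurm.conf excerpt -- illustrative values taken from the thread above.
# Giving slurmctld a range of ports lets it listen on multiple ports;
# since each RPC blocks until it completes, more listener ports may
# improve throughput under heavy submission load.
SlurmctldPort=6817-6824

# Keep slurmd on a port outside the controller's range.
SlurmdPort=6825
```

Note that all compute nodes and client commands must be able to reach the controller on every port in the range, so firewall rules need to allow 6817-6824, and slurmctld must be restarted for the change to take effect.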
