By this time next year, SLURM should be running on systems much larger than those listed below, including systems with slurmd daemons on each compute node. The scalability issues we see are mostly related to the rate of job submissions rather than to system size, and we're working on that now.
Moe

________________________________________
From: [email protected] [[email protected]] On Behalf Of Rayson Ho [[email protected]]
Sent: Monday, March 21, 2011 8:56 AM
To: [email protected]
Subject: Re: [slurm-dev] design limits for 2.2? SLURM scalability

It seems that SLURM daemons will not be running on each node on Sequoia: slurmd will run on the I/O nodes but not the compute nodes, if I read this presentation correctly:

Multi-Petascale Computing on the Sequoia Architecture:
https://hpcrd.lbl.gov/scidac09/talks/Seager-Sequoia4SciDACv1.pdf

Nevertheless, the installations Jette listed are really massive! The largest known Grid Engine installation is Sun's Ranger at TACC, which has only 62,976 processor cores in 3,936 nodes. As the developer and maintainer of a Grid Engine fork (Oracle ended development of the open-source SGE code base in 2010, so we forked the code and started the purely open-source project called "Open Grid Scheduler"), I don't think Grid Engine will be able to scale to those numbers in the near, or even not-so-near, future! :-(

Rayson


On Sat, Nov 20, 2010 at 1:49 PM, Jette, Moe <[email protected]> wrote:
> I believe that SLURM can manage any machine that HP can build and a customer can pay for ;-)
>
> We have not seen any scaling issues, and some of the machines running SLURM today include:
> Tianhe-1A in China with 186,368 cores,
> Tera-100 at CEA with 138,368 cores, and
> a BlueGene/L at LLNL with 212,992 cores.
>
> We plan to run SLURM on LLNL's 20 PFlop BlueGene/Q system next year with 1.6 million processors
> (http://www-304.ibm.com/jct03004c/press/us/en/pressrelease/26599.wss), and
> I am not expecting any scalability problems, although task launch on the
> BlueGene systems differs from that on typical Linux systems.
>
> At the other end of the spectrum, Intel is using SLURM on their 48-core "cluster on a chip"
> (http://www.hpcwire.com/features/Intel-Unveils-48-Core-Research-Chip-78378487.html).
> SLURM's architecture, with its multitude of plugin options, gives it tremendous flexibility.
>
> Moe
>
> ________________________________________
> From: [email protected] [[email protected]] On Behalf Of Andy Riebs [[email protected]]
> Sent: Friday, November 19, 2010 8:14 AM
> To: [email protected]
> Subject: [slurm-dev] design limits for 2.2?
>
> How large a cluster should one expect to be able to support with Slurm 2.2?
> (One suspects that the number is getting rather large!)
>
> Thanks!
> Andy
>
> --
> Andy Riebs
> Hewlett-Packard Company
> SCI Solutions
> +1-786-263-9743
> My opinions are not necessarily those of HP
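
[Editor's note: for readers unfamiliar with the plugin options mentioned above, they are selected in slurm.conf. The fragment below is only an illustrative sketch with example values, not a recommended configuration; the set of valid plugin names varies by SLURM version.]

```
# Illustrative slurm.conf excerpt: each subsystem is a pluggable choice.
AuthType=auth/munge                 # authentication plugin
SchedulerType=sched/backfill        # scheduling plugin (vs. sched/builtin)
SelectType=select/cons_res          # resource-selection plugin
SelectTypeParameters=CR_Core        # allocate by core rather than whole node
TopologyPlugin=topology/tree        # network-topology-aware placement
TaskPlugin=task/affinity            # task launch / CPU-binding plugin
SwitchType=switch/none              # interconnect plugin
ProctrackType=proctrack/linuxproc   # process-tracking plugin
```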
