I passed on the question to our resident authority in this area, and here was her reply...
The largest job I could find was a 1K node (16K processes) hello world MPI job on Cab. It took 51 seconds to start the job, print out 16K hello world messages and exit normally.

For what it's worth, I ran the same hello world job on Sequoia on 122880 nodes with 16 processes per node for a job with 1966080 total MPI processes. This job took 322 sec and produced an output file of about 230 MB (a whole lot of hello world messages).

***************************************************
* Sheila A. Faulkner                              *
* Software Development Group                      *
* Livermore Computing                             *
* Lawrence Livermore National Laboratory          *
* P.O. Box 808 L-557                              *
* Livermore, CA 94551                             *
* Phone: (925)-424-8471                           *
* Email: [email protected]                        *
***************************************************

> -----Original Message-----
> From: Andy Riebs [mailto:[email protected]]
> Sent: Thursday, February 28, 2013 12:05 PM
> To: slurm-dev
> Subject: [slurm-dev] job start-up times?
>
>
> I've been asked (and have no good answer) about how long it takes to
> start a SLURM job on an x86_64 cluster that allocates, say, 2000 16-core
> nodes (32k processes). My googling skills have failed me on this query;
> does anyone have anecdotal evidence or pointers for the time SLURM
> requires to start huge jobs?
>
> Thanks in advance!
> Andy
>
> --
> Andy Riebs
> Hewlett-Packard Company
> High Performance Computing
> +1-786-263-9743
> My opinions are not necessarily those of HP
