Thanks for the suggestion, Fritz. I did not go much further (at least for this past weekend), as I believe most people would be satisfied with around 150 jobs per second (if they have better hardware than my laptop!).
I read that before in the Oracle white paper (I forget whether it was DanT or someone else who mentioned that DRMAA does not need to go through the security & session checking when submitting more than one job...).

Oh, and another thing I tried was -b y vs. -b n, and on Linux it also did not make much difference with the local disk.

Rayson

On Mon, Apr 11, 2011 at 12:28 PM, Fritz Ferstl <[email protected]> wrote:
> Another thing you might want to try is submitting jobs via DRMAA instead of
> via qsub. You'll get roughly 900 jobs/sec submitted with a single DRMAA
> client, so this will load the system more than when using qsub.
>
> Cheers,
>
> Fritz
>
>
> On Fri, Apr 8, 2011 at 10:44 AM, Chris Dagdigian <[email protected]> wrote:
>
> - Job submission rate and job "churn". I think DanT said this in a blog post
> years ago, but if you expect to need 200+ qsubs per second then you are going
> to need Berkeley spooling.
>
> I did some classic spooling benchmarks during the weekend:
>
> submitting 1000 jobs with 1 qsub session - 19 sec
> submitting 2000 jobs with 2 qsub sessions - 20 sec
> submitting 4000 jobs with 4 qsub sessions - 31 sec
>
> (i.e. each session submits 1000 jobs, and the qsub sessions run in parallel.)
>
> I then modified the classic spooling code so that qmaster does not
> write to disk when jobs are submitted (which is fine as an
> experiment: as long as the qmaster is not restarted, no jobs are
> lost), and got identical results.
>
> My conclusion is that Linux caches most of the disk writes, so
> I/O performance does not affect qsub performance much. However,
> even with a journaling filesystem providing consistency, there is a small
> chance that some jobs can be lost if the qmaster crashes during the
> write operations. On the other hand, hardware resource contention
> and/or LOCK_GLOBAL contention might be causing the slowdown in the 4
> parallel qsub case.
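For anyone who wants to repeat the parallel-submission measurement, here is a minimal sketch of a benchmark driver along the lines described above. It is an assumption-laden illustration, not the script actually used in the thread: the submit command is parameterized, `qsub` is assumed to be on PATH, and the `/bin/true` job is a hypothetical stand-in.

```python
# Sketch of the parallel qsub benchmark described in the thread.
# Assumptions: the submit command is supplied by the caller; the
# qsub invocation shown in the usage comment is hypothetical.
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor


def submit_session(cmd, jobs):
    """One 'qsub session': invoke the submit command `jobs` times in a loop."""
    for _ in range(jobs):
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)


def benchmark(cmd, sessions, jobs_per_session):
    """Run `sessions` submission loops in parallel; return elapsed seconds."""
    start = time.time()
    with ThreadPoolExecutor(max_workers=sessions) as pool:
        futures = [pool.submit(submit_session, cmd, jobs_per_session)
                   for _ in range(sessions)]
        for f in futures:
            f.result()  # re-raise if any submission failed
    return time.time() - start


# Example usage (on a host with qsub on PATH; job command is illustrative):
#   for n in (1, 2, 4):
#       print(n, "sessions:", benchmark(["qsub", "-b", "y", "/bin/true"], n, 1000))
```

Because the command is a parameter, the same driver can time anything (e.g. a DRMAA wrapper script) for an apples-to-apples comparison.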
> And even when the number of worker_threads &
> listener_threads is increased, the results were the same.
> http://gridscheduler.sourceforge.net/htmlman/htmlman5/bootstrap.html
>
> Hardware: local disk, Thinkpad T510 - 64-bit Linux, 4GB memory, 2
> cores/4 threads, 2.67GHz
>
> I have not benchmarked Berkeley DB spooling, but I believe I will need
> server hardware to get more than 200 jobs per second of qsub
> performance.
>
> Rayson
>
>
> Same goes for clusters that experience huge
> amounts of job flow or state changes. I have less experience here, but in
> these sorts of systems I think binary spooling makes a real difference.
>
> My $.02 of course!
>
> -chris
>
>
> Mark Suhovecky wrote:
>
> OK, I got SGE 6.2u5p1 to build with version 4.4.20 of Berkeley DB,
> and proceeded to try to install Grid Engine on the master host
> via inst_sge.
>
> At some point it tells me that I should install Berkeley DB
> on the master host first, so I do "inst_sge -db", which hangs when it
> tries to start the DB for the first time. Then, because some
> days I'm not terribly bright, I decide to see if the DB will start
> at machine reboot. Well, now it hangs when sgedb start
> runs from init. Still gotta fix that.
>
> So let me back up for a minute and ask about Berkeley DB...
>
> We currently run sge_6.2u1 on 1250 or so hosts with "classic"
> flat-file spooling, and it's pretty stable.
> When we move to SGE 6.2u5p1, we'd like
> to use the Arco reporting package, and I'm blithely assuming
> that I need a DB with an SQL interface to accommodate this.
>
> Is that true? Can we use Arco w/o DB spooling?
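For reference, the worker_threads and listener_threads knobs mentioned above live in the qmaster bootstrap file described on the bootstrap(5) page linked in the thread. A sketch of the relevant lines, with purely illustrative values (the exact set of parameters depends on the Grid Engine version):

```
# Excerpt from $SGE_ROOT/$SGE_CELL/common/bootstrap (values illustrative)
spooling_method      classic
listener_threads     4
worker_threads       4
scheduler_threads    1
```

As the benchmark shows, raising these counts does not help when the bottleneck is elsewhere (lock or hardware contention rather than thread starvation).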
>
> --
> Fritz Ferstl | CTO and Business Development, EMEA
> Univa Corporation <http://www.univa.com/> | The Data Center Optimization Company
> E-Mail: [email protected] | Phone: +49.9471.200.195 | Mobile: +49.170.819.7390
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
