Another thing you might want to try is submitting jobs via DRMAA
instead of via qsub. You'll get roughly 900 jobs/sec submitted
with a single DRMAA client, so this will load the system more than
when using qsub.

Cheers,
Fritz

On Fri, Apr 8, 2011 at 10:44 AM, Chris Dagdigian <[email protected]> wrote:

- Job submission rate and job "churn". I think DanT said this in a blog post years ago, but if you expect to need 200+ qsubs per second then you are going to need Berkeley spooling.

I did some classic spooling benchmarks during the weekend:

submitting 1000 jobs with 1 qsub session   - 19 sec
submitting 2000 jobs with 2 qsub sessions  - 20 sec
submitting 4000 jobs with 4 qsub sessions  - 31 sec

(i.e., each session submits 1000 jobs, and the qsub sessions run in parallel.)

I then modified the classic spooling code so that qmaster does not write to the disk when jobs are submitted (which is fine as an experiment: as long as the qmaster is not restarted, no jobs are lost), and got identical results. My conclusion is that Linux caches most of the disk writes, so I/O performance does not affect qsub throughput much. However, even with a journaling filesystem providing consistency, there is a small chance that some jobs are lost if the qmaster crashes during the write operations.

On the other hand, hardware resource contention and/or LOCK_GLOBAL contention might be causing the slowdown in the 4-parallel-qsub case. Even when the number of worker_threads and listener_threads was increased, the results were the same:
http://gridscheduler.sourceforge.net/htmlman/htmlman5/bootstrap.html

Hardware: local disk, Thinkpad T510 - 64-bit Linux, 4 GB memory, 2 cores / 4 threads, 2.67 GHz

I have not benchmarked Berkeley DB spooling, but I believe I will need server hardware to get beyond 200 jobs per second of qsub throughput.

Rayson

Same goes for clusters that experience huge amounts of job flow or state changes. I have less experience here, but in these sorts of systems I think binary spooling makes a real difference.

My $.02 of course!
-chris

Mark Suhovecky wrote:

OK, I got SGE 6.2u5p1 to build with version 4.4.20 of Berkeley DB, and proceeded to try to install Grid Engine on the master host via inst_sge. At some point it tells me that I should install Berkeley DB on the master host first, so I ran "inst_sge -db", which hangs when it tries to start the DB for the first time. Then, because some days I'm not terribly bright, I decided to see whether the DB would start at machine reboot. Well, now it hangs when sgedb start runs from init. Still gotta fix that.

So let me back up for a minute and ask about Berkeley DB... We currently run SGE 6.2u1 on 1250 or so hosts with "classic" flat-file spooling, and it's pretty stable. When we move to SGE 6.2u5p1, we'd like to use the ARCo reporting package, and I'm blithely assuming that I need a DB with an SQL interface to accommodate this. Is that true? Can we use ARCo without DB spooling?
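As a quick sanity check on the benchmark numbers quoted above, the aggregate submission rates work out as follows (a small throwaway script, not part of the thread; the `throughput` helper is made up for illustration):

```python
# Back-of-the-envelope throughput check for the classic-spooling
# benchmark numbers quoted in the thread (jobs submitted / elapsed seconds).

def throughput(jobs, seconds):
    """Aggregate submission rate in jobs per second."""
    return jobs / seconds

# 1 qsub session:  1000 jobs in 19 s
# 2 qsub sessions: 2000 jobs in 20 s
# 4 qsub sessions: 4000 jobs in 31 s
for jobs, secs in [(1000, 19), (2000, 20), (4000, 31)]:
    print(f"{jobs} jobs / {secs} s = {throughput(jobs, secs):.1f} jobs/sec")
```

Going from 1 to 2 sessions scales almost linearly (~53 to 100 jobs/sec), while 2 to 4 sessions does not (~100 to ~129 jobs/sec), which is consistent with the contention hypothesis above.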
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
Fritz Ferstl | CTO and Business Development, EMEA
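For reference, the worker_threads/listener_threads and spooling settings discussed in this thread live in the qmaster's bootstrap(5) file linked above. A hypothetical fragment for Berkeley DB spooling might look like the following (the path and thread counts are made-up illustration values, not recommendations; for classic flat-file spooling the method would be `classic` with `libspoolc` instead):

```
spooling_method       berkeleydb
spooling_lib          libspoolb
spooling_params       /var/spool/sge/spooldb
listener_threads      2
worker_threads        2
```

Changing these values requires a qmaster restart, and switching spooling methods on an existing cell is an install-time decision rather than a simple edit.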