Re: [gridengine users] Berkeley DB (was building RHEL5)

Rayson Ho Mon, 11 Apr 2011 09:40:05 -0700

Thanks for the suggestion Fritz.

I did not go much further (at least for this past weekend) as I believe most
people are satisfied with around 150 jobs per second (if they have better
hardware than my laptop!).


I read that before in the Oracle white paper (I forget whether it was DanT
or someone else who mentioned it): DRMAA does not need to go through the
security & session checking when submitting more than 1 job. Oh, and
another thing I tried was -b y vs. -b n, and on Linux with the local disk it
also did not make much difference.
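
For anyone who wants to try the DRMAA route, here is a minimal sketch of submitting many jobs through one open session and timing it. It assumes the drmaa-python package and a working libdrmaa from the Grid Engine installation; the function names, the /bin/sleep command, and the job count are only illustrative and are not taken from any benchmark in this thread.

```python
import time


def submit_many(session, command, args, count):
    """Submit `count` identical jobs through one already-open DRMAA session.

    Returns (job_ids, elapsed_seconds). The session and the job template are
    reused for every submission, so session setup is paid only once.
    """
    jt = session.createJobTemplate()
    try:
        jt.remoteCommand = command
        jt.args = list(args)
        start = time.perf_counter()
        job_ids = [session.runJob(jt) for _ in range(count)]
        elapsed = time.perf_counter() - start
    finally:
        # Always release the template, even if a submission fails.
        session.deleteJobTemplate(jt)
    return job_ids, elapsed


def benchmark_with_real_session(count=1000):
    """Hypothetical driver: only works on a submit host with libdrmaa."""
    import drmaa  # imported here so the helper above has no hard dependency

    with drmaa.Session() as session:
        job_ids, elapsed = submit_many(session, "/bin/sleep", ["1"], count)
    rate = len(job_ids) / elapsed if elapsed > 0 else float("inf")
    print(f"{len(job_ids)} jobs in {elapsed:.2f} s ({rate:.0f} jobs/s)")
```

Calling benchmark_with_real_session() on a submit host prints the measured rate, which can then be compared with the qsub timings quoted below.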

Rayson



On Mon, Apr 11, 2011 at 12:28 PM, Fritz Ferstl <[email protected]> wrote:

>  Another thing you might want to try is submitting jobs via DRMAA instead of
> via qsub. You'll get roughly 900 jobs / sec submitted with a single DRMAA
> client. So this will load the system more than when using qsub.
>
> Cheers,
>
> Fritz
>
>
>  On Fri, Apr 8, 2011 at 10:44 AM, Chris Dagdigian <[email protected]> wrote:
>
>  - Job submission rate and job "churn". I think DanT said this in a blog post
> years ago but if you expect to need 200+ qsubs per second then you are going
> to need berkeley spooling.
>
>  I did some classic spooling benchmarks during the weekend:
>
> submitting 1000 jobs with 1 qsub session  - 19 sec
> submitting 2000 jobs with 2 qsub sessions - 20 sec
> submitting 4000 jobs with 4 qsub sessions - 31 sec
>
> (ie. each session submits 1000 jobs, and qsub sessions are done in parallel.)
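
The three timings above work out to roughly 53, 100 and 129 jobs per second in aggregate. A small sketch of that arithmetic follows; the timings are copied from the list above, while the function and variable names are only illustrative.

```python
# (parallel qsub sessions, total jobs submitted, wall-clock seconds),
# copied from the classic spooling timings listed above.
RUNS = [
    (1, 1000, 19),
    (2, 2000, 20),
    (4, 4000, 31),
]


def jobs_per_second(total_jobs, seconds):
    """Aggregate submission rate across all sessions."""
    return total_jobs / seconds


def summarize(runs):
    """Return (sessions, aggregate rate, rate per session) for each run."""
    rows = []
    for sessions, total_jobs, seconds in runs:
        aggregate = jobs_per_second(total_jobs, seconds)
        rows.append((sessions, aggregate, aggregate / sessions))
    return rows


if __name__ == "__main__":
    for sessions, aggregate, per_session in summarize(RUNS):
        print(f"{sessions} session(s): {aggregate:.1f} jobs/s total, "
              f"{per_session:.1f} jobs/s per session")
```

The per-session rate stays near 50 jobs/s for 1 and 2 sessions and falls to about 32 jobs/s with 4 sessions, which is the slowdown discussed below.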
>
> I then modified the classic spooling code so that qmaster does not
> write to the disk when jobs are submitted (which is fine as an
> experiment, and as long as the qmaster is not restarted, jobs are not
> lost), and got identical results.
>
> My conclusion is that Linux caches most of the disk writes and thus
> I/O performance would not affect the qsub performance much. However,
> even with a journaling filesystem with consistency, there is a small
> chance that some jobs can be lost when the qmaster crashes during the
> write operations. On the other hand, hardware resource contention
> and/or LOCK_GLOBAL contention might be causing the slowdown for the 4
> parallel qsub case. And even when the number of worker_threads &
> listener_threads is increased, the results were the same.
> http://gridscheduler.sourceforge.net/htmlman/htmlman5/bootstrap.html
>
> Hardware: local disk, Thinkpad T510 - 64-bit Linux, 4GB memory, 2
> cores/4 threads, 2.67GHz
>
> I have not benchmarked Berkeley DB spooling, but I believe I will need
> server hardware to get greater than 200 jobs per second qsub
> performance.
>
> Rayson
>
>
>
>
>  Same goes for clusters that experience huge
> amounts of job flows or state changes. I have less experience here but in
> these sorts of systems I think binary spooling makes a real difference
>
> My $.02 of course!
>
> -chris
>
>
>
>
> Mark Suhovecky wrote:
>
>  OK, I got SGE6.2u5p1 to build with version 4.4.20 of Berkeley DB,
> and proceeded to try and install Grid Engine on the master host
> via inst_sge.
>
>  At some point it tells me that I should install Berkeley DB
> on the master host first, so I do "inst_sge -db", which hangs when it
> tries to start the DB for the first time. Then, because some
> days I'm not terribly bright, I decide to see if the DB will start
> at machine reboot. Well, now it hangs when sgedb start
> runs from init. Still gotta fix that.
>
> So let me back up for a minute and ask about Berkeley DB...
>
> We currently run sge_6.2u1 on 1250 or so hosts, with "classic"
> flat file spooling, and it's pretty stable.
> When we move to SGE6.2u5p1, we'd like
> to use the Arco reporting package, and I'm blithely assuming
> that I need a DB with an SQL interface to accommodate this.
>
> Is that true? Can we use Arco w/o DB spooling?
>
>
>  _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
>
>
> --
>
> Fritz Ferstl | CTO and Business Development, EMEA
> Univa Corporation <http://www.univa.com/> | The Data Center Optimization Company
> E-Mail: [email protected] | Phone: +49.9471.200.195 | Mobile: +49.170.819.7390
>

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
