Hi Mark, I'm with Chris. Classic meets large-scale performance and reliability requirements.
I do not know of any compelling reason to use Berkeley DB.

Stephen

________________________________________
From: [email protected] [[email protected]] On Behalf Of Mark Suhovecky [[email protected]]
Sent: Friday, April 08, 2011 11:08 AM
Cc: [email protected]
Subject: Re: [gridengine users] Berkeley DB (was building RHEL5)

Chris, Rayson-

Thanks for the links and the info. What I'm hearing is that the performance requirements of my Grid Engine installation should drive the decision to move (or not move) to Berkeley spooling.

Our current installation uses AFS and Panasas filesystems, and we might see a million jobs in a month. Grid Engine performance has not been an issue, so perhaps I'm better served sticking with classic spooling.

Mark

________________________________________
From: Chris Dagdigian [[email protected]]
Sent: Friday, April 08, 2011 10:44 AM
To: Mark Suhovecky
Cc: [email protected]
Subject: Re: [gridengine users] Berkeley DB (was building RHEL5)

Rayson answered the ARCO question: spooling does not matter, since the only files ARCO scrapes are the accounting and reporting files.

Classic vs. Berkeley is always an interesting question. I'm also firmly in the classic spooling camp, but we sometimes use Berkeley spooling. There seem to be two main things driving the choice:

- NFS performance. If your NFS server is poor and you have a large client count, then at some point spooling may become a bottleneck. On the flip side, if you have a great NFS server you can use classic spooling at large scale. One trivial example: a 4,000-core cluster easily using classic spooling even with more than ~500,000 jobs per day, because the NFS service comes from a small Isilon scale-out NAS system running at wire speed across a dozen GbE NICs.

- Job submission rate and job "churn". I think DanT said this in a blog post years ago, but if you expect to need 200+ qsubs per second, then you are going to need Berkeley spooling.
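[Editorial aside: the trade-off discussed above comes down to where qmaster writes its state. One quick way to see which method an existing installation uses is to read the cell's bootstrap file. A minimal sketch follows; the file created here is a stand-in for the real $SGE_ROOT/$SGE_CELL/common/bootstrap, and its contents are illustrative, not copied from any actual cluster.]

```shell
# Sketch: read the spooling_method entry from an SGE bootstrap file.
# The temp file below simulates $SGE_ROOT/$SGE_CELL/common/bootstrap.
bootstrap=$(mktemp)
cat > "$bootstrap" <<'EOF'
admin_user        sgeadmin
default_domain    none
spooling_method   classic
spooling_lib      libspoolc
EOF

# The bootstrap file is "key value" per line; print the value for our key.
method=$(awk '$1 == "spooling_method" { print $2 }' "$bootstrap")
echo "spooling method: $method"   # prints "spooling method: classic"

rm -f "$bootstrap"
```

A Berkeley-spooled cell would show `berkeleydb` here instead of `classic`.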
Same goes for clusters that experience huge amounts of job flow or state changes. I have less experience here, but in these sorts of systems I think binary spooling makes a real difference.

My $.02, of course!

-chris

Mark Suhovecky wrote:
>
> OK, I got SGE6.2u5p1 to build with version 4.4.20 of Berkeley DB,
> and proceeded to try to install Grid Engine on the master host
> via inst_sge.
>
> At some point it tells me that I should install Berkeley DB
> on the master host first, so I do "inst_sge -db", which hangs when it
> tries to start the DB for the first time. Then, because some days I'm
> not terribly bright, I decided to see if the DB would start at machine
> reboot. Well, now it hangs when sgedb start runs from init. Still
> gotta fix that.
>
> So let me back up for a minute and ask about Berkeley DB...
>
> We currently run sge_6.2u1 on 1250 or so hosts, with "classic"
> flat-file spooling, and it's pretty stable.
> When we move to SGE6.2u5p1, we'd like to use the ARCO reporting
> package, and I'm blithely assuming that I need a DB with an SQL
> interface to accommodate this.
>
> Is that true? Can we use ARCO w/o DB spooling?

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
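[Editorial aside on the ARCO question in the quoted message: ARCO is fed from the flat accounting/reporting files, which are written the same way under either spooling method, so classic spooling does not rule ARCO out. The accounting file is colon-delimited; the sketch below shows pulling fields out of one record. The sample record is fabricated, and the field positions (owner in field 4, job number in field 6) assume the documented accounting(5) layout.]

```shell
# Sketch: extract the owner and job number from a colon-delimited
# accounting record. The record is a made-up sample, not real data.
sample='all.q:node01:users:msuhovecky:test.sh:1001'
printf '%s\n' "$sample" | awk -F: '{ printf "owner=%s job=%s\n", $4, $6 }'
# prints "owner=msuhovecky job=1001"
```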
