Chris, Rayson - Thanks for the links and the info.
What I'm hearing is that Grid Engine performance considerations should drive my decision on whether to move to berkeley spooling. Our current installation uses AFS and Panasas filesystems, and we might see a million jobs in a month. Grid Engine performance has not been an issue, so perhaps I'm better served by sticking with classic spooling.

Mark

________________________________________
From: Chris Dagdigian [[email protected]]
Sent: Friday, April 08, 2011 10:44 AM
To: Mark Suhovecky
Cc: [email protected]
Subject: Re: [gridengine users] Berkeley DB (was building RHEL5)

Rayson answered the ARCO question - spooling does not matter, since the only files ARCO scrapes are the accounting and reporting files.

Classic vs. berkeley is always an interesting question. I am also firmly in the classic spooling camp, but we sometimes use berkeley spooling.

There seem to be two main things driving the choice:

- NFS performance. If your NFS server is poor and you have a large client count, then at some point spooling may become a bottleneck. On the flip side, if you have a great NFS server you can use classic spooling at large scale. One trivial example -- a 4,000-core cluster easily using classic spooling even with more than ~500,000 jobs per day, because the NFS service is coming from a small Isilon scale-out NAS system that is running at wire speed across a dozen GbE NICs.

- Job submission rate and job "churn". I think DanT said this in a blog post years ago, but if you expect to need 200+ qsubs per second then you are going to need berkeley spooling. The same goes for clusters that experience huge amounts of job flow or state changes. I have less experience here, but in these sorts of systems I think binary spooling makes a real difference.

My $.02 of course!

-chris

Mark Suhovecky wrote:
>
> OK, I got SGE6.2u5p1 to build with version 4.4.20 of Berkeley DB,
> and proceeded to try and install Grid Engine on the master host
> via inst_sge.
>
> At some point it tells me that I should install Berkeley DB
> on the master host first, so I do "inst_sge -db", which hangs when it tries
> to start the DB for the first time. Then, because some
> days I'm not terribly bright, I decide to see if the DB will start
> at machine reboot. Well, now it hangs when sgedb start
> runs from init. Still gotta fix that.
>
> So let me back up for a minute and ask about Berkeley DB...
>
> We currently run sge_6.2u1 on 1250 or so hosts, with "classic"
> flat file spooling, and it's pretty stable.
> When we move to SGE6.2u5p1, we'd like
> to use the Arco reporting package, and I'm blithely assuming
> that I need a DB with an SQL interface to accommodate this.
>
> Is that true? Can we use Arco w/o DB spooling?
>
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
