Chris, Rayson-

Thanks for the links, and the info.

What I'm hearing is that the performance needs of my Grid Engine
installation should drive my decision to move (or not move) to Berkeley DB spooling.

Our current installation uses AFS and Panasas filesystems, and
we might see a million jobs in a month. Grid Engine performance
has not been an issue, so perhaps I'm better served sticking with classic
spooling.

Mark 
________________________________________
From: Chris Dagdigian [[email protected]]
Sent: Friday, April 08, 2011 10:44 AM
To: Mark Suhovecky
Cc: [email protected]
Subject: Re: [gridengine users] Berkeley DB (was building RHEL5)

Rayson answered the ARCO question: spooling does not matter, since the
only files ARCO scrapes are the accounting and reporting files.

Classic vs. Berkeley is always an interesting question.

I'm firmly in the classic spooling camp myself, but we sometimes use
Berkeley spooling. There seem to be two main things driving the choice:

- NFS performance. If your NFS server is poor and you have a large
client count, then at some point spooling may become a bottleneck.
On the flip side, if you have a great NFS server you can use
classic spooling at large scale. One example: a 4,000-core
cluster easily runs classic spooling even with more than ~500,000 jobs
per day, because the NFS service comes from a small Isilon scale-out
NAS system running at wire speed across a dozen GbE NICs.

- Job submission rate and job "churn". I think DanT said this in a blog
post years ago, but if you expect to need 200+ qsubs per second, then you
are going to need Berkeley spooling. The same goes for clusters that
experience huge amounts of job flow or state changes. I have less
experience here, but in these sorts of systems I think binary spooling
makes a real difference.
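(For anyone following along: the spooling choice ends up in the qmaster's
bootstrap file, $SGE_ROOT/<cell>/common/bootstrap. Roughly, the two variants
look like the sketch below -- paths here are placeholders, and you should
check your own install rather than trust my memory:)

```
# classic (flat-file) spooling
spooling_method      classic
spooling_lib         libspoolc
spooling_params      /sge/default/common;/sge/default/spool/qmaster

# Berkeley DB spooling (local database directory)
spooling_method      berkeleydb
spooling_lib         libspoolb
spooling_params      /sge/default/spool/spooldb
```

The installer (inst_sge) normally writes this for you; editing it by hand
on a live cell is not something I'd recommend.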

My $.02 of course!

-chris




Mark Suhovecky wrote:
>
> OK, I got SGE6.2u5p1 to build with version 4.4.20 of Berkeley DB,
> and proceeded to try and install Grid Engine on the master host
> via inst_sge.
>
>   At some point it tells me that I should install Berkeley DB
> on the master host first, so I do "inst_sge -db", which hangs when it tries
> to start the DB for the first time. Then, because some
> days I'm not terribly bright, I decide to see if the DB will start
> at machine reboot. Well, now it hangs when "sgedb start"
> runs from init. Still gotta fix that.
>
> So let me back up for a minute and ask about Berkeley DB...
>
> We currently run sge_6.2u1 on 1250 or so hosts, with "classic"
> flat file spooling, and it's pretty stable.
> When we move to SGE6.2u5p1, we'd like
> to use the Arco reporting package, and I'm blithely assuming
> that I need a DB with an SQL interface to accommodate this.
>
> Is that true? Can we use Arco w/o DB spooling?
>

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
