> If you don't run the DB analysis... ;-) Analysis can eat up a terabyte 
> for breakfast.

Indeed!  We stopped doing db analyze and turned on scoring per Doug's
recommendations - that saved tons of time and resources :)

> > That leaves you enough room for your segments, db and the space needed
> > to process (about double your db size)
> 
> I'm curious, how do you address the segment life-cycle problem? I'm 
> still missing a good tool in Nutch to handle this, i.e. to phase-out 
> ageing segments.
> 

Right now it is a pain in the butt, but we manage as well as we can.  I
keep all the segments on NFS and typically generate a full query server's
worth of segments at a time.  Our benchmark is 10 million URLs per server.

So as we build segments, I generate fetchlists 100k URLs at a time, merge
them into 10 million URL segments, and then update the db.  The segments
are NFS-mounted on the query servers, and from each query server I symlink
a date_servername to the segment folder for that group (this offloads the
db box so it can do more work).

Once we hit the expiry, I usually dump the segment data, delete it, repeat
the same build process, update the query server, and bounce the
application server.
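The phase-out step, sketched the same way (paths and segment names are
again hypothetical, and the Nutch dump call is echoed rather than run):

```shell
#!/bin/sh
# Sketch of expiring an old segment group: dump its data for the record,
# drop the symlink the query server serves from, then delete the data.
NFS_SEGS=/tmp/nfs/segments/qs01   # hypothetical paths, as before
QUERY_SEGS=/tmp/query/segments
OLD=group0                        # the aged-out segment group

# Stand in for an already-deployed old group:
mkdir -p "$NFS_SEGS/$OLD" "$QUERY_SEGS"
ln -sfn "$NFS_SEGS/$OLD" "$QUERY_SEGS/20051001_qs01"

# Dump segment contents before deleting (shown, not executed):
echo "bin/nutch segread -dump $NFS_SEGS/$OLD"

# Retire: remove the symlink first so the searcher stops serving it,
# then delete the data; in practice, bounce the app server afterwards.
rm -f "$QUERY_SEGS/20051001_qs01"
rm -rf "$NFS_SEGS/$OLD"
```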

I badly need to automate this too, and was thinking of using the JMX
console to manage it across nodes, writing processes within it to
automate wherever possible.

> > The biggest boost you can give your query servers is tons of memory.
> > SATA 150 or SCSI drives at 10krpm are also a bonus.
> > 
> > We have finished migrating to entirely Athlon 64's and I'll be
> > posting our build on the site and wiki
> 
> That would be of big help!

I'll hopefully get to that midweek - running a financials upgrade right
now and we're on hour 58 :)

-byron
