> If you don't run the DB analysis... ;-) Analysis can eat up a terabyte
> for breakfast.
Indeed! We stopped doing db analyze and turned on the scoring per Doug's
recommendations - that saved tons of time & resources :)

> > That leaves you enough room for your segments, db and the space
> > needed to process (about double your db size)
>
> I'm curious, how do you address the segment life-cycle problem? I'm
> still missing a good tool in Nutch to handle this, i.e. to phase out
> ageing segments.

Right now it is a pain in the butt, but we manage as well as we can. I keep
all segments on NFS and typically generate a full query server's worth of
segments at a time; our benchmark is 10 million URLs per server. As we build
segments I generate fetchlists 100k URLs at a time, merge them into
10-million-URL segments, and then update the db. I NFS-mount the segments on
the query servers and symlink a date_servername name to the segment folder
for that group, then serve from the query server (to offload the db box so
it can do more work). Once we hit the expiry, I dump the segment data,
delete it, run the same process again, update the query server, and bounce
the application server.

I badly need to automate this too, and was thinking of using the JMX
console to manage it across nodes and writing processes within it to
automate whatever is possible.

> > The biggest boost you can give your query servers is tons of memory.
> > SATA 150 or SCSI drives at 10krpm are also a bonus.
> >
> > We have finished migrating to entirely Athlon 64's and I'll be
> > posting our build on the site and wiki
>
> That would be of big help!

I'll hopefully get to that midweek - I'm running a financials upgrade
right now and we're on hour 58 :)

-byron
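FWIW, the rollover step above (date_servername symlink pointing at the
current segment group, retargeted at expiry) can be sketched roughly like
this. All paths, names, and the commented-out Nutch invocations here are
illustrative assumptions, not our exact scripts:

```shell
#!/bin/sh
set -e

# Stand-ins for the NFS segment area and a query-server name
# (hypothetical values for illustration):
SEGROOT=$(mktemp -d)
SERVER=qs01
STAMP=$(date +%Y%m%d)

# 1. Build the merged segment group. Placeholder for the real cycle of
#    generating 100k-url fetchlists, fetching, merging to a 10M-url
#    segment, and updating the db, e.g. (commands assumed, not verified):
#      bin/nutch generate db segments -topN 100000   # repeat per fetchlist
#      bin/nutch fetch <segment>                     # repeat per fetchlist
#      bin/nutch updatedb db <segment>
mkdir -p "$SEGROOT/merged-$STAMP"

# 2. Point a stable date_servername symlink at the new segment group, so
#    the query server always reads through one well-known name.
#    -sfn replaces an existing link in place:
ln -sfn "$SEGROOT/merged-$STAMP" "$SEGROOT/${STAMP}_${SERVER}"

# 3. At expiry: build the replacement group, retarget the link the same
#    way, delete the old segment data, and bounce the app server
#    (deletion and restart not shown).
readlink "$SEGROOT/${STAMP}_${SERVER}"
```

The symlink swap is the useful trick: the app server only ever knows the
link name, so phasing out an ageing segment is retarget-then-delete rather
than a config change per rollover.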
