Byron Miller wrote:
> Actually, at mozdex we have consolidated a bit and we are rebuilding under the latest release. For 50 million URLs, a 200 GB disk is all you need.
If you don't run the DB analysis... ;-) Analysis can eat up a terabyte for breakfast.
> That leaves you enough room for your segments, DB, and the space needed to process (about double your DB size).
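For a rough sanity check of those numbers, a back-of-envelope like the one below comes out right around 200 GB for 50 million URLs. The per-URL byte counts are only assumptions for illustration -- measure a small crawl and plug in your own figures:

/**
 * Back-of-envelope disk budget for a whole-web Nutch crawl.
 * All per-URL figures are guesses for illustration, not measurements.
 */
public class CrawlDiskBudget {
    public static void main(String[] args) {
        long urls = 50L * 1000 * 1000;        // target crawl size

        double dbBytesPerUrl      = 512;      // WebDB entry: page + link records (guess)
        double segmentBytesPerUrl = 2500;     // fetched content + parse data (guess)

        double dbGb      = urls * dbBytesPerUrl / 1e9;
        double segmentGb = urls * segmentBytesPerUrl / 1e9;
        double workGb    = 2 * dbGb;          // temp space while updating the DB (~2x DB size)

        System.out.printf("WebDB:      %6.1f GB%n", dbGb);
        System.out.printf("Segments:   %6.1f GB%n", segmentGb);
        System.out.printf("Work space: %6.1f GB%n", workGb);
        System.out.printf("Total:      %6.1f GB%n", dbGb + segmentGb + workGb);
    }
}

With roughly 0.5 KB/URL in the WebDB, 2.5 KB/URL of segment data, and about double the DB size as working space, the total lands near the 200 GB figure.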
I'm curious: how do you address the segment life-cycle problem? I'm still missing a good tool in Nutch to handle this, i.e. to phase out ageing segments.
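For illustration, something along these lines could be a starting point, although a real tool would first have to make sure the pages are covered by newer segments and no longer referenced by the index. This is only a sketch: the crawl/segments path and the 30-day cutoff are assumptions, and it merely prints expiry candidates instead of deleting anything.

import java.io.File;

/**
 * Crude segment-expiry sketch -- not a real Nutch tool. It flags segment
 * directories whose modification time is older than MAX_AGE_DAYS.
 */
public class ExpireSegments {
    static final long MAX_AGE_DAYS = 30;   // assumption: length of the refetch cycle

    public static void main(String[] args) {
        // assumption: all segments live under a single flat directory
        File segmentsDir = new File(args.length > 0 ? args[0] : "crawl/segments");
        long cutoff = System.currentTimeMillis() - MAX_AGE_DAYS * 24L * 60 * 60 * 1000;

        File[] segments = segmentsDir.listFiles();
        if (segments == null) return;
        for (File segment : segments) {
            if (segment.isDirectory() && segment.lastModified() < cutoff) {
                System.out.println("Expiry candidate: " + segment);
                // actual deletion left out on purpose -- dry run only
            }
        }
    }
}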
> The biggest boost you can give your query servers is tons of memory. SATA-150 or SCSI drives at 10k RPM are also a bonus. We have finished migrating entirely to Athlon 64s, and I'll be posting our build on the site and the wiki.
That would be a big help!

--
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web | Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
