Hi, I'm working with Map/Reduce in Nutch 0.8, and I'd like to distribute segments across multiple machines via NDFS. Say I've got ~250GB of hard-drive space per machine; to store terabytes of data, should I generate a bunch of ~200GB segments and push them out into NDFS? (A sketch of what I mean is below.)
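For concreteness, this is roughly how I picture pushing a locally generated segment into NDFS, assuming the FileSystem API in the Hadoop jar bundled with Nutch 0.8 (the class name PushSegment and both paths are made up, and the exact method names may differ in other builds):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.nutch.util.NutchConfiguration;

// Sketch: copy a locally generated segment directory into NDFS.
// Assumes fs.default.name in the config points at the namenode.
public class PushSegment {
  public static void main(String[] args) throws Exception {
    Configuration conf = NutchConfiguration.create();
    FileSystem fs = FileSystem.get(conf);  // NDFS, if so configured
    Path local = new Path("/local/segments/20060101123456");   // hypothetical
    Path remote = new Path("/user/nutch/segments/20060101123456");
    fs.copyFromLocalFile(local, remote);   // replicated across datanodes
  }
}

(Or presumably the shell equivalent, something like a dfs -put.)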
How should I partition/organize these segments: randomly, or by URL or host? The use case that matters is random access to a given URL or host, or is that kind of lookup better accomplished via map/reduce? Thanks for any insight or ideas! DaveG
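
P.S. To make the "by host" option concrete, here's the kind of deterministic mapping I have in mind; just a sketch (the class name, numNodes, and sample URL are made up), though I believe Nutch's own PartitionUrlByHost does something similar when generating fetch lists:

import java.net.URL;

// Sketch: deterministically map a URL to one of N machines by hashing
// its host, so all pages from a given host land on the same node.
public class HostPartitioner {
  public static int partition(String url, int numNodes) throws Exception {
    String host = new URL(url).getHost();
    // Mask the sign bit so the bucket index is always non-negative.
    return (host.hashCode() & Integer.MAX_VALUE) % numNodes;
  }

  public static void main(String[] args) throws Exception {
    // All pages from lucene.apache.org map to the same one of 8 nodes.
    System.out.println(partition("http://lucene.apache.org/nutch/", 8));
  }
}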
