Hi all,

I was wondering if someone could point me in the right direction for carrying out a distributed crawl. Basically, I want to split a crawl over a few machines. Is there a way of just 'fetching' the pages using multiple machines and then merging the results onto a single machine? Can I then run the Nutch indexing process on that single machine?
Thanks,
Karen
