[Nutch-general] Re: mapred -numFetchers gone?

Doug Cutting Sun, 02 Oct 2005 14:19:03 -0700

Rod Taylor wrote:

With -numFetchers gone it appears I require a generate/update for each
fetch which serializes the process.

That's correct. It would be possible to implement something like the former behaviour by (as before) setting each page's nextFetch date to a week out when it is added to a fetchlist. But, in mapreduce, dbupdate and generate are much faster, both since the crawldb doesn't have links (and is thus a lot smaller) and the crawldb update is distributed, so the downtime between fetcher cycles is much less and this technique may not be required. Previously dbupdate took nearly as long as fetches, so parallelizing these made a big difference. But now, in my experience, the dbupdate/generate overhead is more like 10-20%. With mapreduce, what percent of the time do you find that you're not fetching?
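
The nextFetch technique mentioned above can be illustrated with a toy sketch. This is not Nutch code: the crawldb is modelled as a plain dict from URL to next fetch time, and the names generate, top_n and holdoff are hypothetical, chosen only for the example. It shows why pushing the date out lets several fetchlists be generated back to back without a dbupdate in between.

```python
from datetime import datetime, timedelta

# Hypothetical toy model, not the Nutch API: the "crawldb" is a dict
# mapping URL -> next fetch time.
def generate(crawldb, now, top_n, holdoff=timedelta(days=7)):
    """Pick up to top_n due URLs and push their next fetch time out by holdoff."""
    due = sorted(url for url, next_fetch in crawldb.items() if next_fetch <= now)
    fetchlist = due[:top_n]
    for url in fetchlist:
        # Marking the page as not due for a week keeps a later generate
        # from selecting it again before its fetch results are merged back.
        crawldb[url] = now + holdoff
    return fetchlist

now = datetime(2005, 10, 2)
crawldb = {f"http://example.com/{i}": now for i in range(6)}

# Three generate passes with no update step in between.
first = generate(crawldb, now, top_n=3)
second = generate(crawldb, now, top_n=3)
third = generate(crawldb, now, top_n=3)
```

Here the first two fetchlists come out disjoint and the third is empty, since every page has already been handed out and is no longer due.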


Doug


