Hi,

Usage: Generator <crawldb> <segments_dir> [-force] [-topN N] [-numFetchers numFetchers] [-adddays numDays] [-noFilter] [-noNorm] [-maxNumSegments num]

Set -numFetchers 10 to use all your slaves. Of course, if all your URLs
belong to the same host, they'll end up being processed by a single mapper.
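
For example, a generate call with the option set might look like this (the crawldb and segments paths are placeholders for your own layout, and -topN is just an illustrative value):

```shell
# Sketch: ask the Generator to produce fetch lists partitioned
# across 10 reducers, one per slave node.
# crawl/crawldb and crawl/segments are placeholder paths.
bin/nutch generate crawl/crawldb crawl/segments -topN 50000 -numFetchers 10
```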

See the crawl script:


> ############################################## MODIFY THE PARAMETERS BELOW TO YOUR NEEDS ###############################################
> # set the number of slave nodes
> numSlaves=1


and further down


>   echo "Generating a new segment"
>   $bin/nutch generate $commonOptions $CRAWL_PATH/crawldb $CRAWL_PATH/segments -topN $sizeFetchlist -numFetchers $numSlaves -noFilter


Julien


On 7 May 2014 12:09, chethan <[email protected]> wrote:

> Hi,
>
> I'm running Nutch 1.7 on 10 nodes, but the fetch happens only on one node. I
> realize that this is because the generator has only 1 reduce task and
> generated only 1 fetch list. The question is: how do you change that? I would
> want the fetch to happen on all nodes, thereby improving performance
> drastically. Thanks for your help!
>
> Regards,
>
> --
> Chethan Prasad
>



-- 

Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
