Hi,

Usage: Generator <crawldb> <segments_dir> [-force] [-topN N] [-numFetchers numFetchers] [-adddays numDays] [-noFilter] [-noNorm] [-maxNumSegments num]

Set -numFetchers 10 to use all your slaves. Of course, if all your URLs belong to the same host, they'll end up being processed by a single mapper.

See the crawl script:

##############################################
# MODIFY THE PARAMETERS BELOW TO YOUR NEEDS
##############################################
# set the number of slave nodes
numSlaves=1

and further down:

echo "Generating a new segment"
$bin/nutch generate $commonOptions $CRAWL_PATH/crawldb $CRAWL_PATH/segments -topN $sizeFetchlist -numFetchers $numSlaves -noFilter

Julien

On 7 May 2014 12:09, chethan <[email protected]> wrote:

> Hi,
>
> I'm running Nutch 1.7 on 10 nodes, but the fetch happens only on one node.
> I realize that this is because the generator has only 1 reduce task and
> generated only 1 fetch list. The question is, how do you change that? I
> would want the fetch to happen on all nodes, thereby improving performance
> drastically. Thanks for your help!
>
> Regards,
>
> --
> Chethan Prasad

--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
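[Follow-up] For anyone running the generate step by hand rather than through the crawl script, a minimal sketch of the invocation described above, assuming a Nutch 1.x install at $NUTCH_HOME and a crawl directory at crawl/ (both placeholders):

```shell
# Generate up to 50000 top-scoring URLs, partitioned into 10 fetch lists,
# one per slave node. Paths and numbers here are example values.
$NUTCH_HOME/bin/nutch generate \
    crawl/crawldb \
    crawl/segments \
    -topN 50000 \
    -numFetchers 10 \
    -noFilter
```

Note that URLs are partitioned by host, so if your crawl contains fewer distinct hosts than fetch lists, parallelism is still limited to the number of hosts.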

