Hello Folks,

I am unable to run multiple fetch map tasks for Nutch 1.7 on Hadoop YARN.

Based on Julien's suggestion I am using the bin/crawl script and made the
following tweaks to trigger a fetch with multiple map tasks, but I am still
unable to do so.

1. Added maxNumSegments and numFetchers parameters to the generate phase.
$bin/nutch generate $commonOptions $CRAWL_PATH/crawldb $CRAWL_PATH/segments
-maxNumSegments $numFetchers -numFetchers $numFetchers -noFilter

2. Removed the topN parameter and removed the noParsing parameter because I
want the parsing to happen at the time of fetch.
$bin/nutch fetch $commonOptions -D fetcher.timelimit.mins=$timeLimitFetch
$CRAWL_PATH/segments/$SEGMENT -threads $numThreads #-noParsing#

The generate phase is not generating more than one segment.

As a result, the fetch phase is not creating multiple map tasks. I also
believe that the way the script is written does not allow the fetch to
process multiple segments in parallel, even if the generate phase were to
produce multiple segments.
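
To illustrate the concern about the script, here is a minimal sketch. It
assumes the script selects only the newest segment directory on each
iteration (via ls, sort and tail); the temporary directory and the segment
names are made up for the example, and echo stands in for the real fetch
call. The loop at the end only shows what visiting every segment would look
like, and it is sequential, not parallel.

```shell
# Hypothetical stand-in for $CRAWL_PATH/segments holding three segments.
SEGMENTS_DIR=$(mktemp -d)
mkdir "$SEGMENTS_DIR/20140101000001" \
      "$SEGMENTS_DIR/20140101000002" \
      "$SEGMENTS_DIR/20140101000003"

# Assumed behaviour of the script: only the newest segment is selected.
latest=$(ls "$SEGMENTS_DIR" | sort -n | tail -n 1)
echo "segment fetched per iteration: $latest"

# Visiting every segment needs an explicit loop (echo replaces the fetch).
visited=""
for seg in $(ls "$SEGMENTS_DIR" | sort -n); do
  echo "would fetch: $seg"
  visited="$visited $seg"
done
```

With three segments present, only the last one is picked up by the first
selection, while the loop visits all three in order.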

Can someone please let me know how they got the script to run in a
distributed Hadoop cluster? Or is there a different version of the script
that should be used?

Thanks.
