Hello folks, I am unable to run multiple fetch map tasks for Nutch 1.7 on Hadoop YARN.
Based on Julien's suggestion, I am using the bin/crawl script and made the following tweaks to trigger a fetch with multiple map tasks, but so far without success.

1. Added the maxNumSegments and numFetchers parameters to the generate phase:

   $bin/nutch generate $commonOptions $CRAWL_PATH/crawldb $CRAWL_PATH/segments -maxNumSegments $numFetchers -numFetchers $numFetchers -noFilter

2. Removed the topN parameter, and removed the noParsing parameter because I want parsing to happen at fetch time:

   $bin/nutch fetch $commonOptions -D fetcher.timelimit.mins=$timeLimitFetch $CRAWL_PATH/segments/$SEGMENT -threads $numThreads #-noParsing#

The generate phase is not generating more than one segment, and as a result the fetch phase is not creating multiple map tasks. I also believe that, the way the script is written, it would not fetch multiple segments in parallel even if generate did produce multiple segments.

Can someone please let me know how they got the script to run on a distributed Hadoop cluster, or whether a different version of the script should be used?

Thanks.
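P.S. To illustrate the point about the fetch step: below is a minimal sketch of a loop that fetches each generated segment in turn, one job after another (not in parallel). The function name fetch_segments and the NUTCH variable are hypothetical and not part of bin/crawl, and the default values are placeholders. The fetch arguments themselves are the ones from step 2 above.

```shell
#!/usr/bin/env bash
# Sketch only, not the stock bin/crawl logic.
# NUTCH is a hypothetical override; set NUTCH=echo for a dry run that
# prints the commands instead of launching jobs.
NUTCH="${NUTCH:-bin/nutch}"
CRAWL_PATH="${CRAWL_PATH:-crawl}"          # placeholder crawl directory
numThreads="${numThreads:-50}"             # placeholder thread count
timeLimitFetch="${timeLimitFetch:-180}"    # placeholder time limit (minutes)
commonOptions="${commonOptions:-}"         # extra -D options, if any

# Fetch every segment name given as an argument, sequentially.
# Stops at the first fetch that fails.
fetch_segments() {
  local segment
  for segment in "$@"; do
    # $commonOptions is intentionally unquoted so it splits into words.
    "$NUTCH" fetch $commonOptions -D fetcher.timelimit.mins="$timeLimitFetch" \
      "$CRAWL_PATH/segments/$segment" -threads "$numThreads" || return 1
  done
}
```

For a dry run, something like `NUTCH=echo fetch_segments 20140101 20140102` (made-up segment names) prints one fetch command per segment. How the segment names are listed on HDFS is left out here on purpose.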

