The fetcher operates segment by segment and won't fetch more than one at a time. You can get the generate step to build multiple segments in one go, but you would need to modify the script so that the fetch step is called once per segment, and you would probably need to add some logic to detect that they have all finished before moving on to the update step.

Out of curiosity: why do you want to fetch multiple segments at the same time?
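
To illustrate the kind of script change described above, here is a minimal sketch, not a tested patch for bin/crawl. It assumes the caller already has the list of segment paths produced by the generate step. The names `fetch_segments` and `FETCH_CMD` are hypothetical helpers introduced for illustration; only `bin/nutch fetch <segment>` comes from the script itself.

```shell
#!/usr/bin/env bash
# Sketch only: start one fetch per segment, then wait for all of them
# to finish before the caller moves on to the update step.

# Hypothetical hook: the command used to fetch a single segment.
# Options such as -threads can be appended to this array.
FETCH_CMD=(bin/nutch fetch)

fetch_segments() {
  local pids=() failed=0 segment pid

  # Launch one background fetch job per segment path given as argument.
  for segment in "$@"; do
    "${FETCH_CMD[@]}" "$segment" &
    pids+=("$!")
  done

  # Block until every job has finished; remember if any of them failed.
  for pid in "${pids[@]}"; do
    wait "$pid" || failed=1
  done

  return "$failed"
}
```

A caller would run something like `fetch_segments "$SEGMENT_A" "$SEGMENT_B" || exit 1` (placeholder variables) and only then run the later steps for each segment. Whether launching the fetches concurrently is desirable on a given cluster is a separate question; replacing the `&` and the `wait` loop with a plain sequential loop gives the one-segment-at-a-time behaviour.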
On 19 September 2014 06:00, Meraj A. Khan <[email protected]> wrote:

> Hello Folks,
>
> I am unable to run multiple fetch map tasks for Nutch 1.7 on Hadoop YARN.
>
> Based on Julien's suggestion I am using the bin/crawl script and made the
> following tweaks to trigger a fetch with multiple map tasks; however, I am
> unable to do so.
>
> 1. Added maxNumSegments and numFetchers parameters to the generate phase.
> $bin/nutch generate $commonOptions $CRAWL_PATH/crawldb $CRAWL_PATH/segments
> -maxNumSegments $numFetchers -numFetchers $numFetchers -noFilter
>
> 2. Removed the topN parameter and removed the noParsing parameter because I
> want the parsing to happen at the time of fetch.
> $bin/nutch fetch $commonOptions -D fetcher.timelimit.mins=$timeLimitFetch
> $CRAWL_PATH/segments/$SEGMENT -threads $numThreads #-noParsing#
>
> The generate phase is not generating more than one segment.
>
> As a result, the fetch phase is not creating multiple map tasks. I also
> believe that the way the script is written does not allow the fetch to fetch
> multiple segments in parallel, even if the generate were to generate
> multiple segments.
>
> Can someone please let me know how they got the script to run in a
> distributed Hadoop cluster? Or is there a different version of the script
> that should be used?
>
> Thanks.
>

--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble

