The fetching operates segment by segment and won't fetch more than one at
the same time. You can get the generation step to build multiple segments
in one go, but you'd need to modify the script so that the fetching step is
called once for each segment, and you'd probably need to add some
logic for detecting that they've all finished before you move on to the
update step.
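The script change described above could look roughly like the following shell helper. It launches one `bin/nutch fetch` job per segment in the background, waits for all of them, and only runs `updatedb` once every fetch has succeeded. This is a rough sketch and not part of the stock bin/crawl script. The function name `fetch_and_update` and the `NUTCH` variable are hypothetical, the `$commonOptions` used by bin/crawl are left out for brevity, and the helper assumes the segment paths have already been collected, e.g. from a generate run with `-maxNumSegments`.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT part of the stock bin/crawl script.
# Assumes the generate step has already produced several segments and that
# NUTCH points at the bin/nutch launcher (override it if yours lives elsewhere).
NUTCH="${NUTCH:-bin/nutch}"

# Fetch every segment passed as an argument in parallel, wait for all of the
# fetch jobs to finish, then run a single updatedb over all of them.
# Usage: fetch_and_update <crawldb> <threads> <segment>...
fetch_and_update() {
  local crawldb="$1" threads="$2"
  shift 2
  if [ "$#" -eq 0 ]; then
    echo "no segments given" >&2
    return 1
  fi

  local pids=() failed=0 seg pid

  # Launch one fetch job per segment in the background.
  for seg in "$@"; do
    "$NUTCH" fetch "$seg" -threads "$threads" &
    pids+=("$!")
  done

  # Wait for every job and remember whether any of them failed.
  for pid in "${pids[@]}"; do
    if ! wait "$pid"; then
      failed=1
    fi
  done

  if [ "$failed" -ne 0 ]; then
    echo "at least one fetch job failed; skipping updatedb" >&2
    return 1
  fi

  # All fetches finished: update the crawldb with every segment at once.
  "$NUTCH" updatedb "$crawldb" "$@"
}
```

For example, `fetch_and_update $CRAWL_PATH/crawldb $numThreads $CRAWL_PATH/segments/<seg1> $CRAWL_PATH/segments/<seg2>`, where the segment names are whatever the generate step produced. Running one `updatedb` over all segments relies on the Nutch 1.x `updatedb` command accepting several segment paths; calling it once per segment would also work.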
Out of curiosity: why do you want to fetch multiple segments at the same
time?

On 19 September 2014 06:00, Meraj A. Khan <[email protected]> wrote:

> Hello Folks,
>
> I am unable to run multiple fetch map tasks for Nutch 1.7 on Hadoop YARN.
>
> Based on Julien's suggestion, I am using the bin/crawl script and made the
> following tweaks to trigger a fetch with multiple map tasks; however, I am
> unable to do so.
>
> 1. Added maxNumSegments and numFetchers parameters to the generate phase.
> $bin/nutch generate $commonOptions $CRAWL_PATH/crawldb $CRAWL_PATH/segments
> -maxNumSegments $numFetchers -numFetchers $numFetchers -noFilter
>
> 2. Removed the topN parameter and removed the noParsing parameter because I
> want the parsing to happen at the time of fetch.
> $bin/nutch fetch $commonOptions -D fetcher.timelimit.mins=$timeLimitFetch
> $CRAWL_PATH/segments/$SEGMENT -threads $numThreads #-noParsing#
>
> The generate phase is not generating more than one segment.
>
> As a result, the fetch phase is not creating multiple map tasks. I also
> believe that, the way the script is written, it does not allow the fetch step
> to fetch multiple segments in parallel even if the generate step were to
> generate multiple segments.
>
> Can someone please let me know how they got the script to run in a
> distributed Hadoop cluster? Or is there a different version of the script
> that should be used?
>
> Thanks.
>



-- 

Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
