Hi Meraj,

The generator will place all the URLs in a single segment if they all
belong to the same host, for politeness reasons. Otherwise it will use
whichever value is passed with the -numFetchers parameter in the generation
step.
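For instance, a generate call asking for several fetch lists could look like
this (the crawldb and segments paths are illustrative, adjust them to your
layout):

```shell
# Generate a new segment with up to 4 fetch lists, i.e. up to 4 partitions
# that the subsequent fetch can run as separate map tasks.
# Note: if all queued URLs share one host, they still end up in one list.
bin/nutch generate crawl/crawldb crawl/segments -numFetchers 4
```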

Why don't you use the crawl script in /bin instead of tinkering with the
(now deprecated) Crawl class? It comes with a good default configuration
and should make your life easier.
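A sketch of what an invocation looks like (the argument order can differ
between Nutch versions, so run the script without arguments to see its usage
line; the seed dir, crawl dir, Solr URL and number of rounds below are
illustrative):

```shell
# Run the full cycle (inject, generate, fetch, parse, updatedb, ...)
# for 2 rounds using the bundled script.
bin/crawl urls/ crawl/ http://localhost:8983/solr/ 2
```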

Julien


On 28 August 2014 06:47, Meraj A. Khan <[email protected]> wrote:

> Hi All,
>
> I am running Nutch 1.7 on a Hadoop 2.3.0 cluster and I noticed that there
> is only a single reducer in the generate-partition job. I am running into a
> situation where the subsequent fetch only runs in a single map task
> (I believe as a consequence of the single reducer in the earlier phase). How
> can I force Nutch to fetch in multiple map tasks? Is there a setting to
> force more than one reducer in the generate-partition job so as to get more
> map tasks?
>
> Please also note that I have commented out the code in Crawl.java so as not
> to run the LinkInversion phase, as I don't need the scoring of the URLs that
> Nutch crawls; every URL is equally important to me.
>
> Thanks.
>



-- 

Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
