Hi Meraj,

The generator will place all the URLs in a single segment if they all belong to the same host, for politeness reasons. Otherwise it will generate whichever number of fetch lists is passed with the -numFetchers parameter in the generation step.
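As a sketch (the crawl/crawldb and crawl/segments paths are just example locations, not something from your setup), the generation and fetch steps would look like this with Nutch 1.x:

```shell
# Ask the generator for several fetch lists so the subsequent
# fetch can run in parallel map tasks (one per fetch list).
bin/nutch generate crawl/crawldb crawl/segments -topN 50000 -numFetchers 4

# Fetch the newly generated segment (substitute the actual
# timestamped segment directory the generate step created).
bin/nutch fetch crawl/segments/<segment_name>
```

Note that if all the generated URLs share a single host, they will still end up in one fetch list regardless of -numFetchers.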
Why don't you use the crawl script in /bin instead of tinkering with the (now deprecated) Crawl class? It comes with a good default configuration and should make your life easier.

Julien

On 28 August 2014 06:47, Meraj A. Khan <[email protected]> wrote:
> Hi All,
>
> I am running Nutch 1.7 on a Hadoop 2.3.0 cluster and I noticed that there
> is only a single reducer in the generate-partition job. I am running into a
> situation where the subsequent fetch runs in only a single map task
> (I believe as a consequence of the single reducer in the earlier phase).
> How can I force Nutch to fetch in multiple map tasks? Is there a setting
> to force more than one reducer in the generate-partition job, so as to
> have more map tasks?
>
> Please also note that I have commented out the code in Crawl.java to skip
> the LinkInversion phase, as I don't need the scoring of the URLs that
> Nutch crawls; every URL is equally important to me.
>
> Thanks.

--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble

