Hi All,
I am running Nutch 1.7 on a Hadoop 2.3.0 cluster, and I noticed that there
is only a single reducer in the generate-partition job. As a result, the
subsequent fetch runs in only a single map task (I believe as a consequence
of the single reducer in the earlier phase). How can I force Nutch to run
the fetch in multiple map tasks? Is there a setting to force more than one
reducer in the generate-partition job, so that the fetch gets more map
tasks?
Please also note that I have commented out the code in Crawl.java so that it
skips the LinkInversion phase, as I don't need scoring of the URLs that Nutch
crawls; every URL is equally important to me.
Thanks.