Nutch 1.7 fetch happening in a single map task.

2014-08-27 Thread Meraj A. Khan
Hi All,

I am running Nutch 1.7 on a Hadoop 2.3.0 cluster, and I noticed that there
is only a single reducer in the generate-partition job. As a result, the
subsequent fetch runs in only a single map task (I believe as a consequence
of the single reducer in the earlier phase). How can I force Nutch to fetch
in multiple map tasks? Is there a setting to force more than one reducer in
the generate-partition job, so that the fetch gets more map tasks?
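
The suspected link between the reducer count and the fetch map count can be
sketched with a toy model. This is plain Python, not Nutch code. It assumes
that the partition step hashes each URL's host into one of N fetch lists and
that the fetch then runs one map task per list. The names partition_by_host,
fetch_map_task_count and num_fetch_lists are hypothetical and exist only for
this illustration.

```python
from collections import defaultdict
from urllib.parse import urlparse
import zlib


def partition_by_host(urls, num_fetch_lists):
    """Group URLs into at most num_fetch_lists lists by hashing the host."""
    fetch_lists = defaultdict(list)
    for url in urls:
        host = urlparse(url).netloc
        # crc32 is only a stand-in for the real partitioner's hash; it is
        # used here because it gives the same value on every run.
        index = zlib.crc32(host.encode("utf-8")) % num_fetch_lists
        fetch_lists[index].append(url)
    return dict(fetch_lists)


def fetch_map_task_count(fetch_lists):
    # Assumption of this sketch: one fetch map task per non-empty fetch list.
    return len(fetch_lists)


urls = [
    "http://a.example/1",
    "http://a.example/2",
    "http://b.example/1",
    "http://c.example/1",
]

single = partition_by_host(urls, 1)   # one reducer: everything in one list
several = partition_by_host(urls, 4)  # four reducers: up to four lists

print("lists with 1 partition:", fetch_map_task_count(single))
print("lists with 4 partitions:", fetch_map_task_count(several))
```

With a single partition every URL lands in the same list, so under these
assumptions the fetch cannot use more than one map task, however many URLs
were generated.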

Please also note that I have commented out the code in Crawl.java so that
it skips the LinkInversion phase, as I don't need scoring of the URLs that
Nutch crawls; every URL is equally important to me.

Thanks.


nutch hadoop 2 library

2014-08-27 Thread Ali Nazemian
Hi,
I am going to use HDFS to store some content on a Hadoop 2 cluster.
Unfortunately, Nutch runs against the Hadoop 1 library, so I have to
replace its library JAR files with the Hadoop 2 JAR files. Has anybody
already done that? It seems that many lines of code would have to change
in this regard, and I did not find any patch or tutorial for this
purpose.
Best regards.

-- 
A.Nazemian