Hi Ali

See http://wiki.apache.org/nutch/NutchHadoopSingleNodeTutorial for the Hadoop cluster part.

Re the Crawl class: there is now a script for crawling, see bin/crawl.sh. It is easier to modify than the all-in-one Crawl class and gives a good understanding of the underlying processing steps.
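To make the "underlying processing steps" concrete, here is a rough sketch (not the script itself) of the kind of per-step Hadoop commands that replace the single Crawl class. The job file name, the paths, the segment name, and the helper functions `hadoop_cmd`, `inject` and `crawl_round` are assumptions made for illustration; check the class names and arguments against your own Nutch 1.8 build before relying on them.

```python
# Sketch only: builds (does not run) one "hadoop jar" command per crawl step.
# NUTCH_JOB, all paths and the segment name are hypothetical placeholders.
NUTCH_JOB = "apache-nutch-1.8.job"  # assumed name of the job file in runtime/deploy


def hadoop_cmd(main_class, *args):
    """Return the argv list for running one Nutch class as a Hadoop job."""
    return ["hadoop", "jar", NUTCH_JOB, main_class, *args]


def inject(crawldb, seed_dir):
    """Seed the crawldb with the URLs found in seed_dir."""
    return hadoop_cmd("org.apache.nutch.crawl.Injector", crawldb, seed_dir)


def crawl_round(crawldb, linkdb, segments_dir, segment):
    """Commands for one generate/fetch/parse/update round.

    In a real run the segment directory is created by the generate step,
    so its name has to be looked up afterwards; here it is passed in.
    """
    return [
        hadoop_cmd("org.apache.nutch.crawl.Generator", crawldb, segments_dir),
        hadoop_cmd("org.apache.nutch.fetcher.Fetcher", segment),
        hadoop_cmd("org.apache.nutch.parse.ParseSegment", segment),
        hadoop_cmd("org.apache.nutch.crawl.CrawlDb", crawldb, segment),  # updatedb
        hadoop_cmd("org.apache.nutch.crawl.LinkDb", linkdb, segment),    # invertlinks
    ]


if __name__ == "__main__":
    steps = [inject("crawl/crawldb", "urls")]
    steps += crawl_round("crawl/crawldb", "crawl/linkdb",
                         "crawl/segments", "crawl/segments/SEGMENT_NAME")
    for argv in steps:
        print(" ".join(argv))
```

Running it only prints the command lines, one per step, so you can compare them with what the crawl script does on your cluster before executing anything.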
Best

Julien

On 19 May 2014 11:55, Ali Nazemian <[email protected]> wrote:

> Hi,
> I was wondering how I can run a Nutch 1.8 job on a Hadoop cluster. As far as
> I know, for running a Nutch 1.7 job on a Hadoop cluster we could use the
> org.apache.nutch.crawl.Crawl class. Since this class was deprecated and
> removed from Nutch 1.8, what class would be responsible for the crawling job?
> Which class should I put in the hadoop command?
> Best regards.
> --
> A.Nazemian
>

--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble

