To add to this, you might wish to have a look at the rest of the wiki, in
particular:

http://wiki.apache.org/nutch/NutchHadoopTutorial

Setting up Nutch on Hadoop as described there is a significant step up from
running the crawl command, but it avoids much of the complexity and many of the
drawbacks of attempting a crawl of the size you describe on a single workstation.

Lewis
________________________________________
From: Hannes Carl Meyer [[email protected]]
Sent: 26 February 2011 18:02
To: [email protected]
Subject: Re: Can I use the Nutch crawl command for large crawls?

I would not recommend using the crawl command for large crawls, because:
1. Tuning Hadoop is not possible at all.
2. Incremental crawling is also pretty difficult, because you can't control
the individual processes/steps.
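Point 2 is easier to see with the individual steps written out. The following is a minimal Python sketch, assuming a Nutch 1.x install run from its top-level directory (so that `bin/nutch` resolves) and a fetcher configured with `fetcher.parse` set to false, so that parsing runs as its own step. The directory names and the helpers `latest_segment`, `segment_commands` and `crawl_round` are hypothetical; only the `inject`, `generate`, `fetch`, `parse` and `updatedb` subcommands come from Nutch itself.

```python
import os
import subprocess
from typing import List

# Assumption: run from the top-level directory of a Nutch 1.x install.
NUTCH = "bin/nutch"


def latest_segment(segments_dir: str) -> str:
    """Return the newest segment directory.

    Nutch names segments by timestamp (yyyyMMddHHmmss), so the
    lexicographically largest name is the most recent one.
    """
    names = sorted(os.listdir(segments_dir))
    if not names:
        raise FileNotFoundError(f"no segments in {segments_dir}")
    return os.path.join(segments_dir, names[-1])


def segment_commands(crawldb: str, segment: str) -> List[List[str]]:
    """Per-segment steps that the one-shot crawl command runs for you."""
    return [
        [NUTCH, "fetch", segment],
        [NUTCH, "parse", segment],  # assumes fetcher.parse=false
        [NUTCH, "updatedb", crawldb, segment],
    ]


def crawl_round(crawldb: str, segments_dir: str, top_n: int) -> str:
    """Run one generate/fetch/parse/updatedb round and return the segment.

    Requires a working Nutch install; check=True raises CalledProcessError
    on failure, so a failed fetch stops the round before updatedb runs.
    """
    subprocess.run(
        [NUTCH, "generate", crawldb, segments_dir, "-topN", str(top_n)],
        check=True,
    )
    segment = latest_segment(segments_dir)
    for cmd in segment_commands(crawldb, segment):
        subprocess.run(cmd, check=True)
    return segment


if __name__ == "__main__":
    # Hypothetical layout; the crawldb must first be seeded once with
    # "bin/nutch inject crawl/crawldb urls".
    for cmd in segment_commands("crawl/crawldb", "crawl/segments/20110226180200"):
        print(" ".join(cmd))
```

Driving the steps yourself means you can, for example, re-run only the parse step on one segment or change `-topN` from one round to the next, which is the kind of control point 2 refers to.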
