The best resource for this is the wiki. I managed to improve crawl speed by implementing as many of its suggestions as possible:
http://wiki.apache.org/nutch/OptimizingCrawls

Lewis

-----Original Message-----
From: Arjun Kumar Reddy [mailto:[email protected]]
Sent: 02 February 2011 07:52
To: [email protected]
Subject: How to speed up nutch crawling!

Hi list,

I am Arjun. I am trying to develop an application in which I give a constrained set of URLs to the urls file in Nutch. I am able to crawl these URLs and get their contents by reading the data from the segments.

I crawl with depth 1, as I am not concerned with the outlinks or inlinks of the webpages. I only need the contents of the webpages listed in the urls file. But performing this crawl takes time. Please suggest a way to decrease the crawl time and increase the crawl speed. I also don't need indexing, because I am not concerned with the search part.

Kindly suggest how to speed up the crawl.

Thanks and regards,
Ch. Arjun Kumar Reddy

