Try running it on Hadoop: http://wiki.apache.org/nutch/NutchHadoopTutorial. The version of Nutch in trunk depends on a project called Gora, which is also supposed to help speed things up, but I have yet to get it working. I'd stick with the tagged 1.2 release and go the Hadoop route.
Best,
Adam

On Wed, Feb 2, 2011 at 7:39 AM, McGibbney, Lewis John <[email protected]> wrote:

> Best one for this is the wiki. I managed to improve this by implementing as
> many suggestions as possible
>
> http://wiki.apache.org/nutch/OptimizingCrawls
>
> Lewis
>
> -----Original Message-----
> From: Arjun Kumar Reddy [mailto:[email protected]]
> Sent: 02 February 2011 07:52
> To: [email protected]
> Subject: How to speed up nutch crawling!
>
> Hi list,
>
> I am Arjun.
>
> I am trying to develop an application in which I'll give a constrained set
> of urls to the urls file in Nutch. I am able to crawl these urls and get the
> contents of them by reading the data from the segments.
>
> I have crawled by giving the depth 1 as I am no way concerned about the
> outlinks or inlinks in the webpage. I only need the contents of that
> webpages in the urls file.
>
> But performing this crawl takes time. So, suggest me a way to decrease the
> crawl time and increase the speed of crawl. I also dont need indexing
> because I am not concerned about the search part.
>
> Kindly suggest me how to speed up the crawl.
>
> Thanks and regards,
> Ch. Arjun Kumar Reddy

