The best resource for this is the wiki. I managed to improve my crawl times
by implementing as many of its suggestions as possible:

http://wiki.apache.org/nutch/OptimizingCrawls
Lewis
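
For readers who want a starting point, most of the tuning described on
that wiki page is done through property overrides in conf/nutch-site.xml.
The fragment below is only a sketch under assumptions: the property names
are the ones found in nutch-default.xml in the Nutch 1.x line of this
period (check your own nutch-default.xml, as names have changed between
releases), and the values are illustrative placeholders, not tested
recommendations. Lowering the per-host delay makes the crawl less polite,
so it is only reasonable for hosts you are allowed to hit harder.

```xml
<!-- conf/nutch-site.xml: illustrative overrides only; all values are assumptions -->
<configuration>
  <!-- Total number of fetcher threads running in parallel -->
  <property>
    <name>fetcher.threads.fetch</name>
    <value>50</value>
  </property>
  <!-- Maximum threads allowed to hit the same host at once
       (assumed property name for Nutch 1.x of this era) -->
  <property>
    <name>fetcher.threads.per.host</name>
    <value>2</value>
  </property>
  <!-- Seconds to wait between successive requests to the same server -->
  <property>
    <name>fetcher.server.delay</name>
    <value>1.0</value>
  </property>
</configuration>
```

If the URL list is spread over many hosts, the total thread count usually
matters most. If it is concentrated on a few hosts, the per-host settings
are the likely bottleneck.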



-----Original Message-----
From: Arjun Kumar Reddy [mailto:[email protected]]
Sent: 02 February 2011 07:52
To: [email protected]
Subject: How to speed up nutch crawling!

Hi list,

I am Arjun.

I am trying to develop an application in which I give a constrained set
of URLs to the urls file in Nutch. I am able to crawl these URLs and get
their contents by reading the data from the segments.

I have crawled with a depth of 1, as I am not concerned at all with the
outlinks or inlinks of the web pages. I only need the contents of the
web pages listed in the urls file.

But performing this crawl takes time. Please suggest a way to decrease the
crawl time and increase the crawl speed. I also don't need indexing,
because I am not concerned with the search part.

Kindly suggest how I can speed up the crawl.

Thanks and regards,
Ch. Arjun Kumar Reddy


