Hello,

I'm trying to crawl a large number of sites (eventually), one by one. After
playing with Nutch and Solr for a couple of days, I'm not really sure
why crawling takes such a long time.

I crawled ONE website that has 5 pages with very minimal text content and
about 10 pictures, and it took ~3 minutes.

- I turned off external-link crawling in the configuration,
- the command is: bin/nutch crawl urls -solr http://localhost:8983/solr/ -depth 2 -topN 10000
- the URL file has one URL in it (MYDOMAIN.COM as an example!),
- and conf-crawl-urlfilter.txt has one rule set:
  +^http://([a-z0-9]*\.)*MYDOMAIN.COM/
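
To make the filter rule above concrete, here is a minimal Python sketch of which URLs it would accept. This is not Nutch code: it assumes Python's re module treats this simple pattern the same way Nutch's Java regex filter does, and the helper name is_accepted and the sample URLs are made up for illustration.

```python
import re

# The rule from the filter file, minus the leading "+" (which marks it as
# an accept rule). MYDOMAIN.COM is the placeholder used in the post.
RULE = r"^http://([a-z0-9]*\.)*MYDOMAIN.COM/"
PATTERN = re.compile(RULE)


def is_accepted(url: str) -> bool:
    """Return True if the URL matches the accept rule (hypothetical helper)."""
    return PATTERN.search(url) is not None


if __name__ == "__main__":
    # Hypothetical sample URLs, chosen only to exercise the pattern.
    samples = [
        "http://MYDOMAIN.COM/",
        "http://www.MYDOMAIN.COM/page.html",
        "https://MYDOMAIN.COM/",   # scheme is https, not http
        "http://MYDOMAIN.COM",     # no trailing slash after the host
        "http://other.example/",   # different host
    ]
    for url in samples:
        print(url, "accepted" if is_accepted(url) else "rejected")
```

Under these assumptions the rule only passes http:// URLs on the domain or its subdomains that have a slash after the host name, so it limits which pages are crawled but says nothing about how fast they are fetched.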

Is there a way I can speed this up?

thanks,
--i



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Why-is-my-Nutch-crawling-so-slow-tp4037964.html
Sent from the Nutch - User mailing list archive at Nabble.com.