Thank you. I have not used those settings live. I used defaults. :-) I'm a default guy: look at results, research and modify, then use HELP. THEN, I commit. :-)
Ok, I will adjust the settings in my file to 1. I emailed Sigram.com to begin talks about some vertical customization. I am grateful for your comments, as I spent the last three weeks understanding the process so I can effectively communicate with an expert. Re: "Simply rerun it using the same locations" -- I use Nutch/Solr/Drupal (just a law student!), so I am not sure how to do that without losing my crawls. So off to Google I go!

-----Original Message-----
From: Andrzej Bialecki [mailto:[email protected]]
Sent: Wednesday, November 03, 2010 9:25 AM
To: [email protected]
Subject: Re: Logs Spin - Active Thread - Spin Waiting - Basic

On 2010-11-03 17:12, Eric Martin wrote:
> Thank you very much. I can understand what you wrote! My crawl has been
> running for days. Here are some new settings I would like to put into
> nutch-site.xml.
>
> 1.) Does anyone see them as rude? See below.
> 2.) How can I stop the crawler without losing the crawled data? (I'm running
> 1.2 and should be able to use kill-nutch but can't find out where to do that.)

It's not possible in this version of Nutch.

> 3.) How do I restart the currently stopped crawler with the new
> nutch-site.xml settings?

Simply rerun it using the same locations.

> <name>http.threads.per.host</name>
> <value>50</value>

Uh, oh... I'm pretty sure people will be mad at you for this. It's
considered impolite to exceed a value of 1.

Also, you can set the property generate.max.count.per.host to e.g. 500
to limit the number of URLs in a fetchlist from any given host.

--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
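[For reference, a minimal sketch of how the two properties discussed above could look in nutch-site.xml, using the "polite" values suggested in the thread (1 thread per host, and 500 as an example cap on URLs per host in a fetchlist):]

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>

  <!-- Keep at most one concurrent fetch thread per host;
       values above 1 are considered impolite to site owners. -->
  <property>
    <name>http.threads.per.host</name>
    <value>1</value>
  </property>

  <!-- Limit the number of URLs taken from any single host when
       generating a fetchlist (500 is the example value from the thread). -->
  <property>
    <name>generate.max.count.per.host</name>
    <value>500</value>
  </property>

</configuration>
```

[Settings placed here override the defaults in nutch-default.xml; rerunning the crawl with the same crawl directory picks them up.]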

