On 2010-11-03 17:12, Eric Martin wrote:
> Thank you very much. I can understand what you wrote! My crawl has been 
> running for days. Here are some new settings I would like to put into 
> nutch-site.xml. 
> 
> 1.) Does anyone see them as rude? 

See below.

> 2.) How can I stop the crawler without losing the crawled data (I'm running 
> 1.2 and should be able to use kill-nutch but can't find out where to do that) 

It's not possible in this version of Nutch.

> 3.) how do I restart the currently stopped crawler with the new 
> nutch-site.xml settings?

Simply rerun it using the same locations. Nutch re-reads nutch-site.xml
every time a job starts, so the new settings take effect on the next run.
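
If you drive the crawl with the individual steps rather than the one-shot
crawl command, one cycle against the existing data looks roughly like this
(the crawl/ directory name is only an example; use whatever locations your
run already created):

  # generate a fresh fetchlist from the existing crawldb
  bin/nutch generate crawl/crawldb crawl/segments
  # pick up the newest segment (segments are named by timestamp)
  s=`ls -d crawl/segments/2* | tail -1`
  # fetch it (run 'bin/nutch parse $s' afterwards if fetcher.parse is off)
  bin/nutch fetch $s
  # fold the results back into the crawldb
  bin/nutch updatedb crawl/crawldb $s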

>   <name>http.threads.per.host</name>
>   <value>50</value>

Uh, oh... I'm pretty sure people will be mad at you for this. It's
considered impolite to exceed a value of 1.
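
A friendlier nutch-site.xml would look something like this (the 5.0 second
delay is only an illustration; tune it to the sites you crawl):

  <property>
    <name>http.threads.per.host</name>
    <value>1</value>
    <!-- at most one concurrent connection per host -->
  </property>
  <property>
    <name>fetcher.server.delay</name>
    <value>5.0</value>
    <!-- seconds to wait between successive requests to the same server -->
  </property>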

Also, you can set the property generate.max.per.host (that's its name in
1.2) to e.g. 500 to limit the number of URLs in a fetchlist from any given
host.
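
In nutch-site.xml:

  <property>
    <name>generate.max.per.host</name>
    <value>500</value>
    <!-- max urls per host in a single fetchlist; -1 means unlimited -->
  </property>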

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
