Thank you. I have not used those settings live. I used defaults. :-) I'm a default guy: look at results, research and modify, then use HELP. THEN, I commit. :-)
Ok, I will adjust the settings in my file to 1. I emailed Sigram.com to begin talks about some vertical customization. I am grateful for your comments, as I spent the last three weeks understanding the process so I can effectively communicate with an expert. Re: "Simply rerun it using the same locations" -- I use Nutch/Solr/Drupal (just a law student!), so I am not sure how to do that without losing my crawls. So off to Google I go!

-----Original Message-----
From: Andrzej Bialecki [mailto:[email protected]]
Sent: Wednesday, November 03, 2010 9:25 AM
To: [email protected]
Subject: Re: Logs Spin - Active Thread - Spin Waiting - Basic

On 2010-11-03 17:12, Eric Martin wrote:
> Thank you very much. I can understand what you wrote! My crawl has been
> running for days. Here are some new settings I would like to put into
> nutch-site.xml.
>
> 1.) Does anyone see them as rude? See below.
> 2.) How can I stop the crawler without losing the crawled data? (I'm running
> 1.2 and should be able to use kill-nutch but can't find out where to do that.)

It's not possible in this version of Nutch.

> 3.) How do I restart the currently stopped crawler with the new
> nutch-site.xml settings?

Simply rerun it using the same locations.

> <name>http.threads.per.host</name>
> <value>50</value>

Uh, oh... I'm pretty sure people will be mad at you for this. It's
considered impolite to exceed a value of 1.

Also, you can set the property generate.max.count.per.host to e.g. 500
to limit the number of URLs in a fetchlist from any given host.

--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
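[For reference, a minimal sketch of how the two properties discussed above could look in nutch-site.xml, using the "polite" values suggested in the thread (1 thread per host, and 500 as an example cap on URLs per host in a fetchlist):]

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>

  <!-- Keep at most one concurrent fetch thread per host;
       values above 1 are considered impolite to site owners. -->
  <property>
    <name>http.threads.per.host</name>
    <value>1</value>
  </property>

  <!-- Limit the number of URLs taken from any single host when
       generating a fetchlist (500 is the example value from the thread). -->
  <property>
    <name>generate.max.count.per.host</name>
    <value>500</value>
  </property>

</configuration>
```

[Settings placed here override the defaults in nutch-default.xml; rerunning the crawl with the same crawl directory picks them up.]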

