> Jon Shoberg wrote:
>>
>>   Is there a way to set the lifetime of a fetching thread?  As in, if
>> it cannot complete the entire fetching process in X minutes, can it
>> gracefully give up?
>>
>>   Anyone else experience the fetcher hanging for a long period of time
>> (hour+)?  I'm using 100 threads, 30 per host.  I'm guessing that I
>> have one host which it is "stuck" on.

> Paul van Brouwershaven wrote:
Hello Jon,

I have the same problem here: the fetcher gets stuck after running for a few hours.

How do you get a good crawler if you have to repair the database and start again every time?


I run on stable hardware, so I run everything within a screen process, which allows me to interactively watch what's going on. I'm in the testing phase of a Nutch implementation, so I pay close attention to it.

My request to experienced Nutch users / developers:

The wiki has good info. It would be helpful to hear about people's small, medium, and large implementations. What configurations are used? What tweaks were made to the conf files? What are the performance bottlenecks? What are the common implementation problems, and how are they fixed? How have you allowed for dynamic URLs (question marks)?

I'd be willing to aggregate input into wiki entries.

For myself, I'm running a crawling script inside a SCREEN process. This allows me to SSH in, see what's going on at the console, and gracefully exit the session. If I don't like a crawling session, I'll CTRL-C it and let the script keep going.

The Perl script generates a segment with -numFetchers and then calls the fetcher via a system call.
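
Since the script already launches the fetcher as an external process, one way to get the "give up after X minutes" behaviour would be to put a wall-clock deadline on that call. Below is a minimal sketch of the idea in Python rather than Perl. The name run_with_deadline and the stand-in command are hypothetical, and no real fetcher command line is shown. Note that this kills the process rather than shutting it down gracefully, and only the direct child is killed, so a wrapper shell script could leave its own children running.

```python
import subprocess
import sys


def run_with_deadline(cmd, timeout_seconds):
    """Run cmd (a list of arguments) with a wall-clock limit.

    Returns the exit code of the process, or None if it was still
    running after timeout_seconds and had to be killed.
    """
    try:
        completed = subprocess.run(cmd, timeout=timeout_seconds)
        return completed.returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before
        # re-raising, so nothing is left to clean up here.
        return None


if __name__ == "__main__":
    # Stand-in for the real fetcher invocation (an assumption, not an
    # actual command): a child process that sleeps longer than allowed.
    stuck_job = [sys.executable, "-c", "import time; time.sleep(30)"]
    result = run_with_deadline(stuck_job, timeout_seconds=2)
    if result is None:
        print("job exceeded its deadline and was killed; moving on")
    else:
        print("job finished with exit code", result)
```

A segment whose fetch was killed this way is presumably incomplete, so the calling script would still have to decide whether to discard it or retry it.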

-j



_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general