Stefan Groschupf wrote:
I already suggested adding a kind of timeout mechanism here and had
implemented one for my installation; however, the suggested patch was
rejected because the problem was considered 'non-reproducible'.

Stefan, do you refer to NUTCH-233?

No:
http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200603.mbox/ [EMAIL PROTECTED]

I don't think that's why it was rejected. Spawning an extra thread for every url and every rule is pretty crude. Hadoop should indeed have a better mechanism to handle this sort of thing, but there's no reason we cannot also first fix this in the fetcher.

Perhaps we could enhance the logic of the loop at Fetcher.java:320. Currently this exits the fetcher when all threads exceed a timeout. Instead it could kill any individual thread that exceeds the timeout and start a new thread to replace it. So, instead of just keeping a count of fetcher threads, we could maintain a table of all running fetcher threads, each with its own lastRequestStart time, rather than a single global lastRequestStart. Then, in this loop, we could check whether any thread has exceeded the maximum timeout and, if it has, kill it and start a new thread. When no urls remain, threads will exit and remove themselves from the set of threads, so the loop can exit as it does now, when there are no more running fetcher threads. Does this make sense? It would prevent all sorts of thread hangs, not just in regexes.
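To make the idea concrete, here is a minimal, hypothetical sketch of that monitor loop. It is not the actual Fetcher code: the names (FetcherWatchdog, TIMEOUT_MS, newFetcherThread) are invented for illustration, and it uses interrupt() as the "kill", which only works for interruptible operations (sleeps, interruptible I/O); a thread truly spinning in a regex match would need the deprecated Thread.stop() or would have to be abandoned as a daemon thread.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class FetcherWatchdog {
  static final long TIMEOUT_MS = 1000;   // illustrative value, not Nutch's

  // One entry per live fetcher thread: thread -> start time of its current
  // request. This replaces the single global lastRequestStart. Threads
  // remove themselves when they finish, just as described above.
  static final Map<Thread, Long> lastRequestStart = new ConcurrentHashMap<>();

  // Stand-in for spawning a fetcher thread; workMs simulates one fetch.
  static Thread newFetcherThread(long workMs) {
    Thread t = new Thread(() -> {
      try {
        lastRequestStart.put(Thread.currentThread(), System.currentTimeMillis());
        Thread.sleep(workMs);            // stands in for one fetch request
      } catch (InterruptedException e) {
        // this thread was "killed" by the monitor loop below
      } finally {
        lastRequestStart.remove(Thread.currentThread());
      }
    });
    t.start();
    return t;
  }

  public static void main(String[] args) throws Exception {
    newFetcherThread(100);      // well-behaved fetch
    newFetcherThread(60_000);   // simulates a hang (e.g. a runaway regex)

    // Monitor loop: as now, exit when no fetcher threads remain; in
    // addition, kill any thread stuck past TIMEOUT_MS.
    while (!lastRequestStart.isEmpty()) {
      long now = System.currentTimeMillis();
      for (Map.Entry<Thread, Long> e : lastRequestStart.entrySet()) {
        if (now - e.getValue() > TIMEOUT_MS) {
          e.getKey().interrupt();        // kill the stalled thread
          // a real fetcher would start a replacement thread here
        }
      }
      Thread.sleep(100);
    }
    System.out.println("all fetcher threads exited");
  }
}
```

With a per-thread table the fetcher no longer has to wait for every thread to stall before giving up; one hung url costs only that thread.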

Doug


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general