Re: Fetch jumps to 1.0 complete

Dennis Kubes Fri, 04 Aug 2006 14:52:52 -0700

Well, I eliminated the regular expressions, changed the timeout value on http to 5000, and set the max delays to 5. Although I still have some tasks running slower and I am getting a few more timeout errors (which is OK for what I am doing), it seems to have moved beyond the point at which it was failing. As soon as I get this running automatically in production, I am going to try to implement the 339 patch.


Dennis


Andrzej Bialecki wrote:

Dennis Kubes wrote:
I moved off of the most recent dev branches for our "production" system and put them on the release version for 0.8. I only noticed it recently, although it may have been happening before and I just didn't notice it. The one change that I did do that may have made it worse was I removed the crawl-url filter regular expressions for [EMAIL PROTECTED] and
This shouldn't be a problem unless your fetcher hangs on regex processing. This is relatively easy to check - pick up one of the remaining longer-running tasks and do a couple of thread dumps, you will see what occupies most of the time; if you can hook up a profiler in a fast-sampling mode then it's even better ...
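For what it's worth, a rough sketch of the "couple of thread dumps" idea done from inside the JVM. This is only an illustration: the class and method names (ThreadDumpSketch, dumpAllThreads) are made up here and are not part of Nutch. In practice, sending the task's JVM a QUIT signal (kill -QUIT <pid>) gives the same kind of information without any code.

```java
import java.util.Map;

// Hypothetical helper for illustration only; not part of Nutch.
class ThreadDumpSketch {

    // Builds a text snapshot of every live thread and its current stack.
    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e
                : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append('"')
              .append(" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        // Take a few snapshots a second apart; frames that show up in
        // every snapshot are where the time is most likely going.
        for (int i = 0; i < 3; i++) {
            System.out.println(dumpAllThreads());
            Thread.sleep(1000);
        }
    }
}
```

If the same java.util.regex frames appear in the fetcher threads across all snapshots, that would point at regex processing as the place where the task is stuck.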
-.*(/.+?)/.*?\1/.*?\1/. Andrzej, didn't you say a while back, when we were looking at regular expressions for a different stalling problem, that you don't use these in your production systems?
True, I don't - I'm using only a combination of prefix/suffix filters. Prefix filters give me the domains of interest, and suffix filters give me (more or less) mime types I'm interested in. Any other border cases I can hardcode in a separate urlfilter, thus avoiding regexes completely.
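To make the prefix/suffix idea concrete, here is a minimal sketch of a filter that uses only startsWith/endsWith checks and no regexes. It is a standalone illustration, not the actual Nutch prefix or suffix urlfilter plugins: the class name PrefixSuffixFilter, the choice of a reject list for suffixes, and the example URLs are all assumptions. It returns the URL when accepted and null when rejected.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical standalone filter for illustration; not a Nutch plugin.
class PrefixSuffixFilter {
    private final List<String> allowedPrefixes;   // domains of interest
    private final List<String> rejectedSuffixes;  // file types to skip (assumed lowercase)

    PrefixSuffixFilter(List<String> allowedPrefixes, List<String> rejectedSuffixes) {
        this.allowedPrefixes = allowedPrefixes;
        this.rejectedSuffixes = rejectedSuffixes;
    }

    // Returns the URL unchanged if accepted, or null if rejected.
    String filter(String url) {
        if (url == null) {
            return null;
        }
        boolean prefixOk = false;
        for (String prefix : allowedPrefixes) {
            if (url.startsWith(prefix)) {
                prefixOk = true;
                break;
            }
        }
        if (!prefixOk) {
            return null;
        }
        // Compare suffixes case-insensitively so ".JPG" is treated like ".jpg".
        String lower = url.toLowerCase(Locale.ROOT);
        for (String suffix : rejectedSuffixes) {
            if (lower.endsWith(suffix)) {
                return null;
            }
        }
        return url;
    }

    public static void main(String[] args) {
        PrefixSuffixFilter f = new PrefixSuffixFilter(
                Arrays.asList("http://example.org/"),
                Arrays.asList(".jpg", ".zip"));
        System.out.println(f.filter("http://example.org/docs/index.html"));
        System.out.println(f.filter("http://example.org/img/LOGO.JPG"));
        System.out.println(f.filter("http://other.example.com/"));
    }
}
```

Each check is a plain string comparison, so the cost per URL is bounded by the URL length and the number of list entries, with no backtracking to worry about.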
