Dennis Kubes wrote:
> The thread dumps pointed me to the Regex URL Filter and greedy pattern 
> matching.  There seems to be a long-standing "error" in the JVM where 
> the "wrong" regular expression causes the program to hang and the 
> CPU to go to 100%, which is basically the behavior we are seeing.  
> This would make sense, as the error wouldn't appear until the "right" 
> URL came up.  See this link for a complete explanation.

Ah, that would explain why I don't see this behavior: one of the first 
changes I make in my installations is to remove regex-urlfilter and 
replace it with a suitable combination of prefix-urlfilter and 
suffix-urlfilter, or a custom filter. Of course, we should fix this 
issue in our code if possible, but switching to different URL filters 
is a quick workaround.
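
To illustrate why plain prefix/suffix matching sidesteps the problem, here 
is a minimal sketch in Java. It is not the code of the actual Nutch 
plugins: the class name PrefixSuffixUrlFilter, its constructor, and the 
example prefixes and suffixes are all hypothetical, and the convention of 
returning null for a rejected URL is an assumption made for this example. 
The point is only that startsWith/endsWith checks never backtrack, so no 
single URL can make them hang.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only; not the Nutch plugin API.
final class PrefixSuffixUrlFilter {
    private final List<String> allowedPrefixes;
    private final List<String> rejectedSuffixes;

    PrefixSuffixUrlFilter(List<String> allowedPrefixes, List<String> rejectedSuffixes) {
        this.allowedPrefixes = List.copyOf(allowedPrefixes);
        this.rejectedSuffixes = List.copyOf(rejectedSuffixes);
    }

    // Returns the URL unchanged if it passes, or null if it is rejected
    // (an assumed convention for this sketch).
    String filter(String url) {
        if (url == null) {
            return null;
        }
        String lower = url.toLowerCase(Locale.ROOT);

        // Reject by suffix first. Each check is a plain string comparison,
        // linear in the length of the URL, with no backtracking.
        for (String suffix : rejectedSuffixes) {
            if (lower.endsWith(suffix)) {
                return null;
            }
        }
        // Accept only URLs that start with one of the allowed prefixes.
        for (String prefix : allowedPrefixes) {
            if (lower.startsWith(prefix)) {
                return url;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        PrefixSuffixUrlFilter f = new PrefixSuffixUrlFilter(
                List.of("http://", "https://"),
                List.of(".gif", ".jpg", ".zip"));
        System.out.println(f.filter("http://example.com/index.html"));
        System.out.println(f.filter("http://example.com/logo.gif"));
        System.out.println(f.filter("ftp://example.com/file.txt"));
    }
}
```

Note that this sketch compares against the whole URL string, so a URL such 
as "http://example.com/logo.gif?x=1" would not be caught by the ".gif" 
suffix; a real filter would need to decide how to treat query strings.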

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com




_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
