Why not use urlfilter-automaton instead? It is much faster than the regex
one.

On 3 August 2010 13:19, Torsten Krah <[email protected]> wrote:

> On Monday, 2 August 2010, at 20:14:32, brad wrote:
> > I do have about 10
> > entries in the regex-urlfilter.txt file, but they are mainly to exclude
> > sites. For example:
>
> I have this problem too with 1.1: Nutch often hangs at util.regexp...
> forever.
> It hangs if I just use (in the regexfilter property files) something like:
>
> http://www.mydomain.local/
>
> If I change this to:
>
> http://www\.mydomain\.local/
>
> it works. I have no clue why I have to escape the "." to make it a literal
> period, since an unescaped "." should match the period too. However, for me
> it solved this annoying hang in the java.util pattern matching. Maybe you
> can give this a try; maybe it helps, maybe not :-).
>
> You can find out which regex Nutch hangs on if you override the extension
> point or the plugin code and add a debugging line just before the match
> call. Then you can look for another regex that matches and does not hang
> ;-).
>
> Torsten
>
>
> --
> Please do not send me Word or PowerPoint attachments.
> See http://www.gnu.org/philosophy/no-word-attachments.de.html
>
> "Really, I'm not out to destroy Microsoft. That will just be a
> completely unintentional side effect."
>        -- Linus Torvalds
>
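
For anyone who wants to try the two suggestions quoted above, here is a
minimal Java sketch. It is not Nutch code. It assumes the regex filter applies
each rule with Matcher.find() (an assumption about the plugin, so check your
version), and the class RegexFilterDebug with its methods found and timeRules
are hypothetical names made up for illustration. It shows that the unescaped
pattern matches more than the literal host name, and it prints each regex just
before the match call so that a hang points at the last rule printed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Nutch.
public class RegexFilterDebug {

    /** Outcome of trying one regex against one URL. */
    public static final class Timing {
        public final String regex;
        public final boolean matched;
        public final long nanos;

        Timing(String regex, boolean matched, long nanos) {
            this.regex = regex;
            this.matched = matched;
            this.nanos = nanos;
        }
    }

    /** True if the regex is found anywhere in the URL (find() semantics, assumed). */
    public static boolean found(String regex, String url) {
        return Pattern.compile(regex).matcher(url).find();
    }

    /** Logs each regex before matching it, then records whether it matched and how long it took. */
    public static List<Timing> timeRules(List<String> regexes, String url) {
        List<Timing> results = new ArrayList<>();
        for (String regex : regexes) {
            Pattern pattern = Pattern.compile(regex);
            // Debug line before the match call: if the process stalls,
            // the last line printed here names the suspect regex.
            System.err.println("about to match: " + regex + " against " + url);
            long start = System.nanoTime();
            boolean matched = pattern.matcher(url).find();
            long elapsed = System.nanoTime() - start;
            results.add(new Timing(regex, matched, elapsed));
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> rules = Arrays.asList(
                "http://www.mydomain.local/",      // "." matches any character
                "http://www\\.mydomain\\.local/"); // "\." matches only a literal period
        for (String url : Arrays.asList(
                "http://www.mydomain.local/",
                "http://wwwXmydomainYlocal/")) {
            for (Timing t : timeRules(rules, url)) {
                System.out.println(url + "  " + t.regex
                        + "  matched=" + t.matched + "  nanos=" + t.nanos);
            }
        }
    }
}
```

Running main prints one line per rule and URL with the elapsed time. If the
process stalls, the last "about to match" line on stderr names the regex to
look at.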



-- 
DigitalPebble Ltd

Open Source Solutions for Text Engineering
http://www.digitalpebble.com
