Why not use urlfilter-automaton instead? It is much faster than the regex one.
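
If you want to find the offending rule first, the debugging idea from the quoted message below can be tried outside Nutch. This is only a rough sketch in plain java.util.regex, not the plugin code: the class name RegexTimer and its methods are made up for illustration, and it assumes each rule is applied to the URL with Matcher.find(). It prints the rule before matching, so if a rule hangs, the last line printed names it. It also shows that the unescaped "." matches any character, not only a period.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch.
class RegexTimer {

    /** Returns true if the rule is found anywhere in the URL. */
    public static boolean matches(String rule, String url) {
        return Pattern.compile(rule).matcher(url).find();
    }

    /**
     * Applies every rule to one URL and records the elapsed nanoseconds.
     * The rule is logged BEFORE matching, so a rule that never returns
     * is the last one shown on stderr.
     */
    public static Map<String, Long> timeRules(List<String> rules, String url) {
        Map<String, Long> elapsed = new LinkedHashMap<>();
        for (String rule : rules) {
            System.err.println("matching rule: " + rule + " against " + url);
            Pattern pattern = Pattern.compile(rule);
            long start = System.nanoTime();
            pattern.matcher(url).find();
            elapsed.put(rule, System.nanoTime() - start);
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // The two rules from the quoted message (backslashes doubled for Java).
        List<String> rules = Arrays.asList(
                "http://www.mydomain.local/",
                "http://www\\.mydomain\\.local/");

        // "X" stands where a period is expected: only the unescaped rule matches.
        String url = "http://wwwXmydomain.local/index.html";
        for (String rule : rules) {
            System.out.println(rule + " matches: " + matches(rule, url));
        }

        for (Map.Entry<String, Long> e : timeRules(rules, url).entrySet()) {
            System.out.println(e.getKey() + " took " + e.getValue() + " ns");
        }
    }
}
```

Feeding it the rules from your regex-urlfilter.txt (without the leading + or -) and one of the URLs the fetcher was working on should show which pattern is the slow one.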

On 3 August 2010 13:19, Torsten Krah <[email protected]> wrote:

> On Monday, 2 August 2010, at 20:14:32, brad wrote:
> > I do have about 10 entries in the regex-urlfilter.txt file, but they
> > are mainly to exclude sites. For example:
>
> I've got this problem too with 1.1: Nutch often hangs at util.regexp...
> forever.
> It hangs if I just use (in the regex filter property files) something like:
>
> http://www.mydomain.local/
>
> If I change this to:
>
> http://www\.mydomain\.local/
>
> it works. I have no clue why I have to escape the "." to be a period, as
> "." should match the period too. However, for me it solved this annoying
> hang at java util pattern matching. Maybe you can give this a try; maybe
> it helps, maybe not :-).
>
> You can get more information on which regex Nutch hangs on if you
> override the extension point or the plugin code and add a debugging line
> just before the match call, and find some other regex which does match
> and does not hang ;-).
>
> Torsten
>
> --
> Please do not send me Word or PowerPoint attachments.
> See http://www.gnu.org/philosophy/no-word-attachments.de.html
>
> "Really, I'm not out to destroy Microsoft. That will just be a
> completely unintentional side effect."
> -- Linus Torvalds

--
DigitalPebble Ltd
Open Source Solutions for Text Engineering
http://www.digitalpebble.com

