You can test the regex without running the fetcher:

  cat file-with-test-urls | nutch net/nutch/net/RegexURLFilter
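
If it helps, here is a rough Python sketch for sanity-checking the three rules from your mail offline. It is not Nutch code and does not call Nutch. It assumes the filter tries the rules top to bottom, takes the first rule whose pattern is found anywhere in the URL, and rejects a URL that no rule matches. The helper names, the example.com URLs and the trailing "+." catch-all rule are made up for illustration.

```python
import re

# The three exclusion rules as quoted in the original message.
# A leading '-' means "reject on match", a leading '+' means "accept on match".
RULES = [
    r"-http:\/\/.*\/.*\/.*\/.*\/.*",
    r"-.*\.\..*",
    r"-http:\/\/.*\/.*(print|friend|email|emailto|register|signin|login|logon"
    r"|signmein|menus|Print|Friend|Email|Emailto|Register|Signin|Login|Logon"
    r"|Signmein|Menus).*",
]


def parse_rules(lines):
    """Turn '+regex' / '-regex' lines into (accept, compiled_pattern) pairs."""
    compiled = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with '+' or '-': " + line)
        compiled.append((sign == "+", re.compile(pattern)))
    return compiled


def accepts(url, rules):
    """Return the verdict of the first matching rule (assumed semantics)."""
    for accept, pattern in rules:
        if pattern.search(url):  # partial match anywhere in the URL
            return accept
    return False  # assumption: no matching rule means the URL is rejected


# Hypothetical test URLs, one per rule plus one that should pass.
TEST_URLS = [
    "http://example.com/a/b/c/d/page.html",  # four or more path slashes
    "http://example.com/a/../b",             # contains ".."
    "http://example.com/login.php",          # contains a blocked keyword
    "http://example.com/news/story.html",    # matches none of the '-' rules
]

# "+." is a hypothetical catch-all added here so that something gets accepted.
rules = parse_rules(RULES + ["+."])
for url in TEST_URLS:
    print("accept" if accepts(url, rules) else "reject", url)
```

Under those assumptions the three patterns do reject the first three URLs, so the expressions themselves look fine. Note that rule order matters in this model: if a "+" rule that matches everything sits above the "-" lines, it wins and the exclusions are never reached.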
Good luck
[EMAIL PROTECTED] wrote:
Hi
Can anyone help me understand why URLs that (I think) match the following regex entries are still being fetched?
-http:\/\/.*\/.*\/.*\/.*\/.*
-.*\.\..*
-http:\/\/.*\/.*(print|friend|email|emailto|register|signin|login|logon|signmein|menus|Print|Friend|Email|Emailto|Register|Signin|Login|Logon|Signmein|Menus).*
Can anyone please help me?
Thanks
