Hi All,

This is a note on entering the regex strings in the Crawl URL Filter
text (crawl-urlfilter.txt) file.

Make sure that you enter the exclusion "-" strings before the
inclusion "+" strings.

RegexURLFilter checks the regex patterns from top to bottom, and the
first pattern that matches a URL decides whether it is included or
excluded. So if an inclusion pattern comes first, any exclusion
pattern after it that covers the same URLs will never take effect.

For example, if you have entries like these:

+^http://xyz.com/doc
-^http://xyz.com/doc/new

then the 'new' exclusion will never take effect, because the 'doc'
inclusion matches those URLs first.
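
To make the ordering effect concrete, here is a small Python sketch of
the first-match rule. It is not the Nutch code: the accepts() helper and
the rule lists are made up for illustration, and I am assuming that a
URL matching no pattern at all is rejected.

```python
import re

def accepts(rules, url):
    """Return True if the first rule that matches url is an inclusion ('+')."""
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        # Patterns are tried top to bottom; the first hit decides.
        if re.search(pattern, url):
            return sign == "+"
    # Assumption: a URL that matches no rule is rejected.
    return False

# Inclusion first: the exclusion below it is never reached.
wrong_order = ["+^http://xyz.com/doc", "-^http://xyz.com/doc/new"]
# Exclusion first: 'doc/new' is filtered out, the rest of 'doc' is kept.
right_order = ["-^http://xyz.com/doc/new", "+^http://xyz.com/doc"]

print(accepts(wrong_order, "http://xyz.com/doc/new/a.html"))  # True
print(accepts(right_order, "http://xyz.com/doc/new/a.html"))  # False
print(accepts(right_order, "http://xyz.com/doc/old/a.html"))  # True
```

With the exclusion listed first, URLs under doc/new are dropped while
the rest of doc is still crawled.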

Regards,
Ravi Chintakunta


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
