Hi All,

This is a note on entering regex strings in the crawl URL filter
file (crawl-urlfilter.txt).

Make sure that you enter the exclusion "-" strings before the
inclusion "+" strings.

RegexURLFilter tests the patterns from top to bottom, and the first
pattern that matches decides whether the URL is included or excluded.
So if an inclusion pattern comes first, any more specific exclusion
pattern that follows it never takes effect.

For example, if you have entries like these:

+^http://xyz.com/doc
-^http://xyz.com/doc/new

then the 'new' exclusion will never take effect, because every URL
under /doc/new already matches the 'doc' inclusion above it. Putting
the exclusion line first fixes this.
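
To make the ordering effect concrete, here is a minimal sketch in Python
of a first-match-wins filter. It is not the RegexURLFilter code itself:
the function name first_match_filter and the sample URL are made up for
illustration, and it assumes that a pattern may match anywhere in the
URL and that a URL matching no rule is rejected.

```python
import re

def first_match_filter(rules, url):
    """Apply '+regex' / '-regex' rules top to bottom; the first match decides."""
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if re.search(pattern, url):
            return sign == "+"   # '+' accepts the URL, '-' rejects it
    return False                 # assumption: no matching rule means rejected

# Inclusion first: the exclusion below it is never reached.
wrong_order = ["+^http://xyz.com/doc", "-^http://xyz.com/doc/new"]
# Exclusion first: the more specific rule gets its chance.
right_order = ["-^http://xyz.com/doc/new", "+^http://xyz.com/doc"]

url = "http://xyz.com/doc/new/page.html"   # hypothetical URL
accepted_wrong = first_match_filter(wrong_order, url)
accepted_right = first_match_filter(right_order, url)
print(accepted_wrong, accepted_right)
```

With the inclusion first the URL is accepted, and with the exclusion
first it is rejected, while other URLs under /doc are still accepted.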

Regards,
Ravi Chintakunta
