Hello,

A quick question!

I am crawling different sources using the "crawl" command. As you know,
I can define my crawl space by editing the series of regular expressions
in crawl-urlfilter.txt. Based on my tests, I concluded that this file
is actually used by the "urlfilter-regex" plugin. However, in my
nutch-default.xml file, this plugin is configured to read its rules
from regex-urlfilter.txt.
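
For reference, the default entry I mean looks roughly like this (taken
from a stock Nutch configuration, so the surrounding description text
may differ between versions):

    <property>
      <name>urlfilter.regex.file</name>
      <value>regex-urlfilter.txt</value>
    </property>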

Am I right in saying that crawl-urlfilter.txt overrides
regex-urlfilter.txt in the same way that nutch-site.xml overrides
nutch-default.xml?

But then, what happens if I use the urlfilter-prefix plugin? Are the
regexes in my crawl-urlfilter.txt file still used?

Thank you,

David

