On 02.11.2010 15:43, Erlend Garåsen wrote:
> Since it defaults to regex-urlfilter.txt, I removed "urlfilter-regex"
> from "plugin.includes", so now it is just:
> <value>protocol-httpclient|parse-(text|html|tika)|index-(basic|more)|query-(basic|site|url|lang)</value>

I think there is a misunderstanding. You need urlfilter-regex in plugin.includes in nutch-site.xml. Don't remove it if you want the filter to be applied. "It defaults to regex-urlfilter.txt" only means that, when this plugin is included, its patterns are taken from that file. But you have not included the plugin, so the patterns in that file are not applied.
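
For illustration, this is roughly how the property could look in nutch-site.xml with the plugin added back. It is only a sketch: the value is your own list from above with urlfilter-regex inserted, and the position of the entry within the list is my choice, not something Nutch requires as far as I know.

```xml
<!-- nutch-site.xml (fragment): sketch only, based on the value quoted above -->
<property>
  <name>plugin.includes</name>
  <!-- urlfilter-regex added back so that the patterns in regex-urlfilter.txt are applied -->
  <value>protocol-httpclient|urlfilter-regex|parse-(text|html|tika)|index-(basic|more)|query-(basic|site|url|lang)</value>
</property>
```

With the plugin listed, the patterns should again be read from the default file, regex-urlfilter.txt.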

