All,

I want to run RegexURLFilter's main() method to test my regex-urlfilter.txt.
I have set NUTCH_HOME and NUTCH_CONF_DIR, so I believe my environment is configured correctly. When I run

    nutch org.apache.nutch.net.RegexURLFilter

I get:

    Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/nutch/net/RegexURLFilter

Assuming this was a classpath issue, I added NUTCH_HOME/plugins/urlfilter-regex/urlfilter-regex.jar to my classpath. That did not solve the problem; I still get the NoClassDefFoundError.

So my first question is: how do I set up my environment correctly for testing the regex-urlfilter?

Secondly, I want to tune my regex-urlfilter for maximum relevancy of the crawl result. So far I have around 50 entries. My second question is: should I expect any performance impact from that many entries?

Your help is greatly appreciated.

Kind regards,
Thomas Delnoij
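
P.S. To make the second question concrete, here is a minimal standalone sketch of how I understand the rules to be evaluated. It is not Nutch code: the class RegexRuleSketch and its parse and accepts methods are hypothetical names of my own, and the sample rules and URLs are made up. I am assuming that each non-comment line starts with + (accept) or - (reject) followed by a regex, that rules are tried top to bottom, that the first rule whose pattern is found anywhere in the URL decides, and that a URL matching no rule is rejected. If that is right, the cost per URL would grow at most linearly with the number of entries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical standalone sketch; it does not use any Nutch classes.
class RegexRuleSketch {

    // One rule: whether a match means accept (+) or reject (-), plus the compiled regex.
    static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    // Parse rule lines, skipping blank lines and lines starting with '#'.
    static List<Rule> parse(List<String> lines) {
        List<Rule> rules = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            // Patterns are compiled once here, not once per URL.
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
        return rules;
    }

    // Assumed semantics: the first rule found in the URL decides; no match means reject.
    static boolean accepts(List<Rule> rules, String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Made-up sample rules in the assumed file format.
        List<Rule> rules = parse(Arrays.asList(
            "# skip common image suffixes",
            "-\\.(gif|jpg|png)$",
            "+^http://([a-z0-9]*\\.)*example\\.com/",
            "-."));

        String[] urls = {
            "http://www.example.com/index.html",
            "http://www.example.com/logo.gif",
            "http://other.org/"
        };
        for (String url : urls) {
            System.out.println((accepts(rules, url) ? "accept " : "reject ") + url);
        }
    }
}
```

Under those assumptions, putting the rules that match most often near the top would keep the average number of regex evaluations per URL low, but I would like to confirm this against the real filter.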
