All,

I want to run the RegexURLFilter's main() method for testing the
regex-urlfilter.txt.

I have set NUTCH_HOME and NUTCH_CONF_DIR, so I believe my environment is
configured correctly.

When I run

    nutch org.apache.nutch.net.RegexURLFilter

I get

    Exception in thread "main" java.lang.NoClassDefFoundError:
    org/apache/nutch/net/RegexURLFilter

Assuming this was a classpath issue, I added
NUTCH_HOME/plugins/urlfilter-regex/urlfilter-regex.jar to my classpath.

This did not solve the problem; I still get the same
NoClassDefFoundError.

So my first question is: how do I set up my environment correctly to test
the regex-urlfilter?

Secondly, I want to tune my regex-urlfilter to maximize the relevance of
the crawl results. It currently has around 50 entries. My second question
is: should I expect a performance impact from a rule list of this size?
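To make the second question concrete, here is a minimal standalone sketch of the kind of check I have in mind. It is not Nutch code: the class UrlFilterSketch, its methods, and the sample rules are all hypothetical. It assumes the usual regex-urlfilter.txt conventions ('#' starts a comment, each rule is '+' or '-' followed by a regex, the first matching rule decides) and assumes a rule matches if the pattern is found anywhere in the URL. With roughly 50 rules, each URL costs at most that many regex tests.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Standalone sketch, NOT Nutch code: all names here are hypothetical.
public class UrlFilterSketch {

    /** One rule line: '+' accepts, '-' rejects, the rest is the regex. */
    static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    // Hypothetical sample rules in the +/- style of regex-urlfilter.txt.
    static final List<String> SAMPLE_RULES = Arrays.asList(
        "# skip file:, ftp: and mailto: urls",
        "-^(file|ftp|mailto):",
        "-\\.(gif|jpg|png|css|zip)$",
        "-[?*!@=]",
        "+^http://([a-z0-9]*\\.)*example\\.com/",
        "-.");

    /** Compiles each rule once; blank lines and '#' comments are skipped. */
    static List<Rule> parse(List<String> lines) {
        List<Rule> rules = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Bad rule: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
        return rules;
    }

    /** First matching rule decides; a URL that matches no rule is rejected. */
    static boolean accepts(List<Rule> rules, String url) {
        for (Rule rule : rules) {
            // Assumption: "match" means the pattern occurs anywhere in the URL.
            if (rule.pattern.matcher(url).find()) {
                return rule.accept;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Rule> rules = parse(SAMPLE_RULES);
        String[] urls = {
            "http://www.example.com/index.html",
            "http://www.example.com/logo.gif",
            "http://www.example.com/page?id=1",
            "ftp://example.com/file",
            "http://other.org/"
        };
        for (String url : urls) {
            System.out.println((accepts(rules, url) ? "accept " : "reject ") + url);
        }

        // Crude timing loop: a rough indication only, not a proper benchmark.
        int rounds = 100_000;
        int accepted = 0;
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            for (String url : urls) {
                if (accepts(rules, url)) {
                    accepted++;
                }
            }
        }
        long elapsed = System.nanoTime() - start;
        long checks = (long) rounds * urls.length;
        System.out.println(checks + " checks (" + accepted + " accepted), about "
            + (elapsed / checks) + " ns per URL");
    }
}
```

Replacing SAMPLE_RULES with the lines of my real file would give a rough per-URL cost for the 50 entries, though the numbers would only reflect java.util.regex and not whatever the plugin does internally.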

Your help is greatly appreciated.

Kind regards, Thomas Delnoij.
