For the sake of the archives, I will answer my own question here: I had to
add the following line to the bin/nutch script to be able to run
org.apache.nutch.net.RegexURLFilter from the command line:
CLASSPATH=${CLASSPATH}:$NUTCH_HOME/plugins/urlfilter-regex/urlfilter-regex.jar
The nutch script overwrites the CLASSPATH environment variable, so adding the
jar to my own CLASSPATH didn't help.
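A minimal sketch of why this happens. It assumes a simplified launcher and is not the actual bin/nutch; the default NUTCH_HOME path is hypothetical. The script assigns CLASSPATH from scratch, so a value exported beforehand is discarded, and only jars appended inside the script reach the JVM.

```shell
#!/bin/sh
# Hypothetical, simplified stand-in for the classpath part of bin/nutch.

# Assumed install location; normally NUTCH_HOME is set in the environment.
NUTCH_HOME=${NUTCH_HOME:-/opt/nutch-0.7.1}

# The launcher starts CLASSPATH over, discarding any value the caller exported.
CLASSPATH=$NUTCH_HOME/conf
CLASSPATH=${CLASSPATH}:$NUTCH_HOME/nutch-0.7.1.jar

# The fix: append the plugin jar inside the script itself.
CLASSPATH=${CLASSPATH}:$NUTCH_HOME/plugins/urlfilter-regex/urlfilter-regex.jar

echo "$CLASSPATH"
```

Running it as `CLASSPATH=/some/other.jar sh sketch.sh` still prints a path that starts with `$NUTCH_HOME/conf`, because the first assignment replaces the inherited value.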
Regards, Thomas Delnoij
On 10/5/05, Thomas Delnoij <[EMAIL PROTECTED]> wrote:
>
> All,
>
> The problem is actually a bit different. I was in a bit of a hurry when I
> posted the previous message; apologies.
>
> I added both urlfilter-regex.jar and nutch-0.7.1.jar to my classpath.
>
> When I run java org.apache.nutch.net.RegexURLFilter, I get:
>
> 051005 221040 parsing jar:file:/C:/Personal/vvdb/Nutch/nutch-0.7.1/nutch-0.7.1.jar!/nutch-default.xml
> 051005 221040 parsing jar:file:/C:/Personal/vvdb/Nutch/nutch-0.7.1/nutch-0.7.1.jar!/nutch-site.xml
> 051005 221040 Plugins: directory not found: plugins
> Exception in thread "main" java.lang.ExceptionInInitializerError
> Caused by: java.lang.NullPointerException
> at org.apache.nutch.net.RegexURLFilter.<clinit>(RegexURLFilter.java:64)
>
> When I run nutch org.apache.nutch.net.RegexURLFilter, I get:
>
> Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/nutch/net/RegexURLFilter
>
> I know I am missing something obvious, but your help is really
> appreciated.
>
> Kind regards, Thomas Delnoij
>
>
> On 10/5/05, Thomas Delnoij <[EMAIL PROTECTED]> wrote:
> >
> > I was in a bit of a hurry when I posted this message; apologies.
> >
> > The problem is actually a bit different.
> >
> > I added both urlfilter-regex.jar and nutch-0.7.1.jar to my classpath.
> >
> > When I run java org.apache.nutch.net.RegexURLFilter,
> >
> > On 10/5/05, Thomas Delnoij <[EMAIL PROTECTED]> wrote:
> > >
> > > All,
> > >
> > > I want to run the RegexURLFilter's main() method to test my
> > > regex-urlfilter.txt.
> > >
> > > I have set NUTCH_HOME and NUTCH_CONF_DIR, so I believe my
> > > environment is set up correctly.
> > >
> > > When I run nutch org.apache.nutch.net.RegexURLFilter, I get:
> > > Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/nutch/net/RegexURLFilter
> > >
> > > Assuming this was a classpath issue, I added
> > > NUTCH_HOME/plugins/urlfilter-regex/urlfilter-regex.jar to my
> > > classpath.
> > >
> > > This did not solve the problem, as I am still getting the
> > > NoClassDefFoundError.
> > >
> > > So my first question is how to set up my environment correctly for
> > > testing the regex-urlfilter.
> > >
> > > Second, I want to tune my regex-urlfilter for maximum relevance of
> > > the crawl results. I now have around 50 entries. My second question is:
> > > can I expect any performance impact?
> > >
> > > Your help is greatly appreciated.
> > >
> > > Kind regards, Thomas Delnoij.