Sorry if the answer to this question should be obvious, but where in
the bin/nutch script do you need to add the following line to be able
to test your regex-urlfilter.txt file from the command line?

CLASSPATH=${CLASSPATH}:$NUTCH_HOME/plugins/urlfilter-regex/urlfilter-regex.jar
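
For anyone reading the archive, here is a rough sketch of where such a line could go. It is not the actual bin/nutch script. It assumes the launcher builds CLASSPATH from scratch and then appends jars in a loop, which would match the remark below that the script overrides the environment variable. The /opt/nutch default and the jar layout are placeholders for illustration. The only firm constraint is that the extra line has to come after the script's own first CLASSPATH assignment and before the line that starts java.

```shell
#!/bin/sh
# Hypothetical sketch of a bin/nutch-style launcher, NOT the real script.
# The default NUTCH_HOME is a placeholder for illustration only.
NUTCH_HOME="${NUTCH_HOME:-/opt/nutch}"

# The launcher assigns CLASSPATH from scratch here, so any value exported
# in the calling shell is discarded (assumed behaviour, per the thread).
CLASSPATH="${NUTCH_CONF_DIR:-$NUTCH_HOME/conf}"

# Append the core jar(s) and the bundled libraries (assumed layout).
for f in "$NUTCH_HOME"/nutch-*.jar "$NUTCH_HOME"/lib/*.jar; do
  if [ -e "$f" ]; then
    CLASSPATH="${CLASSPATH}:$f"
  fi
done

# The extra line from this thread: it must come after the assignments
# above and before the java invocation, otherwise it has no effect.
CLASSPATH="${CLASSPATH}:$NUTCH_HOME/plugins/urlfilter-regex/urlfilter-regex.jar"

# Print the result instead of launching java, so the sketch is safe to run.
echo "$CLASSPATH"
```

Running the sketch only prints the assembled classpath; in the real script the final step would be the java invocation that uses it.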



On 11/29/05, Thomas Delnoij <[EMAIL PROTECTED]> wrote:
> For the sake of the archives, I will answer my own question here: I had to
> add the following line to the bin/nutch script to be able to run
> org.apache.nutch.net.RegexURLFilter from the command line:
>
> CLASSPATH=${CLASSPATH}:$NUTCH_HOME/plugins/urlfilter-regex/urlfilter-regex.jar
>
> The nutch script overrides the CLASSPATH environment variable, so adding the
> jar to that variable in my shell didn't help.
>
> Rgrds, Thomas Delnoij
>
>
> On 10/5/05, Thomas Delnoij <[EMAIL PROTECTED]> wrote:
> >
> > All.
> >
> > The problem is actually a bit different. I was in a bit of a hurry when I
> > posted the previous message, apologies.
> >
> > I added both urlfilter-regex.jar and nutch-0.7.1.jar to my classpath.
> >
> > When I run java org.apache.nutch.net.RegexURLFilter, I am getting
> >
> > 051005 221040 parsing jar:file:/C:/Personal/vvdb/Nutch/nutch-0.7.1/nutch-0.7.1.jar!/nutch-default.xml
> > 051005 221040 parsing jar:file:/C:/Personal/vvdb/Nutch/nutch-0.7.1/nutch-0.7.1.jar!/nutch-site.xml
> > 051005 221040 Plugins: directory not found: plugins
> > Exception in thread "main" java.lang.ExceptionInInitializerError
> > Caused by: java.lang.NullPointerException
> >         at org.apache.nutch.net.RegexURLFilter.<clinit>(RegexURLFilter.java:64)
> >
> > When I run nutch org.apache.nutch.net.RegexURLFilter, I am getting
> >
> > Exception in thread "main" java.lang.NoClassDefFoundError:
> > org/apache/nutch/net/RegexURLFilter
> >
> > I know I am missing something obvious, but your help is really
> > appreciated.
> >
> > Kind regards, Thomas Delnoij
> >
> >
> > On 10/5/05, Thomas Delnoij <[EMAIL PROTECTED]> wrote:
> > >
> > > I was in a bit of a hurry when I posted this message, apologies.
> > >
> > > The problem is actually a bit different.
> > >
> > > I added both urlfilter-regex.jar and nutch-0.7.1.jar to my classpath.
> > >
> > > When I run java org.apache.nutch.net.RegexURLFilter,
> > >
> > > On 10/5/05, Thomas Delnoij <[EMAIL PROTECTED]> wrote:
> > > >
> > > > All.
> > > >
> > > > I want to run the RegexURLFilter's main() method for testing the
> > > > regex-urlfilter.txt.
> > > >
> > > > I set up NUTCH_HOME and NUTCH_CONF_DIR so I think I set up my
> > > > environment correctly.
> > > >
> > > > When I run nutch org.apache.nutch.net.RegexURLFilter I get Exception
> > > > in thread "main" java.lang.NoClassDefFoundError:
> > > > org/apache/nutch/net/RegexURLFilter.
> > > >
> > > > Assuming this was a classpath issue, I added
> > > > NUTCH_HOME/plugins/urlfilter-regex/urlfilter-regex.jar to my
> > > > classpath.
> > > >
> > > > This did not solve the problem, as I am still getting the
> > > > NoClassDefFoundError.
> > > >
> > > > So my first question is how to set up my environment correctly for
> > > > testing the regex-urlfilter.
> > > >
> > > > Secondly, I want to tune my regex-urlfilter for maximum relevancy of
> > > > the crawl result. By now, I have around 50 entries. My second question
> > > > is whether I should expect any performance impact from that many entries.
> > > >
> > > > Your help is greatly appreciated.
> > > >
> > > > Kind regards, Thomas Delnoij.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> >
>
>
