Sorry if the answer to this question should be obvious, but where in
the bin/nutch script do you need to add the following line to be able
to test your regex-urlfilter.txt file from the command line?
CLASSPATH=${CLASSPATH}:$NUTCH_HOME/plugins/urlfilter-regex/urlfilter-regex.jar
On 11/29/05, Thomas Delnoij <[EMAIL PROTECTED]> wrote:
> For the sake of the archives, I will answer my own question here: I had to
> add the following line to the bin/nutch script to be able to run
> org.apache.nutch.net.RegexURLFilter from the command line:
>
> CLASSPATH=${CLASSPATH}:$NUTCH_HOME/plugins/urlfilter-regex/urlfilter-regex.jar
>
> The nutch script overrides the classpath environment variable, so adding the
> jar there didn't help.
>
> Rgrds, Thomas Delnoij
>
>
> On 10/5/05, Thomas Delnoij <[EMAIL PROTECTED]> wrote:
> >
> > All.
> >
> > The problem is actually a bit different. I was a bit in a hurry when I
> > posted the previous message, apologies.
> >
> > I added both urlfilter-regex.jar and nutch-0.7.1.jar to my classpath.
> >
> > When I run java org.apache.nutch.net.RegexURLFilter, I am getting
> >
> > 051005 221040 parsing jar:file:/C:/Personal/vvdb/Nutch/nutch-0.7.1/nutch-0.7.1.jar!/nutch-default.xml
> > 051005 221040 parsing jar:file:/C:/Personal/vvdb/Nutch/nutch-0.7.1/nutch-0.7.1.jar!/nutch-site.xml
> > 051005 221040 Plugins: directory not found: plugins
> > Exception in thread "main" java.lang.ExceptionInInitializerError
> > Caused by: java.lang.NullPointerException
> > at org.apache.nutch.net.RegexURLFilter.<clinit>(RegexURLFilter.java:64)
> >
> > When I run nutch org.apache.nutch.net.RegexURLFilter, I am getting
> >
> > Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/nutch/net/RegexURLFilter
> >
> > I know I am missing something obvious, but your help is really
> > appreciated.
> >
> > Kind regards, Thomas Delnoij
> >
> >
> > On 10/5/05, Thomas Delnoij <[EMAIL PROTECTED]> wrote:
> > >
> > > I was a bit in a hurry when I posted this message, apologies.
> > >
> > > The problem is actually a bit different.
> > >
> > > I added both urlfilter-regex.jar and nutch-0.7.1.jar to my classpath.
> > >
> > > When I run java org.apache.nutch.net.RegexURLFilter,
> > >
> > > On 10/5/05, Thomas Delnoij <[EMAIL PROTECTED]> wrote:
> > > >
> > > > All.
> > > >
> > > > I want to run the RegexURLFilter's main() method for testing the
> > > > regex-urlfilter.txt.
> > > >
> > > > I set up NUTCH_HOME and NUTCH_CONF_DIR so I think I set up my
> > > > environment correctly.
> > > >
> > > > When I run nutch org.apache.nutch.net.RegexURLFilter I get Exception
> > > > in thread "main" java.lang.NoClassDefFoundError:
> > > > org/apache/nutch/net/RegexURLFilter.
> > > >
> > > > Assuming this was a classpath issue, I added
> > > > NUTCH_HOME/plugins/urlfilter-regex/urlfilter-regex.jar to my
> > > > classpath.
> > > >
> > > > This did not solve the problem, as I am still getting the
> > > > NoClassDefFoundError.
> > > >
> > > > So my first question is how to set up my environment correctly for
> > > > testing the regex-urlfilter.
> > > >
> > > > Secondly, I want to tune my regex-urlfilter for maximum relevancy of
> > > > the crawl result. By now, I have around 50 entries. My second question
> > > > is whether I can expect any performance impact.
> > > >
> > > > Your help is greatly appreciated.
> > > >
> > > > Kind regards, Thomas Delnoij.
> > > >
> > >
> >
>
>
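On the placement question: going only by the explanation quoted above (the script overrides the CLASSPATH environment variable), the extra line presumably has to come after the point where the script first assigns CLASSPATH and before the point where it invokes java. The following is a minimal sketch of that constraint, not the real bin/nutch. The function name build_classpath, the default NUTCH_HOME path, and the two stand-in assignments before the added line are all assumptions made for the demo.

```shell
#!/bin/sh
# Illustrative stand-in for a launcher script; this is NOT the real bin/nutch.
# The default NUTCH_HOME below is a hypothetical path used only for the demo.
NUTCH_HOME=${NUTCH_HOME:-/opt/nutch-0.7.1}

build_classpath() {
  # Plain assignment: anything CLASSPATH held in the caller's
  # environment is discarded at this point.
  CLASSPATH=$NUTCH_HOME/conf

  # Stand-in for whatever jars the launcher adds on its own (assumed).
  CLASSPATH=${CLASSPATH}:$NUTCH_HOME/nutch-0.7.1.jar

  # The extra line from this thread. It appends, so it only takes effect
  # if it runs after the first assignment and before java is started.
  CLASSPATH=${CLASSPATH}:$NUTCH_HOME/plugins/urlfilter-regex/urlfilter-regex.jar

  echo "$CLASSPATH"
}

# Simulate a user who set CLASSPATH before calling the launcher.
CLASSPATH=/somewhere/else/urlfilter-regex.jar
RESULT=$(build_classpath)
echo "$RESULT"
```

In this sketch the value set by the user never reaches the final classpath, while the line added inside the function does. That matches the behaviour reported above, where adding the jar to the environment made no difference.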
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general