Dear all,

I found the following tutorial on the web:

http://wiki.apache.org/nutch/NutchTutorial

It starts with a binary version of Nutch. Unfortunately, I couldn't find a binary version, only the source code on the web page. So I downloaded the latest version and compiled it with "ant". Everything seems to work, but I'm a bit confused about the paths and how to proceed.

Following the tutorial, I have to change some files, but they exist in more than one location:

 find . -iname regex-urlfilter.txt
./runtime/local/conf/regex-urlfilter.txt
./conf/regex-urlfilter.txt

The same goes for the "nutch" command: I'm not sure which one is the right one. When I execute /src/bin/nutch with the following parameters:

./nutch crawl /opt/crawls/ -dir /opt/crawls/ -depth 3 -topN 5

I get an error which, as I understand it, means that the script cannot find the jar files:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/nutch/crawl/Crawler
Caused by: java.lang.ClassNotFoundException: org.apache.nutch.crawl.Crawler
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Could not find the main class: org.apache.nutch.crawl.Crawler. Program will exit.


Any help would be nice ;-)

Best regards and thank you for the software!

Tom


--
Dr. Thomas Zastrow
Süsser Str. 5
72074 Tübingen

www.thomas-zastrow.de
