Dear all,
I found the following tutorial on the web:
http://wiki.apache.org/nutch/NutchTutorial
It starts with a binary version of Nutch. Unfortunately, I couldn't
find a binary version, only the source code on the web page. So I
downloaded the latest version and compiled it with "ant". The build
seems to work, but I'm a little confused about the paths and how I
should proceed.
Following the tutorial, I have to change some files, but they exist in
several copies:
find . -iname regex-urlfilter.txt
./runtime/local/conf/regex-urlfilter.txt
./conf/regex-urlfilter.txt
The same goes for the "nutch" command: I'm not sure which one is the
right one. When I execute /src/bin/nutch with the following parameters:
./nutch crawl /opt/crawls/ -dir /opt/crawls/ -depth 3 -topN 5
I get an error which, as far as I understand, means that the script cannot find the jar files:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/nutch/crawl/Crawler
Caused by: java.lang.ClassNotFoundException: org.apache.nutch.crawl.Crawler
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Could not find the main class: org.apache.nutch.crawl.Crawler. Program will exit.
Any help would be appreciated ;-)
Best regards and thank you for the software!
Tom
--
Dr. Thomas Zastrow
Süsser Str. 5
72074 Tübingen
www.thomas-zastrow.de