First, I want to say what a great find this is, and what great programming.
My company is thinking about getting the Google Mini, but I thought it would
be great if I could get Nutch going on our site instead. I have a few errors
and questions.
I got Java, Tomcat, and Nutch installed fine. I am not sure which
file I need to edit to set NUTCH_JAVA_HOME=/usr/bin/java. This is a
Fedora Core 2 box that I am working on.
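For reference, here is a minimal sketch of how this variable might be set. It assumes the bin/nutch launcher reads NUTCH_JAVA_HOME from the environment and expects a Java install directory rather than the path to the java executable itself. The helper name java_home_from_binary is hypothetical, and the result should be checked by hand:

```shell
#!/bin/sh
# Hypothetical helper: strip the trailing /bin/java from a java binary path
# to get the install directory (e.g. /opt/jdk/bin/java -> /opt/jdk).
java_home_from_binary() {
    dirname "$(dirname "$1")"
}

# Locate the java binary on PATH, if any, resolving symlinks where possible.
JAVA_BIN=$(command -v java 2>/dev/null || true)
if [ -n "$JAVA_BIN" ]; then
    JAVA_BIN=$(readlink -f "$JAVA_BIN" 2>/dev/null || echo "$JAVA_BIN")
    NUTCH_JAVA_HOME=$(java_home_from_binary "$JAVA_BIN")
    export NUTCH_JAVA_HOME
    echo "NUTCH_JAVA_HOME=$NUTCH_JAVA_HOME"
else
    echo "java not found on PATH; set NUTCH_JAVA_HOME by hand" >&2
fi
```

If that assumption holds, exporting the variable in the shell (or in a login profile) before running bin/nutch may be enough, with no Nutch file to edit.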
When I run:
bin/nutch crawl urls -dir crawl.test -depth 3 >& crawl.log
the log looks like this:
060107 221039 parsing file:/root/nutch_binaries/nutch-0.7/conf/nutch-site.xml
060107 221039 No FS indicated, using default:local
060107 221039 crawl started in: crawl.test
060107 221039 rootUrlFile = urls
060107 221039 threads = 10
060107 221039 depth = 3
060107 221040 Created webdb at LocalFS,/root/nutch_binaries/nutch-0.7/crawl.test/db
Exception in thread "main" java.io.FileNotFoundException: urls (No such file or directory)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(Unknown Source)
        at java.io.FileReader.<init>(Unknown Source)
        at org.apache.nutch.db.WebDBInjector.injectURLFile(WebDBInjector.java:372)
        at org.apache.nutch.db.WebDBInjector.main(WebDBInjector.java:535)
        at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:134)
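The FileNotFoundException suggests the crawl command looks for a file named urls relative to the directory it is run from. Below is a minimal sketch of creating such a seed file before crawling, assuming one URL per line is expected. The NUTCH_DIR variable and the seed URL are assumptions for illustration, not something taken from the log:

```shell
#!/bin/sh
# Assumption: NUTCH_DIR is the directory bin/nutch is run from
# (defaults to the current directory).
NUTCH_DIR="${NUTCH_DIR:-.}"

# Assumption: the seed list is a plain text file with one URL per line.
printf '%s\n' "http://www.mbproduction.com/" > "$NUTCH_DIR/urls"

# Confirm the file exists and is non-empty before starting the crawl.
if [ -s "$NUTCH_DIR/urls" ]; then
    echo "seed file ready: $NUTCH_DIR/urls"
else
    echo "seed file missing or empty" >&2
fi
```

Rerunning the same crawl command from that directory would then show whether the missing file was the only problem.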
Also, if I go to my Tomcat site and run a search, I get an error. Can
someone help me?
http://www.mbproduction.com:8080/en/search.html
Thanks,
Andy