First, I want to say what a great find this is, and what great programming.
My company is thinking about getting the Google Mini, but I thought it would
be great if I could get Nutch running on our site. I have a few errors and
questions.

I got Java, Tomcat, and Nutch installed without any problems. I am not sure
which file I need to edit to set NUTCH_JAVA_HOME=/usr/bin/java. This is a
Fedora Core 2 box.
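A minimal sketch of one way to set this, assuming the 0.7 bin/nutch script reads NUTCH_JAVA_HOME (falling back to JAVA_HOME) and then runs $NUTCH_JAVA_HOME/bin/java. If that is how it works, the variable should name the Java installation directory rather than the java binary itself, and it can be exported from a shell profile such as ~/.bash_profile instead of editing a Nutch file. The helper name java_home_from_binary is hypothetical, and `readlink -f` assumes GNU coreutils, which Fedora ships.

```shell
# Hypothetical helper: resolve a java binary (following symlinks such as the
# Fedora "alternatives" links) and print the directory two levels up, i.e.
# the install root that contains bin/java. Returns 1 if nothing executable
# is found at the resolved path.
java_home_from_binary() {
  resolved=$(readlink -f "$1") || return 1
  [ -x "$resolved" ] || return 1
  dirname "$(dirname "$resolved")"
}

# Assumption: bin/nutch runs $NUTCH_JAVA_HOME/bin/java, so export the
# install root, not the path of the binary (e.g. not /usr/bin/java).
if java_bin=$(command -v java) && home=$(java_home_from_binary "$java_bin"); then
  NUTCH_JAVA_HOME=$home
  export NUTCH_JAVA_HOME
  echo "NUTCH_JAVA_HOME=$NUTCH_JAVA_HOME"
fi
```

Putting these lines in ~/.bash_profile (or sourcing them before running bin/nutch) would make the setting persist across logins.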

When I run  

bin/nutch crawl urls -dir crawl.test -depth 3 >& crawl.log 

the log looks like this:

060107 221039 parsing file:/root/nutch_binaries/nutch-0.7/conf/nutch-site.xml
060107 221039 No FS indicated, using default:local
060107 221039 crawl started in: crawl.test
060107 221039 rootUrlFile = urls
060107 221039 threads = 10
060107 221039 depth = 3
060107 221040 Created webdb at LocalFS,/root/nutch_binaries/nutch-0.7/crawl.test/db
Exception in thread "main" java.io.FileNotFoundException: urls (No such file or directory)
	at java.io.FileInputStream.open(Native Method)
	at java.io.FileInputStream.<init>(Unknown Source)
	at java.io.FileReader.<init>(Unknown Source)
	at org.apache.nutch.db.WebDBInjector.injectURLFile(WebDBInjector.java:372)
	at org.apache.nutch.db.WebDBInjector.main(WebDBInjector.java:535)
	at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:134)
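The exception says the injector could not open a file named `urls`, resolved relative to the working directory (apparently /root/nutch_binaries/nutch-0.7, judging by where crawl.test/db was created in the log). As far as I know, the 0.7 crawl command expects that argument to be a flat text file with one root URL per line. A minimal sketch of creating it, assuming that format; the helper write_seed_list is hypothetical, and the start URL is an assumption based on the site named in this post.

```shell
# Hypothetical helper: write a seed list, one root URL per line, to the
# file named by the first argument. Needs a file name and at least one URL.
write_seed_list() {
  [ "$#" -ge 2 ] || { echo "usage: write_seed_list FILE URL..." >&2; return 1; }
  out=$1
  shift
  printf '%s\n' "$@" > "$out"
}

# Run this from the Nutch install directory so the relative name "urls"
# is the same file the crawl command will open.
# Assumed start URL (the company site mentioned in this post).
write_seed_list urls http://www.mbproduction.com/

# Then re-run the crawl (redirection written in POSIX sh form):
# bin/nutch crawl urls -dir crawl.test -depth 3 > crawl.log 2>&1
```

The failed run already created crawl.test/db, so if the crawl tool refuses to reuse an existing directory, removing crawl.test or passing a new -dir may be needed. The 0.7 tutorial also mentions editing conf/crawl-urlfilter.txt so the URL filter admits your own domain.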

Also, when I go to my Tomcat site and run a search, I get an error. Can
someone help me?

http://www.mbproduction.com:8080/en/search.html
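One plausible factor: the crawl stopped during URL injection, so it probably never produced anything to search. As far as I know, the 0.7 search web app looks in its searcher.dir (default ".", i.e. relative to where Tomcat was started) for a merged "index" directory or a "segments" directory. A minimal sketch of checking for that before starting Tomcat, assuming that layout; the function name has_search_data is hypothetical.

```shell
# Hypothetical check: does a crawl directory contain data the search web
# app could open? Assumption: it looks for "index" or "segments" subdirs.
has_search_data() {
  dir=$1
  [ -d "$dir/index" ] || [ -d "$dir/segments" ]
}

crawl_dir=/root/nutch_binaries/nutch-0.7/crawl.test
if has_search_data "$crawl_dir"; then
  echo "found index data in $crawl_dir"
else
  echo "no index or segments in $crawl_dir; nothing to search yet" >&2
fi
```

If the data is there, starting Tomcat from inside the crawl directory, or setting searcher.dir to that directory in the web app's nutch-site.xml, should let the search page find it.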

Thanks,
Andy

Reply via email to