I get the following error as soon as I start an intranet crawl:

$ nohup time bin/nutch crawl /usr/tmp2/urls.txt -dir /usr/tmp2/100sites \
    -threads 50 -depth 10 -topN 50001
$ cat nohup.out
crawl started in: /usr/tmp2/100sites
rootUrlDir = /usr/tmp2/urls.txt
threads = 50
depth = 10
topN = 50001
Injector: starting
Injector: crawlDb: /usr/tmp2/100sites/crawldb
Injector: urlDir: /usr/tmp2/urls.txt
Injector: Converting injected urls to crawl db entries.
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:166)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:115)
        3.20 real         1.73 user         0.27 sys

I have just installed a second instance of Nutch on my server using the
following steps:
$ svn co http://svn.apache.org/repos/asf/lucene/nutch/trunk -r \{2007-08-08\}
$ mv trunk nutch_trunk
$ cd nutch_trunk
$ ant clean
$ ant
set NUTCH_HOME to /usr/tmp2/nutch_trunk
modify conf/nutch-site.xml
modify conf/crawl-urlfilter.txt
modify conf/log4j.properties

In that last step, I point this second instance of Nutch at a different
hadoop.log so that it does not conflict with my first instance.
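A minimal sketch of that kind of change, assuming the stock Nutch conf/log4j.properties in which the `DRFA` rolling file appender writes to `${hadoop.log.dir}/${hadoop.log.file}`. The log path below is hypothetical, chosen only to sit inside this instance's own directory:

```properties
# Hypothetical path: send this second instance's log to its own file
# so it never writes to the first instance's hadoop.log.
# This replaces the existing log4j.appender.DRFA.File entry.
log4j.appender.DRFA.File=/usr/tmp2/nutch_trunk/logs/hadoop.log
```

Hard-coding the path in the appender, rather than editing the `hadoop.log.dir` or `hadoop.log.file` defaults, sidesteps the question of whether the launcher script passes those names as system properties, which log4j would let take precedence over values in the file.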

The first instance is currently running a crawl, so I know it works. It is a
Nutch nightly build:
nutch-2007-06-27_06-52-44

I have previously run two Nutch instances crawling at the same time without
problems. That was on my laptop under Cygwin, using stock 0.9 installs (i.e.,
not nightly builds), and I did not even have to modify log4j.properties.

I searched the forums for ideas. This query led me to some results:
"Job failed" "JobClient.java:604"

However, none of the suggestions seemed to apply. For example, I am using the
latest Nutch, and the hosts file should not matter, because another instance
on the same box is already working.

The crawl dies almost immediately (after 3.2 seconds, to be exact). Neither
hadoop.log nor the directory /usr/tmp2/100sites gets created.

--Kai Middleton