On 8/9/07, Kai_testing Middleton <[EMAIL PROTECTED]> wrote:
> I get this error immediately upon running an intranet crawl.
>
> $ nohup time bin/nutch crawl /usr/tmp2/urls.txt -dir /usr/tmp2/100sites -threads 50 -depth 10 -topN 50001
> $ cat nohup.out
> crawl started in: /usr/tmp2/100sites
> rootUrlDir = /usr/tmp2/urls.txt
> threads = 50
> depth = 10
> topN = 50001
> Injector: starting
> Injector: crawlDb: /usr/tmp2/100sites/crawldb
> Injector: urlDir: /usr/tmp2/urls.txt
> Injector: Converting injected urls to crawl db entries.
> Exception in thread "main" java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
>         at org.apache.nutch.crawl.Injector.inject(Injector.java:166)
>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:115)
> 3.20 real  1.73 user  0.27 sys
>
> I have just installed a second instance of nutch on my server with the following steps:
> $ svn co http://svn.apache.org/repos/asf/lucene/nutch/trunk -r \{2007-08-08\}
> $ mv trunk nutch_trunk
> $ cd nutch_trunk
> $ ant clean
> $ ant
> set NUTCH_HOME to /usr/tmp2/nutch_trunk
> modify conf/nutch-site.xml
> modify conf/crawl-urlfilter.txt
> modify conf/log4j.properties
>
> In that last step I make sure to point this second instance of nutch to log to a different hadoop.log so I don't have a conflict with my first instance.
>
> The first instance is currently running a crawl, so I know it works. It's a nightly build of nutch: nutch-2007-06-27_06-52-44
>
> I've tried running two instances of nutch, both crawling at the same time, and succeeded. I did this on my laptop under cygwin using stock 0.9 installs (i.e. not nightly builds). I didn't even have to modify log4j.properties.
>
> I looked around on the forums for ideas. This search led me to some results: "Job failed" "JobClient.java:604"
>
> But none of the suggestions seemed appropriate. E.g., I'm using the latest nutch, and the hosts file shouldn't make a difference, since I have another instance on the same box already working.
>
> The crawl dies immediately (well, in 3.2 seconds to be exact). hadoop.log doesn't get created, nor does the directory /usr/tmp2/100sites.
That is really weird. I don't know why you are having this error, but here are a couple of things you may want to try:

1) Do an "ant clean; ant" and try again.
2) Check that your classpath is clean.
3) Run the inject command by itself: bin/nutch inject <crawldb> <urldir>

I don't know if any of this will help, though...

> --Kai Middleton

-- 
Doğacan Güney
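
As a concrete illustration of suggestion 3, using the crawldb and url paths from the log output quoted above, the standalone injector run might look like the following. Treat it as a sketch; the exact output will differ.

$ cd /usr/tmp2/nutch_trunk
$ bin/nutch inject /usr/tmp2/100sites/crawldb /usr/tmp2/urls.txt

If the injector fails the same way on its own, the stack trace it prints (and whether hadoop.log appears at all) should narrow the problem down. For suggestion 2, a quick look at the shell environment also helps to confirm no stray jars are being picked up:

$ echo $CLASSPATH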
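
The log4j change mentioned in the original post is not shown there. Assuming the stock conf/log4j.properties, where the DRFA appender writes to ${hadoop.log.dir}/${hadoop.log.file}, one way to give the second instance its own log file is to hard-code that property in its copy of the file (the logs path below is hypothetical, not taken from the post):

# conf/log4j.properties of the second instance only
log4j.appender.DRFA.File=/usr/tmp2/nutch_trunk/logs/hadoop.log

The first instance keeps its unmodified configuration, so the two crawls no longer write to the same hadoop.log.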
