On 8/9/07, Kai_testing Middleton <[EMAIL PROTECTED]> wrote:
> I get this error immediately upon running an intranet crawl.
>
> $ nohup time bin/nutch crawl /usr/tmp2/urls.txt -dir /usr/tmp2/100sites 
> -threads 50 -depth 10 -topN 50001
> $ cat nohup.out
> crawl started in: /usr/tmp2/100sites
> rootUrlDir = /usr/tmp2/urls.txt
> threads = 50
> depth = 10
> topN = 50001
> Injector: starting
> Injector: crawlDb: /usr/tmp2/100sites/crawldb
> Injector: urlDir: /usr/tmp2/urls.txt
> Injector: Converting injected urls to crawl db entries.
> Exception in thread "main" java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
>         at org.apache.nutch.crawl.Injector.inject(Injector.java:166)
>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:115)
>         3.20 real         1.73 user         0.27 sys
>
> I have just installed a second instance of nutch on my server with the 
> following steps:
> $ svn co http://svn.apache.org/repos/asf/lucene/nutch/trunk -r \{2007-08-08\}
> $ mv trunk nutch_trunk
> $ cd nutch_trunk
> $ ant clean
> $ ant
> set NUTCH_HOME to /usr/tmp2/nutch_trunk
> modify conf/nutch-site.xml
> modify conf/crawl-urlfilter.txt
> modify conf/log4j.properties
>
> In that last step I make sure to point this second instance of nutch to log 
> to a different hadoop.log so I don't have a conflict with my first instance.
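
(For reference, if I remember the stock conf/log4j.properties correctly, that
redirect is a one-line change, assuming the default DRFA appender; the target
path below is only an example:

  log4j.appender.DRFA.File=/usr/tmp2/nutch_trunk/logs/hadoop.log

Out of the box it resolves ${hadoop.log.dir}/${hadoop.log.file}, which the
bin/nutch script points at $NUTCH_HOME/logs/hadoop.log.)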
>
> The first instance is currently running a crawl so I know it works.  It's a 
> nightly build of nutch:
> nutch-2007-06-27_06-52-44
>
> I have successfully run two instances of nutch before, both crawling at the
> same time.  I did that on my laptop under cygwin using stock 0.9 installs
> (i.e. not nightly builds), and I didn't even have to modify log4j.properties.
>
> I looked around on the forums for ideas.  This search led me to some results:
> "Job failed" "JobClient.java:604"
>
> But none of the suggestions seemed appropriate.  E.g., I'm using the latest 
> nutch, and the hosts file shouldn't make a difference as I have another 
> instance on the same box already working.
>
> The crawl dies immediately (well, in 3.2 seconds to be exact).  hadoop.log 
> doesn't get created, nor does the directory /usr/tmp2/100sites.

That is really weird. I don't know why you are having this error, but here
are a few things you may want to try:

1) Do an "ant clean; ant" and try the crawl again.
2) Check that your classpath is clean (e.g. no stale jars left over from your
other nutch instance).
3) Try running the inject step by itself: bin/nutch inject <crawldb> <urldir>
(see the sketch below).
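
A minimal sketch of (3), assuming the crawldb path from your crawl command and
that logging still ends up under the new checkout's logs/ directory:

$ cd /usr/tmp2/nutch_trunk
$ bin/nutch inject /usr/tmp2/100sites/crawldb /usr/tmp2/urls.txt
$ cat logs/hadoop.log    # or wherever your modified log4j.properties points

Running it as "bash -x bin/nutch inject ..." should also echo the full java
command line, which is an easy way to eyeball the classpath for (2).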

I don't know if any of this will help, though...

>
> --Kai Middleton


-- 
Doğacan Güney
