Re: Error running intranet crawl with 0.8.0-dev

Zaheed Haque Wed, 12 Jul 2006 11:03:33 -0700

Hi:

Create a directory named "crawldb", then create a subdirectory named
"current" under "crawldb".

Then run bin/nutch inject crawldb urldir.
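
As a rough sketch, the steps above amount to the following. It assumes the commands are run from the Nutch install directory and that "urldir" is the directory holding your seed URL list. The guard around the inject call is only there so the snippet does nothing harmful if bin/nutch is not present.

```shell
# Workaround sketch: pre-create the directory the inject step looks for.
# Assumption: "crawldb" is the same path you pass to "nutch inject".
CRAWLDB="crawldb"
mkdir -p "$CRAWLDB/current"

# Run the inject step only if the nutch launcher is available here.
# "urldir" is the seed URL directory named in the steps above.
if [ -x bin/nutch ]; then
  bin/nutch inject "$CRAWLDB" urldir
fi
```

For the crawl command in the quoted message, the exception names crawl_results/crawldb/current, so that is where the directory would presumably need to exist in that case.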

The latest SVN version already fixes the problem, but the fix was only
committed today.

Cheers

On 7/12/06, Daniel Varela Santoalla <[EMAIL PROTECTED]> wrote:

Hello

I'm having this problem when trying to test intranet crawling. Could
anybody help with this? I've been trying a lot of things, but without results.
0.7.2 works fine, but I wanted to test the new version.

[EMAIL PROTECTED]:~/tmp/devel/nutch-nightly/bin> ./nutch crawl urls/ -threads 5 -depth 10 -topN 1000 -dir crawl_results
Exception in thread "main" java.io.IOException: Input directory /var/tmp/daniel/devel/nutch-nightly/bin/crawl_results/crawldb/current in local is invalid.
         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:274)
         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:327)
         at org.apache.nutch.crawl.Injector.inject(Injector.java:146)
         at org.apache.nutch.crawl.Crawl.main(Crawl.java:105)

--

Daniel Varela Santoalla
European Centre for Medium-Range Weather Forecasts (ECMWF)
