Hi,

On 2/21/07, Oleg V. Konovalov <[EMAIL PROTECTED]> wrote:
[snip]
OK, next, "generate":

-bash-3.00$ bin/nutch generate crawl/crawldb crawl/segments
Generator: starting
Generator: segment: crawl/segments/20070221171048
Generator: Selecting best-scoring urls due for fetch.
Exception in thread "main" java.io.IOException: Input directory /user/nutch/crawl/crawldb/current in localhost:9000 is invalid.
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:274)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:327)
        at org.apache.nutch.crawl.Generator.generate(Generator.java:319)
        at org.apache.nutch.crawl.Generator.main(Generator.java:395)


You configured nutch to look for HDFS at localhost:9000. When the
default fs is HDFS and you give a relative path (like crawl/crawldb)
to any nutch command, nutch (actually hadoop) resolves it against your
HDFS home directory, i.e. /user/<username>/<relative_path>. That is why
the generator is looking for /user/nutch/crawl/crawldb/current. You
have to either put your crawldb there, configure nutch to use the
local fs, or change generate's arguments so they point to where the
crawldb actually is.
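
To make the path resolution concrete, here is a rough sketch of the rule
in plain shell. It only mimics the behaviour described above; the
function name resolve_dfs_path is made up for illustration and is not
part of nutch or hadoop.

```shell
# Hypothetical helper: shows where a path given to a nutch command ends up
# when the default fs is HDFS. This is an illustration, not hadoop code.
resolve_dfs_path() {
    user="$1"   # the user running the command, e.g. nutch
    path="$2"   # the path given on the command line
    case "$path" in
        /*) printf '%s\n' "$path" ;;                   # absolute path: used as-is
        *)  printf '/user/%s/%s\n' "$user" "$path" ;;  # relative path: under the HDFS home dir
    esac
}

resolve_dfs_path nutch crawl/crawldb   # prints /user/nutch/crawl/crawldb
```

If your crawldb is currently on the local disk, something like
bin/hadoop dfs -put crawl/crawldb crawl/crawldb should copy it to that
location in HDFS (assuming you run it as the same user), and
bin/hadoop dfs -ls crawl should then list it.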

[snip]

--
Doğacan Güney
