Did you start up the Hadoop daemon?

On Wed, Sep 29, 2010 at 3:08 PM, brad <[email protected]> wrote:

> I have tried to move from a local instance of Nutch to Nutch running on
> Hadoop in pseudo-distributed mode on a single machine. I set everything up
> using the "How to Setup Nutch (V1.1) and Hadoop" instructions located here:
> http://wiki.apache.org/nutch/NutchHadoopTutorial
>
> Then I moved all my relevant files to HDFS using:
>
> bin/hadoop dfs -put crawl_www/crawldb /crawl_www/crawldb
> .
>
> I then double-checked that the files had moved OK using:
>
> bin/hadoop dfs -ls /crawl_www/crawldb
>
> And that worked fine:
> Found 1 items
> drwxr-xr-x   - root supergroup          0 2010-09-28 13:14 /crawl_www/crawldb/current
>
> I went all the way down to the file level, and the files appear to exist:
> bin/hadoop dfs -ls /crawl_www/crawldb/current/part-00000
>
> Found 2 items
> -rw-r--r--   1 root supergroup 2375690617 2010-09-28 13:13 /crawl_www/crawldb/current/part-00000/data
> -rw-r--r--   1 root supergroup   23784625 2010-09-28 13:14 /crawl_www/crawldb/current/part-00000/index
>
> Also, when I use Firefox to browse the HDFS filesystem at localhost:50070,
> everything appears to work perfectly and I can see everything.
>
> But when I try a basic test run of Nutch, I get the following:
> bin/nutch generate crawl_www/crawldb crawl_www/segments -topN 1000
>
>
> INFO  crawl.Generator - Generator: starting at 2010-09-29 11:54:15
> INFO  crawl.Generator - Generator: Selecting best-scoring urls due for fetch.
> INFO  crawl.Generator - Generator: filtering: true
> INFO  crawl.Generator - Generator: normalizing: true
> INFO  crawl.Generator - Generator: topN: 1000
> ERROR crawl.Generator - Generator: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://localhost:9000/user/root/crawl_www/crawldb/current
>
>
> Did I miss a configuration step? I believe I have checked and double-checked
> everything, and it all looks correct.
>
> Any ideas?
>
> Note: this is Nutch 1.2 on CentOS Linux 5.5.
>
> Thanks
> Brad
>
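
One more thing worth checking, based on the error message itself: the files were uploaded to the absolute path /crawl_www/crawldb, but Generator looked for hdfs://localhost:9000/user/root/crawl_www/crawldb/current. Hadoop qualifies a relative path against the user's HDFS home directory (/user/<username>), so the relative crawl_www/crawldb passed to bin/nutch generate ends up under /user/root. The minimal sketch below mimics that rule, assuming the default working directory has not been changed. resolve_hdfs_path is a hypothetical helper for illustration only, not part of Hadoop or Nutch.

```shell
#!/bin/sh
# Hypothetical helper (not a Hadoop or Nutch command): mimics how a path
# argument is qualified when the working directory is the default
# HDFS home directory /user/<username>.
resolve_hdfs_path() {
  fs_uri="$1"   # default filesystem, e.g. hdfs://localhost:9000
  user="$2"     # user running the job, e.g. root
  path="$3"     # path as typed on the command line
  case "$path" in
    /*) printf '%s%s\n' "$fs_uri" "$path" ;;                  # absolute: used as-is
    *)  printf '%s/user/%s/%s\n' "$fs_uri" "$user" "$path" ;; # relative: under home dir
  esac
}

# Relative path, as passed to bin/nutch generate in the message above:
resolve_hdfs_path hdfs://localhost:9000 root crawl_www/crawldb/current
# prints hdfs://localhost:9000/user/root/crawl_www/crawldb/current

# Absolute path, as used with bin/hadoop dfs -put:
resolve_hdfs_path hdfs://localhost:9000 root /crawl_www/crawldb/current
# prints hdfs://localhost:9000/crawl_www/crawldb/current
```

If that is what is happening, either passing absolute paths to the job (bin/nutch generate /crawl_www/crawldb /crawl_www/segments -topN 1000) or uploading with a relative destination (bin/hadoop dfs -put crawl_www/crawldb crawl_www/crawldb) should make the two locations agree. This is untested here.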
