If you mean did I run bin/start-all.sh, then yes.  If you mean something
else, then no.

I believe the Hadoop daemons are running, since I can browse the Hadoop
NameNode filesystem...
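
For what it's worth, browsing the NameNode web UI only proves the NameNode itself is up; the MapReduce daemons can be down at the same time. A quick sanity check (a sketch — these are the Java class names the daemons of that Hadoop vintage run under; on a single pseudo-distributed box all five should show up):

```shell
# Check that each pseudo-distributed Hadoop daemon is actually running
# by looking for its Java class name in the process list.
for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
  if ps aux | grep -v grep | grep -q "$daemon"; then
    echo "$daemon: running"
  else
    echo "$daemon: NOT running"
  fi
done
```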

-----Original Message-----
From: Steve Cohen [mailto:[email protected]] 
Sent: Wednesday, September 29, 2010 12:17 PM
To: [email protected]
Subject: Re: Error with Hadoop when moving from Local to HDFS
Pseudo-Distributed Mode...

Did you start up the hadoop daemon?

On Wed, Sep 29, 2010 at 3:08 PM, brad <[email protected]> wrote:

> I have tried to move from a local instance of Nutch to a 
> Pseudo-Distributed Mode Hadoop Nutch on a single machine.  I set 
> everything up using the How to Setup Nutch (V1.1) and Hadoop 
> instructions located here:
> http://wiki.apache.org/nutch/NutchHadoopTutorial
>
> Then I moved all my relevant files to the HDFS using:
>
> bin/hadoop dfs -put crawl_www/crawldb /crawl_www/crawldb
>
> I then double-checked that the files had moved OK using:
>
> bin/hadoop dfs -ls /crawl_www/crawldb
>
> That worked fine:
> Found 1 items
> drwxr-xr-x   - root supergroup          0 2010-09-28 13:14
> /crawl_www/crawldb/current
>
> I went all the way down to the file level and it appears the files
> exist:
>
> bin/hadoop dfs -ls /crawl_www/crawldb/current/part-00000
>
> Found 2 items
> -rw-r--r--   1 root supergroup 2375690617 2010-09-28 13:13
> /crawl_www/crawldb/current/part-00000/data
> -rw-r--r--   1 root supergroup   23784625 2010-09-28 13:14
> /crawl_www/crawldb/current/part-00000/index
>
> Also, when I use firefox to browse the hdfs filesystem using 
> localhost:50070, everything appears to work perfectly and I can see 
> everything.
>
> But, when I try a basic test run of Nutch, I get the following:
> bin/nutch generate crawl_www/crawldb crawl_www/segments -topN 1000
>
>
> INFO  crawl.Generator - Generator: starting at 2010-09-29 11:54:15
> INFO  crawl.Generator - Generator: Selecting best-scoring urls due for fetch.
> INFO  crawl.Generator - Generator: filtering: true
> INFO  crawl.Generator - Generator: normalizing: true
> INFO  crawl.Generator - Generator: topN: 1000
> ERROR crawl.Generator - Generator: org.apache.hadoop.mapred.InvalidInputException:
> Input path does not exist:
> hdfs://localhost:9000/user/root/crawl_www/crawldb/current
>
>
> Did I miss a configuration step?  I believe I have checked and
> double-checked everything and it appears to be correct.
>
> Any ideas?
>
> Note: this is Nutch 1.2 on CentOS Linux 5.5.
>
> Thanks
> Brad
>
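
One thing worth checking, given the quoted error: the data was put to the absolute path /crawl_www/crawldb, but the generate command passes the relative path crawl_www/crawldb, and HDFS resolves relative paths against the user's home directory. Assuming the job runs as root, the resolution works out like this (a sketch of the rule, not Hadoop's actual code):

```shell
# HDFS resolves a relative path against /user/<username>.  With the job
# running as root, the relative argument crawl_www/crawldb/current becomes:
HDFS_USER=root
REL=crawl_www/crawldb/current
RESOLVED="/user/${HDFS_USER}/${REL}"
echo "$RESOLVED"    # /user/root/crawl_www/crawldb/current

# That matches the "Input path does not exist" path in the error, so one
# possible fix is to pass the absolute path where the data actually lives:
#   bin/nutch generate /crawl_www/crawldb /crawl_www/segments -topN 1000
# or, alternatively, to re-put the data under the HDFS home directory:
#   bin/hadoop dfs -put crawl_www/crawldb crawl_www/crawldb
```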
