I am trying to move from a local instance of Nutch to a pseudo-distributed-mode Hadoop Nutch on a single machine. I set everything up following the "How to Setup Nutch (V1.1) and Hadoop" instructions located here: http://wiki.apache.org/nutch/NutchHadoopTutorial
Then I moved all the relevant files to HDFS using:

    bin/hadoop dfs -put crawl_www/crawldb /crawl_www/crawldb

I double-checked that the files copied over correctly using:

    bin/hadoop dfs -ls /crawl_www/crawldb

and that worked fine:

    Found 1 items
    drwxr-xr-x   - root supergroup          0 2010-09-28 13:14 /crawl_www/crawldb/current

I went all the way down to the file level, and the files appear to exist:

    bin/hadoop dfs -ls /crawl_www/crawldb/current/part-00000
    Found 2 items
    -rw-r--r--   1 root supergroup 2375690617 2010-09-28 13:13 /crawl_www/crawldb/current/part-00000/data
    -rw-r--r--   1 root supergroup   23784625 2010-09-28 13:14 /crawl_www/crawldb/current/part-00000/index

Also, when I browse the HDFS filesystem in Firefox at localhost:50070, everything appears to work perfectly and I can see everything. But when I try a basic test run of Nutch, I get the following:

    bin/nutch generate crawl_www/crawldb crawl_www/segments -topN 1000
    INFO  crawl.Generator - Generator: starting at 2010-09-29 11:54:15
    INFO  crawl.Generator - Generator: Selecting best-scoring urls due for fetch.
    INFO  crawl.Generator - Generator: filtering: true
    INFO  crawl.Generator - Generator: normalizing: true
    INFO  crawl.Generator - Generator: topN: 1000
    ERROR crawl.Generator - Generator: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://localhost:9000/user/root/crawl_www/crawldb/current

Did I miss a configuration step? I believe I have checked and double-checked everything, and it all appears correct. Any ideas?

Note: this is Nutch 1.2 on CentOS Linux 5.5.

Thanks,
Brad
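One thing I noticed while writing this up (I may be misunderstanding how Hadoop resolves paths, so this is just a sketch of my reasoning): the `-put` used an absolute HDFS path, but the `generate` command used a relative one, and Hadoop seems to resolve relative paths under `/user/<current user>` — which would explain why the error mentions `/user/root/...`:

```shell
# Sketch of the path resolution as I understand it (assumption, not verified):
# relative HDFS paths are resolved against the current user's home directory.
HDFS_ROOT="hdfs://localhost:9000"
PUT_PATH="/crawl_www/crawldb"                       # absolute path used with -put
RESOLVED="$HDFS_ROOT/user/root/crawl_www/crawldb"   # where 'crawl_www/crawldb' resolves for root

echo "data was put at:      $HDFS_ROOT$PUT_PATH"
echo "generate looked at:   $RESOLVED"
```

If that is right, the two locations differ, even though both listings above succeed against the absolute path.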

