If you mean did I run bin/start-all.sh, then yes. If you mean something else, then no.
I believe the hadoop daemon is running, since I can browse the hadoop NameNode filesystem...

-----Original Message-----
From: Steve Cohen [mailto:[email protected]]
Sent: Wednesday, September 29, 2010 12:17 PM
To: [email protected]
Subject: Re: Error with Hadoop when moving from Local to HDFS Pseudo-Distributed Mode...

Did you start up the hadoop daemon?

On Wed, Sep 29, 2010 at 3:08 PM, brad <[email protected]> wrote:
> I have tried to move from a local instance of Nutch to a
> Pseudo-Distributed Mode Hadoop Nutch on a single machine. I set
> everything up using the How to Setup Nutch (V1.1) and Hadoop
> instructions located here:
> http://wiki.apache.org/nutch/NutchHadoopTutorial
>
> Then I moved all my relevant files to HDFS using:
>
> bin/hadoop dfs -put crawl_www/crawldb /crawl_www/crawldb
>
> I then double-checked that the files moved OK using:
>
> bin/hadoop dfs -ls /crawl_www/crawldb
>
> and that worked fine:
>
> Found 1 items
> drwxr-xr-x - root supergroup 0 2010-09-28 13:14 /crawl_www/crawldb/current
>
> I went all the way down to the file level, and the files appear to exist:
>
> bin/hadoop dfs -ls /crawl_www/crawldb/current/part-00000
>
> Found 2 items
> -rw-r--r-- 1 root supergroup 2375690617 2010-09-28 13:13 /crawl_www/crawldb/current/part-00000/data
> -rw-r--r-- 1 root supergroup 23784625 2010-09-28 13:14 /crawl_www/crawldb/current/part-00000/index
>
> Also, when I use Firefox to browse the HDFS filesystem at
> localhost:50070, everything appears to work perfectly and I can see
> everything.
>
> But when I try a basic test run of Nutch, I get the following:
>
> bin/nutch generate crawl_www/crawldb crawl_www/segments -topN 1000
>
> INFO crawl.Generator - Generator: starting at 2010-09-29 11:54:15
> INFO crawl.Generator - Generator: Selecting best-scoring urls due for fetch.
> INFO crawl.Generator - Generator: filtering: true
> INFO crawl.Generator - Generator: normalizing: true
> INFO crawl.Generator - Generator: topN: 1000
> ERROR crawl.Generator - Generator: org.apache.hadoop.mapred.InvalidInputException:
> Input path does not exist: hdfs://localhost:9000/user/root/crawl_www/crawldb/current
>
> Did I miss a configuration step? I believe I have checked and
> double-checked everything and it appears to look correct.
>
> Any ideas?
>
> Note: this is Nutch 1.2 on CentOS Linux 5.5.
>
> Thanks
> Brad
>
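For anyone hitting the same error: the message itself hints at a path mismatch rather than a daemon problem. The files were uploaded to the absolute path /crawl_www/crawldb, but bin/nutch was given the relative path crawl_www/crawldb, which Hadoop resolves against the user's HDFS home directory (/user/root in this thread). A minimal sketch of that resolution, assuming the usual /user/<name> home-directory convention and the "root" user taken from the log:

```shell
# Sketch only: mimics how HDFS expands a relative path into an absolute one.
# hdfs_user and the /user/<name> convention are assumptions inferred from
# the error message in the thread, not output from a real cluster.
hdfs_user="root"
rel_path="crawl_www/crawldb/current"
abs_path="/user/${hdfs_user}/${rel_path}"

echo "Nutch looked for:  hdfs://localhost:9000${abs_path}"
echo "Files were put at: hdfs://localhost:9000/${rel_path}"
```

So the two candidate fixes are to re-upload the data under /user/root, or to pass the absolute path (e.g. /crawl_www/crawldb /crawl_www/segments) to bin/nutch so the two locations agree.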

