It's always nice to get the same answer more than once. Now to see if I
can make it use more than a single mapper and reducer. It clearly won't
scale without this.
On the Hadoop quirk, I've noticed that the datanodes can take up to a
minute to come online after start-all. During that time I also see the
"could only be replicated to 0 nodes" error on puts. I suspect that the
time taken to type an extra -ls or open a browser on port 50070 might be
all it takes for the datanodes to get fully running and connected.
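If the delay really is the cause, polling for a live datanode should be more
dependable than an extra -ls. A rough sketch only: wait_until and
datanodes_up are names I made up, and the grep pattern assumes that
"hadoop dfsadmin -report" prints a line like "Datanodes available: 1 ...",
which may differ between Hadoop versions.

```shell
#!/bin/sh
# wait_until TRIES DELAY COMMAND [ARGS...]
# Run COMMAND up to TRIES times, sleeping DELAY seconds after each failure.
# Returns 0 as soon as COMMAND succeeds, 1 if it never does.
wait_until() {
    tries=$1
    delay=$2
    shift 2
    attempt=0
    while [ "$attempt" -lt "$tries" ]; do
        if "$@"; then
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 1
}

# Succeeds once the namenode reports at least one live datanode.
# ASSUMPTION: the report contains a line like "Datanodes available: 1 ...";
# adjust the pattern if your Hadoop version words it differently.
datanodes_up() {
    "$HADOOP_HOME/bin/hadoop" dfsadmin -report 2>/dev/null \
        | grep -q 'Datanodes available: [1-9]'
}
```

With those two functions loaded, the put could be gated like this (30 tries
at 2 seconds each covers the minute or so I have been seeing):

wait_until 30 2 datanodes_up && \
  $HADOOP_HOME/bin/hadoop dfs -put /home/mahout-in/fpm-input/* /fpm-input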
On 5/21/10 2:16 PM, Mike Roberts wrote:
It's ALIVE!!! And I got the same count: 359. Recap: Setting
mapred.child.java.opts in mapred-site.xml was the thing that worked. So,
shakka-kahn!
Found this exciting Hadoop quirk:
The following lines run in sequence produce an error:
$HADOOP_HOME/bin/hadoop dfs -mkdir /fpm-input
$HADOOP_HOME/bin/hadoop dfs -put /home/mahout-in/fpm-input/* /fpm-input
--> Error= ..."could only be replicated to 0 nodes instead of 1 hadoop"...
I found this page:
http://wiki.apache.org/hadoop/HowToSetupYourDevelopmentEnvironment which
suggested a high-level workaround of loading a status page.
I found a simple solution that worked:
$HADOOP_HOME/bin/hadoop dfs -mkdir /fpm-input
$HADOOP_HOME/bin/hadoop dfs -ls /fpm-input   <-- this somehow makes Hadoop
stop complaining.
$HADOOP_HOME/bin/hadoop dfs -put /home/mahout-in/fpm-input/* /fpm-input