It's always nice to get the same answer more than once. Next, I want to see whether I can make it use more than a single mapper and reducer; it clearly won't scale without that.

On the Hadoop quirk: I've noticed that the datanodes can take up to a minute to come online after start-all. During that time I also see the "replicated to 0 nodes" error on puts. I suspect that the time it takes to type an extra -ls, or to open a browser on port 50070, is all the datanodes need to finish starting and connect to the namenode.
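If the start-up delay really is the cause, a more deterministic alternative to the extra -ls is to wait until the namenode reports a live datanode before doing the put. This is a minimal sketch, not something anyone in this thread ran: the function names live_datanodes and wait_for_datanodes are hypothetical, and it assumes that "hadoop dfsadmin -report" prints a line like "Datanodes available: 1 (1 total, 0 dead)", which I believe is the 0.20-era wording. Other versions phrase that line differently, so check your own report output before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: wait for HDFS datanodes to register before a put.
# ASSUMPTION: "hadoop dfsadmin -report" prints a line of the form
#   Datanodes available: N (M total, K dead)
# (0.20-era wording); adjust the awk pattern if your version differs.

# Read a dfsadmin report on stdin; print the live datanode count (0 if absent).
live_datanodes() {
  awk '/^Datanodes available:/ { print $3; found = 1; exit }
       END { if (!found) print 0 }'
}

# wait_for_datanodes MIN TIMEOUT
# Check the report, then re-check every 5 seconds until at least MIN
# datanodes are live. Returns 0 on success, 1 once TIMEOUT seconds have passed.
wait_for_datanodes() {
  local min=${1:-1} timeout=${2:-90} waited=0 live
  while :; do
    # A failing or missing hadoop command yields empty input, i.e. 0 live nodes.
    live=$("$HADOOP_HOME/bin/hadoop" dfsadmin -report 2>/dev/null | live_datanodes)
    if [ "$live" -ge "$min" ]; then
      return 0
    fi
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for $min live datanode(s)" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
}
```

For example, after sourcing the helpers: wait_for_datanodes 1 90 && $HADOOP_HOME/bin/hadoop dfs -put /home/mahout-in/fpm-input/* /fpm-input would only attempt the put once at least one datanode is live, instead of relying on how long it takes to type another command.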

On 5/21/10 2:16 PM, Mike Roberts wrote:
It's ALIVE!!! And I got the same count: 359.  Recap: Setting
mapred.child.java.opts in mapred-site.xml was the thing that worked. So,
shakka-kahn!

Found this exciting Hadoop quirk:
The following lines, run in sequence, produce an error:
$HADOOP_HOME/bin/hadoop dfs -mkdir /fpm-input
$HADOOP_HOME/bin/hadoop dfs -put /home/mahout-in/fpm-input/* /fpm-input
-->  Error= ..."could only be replicated to 0 nodes instead of 1 hadoop"...

I found this page:
http://wiki.apache.org/hadoop/HowToSetupYourDevelopmentEnvironment which
suggested a high-level workaround: loading a status page.

I found a simple solution that worked:
$HADOOP_HOME/bin/hadoop dfs -mkdir /fpm-input
$HADOOP_HOME/bin/hadoop dfs -ls /fpm-input   <-- this somehow makes Hadoop
stop complaining.
$HADOOP_HOME/bin/hadoop dfs -put /home/mahout-in/fpm-input/* /fpm-input

