It's ALIVE!!! And I got the same count: 359. Recap: Setting mapred.child.java.opts in mapred-site.xml was the thing that worked. So, shakka-kahn!
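For anyone following the recap, here is a minimal sketch of the setting. It assumes the Hadoop 0.20-era property name `mapred.child.java.opts` and the 2g heap from Jeff's note further down. The helper name `print_child_opts_property` is hypothetical. It only prints the `<property>` element so you can paste it inside the existing `<configuration>` element of `conf/mapred-site.xml` without clobbering your other settings.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the mapred-site.xml <property> element that
# sets the JVM options (here, the max heap) for each map/reduce child task.
print_child_opts_property() {
  local heap="${1:-2g}"   # assumption: 2g, as used in this thread
  cat <<EOF
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx${heap}</value>
</property>
EOF
}

print_child_opts_property 2g
```

Per Jeff, this is separate from the heap size in hadoop-config.sh, which only affects the Hadoop daemons.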
Found this exciting Hadoop quirk: the following lines, run in sequence, produce an error:

$HADOOP_HOME/bin/hadoop dfs -mkdir /fpm-input
$HADOOP_HOME/bin/hadoop dfs -put /home/mahout-in/fpm-input/* /fpm-input

--> Error= ..."could only be replicated to 0 nodes instead of 1 hadoop"...

I found this page: http://wiki.apache.org/hadoop/HowToSetupYourDevelopmentEnvironment which suggested a high-level workaround of loading a status page. I found a simple solution that worked:

$HADOOP_HOME/bin/hadoop dfs -mkdir /fpm-input
$HADOOP_HOME/bin/hadoop dfs -ls /fpm-input   <-- this somehow makes hadoop stop complaining.
$HADOOP_HOME/bin/hadoop dfs -put /home/mahout-in/fpm-input/* /fpm-input

On Fri, May 21, 2010 at 11:08 AM, Jeff Eastman <[email protected]> wrote:
> I think the hadoop-config.sh heap size only affects heaps on the Hadoop
> daemons. I had it at 2g earlier when I was getting the OMEs on fpg. I added
> a note to the wiki page about setting mapred.child.java.opts to 2g and also
> to remove the other config values that were set for single-node operation
> (esp dfs.replication=1).
>
> I assume you added HADOOP_CONF_DIR so it actually runs in Hadoop?
>
> On 5/21/10 10:52 AM, Mike Roberts wrote:
>> Exciting. Yeah, I also set my heapsize to 2G. I set it in the
>> hadoop-config.sh file. Did you do it there or did you instead set it in
>> conf/mapred-site.xml --> mapred.child.java.opts? That'd be my next step
>> if I were actually getting memory errors, but wasn't even sure that real
>> data could be produced.
>>
>> Kinda scary that it'll exit successfully without results. Does mahout ever
>> return "wrong" results? That is, there should be 120,000 results, but
>> because of some memory config somewhere it successfully returns just 100,000
>> results? Anyone ever see that, and if so, how do you deal with it?
>>
>> On Fri, May 21, 2010 at 10:36 AM, Jeff Eastman <[email protected]> wrote:
>>
>>> On 5/20/10 9:51 PM, Mike Roberts wrote:
>>>
>>>> ./bin/mahout seqdumper --seqFile patterns/fpgrowth/part-r-00000
>>>
>>> After reconfiguring a 4-node cluster to set the Java heap size to 2g, I got
>>> 92144 in patterns/fpgrowth/part-r-00000 and got Count: 359 and volumes of
>>> output after seqdumper. But it's only using a single mapper/reducer in all
>>> the steps (probably why it OMEs with the default heap). I also tried Drew's
>>> -Dmapred.reduce.tasks=2 trick, but bin/mahout barfs on that.
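Mike's mkdir / ls / put workaround can be wrapped in a small script. This is a sketch only. The function name `load_fpm_input` and the `DRY_RUN` switch are hypothetical additions that let you inspect the command sequence without a cluster. The three `hadoop dfs` commands are exactly the ones from the message. The thread does not establish why the intervening `-ls` helps.

```shell
#!/usr/bin/env bash
# Sketch of the mkdir / ls / put sequence from the thread.
# DRY_RUN=1 (hypothetical switch) echoes the commands instead of running them.
HADOOP="${HADOOP_HOME:-/path/to/hadoop}/bin/hadoop"

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

load_fpm_input() {
  local src_dir="${1:-/home/mahout-in/fpm-input}"
  local dest="${2:-/fpm-input}"
  run "$HADOOP" dfs -mkdir "$dest"
  # In the thread, adding this -ls between -mkdir and -put made the
  # "could only be replicated to 0 nodes" error go away.
  run "$HADOOP" dfs -ls "$dest"
  # The glob expands on the local filesystem before the upload.
  run "$HADOOP" dfs -put "$src_dir"/* "$dest"
}
```

Run `DRY_RUN=1 load_fpm_input` to preview the commands, then `load_fpm_input` with no arguments to use the paths from the message.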

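On Mike's question about runs that exit successfully with too few results, one cheap guard is to compare the `Count:` line that seqdumper prints (as in Jeff's `Count: 359`) against a count you already trust. The sketch below assumes that line appears on its own as `Count: N`. `extract_count` and `check_count` are hypothetical helper names, and the check can only flag a mismatch, not explain it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads seqdumper output on stdin and prints N from
# the last "Count: N" line (prints nothing if there is no such line).
extract_count() {
  sed -n 's/^Count: *\([0-9][0-9]*\).*$/\1/p' | tail -n 1
}

# Hypothetical helper: succeeds only if the dumped count equals $1.
check_count() {
  local expected="$1" actual
  actual="$(extract_count)"
  if [ "$actual" = "$expected" ]; then
    echo "OK: Count $actual"
  else
    echo "MISMATCH: expected $expected, got ${actual:-none}" >&2
    return 1
  fi
}
```

For example: `./bin/mahout seqdumper --seqFile patterns/fpgrowth/part-r-00000 | check_count 359`.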