I think the hadoop-config.sh heap size only affects the heaps of the Hadoop daemons, not the child task JVMs. I had it at 2g earlier when I was getting the OutOfMemoryErrors (OMEs) on fpgrowth. I added a note to the wiki page about setting mapred.child.java.opts to 2g and also about removing the other config values that were set for single-node operation (esp. dfs.replication=1).
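
For reference, here is a rough sketch of what that setting might look like in conf/mapred-site.xml. It assumes a Hadoop 0.20-style config; the property takes JVM options rather than a bare size, so "2g" is written here as -Xmx2g. Treat the exact value as an assumption to tune for your own nodes.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- JVM options passed to each child map/reduce task.
       -Xmx2g is an assumed value mirroring the 2g heap mentioned above. -->
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx2g</value>
  </property>
</configuration>
```

The file would need to be the same on every node in the cluster, and the dfs.replication=1 override left over from single-node setups would be dropped from conf/hdfs-site.xml rather than from this file.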

I assume you added HADOOP_CONF_DIR so it actually runs in Hadoop?


On 5/21/10 10:52 AM, Mike Roberts wrote:
Exciting.  Yeah, I also set my heapsize to 2G.  I set it in the
hadoop-config.sh file.  Did you do it there or did you instead set it in
conf/mapred-site.xml -->    mapred.child.java.opts?  That'd be my next step
if I were actually getting memory errors, but wasn't even sure that real
data could be produced.

Kinda scary that it'll exit successfully without results.  Does mahout ever
return "wrong" results?  That is, there should be 120,000 results, but
because of some memory config somewhere it successfully returns just 100,000
results?  Anyone ever see that, and if so, how do you deal with it?

On Fri, May 21, 2010 at 10:36 AM, Jeff Eastman
<[email protected]> wrote:

On 5/20/10 9:51 PM, Mike Roberts wrote:

./bin/mahout seqdumper --seqFile patterns/fpgrowth/part-r-00000

After reconfiguring a 4-node cluster to set the java heapsize to 2g I got
92144 in patterns/fpgrowth/part-r-00000 and got Count: 359 and volumes of
output after seqdumper. But it's only using a single mapper/reducer in all
the steps (probably why it OMEs with the default heap). I also tried Drew's
-Dmapred.reduce.tasks=2 trick but bin/mahout barfs on that.

