Hey, did you ever figure this issue out?
From my experience with Hadoop, you can optimize memory usage in your cluster. According to http://getsatisfaction.com/cloudera/topics/how_much_ram_datanode_should_take, HADOOP_HEAP_SIZE sets the heap size of the Hadoop daemons (datanode, tasktracker), and mapred.child.java.opts controls the heap size of the child JVMs (the map and reduce tasks themselves). So maybe you could set HADOOP_HEAP_SIZE to 1 GB and mapred.child.java.opts=-Xmx3072M (3 GB). That way your map tasks have more memory to work with?

-- james

On Mon, Jan 24, 2011 at 9:54 PM, Jia Rao <[email protected]> wrote:
> Hi all,
>
> I am having a problem running the 20 newsgroups example on a Hadoop cluster.
> The trainclassifier step worked fine, but I got an "out of memory: Java heap
> space" error in the testclassifier step.
>
> The configuration of the Hadoop cluster is as follows.
>
> Physical machines: 4 nodes, each with 6 GB of memory.
>
> Hadoop: 0.20.2, HADOOP_HEAP_SIZE=3200 in hadoop-env.sh,
> mapred.child.java.opts=-Xmx1024M in mapred-site.xml.
>
> Mahout: tried release 0.4 and the latest source; same problem.
>
> Command line arguments used:
>
> $MAHOUT_HOME/bin/mahout testclassifier \
>   -m newsmodel \
>   -d 20news-input \
>   -type bayes \
>   -ng 3 \
>   -source hdfs \
>   -method mapreduce
>
> Any ideas?
> Thanks!
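Concretely, the swap suggested above would look something like the following (a sketch for Hadoop 0.20.x; the exact numbers depend on what else runs on your 6 GB nodes, and note that the variable in the stock hadoop-env.sh is spelled HADOOP_HEAPSIZE). In conf/hadoop-env.sh, shrink the daemon heap:

```shell
# conf/hadoop-env.sh
# Max heap for the Hadoop daemons (datanode, tasktracker), in MB.
# Smaller daemons leave more of each node's 6 GB for map/reduce tasks.
export HADOOP_HEAPSIZE=1024
```

and in conf/mapred-site.xml, grow the per-task heap:

```xml
<!-- conf/mapred-site.xml -->
<property>
  <name>mapred.child.java.opts</name>
  <!-- 3 GB heap for each child JVM (the map and reduce tasks) -->
  <value>-Xmx3072M</value>
</property>
```

Keep in mind each concurrent task slot gets its own child JVM, so 3 GB per task only fits if mapred.tasktracker.map.tasks.maximum is kept low on 6 GB nodes.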
