Hey, did you ever figure this issue out?
From my experience with Hadoop, you can optimize memory usage in your cluster. According to http://getsatisfaction.com/cloudera/topics/how_much_ram_datanode_should_take, HADOOP_HEAP_SIZE sets the heap size of the Hadoop daemons (datanode, tasktracker), and mapred.child.java.opts controls the heap size of the child JVMs (the map and reduce tasks themselves). So maybe you could set HADOOP_HEAP_SIZE to 1 GB and mapred.child.java.opts=-Xmx3072M (3 GB). That way your map tasks have more memory to work with?

-- james

On Mon, Jan 24, 2011 at 9:54 PM, Jia Rao <[email protected]> wrote:
> Hi all,
>
> I am having a problem running the 20 newsgroups example on a Hadoop cluster.
> The trainclassifier step worked fine, but I got an "out of memory: Java heap
> space" error in the testclassifier step.
>
> The configuration of the Hadoop cluster is as follows.
>
> Physical machines: 4 nodes, each with 6 GB of memory.
>
> Hadoop: 0.20.2, HADOOP_HEAP_SIZE=3200 in hadoop-env.sh,
> mapred.child.java.opts=-Xmx1024M in mapred-site.xml.
>
> Mahout: tried release 0.4 and the latest source; same problem.
>
> Command line arguments used:
>
> $MAHOUT_HOME/bin/mahout testclassifier \
>   -m newsmodel \
>   -d 20news-input \
>   -type bayes \
>   -ng 3 \
>   -source hdfs \
>   -method mapreduce
>
> Any ideas?
> Thanks!
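Concretely, the swap suggested above would look something like the following (a sketch for Hadoop 0.20.x; the exact numbers depend on what else runs on your 6 GB nodes, and note that the variable in the stock hadoop-env.sh is spelled HADOOP_HEAPSIZE). In conf/hadoop-env.sh, shrink the daemon heap:

```shell
# conf/hadoop-env.sh
# Max heap for the Hadoop daemons (datanode, tasktracker), in MB.
# Smaller daemons leave more of each node's 6 GB for map/reduce tasks.
export HADOOP_HEAPSIZE=1024
```

and in conf/mapred-site.xml, grow the per-task heap:

```xml
<!-- conf/mapred-site.xml -->
<property>
  <name>mapred.child.java.opts</name>
  <!-- 3 GB heap for each child JVM (the map and reduce tasks) -->
  <value>-Xmx3072M</value>
</property>
```

Keep in mind each concurrent task slot gets its own child JVM, so 3 GB per task only fits if mapred.tasktracker.map.tasks.maximum is kept low on 6 GB nodes.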
