Hi all, I am having a problem running the 20 Newsgroups example on a Hadoop cluster. trainclassifier worked fine, but testclassifier fails with an "out of memory java heap" error.
The following is the configuration of the Hadoop cluster.

Physical machines: 4 nodes, each with 6 GB of memory.
Hadoop: 0.20.2, with HADOOP_HEAP_SIZE=3200 in hadoop-env.sh and mapred.child.java.opts=-Xmx1024M in mapred-site.xml.
Mahout: tried release 0.4 and the latest source; same problem with both.

Command line arguments used:

    $MAHOUT_HOME/bin/mahout testclassifier \
      -m newsmodel \
      -d 20news-input \
      -type bayes \
      -ng 3 \
      -source hdfs \
      -method mapreduce

Any ideas? Thanks!
