I think this much memory should fix the problem. However, if you still hit OOM, try the clusterpp command instead of clusterdump; it doesn't have the same memory limitations, since it also has a MapReduce version. You can find clusterpp's usage here: https://cwiki.apache.org/MAHOUT/top-down-clustering.html.
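On the 64-bit question and the heap setting: a quick way to check both from the shell is below. This is a sketch assuming `java` is on your PATH; also, if I remember the bin/mahout launcher script correctly, it reads a MAHOUT_HEAPSIZE environment variable (in MB), which may be more reliable than editing JAVA_HEAP_MAX directly.

```shell
# Check whether the OS kernel itself is 64-bit ("x86_64" means yes)
uname -m

# Check whether the JVM is 64-bit; a 64-bit HotSpot mentions "64-Bit" in its version banner
java -version 2>&1 | grep -i '64-bit' && echo "64-bit JVM" || echo "32-bit JVM"

# Verify that an -Xmx setting actually reaches the JVM (MaxHeapSize is printed in bytes)
java -Xmx4g -XX:+PrintFlagsFinal -version 2>/dev/null | grep -i maxheapsize || true

# Mahout's launcher script reads MAHOUT_HEAPSIZE in MB; e.g. for a 4 GB heap:
export MAHOUT_HEAPSIZE=4096
```

If `uname -m` reports i686 rather than x86_64, the JVM cannot be 64-bit either, and -Xmx values above ~2-3 GB will be silently ineffective or rejected.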
On Fri, Oct 12, 2012 at 9:13 PM, Rajesh Nikam <[email protected]> wrote:

> Hi,
>
> I have used canopy and k-means clustering to cluster around 1.2 M instances.
> The csv file size is around 425 MB. However, when I run the "mahout clusterdump"
> command as below, I am getting a Java OutOfMemory error.
>
> mahout clusterdump -dt sequencefile -i clean-kmeans-clusters/clusters-1-final/part-r-00000 -n 20 -b 100 -o cdump-clean.txt -p clean-kmeans-clusters/clusteredPoints/
>
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>     at org.apache.mahout.math.DenseVector.<init>(DenseVector.java:44)
>     at org.apache.mahout.math.DenseVector.<init>(DenseVector.java:39)
>     at org.apache.mahout.math.VectorWritable.readFields(VectorWritable.java:99)
>     at org.apache.mahout.clustering.classify.WeightedVectorWritable.readFields(WeightedVectorWritable.java:56)
>
> I have switched to 64-bit Ubuntu and even tried setting 4 GB/8 GB/12 GB of memory for Java:
>
> JAVA_HEAP_MAX=-Xmx4g
> JAVA_HEAP_MAX=-Xmx8g
> JAVA_HEAP_MAX=-Xmx12g
>
> I am not sure how to increase the memory available to the Java runtime.
>
> How can I check whether the Java on Ubuntu is 64-bit or not?
>
> Thanks
> Rajesh
