Hi,

I was running the example k-means program, following the instructions here:
https://cwiki.apache.org/MAHOUT/clustering-of-synthetic-control-data.html

I increased the size of the input file synthetic_control.data from around 200 KB to 1.2 GB by copying the data onto itself. The maximum number of iterations is set to 10. After all 10 iterations finished, I got an OutOfMemoryError (Java heap space) in the main thread.

I have raised -Xmx for the child JVM of Hadoop to 4G, and set JAVA_HEAP_SIZE in bin/mahout to -Xmx5g, but the error still happens. I am a bit confused about where this JVM is started and how to pass in the Java heap size options to prevent the error.

Thanks. This is the full stack trace:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at org.apache.mahout.math.map.OpenIntDoubleHashMap.rehash(OpenIntDoubleHashMap.java:430)
    at org.apache.mahout.math.map.OpenIntDoubleHashMap.put(OpenIntDoubleHashMap.java:383)
    at org.apache.mahout.math.RandomAccessSparseVector.setQuick(RandomAccessSparseVector.java:139)
    at org.apache.mahout.math.VectorWritable.readFields(VectorWritable.java:118)
    at org.apache.mahout.clustering.classify.WeightedVectorWritable.readFields(WeightedVectorWritable.java:56)
    at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1809)
    at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1937)
    at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterator.computeNext(SequenceFileIterator.java:95)
    at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterator.computeNext(SequenceFileIterator.java:38)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at com.google.common.collect.Iterators$6.hasNext(Iterators.java:630)
    at com.google.common.collect.ForwardingIterator.hasNext(ForwardingIterator.java:43)
    at org.apache.mahout.utils.clustering.ClusterDumper.readPoints(ClusterDumper.java:293)
    at org.apache.mahout.utils.clustering.ClusterDumper.init(ClusterDumper.java:246)
    at org.apache.mahout.utils.clustering.ClusterDumper.<init>(ClusterDumper.java:94)
    at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.run(Job.java:137)
    at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.main(Job.java:59)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
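
The trace bottoms out in RunJar.main and passes through ClusterDumper.readPoints, which suggests the error is raised in the client JVM that launches the job rather than in a map or reduce child task. One way to check which heap limit a given JVM actually received is a small diagnostic like the sketch below. The class name HeapCheck is hypothetical and is not part of Mahout or Hadoop; it only reports what the running JVM sees.

```java
// Hypothetical diagnostic class, not part of Mahout or Hadoop.
// It reports the heap ceiling of whichever JVM executes it.
public class HeapCheck {

    // Maximum amount of memory the current JVM will attempt to use, in MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB): " + maxHeapMiB());
    }
}
```

Running it with an explicit flag, for example `java -Xmx5g HeapCheck`, should report a value roughly in line with the configured limit. If the same check run through the launcher under investigation reports a much smaller number, the -Xmx option is probably not reaching that JVM.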
