Hello,
We use Mahout's k-means algorithm and convert its binary output to a text
representation with ClusterDumper. Once our input reached approximately
20 million points, ClusterDumper began using 2.5 GB of RAM and failing with
an "Out of memory" error. Our machines have no swap, and we cannot add RAM
at present. Is there a way to avoid this problem?

The exception is shown below:

"Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterator.computeNext(SequenceFileIterator.java:101)
    at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterator.computeNext(SequenceFileIterator.java:38)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:135)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:130)
    at com.google.common.collect.Iterators$5.hasNext(Iterators.java:474)
    at com.google.common.collect.ForwardingIterator.hasNext(ForwardingIterator.java:39)
    at org.apache.mahout.utils.clustering.ClusterDumper.readPoints(ClusterDumper.java:239)
    at org.apache.mahout.utils.clustering.ClusterDumper.init(ClusterDumper.java:193)
    at org.apache.mahout.utils.clustering.ClusterDumper.<init>(ClusterDumper.java:78)
    at com.mirantis.bigdata.clustering.kmeans.KmeansJob.run(Unknown Source)
    at com.mirantis.bigdata.clustering.kmeans.KmeansJob.main(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:186)"

-- 
Regards,
Vitaly Davydov
