If I leave out the --pointsDir option, clusterdump runs without the error. But I need to know the cluster assignments for each point.
J

On Fri, 2011-02-18 at 16:05 -0700, [email protected] wrote:
> Greetings,
>
> I used kmeans to cluster ~3million instances of 40-d vectors. The
> clustering ran fine but when I ran the cluster dump utility I got the
> memory error below. I initially ran everything locally, but after
> getting the memory error I tried running it under hadoop in pseudo
> distributed mode (I'm running cloudera).
>
> I have r1066213 of Mahout.
> Java is 1.6.0_23
>
> Jeremy
>
> /usr/local/programs/svn_mahout/bin/mahout clusterdump --seqFileDir kmeans_work/cluster-9 --pointsDir kmeans_work/clusteredPoints --output kmeans_work/clusteranalyze-9.txt
> Running on hadoop, using HADOOP_HOME=/usr/lib/hadoop-0.20
> No HADOOP_CONF_DIR set, using /usr/lib/hadoop-0.20/conf
> 11/02/18 14:34:50 INFO common.AbstractJob: Command line arguments: {--dictionaryType=text, --endPhase=2147483647, --output=kmeans_work/clusteranalyze-9.txt, --pointsDir=kmeans_work/clusteredPoints, --seqFileDir=kmeans_work/cluster-9, --startPhase=0, --tempDir=temp}
> Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
>     at org.apache.mahout.math.DenseVector.<init>(DenseVector.java:44)
>     at org.apache.mahout.math.DenseVector.<init>(DenseVector.java:39)
>     at org.apache.mahout.math.VectorWritable.readFields(VectorWritable.java:94)
>     at org.apache.mahout.clustering.WeightedVectorWritable.readFields(WeightedVectorWritable.java:55)
>     at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1758)
>     at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1886)
>     at org.apache.mahout.utils.clustering.ClusterDumper.readPoints(ClusterDumper.java:286)
>     at org.apache.mahout.utils.clustering.ClusterDumper.init(ClusterDumper.java:224)
>     at org.apache.mahout.utils.clustering.ClusterDumper.run(ClusterDumper.java:143)
>     at org.apache.mahout.utils.clustering.ClusterDumper.main(ClusterDumper.java:104)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
