I am using the Mahout 0.4 release.
In mahout-distribution-0.4/bin, I ran:
./mahout canopy -i /home/space/lucene_clustering/vector/vector \
    -o /home/space/lucene_clustering/canopy/ \
    -dm org.apache.mahout.common.distance.EuclideanDistanceMeasure \
    -t1 0.8 -t2 0.2 -ow
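For readers unfamiliar with the two thresholds, here is a minimal single-pass canopy sketch in Python. It is not Mahout's code, only an illustration of the general technique under the assumption that -t1 is the loose threshold and -t2 the tight one; the function names `euclidean` and `canopy_centers` are made up for this example. It shows that a small T2 relative to the distances in the data tends to produce many canopies.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy_centers(points, t1, t2, distance=euclidean):
    """Single-pass canopy sketch (hypothetical helper, not Mahout's API).

    A point joins every canopy whose center is closer than t1.
    If no center is closer than t2, the point starts a new canopy.
    Returns a list of (center, members) pairs.
    """
    if t1 <= t2:
        raise ValueError("t1 must be greater than t2")
    canopies = []
    for p in points:
        strongly_bound = False
        for center, members in canopies:
            d = distance(center, p)
            if d < t1:
                members.append(p)      # loosely covered by this canopy
            if d < t2:
                strongly_bound = True  # close enough to suppress a new canopy
        if not strongly_bound:
            canopies.append((p, [p]))
    return canopies


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.0), (0.5, 0.0), (2.0, 0.0)]
    # Same thresholds as the command above: three canopies on this toy data.
    print(len(canopy_centers(pts, 0.8, 0.2)))
    # A larger t2 absorbs the point at 0.5, leaving two canopies.
    print(len(canopy_centers(pts, 0.8, 0.6)))
```

On real data the number of canopies depends on how the pairwise distances compare with T2, so the toy numbers here say nothing about the 50000-record input.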
In hadoop-env.sh, I added:
export HADOOP_HEAPSIZE=20000
export HADOOP_OPTS="-Xmn3g -Xss128k -XX:ParallelGCThreads=20
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:SurvivorRatio=8
-XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=31 -XX:+AggressiveOpts
-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9004
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false"
I am sure all of these parameters take effect, because I checked the VM
parameters with jconsole.
However, after the job reports map 100% and reduce 100%, memory use climbs
from 2.5 GB to 20 GB and the exception below is thrown. The vector file used
as input for canopy is 30 MB and contains 50000 records.
10/11/23 16:04:27 INFO mapred.LocalJobRunner: reduce > reduce
10/11/23 16:04:27 INFO mapred.JobClient: map 100% reduce 100%
10/11/23 16:04:30 INFO mapred.LocalJobRunner: reduce > reduce
10/11/23 16:08:17 WARN mapred.LocalJobRunner: job_local_0001
java.lang.OutOfMemoryError: Java heap space
    at org.apache.mahout.math.map.OpenIntDoubleHashMap.rehash(OpenIntDoubleHashMap.java:434)
    at org.apache.mahout.math.map.OpenIntDoubleHashMap.put(OpenIntDoubleHashMap.java:387)
    at org.apache.mahout.math.RandomAccessSparseVector.setQuick(RandomAccessSparseVector.java:134)
    at org.apache.mahout.math.AbstractVector.assign(AbstractVector.java:449)
    at org.apache.mahout.clustering.AbstractCluster.computeParameters(AbstractCluster.java:184)
    at org.apache.mahout.clustering.canopy.CanopyReducer.reduce(CanopyReducer.java:42)
    at org.apache.mahout.clustering.canopy.CanopyReducer.reduce(CanopyReducer.java:29)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
10/11/23 16:08:18 INFO mapred.JobClient: Job complete: job_local_0001
10/11/23 16:08:18 INFO mapred.JobClient: Counters: 12
10/11/23 16:08:18 INFO mapred.JobClient: FileSystemCounters
10/11/23 16:08:18 INFO mapred.JobClient: FILE_BYTES_READ=70413991
10/11/23 16:08:18 INFO mapred.JobClient: FILE_BYTES_WRITTEN=164338288
10/11/23 16:08:18 INFO mapred.JobClient: Map-Reduce Framework
10/11/23 16:08:18 INFO mapred.JobClient: Reduce input groups=1
10/11/23 16:08:18 INFO mapred.JobClient: Combine output records=0
10/11/23 16:08:18 INFO mapred.JobClient: Map input records=50000
10/11/23 16:08:18 INFO mapred.JobClient: Reduce shuffle bytes=0
10/11/23 16:08:18 INFO mapred.JobClient: Reduce output records=227
10/11/23 16:08:18 INFO mapred.JobClient: Spilled Records=64708
10/11/23 16:08:18 INFO mapred.JobClient: Map output bytes=8836211
10/11/23 16:08:18 INFO mapred.JobClient: Combine input records=0
10/11/23 16:08:18 INFO mapred.JobClient: Map output records=32354
10/11/23 16:08:18 INFO mapred.JobClient: Reduce input records=32354
Exception in thread "main" java.lang.InterruptedException: Canopy Job failed processing /home/space/lucene_clustering/vector/vector
    at org.apache.mahout.clustering.canopy.CanopyDriver.buildClustersMR(CanopyDriver.java:252)
    at org.apache.mahout.clustering.canopy.CanopyDriver.buildClusters(CanopyDriver.java:167)
    at org.apache.mahout.clustering.canopy.CanopyDriver.run(CanopyDriver.java:114)
    at org.apache.mahout.clustering.canopy.CanopyDriver.run(CanopyDriver.java:91)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.mahout.clustering.canopy.CanopyDriver.main(CanopyDriver.java:58)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:184)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
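
To put the counters in the log above into proportion, here is a small piece of arithmetic in Python. The values are copied from the log; the variable names are only illustrative. Reading the map output records as canopy centres emitted by the map phase is an assumption about how this job is structured, not something the log itself states. What the log does state is that "Reduce input groups=1", meaning every reduce input record arrived under a single key.

```python
# Counter values copied verbatim from the job log above.
map_input_records = 50000
map_output_records = 32354
map_output_bytes = 8836211
reduce_input_groups = 1
reduce_input_records = 32354
reduce_output_records = 227

# Share of input records that produced a map output record.
output_ratio = map_output_records / map_input_records

# Average serialized size of one map output record.
avg_bytes_per_record = map_output_bytes / map_output_records

# With one input group, a single reduce call sees every record.
records_per_group = reduce_input_records / reduce_input_groups

print(f"map output / map input: {output_ratio:.1%}")
print(f"average map output record: {avg_bytes_per_record:.0f} bytes")
print(f"records per reduce group: {records_per_group:.0f}")
```

This prints roughly 64.7%, 273 bytes, and 32354 records. If the assumption about the map output holds, about two thirds of the input points survived the map phase as separate centres with T2 = 0.2.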