Running kmeans from the CLI on several trunk versions, I get an error I
don't understand. When the job died,
b3/canopy-centroids/clusters-0-final contained the random-seeds file
generated by the kmeans driver, and b3/kmeans-clusters/clusters-0 had
several part files, but b3/kmeans-clusters/clusters-1 was empty. When I
follow the stack trace through the code, it doesn't make much sense.
Command line:
mahout kmeans
-i b3/vectors/tfidf-vectors/
-k 20
-c b3/canopy-centroids/clusters-0-final
-cl
-o b3/kmeans-clusters
-ow
-cd 0.01
-x 30
-dm org.apache.mahout.common.distance.CosineDistanceMeasure
Error:
12/06/04 07:55:03 INFO common.AbstractJob: Command line arguments:
{--clustering=null, --clusters=[b3/canopy-centroids/clusters-0-final],
--convergenceDelta=[0.01],
--distanceMeasure=[org.apache.mahout.common.distance.CosineDistanceMeasure],
--endPhase=[2147483647], --input=[b3/vectors/tfidf-vectors/],
--maxIter=[30], --method=[mapreduce], --numClusters=[20],
--output=[b3/kmeans-clusters], --overwrite=null, --startPhase=[0],
--tempDir=[temp]}
2012-06-04 07:55:03.752 java[67308:1903] Unable to load realm info from
SCDynamicStore
12/06/04 07:55:03 INFO common.HadoopUtil: Deleting
b3/canopy-centroids/clusters-0-final
12/06/04 07:55:04 WARN util.NativeCodeLoader: Unable to load
native-hadoop library for your platform... using builtin-java classes
where applicable
12/06/04 07:55:04 INFO compress.CodecPool: Got brand-new compressor
12/06/04 07:55:04 INFO kmeans.RandomSeedGenerator: Wrote 20 vectors to
b3/canopy-centroids/clusters-0-final/part-randomSeed
12/06/04 07:55:04 INFO kmeans.KMeansDriver: Input:
b3/vectors/tfidf-vectors Clusters In:
b3/canopy-centroids/clusters-0-final/part-randomSeed Out:
b3/kmeans-clusters Distance:
org.apache.mahout.common.distance.CosineDistanceMeasure
12/06/04 07:55:04 INFO kmeans.KMeansDriver: convergence: 0.01 max
Iterations: 30 num Reduce Tasks: org.apache.mahout.math.VectorWritable
Input Vectors: {}
12/06/04 07:55:04 INFO compress.CodecPool: Got brand-new decompressor
Cluster Iterator running iteration 1 over priorPath:
b3/kmeans-clusters/clusters-0
12/06/04 07:55:05 INFO input.FileInputFormat: Total input paths to
process : 1
12/06/04 07:55:05 INFO mapred.JobClient: Running job: job_local_0001
12/06/04 07:55:06 INFO mapred.MapTask: io.sort.mb = 100
12/06/04 07:55:08 INFO mapred.MapTask: data buffer = 79691776/99614720
12/06/04 07:55:08 INFO mapred.MapTask: record buffer = 262144/327680
12/06/04 07:55:08 INFO mapred.JobClient: map 0% reduce 0%
12/06/04 07:55:09 WARN mapred.LocalJobRunner: job_local_0001
org.apache.mahout.math.IndexException: Index -1 is outside allowable
range of [0,20)
at org.apache.mahout.math.AbstractVector.set(AbstractVector.java:439)
at org.apache.mahout.clustering.iterator.AbstractClusteringPolicy.select(AbstractClusteringPolicy.java:44)
at org.apache.mahout.clustering.iterator.CIMapper.map(CIMapper.java:52)
at org.apache.mahout.clustering.iterator.CIMapper.map(CIMapper.java:18)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
12/06/04 07:55:09 INFO mapred.JobClient: Job complete: job_local_0001
12/06/04 07:55:09 INFO mapred.JobClient: Counters: 0
Exception in thread "main" java.lang.InterruptedException: Cluster
Iteration 1 failed processing b3/kmeans-clusters/clusters-1
at org.apache.mahout.clustering.iterator.ClusterIterator.iterateMR(ClusterIterator.java:186)
at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:229)
at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:149)
at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:108)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:49)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)