Hmm, I switched back to Mahout 0.6 and the same command line produced the
expected results with the same data, with no error. I can't find anything
about this in JIRA. Is anyone else using kmeans from the trunk on real data?
On 6/4/12 9:05 AM, Pat Ferrel wrote:
Using the kmeans CLI from several trunk versions, I get an error I
don't understand. When the job died,
b3/canopy-centroids/clusters-0-final contained the random-seeds file
generated by the kmeans driver, and b3/kmeans-clusters/clusters-0
had several part files, but b3/kmeans-clusters/clusters-1 was empty.
When I follow the stack trace through the code, it doesn't make much sense.
Command line:
mahout kmeans \
  -i b3/vectors/tfidf-vectors/ \
  -k 20 \
  -c b3/canopy-centroids/clusters-0-final \
  -cl \
  -o b3/kmeans-clusters \
  -ow \
  -cd 0.01 \
  -x 30 \
  -dm org.apache.mahout.common.distance.CosineDistanceMeasure
Error:
12/06/04 07:55:03 INFO common.AbstractJob: Command line arguments:
{--clustering=null, --clusters=[b3/canopy-centroids/clusters-0-final],
--convergenceDelta=[0.01],
--distanceMeasure=[org.apache.mahout.common.distance.CosineDistanceMeasure],
--endPhase=[2147483647], --input=[b3/vectors/tfidf-vectors/],
--maxIter=[30], --method=[mapreduce], --numClusters=[20],
--output=[b3/kmeans-clusters], --overwrite=null, --startPhase=[0],
--tempDir=[temp]}
2012-06-04 07:55:03.752 java[67308:1903] Unable to load realm info
from SCDynamicStore
12/06/04 07:55:03 INFO common.HadoopUtil: Deleting
b3/canopy-centroids/clusters-0-final
12/06/04 07:55:04 WARN util.NativeCodeLoader: Unable to load
native-hadoop library for your platform... using builtin-java classes
where applicable
12/06/04 07:55:04 INFO compress.CodecPool: Got brand-new compressor
12/06/04 07:55:04 INFO kmeans.RandomSeedGenerator: Wrote 20 vectors to
b3/canopy-centroids/clusters-0-final/part-randomSeed
12/06/04 07:55:04 INFO kmeans.KMeansDriver: Input:
b3/vectors/tfidf-vectors Clusters In:
b3/canopy-centroids/clusters-0-final/part-randomSeed Out:
b3/kmeans-clusters Distance:
org.apache.mahout.common.distance.CosineDistanceMeasure
12/06/04 07:55:04 INFO kmeans.KMeansDriver: convergence: 0.01 max
Iterations: 30 num Reduce Tasks: org.apache.mahout.math.VectorWritable
Input Vectors: {}
12/06/04 07:55:04 INFO compress.CodecPool: Got brand-new decompressor
Cluster Iterator running iteration 1 over priorPath:
b3/kmeans-clusters/clusters-0
12/06/04 07:55:05 INFO input.FileInputFormat: Total input paths to
process : 1
12/06/04 07:55:05 INFO mapred.JobClient: Running job: job_local_0001
12/06/04 07:55:06 INFO mapred.MapTask: io.sort.mb = 100
12/06/04 07:55:08 INFO mapred.MapTask: data buffer = 79691776/99614720
12/06/04 07:55:08 INFO mapred.MapTask: record buffer = 262144/327680
12/06/04 07:55:08 INFO mapred.JobClient: map 0% reduce 0%
12/06/04 07:55:09 WARN mapred.LocalJobRunner: job_local_0001
org.apache.mahout.math.IndexException: Index -1 is outside allowable
range of [0,20)
at org.apache.mahout.math.AbstractVector.set(AbstractVector.java:439)
at org.apache.mahout.clustering.iterator.AbstractClusteringPolicy.select(AbstractClusteringPolicy.java:44)
at org.apache.mahout.clustering.iterator.CIMapper.map(CIMapper.java:52)
at org.apache.mahout.clustering.iterator.CIMapper.map(CIMapper.java:18)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
12/06/04 07:55:09 INFO mapred.JobClient: Job complete: job_local_0001
12/06/04 07:55:09 INFO mapred.JobClient: Counters: 0
Exception in thread "main" java.lang.InterruptedException: Cluster
Iteration 1 failed processing b3/kmeans-clusters/clusters-1
at org.apache.mahout.clustering.iterator.ClusterIterator.iterateMR(ClusterIterator.java:186)
at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:229)
at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:149)
at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:108)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:49)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
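
For reference, the first trace shows AbstractVector.set being called from
AbstractClusteringPolicy.select with index -1 on a vector of size 20. One
way a -1 can reach a set() call is an argmax that finds no valid maximum,
for example when every score it sees is NaN. I have not checked that this
is what the trunk code does; the sketch below is only an illustration.
ArgMaxSketch and its maxValueIndex are hypothetical stand-ins written for
this message, not Mahout's code.

```java
public class ArgMaxSketch {

    // Hypothetical argmax helper (an assumption, not Mahout's implementation).
    // It returns -1 when no element compares greater than the initial maximum.
    public static int maxValueIndex(double[] values) {
        int best = -1;
        double max = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < values.length; i++) {
            // Any comparison with NaN is false, so NaN entries never win.
            if (values[i] > max) {
                max = values[i];
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] normal = {0.1, 0.7, 0.2};
        double[] allNaN = {Double.NaN, Double.NaN, Double.NaN};

        System.out.println(maxValueIndex(normal)); // prints 1
        System.out.println(maxValueIndex(allNaN)); // prints -1

        // Using -1 as an index fails, much like the IndexException in the trace.
        double[] weights = new double[allNaN.length];
        try {
            weights[maxValueIndex(allNaN)] = 1.0;
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("index out of range: " + e.getMessage());
        }
    }
}
```

If something like this is happening, printing the per-cluster scores for
the first failing input vector should show whether they are all NaN.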