Hi everyone, I am having difficulty running the clustering example at: https://cwiki.apache.org/MAHOUT/clustering-of-synthetic-control-data.html
I have followed all the steps, but I am getting the following error:

org.apache.mahout.math.CardinalityException: Required cardinality 60 but got 1151
        at org.apache.mahout.math.AbstractVector.dot(AbstractVector.java:112)
        at org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure.distance(SquaredEuclideanDistanceMeasure.java:57)
        at org.apache.mahout.common.distance.EuclideanDistanceMeasure.distance(EuclideanDistanceMeasure.java:39)
        at org.apache.mahout.clustering.canopy.CanopyClusterer.addPointToCanopies(CanopyClusterer.java:153)
        at org.apache.mahout.clustering.canopy.CanopyReducer.reduce(CanopyReducer.java:46)
        at org.apache.mahout.clustering.canopy.CanopyReducer.reduce(CanopyReducer.java:29)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:650)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)

13/02/04 14:11:41 INFO mapred.JobClient: Job complete: job_local_0002
13/02/04 14:11:41 INFO mapred.JobClient: Counters: 16
13/02/04 14:11:41 INFO mapred.JobClient:   FileSystemCounters
13/02/04 14:11:41 INFO mapred.JobClient:     FILE_BYTES_READ=259999568
13/02/04 14:11:41 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=245776500
13/02/04 14:11:41 INFO mapred.JobClient:   File Input Format Counters
13/02/04 14:11:41 INFO mapred.JobClient:     Bytes Read=1485833
13/02/04 14:11:41 INFO mapred.JobClient:   Map-Reduce Framework
13/02/04 14:11:41 INFO mapred.JobClient:     Map output materialized bytes=66080
13/02/04 14:11:41 INFO mapred.JobClient:     Map input records=3552
13/02/04 14:11:41 INFO mapred.JobClient:     Reduce shuffle bytes=0
13/02/04 14:11:41 INFO mapred.JobClient:     Spilled Records=101
13/02/04 14:11:41 INFO mapred.JobClient:     Map output bytes=65646
13/02/04 14:11:41 INFO mapred.JobClient:     Total committed heap usage (bytes)=929648640
13/02/04 14:11:41 INFO mapred.JobClient:     SPLIT_RAW_BYTES=535
13/02/04 14:11:41 INFO mapred.JobClient:     Combine input records=0
13/02/04 14:11:41 INFO mapred.JobClient:     Reduce input records=0
13/02/04 14:11:41 INFO mapred.JobClient:     Reduce input groups=0
13/02/04 14:11:41 INFO mapred.JobClient:     Combine output records=0
13/02/04 14:11:41 INFO mapred.JobClient:     Reduce output records=0
13/02/04 14:11:41 INFO mapred.JobClient:     Map output records=101

Exception in thread "main" java.lang.InterruptedException: Canopy Job failed processing output/data
        at org.apache.mahout.clustering.canopy.CanopyDriver.buildClustersMR(CanopyDriver.java:349)
        at org.apache.mahout.clustering.canopy.CanopyDriver.buildClusters(CanopyDriver.java:236)
        at org.apache.mahout.clustering.canopy.CanopyDriver.run(CanopyDriver.java:145)
        at org.apache.mahout.clustering.canopy.CanopyDriver.run(CanopyDriver.java:160)
        at org.apache.mahout.clustering.syntheticcontrol.canopy.Job.run(Job.java:86)
        at org.apache.mahout.clustering.syntheticcontrol.canopy.Job.main(Job.java:54)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

I would appreciate it if you could help.
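In case it is useful, the steps I followed were essentially the ones on the wiki page (downloading synthetic_control.data from the UCI archive, then running the canopy example with the default testdata/output paths); roughly:

        hadoop fs -mkdir testdata
        hadoop fs -put synthetic_control.data testdata
        $MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.syntheticcontrol.canopy.Job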
