Hi list,

As suggested in previous posts, I am trying to use k-means to assign newly 
arriving documents to existing clusters.

However, when I try to assign the vectors for the new documents to the existing
clusters (using KMeansDriver.clusterData(…)), I run into an
org.apache.mahout.math.CardinalityException.
See below for the complete stack trace.

For vector creation I use Mahout's DictionaryVectorizer. 
I assume this exception occurs because the new vectors have a different
cardinality than the previously computed clusters.
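
The suspected cause can be reproduced with a small toy model in plain Java. This is not Mahout code, and CardinalityDemo and its methods are hypothetical names. A dictionary-based vectorizer gives each distinct term one index, so the vector length (its cardinality) equals the dictionary size. If the new documents are vectorized with a dictionary of their own, their length no longer matches that of the cluster centers. A dot product that checks cardinality then fails, with a message in the same format as the one in the trace.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model (not Mahout code): a dictionary-based vectorizer whose vector
// cardinality equals the number of distinct terms in its dictionary.
final class CardinalityDemo {

    // Assign each distinct term an index in order of first appearance.
    static Map<String, Integer> buildDictionary(List<String[]> docs) {
        Map<String, Integer> dict = new LinkedHashMap<>();
        for (String[] doc : docs) {
            for (String term : doc) {
                dict.putIfAbsent(term, dict.size());
            }
        }
        return dict;
    }

    // Term-frequency vector; its length (cardinality) is the dictionary size.
    static double[] vectorize(String[] doc, Map<String, Integer> dict) {
        double[] v = new double[dict.size()];
        for (String term : doc) {
            Integer idx = dict.get(term);
            if (idx != null) {
                v[idx] += 1.0;
            }
        }
        return v;
    }

    // Dot product with a cardinality check, mimicking the message of the
    // CardinalityException in the stack trace.
    static double dot(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException(
                "Required cardinality " + a.length + " but got " + b.length);
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        List<String[]> oldDocs = new ArrayList<>();
        oldDocs.add(new String[] {"apple", "banana"});
        List<String[]> newDocs = new ArrayList<>();
        newDocs.add(new String[] {"apple", "cherry", "date"});

        // Vectorizing each batch separately yields two different dictionaries.
        double[] centroid = vectorize(oldDocs.get(0), buildDictionary(oldDocs));
        double[] point = vectorize(newDocs.get(0), buildDictionary(newDocs));
        try {
            dot(centroid, point);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Required cardinality 2 but got 3
        }
    }
}
```

Even when two separately built dictionaries happen to have the same size, they generally assign different indices to the same term, so the resulting vectors would not be comparable anyway.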

Is there some way to assign a fixed cardinality to all vectors? Or is there any 
other solution for this?
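
One possible approach is sketched below as a hypothetical toy in plain Java, not as Mahout's API: build the dictionary once from the original corpus and then reuse it, frozen, for every later document. Every vector then has the dictionary's cardinality, and terms that never appeared in the original corpus are simply dropped. FixedDictionaryVectorizer and its methods are illustrative names only.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: vectorize new documents against the dictionary built
// for the original corpus, so every vector has the same cardinality.
final class FixedDictionaryVectorizer {

    private final Map<String, Integer> dictionary;

    // The dictionary is frozen at construction time; later documents never grow it.
    FixedDictionaryVectorizer(List<String> vocabulary) {
        Map<String, Integer> dict = new HashMap<>();
        for (String term : vocabulary) {
            dict.putIfAbsent(term, dict.size());
        }
        this.dictionary = dict;
    }

    int cardinality() {
        return dictionary.size();
    }

    // Term counts; terms missing from the original dictionary are ignored.
    double[] vectorize(String[] doc) {
        double[] v = new double[dictionary.size()];
        for (String term : doc) {
            Integer idx = dictionary.get(term);
            if (idx != null) {
                v[idx] += 1.0;
            }
        }
        return v;
    }

    public static void main(String[] args) {
        FixedDictionaryVectorizer vec =
            new FixedDictionaryVectorizer(Arrays.asList("apple", "banana"));
        double[] oldDoc = vec.vectorize(new String[] {"apple", "banana"});
        double[] newDoc = vec.vectorize(new String[] {"apple", "cherry", "date"});
        // Both vectors have cardinality 2; "cherry" and "date" are ignored.
        System.out.println(oldDoc.length + " " + newDoc.length); // 2 2
        System.out.println(Arrays.toString(newDoc));             // [1.0, 0.0]
    }
}
```

The trade-off is that words which first appear in the new documents cannot influence the assignment. Feature hashing into a fixed number of buckets is another common way to get a fixed cardinality without storing a dictionary, at the cost of occasional collisions.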

I would really appreciate any help! Thanks,
David


java.lang.Exception: org.apache.mahout.math.CardinalityException: Required cardinality 16 but got 22
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:371)
Caused by: org.apache.mahout.math.CardinalityException: Required cardinality 16 but got 22
        at org.apache.mahout.math.RandomAccessSparseVector.dot(RandomAccessSparseVector.java:172)
        at org.apache.mahout.math.NamedVector.dot(NamedVector.java:127)
        at org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure.distance(SquaredEuclideanDistanceMeasure.java:57)
        at org.apache.mahout.clustering.kmeans.KMeansClusterer.outputPointWithClusterInfo(KMeansClusterer.java:140)
        at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.map(KMeansClusterMapper.java:40)
        at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.map(KMeansClusterMapper.java:1)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:652)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:328)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:238)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:680)