Hi list,
As suggested in previous posts, I am trying to use k-means to assign newly
arriving documents to existing clusters.
However, while trying to assign the vectors corresponding to the new documents
to the existing clusters (using KMeansDriver.clusterData(…)), I am running into
an org.apache.mahout.math.CardinalityException.
The complete stack trace is below.
For vector creation I use Mahout's DictionaryVectorizer.
I assume this exception occurs because the new vectors have a different
cardinality than the previously computed clusters (16 vs. 22 in the trace).
Is there some way to assign a fixed cardinality to all vectors? Or is there any
other solution for this?
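To make the question concrete, here is a minimal sketch of what I mean by a
fixed cardinality. It is plain Java and does not use any Mahout API; the names
FixedCardinalityDemo, buildDictionary and vectorize are made up for
illustration. The assumption is that the dictionary is built once from the
original corpus and then reused for new documents, so every vector has the
dictionary's size and terms not in the dictionary are simply dropped.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: not part of Mahout.
public class FixedCardinalityDemo {

    // Builds a term -> index dictionary from the original corpus only.
    public static Map<String, Integer> buildDictionary(List<List<String>> corpus) {
        Map<String, Integer> dictionary = new LinkedHashMap<>();
        for (List<String> doc : corpus) {
            for (String term : doc) {
                // A new term gets the next free index; known terms keep theirs.
                dictionary.putIfAbsent(term, dictionary.size());
            }
        }
        return dictionary;
    }

    // Maps a document onto the existing dictionary as term counts.
    // The result always has dictionary.size() entries; unseen terms are ignored.
    public static double[] vectorize(Map<String, Integer> dictionary, List<String> tokens) {
        double[] vector = new double[dictionary.size()];
        for (String term : tokens) {
            Integer index = dictionary.get(term);
            if (index != null) {
                vector[index] += 1.0;
            }
        }
        return vector;
    }

    public static void main(String[] args) {
        List<List<String>> corpus = Arrays.asList(
                Arrays.asList("mahout", "kmeans", "cluster"),
                Arrays.asList("cluster", "vector"));
        Map<String, Integer> dictionary = buildDictionary(corpus);

        // "cardinality" is not in the dictionary, so it does not grow the vector.
        double[] newDoc = vectorize(dictionary,
                Arrays.asList("vector", "cardinality", "cluster", "cluster"));
        System.out.println(newDoc.length + " " + Arrays.toString(newDoc));
    }
}
```

With this approach the new vectors would always match the cardinality of the
clusters, at the cost of ignoring vocabulary that only appears in the new
documents. Whether DictionaryVectorizer can be made to behave this way is
exactly what I am unsure about.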
I would really appreciate any help! Thanks,
David
java.lang.Exception: org.apache.mahout.math.CardinalityException: Required cardinality 16 but got 22
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:371)
Caused by: org.apache.mahout.math.CardinalityException: Required cardinality 16 but got 22
    at org.apache.mahout.math.RandomAccessSparseVector.dot(RandomAccessSparseVector.java:172)
    at org.apache.mahout.math.NamedVector.dot(NamedVector.java:127)
    at org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure.distance(SquaredEuclideanDistanceMeasure.java:57)
    at org.apache.mahout.clustering.kmeans.KMeansClusterer.outputPointWithClusterInfo(KMeansClusterer.java:140)
    at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.map(KMeansClusterMapper.java:40)
    at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.map(KMeansClusterMapper.java:1)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:652)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:328)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:238)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:680)