Hello! After running k-means and computing the cluster centroids, I need a matrix (size: <Points> x <Clusters>) with the distances between points and clusters. By default Mahout assigns every point to exactly one cluster. In my case I need the distance from each point to every cluster.
For example, I have a document related to Formula 1. I have 2 clusters: Sport and Auto. Mahout assigns the document to cluster AUTO, while I need a result like this:

Distance to AUTO 0.6
Distance to SPORT 0.4

I tried "vecdist", but it fails with an exception ((

Many thanks in advance!

Here is the "vecdist" example:
------------------------------
./mahout vecdist \
--input /tmp/tfidf-vectors/ \
--seeds /tmp/clustering_results_kmeans/clusters-7-final \
--output /tmp/adhoc/ \
--distanceMeasure org.apache.mahout.common.distance.CosineDistanceMeasure \
--overwrite \
--outType v
------------------------------

And here is the stack trace:
------------------------------
12/09/05 14:10:28 INFO mapred.JobClient: Task Id : attempt_201207250104_40622_m_000029_0, Status : FAILED
java.lang.IllegalStateException: Bad value class: class org.apache.mahout.clustering.iterator.ClusterWritable
	at org.apache.mahout.math.hadoop.similarity.SeedVectorUtil.loadSeedVectors(SeedVectorUtil.java:94)
	at org.apache.mahout.math.hadoop.similarity.VectorDistanceInvertedMapper.setup(VectorDistanceInvertedMapper.java:69)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
	at org.apache.hadoop.mapred.Child.main(Child.java:264)
------------------------------
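
For illustration, the <Points> x <Clusters> matrix described above can be sketched in plain Java, without Mahout. This is only a sketch under assumptions: the points and centroids are already available as dense double[] arrays of equal length, cosine distance is taken as 1 minus cosine similarity, and a zero vector is treated as maximally distant. The names DistanceMatrixSketch, cosineDistance and distanceMatrix are hypothetical and are not part of Mahout; the vectors in main are made-up toy data.

```java
public class DistanceMatrixSketch {

    // Cosine distance, taken here as 1 - cosine similarity (assumption of this sketch).
    static double cosineDistance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        double denom = Math.sqrt(normA) * Math.sqrt(normB);
        // Assumption: a zero vector has no direction, so treat it as maximally distant.
        if (denom == 0.0) {
            return 1.0;
        }
        return 1.0 - dot / denom;
    }

    // Builds the matrix: one row per point, one column per cluster centroid.
    static double[][] distanceMatrix(double[][] points, double[][] centroids) {
        double[][] result = new double[points.length][centroids.length];
        for (int p = 0; p < points.length; p++) {
            for (int c = 0; c < centroids.length; c++) {
                result[p][c] = cosineDistance(points[p], centroids[c]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up toy data: three 2-dimensional points and two centroids.
        double[][] points = {{1.0, 0.0}, {0.0, 1.0}, {1.0, 1.0}};
        double[][] centroids = {{1.0, 0.0}, {0.0, 1.0}};
        double[][] distances = distanceMatrix(points, centroids);
        for (double[] row : distances) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```

Each row of the result holds the distances from one point to every centroid, so the hard assignment that k-means produces corresponds to the column with the smallest value in that row.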
