I am running kmeans clustering on vectors extracted from a Lucene index. What I want as my end result is a mapping from each document ID to the cluster that document was assigned to. How can I get that output? I see that many other people want this too, but I don't see enough detail in any of the posted solutions to get it working.
So far I do this:

    ./mahout lucene.vector -d ~/clusterdemo/solr/data/index/ -f text --idField id --output output.txt --dictOut dict.txt
    ./mahout kmeans -i output.txt -o kmeans -x 10 -k 100 -ow --clusters clusters -cl
    ./mahout clusterdump --dictionary dict.txt --seqFileDir kmeans/clusters-10-final --dictionaryType text --pointsDir kmeans/clusteredPoints --output dump

But what I see inside the "dump" file does not contain any mapping from document ID to cluster. How can I get that? It should not be this hard to get the most obvious/useful output, IMO ;)

Thanks,
Bob
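
One possible route, offered only as an untested sketch: the clustered points under kmeans/clusteredPoints are stored as a sequence file, and Mahout's seqdumper utility can print a sequence file as text. If each printed line carries the cluster ID as the key and the vector's name (presumably the value of the --idField field) before the "=", a small script could pull out the mapping. The line shape below is an assumption and may differ between Mahout versions; the sample lines and the function name doc_to_cluster are made up for illustration.

```python
import re

# Assumed line shape (NOT confirmed for any particular Mahout version):
#   Key: <clusterId>: Value: <weight info>: <docId> = [<term>:<weight>, ...]
# The document ID is taken to be the last token before "= [".
# IDs containing spaces or colons would need a different pattern.
LINE_RE = re.compile(r"^Key:\s*(\d+):\s*Value:.*?([^\s:\[]+)\s*=\s*\[")


def doc_to_cluster(lines):
    """Return {document ID: cluster ID} for every line that matches."""
    mapping = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            cluster_id, doc_id = match.group(1), match.group(2)
            mapping[doc_id] = int(cluster_id)
    return mapping


if __name__ == "__main__":
    # Hypothetical sample lines, written by hand to match the assumed shape.
    sample = [
        "Key: 5: Value: 1.0: doc42 = [3:0.5, 7:1.2]",
        "Key: 17: Value: 1.0: doc7 = [1:0.9]",
        "Count: 2",
    ]
    for doc_id, cluster in sorted(doc_to_cluster(sample).items()):
        print(doc_id, cluster)
```

Lines that do not match the assumed shape (headers, counts) are skipped, so a different dump format would show up as an empty result rather than as wrong assignments.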
