Since I brought it up, I guess I should answer, even if it's with Jeff's
answer:
Jeff Eastman comments on https://issues.apache.org/jira/browse/MAHOUT-1030
"
The more I think about the distance property calculation the more I am
comfortable with the current (trunk) implementation. Consider that, for
DistanceMeasureClusters at least, the pdf is:
pdf = 1/(1+distance)
It is trivial to back-calculate the distance from the pdf value that is
already written in the WeightedVectorWritable. This is because, in the
new implementation, the pdfs are calculated by the classify() policy
method and it is the pdfs that are written.
Sooo, I'm not sure this is a must-fix or even a want-fix. Thoughts?
"
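To make Jeff's point concrete, here is a minimal sketch that inverts pdf = 1/(1+distance) to get distance = 1/pdf - 1. It is plain Java and does not call any Mahout API. It assumes the weight written in WeightedVectorWritable is exactly that pdf, which Jeff describes for DistanceMeasureClusters; other cluster types may compute their pdf differently. The class and method names are hypothetical.

```java
/**
 * Hypothetical helper, not part of Mahout. ASSUMES the clustering weight
 * (pdf) was computed as pdf = 1 / (1 + distance), as described above for
 * DistanceMeasureClusters.
 */
public final class PdfDistance {

  private PdfDistance() {}

  /** Forward formula: pdf = 1 / (1 + distance), for distance >= 0. */
  static double pdfFromDistance(double distance) {
    if (distance < 0.0) {
      throw new IllegalArgumentException("distance must be >= 0: " + distance);
    }
    return 1.0 / (1.0 + distance);
  }

  /** Inverse formula: distance = 1/pdf - 1, valid for 0 < pdf <= 1. */
  static double distanceFromPdf(double pdf) {
    if (!(pdf > 0.0 && pdf <= 1.0)) {
      throw new IllegalArgumentException("pdf must be in (0, 1]: " + pdf);
    }
    return 1.0 / pdf - 1.0;
  }

  public static void main(String[] args) {
    double[] distances = {0.0, 0.5, 3.0, 42.0};
    for (double d : distances) {
      double pdf = pdfFromDistance(d);
      // Round-trips up to floating-point error.
      System.out.printf("distance=%.4f pdf=%.4f recovered=%.4f%n",
          d, pdf, distanceFromPdf(pdf));
    }
  }
}
```

One caveat: for very large distances the pdf approaches 0, so the recovered distance loses relative precision. For typical centroid distances the round trip is accurate to within floating-point error.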
On 6/27/12 9:12 AM, Pat Ferrel wrote:
Using a 0.8 snapshot taken yesterday, the output of kmeans clustering is
<IntWritable, WeightedVectorWritable>. In 0.6 it was <IntWritable,
WeightedPropertyVectorWritable> and the properties contained among
other things the distance from the centroid. As I understand things,
this was discovered too late in the 0.7 release to fix. Should I plan
to calculate the distance on my own or is this going back into 0.8 any
time soon?