I must be missing something.
Jeff said: "It is trivial to back-calculate the distance from the pdf
value that is already written in the WeightedVectorWritable."
But with kmeans, the weight in the WeightedVectorWritable is either 1 or 0,
so any relationship to pdf = 1/(1+distance) has been lost.
So I'm back to being confused.
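To make the mismatch concrete, here is a minimal Java sketch of the back-calculation. It assumes only the relationship pdf = 1/(1+distance) from Jeff's comment. The class and method names (PdfDistance, pdfFromDistance, distanceFromPdf) are hypothetical and are not part of Mahout. Inverting the formula gives distance = 1/pdf - 1. That recovers the distance when a real pdf is stored. A kmeans weight of 1, however, decodes to a distance of 0 no matter where the point actually sits, and a weight of 0 has no finite inverse at all.

```java
/**
 * Hypothetical sketch: recovering a distance from a pdf value, assuming the
 * DistanceMeasureCluster relationship pdf = 1 / (1 + distance).
 * Not a Mahout API.
 */
public final class PdfDistance {

    private PdfDistance() {}

    /** Forward relationship from the thread: pdf = 1 / (1 + distance). */
    static double pdfFromDistance(double distance) {
        return 1.0 / (1.0 + distance);
    }

    /** Inverse: distance = 1 / pdf - 1, only meaningful for 0 < pdf <= 1. */
    static double distanceFromPdf(double pdf) {
        if (!(pdf > 0.0 && pdf <= 1.0)) {
            throw new IllegalArgumentException("pdf must be in (0, 1], got " + pdf);
        }
        return 1.0 / pdf - 1.0;
    }

    public static void main(String[] args) {
        double pdf = pdfFromDistance(2.5);          // a genuine pdf value
        System.out.println(distanceFromPdf(pdf));   // recovers approximately 2.5
        // A kmeans weight of 1 looks like pdf = 1 and always decodes to 0.0.
        System.out.println(distanceFromPdf(1.0));
        // A kmeans weight of 0 is outside the domain and throws.
        try {
            distanceFromPdf(0.0);
        } catch (IllegalArgumentException e) {
            System.out.println("weight 0: " + e.getMessage());
        }
    }
}
```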
On 6/27/12 4:24 PM, Pat Ferrel wrote:
Since I brought it up, I guess I should answer, even if it's with
Jeff's answer:
Jeff Eastman comments on
https://issues.apache.org/jira/browse/MAHOUT-1030
"
The more I think about the distance property calculation, the more
comfortable I am with the current (trunk) implementation. Consider that,
for DistanceMeasureClusters at least, the pdf is:
pdf = 1/(1+distance)
It is trivial to back-calculate the distance from the pdf value that
is already written in the WeightedVectorWritable. This is because, in
the new implementation, the pdfs are calculated by the classify()
policy method and it is the pdfs that are written.
Sooo, I'm not sure this is a must-fix or even a want-fix. Thoughts?
"
On 6/27/12 9:12 AM, Pat Ferrel wrote:
Using a 0.8 snapshot taken yesterday, the output of kmeans clustering is
<IntWritable, WeightedVectorWritable>. In 0.6 it was <IntWritable,
WeightedPropertyVectorWritable>, and the properties contained, among
other things, the distance from the centroid. As I understand it,
this was discovered too late in the 0.7 release to fix. Should I plan
to calculate the distance on my own, or is this going back into 0.8
any time soon?
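If the distance does have to be computed by hand, one option is to recompute it from each point and the centroid of the cluster it was assigned to. The sketch below illustrates that under stated assumptions: it uses plain double[] arrays and Euclidean distance, and the class and method names (CentroidDistance, euclidean) are hypothetical rather than Mahout types. In a real job you would presumably want the same DistanceMeasure the clustering run was configured with, so the numbers match what the clustering itself saw.

```java
/**
 * Hypothetical sketch: computing a point's distance to its assigned centroid
 * directly, using plain double[] arrays and Euclidean distance (an assumption;
 * the clustering run may have used a different distance measure).
 */
public final class CentroidDistance {

    private CentroidDistance() {}

    /** Euclidean distance between a point and a centroid of equal dimension. */
    static double euclidean(double[] point, double[] centroid) {
        if (point.length != centroid.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < point.length; i++) {
            double diff = point[i] - centroid[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        // Hypothetical centroids, indexed by cluster id.
        double[][] centroids = { {0.0, 0.0}, {10.0, 10.0} };
        // Hypothetical clustered output record: a cluster id plus the point.
        int clusterId = 0;
        double[] point = {3.0, 4.0};
        System.out.println(euclidean(point, centroids[clusterId])); // prints 5.0
    }
}
```

Looking the centroid up by cluster id mirrors the <IntWritable, WeightedVectorWritable> pairing, where the key identifies the cluster and the value carries the point.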