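I have the same requirement. The distances from the cluster center are
scalars (magnitudes). Without considering their direction, you cannot
assume what you describe below: two documents at nearly the same distance
from the center can lie on opposite sides of it and be far from each other.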
Take a look at RowSimilarity. It calculates the distance from each
document to the others, and you can specify how many close ones to find.
It is not a nicely scalable algorithm, though, so if you only have it
look for similar docs in the same cluster, you will reduce the time
needed to calculate RowSimilarity.
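
To make both points concrete, here is a minimal sketch in plain Python. It
is not Mahout code and does not use RowSimilarity; the document IDs, the 2-D
vectors, and the helper names (euclidean, nearest_in_cluster) are all made up
for illustration, and it assumes Euclidean distance. A and B are equally far
from the center, yet C is the document closest to A.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_in_cluster(doc_id, cluster, k):
    """Return the ids of the k documents in `cluster` closest to `doc_id`.

    `cluster` maps document id -> vector and holds only the members of
    one cluster, so the pairwise work stays limited to that cluster.
    """
    target = cluster[doc_id]
    scored = [
        (euclidean(target, vec), other_id)
        for other_id, vec in cluster.items()
        if other_id != doc_id
    ]
    return [other_id for _, other_id in sorted(scored)[:k]]


# Hypothetical toy cluster centered at the origin.
centroid = (0.0, 0.0)
cluster = {
    "A": (1.0, 0.0),
    "B": (-1.0, 0.0),  # same distance from the center as A, opposite side
    "C": (1.5, 0.0),   # farther from the center, but right next to A
}

dist_to_centroid = {d: euclidean(v, centroid) for d, v in cluster.items()}
closest_to_a = nearest_in_cluster("A", cluster, 2)
```

Sorting by dist_to_centroid would rank B as the best match for A, while the
pairwise distances put C first. Restricting the comparison to one cluster
keeps the cost at one pass over that cluster's members per query.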
On 7/26/12 6:43 PM, kiran kumar wrote:
Hello,
I am using Mahout clustering to cluster our data. I got around 100
clusters. All is good so far.
I have a requirement: if a user is viewing a document in a cluster, I want
to find the next closest documents to that document, from the same cluster,
and show them to the user.
I have the distance from the center for each document. If document A has
distance d1, does that mean the documents with distances d2, d3, and so on
in ascending order are the ones most similar or closest to it? Or, to get
the closest documents to document A, do we need to calculate the distance
from A to all other documents in the cluster?
Can you please give some thoughts on how to solve this problem?
Thanks & Regards,
Kiran Kumar Bushireddy.