I am not a Mahout user (yet!), but I have a clustering problem that I
need to solve at scale, so Mahout looks very attractive.

From my first look through the wiki and javadocs, it seems that Mahout
is oriented toward clustering real-valued vectors.  I understand that
clustering must be built on top of a distance function, but the
objects in the space I'm computing with don't have a natural
representation as such vectors (that I can see).  There is a topology:
I can define a metric (non-negative, symmetric, satisfies triangle
inequality, distance to self is 0). Logically, it seems that this is
all that should be needed for clustering.
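
To make concrete what I mean by clustering from a metric alone, here
is a minimal sketch in plain Python. It is not Mahout code, and the
names edit_distance, is_metric_on and k_medoids are made up for
illustration. String edit distance stands in for my real metric, and a
simple k-medoids loop stands in for the clustering step, since medoids
are always actual objects and no vector averaging is required.

```python
from itertools import product


def edit_distance(a, b):
    """Levenshtein distance between two strings (stand-in for my metric)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def is_metric_on(sample, dist):
    """Check the metric properties on a finite sample (sanity check, not a proof)."""
    for x, y, z in product(sample, repeat=3):
        if dist(x, y) < 0 or dist(x, x) != 0:
            return False
        if dist(x, y) != dist(y, x):
            return False
        if dist(x, z) > dist(x, y) + dist(y, z):
            return False
    return True


def k_medoids(items, dist, medoids, max_iter=100):
    """Cluster items using only pairwise distances; returns {medoid: members}."""
    medoids = list(medoids)
    clusters = {}
    for _ in range(max_iter):
        # Assignment step: each item joins its nearest medoid.
        clusters = {m: [] for m in medoids}
        for x in items:
            nearest = min(medoids, key=lambda m: dist(x, m))
            clusters[nearest].append(x)
        # Update step: the new medoid is the member with the smallest
        # total distance to the rest of its cluster.
        new_medoids = [
            min(members, key=lambda c: sum(dist(c, x) for x in members))
            if members else m
            for m, members in clusters.items()
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return clusters


words = ["cat", "cart", "car", "dog", "dot", "dig"]
clusters = k_medoids(words, edit_distance, medoids=["cat", "dog"])
```

Nothing in this sketch ever needs coordinates for the objects, only
the pairwise distance function. It is quadratic in cluster size, which
is exactly why I am hoping for a scalable implementation.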

My question is: can Mahout run clustering for me based on a metric
function I supply, or must I have a vector representation of the
objects I want to cluster?

Thanks to the Mahout community for making such a useful tool available!

William F Dowling
Senior Technologist

Thomson Reuters


