I am not a Mahout user (yet!), but I have a clustering problem that I need to solve scalably, so Mahout looks very attractive.
From my first look over the wiki and the javadocs, it seems that Mahout is oriented toward clustering real-valued vectors. I understand that clustering must be built on top of a distance function, but the objects I'm working with don't have a natural representation as such vectors, as far as I can see. There is a topology on them, though: I can define a metric (non-negative, symmetric, satisfies the triangle inequality, and the distance from an object to itself is 0). From a logical point of view, that seems to be all that clustering should need.

My question is: can Mahout run clustering for me based on a metric function I supply, or must I have a vector representation of the objects I want to cluster?

Thanks to the Mahout community for making such a useful tool available!

William F Dowling
Senior Technologist
Thomson Reuters
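To make the "a metric is all that should be needed" point concrete: k-medoids is one clustering technique that only ever calls a distance function and never averages objects, so it works on any metric space. The sketch below is a minimal single-machine illustration under stated assumptions. It does not use or describe any Mahout API; the `Metric` interface, `MetricKMedoids` class, and the use of Levenshtein distance on strings are all hypothetical stand-ins for "objects with a user-supplied metric". The initialization (first k items as medoids) is a simplification chosen for determinism.

```java
import java.util.Arrays;
import java.util.List;

public class MetricKMedoids {

    /** Hypothetical interface: any distance function obeying the metric axioms. */
    public interface Metric<T> {
        double distance(T a, T b);
    }

    /** Assigns every item to its nearest medoid; returns cluster labels 0..k-1. */
    private static <T> int[] assign(List<T> items, int[] medoids, Metric<T> metric) {
        int[] labels = new int[items.size()];
        for (int i = 0; i < items.size(); i++) {
            double bestD = Double.POSITIVE_INFINITY;
            for (int c = 0; c < medoids.length; c++) {
                double d = metric.distance(items.get(i), items.get(medoids[c]));
                if (d < bestD) { bestD = d; labels[i] = c; }
            }
        }
        return labels;
    }

    /** Alternating (Voronoi-iteration) k-medoids using only the supplied metric. */
    public static <T> int[] cluster(List<T> items, int k, Metric<T> metric, int maxIter) {
        if (k < 1 || k > items.size()) throw new IllegalArgumentException("bad k");
        int n = items.size();
        int[] medoids = new int[k];
        for (int c = 0; c < k; c++) medoids[c] = c;  // simplistic deterministic init
        for (int iter = 0; iter < maxIter; iter++) {
            int[] labels = assign(items, medoids, metric);
            boolean changed = false;
            // Update step: the new medoid of each cluster is the member with the
            // smallest total distance to the other members (no averaging needed).
            for (int c = 0; c < k; c++) {
                int bestM = medoids[c];
                double bestCost = Double.POSITIVE_INFINITY;
                for (int i = 0; i < n; i++) {
                    if (labels[i] != c) continue;
                    double cost = 0;
                    for (int j = 0; j < n; j++) {
                        if (labels[j] == c) cost += metric.distance(items.get(i), items.get(j));
                    }
                    if (cost < bestCost) { bestCost = cost; bestM = i; }
                }
                if (bestM != medoids[c]) { medoids[c] = bestM; changed = true; }
            }
            if (!changed) break;
        }
        return assign(items, medoids, metric);  // labels consistent with final medoids
    }

    /** Levenshtein edit distance: a true metric on strings, with no vector form. */
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int sub = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                curr[j] = Math.min(sub, Math.min(prev[j] + 1, curr[j - 1] + 1));
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("cat", "house", "bat", "hat", "mouse", "horse");
        int[] labels = cluster(words, 2, MetricKMedoids::levenshtein, 10);
        System.out.println(Arrays.toString(labels));  // prints [0, 1, 0, 0, 1, 1]
    }
}
```

Each iteration costs O(n²) distance evaluations, so this version is only meant to show that a metric suffices logically; it says nothing about whether, or how, Mahout distributes such a computation.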
