That's a really good question. Mahout does not have an "explain"
feature; however, you can use the ClusterDumper to print out the cluster
centers and the vectors assigned to each cluster. The output is quite
verbose, and because large text vectors are truncated, it may not be
that useful. You may need to write your own code for this; the cluster
evaluator tests might give you some hints.
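As a rough illustration of what "write something to do this" could look like, here is a minimal sketch. It is not part of Mahout. It assumes you have already pulled a document vector and its cluster center out as plain term-to-weight maps (for example TF-IDF weights), and that a cosine-based distance was used. The function name top_shared_terms is hypothetical. It ranks the terms by how much each one contributes to the cosine similarity between the document and the center.

```python
import math
from typing import Dict, List, Tuple


def top_shared_terms(
    doc: Dict[str, float], center: Dict[str, float], k: int = 5
) -> Tuple[float, List[Tuple[str, float]]]:
    """Hypothetical helper (not a Mahout API): return the cosine similarity
    between doc and center plus the k shared terms contributing most to it."""
    norm_doc = math.sqrt(sum(w * w for w in doc.values()))
    norm_center = math.sqrt(sum(w * w for w in center.values()))
    if norm_doc == 0 or norm_center == 0:
        return 0.0, []
    # Each shared term contributes doc[t] * center[t] / (|doc| * |center|);
    # summed over all shared terms this equals the cosine similarity.
    contributions = {
        term: doc[term] * center[term] / (norm_doc * norm_center)
        for term in doc.keys() & center.keys()
    }
    similarity = sum(contributions.values())
    # Sort by descending contribution, breaking ties alphabetically.
    ranked = sorted(contributions.items(), key=lambda kv: (-kv[1], kv[0]))
    return similarity, ranked[:k]
```

Running this for both of the surprising documents against the same center shows which terms pulled each of them in. If the top-ranked terms look uninformative, that can point to the vectorization step rather than the clustering itself.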
Which algorithm were you using?
On 2/4/13 1:57 PM, Chris Harrington wrote:
I was wondering if there was an explain feature in Mahout, something that gives
the reason why it did what it did, shows the values of the various features it
used to evaluate and choose the result, etc.
I ask because some wildly different texts are being clustered together. For
example, it clustered these two together, and I'd like to be able to figure out why:
Text 1: "Iron Butterfly Bassist Lee Dorman Dies at 70"
Text 2: "The BEST Memes Of 2012 2012 was a landmark year for memes -- and we could
say that due to the Ikea Monkey alone -- but it's not always easy…"