Hi there

This is the first time I have posted to this forum. I have a clustering
problem I want to solve. Basically, I need to cluster
a set of more than 1M items containing HTML text. At first I was thinking of
using a Lucene index and a hierarchical algorithm with quadratic complexity
to find these clusters, but quadratic running time does not scale to a
collection of this size. Although K-means scales better (each iteration is
roughly linear in the number of items), I would like to hear about
production experience with this algorithm in Mahout, specifically the time
needed to set up a production environment (including Hadoop configuration).
I'm interested in this last point because we have little time to come up
with a solution for our problem.

Thanks in advance.

Best regards.
Gustavo