Hi there,
This is the first time I have posted a message to this forum. I have a clustering problem I want to solve. Basically, I need to cluster a set of more than 1M items containing HTML text. At first I was thinking of using a Lucene index and a hierarchical algorithm with quadratic complexity to find these clusters, but quadratic running time is not practical at this scale. Although K-means has better complexity, I would like to hear about production experiences using this algorithm in Mahout, specifically the time needed to set up a production environment (including Hadoop configuration). I'm particularly interested in this latter issue, since we have little time to come up with a solution to our problem.
Thanks in advance. Best regards. Gustavo
