What are the best settings for my clustering task

mercutio7979 Mon, 30 Sep 2013 14:15:12 -0700

Hello all,

I am currently trying to create clusters from a group of 50,000 strings that
contain product descriptions (around 70-100 characters each).

That group of 50,000 consists of roughly 5,000 individual products with ten
varying product descriptions per product. The product descriptions are
already prepared for clustering and contain a normalized brand name, product
model number, etc.

What would be a good approach to maximise the number of clusters found? (The
best possible result would be 5,000 clusters with 10 descriptions each.)

I adapted the Reuters cluster script to read in my data and managed to
create a first set of clusters. However, I have not managed to maximise the
cluster count.

The question is: which of the available Mahout settings do I need to tweak
so that the clusters are created as precisely as possible?
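
To make the target concrete, here is a minimal sketch in plain Python (not
Mahout) of how a similarity threshold drives the cluster count. The toy
descriptions, the function names and the greedy single-pass scheme are all
assumptions made up for illustration; they only stand in for whatever
vectorisation and clustering Mahout performs. In Mahout the analogous knobs
would presumably be k for k-means or the T1/T2 thresholds for canopy
clustering.

```python
import math
from collections import Counter

# Hypothetical toy data: 3 products, 3 description variants each.
descriptions = [
    "acme drill x100 cordless 18v",
    "acme x100 cordless drill 18v with case",
    "drill acme x100 18v",
    "bolt saw s20 circular 1200w",
    "bolt s20 circular saw 1200w blade",
    "saw bolt s20 1200w",
    "zenit lamp l5 led desk",
    "zenit l5 led desk lamp white",
    "lamp zenit l5 led",
]


def vectorize(text):
    """Bag-of-words term counts for one description."""
    return Counter(text.split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def greedy_cluster(texts, threshold):
    """Assign each text to the first cluster whose seed is similar enough,
    otherwise start a new cluster with this text as its seed."""
    clusters = []  # list of (seed_vector, [member indices])
    for i, text in enumerate(texts):
        vec = vectorize(text)
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((vec, [i]))
    return [members for _, members in clusters]


clusters = greedy_cluster(descriptions, threshold=0.5)
print(len(clusters), clusters)
```

On this toy data a moderate threshold groups the variants of each product
together, while a very strict threshold splits every description into its
own cluster. The settings I am looking for are the ones that land on the
"one cluster per product" side of that trade-off.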

Many regards!
Jens





--
View this message in context: 
http://lucene.472066.n3.nabble.com/What-are-the-best-settings-for-my-clustering-task-tp4092807.html
Sent from the Mahout User List mailing list archive at Nabble.com.
