At such small sizes, I would guess that the sequential versions of streaming k-means or ball k-means would be better options.
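To make the idea concrete, here is a rough single-pass sketch in plain Python. It is not Mahout's implementation, only a simplified, deterministic stand-in for the streaming idea: each description joins its nearest cluster when it is within a distance cutoff, and otherwise starts a new cluster. The function names, the character-trigram representation, and the cutoff of 0.5 are all assumptions made for illustration, and the sample strings are made up.

```python
import math
from collections import Counter


def ngrams(text, n=3):
    """Character n-gram counts of a whitespace-normalized, lowercased string."""
    s = " ".join(text.lower().split())
    return Counter(s[i:i + n] for i in range(max(len(s) - n + 1, 1)))


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse count vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def single_pass_cluster(texts, cutoff=0.5):
    """Assign each text to the nearest cluster, or open a new one.

    Returns a list of clusters, each a list of indices into `texts`.
    `cutoff` is a hypothetical tuning knob: a smaller value yields more,
    tighter clusters.
    """
    centroids = []  # summed n-gram counts; cosine distance ignores the scale
    members = []
    for idx, text in enumerate(texts):
        vec = ngrams(text)
        best, best_dist = None, cutoff
        for c, centroid in enumerate(centroids):
            dist = cosine_distance(vec, centroid)
            if dist < best_dist:
                best, best_dist = c, dist
        if best is None:
            # Nothing is close enough, so this text starts its own cluster.
            centroids.append(Counter(vec))
            members.append([idx])
        else:
            # Fold the text into the nearest cluster's centroid.
            centroids[best].update(vec)
            members[best].append(idx)
    return members


if __name__ == "__main__":
    # Made-up descriptions, for illustration only.
    sample = [
        "acme drill x200",
        "bolt kettle k15",
        "acme drill x200 kit",
        "bolt kettle k15 steel",
    ]
    print(single_pass_cluster(sample))
```

Since the number of clusters here is driven by the cutoff rather than fixed up front, lowering the cutoff is the lever for getting closer to one cluster per product. The result depends on input order, so treat it as a starting point, not a final clustering.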

On Mon, Sep 30, 2013 at 2:14 PM, mercutio7979 <[email protected]> wrote:

> Hello all,
>
> I am currently trying to create clusters from a group of 50,000 strings that
> contain product descriptions (around 70-100 characters each).
>
> That group of 50,000 consists of roughly 5,000 individual products and ten
> varying product descriptions per product. The product descriptions are
> already prepared for clustering and contain a normalized brand name,
> product model number, etc.
>
> What would be a good approach to maximise the number of clusters found (the
> best possible value would be 5,000 clusters with 10 products each)?
>
> I adapted the reuters cluster script to read in my data and managed to
> create a first set of clusters. However, I have not managed to maximise the
> cluster count.
>
> The question is: what do I need to tweak with regard to the available
> mahout settings, so the clusters are created as precisely as possible?
>
> Many regards!
> Jens
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/What-are-the-best-settings-for-my-clustering-task-tp4092807.html
> Sent from the Mahout User List mailing list archive at Nabble.com.
