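I don't see any issue with the top terms having similar frequencies. Those values are read from the cluster centroid, so they do not depend on which distance measure you chose. Cosine distance is generally considered a good distance measure for text data.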
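
In case it helps, here is a minimal sketch in plain Python (not Mahout code) of why the numbers next to the top terms are centroid weights and not cosine distances. The documents, the weights and the helper names (centroid, cosine_distance, top_terms) are all made up for illustration, and I am assuming the centroid is the plain mean of the document vectors in the cluster.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Mean of sparse term-weight vectors (dicts of term -> weight)."""
    sums = defaultdict(float)
    for vec in vectors:
        for term, weight in vec.items():
            sums[term] += weight
    return {term: total / len(vectors) for term, total in sums.items()}


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return 1.0 - dot / (norm_a * norm_b)


def top_terms(center, n):
    """The n terms with the largest weight in the centroid."""
    return sorted(center.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Hypothetical term-frequency vectors for three documents in one cluster.
cluster_docs = [
    {"hello": 20.0, "you": 12.0, "world": 1.0},
    {"hello": 24.0, "you": 10.0},
    {"hello": 22.0, "you": 14.0, "there": 2.0},
]

center = centroid(cluster_docs)

# Same shape as the "Top Terms" listing: term => weight in the centroid.
for term, weight in top_terms(center, 2):
    print(f"{term} => {weight}")

# Cosine distance of each document to the centroid. For non-negative
# vectors this always lies between 0 and 1.
distances = [cosine_distance(doc, center) for doc in cluster_docs]
print(distances)
```

The weights printed for the top terms can be arbitrarily large, while the cosine distances stay between 0 and 1 for non-negative vectors, so a value like 21.89 cannot be a cosine distance.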

On Mon, Oct 8, 2012 at 10:35 AM, jung hoon sohn <[email protected]> wrote:

> Thank you for the information.
> Following your answer, the top terms from the clusters have similar
> frequencies.
> As I used the cosine distance as the measure is this correct result?
>
> Thank You.
>
> Jung Hoon Sohn
>
> On Sun, Oct 7, 2012 at 9:35 PM, paritosh ranjan <[email protected]> wrote:
>
> > The top terms come from the centroid of the cluster. These values are the
> > term frequencies.
> >
> > On Sun, Oct 7, 2012 at 5:38 PM, jung hoon sohn <[email protected]> wrote:
> >
> > > Hello,
> > > I used k-means algorithm to cluster the text terms in the documents
> > > according to the cosine distance measure.
> > > It ran successfully and when we ran the clusterdump utility to see the
> > > top terms per each clusters,
> > > I get the output such as
> > >
> > > Top Terms:
> > >
> > > hello => 21.8977799999
> > > you => 11.9284304939
> > > ....
> > >
> > > I am guessing the value next to the each terms are cosine distance
> > > values but not very sure about it.
> > > Does anyone know specifically what does the value represent?
> > >
> > > Thanks.
> > >
> > > Jung Hoon Sohn
