Re: Clusterdump Output Question

paritosh ranjan Sun, 07 Oct 2012 23:57:34 -0700

I don't see any issue with the top terms having similar frequencies. The
cosine distance measure is considered a good distance measure for text data.
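
To make the two points in this thread concrete, here is a minimal sketch in
plain Python. It is not Mahout's code: the function names (centroid,
top_terms, cosine_distance) and the two toy document vectors are assumptions
made up for illustration. It shows a centroid computed as the component-wise
mean of the term-weight vectors in a cluster, the "top terms" read off as the
centroid's largest components, and cosine distance depending only on the
direction of the vectors, not on their length.

```python
import math


def centroid(vectors):
    """Component-wise mean of sparse term-weight vectors (dicts term -> weight)."""
    total = {}
    for vec in vectors:
        for term, weight in vec.items():
            total[term] = total.get(term, 0.0) + weight
    return {term: weight / len(vectors) for term, weight in total.items()}


def top_terms(center, n=10):
    """Return the n (term, weight) pairs with the largest weight in the centroid."""
    return sorted(center.items(), key=lambda kv: kv[1], reverse=True)[:n]


def cosine_distance(a, b):
    """1 - cos(angle between a and b); assumes neither vector is all zeros."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical documents assigned to one cluster (made-up weights).
docs = [
    {"hello": 24.0, "you": 12.0},
    {"hello": 20.0, "you": 12.0, "world": 2.0},
]

center = centroid(docs)
for term, weight in top_terms(center, 2):
    print(f"{term:8s} => {weight}")
```

Because the centroid is an average, its components are usually fractional
even when the input weights are whole numbers, which is consistent with
values like the ones in the clusterdump output quoted below. The numbers
printed are centroid weights, not distances.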


On Mon, Oct 8, 2012 at 10:35 AM, jung hoon sohn <[email protected]> wrote:

> Thank you for the information.
> Following your answer, the top terms from the clusters have similar
> frequencies.
> Since I used the cosine distance as the measure, is this the expected result?
>
> Thank You.
>
> Jung Hoon Sohn
>
> On Sun, Oct 7, 2012 at 9:35 PM, paritosh ranjan <[email protected]> wrote:
>
> > The top terms come from the centroid of the cluster. These values are the
> > term frequencies.
> >
> > On Sun, Oct 7, 2012 at 5:38 PM, jung hoon sohn <[email protected]> wrote:
> >
> > > Hello,
> > > I used the k-means algorithm to cluster the text terms in the documents
> > > according to the cosine distance measure.
> > > It ran successfully, and when I ran the clusterdump utility to see the
> > > top terms for each cluster,
> > > I got output such as
> > >
> > >       Top Terms:
> > >
> > >             hello    =>     21.8977799999
> > >             you     =>     11.9284304939
> > >             ....
> > >
> > > I am guessing the value next to each term is a cosine distance,
> > > but I am not sure about it.
> > > Does anyone know specifically what the value represents?
> > >
> > > Thanks.
> > >
> > > Jung Hoon Sohn
> > >
> >
>
