Hi Darshan,
What I understand from your problem is this:
- You have clustered a few documents.
- You want to verify the accuracy of your clustering, and you want to use
entropy for that.
- You are not sure what the input for the entropy calculation should be.
Possible solution:
The entropy calculation would expect a String[] and measures the
information contained in that data/sequence.
The simplest way is to keep all the documents labelled with categories.
- Cluster the docs as you usually do.
- For the entropy calculation, create a String[] for every cluster, each
array containing the labels of all the docs in that cluster:
cluster1 = {"sports", "tech", "tech", "tech", "book", ..}
cluster2 = {"sports", "drama", "sports", "sports"...}
etc.
- Calculate the entropy of each cluster.
Entropy measures the degree of randomness in a system. High entropy
means the labels in a cluster are highly mixed; an entropy of zero means
every doc in the cluster carries the same label.
Lower entropy is therefore desirable when validating the accuracy of your
clustering technique.
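
To make the per-cluster calculation concrete, here is a minimal sketch in
plain Java. It does not use Mahout's Entropy.java; the class name
ClusterEntropy and both method names are made up for illustration. It
computes the Shannon entropy (in bits) of the labels in one cluster, plus
a size-weighted average over all clusters, which is one common way to
boil the result down to a single number.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Mahout.
public class ClusterEntropy {

    // Shannon entropy (base 2) of the label distribution in one cluster.
    static double entropy(String[] labels) {
        if (labels.length == 0) {
            return 0.0;
        }
        // Count how often each label occurs in the cluster.
        Map<String, Integer> counts = new HashMap<>();
        for (String label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        double h = 0.0;
        for (int count : counts.values()) {
            double p = (double) count / labels.length;
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    // Average of the per-cluster entropies, weighted by cluster size.
    static double weightedEntropy(String[][] clusters) {
        int total = 0;
        for (String[] cluster : clusters) {
            total += cluster.length;
        }
        if (total == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (String[] cluster : clusters) {
            sum += ((double) cluster.length / total) * entropy(cluster);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Example label arrays, shortened from the ones in this mail.
        String[] cluster1 = {"sports", "tech", "tech", "tech", "book"};
        String[] cluster2 = {"sports", "drama", "sports", "sports"};
        System.out.println("cluster1 entropy: " + entropy(cluster1));
        System.out.println("cluster2 entropy: " + entropy(cluster2));
        System.out.println("weighted entropy: "
                + weightedEntropy(new String[][] {cluster1, cluster2}));
    }
}
```

You could run this once per distance measure and compare the weighted
values; the run with the lower value has the purer clusters with respect
to your labels.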
P.S. You can use the Entropy.java class for your validation, but
it is deprecated now.
Having said that, kindly be patient while asking questions, and provide
more info on what work you have done so far along with your findings. It
would enable all of us to answer quickly and correctly :)
Hope this was helpful. Other approaches are welcome!
Peace,
Yash
On Fri, May 23, 2014 at 10:55 PM, Ted Dunning <[email protected]> wrote:
> I am sorry, but I don't understand your questions or needs sufficiently to
> answer.
>
>
>
>
> On Wed, Apr 23, 2014 at 12:21 PM, Darshan Sonagara <
> [email protected]> wrote:
>
> > Sir, please reply to me as soon as possible.
> > Thanks in advance.
> >
> >
> > On Tue, Apr 22, 2014 at 11:50 PM, Darshan Sonagara <
> > [email protected]> wrote:
> >
> > > Waiting for the reply, sir.
> > >
> > >
> > > On Tue, Apr 22, 2014 at 7:13 PM, Darshan Sonagara <
> > > [email protected]> wrote:
> > >
> > >> Thanks for the reply, sir.
> > >>
> > >> Actually, I am doing clustering to gather similar kinds of documents
> > >> in the same cluster as much as possible.
> > >> I can see this from the cluster dump output file by observing the top
> > >> terms.
> > >> I also figured out that the result differs when I vary the distance
> > >> measure technique.
> > >> But I want some mathematical proof that one technique is better than
> > >> another.
> > >> So for that I need to calculate the entropy and pureness of the clusters.
> > >> But I am not able to find any command in Mahout which can give me
> > >> entropy as a result.
> > >> I found Entropy.java under the Mahout common math statistic package, but
> > >> I don't know what I should give it as input so that I can find entropy
> > >> or other parameters, so I can find out how good or bad a cluster is.
> > >>
> > >>
> > >>
> > >> On Tue, Apr 22, 2014 at 7:01 PM, Ted Dunning <[email protected]> wrote:
> > >>
> > >>> On Tue, Apr 22, 2014 at 12:11 AM, Darshan Sonagara <
> > >>> [email protected]> wrote:
> > >>>
> > >>> > But the problem is that I want to check whether my clustering is
> > >>> > good or bad. For that I need to calculate the entropy value. I do
> > >>> > not have any idea how to calculate entropy in Mahout or by any
> > >>> > other technique.
> > >>> > By finding the entropy I can reach a good conclusion.
> > >>> > So please, can anyone help me with this?
> > >>> >
> > >>>
> > >>> Actually, the way to tell whether your clustering is good is to see
> > >>> if it works for its intended use.
> > >>>
> > >>> What do you want to use clustering for?
> > >>>
> > >>
> > >>
> > >>
> > >> --
> > >>
> > >> *Regards From:*
> > >>
> > >> *Darshan Sonagara*
> > >> *Collaborative Platform lead,** SSN Team | Gujarat Section.*
> > >>
> > >> *Vice-Chairperson | **GCET IEEE SB.*
> > >>
> > >> (: +*91* 9408002452
> > >>
> > >>
> > >>
> > >> : Darshan Sonagara<
> > http://www.linkedin.com/pub/darshan-sonagara/64/11a/b54>
> > >> : Darshan Sonagara <http://www.facebook.com/darshansonagara>
> > >>
> > >>
> > >
> > >
> >
> >
>