So in this case label_predict would be vector X, and label_true vector Y?
Thank you,
From: Emanuela Boros [mailto:emanuela.bo...@gmail.com]
Sent: Tuesday, September 30, 2014 1:28 PM
To: scikit-learn-general@lists.sourceforge.net
Subject: Re: [Scikit-learn-general] mutual information
Usually, statistically speaking, you compute the MI score to see to what
extent your observed frequency of co-occurrence differs from what you
would expect, hence labels_true and labels_pred.
On Tue, Sep 30, 2014 at 7:13 PM, Pagliari, Roberto
wrote:
I'm a little confused by the description of the mutual information score.
What is the meaning of clustering here, and why are the inputs called
labels_true and labels_pred?
Shouldn't mutual information be computed between two generic vectors X and Y?
Thanks,
Well, actually I've made X and y both binary, in the sense that they encode the
presence or absence of a feature/class in the vector(s), so they should be fine
to pass into mutual_info_score.
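Concretely, that setup might look like the following (the data here is invented for illustration):

```python
# Sketch of MI between a binary feature indicator and a binary class
# vector, as in the feature-selection use described above.
from sklearn.metrics import mutual_info_score

feature_present = [1, 1, 1, 0, 0, 0, 1, 0]  # does the feature occur in each doc?
class_label     = [1, 1, 0, 0, 0, 0, 1, 1]  # does each doc belong to the class?

# Both vectors are just integer labellings of the same samples, so
# mutual_info_score accepts them directly.
mi = mutual_info_score(feature_present, class_label)
```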
Matti Lyra
DPhil Student
Text Analytics Group
Chichester 1, R203
School of
Those two functions are apparently not computing the same thing. The
mutual_info_score function is a clustering quality evaluation tool
used to compute the mutual information between two sets of integer
cluster label assignments. At least some of the integer label values
must match for the score to be
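A toy sketch of that clustering-evaluation usage (the label assignments are invented for illustration):

```python
# Two integer cluster label assignments over the same six samples:
# a ground-truth partition and a clustering that partially agrees with it.
from sklearn.metrics import mutual_info_score

ground_truth = [0, 0, 0, 1, 1, 1]
clustering   = [0, 0, 1, 1, 1, 1]  # one sample assigned differently

# The score measures how much the two partitions agree.
score = mutual_info_score(ground_truth, clustering)
```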
Hi, I've been looking at using sklearn.metrics.cluster.mutual_info_score as
a feature-selection metric for language data. This is quite a common thing to
do in the NLP community. My problem, however, is that it is really slow, as I need
to iterate over all the features in the dataset and pass t