Re: [Scikit-learn-general] mutual information

2014-10-01 Thread Michael Eickenberg
> usually, statistically speaking, you compute the MI score to see to which extent is your observed frequency of cooccurrence di

Re: [Scikit-learn-general] mutual information

2014-09-30 Thread Pagliari, Roberto
So in this case label_predict would be vector X, and label_true vector Y? Thank you,

Re: [Scikit-learn-general] mutual information

2014-09-30 Thread Emanuela Boros
Usually, statistically speaking, you compute the MI score to see to what extent your observed frequency of co-occurrence differs from what you would expect, so labels_true and labels_predict.
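The "observed versus expected co-occurrence" reading above can be sketched in a few lines. This is a minimal, standard-library illustration and not scikit-learn's implementation: the function name `mutual_information` and the toy vectors are assumptions made for the example. It treats the two inputs as generic discrete vectors, uses natural logarithms (so the result is in nats), and compares each observed joint frequency with the product of the marginals.

```python
from collections import Counter
from math import log


def mutual_information(x, y):
    """Mutual information (in nats) between two equal-length discrete sequences.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    joint = Counter(zip(x, y))  # observed co-occurrence counts
    count_x = Counter(x)        # marginal counts of x
    count_y = Counter(y)        # marginal counts of y
    mi = 0.0
    for (xi, yi), n_xy in joint.items():
        p_xy = n_xy / n                                   # observed joint frequency
        expected = (count_x[xi] / n) * (count_y[yi] / n)  # expected if x and y were independent
        mi += p_xy * log(p_xy / expected)
    return mi


# Toy data (made up): y is a relabelling of x, z is unrelated to x.
x = [0, 0, 1, 1]
y = ["a", "a", "b", "b"]
z = [0, 1, 0, 1]

print(mutual_information(x, y))  # equals log(2): x fully determines y
print(mutual_information(x, z))  # zero: observed frequencies match the expected ones
```

Only the co-occurrence pattern enters the sum, so the roles of the two arguments are symmetric and the actual label values (0/1 or "a"/"b") do not matter in this sketch.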

[Scikit-learn-general] mutual information

2014-09-30 Thread Pagliari, Roberto
I'm a little confused by the description of mutual information score. What is the meaning of clustering, and why are the inputs called labels_true and labels_predict? Shouldn't mutual info be computed between two generic vectors X and Y? Thanks,

Re: [Scikit-learn-general] Mutual Information

2013-08-28 Thread Matti Lyra
Well, actually I've made X and y both binary, as in they are just the presence or absence of a feature/class in the vector(s), so they should be fine to pass into mutual_info_score.

Re: [Scikit-learn-general] Mutual Information

2013-08-28 Thread Olivier Grisel
Those two functions are apparently not computing the same thing. The mutual_info_score function is a clustering quality evaluation tool used to compute the mutual information between 2 sets of integer cluster label assignments. At least some of the integer label values must match for the score to be

[Scikit-learn-general] Mutual Information

2013-08-28 Thread Matti Lyra
Hi, I've been looking at using sklearn.metrics.cluster.mutual_info_score as a feature selection metric for language data. This is quite a common thing to do in the NLP community. My problem, however, is that it is really slow, as I need to iterate over all the features in the dataset and pass t
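One way to avoid a separate scoring call per feature, sketched here under assumptions, is to count feature/class co-occurrences in a single pass over the documents and then derive every 2x2 contingency table from those counts. This is not the poster's code or scikit-learn's API: the function `feature_class_mi`, the representation of documents as sets of feature names, and the toy data are all hypothetical. Features and the class are treated as binary (present or absent), and scores are in nats.

```python
from collections import Counter
from math import log


def feature_class_mi(docs, labels, positive):
    """MI (in nats) between the presence of each feature and membership in `positive`.

    Hypothetical helper: `docs` is a list of iterables of feature names,
    `labels` the matching class labels. Features that occur in no document
    are not scored.
    """
    n = len(docs)
    n_pos = sum(1 for lab in labels if lab == positive)
    feat_count = Counter()      # number of documents containing each feature
    feat_pos_count = Counter()  # ... that are also in the positive class
    for feats, lab in zip(docs, labels):
        feats = set(feats)      # presence/absence only, ignore repeats
        feat_count.update(feats)
        if lab == positive:
            feat_pos_count.update(feats)

    scores = {}
    for f, n_f in feat_count.items():
        n11 = feat_pos_count[f]  # feature present, class positive
        n10 = n_f - n11          # feature present, class negative
        n01 = n_pos - n11        # feature absent, class positive
        n00 = n - n_f - n01      # feature absent, class negative
        mi = 0.0
        for n_cell, n_row, n_col in (
            (n11, n_f, n_pos),
            (n10, n_f, n - n_pos),
            (n01, n - n_f, n_pos),
            (n00, n - n_f, n - n_pos),
        ):
            if n_cell > 0:  # empty cells contribute nothing to the sum
                mi += (n_cell / n) * log(n_cell * n / (n_row * n_col))
        scores[f] = mi
    return scores


# Toy corpus (made up for the example).
docs = [{"good", "film"}, {"good", "plot"}, {"bad", "film"}, {"bad", "plot"}]
labels = ["pos", "pos", "neg", "neg"]
scores = feature_class_mi(docs, labels, positive="pos")
for feature in sorted(scores, key=scores.get, reverse=True):
    print(feature, scores[feature])
```

The counting pass touches each (document, feature) pair once, and the scoring loop is then proportional to the number of distinct features rather than to features times documents.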