This metric is especially useful when your ground truth labels are not of
the same nature as the labels you predict. Imagine obtaining clusters in an
unsupervised manner and assigning each data point to a cluster center,
while also having some sort of ground truth labels indicating how it
"should be". Then, without creating a one-to-one correspondence between the
two sets of labels, you can still evaluate the goodness of the label
assignment through mutual information.
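
To illustrate the point about not needing a one-to-one correspondence, here
is a minimal sketch. The label vectors and the renaming are made up for
illustration; the only library call is sklearn.metrics.mutual_info_score.
Renaming the cluster ids should leave the score unchanged, because only the
partition of the samples matters, not the names of the groups.

```python
from sklearn.metrics import mutual_info_score

# Hypothetical data: ground-truth classes and cluster ids from some
# unsupervised run. The cluster ids carry no meaning by themselves.
labels_true = [0, 0, 0, 1, 1, 1, 2, 2, 2]
labels_pred = [2, 2, 2, 0, 0, 1, 1, 1, 1]

# Rename the cluster ids arbitrarily; the grouping of samples is unchanged.
renaming = {0: 7, 1: 3, 2: 5}
labels_pred_renamed = [renaming[c] for c in labels_pred]

mi_original = mutual_info_score(labels_true, labels_pred)
mi_renamed = mutual_info_score(labels_true, labels_pred_renamed)

# Both scores should agree up to floating point error.
print(mi_original, mi_renamed)
```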

What it gives you is a measure of statistical dependence, obtained by
comparing the joint histogram of labels_pred and labels_true (called
'contingency matrix' in the sklearn code) to the joint histogram you would
get if the two vectors were independent (the outer product of the marginal
distributions). The divergence measure used to quantify the distance
between these two histograms is the Kullback-Leibler divergence.
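
The description above can be sketched in a few lines of plain Python. This
is not the sklearn implementation, just an illustration of the same idea;
the function name mutual_information is my own. It builds the joint
histogram, forms the product of the marginals, and sums the
Kullback-Leibler terms (natural logarithm, so the result is in nats).

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (in nats) of two equal-length label sequences."""
    n = len(x)
    joint = Counter(zip(x, y))  # joint histogram (contingency table)
    marginal_x = Counter(x)     # marginal histogram of x
    marginal_y = Counter(y)     # marginal histogram of y

    mi = 0.0
    for (a, b), count in joint.items():
        p_joint = count / n
        # Probability of (a, b) if x and y were independent.
        p_indep = (marginal_x[a] / n) * (marginal_y[b] / n)
        # One term of KL(joint || product of marginals).
        mi += p_joint * math.log(p_joint / p_indep)
    return mi


# Independent labels give zero; identical labels give the entropy (log 2).
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))
print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))
```

Only cells with a nonzero count enter the sum, which matches the usual
convention that 0 * log(0) is treated as 0.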

Take a look at the code; it is not too long:
https://github.com/scikit-learn/scikit-learn/blob/7c29eca5e7718a70ddcb056f5f2b5ed2d1cc059e/sklearn/metrics/cluster/supervised.py#L496

As for computing this quantity for arbitrary floating point vectors: that
code will not do what you want, because it treats every distinct value as
its own label. If you want to use arbitrary floating point vectors, you
first need to specify a procedure to infer a distribution from them. This
can be done e.g. by binning into a discrete histogram (where you have to
choose the bins in some way) or, for the continuous equivalent of mutual
information, by using kernel density estimation and integrating over the
estimated densities.
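
As a rough sketch of the binning route, assuming NumPy is available: the
function name binned_mutual_information, the choice of 10 equal-width bins,
and the synthetic data are all assumptions made for illustration. The
estimate depends on the bin choice and is biased upwards for small samples,
so treat it as a starting point rather than a definitive estimator.

```python
import numpy as np


def binned_mutual_information(a, b, n_bins=10):
    """Estimate MI (in nats) of two float vectors via a 2D histogram."""
    joint_counts, _, _ = np.histogram2d(a, b, bins=n_bins)
    p_joint = joint_counts / joint_counts.sum()

    # Marginals, kept 2D so their product broadcasts to the joint shape.
    p_a = p_joint.sum(axis=1, keepdims=True)
    p_b = p_joint.sum(axis=0, keepdims=True)
    p_indep = p_a * p_b

    # Skip empty cells: 0 * log(0) is treated as 0.
    nonzero = p_joint > 0
    ratio = p_joint[nonzero] / p_indep[nonzero]
    return float(np.sum(p_joint[nonzero] * np.log(ratio)))


# Synthetic example: y depends on x, z does not.
rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y = x + 0.5 * rng.normal(size=5000)
z = rng.normal(size=5000)

mi_dependent = binned_mutual_information(x, y)
mi_independent = binned_mutual_information(x, z)
print(mi_dependent, mi_independent)
```

The dependent pair should score clearly higher than the independent pair,
although the independent pair will not be exactly zero because of the
finite sample.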

In conclusion, 1) you need a very clear idea of the quantity you are
trying to measure before choosing this metric and 2)
sklearn.metrics.mutual_info_score only works on discrete valued vectors.

HTH,
Michael

On Tue, Sep 30, 2014 at 7:39 PM, Pagliari, Roberto <rpagli...@appcomsci.com>
wrote:

> So in this case label_predict would be vector X, and label_true vector Y?
>
>
>
> Thank you,
>
>
>
> *From:* Emanuela Boros [mailto:emanuela.bo...@gmail.com]
> *Sent:* Tuesday, September 30, 2014 1:28 PM
> *To:* scikit-learn-general@lists.sourceforge.net
> *Subject:* Re: [Scikit-learn-general] mutual information
>
>
>
> Usually, statistically speaking, you compute the MI score to see to what
> extent your observed frequency of co-occurrence differs from what you
> would expect under independence, hence labels_true and labels_predict.
>
>
>
> On Tue, Sep 30, 2014 at 7:13 PM, Pagliari, Roberto <
> rpagli...@appcomsci.com> wrote:
>
> I’m a little confused by the description of mutual information score.
>
>
>
> What is the meaning of clustering, and why are the inputs called
> labels_true and labels_predict?
>
>
>
> Shouldn’t mutual info be computed between two generic vectors X and Y?
>
>
>
> Thanks,
>
>
>
>
>
> ------------------------------------------------------------------------------
> Meet PCI DSS 3.0 Compliance Requirements with EventLog Analyzer
> Achieve PCI DSS 3.0 Compliant Status with Out-of-the-box PCI DSS Reports
> Are you Audit-Ready for PCI DSS 3.0 Compliance? Download White paper
> Comply to PCI DSS 3.0 Requirement 10 and 11.5 with EventLog Analyzer
>
> http://pubads.g.doubleclick.net/gampad/clk?id=154622311&iu=/4140/ostg.clktrk
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>
>
>
> --
>
> Emanuela Boros
>
> CEA-List, LVIC
> LIMSI-CNRS
> Université Paris-Sud
> Orsay, France
>
> Tel. : +33652174595
> Email : emanuela.boros@{cea.fr,limsi.fr,u-psud.fr,gmail.com}
>
>
