I opened a pull request with the code last week. Below is my reply to the
latest comment in the pull request conversation, which should answer some of
your questions. We can continue the conversation directly on the scikit-learn
GitHub site, on the pull request page cdamon:MI_RenyiCCParzen.

I put some references in the code. For the mutual information estimation, the
main reference is:
- D. Xu and D. Erdogmus, Renyi's Entropy, Divergence and Their Nonparametric
  Estimators (in J.C. Principe, Information Theoretic Learning: Renyi's
  Entropy and Kernel Perspectives, Information Science and Statistics,
  DOI 10.1007/978-1-4419-1570-2_2, Springer Science+Business Media, LLC 2010)

For feature selection based on this mutual information, below are some
references:
- Peng et al., Feature Selection Based on Mutual Information: Criteria of
  Max-Dependency, Max-Relevance, and Min-Redundancy (IEEE Transactions on
  Pattern Analysis and Machine Intelligence, 2005)
- Kari Torkkola, Feature Extraction by Non-Parametric Mutual Information
  Maximization (Journal of Machine Learning Research 3 (2003) 1415-1438,
  http://www.jmlr.org/papers/volume3/torkkola03a/torkkola03a.pdf)
- Leonardo Macrini, Leonardo Gonçalves, Application of Rényi Entropy and
  Mutual Information of Cauchy-Schwartz in Selecting Variables

It is also possible to use a standard greedy algorithm for feature selection
based on MI. I will look at the interface of the classes in the
feature_selection module and try to compare a feature selection method based on
MI with other methods. Do you have a specific comparison in mind?
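
To make the greedy idea concrete, here is a minimal sketch. It is not the code
from the pull request: it uses a simple plug-in (counting) MI estimate for
discrete features instead of the Renyi/Cauchy-Schwarz estimator, and the
scoring rule (relevance minus mean redundancy) only follows the spirit of the
max-relevance, min-redundancy criterion of Peng et al. The names discrete_mi
and greedy_mi_selection are made up for this illustration, and it is written
in plain Python so that it runs without any dependency.

```python
import math
import random
from collections import Counter


def discrete_mi(a, b):
    """Plug-in mutual information (in nats) between two discrete sequences."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    # p(u, v) * log(p(u, v) / (p(u) * p(v))), written with raw counts.
    return sum(
        (c / n) * math.log(c * n / (count_a[u] * count_b[v]))
        for (u, v), c in count_ab.items()
    )


def greedy_mi_selection(columns, target, n_select):
    """Forward selection: MI with the target minus mean MI with chosen features."""
    relevance = [discrete_mi(col, target) for col in columns]
    # Start with the single most relevant feature.
    selected = [max(range(len(columns)), key=relevance.__getitem__)]
    while len(selected) < n_select:

        def score(j):
            redundancy = sum(discrete_mi(columns[j], columns[s]) for s in selected)
            return relevance[j] - redundancy / len(selected)

        candidates = [j for j in range(len(columns)) if j not in selected]
        selected.append(max(candidates, key=score))
    return selected


# Toy data (assumption for illustration): the target depends on bits a and b;
# column 1 is an exact copy of column 0 and column 3 is pure noise.
rng = random.Random(0)
a = [rng.randint(0, 1) for _ in range(500)]
b = [rng.randint(0, 1) for _ in range(500)]
noise = [rng.randint(0, 1) for _ in range(500)]
target = [2 * u + v for u, v in zip(a, b)]
columns = [a, list(a), b, noise]
selected = greedy_mi_selection(columns, target, n_select=2)
```

On this toy data the procedure keeps column 0 and column 2 and skips the
duplicated column 1, because the redundancy term cancels its relevance.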

Mutual information estimation could be integrated alongside other metrics of
the same kind, since it can be used for different purposes. I compared it with
other MI estimates obtained with histogram, Kraskov, or k-NN approaches, and
the proposed estimate based on Renyi entropy and Cauchy-Schwarz divergence
seems to be more accurate.
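
For readers who do not know the estimator, here is a rough sketch of the
Cauchy-Schwarz quadratic MI for two continuous variables with Gaussian Parzen
windows, following the formulation in Principe's book. It is an illustration
under my own assumptions, not the pull request code: the function name cs_qmi
and the fixed bandwidth sigma=0.5 are arbitrary, discrete and mixed variables
are not handled, and the loops are plain Python rather than vectorized.

```python
import math
import random


def cs_qmi(x, y, sigma=0.5):
    """Cauchy-Schwarz quadratic MI between two continuous 1-D samples."""
    n = len(x)

    def gram(values):
        # Two Gaussian windows of width sigma convolve into a Gaussian of
        # variance 2 * sigma**2; normalisation constants cancel in the ratio.
        scale = 4.0 * sigma ** 2
        return [[math.exp(-((a - b) ** 2) / scale) for b in values] for a in values]

    kx, ky = gram(x), gram(y)
    row_x = [sum(row) / n for row in kx]
    row_y = [sum(row) / n for row in ky]
    # Integral of the squared joint Parzen density.
    v_joint = sum(kx[i][j] * ky[i][j] for i in range(n) for j in range(n)) / n ** 2
    # Integral of the squared product of the marginal densities.
    v_marginal = (sum(row_x) / n) * (sum(row_y) / n)
    # Cross term between the joint density and the product of marginals.
    v_cross = sum(rx * ry for rx, ry in zip(row_x, row_y)) / n
    return math.log(v_joint * v_marginal / v_cross ** 2)


# Toy check (assumption for illustration): a noisy copy of x versus
# an independent sample.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(200)]
noise = [rng.gauss(0, 1) for _ in range(200)]
mi_dependent = cs_qmi(x, [v + 0.1 * e for v, e in zip(x, noise)])
mi_independent = cs_qmi(x, noise)
```

By the Cauchy-Schwarz inequality the value is never negative, and it is zero
only when the estimated joint density equals the product of the estimated
marginals, so mi_dependent should come out clearly larger than mi_independent.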

Have a nice day,
Cécilia
> Today's Topics:
>
> 1. Re: Re : Pull Request : Renyi entropy and Cauchy-Schwartz
> mutual information (Andy)
> 2. Re: SVC.predict_proba result inconsistent with SVC.predict
> result (shalu jhanwar)
> 3. grid search random state (Pagliari, Roberto)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 24 Feb 2015 09:47:47 -0500
> From: Andy <t3k...@gmail.com>
> Subject: Re: [Scikit-learn-general] Re : Pull Request : Renyi entropy
> and Cauchy-Schwartz mutual information
> To: scikit-learn-general@lists.sourceforge.net
> Message-ID: <54ec8f13.5070...@gmail.com>
> Content-Type: text/plain; charset="windows-1252"
>
> I agree, but I'm not sure if the one that Cecilia talks about is a good fit.
> MI based feature selection is still a field of pretty active research,
> right? Is there a good review paper?
> Or some set of standard algorithms?
>
> On 02/23/2015 12:49 PM, Fred Mailhot wrote:
> > A good MI-based feature selector would be welcome, I think. Well, by
> > me, anyway.
> >
> > On 23 February 2015 at 09:37, Andy <t3k...@gmail.com> wrote:
> >
> > Hi Cecilia.
> > An MI estimate currently seems a bit out of scope of sklearn.
> > What context would a user apply it in?
> > Sklearn currently contains more out-of-the-box methods, while an
> > MI estimator seems more like a building block.
> >
> > Cheers,
> > Andy
> >
> >
> >
> > On 02/23/2015 06:01 AM, Cécilia wrote:
> >> Hi,
> >> Could you tell me whether you are interested in this measure (see my
> >> previous mail below)? In which part of scikit-learn should I develop it:
> >> clustering, feature selection?
> >> Thanks,
> >> Cécilia
> >> Hi,
> >> I have developed a script for mutual information estimation based on
> >> Renyi entropy and Cauchy-Schwarz divergence (with a Parzen window
> >> function for continuous variables). The script estimates MI between two
> >> discrete, continuous, or mixed (discrete and continuous) variables. I
> >> plan first to parallelize the code and then to extend it to estimate MI
> >> between more than two features.
> >> If you are interested in this script, I can push the first version and
> >> modify it according to your feedback.
> >> Let me know.
> >> Greetings,
> >> Cécilia Damon
> >>
> >>
> >>
> >>
> >>
> >> _______________________________________________
> >> Scikit-learn-general mailing list
> >> Scikit-learn-general@lists.sourceforge.net
> >> <mailto:Scikit-learn-general@lists.sourceforge.net>
> >> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
> >
> >
> >
> > ------------------------------------------------------------------------------
> > Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
> > from Actuate! Instantly Supercharge Your Business Reports and
> > Dashboards
> > with Interactivity, Sharing, Native Excel Exports, App Integration
> > & more
> > Get technology previously reserved for billion-dollar
> > corporations, FREE
> >
> > http://pubads.g.doubleclick.net/gampad/clk?id=190641631&iu=/4140/ostg.clktrk
> > _______________________________________________
> > Scikit-learn-general mailing list
> > Scikit-learn-general@lists.sourceforge.net
> > <mailto:Scikit-learn-general@lists.sourceforge.net>
> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
> >
> >
> >
> >
> > ------------------------------------------------------------------------------
> > Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
> > from Actuate! Instantly Supercharge Your Business Reports and Dashboards
> > with Interactivity, Sharing, Native Excel Exports, App Integration & more
> > Get technology previously reserved for billion-dollar corporations, FREE
> > http://pubads.g.doubleclick.net/gampad/clk?id=190641631&iu=/4140/ostg.clktrk
> >
> >
> > _______________________________________________
> > Scikit-learn-general mailing list
> > Scikit-learn-general@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
> -------------- next part --------------
> An HTML attachment was scrubbed...
>
> ------------------------------
>
> Message: 2
> Date: Wed, 25 Feb 2015 14:33:19 +0100
> From: shalu jhanwar <shalu.jhanwa...@gmail.com>
> Subject: Re: [Scikit-learn-general] SVC.predict_proba result
> inconsistent with SVC.predict result
> To: scikit-learn-general <scikit-learn-general@lists.sourceforge.net>
> Message-ID:
> <ca+zszp80u8sdkp9on2s75a+q2g6rh04rde8ye3vbhkmsxdw...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi all,
>
> I'm facing the same problem with predict_proba for the random forest
> classifier. I want a confidence value for each class and each prediction.
> But as shown here, the probability values are not always consistent with
> the prediction, so I looked for a decision_function method for random
> forest but didn't find one.
>
> Can anyone suggest how I can get decision scores for a random forest?
>
> thanks!
> Shalu
>
> On Thu, Jun 26, 2014 at 10:46 AM, Lars Buitinck <larsm...@gmail.com> wrote:
>
> > 2014-06-26 9:15 GMT+02:00 Andy <t3k...@gmail.com>:
> > > Maybe the calibration is not used for prediction? That would be a bit
> > > odd, though...
> >
> > That's exactly what's going on. Prediction is consistent with
> > decision_function, but not predict_proba.
>
> ------------------------------
>
> Message: 3
> Date: Wed, 25 Feb 2015 01:26:50 +0000
> From: "Pagliari, Roberto" <rpagli...@appcomsci.com>
> Subject: [Scikit-learn-general] grid search random state
> To: "scikit-learn-general@lists.sourceforge.net"
> <scikit-learn-general@lists.sourceforge.net>
> Message-ID:
>
> <7f8b451e34fcdb459e934b4f5045109683b...@rrc-ats-exmb2.ats.atsinnovate.com>
>
> Content-Type: text/plain; charset="us-ascii"
>
> I have two questions about GridSearchCV:
>
>
> 1. Is it possible to fix the random state of the underlying KFold, for
> testing purposes?
>
> 2. When passing parameters, such as C and gamma for an SVM, does grid
> search go through them in order?
>
> Thank you,
>
> ------------------------------
>
> End of Scikit-learn-general Digest, Vol 61, Issue 79
> ****************************************************