Dear all,
I migrated my Python implementation of the Boruta algorithm to:
https://github.com/danielhomola/boruta_py
I also implemented three mutual-information-based feature selection
methods (JMI, JMIM, MRMR) and wrapped them in a scikit-learn-like interface:
https://github.com/danielhomola/mifs
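For readers unfamiliar with this family of methods, here is a minimal sketch of a greedy MRMR-style (minimum redundancy, maximum relevance) ranking on discrete data. It is not the code from mifs: the function names `mutual_information` and `mrmr_select` are hypothetical, the score used (relevance minus mean redundancy) is only one common MRMR variant, and the sketch assumes already-discretised features and uses only the standard library.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written in terms of counts
        mi += (c / n) * log(c * n / (px[x] * py[y]))
    return mi


def mrmr_select(columns, target, k):
    """Greedily pick k column indices (hypothetical helper, not the mifs API).

    Each step picks the column with the highest score, where
    score = MI(column, target) - mean MI(column, already selected columns).
    """
    relevance = [mutual_information(col, target) for col in columns]
    selected = []
    remaining = list(range(len(columns)))

    def score(j):
        if not selected:
            return relevance[j]
        redundancy = sum(
            mutual_information(columns[j], columns[s]) for s in selected
        ) / len(selected)
        return relevance[j] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: a and b are independent bits, the target is their sum.
a = [0, 0, 0, 0, 1, 1, 1, 1]
a_copy = list(a)
b = [0, 0, 1, 1, 0, 0, 1, 1]
target = [x + y for x, y in zip(a, b)]

picked = mrmr_select([a, a_copy, b], target, k=2)
print(picked)
```

On this toy data the duplicated column is penalised for its redundancy with the first pick, so the second pick is the independent column `b` rather than the copy of `a`.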
Could you please have a look at them? I'm writing a blog post
demonstrating their strengths against existing methods. Would you
require anything else before considering these for the next release?
Thanks a lot,
Daniel
On 05/08/2015 08:22 PM, Andreas Mueller wrote:
It doesn't need to be super technical, and we try to keep the user
guide "easy to understand". No bonus points for unnecessary latex ;)
The example should be as illustrative and fair as possible, and
built-in datasets are preferred. It shouldn't be too heavyweight, though.
If you like, you can show off some plots in the PR, that is always
very welcome.
On 05/08/2015 03:15 PM, Daniel Homola wrote:
Hi Andy,
Thanks! I will definitely open a GitHub pull request once Miron has
confirmed that he benchmarked my implementation by running it on the
datasets the method was published with.
I wrote a blog post about it, which explains the differences, but in a
quite casual and non-rigorous way:
http://danielhomola.com/2015/05/08/borutapy-an-all-relevant-feature-selection-method/
I guess a more technical write-up, using one of the built-in datasets,
would be more useful for the sklearn audience. I'm happy to do it if
Miron says everything looks good.
Cheers,
Daniel
On 08/05/15 21:02, Andreas Mueller wrote:
Btw, an example that compares this against existing feature
selection methods and explains the differences and advantages would
help users and convince us to merge ;)
On 05/08/2015 02:34 PM, Daniel Homola wrote:
Hi all,
I wrote a couple of weeks ago about implementing the Boruta
all-relevant feature selection algorithm in Python.
I think it's ready to go now. I wrote fit, transform and
fit_transform methods for it to make it sklearn-like.
Here it is:
https://bitbucket.org/danielhomola/boruta_py
Let me know what you think. If anyone thinks this might be worth
adding to the feature selection module, the original author
Miron is happy to give his blessing, and I'm happy to work on it further.
Cheers,
Daniel
On 15/04/15 11:03, Daniel Homola wrote:
Hi all,
I needed a multivariate feature selection method for my work. As
I'm working with biological/medical data, where n < p or even n <<
p, I started to read up on Random Forest based methods, as in my
limited understanding RF copes pretty well with this suboptimal
situation.
I came across an R package called
Boruta: https://m2.icm.edu.pl/boruta/
After reading the paper and checking some of the pretty impressive
citations I thought I'd try it, but it was really slow. So I
thought I'd reimplement it in Python, because I hoped (based on
this: http://www.slideshare.net/glouppe/accelerating-random-forests-in-scikitlearn)
that it would be faster. And it is :) I mean a LOT faster.
I was wondering if this would be something that you would consider
incorporating into the feature selection module of scikit-learn?
If yes, do you have a tutorial or some sort of guidance on how
I should prepare the code, what conventions I should follow, etc.?
Cheers,
Daniel Homola
STRATiGRAD PhD Programme
Imperial College London
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general