Hi Andy,
Thanks! I will definitely open a GitHub pull request once Miron has confirmed
that he has benchmarked my implementation by running it on the datasets the
method was published with.
I wrote a blog post about it, which explains the differences, but in a
quite casual and non-rigorous way:
http://danielhomola.com/2015/05/08/borutapy-an-all-relevant-feature-selection-method/
I guess a more technical write-up, using one of the built-in datasets,
would be more useful for the sklearn audience. I'm happy to do it if
Miron says everything looks good.
Cheers,
Daniel
On 08/05/15 21:02, Andreas Mueller wrote:
Btw, an example that compares this against existing feature selection
methods and explains the differences and advantages would help users and
convince us to merge ;)
On 05/08/2015 02:34 PM, Daniel Homola wrote:
Hi all,
I wrote a couple of weeks ago about implementing the Boruta
all-relevant feature selection algorithm in Python.
I think it's ready to go now. I wrote fit, transform and
fit_transform methods for it to make it sklearn-like.
Here it is:
https://bitbucket.org/danielhomola/boruta_py
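To illustrate what "sklearn-like" means above, here is a minimal sketch of the fit / transform / fit_transform convention. It is not the boruta_py code: the class name ToyVarianceSelector, its threshold parameter and the variance rule are all made up for illustration, and it uses plain Python lists so that it runs without any dependencies.

```python
class ToyVarianceSelector:
    """Hypothetical toy selector; NOT the boruta_py implementation."""

    def __init__(self, threshold=0.0):
        # Assumed parameter: features with variance <= threshold are dropped.
        self.threshold = threshold

    def fit(self, X, y=None):
        # Learn which columns to keep and store the result in support_.
        n_samples = len(X)
        n_features = len(X[0])
        self.support_ = []
        for j in range(n_features):
            column = [row[j] for row in X]
            mean = sum(column) / n_samples
            variance = sum((v - mean) ** 2 for v in column) / n_samples
            self.support_.append(variance > self.threshold)
        return self  # returning self allows chaining, as in fit_transform

    def transform(self, X):
        # Keep only the columns selected during fit.
        return [[v for v, keep in zip(row, self.support_) if keep]
                for row in X]

    def fit_transform(self, X, y=None):
        # Convenience method: fit on X, then reduce X in one call.
        return self.fit(X, y).transform(X)


X = [[1.0, 0.0, 3.0],
     [2.0, 0.0, 1.0],
     [3.0, 0.0, 2.0]]
selector = ToyVarianceSelector()
X_selected = selector.fit_transform(X)  # the constant middle column is dropped
```

A real selector would put its own selection rule inside fit; the point here is only the shape of the interface.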
Let me know what you think. If anyone thinks it might be worth
adding to the feature selection module, the original author, Miron,
is happy to give his blessing, and I'm happy to work on it further.
Cheers,
Daniel
On 15/04/15 11:03, Daniel Homola wrote:
Hi all,
I needed a multivariate feature selection method for my work. As I'm
working with biological/medical data, where n < p or even n << p (fewer
samples than features), I started to read up on Random Forest based
methods, as in my limited understanding RF copes pretty well with this
suboptimal situation.
I came across an R package called Boruta:
https://m2.icm.edu.pl/boruta/
After reading the paper and checking some of the pretty impressive
citations, I thought I'd try it, but it was really slow. So I thought
I'd reimplement it in Python, because I hoped (based on this:
http://www.slideshare.net/glouppe/accelerating-random-forests-in-scikitlearn)
that it would be faster. And it is :) I mean a LOT faster.
I was wondering if this would be something that you would consider
incorporating into the feature selection module of scikit-learn?
If yes, do you have a tutorial or some sort of guidance on how
I should prepare the code, what conventions I should follow, etc.?
Cheers,
Daniel Homola
STRATiGRAD PhD Programme
Imperial College London
------------------------------------------------------------------------------
One dashboard for servers and applications across Physical-Virtual-Cloud
Widest out-of-the-box monitoring support with 50+ applications
Performance metrics, stats and reports that give you Actionable Insights
Deep dive visibility with transaction tracing using APM Insight.
http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general