Hi all,

I wrote a couple of weeks ago about implementing the Boruta all-relevant feature selection algorithm in Python.

I think it's ready to go now. I wrote fit, transform and fit_transform methods for it to make it sklearn-like.
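For anyone who hasn't met Boruta, here is a minimal sketch of its core idea wrapped in the same sklearn-style fit/transform/fit_transform interface. It is NOT the code in the boruta_py repository. The class name BorutaSketch and its parameters (n_iter, alpha, n_estimators) are hypothetical. It also simplifies the method in two ways. It uses scikit-learn's impurity-based feature_importances_ rather than the importance measure used by the R package. It runs a fixed number of iterations followed by a single one-sided binomial test, instead of Boruta's iterative confirm/reject procedure with multiple-testing correction.

```python
import numpy as np
from scipy.stats import binom
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.ensemble import RandomForestClassifier


class BorutaSketch(BaseEstimator, TransformerMixin):
    """Simplified Boruta-style selector (hypothetical; not the boruta_py API)."""

    def __init__(self, n_iter=50, alpha=0.05, n_estimators=100, random_state=None):
        self.n_iter = n_iter              # number of forests to train
        self.alpha = alpha                # significance level for the binomial test
        self.n_estimators = n_estimators  # trees per forest
        self.random_state = random_state

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        rng = np.random.RandomState(self.random_state)
        n_features = X.shape[1]
        hits = np.zeros(n_features, dtype=int)

        for _ in range(self.n_iter):
            # Shadow features: every column shuffled independently, which
            # keeps its distribution but breaks any link to y.
            shadow = np.apply_along_axis(rng.permutation, 0, X)
            forest = RandomForestClassifier(
                n_estimators=self.n_estimators,
                random_state=int(rng.randint(2**31 - 1)),
            )
            forest.fit(np.hstack([X, shadow]), y)
            importances = forest.feature_importances_
            # A "hit": the real feature beats the best shadow feature.
            hits += importances[:n_features] > importances[n_features:].max()

        # One-sided binomial test: P(hits >= observed) if a feature had a
        # 50/50 chance of beating the best shadow in each iteration.
        p_values = binom.sf(hits - 1, self.n_iter, 0.5)
        self.n_hits_ = hits
        self.support_ = p_values < self.alpha
        return self

    def transform(self, X):
        # Keep only the columns flagged as relevant during fit.
        return np.asarray(X)[:, self.support_]

    # fit_transform is inherited from TransformerMixin (fit, then transform).
```

Usage mirrors any scikit-learn transformer: BorutaSketch(random_state=0).fit_transform(X, y) returns only the columns flagged in support_. Inheriting from BaseEstimator and TransformerMixin is what lets it slot into a Pipeline without extra glue.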

Here it is:
https://bitbucket.org/danielhomola/boruta_py

Let me know what you think. If anyone thinks this might be worth adding to the feature selection module, the original author, Miron, is happy to give his blessing, and I'm happy to work on it further.

Cheers,
Daniel

On 15/04/15 11:03, Daniel Homola wrote:
Hi all,

I needed a multivariate feature selection method for my work. As I'm working with biological/medical data, where n < p or even n << p (far fewer samples than features), I started to read up on Random Forest based methods, as in my limited understanding RF copes pretty well with this suboptimal situation.

I came across an R package called Boruta: https://m2.icm.edu.pl/boruta/

After reading the paper and checking some of its pretty impressive citations, I thought I'd try it, but it was really slow. So I decided to reimplement it in Python, hoping (based on this: http://www.slideshare.net/glouppe/accelerating-random-forests-in-scikitlearn) that it would be faster. And it is :) I mean a LOT faster.

I was wondering if this would be something that you would consider incorporating into the feature selection module of scikit-learn?

If yes, do you have a tutorial or some sort of guidance about how I should prepare the code, what conventions I should follow, etc.?

Cheers,

Daniel Homola

STRATiGRAD PhD Programme
Imperial College London

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general