[Scikit-learn-general] Contributing to scikit-learn with a re-implementation of a Random Forest based iterative feature selection method

Daniel Homola Wed, 15 Apr 2015 02:05:28 -0700

Hi all,

I needed a multivariate feature selection method for my work. As I'm working with biological/medical data, where n < p or even n << p, I started to read up on Random Forest based methods, as in my limited understanding RF copes pretty well with this suboptimal situation.

I came across an R package called Boruta: https://m2.icm.edu.pl/boruta/

After reading the paper and checking some of the pretty impressive citations, I thought I'd try it, but it was really slow. So I decided to reimplement it in Python, because I hoped (based on this: http://www.slideshare.net/glouppe/accelerating-random-forests-in-scikitlearn) that it would be faster. And it is :) I mean a LOT faster.
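For readers unfamiliar with Boruta, the core idea can be sketched in a few lines. This is a simplified illustration, not the R package and not the reimplementation mentioned above. It assumes scikit-learn's RandomForestClassifier and its impurity-based feature_importances_. In each round, every feature gets a "shadow" copy whose values are shuffled. A real feature scores a hit when its importance beats the best shadow. Features that score hits more often than a fair coin would predict (one-sided binomial test, Bonferroni-corrected) are accepted. The helper names boruta_sketch and binomial_upper_tail are hypothetical. The real Boruta additionally rejects and drops unimportant features between rounds, and this sketch omits that step.

```python
import math

import numpy as np
from sklearn.ensemble import RandomForestClassifier


def binomial_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def boruta_sketch(X, y, n_iter=20, alpha=0.05, random_state=0):
    """Simplified Boruta-style selection (hypothetical helper, not the R package)."""
    rng = np.random.default_rng(random_state)
    n_samples, n_features = X.shape
    hits = np.zeros(n_features, dtype=int)
    for it in range(n_iter):
        # Shadow features: each column shuffled independently, breaking any link to y.
        shadows = np.column_stack([rng.permutation(X[:, j]) for j in range(n_features)])
        forest = RandomForestClassifier(n_estimators=100, random_state=random_state + it)
        forest.fit(np.hstack([X, shadows]), y)
        importances = forest.feature_importances_
        # A "hit" means a real feature beat the best shadow feature in this round.
        hits += importances[:n_features] > importances[n_features:].max()
    # Accept features whose hit count is improbably high under a fair coin,
    # using a Bonferroni correction across all features.
    threshold = alpha / n_features
    return [j for j in range(n_features) if binomial_upper_tail(int(hits[j]), n_iter) < threshold]


# Usage on synthetic data: only features 0 and 1 carry signal.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
print(boruta_sketch(X, y))
```

Comparing against the best shadow feature, rather than a fixed importance cutoff, lets the forest's own noise level set the bar. This is what makes the approach attractive when n << p.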

I was wondering if this would be something that you would consider incorporating into the feature selection module of scikit-learn?

If yes, do you have a tutorial or some sort of guidance about how I should prepare the code, what conventions I should follow, etc.?


Cheers,

Daniel Homola

STRATiGRAD PhD Programme
Imperial College London


_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
