Hi all,

I needed a multivariate feature selection method for my work. As I'm working with biological/medical data, where n < p or even n << p, I started to read up on Random Forest based methods, since in my limited understanding RF copes pretty well with this suboptimal situation.
I came across an R package called Boruta: https://m2.icm.edu.pl/boruta/
After reading the paper and checking some of the pretty impressive citations, I thought I'd try it, but it was really slow. So I decided to reimplement it in Python, because I hoped (based on this: http://www.slideshare.net/glouppe/accelerating-random-forests-in-scikitlearn) that it would be faster. And it is :) I mean a LOT faster.
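The core shadow-feature idea behind Boruta can be sketched roughly as follows. This is a simplified illustration, not the reimplementation mentioned above and not the R package: the function name `shadow_feature_hits`, its parameters, and the synthetic data are all hypothetical, and it only counts "hits" against the best shuffled feature using scikit-learn's impurity-based importances, without the Z-scores and statistical test that the real algorithm uses to accept or reject features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def shadow_feature_hits(X, y, n_iter=10, random_state=0):
    """Count how often each real feature beats the best shuffled ("shadow") copy.

    Hypothetical helper for illustration only; not the Boruta API.
    """
    rng = np.random.default_rng(random_state)
    n_features = X.shape[1]
    hits = np.zeros(n_features, dtype=int)
    for i in range(n_iter):
        # Shadow features: every column shuffled independently, which breaks
        # any relationship with y but keeps each marginal distribution.
        X_shadow = np.column_stack([rng.permutation(col) for col in X.T])
        forest = RandomForestClassifier(
            n_estimators=100, random_state=random_state + i
        )
        forest.fit(np.hstack([X, X_shadow]), y)
        importances = forest.feature_importances_
        real, shadow = importances[:n_features], importances[n_features:]
        # A "hit" means the real feature was more important than the
        # most important shadow feature in this iteration.
        hits += real > shadow.max()
    return hits


# Assumed toy data with few samples relative to features;
# with shuffle=False the first 3 columns are the informative ones.
X, y = make_classification(
    n_samples=60, n_features=20, n_informative=3,
    n_redundant=0, shuffle=False, random_state=0,
)
hits = shadow_feature_hits(X, y)
print(hits)
```

Features that collect many hits across iterations are candidates to keep, while features that rarely beat the best shadow are candidates to drop.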
I was wondering if this would be something that you would consider incorporating into the feature selection module of scikit-learn?
If yes, do you have a tutorial or some sort of guidance on how I should prepare the code, what conventions I should follow, etc.?
Cheers,
Daniel Homola
STRATiGRAD PhD Programme
Imperial College London
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general