Hi Daniel.
That sounds potentially interesting.
Is there a widely cited paper for this?
I didn't read the paper, but it looks very similar to RFE(RandomForestClassifier()). Is it qualitatively different from that? Does it use a different feature importance?
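
For reference, a minimal sketch of what I mean by RFE(RandomForestClassifier()). The synthetic data and all parameter values are assumptions for illustration only. They come from neither the Boruta paper nor Daniel's code, and this does not implement Boruta itself.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Synthetic data (assumed for illustration): 60 samples, 12 features,
# of which 4 are informative.
X, y = make_classification(n_samples=60, n_features=12, n_informative=4,
                           n_redundant=0, random_state=0)

# RFE refits the forest repeatedly and drops the features with the lowest
# feature_importances_ (here 2 per iteration) until 4 remain.
forest = RandomForestClassifier(n_estimators=25, random_state=0)
selector = RFE(forest, n_features_to_select=4, step=2)
selector.fit(X, y)

print(selector.support_)   # boolean mask of the retained features
print(selector.ranking_)   # rank 1 marks a retained feature
```

Note that RFE needs the number of features to keep as an input, so the question is whether Boruta differs in how it decides what to keep or in how it scores importance.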

Btw, your mail is flagged as spam because your link is broken and points to some Imperial College internal page.

Cheers,
Andy

On 04/15/2015 05:03 AM, Daniel Homola wrote:
Hi all,

I needed a multivariate feature selection method for my work. As I'm working with biological/medical data, where n < p or even n << p, I started to read up on Random Forest based methods, as in my limited understanding RF copes pretty well with this suboptimal situation.

I came across an R package called Boruta: https://m2.icm.edu.pl/boruta/

After reading the paper and checking some of the pretty impressive citations, I thought I'd try it, but it was really slow. So I thought I'd reimplement it in Python, because I hoped (based on this: http://www.slideshare.net/glouppe/accelerating-random-forests-in-scikitlearn) that it would be faster. And it is :) I mean a LOT faster.

I was wondering if this would be something that you would consider incorporating into the feature selection module of scikit-learn?

If yes, do you have a tutorial or some sort of guidance on how I should prepare the code, what conventions I should follow, etc.?

Cheers,

Daniel Homola

STRATiGRAD PhD Programme
Imperial College London


_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
