Hi Dan.
I saw that paper, but it is not well-cited.
My question is more about how different this is from what we already have.
So it looks like some (5) random control features are added and the feature importances are compared against the control.
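
The comparison described above can be sketched as a single round in scikit-learn. The sketch assumes one shuffled "shadow" copy per original feature. A real feature scores a hit when its impurity-based importance beats the best shadow. The full Boruta procedure repeats this over many rounds and applies a statistical test, which is omitted here. `shadow_feature_hits` is a hypothetical name, and the toy data is synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def shadow_feature_hits(X, y, n_estimators=200, random_state=0):
    """Hypothetical helper: one round of real-vs-shadow importance comparison."""
    rng = np.random.RandomState(random_state)
    n_features = X.shape[1]
    # Shadow features: each column permuted independently, so any link to y
    # is destroyed while each column's marginal distribution is kept.
    X_shadow = np.column_stack([rng.permutation(X[:, j]) for j in range(n_features)])
    forest = RandomForestClassifier(n_estimators=n_estimators, random_state=random_state)
    forest.fit(np.hstack([X, X_shadow]), y)
    importances = forest.feature_importances_
    # A real feature scores a "hit" if it beats the best-scoring shadow feature.
    return importances[:n_features] > importances[n_features:].max()


# Toy data: with shuffle=False the 3 informative features are columns 0-2.
X, y = make_classification(n_samples=200, n_features=20, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)
hits = shadow_feature_hits(X, y)
print(np.flatnonzero(hits))
```
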

The question is whether the feature importance that is used is different from ours. Gilles?

If not, this could be hard to add. If it is the same, I think a meta-estimator would be a nice addition to the feature selection module.
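
A meta-estimator in the feature selection module might look roughly like the following. This is a sketch under stated assumptions, not a proposed API. `ShadowFeatureSelector`, `n_iter` and `min_hit_rate` are hypothetical names. The wrapped estimator is assumed to expose `feature_importances_` after fitting. Keeping features whose hit rate reaches a fixed threshold is a simplification standing in for Boruta's statistical test.

```python
import numpy as np
from sklearn.base import BaseEstimator, clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectorMixin


class ShadowFeatureSelector(SelectorMixin, BaseEstimator):
    """Hypothetical meta-estimator: keep features that beat shuffled copies."""

    def __init__(self, estimator, n_iter=10, min_hit_rate=0.5, random_state=None):
        self.estimator = estimator
        self.n_iter = n_iter
        self.min_hit_rate = min_hit_rate
        self.random_state = random_state

    def fit(self, X, y):
        X = np.asarray(X)
        rng = np.random.RandomState(self.random_state)
        n_features = X.shape[1]
        self.n_features_in_ = n_features
        hits = np.zeros(n_features)
        for _ in range(self.n_iter):
            # Fresh shadow copies each round: every column permuted independently.
            X_shadow = np.column_stack(
                [rng.permutation(X[:, j]) for j in range(n_features)])
            est = clone(self.estimator).fit(np.hstack([X, X_shadow]), y)
            imp = est.feature_importances_
            # Count a hit for each real feature that beats the best shadow.
            hits += imp[:n_features] > imp[n_features:].max()
        self.hit_rate_ = hits / self.n_iter
        self.support_ = self.hit_rate_ >= self.min_hit_rate
        return self

    def _get_support_mask(self):
        # SelectorMixin builds get_support() and transform() on top of this mask.
        return self.support_


X, y = make_classification(n_samples=200, n_features=20, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)
selector = ShadowFeatureSelector(
    RandomForestClassifier(n_estimators=100, random_state=0),
    n_iter=5, random_state=0)
X_selected = selector.fit_transform(X, y)
print(selector.get_support(indices=True))
```

Subclassing `SelectorMixin` means `transform`, `fit_transform` and `get_support` come for free. The selector then fits into a `Pipeline` like the existing selectors.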

Cheers,
Andy


On 04/15/2015 11:32 AM, Daniel Homola wrote:
Hi Andy,

This is the paper: http://www.jstatsoft.org/v36/i11/ which was cited 79 times according to Google Scholar.

Regarding your second point, I guess the first 3 questions of the FAQ on the Boruta website answer it: https://m2.icm.edu.pl/boruta/

 1. *So, what's so special about Boruta?* It is an all relevant
    feature selection method, while most others are minimal optimal;
    this means it tries to find all features carrying information
    usable for prediction, rather than finding a possibly compact
    subset of features on which some classifier has a minimal error.
    Here is a paper with the details.
 2. *Why should I care?* For a start, when you try to understand the
    phenomenon that made your data, you should care about all factors
    that contribute to it, not just the bluntest signs of it in
    context of your methodology (yes, minimal optimal set of features
    by definition depends on your classifier choice).
 3. *But I only care about good classification accuracy!* So you also
    care about having a robust model; in p≫n problems, one can usually
    cherry-pick a nonsense subset of features which yields good or
    even perfect classification – minimal optimal methods can easily
    get deceived by that, leaving you with an overfitted model and no
    sign that something is wrong. See this or that for an example.

I'm not an ML expert by any means but it seemed reasonable to me. Any thoughts?

Cheers,
Dan




On 15/04/15 16:23, Andreas Mueller wrote:
Hi Daniel.
That sounds potentially interesting.
Is there a widely cited paper for this?
I didn't read the paper, but it looks very similar to RFE(RandomForestClassifier()). Is it qualitatively different from that? Does it use a different feature importance?
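
For comparison, here is `RFE(RandomForestClassifier())` on synthetic data. RFE repeatedly refits the forest and drops the lowest-importance feature (`step=1`) until `n_features_to_select` remain. The target of 3 features is an assumption chosen to match the toy data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Toy data: with shuffle=False the 3 informative features are columns 0-2.
X, y = make_classification(n_samples=200, n_features=20, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)

rfe = RFE(RandomForestClassifier(n_estimators=100, random_state=0),
          n_features_to_select=3, step=1)
rfe.fit(X, y)
print(rfe.support_)   # boolean mask of the kept features
print(rfe.ranking_)   # 1 = kept; larger = eliminated earlier
```

One visible difference from the shadow-feature idea is the stopping rule. RFE needs the subset size chosen up front, or tuned with `RFECV`. A comparison against permuted copies derives its cut-off from the data itself.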

BTW: your mail was flagged as spam because your link is broken and points to some Imperial College internal page.

Cheers,
Andy

On 04/15/2015 05:03 AM, Daniel Homola wrote:
Hi all,

I needed a multivariate feature selection method for my work. As I'm working with biological/medical data, where n < p or even n << p, I started to read up on Random Forest based methods, as in my limited understanding RF copes pretty well with this suboptimal situation.

I came across an R package called Boruta: https://m2.icm.edu.pl/boruta/

After reading the paper and checking some of the pretty impressive citations, I thought I'd try it, but it was really slow. So I thought I'd reimplement it in Python, because I hoped (based on this: http://www.slideshare.net/glouppe/accelerating-random-forests-in-scikitlearn) that it would be faster. And it is :) I mean a LOT faster.

I was wondering if this would be something that you would consider incorporating into the feature selection module of scikit-learn?

If yes, do you have a tutorial or some sort of guidance on how I should prepare the code, what conventions I should follow, etc.?

Cheers,

Daniel Homola

STRATiGRAD PhD Programme
Imperial College London


------------------------------------------------------------------------------
BPM Camp - Free Virtual Workshop May 6th at 10am PDT/1PM EDT
Develop your own process in accordance with the BPMN 2 standard
Learn Process modeling best practices with Bonita BPM through live exercises
http://www.bonitasoft.com/be-part-of-it/events/bpm-camp-virtual- event?utm_
source=Sourceforge_BPM_Camp_5_6_15&utm_medium=email&utm_campaign=VA_SF


_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general