Hi Dan.
I saw that paper, but it is not well-cited.
My question is more how different this is from what we already have.
So it looks like some (5) random control features are added and the
feature importances are compared against the control.
The question is whether the feature importance that is used is different
from ours. Gilles?
If it is different, this could be hard to add. If it is the same, I think a
meta-estimator would be a nice addition to the feature selection module.
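A rough sketch of the control-feature idea described above, not the Boruta package's actual procedure: append a few permuted columns as controls, fit a forest once, and keep the real features whose importance beats the best control. The helper name beats_controls, the default of 5 controls, the toy data, and the use of scikit-learn's impurity-based feature_importances_ are all assumptions for illustration; the paper may define importance differently, and as far as I understand it repeats the comparison over many runs rather than once.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def beats_controls(X, y, n_controls=5, random_state=0):
    """Boolean mask of features whose importance exceeds every control.

    Hypothetical helper for illustration only; single pass, no statistics.
    """
    rng = np.random.RandomState(random_state)
    n_features = X.shape[1]

    # Build control columns by permuting randomly chosen real columns. This
    # breaks any relation to y while keeping each column's distribution.
    picked = rng.choice(n_features, size=n_controls, replace=True)
    controls = np.column_stack([rng.permutation(X[:, j]) for j in picked])
    X_aug = np.hstack([X, controls])

    forest = RandomForestClassifier(n_estimators=200, random_state=random_state)
    forest.fit(X_aug, y)
    importances = forest.feature_importances_

    # Keep a real feature only if it beats the most important control.
    threshold = importances[n_features:].max()
    return importances[:n_features] > threshold


# Toy data: with shuffle=False the 3 informative features are columns 0..2.
X, y = make_classification(n_samples=300, n_features=10, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)
mask = beats_controls(X, y)
print(mask)
```

Wrapping this comparison around any estimator that exposes an importance vector is what would make it a meta-estimator, which is why the choice of importance measure matters for the question above.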
Cheers,
Andy
On 04/15/2015 11:32 AM, Daniel Homola wrote:
Hi Andy,
This is the paper: http://www.jstatsoft.org/v36/i11/ which was cited
79 times according to Google Scholar.
Regarding your second point, the first three questions of the FAQ on the
Boruta website answer it, I guess: https://m2.icm.edu.pl/boruta/
1. *So, what's so special about Boruta?* It is an all relevant
feature selection method, while most others are minimal optimal;
this means it tries to find all features carrying information
usable for prediction, rather than finding a possibly compact
subset of features on which some classifier has a minimal error.
Here is a paper with the details.
2. *Why should I care?* For a start, when you try to understand the
phenomenon that made your data, you should care about all factors
that contribute to it, not just the bluntest signs of it in
context of your methodology (yes, minimal optimal set of features
by definition depends on your classifier choice).
3. *But I only care about good classification accuracy!* So you also
care about having a robust model; in p≫n problems, one can usually
cherry-pick a nonsense subset of features which yields good or
even perfect classification – minimal optimal methods can easily
get deceived by that, leaving you with an overfitted model and no
sign that something is wrong. See this or that for an example.
I'm not an ML expert by any means but it seemed reasonable to me. Any
thoughts?
Cheers,
Dan
On 15/04/15 16:23, Andreas Mueller wrote:
Hi Daniel.
That sounds potentially interesting.
Is there a widely cited paper for this?
I didn't read the paper, but it looks very similar to
RFE(RandomForestClassifier()).
Is it qualitatively different from that? Does it use a different
feature importance?
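For comparison, the RFE(RandomForestClassifier()) combination mentioned above might look like the sketch below. It assumes a scikit-learn version in which RFE accepts estimators that expose feature_importances_; the toy data and parameter values are made up for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Toy data: 10 features, 3 of them informative.
X, y = make_classification(n_samples=200, n_features=10, n_informative=3,
                           n_redundant=0, random_state=0)

# Recursively drop the least important feature until 3 remain.
forest = RandomForestClassifier(n_estimators=50, random_state=0)
selector = RFE(forest, n_features_to_select=3, step=1)
selector.fit(X, y)

print(selector.support_)   # boolean mask of the kept features
print(selector.ranking_)   # 1 for kept features, larger means dropped earlier
```

Note that RFE needs the number of features to keep fixed in advance, whereas the control-feature approach lets the comparison against random features decide how many survive.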
btw: your mail is flagged as spam as your link is broken and links to
some imperial college internal page.
Cheers,
Andy
On 04/15/2015 05:03 AM, Daniel Homola wrote:
Hi all,
I needed a multivariate feature selection method for my work. As I'm
working with biological/medical data, where n < p or even n << p, I
started to read up on Random Forest based methods, as in my limited
understanding RF copes pretty well with this suboptimal situation.
I came across an R package called
Boruta: https://m2.icm.edu.pl/boruta/
<https://exchange.imperial.ac.uk/owa/redir.aspx?C=Yp1dHGp6hkyiZQZzx17DHznOv7PxStIIK3PgwAs_McazihitoU3Fm6_EBXvwfIJB2CJSzkCKKjo.&URL=https%3a%2f%2fm2.icm.edu.pl%2fboruta%2f>
After reading the paper and checking some of the pretty impressive
citations, I thought I'd try it, but it was really slow. So I thought
I'd reimplement it in Python, because I hoped (based on
this: http://www.slideshare.net/glouppe/accelerating-random-forests-in-scikitlearn
<https://exchange.imperial.ac.uk/owa/redir.aspx?C=Yp1dHGp6hkyiZQZzx17DHznOv7PxStIIK3PgwAs_McazihitoU3Fm6_EBXvwfIJB2CJSzkCKKjo.&URL=http%3a%2f%2fwww.slideshare.net%2fglouppe%2faccelerating-random-forests-in-scikitlearn>)
that it would be faster. And it is :) I mean a LOT faster.
I was wondering if this would be something that you would consider
incorporating into the feature selection module of scikit-learn?
If yes, do you have a tutorial or some sort of guidance about how
I should prepare the code, what conventions I should follow, etc.?
Cheers,
Daniel Homola
STRATiGRAD PhD Programme
Imperial College London
------------------------------------------------------------------------------
BPM Camp - Free Virtual Workshop May 6th at 10am PDT/1PM EDT
Develop your own process in accordance with the BPMN 2 standard
Learn Process modeling best practices with Bonita BPM through live exercises
http://www.bonitasoft.com/be-part-of-it/events/bpm-camp-virtual- event?utm_
source=Sourceforge_BPM_Camp_5_6_15&utm_medium=email&utm_campaign=VA_SF
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general