Totally true, Josef, but I guess that shoe size should not contain more
information than age.
I was hoping not to classify it as relevant when age is already in the model.
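
A minimal sketch of the "conditional" idea in Python with scikit-learn: shoe size is shuffled only among samples of similar age, so the permutation no longer breaks the shoe size/age correlation. The data are synthetic and invented for illustration (not from this thread), and the helpers quantile_strata and permutation_drop are hypothetical names. This is a simplified stand-in for party's conditional importance, not a reimplementation of it: party builds its strata from the trees' own split points, whereas this sketch uses equal-count age bins.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def quantile_strata(x, n_bins):
    """Assign each value of x to one of n_bins equal-count bins by rank."""
    ranks = np.argsort(np.argsort(x))
    return (ranks * n_bins) // len(x)


def permutation_drop(model, X, y, feature, strata, n_repeats=10, seed=0):
    """Mean drop in R^2 when column `feature` is shuffled within each stratum."""
    rng = np.random.default_rng(seed)
    baseline = model.score(X, y)
    drops = []
    for _ in range(n_repeats):
        X_perm = X.copy()
        for s in np.unique(strata):
            idx = np.flatnonzero(strata == s)
            # Shuffle the feature only among rows in this stratum.
            X_perm[idx, feature] = X[rng.permutation(idx), feature]
        drops.append(baseline - model.score(X_perm, y))
    return float(np.mean(drops))


# Hypothetical synthetic data: shoe size is a noisy function of age,
# and the reading score depends on age only.
rng = np.random.default_rng(42)
n = 1000
age = rng.uniform(6, 12, n)
shoe = age + rng.normal(0, 0.5, n)
reading = age + rng.normal(0, 0.5, n)
X = np.column_stack([age, shoe])
X_train, X_test = X[:500], X[500:]
y_train, y_test = reading[:500], reading[500:]

# max_features=1 makes the forest split on shoe size often.
model = RandomForestRegressor(n_estimators=200, max_features=1,
                              random_state=0).fit(X_train, y_train)

AGE, SHOE = 0, 1
# A single stratum is ordinary (marginal) permutation importance.
marginal = permutation_drop(model, X_test, y_test, SHOE,
                            np.zeros(len(X_test), dtype=int))
# Ten age strata: shoe size is only shuffled among children of similar age.
conditional = permutation_drop(model, X_test, y_test, SHOE,
                               quantile_strata(X_test[:, AGE], 10))
print(f"shoe size, marginal:    {marginal:.3f}")
print(f"shoe size, conditional: {conditional:.3f}")
```

The conditional drop should come out smaller than the marginal one, because within an age bin shoe size carries little extra information about reading; the exact numbers depend on the seeds and the number of bins.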

@Gilles:
thanks a lot.
I was also reading your thesis.
My first impression is that, while max_features = 1 is a very good choice
to avoid the bias toward features with more unique values, max_features =
n_features is a wiser choice to deal with the problem of correlated
variables.

Take, for example, this very simple case:

import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

# Four standard-normal features: feature 2 is a noisy copy of feature 1,
# feature 3 is pure noise.
X = np.random.normal(0, 1, (200, 4))
X[:, 2] = X[:, 1] + np.random.normal(0, 1, 200)
# The response depends on features 0 and 1 only; binarize it at its mean.
y = X[:, 0] + X[:, 1] + np.random.normal(0, 0.5, 200)
y = (y > y.mean()).astype(int)

>>> et = ExtraTreesClassifier(n_estimators=10000, max_features=1,
...                           max_depth=5).fit(X, y)
>>> et.feature_importances_
array([ 0.34785118,  0.41261715,  0.18472051,  0.05481116])
>>> et = ExtraTreesClassifier(n_estimators=10000, max_features=4,
...                           max_depth=5).fit(X, y)
>>> et.feature_importances_
array([ 0.41753879,  0.50921512,  0.05368199,  0.0195641 ])

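To check that this comparison is not an artifact of a single random draw, the same experiment can be repeated with fixed seeds and every value of max_features from 1 to 4. A sketch, assuming a recent NumPy (for default_rng) and scikit-learn; the helper names simulate and importances_by_max_features are hypothetical, and n_estimators=500 is chosen only to keep the run short. No output is shown because the numbers depend on the seed.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier


def simulate(n_samples=200, seed=0):
    """Same design as above: feature 2 ~ feature 1 + noise, feature 3 is noise."""
    rng = np.random.default_rng(seed)
    X = rng.normal(0, 1, (n_samples, 4))
    X[:, 2] = X[:, 1] + rng.normal(0, 1, n_samples)
    y_cont = X[:, 0] + X[:, 1] + rng.normal(0, 0.5, n_samples)
    y = (y_cont > y_cont.mean()).astype(int)
    return X, y


def importances_by_max_features(X, y, n_estimators=500, max_depth=5, seed=0):
    """Fit one depth-limited ExtraTreesClassifier per max_features value."""
    return {
        k: ExtraTreesClassifier(n_estimators=n_estimators, max_features=k,
                                max_depth=max_depth, random_state=seed)
        .fit(X, y)
        .feature_importances_
        for k in range(1, X.shape[1] + 1)
    }


X, y = simulate()
results = importances_by_max_features(X, y)
for k, imp in results.items():
    print(k, np.round(imp, 3))
```

With max_features=1 the correlated feature 2 should pick up a visibly larger share of the importance than with max_features=4. Gilles' alternative of bounding max_leaf_nodes instead of max_depth can be tried by swapping that keyword in the constructor.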


Anyway thanks a lot :-).

Best,
Luca

On Sun, Apr 19, 2015 at 4:00 PM, <josef.p...@gmail.com> wrote:

> On Sun, Apr 19, 2015 at 10:05 AM, Gilles Louppe <g.lou...@gmail.com> wrote:
> > Hi Luca,
> >
> > If you want to find all relevant features, I would recommend using
> > ExtraTreesClassifier with max_features=1 and limited depth in order to
> > avoid this kind of bias due to estimation errors. E.g., try with
> > max_depth=3 to 5 or using max_leaf_nodes.
> >
> > Hope this helps,
> > Gilles
> >
> >
> >
> > On 19 April 2015 at 14:30, Luca Puggini <lucapug...@gmail.com> wrote:
> >>
> >> Hi all,
> >> I am using random forest and extra trees feature importances.
> >> I am wondering if there is any method to deal with correlated variables.
> >>
> >> Take, for example, the R party package.
> >> On page 30 of the documentation
> >> http://cran.r-project.org/web/packages/party/party.pdf a measure of
> >> 'conditional importance' is described.
> >>
> >> If I compute importances with the ensemble methods in sklearn, I find
> >> that shoe size is a relevant predictor of reading skills.
>
> In my family, shoesize is a perfect predictor for reading skills for 3
> out of 5 observations.
>
> Josef
>
>
> >>
> >> Is there any way to avoid that?
> >>
> >> Thanks a lot!
> >>
> >>
> >>
> ------------------------------------------------------------------------------
> >> BPM Camp - Free Virtual Workshop May 6th at 10am PDT/1PM EDT
> >> Develop your own process in accordance with the BPMN 2 standard
> >> Learn Process modeling best practices with Bonita BPM through live
> >> exercises
> >> http://www.bonitasoft.com/be-part-of-it/events/bpm-camp-virtual-
> >> event?utm_
> >> source=Sourceforge_BPM_Camp_5_6_15&utm_medium=email&utm_campaign=VA_SF
> >> _______________________________________________
> >> Scikit-learn-general mailing list
> >> Scikit-learn-general@lists.sourceforge.net
> >> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
> >>
> >
>