Interesting OT.
I agree totally, although from a slightly different point of view: I think the
random forest model should be able to detect the nonlinear relationship
between age and reading skills automatically.
In my opinion shoe size can be a relevant variable because it can measure
the "biological age" of a person.
But as you said, this really depends on the data, and this was only an
example.
Luca
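
The point about the forest picking up the nonlinearity can be checked on simulated data. This is only a minimal sketch: the saturating age/reading curve, the noise level and all variable names are assumptions made up for illustration, not data from this thread. It compares a straight-line fit in age with a random forest fit on the same single feature.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)

# Hypothetical data: reading skill rises quickly in childhood, then levels off.
n_samples = 500
age = rng.uniform(5, 40, n_samples)
reading = 100 * (1 - np.exp(-(age - 5) / 5)) + rng.normal(0, 5, n_samples)

X_age = age.reshape(-1, 1)

# Linear model: can only fit a straight line in age.
linear_r2 = cross_val_score(
    LinearRegression(), X_age, reading, cv=5, scoring="r2"
).mean()

# Random forest: piecewise-constant fit, no functional form assumed.
forest = RandomForestRegressor(
    n_estimators=200, min_samples_leaf=5, random_state=0
)
forest_r2 = cross_val_score(forest, X_age, reading, cv=5, scoring="r2").mean()

print("linear R^2: %.2f" % linear_r2)
print("forest R^2: %.2f" % forest_r2)
```

On data shaped like this, the forest should score clearly above the linear model, because it can follow the flattening of the curve without being told its functional form. Whether real reading data behave this way depends on how the variables are measured.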
On Sun, Apr 19, 2015 at 8:03 PM, <josef.p...@gmail.com> wrote:
> On Sun, Apr 19, 2015 at 2:38 PM, Luca Puggini <lucapug...@gmail.com>
> wrote:
> > Totally true Josef, but I guess that shoesize should not contain more
> > information than age.
> > I was hoping it would not be classified as relevant when age is in the model.
>
> Semi-OT for the random forest question
>
> I thought about the effect of including age.
>
> Actually, I think this is a very interesting example.
> Shoe size interacted with gender is a very good proxy for development,
> if you also have children in your sample.
> I guess shoe size (or height) captures the nonlinearity, the curvature
> of decreasing improvements in reading skills much better than the
> linear increase in age. It also depends on how shoe size and reading
> skills are measured.
>
> Josef
>
> >
> > @Gilles
> > thanks a lot.
> > I was also reading your thesis.
> > My first impression is that, while max_features=1 is a very good choice
> > for avoiding the bias toward features with more unique values,
> > max_features=n_features is a wiser choice for dealing with the
> > problem of correlated variables.
> >
> > Consider, for example, this very simple case.
> >
> > import numpy as np
> > from sklearn.ensemble import ExtraTreesClassifier
> >
> > X = np.random.normal(0, 1, (200, 4))
> > X[:, 2] = X[:, 1] + np.random.normal(0, 1, 200)  # feature 2: noisy copy of feature 1
> > y = X[:, 0] + X[:, 1] + np.random.normal(0, 0.5, 200)
> > # binarize the response at its mean
> > y = (y > y.mean()).astype(int)
> >
> > >>> et = ExtraTreesClassifier(n_estimators=10000, max_features=1,
> > ...                           max_depth=5).fit(X, y)
> > >>> et.feature_importances_
> > array([ 0.34785118, 0.41261715, 0.18472051, 0.05481116])
> > >>> et = ExtraTreesClassifier(n_estimators=10000, max_features=4,
> > ...                           max_depth=5).fit(X, y)
> > >>> et.feature_importances_
> > array([ 0.41753879, 0.50921512, 0.05368199, 0.0195641 ])
> >
> >
> >
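
To see how the importances move between the two extremes quoted above, the same construction can be swept over every value of max_features. This is a sketch under assumptions: the fixed seed, the smaller number of trees (500 rather than 10000, to keep it quick) and the helper name `importances_for` are mine, so the exact numbers will differ from the ones in the session above.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.RandomState(0)

# Same construction as the quoted example: feature 2 is a noisy copy of
# feature 1, feature 3 is pure noise, only features 0 and 1 drive the label.
X = rng.normal(0, 1, (200, 4))
X[:, 2] = X[:, 1] + rng.normal(0, 1, 200)
score = X[:, 0] + X[:, 1] + rng.normal(0, 0.5, 200)
y = (score > score.mean()).astype(int)


def importances_for(max_features, n_estimators=500):
    """Fit shallow extra-trees and return the impurity-based importances."""
    et = ExtraTreesClassifier(
        n_estimators=n_estimators,
        max_features=max_features,
        max_depth=5,
        random_state=0,
    )
    return et.fit(X, y).feature_importances_


# One importance vector per max_features setting, from 1 up to n_features.
results = {m: importances_for(m) for m in (1, 2, 3, 4)}
for m, imp in results.items():
    print("max_features=%d:" % m, np.round(imp, 3))
```

With max_features=1 each split variable is drawn at random, so the correlated feature 2 gets credit whenever it is drawn before feature 1. With larger values it has to compete with feature 1 at each node, and its share should shrink.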
> > Anyway thanks a lot :-).
> >
> > Best,
> > Luca
> >
> > On Sun, Apr 19, 2015 at 4:00 PM, <josef.p...@gmail.com> wrote:
> >>
> >> On Sun, Apr 19, 2015 at 10:05 AM, Gilles Louppe <g.lou...@gmail.com>
> >> wrote:
> >> > Hi Luca,
> >> >
> >> > If you want to find all relevant features, I would recommend using
> >> > ExtraTreesClassifier with max_features=1 and limited depth in order to
> >> > avoid this kind of bias due to estimation errors. E.g., try with
> >> > max_depth=3 to 5 or using max_leaf_nodes.
> >> >
> >> > Hope this helps,
> >> > Gilles
> >> >
> >> >
> >> >
> >> > On 19 April 2015 at 14:30, Luca Puggini <lucapug...@gmail.com> wrote:
> >> >>
> >> >> Hi all,
> >> >> I am using random forest and extra trees importance.
> >> >> I am wondering if there is any method to deal with correlated
> >> >> variables.
> >> >>
> >> >> Consider for example the R party package.
> >> >> On page 30 of the documentation
> >> >> http://cran.r-project.org/web/packages/party/party.pdf a measure of
> >> >> 'conditional importance' is described.
> >> >>
> >> >> If I run importance with the ensemble methods in sklearn I get that
> >> >> shoesize is a relevant predictor for reading skills.
> >>
> >> In my family, shoesize is a perfect predictor for reading skills for 3
> >> out of 5 observations.
> >>
> >> Josef
> >>
> >>
> >> >>
> >> >> Is there any way to avoid that?
> >> >>
> >> >> Thanks a lot!
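
One crude way to ask whether shoesize matters once age is in the model is to compare out-of-sample fit with and without it. To be clear, this is not the conditional importance from the party package; it is a simpler substitute check, and the simulated data (reading depending on age only, shoesize as a noisy proxy of age), the noise levels and the helper name `cv_r2` are all assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)

# Hypothetical data: reading depends on age only;
# shoesize is a noisy proxy of age with no extra information.
n_samples = 600
age = rng.uniform(5, 40, n_samples)
reading = 100 * (1 - np.exp(-(age - 5) / 5)) + rng.normal(0, 5, n_samples)
shoesize = 20 + 22 * (1 - np.exp(-(age - 5) / 6)) + rng.normal(0, 3, n_samples)


def cv_r2(*columns):
    """Mean 5-fold cross-validated R^2 of a forest fit on the given columns."""
    X = np.column_stack(columns)
    forest = RandomForestRegressor(
        n_estimators=100, min_samples_leaf=5, random_state=0
    )
    return cross_val_score(forest, X, reading, cv=5, scoring="r2").mean()


r2_shoe = cv_r2(shoesize)       # shoesize alone
r2_age = cv_r2(age)             # age alone
r2_both = cv_r2(age, shoesize)  # age plus shoesize

print("shoesize only: %.2f" % r2_shoe)
print("age only:      %.2f" % r2_age)
print("age+shoesize:  %.2f" % r2_both)
```

If shoesize predicts well on its own but adds essentially nothing on top of age, that suggests it is acting as a proxy rather than carrying information of its own. On real data the comparison could of course go the other way.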
> >> >>
> >> >>
> >> >>
> >> >>
> ------------------------------------------------------------------------------
> >> >> BPM Camp - Free Virtual Workshop May 6th at 10am PDT/1PM EDT
> >> >> Develop your own process in accordance with the BPMN 2 standard
> >> >> Learn Process modeling best practices with Bonita BPM through live
> >> >> exercises
> >> >> http://www.bonitasoft.com/be-part-of-it/events/bpm-camp-virtual-
> >> >> event?utm_
> >> >>
> source=Sourceforge_BPM_Camp_5_6_15&utm_medium=email&utm_campaign=VA_SF
> >> >> _______________________________________________
> >> >> Scikit-learn-general mailing list
> >> >> Scikit-learn-general@lists.sourceforge.net
> >> >> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
> >> >>