On Sun, Apr 19, 2015 at 2:38 PM, Luca Puggini <lucapug...@gmail.com> wrote:
> Totally true, Josef, but I guess that shoesize should not contain more
> information than age.
> I was hoping it would not be classified as relevant when age is in the model.

Semi-OT for the random forest question

I thought about the effect of including age.

Actually, I think this is a very interesting example.
Shoe size interacted with gender is a very good proxy for development
if you also have children in your sample.
I guess shoe size (or height) captures the nonlinearity, that is, the
diminishing improvements in reading skills with age, much better than
age entered linearly.  It also depends on how shoe size and reading
skills are measured.
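
A toy simulation can make this point concrete. This is only a sketch
under made-up assumptions: the saturating growth curve, the rate 0.2, the
noise levels, and the age range are all hypothetical, and gender is
ignored. It compares how well age and shoe size each correlate linearly
with a reading score that levels off in the teens.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def saturating(age, rate=0.2):
    """Hypothetical development curve: fast growth early, then a plateau."""
    return 1.0 - math.exp(-rate * age)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical sample of children and adults, ages 5 to 40.
ages = [rng.uniform(5, 40) for _ in range(2000)]

# Assumption: shoe size and reading skill both follow the same plateauing
# curve in age, each with its own independent measurement noise.
shoe = [30 + 14 * saturating(a) + rng.gauss(0, 0.5) for a in ages]
reading = [100 * saturating(a) + rng.gauss(0, 5.0) for a in ages]

r_age = pearson(ages, reading)
r_shoe = pearson(shoe, reading)
print("corr(age, reading)       = %.3f" % r_age)
print("corr(shoe size, reading) = %.3f" % r_shoe)
```

With these particular settings shoe size comes out as the stronger linear
predictor, because it shares the curvature of the reading score. With
noisier shoe measurements the ordering can flip, which is the measurement
caveat again.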

Josef

>
> @Gilles
> thanks a lot.
> I was also reading your thesis.
> My first impression is that, while max_features=1 is a very good choice
> to avoid the bias toward features with more unique values,
> max_features=n_features is a wiser choice for dealing with the
> problem of correlated variables.
>
> Look, for example, at this very simple case.
>
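> import numpy as np
> from sklearn.ensemble import ExtraTreesClassifier
>
> X = np.random.normal(0, 1, (200, 4))
> # Feature 2 is a noisy copy of feature 1, so the two are correlated.
> X[:, 2] = X[:, 1] + np.random.normal(0, 1, 200)
> # Only features 0 and 1 drive the response; feature 3 is pure noise.
> y = X[:, 0] + X[:, 1] + np.random.normal(0, 0.5, 200)
> # Binarize the response at its mean.
> y = (y > y.mean()).astype(int)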
>
> >>> et = ExtraTreesClassifier(n_estimators=10000, max_features=1,
> ...                           max_depth=5).fit(X, y)
> >>> et.feature_importances_
> array([ 0.34785118,  0.41261715,  0.18472051,  0.05481116])
> >>> et = ExtraTreesClassifier(n_estimators=10000, max_features=4,
> ...                           max_depth=5).fit(X, y)
> >>> et.feature_importances_
> array([ 0.41753879,  0.50921512,  0.05368199,  0.0195641 ])
>
>
>
> Anyway thanks a lot :-).
>
> Best,
> Luca
>
> On Sun, Apr 19, 2015 at 4:00 PM, <josef.p...@gmail.com> wrote:
>>
>> On Sun, Apr 19, 2015 at 10:05 AM, Gilles Louppe <g.lou...@gmail.com>
>> wrote:
>> > Hi Luca,
>> >
>> > If you want to find all relevant features, I would recommend using
>> > ExtraTreesClassifier with max_features=1 and limited depth in order to
>> > avoid this kind of bias due to estimation errors. E.g., try with
>> > max_depth=3 to 5 or using max_leaf_nodes.
>> >
>> > Hope this helps,
>> > Gilles
>> >
>> >
>> >
>> > On 19 April 2015 at 14:30, Luca Puggini <lucapug...@gmail.com> wrote:
>> >>
>> >> Hi all,
>> >> I am using random forest and extra trees importance.
>> >> I am wondering if there is any method to deal with correlated
>> >> variables.
>> >>
>> >> Consider, for example, the R party package.
>> >> On page 30 of the documentation
>> >> http://cran.r-project.org/web/packages/party/party.pdf a measure of
>> >> 'conditional importance' is described.
>> >>
>> >> If I run importance with the ensemble methods in sklearn I get that
>> >> shoesize is a relevant predictor for reading skills.
>>
>> In my family, shoesize is a perfect predictor for reading skills for 3
>> out of 5 observations.
>>
>> Josef
>>
>>
>> >>
>> >> Is there any way to avoid that?
>> >>
>> >> Thanks a lot!
>> >>
>> >>
>> >>
>> >> ------------------------------------------------------------------------------
>> >> BPM Camp - Free Virtual Workshop May 6th at 10am PDT/1PM EDT
>> >> Develop your own process in accordance with the BPMN 2 standard
>> >> Learn Process modeling best practices with Bonita BPM through live
>> >> exercises
>> >> http://www.bonitasoft.com/be-part-of-it/events/bpm-camp-virtual-
>> >> event?utm_
>> >> source=Sourceforge_BPM_Camp_5_6_15&utm_medium=email&utm_campaign=VA_SF
>> >> _______________________________________________
>> >> Scikit-learn-general mailing list
>> >> Scikit-learn-general@lists.sourceforge.net
>> >> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>> >>
>> >
>> >
>> >
>>
>>
>
>
>
