Re: [R-SIG-Finance] Random Forest Classifiers

Chris Waggoner Sat, 26 Nov 2011 18:43:12 -0800

Momop, I think that would undermine the robustness of RF. As I understand it,
RF averages the predictions of many trees, and each tree's prediction is
itself an average over the training cases in a leaf. Pruning variables the
way you describe, using importance scores computed on the same data, would
risk overfitting to your particular dataset rather than the data-generating
process.
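
For reference, the relative scaling of %IncMSE mentioned in the quoted
question can be sketched as follows. This is only an illustration in plain
Python: the values are copied from the quoted table, the row label "Var0" is
written as "var0", and the helper names relative_importance and top_k are
made up for this sketch rather than taken from any package.

```python
# %IncMSE values copied from the importance table in the quoted message.
inc_mse = {
    "var0": 10.84632, "var1": 24.53021, "var2": 26.5005, "var3": 60.18863,
    "var4": 11.97568, "var5": 49.63468, "var6": 19.55981, "var7": 10.36669,
    "var8": 14.16585, "var9": 9.75812,
}


def relative_importance(scores):
    """Scale each score to a percentage of the largest score (hypothetical helper)."""
    top = max(scores.values())
    return {name: 100.0 * (value / top) for name, value in scores.items()}


def top_k(scores, k):
    """Return the k names with the highest scores, best first (hypothetical helper)."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


scaled = relative_importance(inc_mse)
for name in top_k(scaled, 4):
    print(f"{name}: {scaled[name]:.1f}% of the top score")
```

A ranking like this only orders the variables. Whether dropping the
low-ranked ones actually helps would still need to be checked on data that
was not used to compute the importances.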


On Sat, Nov 26, 2011 at 6:52 PM, Momop Momop <[email protected]> wrote:

>
> Apologies, the mail got sent before completion. Here's the full text:
>
> I am learning Random Forest and have a basic training question. For my
> problem, I "derived" various classifiers (var0, var1, ..., var9). They are
> independent, but the intrinsic values from which they are derived overlap.
> I get the following data for my RF. The question I have is: should I
> eliminate the classifiers that haven't shown enough importance? (For
> example, I could scale %IncMSE relatively and maybe just pick the top
> 3 or 4.)
>
> -------------------------------
>         %IncMSE    IncNodePurity
> Var0    10.84632    7.232559
> var1    24.53021    7.976509
> var2    26.5005     4.653162
> var3    60.18863   21.882258
> var4    11.97568    7.25413
> var5    49.63468   16.968472
> var6    19.55981   10.009517
> var7    10.36669   13.136694
> var8    14.16585    7.818673
> var9     9.75812    7.178831
> -------------------------------
>
> Essentially, what I was attempting to do was to choose the best derived
> classifiers by eliminating those from the above list that don't show a
> noticeable relative impact on MSE. Any guidance or pointers are much
> appreciated. Thanks!
>
>
> ________________________________
>
> To: "[email protected]" <[email protected]>
> Sent: Saturday, November 26, 2011 5:45 PM
> Subject: [R-SIG-Finance] Random Forest Classifiers
>
>
> _______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
> -- Subscriber-posting only. If you want to post, subscribe first.
> -- Also note that this is not the r-help list where general R questions
> should go.