Momop, I think that would undermine the robustness of RF. As I understand it, RF averages the predictions of many trees, and each tree's leaf prediction is itself an average of the training observations that fall into that leaf. Pruning variables the way you describe would risk overfitting to your particular dataset rather than the data-generating process.
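If you still want to look at the relative scaling you mention, here is a minimal sketch in Python using the %IncMSE figures from your table (quoted below). The helper names relative_importance and top_k are just illustrative, not part of any RF library, and the thread has no code of its own, so Python is simply my choice here. It only ranks the numbers you already have; it does not tell you whether dropping the rest is safe.

```python
# %IncMSE values copied from the importance table in the quoted message.
inc_mse = {
    "Var0": 10.84632,
    "var1": 24.53021,
    "var2": 26.5005,
    "var3": 60.18863,
    "var4": 11.97568,
    "var5": 49.63468,
    "var6": 19.55981,
    "var7": 10.36669,
    "var8": 14.16585,
    "var9": 9.75812,
}


def relative_importance(scores):
    """Scale each score by the largest one, so the top variable gets 1.0."""
    top = max(scores.values())
    return {name: value / top for name, value in scores.items()}


def top_k(scores, k):
    """Return the k variable names with the highest scores, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


rel = relative_importance(inc_mse)
for name in top_k(rel, 4):
    # Print each selected variable with its importance relative to the best.
    print(f"{name}: {rel[name]:.2f}")
```

If you do select variables this way, I would repeat the selection inside each cross-validation fold rather than once on the full data, since choosing the top few on the same data you then evaluate on is exactly where the overfitting risk comes from.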

On Sat, Nov 26, 2011 at 6:52 PM, Momop Momop <[email protected]> wrote:
>
> Apologies as the mail got sent before completion. Here's the full text
>
> I am learning Random Forest and have a basic training question. For my
> problem, I "derived" various classifiers (var0,var1...var9). They are
> independent, but the intrinsic values from which they are derived overlap.
> I get the following data for my RF tree. The question I have is, should I
> eliminate the number of classifiers that haven't shown enough importance
> (For example, I could scale %IncMSE relatively and may be just pick the top
> 3 or 4).
>
> -------------------------------
>        %IncMSE  IncNodePurity
> Var0  10.84632       7.232559
> var1  24.53021       7.976509
> var2   26.5005       4.653162
> var3  60.18863      21.882258
> var4  11.97568        7.25413
> var5  49.63468      16.968472
> var6  19.55981      10.009517
> var7  10.36669      13.136694
> var8  14.16585       7.818673
> var9   9.75812       7.178831
> -------------------------------
>
> Essentially, what I was attempting to do was to choose the best derived
> classifier by eliminating some from the above list which doesn't show
> noticeable relative impact on MSE. Any guidance or pointers is much
> appreciated. Thanks!
>
> ________________________________
> To: "[email protected]" <[email protected]>
> Sent: Saturday, November 26, 2011 5:45 PM
> Subject: [R-SIG-Finance] Random Forest Classifiers

_______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only. If you want to post, subscribe first.
-- Also note that this is not the r-help list where general R questions
should go.
