Hi Kevin,

Interesting question. Your point is true provided you have an infinite amount of training data: in that case, you can indeed show that an infinitely large forest of extremely randomized trees built with K=1 converges towards the optimal model (the Bayes model).
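(In scikit-learn terms, K is the max_features parameter of ExtraTreesClassifier, so the kind of comparison you describe can be sketched along these lines; the synthetic data is only a stand-in for the Satellite set, and the exact figures do not matter:)

from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic stand-in, roughly the shape of the Satellite data (36 features, 6 classes).
X, y = make_classification(n_samples=4000, n_features=36, n_informative=12,
                           n_classes=6, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = X[:3000], X[3000:], y[:3000], y[3000:]

for K in (1, 6, 36):  # K = max_features; K=1 gives fully random splits
    clf = ExtraTreesClassifier(n_estimators=250, max_features=K, random_state=0)
    score = clf.fit(X_train, y_train).score(X_test, y_test)
    print("K = %2d  test accuracy = %.3f" % (K, score))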
This result however does not hold as soon as you have a finite amount of data, as you observe empirically.

If you study the bias-variance decomposition of the error of your model in the finite case, then you can show that the generalization error decomposes into noise(x) + bias^2(x) + var(x). In the case of an ensemble, you can further show that var(x) decomposes into

    rho(x) * sigma^2(x) + (1 - rho(x)) / M * sigma^2(x),

where rho(x) is the correlation coefficient between the predictions of two random trees, sigma^2(x) is the variance of a single random tree and M is the size of the forest. (A rough numerical sketch of this decomposition is appended below, after the quoted message.)

In this setting, we have the following combined effects:

- The larger K, the lower the bias of the ensemble; and vice versa.
- The larger K, the larger rho(x) (because the trees are less random and therefore more correlated), and hence the less variance can be reduced by ensembling; and vice versa. In other words, the larger K, usually the larger the variance.

Your result is therefore explained by the fact that, for your dataset, increasing K has an effect on bias which is larger than its effect on variance, resulting in an overall lower error for a larger value of K than for K = 1.

For a detailed discussion, see http://orbi.ulg.ac.be/handle/2268/170309 (chapter 4 and the references therein; 4.3.2 gives an example).

Hope this helps,
Gilles

On 28 July 2014 05:00, Kevin Leech <kevin.t.le...@gmail.com> wrote:
> Hi folks,
>
> I wonder why ExtraTrees with k = 1 (i.e. a single feature evaluated at a
> time) isn't the optimal RandomForest-based algorithm?
>
> Now into a bit more detail: as you guys know, ExtraTrees perform k many
> splits and then the split that reduces entropy/Gini/whatever most will
> be chosen.
>
> If we set k = 1, then a single random split using a single randomly
> chosen feature is made, and it is used as-is (i.e. no need to find the
> best split, as it is just that random split).
>
> In my view, k = 1 should be the optimal RandomForest-based algorithm. I
> think choosing k > 1, or even choosing a non-random split point (as in
> the original RandomForest algorithm by Leo Breiman), are all non-optimal.
>
> But as I've tested on the Satellite dataset from the UCI repository, it
> turns out that my assumption is wrong. But I don't get why.
>
> Any of you guys know why? I've been pulling my hair out on this one. I
> cannot think of a single reason why ExtraTrees with k = 1 is not optimal.
>
> Best,
> Kevin
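P.S. Here is the rough numerical sketch of the variance decomposition mentioned above, in case it helps to see it concretely. It is only illustrative: I use make_friedman1 as an arbitrary regression problem, and I treat both the learning sample and the tree's random seed as sources of randomness, pairing two trees that share the same learning sample (as trees in a forest do) but use different seeds.

import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.tree import ExtraTreeRegressor

rng = np.random.RandomState(0)
X_test, _ = make_friedman1(n_samples=50, random_state=1)

n_repeats, K, M = 500, 1, 100   # K = max_features, M = forest size
preds1 = np.empty((n_repeats, len(X_test)))
preds2 = np.empty((n_repeats, len(X_test)))
for i in range(n_repeats):
    # Fresh learning sample, shared by two trees with independent random seeds.
    X, y = make_friedman1(n_samples=300, noise=1.0,
                          random_state=rng.randint(10 ** 6))
    t1 = ExtraTreeRegressor(max_features=K, random_state=rng.randint(10 ** 6))
    t2 = ExtraTreeRegressor(max_features=K, random_state=rng.randint(10 ** 6))
    preds1[i] = t1.fit(X, y).predict(X_test)
    preds2[i] = t2.fit(X, y).predict(X_test)

# sigma^2(x): variance of a single random tree's prediction at each test point.
sigma2 = np.vstack([preds1, preds2]).var(axis=0)
# rho(x): correlation between the predictions of two trees of the same forest.
cov = ((preds1 - preds1.mean(axis=0)) * (preds2 - preds2.mean(axis=0))).mean(axis=0)
rho = cov / sigma2
# Per-point variance of an M-tree forest, as given by the decomposition.
var_forest = rho * sigma2 + (1.0 - rho) / M * sigma2
print("mean rho(x) = %.2f, mean sigma^2(x) = %.2f, mean forest variance = %.2f"
      % (rho.mean(), sigma2.mean(), var_forest.mean()))

Re-running this with a larger K should typically show rho(x) increasing, which is the effect described above: less randomization means more correlated trees and less variance reduction from averaging.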