Hi folks, I wonder why ExtraTrees with k = 1 (i.e. a single feature evaluated at a time) isn't the optimal RandomForest-based algorithm?
Now into a bit more detail: as you know, ExtraTrees draws k candidate splits (each on a randomly chosen feature with a random cut point) and then keeps the split that most reduces entropy/Gini/whatever. If we set k = 1, a single random split on a single randomly chosen feature is made and used directly, with no search for the best split. In my view, k = 1 should be the optimal RandomForest-based algorithm; choosing k > 1, or even choosing a non-random split point (as in Leo Breiman's original RandomForest algorithm), both seem suboptimal to me. But when I tested this on the Satellite dataset from the UCI repository, it turned out my assumption was wrong, and I don't understand why. Does anyone know? I've been pulling my hair out on this one; I cannot think of a single reason why ExtraTrees with k = 1 is not optimal.

Best,
Kevin

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
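[Editorial sketch of the experiment described above. In scikit-learn, the k of the ExtraTrees paper corresponds to `max_features`, so k = 1 is `ExtraTreesClassifier(max_features=1)`. The UCI Satellite data is not bundled with scikit-learn, so a synthetic dataset via `make_classification` stands in here; the sample counts and feature counts are illustrative assumptions, not the poster's actual setup.]

```python
# Minimal sketch, assuming k maps to max_features in scikit-learn.
# make_classification stands in for the UCI Satellite dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=36, n_informative=10,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# k = 1: each split uses one randomly chosen feature and one random cut point,
# so no best-split search happens at all.
et = ExtraTreesClassifier(n_estimators=100, max_features=1, random_state=0)

# Breiman-style forest: the best split among a random subset of features.
rf = RandomForestClassifier(n_estimators=100, random_state=0)

for name, clf in [("ExtraTrees k=1", et), ("RandomForest", rf)]:
    score = clf.fit(X_tr, y_tr).score(X_te, y_te)
    print(f"{name}: test accuracy {score:.3f}")
```

On most datasets the k = 1 forest needs deeper trees (and often more of them) to reach comparable accuracy, since each individual split is uninformed.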