Hi folks,

I wonder why ExtraTrees with k = 1 (i.e. a single feature evaluated at a
time) isn't the optimal RandomForest-based algorithm.

Now for a bit more detail: as you know, ExtraTrees draws k random
candidate splits, and then the split that most reduces the
entropy/Gini/whatever impurity measure is chosen.

If we set k = 1, then a single random split on a single randomly chosen
feature is made and used as-is (i.e. there is no need to search for the
best split, since that one random split is the only candidate).
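To make the mechanism concrete, here is a minimal, purely illustrative sketch of the split rule described above (plain Python, not scikit-learn's internals; all function names are made up for this sketch): draw k random (feature, threshold) candidates, score each by weighted Gini impurity, and keep the best. With k = 1 the "pick the best" step vanishes and the split is fully random.

```python
import random

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for c in labels:
        counts[c] = counts.get(c, 0) + 1
    return 1.0 - sum((m / n) ** 2 for m in counts.values())

def extra_trees_split(X, y, k, rng):
    """Pick the best of k random (feature, threshold) candidates."""
    n_features = len(X[0])
    best = None
    for _ in range(k):
        f = rng.randrange(n_features)
        values = [row[f] for row in X]
        t = rng.uniform(min(values), max(values))
        left = [y[i] for i, row in enumerate(X) if row[f] < t]
        right = [y[i] for i, row in enumerate(X) if row[f] >= t]
        # Weighted impurity of the resulting split; lower is better.
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if best is None or score < best[0]:
            best = (score, f, t)
    return best

# Toy data: two features, label is 1 when their sum exceeds 1.
rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(100)]
y = [int(x0 + x1 > 1.0) for x0, x1 in X]

for k in (1, 5):
    score, f, t = extra_trees_split(X, y, k, random.Random(1))
    print(f"k={k}: feature {f}, threshold {t:.2f}, weighted Gini {score:.3f}")
```

(In scikit-learn terms, the k here corresponds to the `max_features` parameter of `ExtraTreesClassifier`.)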

In my view, k = 1 should be the optimal RandomForest-based algorithm,
since it maximizes the randomization of the individual trees. I think
choosing k > 1, or even choosing a non-random split point (as in Leo
Breiman's original RandomForest algorithm), is non-optimal.

But when I tested this on the Satellite dataset from the UCI repository,
it turned out that my assumption is wrong, and I don't get why.

Does anyone know why? I've been pulling my hair out over this one; I
cannot think of a single reason why ExtraTrees with k = 1 should not be
optimal.

Best,
Kevin

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
