Hi Kevin,

Interesting question. Your point is true provided you have an infinite amount of training data: in that case, you can indeed show that an infinitely large forest of extremely randomized trees built with K=1 converges towards the optimal model (the Bayes model).
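(In scikit-learn terms, K is the max_features parameter of ExtraTreesClassifier, so the kind of comparison you describe can be sketched along these lines; the synthetic data is only a stand-in for the Satellite set, and the exact figures do not matter:)

from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic stand-in, roughly the shape of the Satellite data (36 features, 6 classes).
X, y = make_classification(n_samples=4000, n_features=36, n_informative=12,
                           n_classes=6, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = X[:3000], X[3000:], y[:3000], y[3000:]

for K in (1, 6, 36):  # K = max_features; K=1 gives fully random splits
    clf = ExtraTreesClassifier(n_estimators=250, max_features=K, random_state=0)
    score = clf.fit(X_train, y_train).score(X_test, y_test)
    print("K = %2d  test accuracy = %.3f" % (K, score))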
This result however does not hold as soon as you have a finite amount of data, as you observe empirically.

If you study the bias-variance decomposition of the error of your model in the finite case, then you can show that the generalization error decomposes into noise(x) + bias^2(x) + var(x). In the case of an ensemble, you can further show that var(x) decomposes into

    rho(x) * sigma^2(x) + (1 - rho(x)) / M * sigma^2(x),

where rho(x) is the correlation coefficient between the predictions of two random trees, sigma^2(x) is the variance of a single random tree and M is the size of the forest. (A rough numerical sketch of this decomposition is appended below, after the quoted message.)

In this setting, we have the following combined effects:

- The larger K, the lower the bias of the ensemble; and vice versa.
- The larger K, the larger rho(x) (because the trees are less random and therefore more correlated), and hence the less variance can be reduced by ensembling; and vice versa. In other words, the larger K, usually the larger the variance.

Your result is therefore explained by the fact that, for your dataset, increasing K has an effect on bias which is larger than its effect on variance, resulting in an overall lower error for a larger value of K than for K = 1.

For a detailed discussion, see http://orbi.ulg.ac.be/handle/2268/170309 (chapter 4 and the references therein; 4.3.2 gives an example).

Hope this helps,
Gilles

On 28 July 2014 05:00, Kevin Leech <kevin.t.le...@gmail.com> wrote:
> Hi folks,
>
> I wonder why ExtraTrees with k = 1 (i.e. a single feature evaluated at a
> time) isn't the optimal RandomForest-based algorithm?
>
> Now into a bit more detail: as you guys know, ExtraTrees perform k many
> splits and then the split that reduces entropy/Gini/whatever most will
> be chosen.
>
> If we set k = 1, then a single random split using a single randomly
> chosen feature is made, and it is used as-is (i.e. no need to find the
> best split, as it is just that random split).
>
> In my view, k = 1 should be the optimal RandomForest-based algorithm. I
> think choosing k > 1, or even choosing a non-random split point (as in
> the original RandomForest algorithm by Leo Breiman), are all non-optimal.
>
> But as I've tested on the Satellite dataset from the UCI repository, it
> turns out that my assumption is wrong. But I don't get why.
>
> Any of you guys know why? I've been pulling my hair out on this one. I
> cannot think of a single reason why ExtraTrees with k = 1 is not optimal.
>
> Best,
> Kevin
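P.S. Here is the rough numerical sketch of the variance decomposition mentioned above, in case it helps to see it concretely. It is only illustrative: I use make_friedman1 as an arbitrary regression problem, and I treat both the learning sample and the tree's random seed as sources of randomness, pairing two trees that share the same learning sample (as trees in a forest do) but use different seeds.

import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.tree import ExtraTreeRegressor

rng = np.random.RandomState(0)
X_test, _ = make_friedman1(n_samples=50, random_state=1)

n_repeats, K, M = 500, 1, 100   # K = max_features, M = forest size
preds1 = np.empty((n_repeats, len(X_test)))
preds2 = np.empty((n_repeats, len(X_test)))
for i in range(n_repeats):
    # Fresh learning sample, shared by two trees with independent random seeds.
    X, y = make_friedman1(n_samples=300, noise=1.0,
                          random_state=rng.randint(10 ** 6))
    t1 = ExtraTreeRegressor(max_features=K, random_state=rng.randint(10 ** 6))
    t2 = ExtraTreeRegressor(max_features=K, random_state=rng.randint(10 ** 6))
    preds1[i] = t1.fit(X, y).predict(X_test)
    preds2[i] = t2.fit(X, y).predict(X_test)

# sigma^2(x): variance of a single random tree's prediction at each test point.
sigma2 = np.vstack([preds1, preds2]).var(axis=0)
# rho(x): correlation between the predictions of two trees of the same forest.
cov = ((preds1 - preds1.mean(axis=0)) * (preds2 - preds2.mean(axis=0))).mean(axis=0)
rho = cov / sigma2
# Per-point variance of an M-tree forest, as given by the decomposition.
var_forest = rho * sigma2 + (1.0 - rho) / M * sigma2
print("mean rho(x) = %.2f, mean sigma^2(x) = %.2f, mean forest variance = %.2f"
      % (rho.mean(), sigma2.mean(), var_forest.mean()))

Re-running this with a larger K should typically show rho(x) increasing, which is the effect described above: less randomization means more correlated trees and less variance reduction from averaging.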