Hi Everybody.
I'm still trying to hack at the trees. This time I stumbled across the
computation of the Gini index.
Could someone please explain this to me?
Hastie, Tibshirani and Friedman say it is computed as
\sum_k p_{mk} (1 - p_{mk})
where k enumerates the classes and m denotes a node (I guess that
means that, in the end, one sums over m).
It is not clear to me how what is done in the code is equivalent to this.
If I understood correctly, this is what the code does:
(\sum_m (n_m^2 - \sum_k n_{mk}^2) / n_m) / \sum_m n_m
where n_{mk} denotes the count of class k in node m,
and n_m is the total count of points in node m.
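
A short worked identity may help compare the two expressions. It assumes p_{mk} = n_{mk} / n_m (the class proportion in node m) and writes N = \sum_m n_m; both are assumptions about the notation, not something taken from the code:

```latex
\sum_k p_{mk}(1 - p_{mk})
  = 1 - \sum_k p_{mk}^2
  = \frac{n_m^2 - \sum_k n_{mk}^2}{n_m^2},
\qquad
\frac{1}{N} \sum_m \frac{n_m^2 - \sum_k n_{mk}^2}{n_m}
  = \sum_m \frac{n_m}{N} \sum_k p_{mk}(1 - p_{mk}),
\quad N = \sum_m n_m .
```

Under that reading, the count-based expression is the per-node value from the book averaged over nodes with weights n_m / N, rather than a plain sum over m.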
If I compute both values for the split left=(3,1), right=(1,2),
I end up with 59/72 for the first formula and 17/42 for the second formula.
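
To make the arithmetic reproducible, here is a minimal sketch that evaluates both expressions with exact fractions. The function names are hypothetical and this is not the scikit-learn code, only a direct transcription of the two formulas as written above. With this reading, the count-based expression evaluates to 17/42 for the example split.

```python
from fractions import Fraction


def node_gini(counts):
    """Per-node value sum_k p_k * (1 - p_k), with p_k = n_k / n."""
    n = sum(counts)
    return sum(Fraction(c, n) * (1 - Fraction(c, n)) for c in counts)


def unweighted_sum(nodes):
    """First reading: plain sum of the per-node values over all nodes m."""
    return sum(node_gini(counts) for counts in nodes)


def count_based(nodes):
    """Second expression: (sum_m (n_m^2 - sum_k n_mk^2) / n_m) / sum_m n_m."""
    total = sum(sum(counts) for counts in nodes)
    acc = sum(
        Fraction(sum(counts) ** 2 - sum(c * c for c in counts), sum(counts))
        for counts in nodes
    )
    return acc / total


def weighted_average(nodes):
    """Per-node values averaged with weights n_m / N (assumed interpretation)."""
    total = sum(sum(counts) for counts in nodes)
    return sum(Fraction(sum(counts), total) * node_gini(counts) for counts in nodes)


split = [(3, 1), (1, 2)]  # left=(3,1), right=(1,2)
print(unweighted_sum(split))    # 59/72
print(count_based(split))       # 17/42
print(weighted_average(split))  # 17/42
```

The last two lines agree, which suggests the count-based expression is a size-weighted average of the per-node values.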
Can someone tell me what I got wrong?
Thanks,
Andy
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general