On 01/16/2012 10:07 AM, Andreas wrote:
> On 01/16/2012 09:44 AM, Andreas wrote:
>    
>> Hi Everybody.
>> I'm still trying to hack at the trees. This time I stumbled across the
>> computation of the Gini index.
>> Could someone please explain this to me?
>> Hastie, Tibshirani and Friedman told me this is computed as
>> \sum_{k} p_{mk}*(1- p_{mk})
>> where k enumerates the classes and m denotes a node (I guess that
>> means in the end, one sums over m)
>>
>> It is not clear to me how what is done in the code is equivalent to this.
>> If I understood correctly, this is what the code does:
>>
>> (\sum_m (n_m**2 - \sum_k n_{mk}**2) / n_m ) / sum_m n_m
>>
>> where n_{mk} denotes the count of class k in node m,
>> and n_m is the total count of points in node m.
>>
>> If I compute both values for the split left=(3,1), right=(1,2),
>> I end up with 59/72 for the first formula and 19/42 for the second formula.
>>
>> Can someone tell me what I got wrong?
>>
>>      
> I think I found my mistake. The Gini indexes of the nodes
> are not just summed up but weighted with their counts.
>    
Still not working out :-/
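
For reference, here is a small sanity check of the two formulas on the
example split. It is only a sketch: the function names are made up for
illustration and this is not the scikit-learn code, and the count formula
is taken exactly as transcribed above. Since
n_m * \sum_k p_{mk}*(1 - p_{mk}) = (n_m**2 - \sum_k n_{mk}**2) / n_m,
the count formula should equal the count-weighted sum of per-node Gini
values. With exact fractions both come out as 17/42 for this split, so the
19/42 above looks like an arithmetic slip.

```python
from fractions import Fraction


def gini(counts):
    """Gini index of one node: sum_k p_k * (1 - p_k)."""
    n = sum(counts)
    return sum(Fraction(c, n) * (1 - Fraction(c, n)) for c in counts)


def weighted_gini(nodes):
    """Per-node Gini values weighted by each node's share of the samples."""
    total = sum(sum(c) for c in nodes)
    return sum(Fraction(sum(c), total) * gini(c) for c in nodes)


def count_formula(nodes):
    """(sum_m (n_m**2 - sum_k n_mk**2) / n_m) / sum_m n_m, as transcribed."""
    total = sum(sum(c) for c in nodes)
    acc = sum(
        Fraction(sum(c) ** 2 - sum(x * x for x in c), sum(c)) for c in nodes
    )
    return acc / total


# Class counts per node for the split left=(3,1), right=(1,2).
nodes = [(3, 1), (1, 2)]

unweighted = sum(gini(c) for c in nodes)  # plain sum over nodes
weighted = weighted_gini(nodes)           # count-weighted sum over nodes
from_counts = count_formula(nodes)        # formula written with raw counts

print(unweighted, weighted, from_counts)
```

Using Fraction keeps the arithmetic exact, which makes it easy to compare
against values worked out by hand.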

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
