On 01/16/2012 10:12 AM, Andreas wrote:
> On 01/16/2012 10:07 AM, Andreas wrote:
>    
>> On 01/16/2012 09:44 AM, Andreas wrote:
>>
>>      
>>> Hi Everybody.
>>> I'm still trying to hack at the trees. This time I stumbled across the
>>> computation of the Gini index.
>>> Could someone please explain this to me?
>>> Hastie, Tibshirani and Friedman told me this is computed as
>>> \sum_{k} p_{mk}*(1- p_{mk})
>>> where k enumerates the classes and m denotes a node (I guess that
>>> means in the end, one sums over m)
>>>
>>> It is not clear to me how what is done in the code is equivalent to this.
>>> If I understood correctly, this is what the code does:
>>>
>>> (\sum_m (n_m**2 - \sum_k n_{mk}**2) / n_m ) / sum_m n_m
>>>
>>> where n_{mk} denotes the count of class k in node m,
>>> and n_m is the total count of points in node m.
>>>
>>> If I compute both values for the split left=(3,1), right=(1,2),
>>> I end up with 59/72 for the first formula and 19/42 for the second formula.
>>>
>>> Can someone tell me what I got wrong?
>>>
>>>
>>>        
>> I think I found my mistake. The Gini indexes of the nodes
>> are not just summed up but weighted with their counts.
>>
>>      
> Still not working out :-/
>    
Ok never mind, also sorry for spamming the list :-/
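
For reference, here is a minimal sketch (not the scikit-learn tree code) that
evaluates both expressions for the split left=(3,1), right=(1,2) using exact
fractions. The function names are hypothetical and chosen only for this
illustration. It relies on the identity
n_m * \sum_k p_{mk}*(1 - p_{mk}) = (n_m**2 - \sum_k n_{mk}**2) / n_m,
which follows from p_{mk} = n_{mk} / n_m. Under that identity the count-based
expression is the count-weighted average of the per-node Gini indexes, and for
this split it evaluates to 17/42, while the unweighted sum gives 59/72.

```python
from fractions import Fraction


def gini(counts):
    """Gini index of one node: sum_k p_k * (1 - p_k)."""
    n = sum(counts)
    return sum(Fraction(c, n) * (1 - Fraction(c, n)) for c in counts)


def weighted_gini(nodes):
    """Per-node Gini indexes, weighted by each node's share of the samples."""
    total = sum(sum(c) for c in nodes)
    return sum(Fraction(sum(c), total) * gini(c) for c in nodes)


def count_form(nodes):
    """(sum_m (n_m**2 - sum_k n_mk**2) / n_m) / sum_m n_m, from raw counts."""
    total = sum(sum(c) for c in nodes)
    per_node = (
        Fraction(sum(c) ** 2 - sum(x * x for x in c), sum(c)) for c in nodes
    )
    return sum(per_node) / total


# Class counts per node: left=(3,1), right=(1,2)
nodes = [(3, 1), (1, 2)]

unweighted = sum(gini(c) for c in nodes)
print(unweighted)            # 59/72
print(weighted_gini(nodes))  # 17/42
print(count_form(nodes))     # 17/42
```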

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
