Re: [sympy] Statistics Variance

Ronan Lamy Wed, 14 Sep 2011 08:24:02 -0700

On Wednesday, 14 September 2011 at 08:47 -0500, Matthew Rocklin wrote:
> Hi Srinivas, 
> 
> 
> Nice catch. I agree that it would be better if variance was defined in
> terms of n-1 rather than n. This seems like an easy fix to get started
> with SymPy if you'd like to try. There is a wiki page providing tips
> for the Development Workflow if you're not already familiar with git
> and such. 
> 
> 
> If you're interested in improving the statistics functionality in
> SymPy let me know. This has been a project of mine. 
> 
> 
> Best,
> -Matt
> 
> On Tue, Sep 13, 2011 at 10:06 PM, Srinivas <[email protected]> wrote:
>         Hi,
>           I wanted to join as a new developer of sympy, so I am looking
>         through the code to get familiar with it.
>         For /sympy/sympy/statistics/distributions.py, the Sample class
>         defines the variance to be:
>         s.variance = sum([(x-mean)**2 for x in s]) / Integer(len(s))
>         
>         But, this would be the biased estimator. My question is
>         would/should this class use the unbiased estimator (replacing
>         Integer(len(s)) with Integer(len(s)-1))?


The so-called "unbiased estimator" (dividing by n-1) isn't necessarily
meaningful: it is only unbiased when the data are independent draws from
a population whose variance you are trying to estimate. Dividing by n at
least always gives the second central moment of the sample distribution.
It's also the default in numpy (cf.
http://docs.scipy.org/doc/numpy-1.6.0/reference/generated/numpy.std.html#numpy.std
), so I think we should stay consistent with that.
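
To make the difference concrete, here is a minimal sketch in plain
Python. The function names are made up for illustration and are not part
of SymPy or numpy. In numpy the same choice is exposed through the ddof
argument of numpy.var and numpy.std, which defaults to 0 (divide by n).

```python
from fractions import Fraction


def second_central_moment(sample):
    """Divide by n: the second central moment of the sample itself."""
    n = len(sample)
    mean = sum(sample, Fraction(0)) / n
    return sum((x - mean) ** 2 for x in sample) / n


def bessel_corrected_variance(sample):
    """Divide by n-1: the estimator the original question proposes."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two data points to divide by n-1")
    # Rescaling the n-divisor result by n/(n-1) gives the n-1 divisor.
    return second_central_moment(sample) * n / (n - 1)


# Exact rationals, so the two results can be compared without rounding.
data = [Fraction(v) for v in (1, 2, 3, 4)]
biased = second_central_moment(data)        # Fraction(5, 4)
unbiased = bessel_corrected_variance(data)  # Fraction(5, 3)
```

The two only differ by the factor n/(n-1), so the gap matters for small
samples and vanishes as n grows.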

Besides, I don't think that the Sample class is actually usable as it
exists currently (it doesn't work correctly with symbolic or non-real
arguments, computes everything up-front, ...) and I don't even
understand what its purpose is. Fixing that would be much more useful
than changing the divisor.
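
As a rough illustration of two of those points (lazy computation and
non-real data), something along these lines might be a starting point.
LazySample is a hypothetical name, not the existing Sample class, and
this sketch assumes plain Python numbers rather than SymPy expressions,
so it does not address the symbolic case.

```python
from functools import cached_property


class LazySample:
    """Hypothetical stand-in for illustration; not SymPy's Sample class."""

    def __init__(self, values):
        self.values = list(values)
        if not self.values:
            raise ValueError("a sample needs at least one value")

    @cached_property
    def mean(self):
        # Computed on first access only, then cached on the instance.
        return sum(self.values) / len(self.values)

    @cached_property
    def variance(self):
        # abs(...)**2 keeps the result real and non-negative for complex data;
        # (x - mean)**2 would give a complex number instead.
        m = self.mean
        return sum(abs(x - m) ** 2 for x in self.values) / len(self.values)


# Nothing is computed here; mean and variance are evaluated on first access.
s = LazySample([1 + 1j, 1 - 1j])
```

Accessing s.variance triggers the computation once and stores the
result; the divisor is still n, as discussed above.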

