On Wednesday, 14 September 2011 at 08:47 -0500, Matthew Rocklin wrote:
> Hi Srinivas,
>
> Nice catch. I agree that it would be better if variance was defined in
> terms of n-1 rather than n. This seems like an easy fix to get started
> with SymPy if you'd like to try. There is a wiki page providing tips
> for the Development Workflow if you're not already familiar with git
> and such.
>
> If you're interested in improving the statistics functionality in
> SymPy let me know. This has been a project of mine.
>
> Best,
> -Matt
>
> On Tue, Sep 13, 2011 at 10:06 PM, Srinivas <[email protected]> wrote:
> > Hi,
> > I wanted to join as a new developer of sympy, so I am looking
> > through the code to get familiar with it.
> > For /sympy/sympy/statistics/distributions.py, the Sample class
> > defines the variance to be:
> >     s.variance = sum([(x-mean)**2 for x in s]) / Integer(len(s))
> >
> > But, this would be the biased estimator. My question is would/should
> > this class use the unbiased estimator (replacing Integer(len(s)) with
> > Integer(len(s)-1))?

The so-called "unbiased estimator" (dividing by n-1) isn't necessarily meaningful. Dividing by n at least always gives the second central moment of the sample distribution. It's also the default in numpy (cf. http://docs.scipy.org/doc/numpy-1.6.0/reference/generated/numpy.std.html#numpy.std ), so I think we should stay consistent with that.

Besides, I don't think the Sample class is actually usable as it currently exists (it doesn't work correctly with symbolic or non-real arguments, computes everything up front, ...), and I don't even understand what its purpose is. Fixing this would be much more useful.
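
To make the difference between the two divisors concrete, here is a minimal sketch in plain Python. It is not the Sample class code: the helper name `sample_variance` and its `ddof` argument are hypothetical, with `ddof` borrowed from numpy's naming (numpy's default is `ddof=0`, i.e. divide by n). It assumes integer or rational input so the results stay exact.

```python
from fractions import Fraction


def sample_variance(xs, ddof=0):
    """Sum of squared deviations from the mean, divided by (n - ddof).

    ddof=0 divides by n (second central moment of the sample);
    ddof=1 divides by n - 1 (the so-called unbiased estimator).
    Hypothetical helper for illustration only.
    """
    n = len(xs)
    if n - ddof <= 0:
        raise ValueError("need more than ddof data points")
    mean = Fraction(sum(xs), n)  # exact mean, assumes int/rational data
    return sum((x - mean) ** 2 for x in xs) / (n - ddof)


data = [1, 2, 3, 4]
biased = sample_variance(data)            # divide by n
unbiased = sample_variance(data, ddof=1)  # divide by n - 1
print(biased, unbiased)  # 5/4 5/3
```

The two results differ only by the factor n/(n-1), so making the divisor a parameter (as numpy does) would let the default stay at n while still offering the n-1 variant.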
