Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> writes:

> I am soliciting feedback regarding the API of my statistics module:
>
> http://code.google.com/p/pycalcstats/
>
>
> Specifically the following couple of issues:
>
> (1) Multivariate statistics such as covariance have two obvious APIs:
>
>     A  pass the X and Y values as two separate iterable arguments, e.g.:
>        cov([1, 2, 3], [4, 5, 6])
>
>     B  pass the X and Y values as a single iterable of tuples, e.g.:
>        cov([(1, 4), (2, 5), (3, 6)])
>
> I currently support both APIs. Do people prefer one, or the other, or
> both? If there is a clear preference for one over the other, I may drop
> support for the other.

I don't have an informed opinion on this.

> (2) Statistics text books often give formulae in terms of sums and
> differences such as
>
> Sxx = n*Σ(x**2) - (Σx)**2

Interestingly, your Sxx is closely related to the variance: if x is a
list of n numbers then

    Sxx == (n**2)*var(x)

where var is the population variance (dividing by n, not n-1). More
generally, if x and y have the same length n, then Sxy (*) is related
to the population covariance

    Sxy == (n**2)*cov(x, y)

So if you have a variance and a covariance function, it would be
redundant to include Sxx and Sxy. Another argument against including
Sxx & co is that their definition is not universally agreed upon. For
example, I have seen

    Sxx = Σ(x**2) - (Σx)**2/n

HTH

-- 
Arnaud

(*) Here I take Sxy to be n*Σ(xy) - (Σx)(Σy), generalising from your
definition of Sxx.
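
For anyone who wants to check the relations between Sxx, Sxy and the
variance and covariance numerically, here is a minimal sketch. The
function names (sxx, sxy, sxx_alt, pvar, pcov) are made up for this
illustration and are not taken from pycalcstats; it assumes integer
data and uses the population (divide-by-n) definitions, with Fraction
to keep the arithmetic exact.

```python
from fractions import Fraction


def sxx(x):
    """Sxx as in the quoted post: n*Σ(x**2) - (Σx)**2."""
    n = len(x)
    return n * sum(v * v for v in x) - sum(x) ** 2


def sxy(x, y):
    """Sxy generalised from Sxx: n*Σ(xy) - (Σx)(Σy)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    return n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)


def sxx_alt(x):
    """The other definition mentioned: Σ(x**2) - (Σx)**2/n."""
    n = len(x)
    return sum(v * v for v in x) - Fraction(sum(x) ** 2, n)


def pcov(x, y):
    """Population covariance (divides by n), computed exactly."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    mx = Fraction(sum(x), n)
    my = Fraction(sum(y), n)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n


def pvar(x):
    """Population variance is the covariance of x with itself."""
    return pcov(x, x)


x = [1, 2, 3]
y = [4, 5, 6]
n = len(x)

print(sxx(x) == n**2 * pvar(x))     # first definition: n**2 times the variance
print(sxy(x, y) == n**2 * pcov(x, y))
print(sxx_alt(x) == n * pvar(x))    # second definition: only n times the variance
```

With the sample (n-1) variance instead, the factors would be n*(n-1)
and n-1 respectively, so which var() is meant matters here.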