I think the install issues in the PEP are exaggerated, and in my opinion they 
are not a sufficient reason to get something into the standard lib.

Google App Engine includes numpy:
https://developers.google.com/appengine/docs/python/tools/libraries27

I'm on Windows, and numpy and scipy ship as binary installers that install 
without problems.
There are free binary distributions (for Windows and Ubuntu) that include all 
the main scientific packages, including one-click installers on Windows:
http://code.google.com/p/pythonxy/wiki/Welcome
http://code.google.com/p/winpython/

How many Linux distributions don't include numpy? (I have no idea.)

For commercial support, Enthought's and Continuum's distributions include all 
the main packages.

I think having basic descriptive statistics in a basic Python installation is 
still useful. Similarly, almost all of the descriptive statistics were moved 
from scipy.stats to numpy.
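To make the "basic descriptive statistics" point concrete, here is a rough sketch (my own, not from the PEP) of the kind of pure-Python, stdlib-only functions being discussed; the function names are illustrative, not a proposed API:

```python
import math

def mean(data):
    # Arithmetic mean of a non-empty sequence.
    data = list(data)
    return sum(data) / len(data)

def variance(data):
    # Sample variance (n - 1 denominator); needs at least two values.
    data = list(data)
    m = mean(data)
    return sum((x - m) ** 2 for x in data) / (len(data) - 1)

def stdev(data):
    # Sample standard deviation.
    return math.sqrt(variance(data))
```

Nothing here needs a C extension, which is exactly why this slice of statistics is a plausible candidate for the standard lib.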

However, what is the long-term scope of this supposed to be?

I think working with pure Python is interesting for educational purposes
http://www.greenteapress.com/thinkstats/
but I don't think it will get very far for more extensive uses. Soon you will 
need some linear algebra (numpy.linalg and scipy.linalg) and special functions 
(scipy.special).

You can reimplement them, but what's the point of duplicating them in the 
standard lib?

For example:

t test: which versions? One-sample, two-sample, paired and unpaired, with and 
without homogeneous variances, with three alternative hypotheses.

If we have t test, shouldn't we also have ANOVA when we want to compare more 
than two samples?
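Even the simplest of those variants shows where the boundary lies. A one-sample t statistic is easy in pure Python (a minimal sketch of mine, not anyone's proposed API):

```python
import math

def t_one_sample(data, mu0):
    # One-sample t statistic: t = (xbar - mu0) / (s / sqrt(n)).
    # Returns (t, degrees_of_freedom).
    data = list(data)
    n = len(data)
    xbar = sum(data) / n
    s2 = sum((x - xbar) ** 2 for x in data) / (n - 1)  # sample variance
    return (xbar - mu0) / math.sqrt(s2 / n), n - 1
```

But turning that statistic into a p-value requires the CDF of Student's t distribution, i.e. a special function, which is exactly the point where a pure-Python module starts pulling in scipy.special territory.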

...

If the Python implementations that don't use a C backend need a statistics 
package and a partial numpy replacement, then I don't think it needs to be in 
the CPython standard lib.


I think the "nuclear reactor" analogy is misplaced.

A Python implementation of statistics is a bicycle, numpy is a car, and if you 
need some heavier lifting in statistics or machine learning, then the trucks 
are scipy, scikit-learn and statsmodels (and pandas for the data handling), 
and rpy for things that are not directly available in Python.


I'm one of the maintainers for scipy.stats and for statsmodels.

We have a similar problem of deciding on the boundaries and scope of numpy, 
scipy.stats, pandas, patsy, statsmodels and scikit-learn. There is some overlap 
of functionality where the purpose or use cases are different, but in general 
we try to avoid too much duplication.


https://pypi.python.org/pypi/statsmodels
https://pypi.python.org/pypi/pandas
https://pypi.python.org/pypi/patsy  (R like formulas)
https://pypi.python.org/pypi/scikit-learn


Josef
-- 
http://mail.python.org/mailman/listinfo/python-list
