On Fri, 16 Aug 2013 09:31:34 -0700, chris.barker wrote:

>> > I am seeking comments on PEP 450, Adding a statistics module to
>> > Python's
>
> The trick here is that numpy really is the "right" way to do this stuff.

Numpy does not have a monopoly on the correct algorithms for statistics
functions, and a big, heavyweight library like numpy is overkill for
many lightweight statistics tasks. You shouldn't need to turn on a
nuclear reactor just to put the light on in your fridge.

> I like to say:
> "crunching numbers in python without numpy is like doing text processing
> without using the string object"

Your analogy is backwards. String objects actually aren't optimal for
heavy duty text processing, because they're immutable. If you're serious
about crunching vast amounts of numbers, you'll use numpy. If you're
serious about crunching vast amounts of text, say for a text editor or
word processor, you *won't* use strings, you'll use some sort of mutable
buffer, or ropes, or some other data type. You're very unlikely to use
strings.

> What this is really an argument for is a numpy-lite in the standard
> library, which could be used to build these sorts of things on. But
> that's been rejected before...

"Numpy-lite". Which parts of numpy? Who maintains it? The numpy release
schedule is nothing like the standard library's release schedule, so
which one has to change? Or does somebody fork numpy, giving two
independent code bases?

What about Jython, IronPython, and other Python implementations? Even
PyPy doesn't support numpy yet, and Jython and IronPython probably never
will, since they're not C-based.

> A few other comments:
>
> 1) the numpy folks have been VERY good at providing binaries for Windows
> and OS-X -- easy point and click installing.
>
> 2) I hope we're almost there with standardizing pip and binary wheels,
> at which point pip install will be painless.

Yeah, right, sure it will be. I've been waiting a decade for package
management on Linux to become painless, and it still isn't. There's no
reason to expect pip will be more painless than aptitude or yum.

But even if it is, installation of software is not just a software
problem to be solved by better technology.
There is also the social problem that not everyone is permitted to
arbitrarily install software. I'm not just talking about security
policies on the machine, but security policies in real life. People can
be sacked for installing software they don't have permission to install.

Machines may be locked down, users may have to submit a request before
software will be installed. That may involve a security audit, legal
review of licencing, strategy for full roll-back, potentially even a
complete code audit. (Imagine auditing all of numpy.) Or policy may
simply say, *no software from unapproved vendors* full stop.

Not everyone is privileged to be permitted to install whatever software
they like, when they like. Here are two organisations that make software
installation requests *easy*:

http://www.uhd.edu/computing/acl/SoftwareInstallationRequest.html

http://www.calstatela.edu/its/services/software/instructsoftwarerequest.php/form2.php

Pip install isn't going to fix that. There are many, many people in a
situation where the Python std lib is approved, usually because it comes
from a vendor with a support contract (say, RedHat, Ubuntu, or Suse),
but getting third-party packages like numpy approved is next to
impossible. "Just install numpy" is a solution for a privileged few.

> even before (2) -- pip install works fine anywhere the system is set up
> to build python extensions (granted, not a given on Windows and Mac, but
> pretty likely on Linux)

Oh, well that's okay then -- that's three, maybe four percent of the
computing world taken care of! Problem solved!

Not.

> -- the idea that running pip install wrote out a
> lot of text (but worked!) is somehow a barrier to entry is absurd --
> anyone building their own stuff on Linux is used to that.

Do you realise that not all Python programmers are used to, or able to,
"build their own stuff on Linux"?

[...]

> All that being said -- if you do decide to do this, please use a PEP
> 3118 (enhanced buffer) supporting data type (probably array.array) --
> compatibility with numpy and other packages for crunching numbers is
> very nice.

py> import array
py> data = array.array('f', range(1000))
py> import statistics
py> statistics.mean(data)
499.5
py> statistics.stdev(data)
288.8194360957494

If the data type supports the sequence protocol, it should work with my
module. If it fails to work, submit a bug report, and I will fix it.

> If someone decides to build a stand-alone stats package -- building it
> on a ndarray-lite (PEP 3118 compatible) object would be a nice way to
> go.
>
>
> One other point -- for performance reasons, it would be nice to have some
> compiled code in there -- this adds incentive to put it in the stdlib --
> external packages that need compiling is what makes numpy unacceptable
> to some folks.

Like the decimal module, it will probably remain pure-Python for a few
releases, but I hope that in the future the statistics module will gain
a C-accelerated version. (Or Java-accelerated for Jython, etc.) I expect
that PyPy won't need one. But because it's not really aimed at
number-crunching megabytes of data, speed is not the priority.


--
Steven
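
To make the sequence-protocol point above concrete, here is a minimal
sketch, assuming the statistics module as it later shipped in the
Python 3.4 standard library. The Readings class and the sample values
are hypothetical, made up purely for illustration; nothing like them is
part of the module or the PEP. The sketch feeds the same numbers to
mean() as a list, as an array.array, and as a hand-rolled sequence.

```python
import array
import statistics
from collections.abc import Sequence


class Readings(Sequence):
    """Hypothetical read-only container; only implements the sequence protocol."""

    def __init__(self, values):
        self._values = list(values)

    def __getitem__(self, index):
        return self._values[index]

    def __len__(self):
        return len(self._values)


# Made-up sample values, chosen to be exactly representable as floats.
samples = [2.5, 3.25, 5.5, 11.25, 11.75]

as_list = statistics.mean(samples)
as_array = statistics.mean(array.array('d', samples))
as_custom = statistics.mean(Readings(samples))

print(as_list, as_array, as_custom)
```

All three calls should agree, since mean() only needs to iterate over
the data and know how many items there are, not what container they
live in.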