Hi there,
I wonder whether the idea of adding to the statistics module a class that calculates running statistics (average and standard deviation) of a generic input data stream has ever come up in the past.

The basic idea is to do the necessary bookkeeping as the data are fed into the accumulator class, so that the average and variance of the sequence can be queried at any point in time without looping over the data again. The obvious way to do that is well known and described, e.g., in Knuth, TAOCP vol. 2, 3rd edition, page 232. FWIW, it is something that I have coded myself countless times over the years (e.g., for real-time data processing), and it may be worth considering for addition to the standard library.
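To make the idea concrete, here is a minimal sketch of the kind of accumulator I have in mind, using the single-pass update usually attributed to Welford. The class name `RunningStats` and its methods are purely illustrative assumptions, not a proposed API.

```python
import math


class RunningStats:
    """Hypothetical accumulator; names are illustrative, not a proposed API."""

    def __init__(self):
        self.n = 0        # number of values seen so far
        self.mean = 0.0   # running mean
        self._m2 = 0.0    # running sum of squared deviations from the mean

    def push(self, x):
        """Fold one more value into the running statistics."""
        self.n += 1
        delta = x - self.mean               # deviation from the old mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)  # uses the updated mean

    def variance(self):
        """Sample variance (n - 1 in the denominator)."""
        if self.n < 2:
            raise ValueError("variance requires at least two data points")
        return self._m2 / (self.n - 1)

    def stdev(self):
        """Sample standard deviation."""
        return math.sqrt(self.variance())


stats = RunningStats()
for value in (2, 4, 4, 4, 5, 5, 7, 9):
    stats.push(value)
    # stats.mean (and, from the second value on, stats.variance())
    # can be queried here without revisiting earlier values.
```

Only three numbers are kept between updates, so memory use does not grow with the length of the stream.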

For completeness, a cursory search on Google brings up this fairly nice package
https://pypi.org/project/runstats/
but really, the core algorithm would be trivial to code in a fashion that works with Decimal and Fraction objects, so that it could be integrated into the statistics module. Should this spur enough interest (and assuming that the maintainer(s) of the module are not hostile to the idea), I'd like to volunteer to put together a tentative implementation.
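As a rough illustration of the Decimal/Fraction point, the sketch below seeds the accumulators with integer zeros so that the input type determines the type of the results. The function name `running_mean_variance` is hypothetical, and a real implementation would need more care (mixed types, empty input, population vs. sample variance).

```python
from decimal import Decimal
from fractions import Fraction


def running_mean_variance(data):
    """Yield (n, mean, sample variance or None) after each input value.

    Hypothetical helper, for illustration only.
    """
    n = 0
    mean = 0   # integer zeros: the first Fraction/Decimal/float sets the type
    m2 = 0     # running sum of squared deviations from the mean
    for x in data:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        yield n, mean, (m2 / (n - 1) if n > 1 else None)


# Fractions stay exact all the way through.
for n, mean, var in running_mean_variance(Fraction(k, 3) for k in (1, 2, 4)):
    pass
print(mean, var)

# Decimals stay Decimals (subject to the active decimal context).
for n, mean, var in running_mean_variance(
        Decimal(s) for s in ("1.5", "2.5", "3.5")):
    pass
print(mean, var)
```

Nothing in the update step is specific to floats; it only needs subtraction, multiplication, and division by an integer.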

[It's my first post on this list, so please be gentle :-)]

Luca

--
===============================================================================
Luca Baldini

Universita' di Pisa
and
Istituto Nazionale di Fisica Nucleare - Sezione di Pisa
Largo Bruno Pontecorvo 3, I-56127, Pisa, ITALY.

phone  : +39 050 2214438
fax    : +39 050 2214317
e-mail : luca.bald...@pi.infn.it
icq    : 396247302 (Garrone)
web    : http://www.df.unipi.it/~baldini
mirror : http://www.pi.infn.it/~lbaldini
===============================================================================

