Re: [Python-ideas] Running average and stdev in the statistics module?

2019-05-10 Thread Steven D'Aprano
On Mon, May 06, 2019 at 08:10:44PM +0300, Serge Matveenko wrote:
> On Sun, May 5, 2019 at 1:08 PM Luca Baldini wrote:
> >
> > Hi here,
> > I wonder if the idea of adding to the statistics module a class to
> > calculate the running statistics (average and standard deviation) of a
> > generic input data stream has ever come up in the past.

[...]
> Personally, I would definitely use this in a number of places in the
> real-life code I contribute to.
> 
> The problem that I have with this idea is it's not clear how to store
> the data in an accumulator class. What about cases with different
> contexts in asyncio and/or multithreading code?

Can you give an example of the sort of thing you might want to do?


Thanks,



-- 
Steven
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Running average and stdev in the statistics module?

2019-05-06 Thread Steven D'Aprano
Hi Luca,

I'm the original author of the statistics module, and I'm very 
interested in your idea for calculating running statistics. However, 
the feature freeze for 3.8 is not far away (about three weeks), so I 
think it would have to be deferred until 3.9.

But I encourage you to give some thought (either privately, or 
publicly here in this thread) to the features you want to see.



-- 
Steven


Re: [Python-ideas] Running average and stdev in the statistics module?

2019-05-06 Thread Michael Selik
I've often wanted a windowing function in itertools. One exists as a recipe
in the docs. If I remember correctly, one reason this was never implemented
is that the most efficient implementation changes depending on the size of
the window.

Use a deque(maxlen=n) for large windows and tuple slicing/concat for tiny
windows. I'm not sure how the tee/zip trick compares.
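
A minimal sketch of the deque-based approach mentioned above, for
illustration only. The function names sliding_window and
running_window_mean are assumptions made for this example, not an
existing itertools or statistics API. Note that the deque keyword
argument is maxlen.

```python
from collections import deque
from itertools import islice


def sliding_window(iterable, n):
    """Yield successive overlapping windows of length n as tuples."""
    it = iter(iterable)
    # Pre-fill with the first n - 1 items. Because maxlen is n, every
    # later append silently drops the oldest item once the deque is full.
    window = deque(islice(it, n - 1), maxlen=n)
    for item in it:
        window.append(item)
        yield tuple(window)


def running_window_mean(iterable, n):
    """Yield the mean of each full window of length n (hypothetical helper)."""
    for window in sliding_window(iterable, n):
        yield sum(window) / n
```

Inputs shorter than n yield no windows at all. Summing the whole
window on every step costs O(n) per item, so for large windows one
would probably keep a running sum instead.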

On Mon, May 6, 2019, 10:11 AM Serge Matveenko wrote:

> On Sun, May 5, 2019 at 1:08 PM Luca Baldini wrote:
> >
> > Hi here,
> > I wonder if the idea of adding to the statistics module a class to
> > calculate the running statistics (average and standard deviation) of a
> > generic input data stream has ever come up in the past.
> >
> > The basic idea is to do the necessary book-keeping as the data are fed
> > into the accumulator class and to be able to query the average and
> > variance of the sequence at any point in time without having to loop
> > over the thing again. The obvious way to do that is well known, and described,
> > e.g., in Knuth TAOCP vol 2, 3rd edition, page 232. FWIW It is something
> > that through the years I have coded myself a myriad of times (e.g., for
> > real-time data processing)---and maybe worth considering for addition to
> > the standard library.
>
> Personally, I would definitely use this in a number of places in the
> real-life code I contribute to.
>
> The problem that I have with this idea is it's not clear how to store
> the data in an accumulator class. What about cases with different
> contexts in asyncio and/or multithreading code?
> I would say it could be useful to allow passing a storage
> implementation from the user's code, to address almost any possible
> scenario. In that case, such an accumulator doesn't need to be a
> class at all, or to bother with any intermediate storage. It could be
> a number of module-level functions providing an efficient algorithm
> implementation for users to build on.


Re: [Python-ideas] Running average and stdev in the statistics module?

2019-05-06 Thread Serge Matveenko
On Sun, May 5, 2019 at 1:08 PM Luca Baldini wrote:
>
> Hi here,
> I wonder if the idea of adding to the statistics module a class to
> calculate the running statistics (average and standard deviation) of a
> generic input data stream has ever come up in the past.
>
> The basic idea is to do the necessary book-keeping as the data are fed
> into the accumulator class and to be able to query the average and
> variance of the sequence at any point in time without having to loop
> over the thing again. The obvious way to do that is well known, and described,
> e.g., in Knuth TAOCP vol 2, 3rd edition, page 232. FWIW It is something
> that through the years I have coded myself a myriad of times (e.g., for
> real-time data processing)---and maybe worth considering for addition to
> the standard library.

Personally, I would definitely use this in a number of places in the
real-life code I contribute to.

The problem that I have with this idea is it's not clear how to store
the data in an accumulator class. What about cases with different
contexts in asyncio and/or multithreading code?
I would say it could be useful to allow passing a storage
implementation from the user's code, to address almost any possible
scenario. In that case, such an accumulator doesn't need to be a
class at all, or to bother with any intermediate storage. It could be
a number of module-level functions providing an efficient algorithm
implementation for users to build on.
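
One possible reading of the "module-level functions" idea, as a rough
sketch: the state is an immutable tuple owned by the caller, and a pure
function returns the updated state. The names State, update and
variance are hypothetical and chosen only for this example. The
recurrence used is the one usually called Welford's method.

```python
from fractions import Fraction
from typing import Any, NamedTuple


class State(NamedTuple):
    """Immutable running-statistics state; the caller decides where it lives."""
    n: int = 0
    mean: Any = 0
    m2: Any = 0  # sum of squared deviations from the current mean


def update(state, x):
    """Return a new State that includes x; the input state is not modified."""
    n = state.n + 1
    delta = x - state.mean
    mean = state.mean + delta / n
    m2 = state.m2 + delta * (x - mean)
    return State(n, mean, m2)


def variance(state):
    """Sample variance (n - 1 denominator) of the data seen so far."""
    if state.n < 2:
        raise ValueError("variance requires at least two data points")
    return state.m2 / (state.n - 1)


# The caller owns the state, so nothing is shared implicitly.
state = State()
for x in (Fraction(1, 2), Fraction(3, 2), Fraction(5, 2)):
    state = update(state, x)
print(state.mean, variance(state))  # prints: 3/2 1
```

Because update never mutates its argument, each thread or task can keep
its own State wherever it likes (a local variable, a threading.local,
a contextvars.ContextVar), and the functions themselves hold no shared
data.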


[Python-ideas] Running average and stdev in the statistics module?

2019-05-05 Thread Luca Baldini

Hi here,
I wonder if the idea of adding to the statistics module a class to 
calculate the running statistics (average and standard deviation) of a 
generic input data stream has ever come up in the past.


The basic idea is to do the necessary book-keeping as the data are fed 
into the accumulator class and to be able to query the average and 
variance of the sequence at any point in time without having to loop 
over the thing again. The obvious way to do that is well known, and described, 
e.g., in Knuth TAOCP vol 2, 3rd edition, page 232. FWIW It is something 
that through the years I have coded myself a myriad of times (e.g., for 
real-time data processing)---and maybe worth considering for addition to 
the standard library.
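
For readers unfamiliar with the technique, here is a minimal sketch of
such an accumulator, using the single-pass recurrence usually called
Welford's method (which is, to the best of my knowledge, the one the
Knuth reference presents). The class name RunningStats and its methods
are assumptions made for this example; nothing like it exists in the
statistics module.

```python
import math


class RunningStats:
    """Hypothetical accumulator for running mean and standard deviation."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def push(self, x):
        """Fold one new data point into the running totals."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # Uses both the old mean (via delta) and the updated mean.
        self._m2 += delta * (x - self.mean)

    def variance(self):
        """Sample variance (n - 1 denominator) of the data seen so far."""
        if self.n < 2:
            raise ValueError("variance requires at least two data points")
        return self._m2 / (self.n - 1)

    def stdev(self):
        """Sample standard deviation of the data seen so far."""
        return math.sqrt(self.variance())
```

Each push is O(1) in time and memory, and the results can be queried
at any point without revisiting earlier data.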


For completeness, a cursory look on Google brings up this fairly nice 
package:

https://pypi.org/project/runstats/

but really, the core algorithm would be trivial to code in a fashion 
that works with Decimal and Fraction objects, so that it could be 
integrated into the statistics module. Should this spur enough interest 
(and assuming that the maintainer(s) of the module are not hostile to 
the idea) I'd like to volunteer to put together a tentative 
implementation.


[It's my first post on this list, so please be gentle :-)]

Luca

--
===
Luca Baldini

Universita' di Pisa
and
Istituto Nazionale di Fisica Nucleare - Sezione di Pisa
Largo Bruno Pontecorvo 3, I-56127, Pisa, ITALY.

phone  : +39 050 2214438
fax: +39 050 2214317
e-mail : luca.bald...@pi.infn.it
icq: 396247302 (Garrone)
web: http://www.df.unipi.it/~baldini
mirror : http://www.pi.infn.it/~lbaldini
===
