[issue18606] Add statistics module to standard library

2013-10-19 Thread Larry Hastings

Larry Hastings added the comment:

Mr. D'Aprano emailed me about getting this in for alpha 4.  Since nobody else 
stepped up, I volunteered to check it in for him.  There were some minor ReST 
errors in statistics.rst but I fixed 'em.

--
nosy: +larry

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18606
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18606] Add statistics module to standard library

2013-10-19 Thread Guido van Rossum

Guido van Rossum added the comment:

Thanks Larry!

On Sat, Oct 19, 2013 at 11:32 AM, Larry Hastings rep...@bugs.python.org wrote:

> Larry Hastings added the comment:
>
> Mr. D'Aprano emailed me about getting this in for alpha 4.  Since nobody
> else stepped up, I volunteered to check it in for him.  There were some
> minor ReST errors in statistics.rst but I fixed 'em.


--




[issue18606] Add statistics module to standard library

2013-10-19 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 685e044bed5e by Larry Hastings in branch 'default':
Issue #18606: Add the new statistics module (PEP 450).  Contributed by Steven D'Aprano.
http://hg.python.org/cpython/rev/685e044bed5e

--
nosy: +python-dev




[issue18606] Add statistics module to standard library

2013-10-19 Thread Larry Hastings

Larry Hastings added the comment:

Checked in.  Thanks, Mr. D'Aprano!
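
For anyone landing on this issue later, here is a minimal sketch of the API the committed module exposes (function names per PEP 450; the data set is only illustrative):

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

# Central tendency
m = statistics.mean(data)      # arithmetic mean
med = statistics.median(data)  # middle value (average of two middles here)

# Spread (population versions; sample versions are variance()/stdev())
pv = statistics.pvariance(data)  # population variance
ps = statistics.pstdev(data)     # population standard deviation

print(m, med, pv, ps)
```

The sample counterparts `statistics.variance()` and `statistics.stdev()` use the n-1 denominator instead.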

--
resolution:  -> fixed
stage:  -> committed/rejected
status: open -> closed




[issue18606] Add statistics module to standard library

2013-10-19 Thread Georg Brandl

Georg Brandl added the comment:

> I'm sorry if I stepped on your toes, but I didn't ignore your patch. If I've
> failed to follow the right procedure, it is due to inexperience, not malice.
> You yourself suggested it was only a temporary version just good enough to
> get the module committed, and in any case, it was already out of date since
> I've made sum private.

No problem, just hate to see work being done twice. :)

--




[issue18606] Add statistics module to standard library

2013-10-18 Thread Steven D'Aprano

Steven D'Aprano added the comment:

Here is the updated version which I hope is not too late for alpha 4. Main 
changes:

* sum is now private

* docstrings have been simplified and shrunk somewhat

* I have a draft .rst file, however I'm having trouble getting Sphinx working 
on my system and I have no idea whether the reST is working.

--
Added file: http://bugs.python.org/file32207/statistics.patch




[issue18606] Add statistics module to standard library

2013-10-18 Thread Georg Brandl

Georg Brandl added the comment:

The rst file is missing from your patch.

I already posted a patch with statistics.rst five days ago.  I have no idea why 
you ignored it.

--




[issue18606] Add statistics module to standard library

2013-10-18 Thread Steven D'Aprano

Steven D'Aprano added the comment:

Georg Brandl wrote:
> The rst file is missing from your patch.

Oops! Sorry about that. Fixed now.

> I already posted a patch with statistics.rst five days ago.
> I have no idea why you ignored it.

I'm sorry if I stepped on your toes, but I didn't ignore your patch. If I've 
failed to follow the right procedure, it is due to inexperience, not malice. 
You yourself suggested it was only a temporary version just good enough to get 
the module committed, and in any case, it was already out of date since I've 
made sum private.

--
Added file: http://bugs.python.org/file32208/statistics.patch




[issue18606] Add statistics module to standard library

2013-10-14 Thread Nick Coghlan

Nick Coghlan added the comment:

+0 for starting with _sum as private and +1 for getting this initial
version checked in for alpha 4.

--




[issue18606] Add statistics module to standard library

2013-10-13 Thread Nick Coghlan

Nick Coghlan added the comment:

Are the ReST docs the only missing piece here? It would be nice to have this 
included in alpha 4 next weekend (although the real deadline is beta 1 on 
November 24).

--
nosy: +ncoghlan




[issue18606] Add statistics module to standard library

2013-10-13 Thread Georg Brandl

Georg Brandl added the comment:

In the attached patch I took the docstrings, put them in statistics.rst and 
reformatted/marked-up them according to our guidelines.  This should at least 
be good enough to make this committable.

I also modified statistics.py very slightly; I removed trailing spaces and 
added "Function/class" in the third table in the module docstring.

--
nosy: +georg.brandl
Added file: http://bugs.python.org/file32083/statistics_combined_withdocs.patch




[issue18606] Add statistics module to standard library

2013-10-13 Thread Steven D'Aprano

Steven D'Aprano added the comment:

On Sun, Oct 13, 2013 at 09:21:13AM +, Nick Coghlan wrote:
>
> Nick Coghlan added the comment:
>
> Are the ReST docs the only missing piece here?

As far as I know, the only blocker is that the ReST docs are missing. 
Also Guido would like to see the docstrings be a little smaller (or 
perhaps even a lot smaller), and that will happen at the same time.

The implementation of statistics.sum needs to be a bit faster, but 
that's also coming. I presume that won't be a blocker.

--




[issue18606] Add statistics module to standard library

2013-10-13 Thread Steven D'Aprano

Steven D'Aprano added the comment:

Oscar Benjamin has just made a proposal to me off-list that has *almost* 
convinced me to make statistics.sum a private implementation detail, at 
least for the 3.4 release. I won't go into detail about Oscar's 
proposal, but it has caused me to rethink all the arguments for making 
sum public.

Given that the PEP concludes that sum ought to be public, is it 
appropriate to defer that part of it until 3.5 without updating the PEP? 
I'd like to shift sum -> _sum for 3.4, then if Oscar's ideas don't pan 
out, in 3.5 make sum public.

(None of this will affect the public interface for mean, variance, etc.)

--




[issue18606] Add statistics module to standard library

2013-10-13 Thread Tim Peters

Tim Peters added the comment:

Do what's best for the future of the module.  A PEP is more of a starting point 
than a constraint, especially for implementation details.  And making a private 
thing public later is one ginormous whale of a lot easier than trying to remove 
a public thing later.  Practicality beats purity once again ;-)

--
nosy: +tim.peters




[issue18606] Add statistics module to standard library

2013-10-13 Thread Raymond Hettinger

Raymond Hettinger added the comment:

I think this should get checked in so that people can start interacting with 
it.  The docstrings and whatnot can get tweaked later.

--
nosy: +rhettinger




[issue18606] Add statistics module to standard library

2013-09-09 Thread Stefan Krah

Changes by Stefan Krah stefan-use...@bytereef.org:


--
nosy: +skrah




[issue18606] Add statistics module to standard library

2013-09-08 Thread Guido van Rossum

Guido van Rossum added the comment:

Here's a combined patch. Hopefully it will code review properly.

--
nosy: +gvanrossum
Added file: http://bugs.python.org/file31680/statistics_combined.patch




[issue18606] Add statistics module to standard library

2013-09-08 Thread Guido van Rossum

Guido van Rossum added the comment:

Nice docstrings, but those aren't automatically included in the Doc tree.

--




[issue18606] Add statistics module to standard library

2013-08-27 Thread janzert

janzert added the comment:

Seems that the discussion is now down to implementation issues and the PEP is 
at the point of needing to ask python-dev for a PEP dictator?

--
nosy: +janzert




[issue18606] Add statistics module to standard library

2013-08-27 Thread Oscar Benjamin

Oscar Benjamin added the comment:

On Aug 28, 2013 1:43 AM, janzert rep...@bugs.python.org wrote:

> Seems that the discussion is now down to implementation issues and the
> PEP is at the point of needing to ask python-dev for a PEP dictator?

I would say so. AFAICT Steven has addressed all of the issues that have
been raised. I've read through the module in full and I'm happy with the
API/specification exactly as it now is (including the sum function since
the last patch).

--




[issue18606] Add statistics module to standard library

2013-08-26 Thread Steven D'Aprano

Steven D'Aprano added the comment:

I have changed the algorithm for statistics.sum to use long integer summation 
of numerator/denominator pairs.

This removes the concerns Mark raised about the float addition requiring 
correct rounding. Unless I've missed something, this now means that 
statistics.sum is now exact, including for floats and Decimals.

The cost is that stats.sum(ints) is a little slower, sum of Decimals is a lot 
slower (ouch!) but sum of floats is faster and of Fractions a lot faster. 
(Changes are relative to my original implementation.) In my testing, 
algorithmic complexity is O(N) on the number of items, at least up to 10 
million items.
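
The numerator/denominator idea can be illustrated with fractions.Fraction (a sketch only; the actual patch presumably accumulates the long-integer pairs directly rather than building Fraction objects, and `exact_sum` is a hypothetical name):

```python
from fractions import Fraction

def exact_sum(data):
    """Illustrative exact summation: ints, floats, Fractions and
    Decimals all convert to Fraction without loss, and Fraction
    addition is exact, so the running total carries no rounding
    error.  The caller converts back to the source type at the end,
    paying for a single rounding step only."""
    total = Fraction(0)
    for x in data:
        total += Fraction(x)  # exact conversion, exact addition
    return total
```

For example, the builtin sum([0.1] * 10) returns 0.9999999999999999 because every intermediate addition rounds, while float(exact_sum([0.1] * 10)) rounds the exact total once and returns 1.0.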

--
Added file: http://bugs.python.org/file31473/statistics_newsum.patch




[issue18606] Add statistics module to standard library

2013-08-22 Thread Oscar Benjamin

Oscar Benjamin added the comment:

On 22 August 2013 03:43, Steven D'Aprano rep...@bugs.python.org wrote:

> If Oscar is willing, I'd like to discuss some of his ideas off-list, but that
> may take some time.

I am willing and it will take time.

I've started reading the paper that Raymond Hettinger references for
the algorithm used in his accurate float sum recipe. I'm not sure why
yet but the algorithm is apparently provably exact only for binary
radix floats so isn't appropriate for decimals. It does seem to give
*very* accurate results for decimals though so I suspect the issue is
just about cases that are on the cusp of the rounding mode. In any
case the paper cites a previous work that gives an algorithm that
apparently works for floating point types with arbitrary radix and
exact rounding; it would be good for that to live somewhere in Python
but I haven't had a chance to look at the paper yet.
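
The recipe in question keeps a list of sorted, non-overlapping partial sums. A condensed sketch of that algorithm (the same idea behind math.fsum; as noted above, its exactness is only proven for binary-radix floats):

```python
def msum(iterable):
    """Shewchuk-style summation: maintain a list of non-overlapping
    partial sums so that no information is lost until the final
    reduction.  Correctness argument applies to binary floats only."""
    partials = []  # sorted, non-overlapping partial sums
    for x in iterable:
        i = 0
        for y in partials:
            if abs(x) < abs(y):
                x, y = y, x
            hi = x + y            # rounded sum
            lo = y - (hi - x)     # exact rounding error of that sum
            if lo:
                partials[i] = lo
                i += 1
            x = hi
        partials[i:] = [x]
    return sum(partials, 0.0)
```

With this, msum([1e100, 1.0, -1e100, 1.0]) recovers 2.0 where the builtin sum loses both 1.0 terms to rounding.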

--




[issue18606] Add statistics module to standard library

2013-08-21 Thread Steven D'Aprano

Steven D'Aprano added the comment:

On 20/08/13 22:43, Mark Dickinson wrote:

> I agree with Oscar about sum for decimal.Decimal.  The *ideal* sum for
> Decimal instances would return the correctly rounded result (i.e., the exact
> result, rounded to the current context just once using the current rounding
> mode).  It seems wrong to give a guarantee of behaviour that's in conflict
> with this ideal.

Okay, I know when I'm beaten :-)

Documentation will no longer make reference to honouring the context, as 
currently stated, and the specific example shown will be dropped. Patch to follow. 
Changing the actual implementation of sum will follow later. If Oscar is 
willing, I'd like to discuss some of his ideas off-list, but that may take some 
time.

What else is needed before I can ask for a decision on the PEP?

--




[issue18606] Add statistics module to standard library

2013-08-20 Thread Mark Dickinson

Mark Dickinson added the comment:

I agree with Oscar about sum for decimal.Decimal.  The *ideal* sum for Decimal 
instances would return the correctly rounded result (i.e., the exact result, 
rounded to the current context just once using the current rounding mode).  It 
seems wrong to give a guarantee of behaviour that's in conflict with this ideal.

IEEE 754 recommends a 'sum' operation (in section 9.4, amongst other reduction 
operations), but doesn't go so far as to either require or recommend that the 
result be correctly rounded.  Instead, it says "Implementations may associate 
in any order or evaluate in any wider format", and then later on, "Numerical 
results [...] may differ among implementations due to the precision of 
intermediates and the order of evaluation."  It certainly *doesn't* specify 
that the results should be as though the context precision and rounding mode 
were used for every individual addition.

--




[issue18606] Add statistics module to standard library

2013-08-19 Thread Oscar Benjamin

Oscar Benjamin added the comment:

I've just checked over the new patch and it all looks good to me apart
from one quibble.

It is documented that statistics.sum() will respect rounding errors
due to decimal context (returning the same result that sum() would). I
would prefer it if statistics.sum would use compensated summation with
Decimals since in my view they are a floating point number
representation and are subject to arithmetic rounding error in the
same way as floats. I expect that the implementation of sum() will
change but it would be good to at least avoid documenting this IMO
undesirable behaviour.

So with the current implementation I can do:

>>> from decimal import Decimal as D, localcontext, Context, ROUND_DOWN
>>> data = [D(0.1375), D(0.2108), D(0.3061), D(0.0419)]
>>> print(statistics.variance(data))
0.0125290958333
>>> with localcontext() as ctx:
...     ctx.prec = 2
...     ctx.rounding = ROUND_DOWN
...     print(statistics.variance(data))
...
0.010

The final result is not accurate to 2 d.p. rounded down. This is
because the decimal context has affected all intermediate computations
not just the final result. Why would anyone prefer this behaviour over
an implementation that could compensate for rounding errors and return
a more accurate result?

If statistics.sum and statistics.add_partial are modified in such a
way that they use the same compensated algorithm for Decimals as they
would for floats then you can have the following:

>>> statistics.sum([D('-1e50'), D('1'), D('1e50')])
Decimal('1')

whereas it currently does:

>>> statistics.sum([D('-1e50'), D('1'), D('1e50')])
Decimal('0E+23')
>>> statistics.sum([D('-1e50'), D('1'), D('1e50')]) == 0
True

It still doesn't fix the variance calculation but I'm not sure exactly
how to do better than the current implementation for that. Either way
though I don't think the current behaviour should be a documented
guarantee. The meaning of "honouring the context" implies using a
specific sum algorithm, since an alternative algorithm would give a
different result, and I don't think you should constrain yourself in
that way.
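
For what it's worth, the behaviour described here, an exactly computed total rounded just once into the current context, can be sketched as follows. This is an illustration, not code from any posted patch; `corrected_sum` is a hypothetical name, and it leans on the fact that Decimal division is a correctly rounded operation in the active context:

```python
from decimal import Decimal
from fractions import Fraction

def corrected_sum(data):
    """Sum Decimal values exactly as rationals, then round once
    into the current decimal context via a single division."""
    total = sum(map(Fraction, data), Fraction(0))  # exact accumulation
    # Decimal(int) construction is exact; the division below is the
    # one and only context-aware rounding step.
    return Decimal(total.numerator) / Decimal(total.denominator)
```

Under the default context, corrected_sum([Decimal('-1e50'), Decimal('1'), Decimal('1e50')]) returns Decimal('1') rather than 0.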

--




[issue18606] Add statistics module to standard library

2013-08-19 Thread Steven D'Aprano

Steven D'Aprano added the comment:

On 19/08/13 23:15, Oscar Benjamin wrote:

> So with the current implementation I can do:
>
> >>> from decimal import Decimal as D, localcontext, Context, ROUND_DOWN
> >>> data = [D(0.1375), D(0.2108), D(0.3061), D(0.0419)]
> >>> print(statistics.variance(data))
> 0.0125290958333
> >>> with localcontext() as ctx:
> ...     ctx.prec = 2
> ...     ctx.rounding = ROUND_DOWN
> ...     print(statistics.variance(data))
> ...
> 0.010
>
> The final result is not accurate to 2 d.p. rounded down. This is
> because the decimal context has affected all intermediate computations
> not just the final result.

Yes. But that's the whole point of setting the context to always round down. If 
summation didn't always round down, it would be a bug.

If you set the precision to a higher value, you can avoid the need for 
compensated summation. I'm not prepared to pick and choose which contexts I'll 
honour. If I honour those with a high precision, I'll honour those with a low 
precision too. I'm not going to check the context, and if it is too low 
(according to whom?) set it higher.

> Why would anyone prefer this behaviour over
> an implementation that could compensate for rounding errors and return
> a more accurate result?

Because that's what the Decimal standard requires (as I understand it), and 
besides you might be trying to match calculations on some machine with a lower 
precision, or different rounding modes. Say, a pocket calculator, or a Cray, or 
something. Or demonstrating why rounding matters.

Perhaps it will cause less confusion if I add an example to show a use for 
higher precision as well.

> If statistics.sum and statistics.add_partial are modified in such a
> way that they use the same compensated algorithm for Decimals as they
> would for floats then you can have the following:
>
> >>> statistics.sum([D('-1e50'), D('1'), D('1e50')])
> Decimal('1')

statistics.sum can already do that:

py> with localcontext() as ctx:
...     ctx.prec = 50
...     x = statistics.sum([D('-1e50'), D('1'), D('1e50')])
...
py> x
Decimal('1')

I think the current behaviour is the right thing to do, but I appreciate the 
points you raise. I'd love to hear from someone who understands the Decimal 
module better than I do and can confirm that the current behaviour is in the 
spirit of the Decimal module.

--




[issue18606] Add statistics module to standard library

2013-08-19 Thread Oscar Benjamin

Oscar Benjamin added the comment:

On 19 August 2013 17:35, Steven D'Aprano rep...@bugs.python.org wrote:
>
> Steven D'Aprano added the comment:
>
> On 19/08/13 23:15, Oscar Benjamin wrote:
>
>> The final result is not accurate to 2 d.p. rounded down. This is
>> because the decimal context has affected all intermediate computations
>> not just the final result.
>
> Yes. But that's the whole point of setting the context to always round down.
> If summation didn't always round down, it would be a bug.

If individual binary summation (d1 + d2) didn't round down then that
would be a bug.

> If you set the precision to a higher value, you can avoid the need for
> compensated summation. I'm not prepared to pick and choose which contexts
> I'll honour. If I honour those with a high precision, I'll honour those with
> a low precision too. I'm not going to check the context, and if it is too
> low (according to whom?) set it higher.

I often write functions like this:

def compute_stuff(x):
    with localcontext() as ctx:
        ctx.prec += 2
        y = ...  # Compute in higher precision
    return +y    # __pos__ reverts to the default precision

The final result is rounded according to the default context but the
intermediate computation is performed in such a way that the final
result is (hopefully) correct within its context. I'm not proposing
that you do that, just that you don't commit to respecting inaccurate
results.

>> Why would anyone prefer this behaviour over
>> an implementation that could compensate for rounding errors and return
>> a more accurate result?
>
> Because that's what the Decimal standard requires (as I understand it), and
> besides you might be trying to match calculations on some machine with a
> lower precision, or different rounding modes. Say, a pocket calculator, or a
> Cray, or something. Or demonstrating why rounding matters.

No, that's not what the Decimal standard requires. Okay, I haven't fully
read it but I am familiar with these standards and I've read a good
bit of IEEE-754. The standard places constraints on low-level
arithmetic operations that you as an implementer of high-level
algorithms can use to ensure that your code is accurate.

Following your reasoning above I should say that math.fsum and your
statistics.sum are both in violation of IEEE-754 since
fsum([a, b, c, d, e])
is not equivalent to
((((a+b)+c)+d)+e)
under the current rounding scheme. They are not in violation of the
standard: both functions use the guarantees of the standard to
guarantee their own accuracy. Both go to some lengths to avoid
producing output with the rounding errors that sum() would produce.

> I think the current behaviour is the right thing to do, but I appreciate the
> points you raise. I'd love to hear from someone who understands the Decimal
> module better than I do and can confirm that the current behaviour is in the
> spirit of the Decimal module.

I use the Decimal module for multi-precision real arithmetic. That may
not be the typical use-case but to me Decimal is a floating point type
just like float. Precisely the same reasoning that leads to fsum
applies to Decimal just as it does to float.

(BTW I've posted on Raymond Hettinger's recipe a modification that
might make it work for Decimal but no reply yet.)

--




[issue18606] Add statistics module to standard library

2013-08-18 Thread Steven D'Aprano

Steven D'Aprano added the comment:

Since I can't respond to the reviews, here's a revised patch. Summary of major 
changes:

- median.* functions are now median_*
- mode now only returns a single value
- better integrate tests with Python regression suite
- cleanup tests as per Ezio's suggestions
- remove unnecessary metadata and change licence

--
Added file: http://bugs.python.org/file31354/statistics.patch




[issue18606] Add statistics module to standard library

2013-08-18 Thread Steven D'Aprano

Steven D'Aprano added the comment:

On 15/08/13 22:58, ezio.melo...@gmail.com wrote:
> http://bugs.python.org/review/18606/diff/8927/Lib/statistics.py#newcode277
> Lib/statistics.py:277: assert isinstance(x, float) and
> isinstance(partials, list)
> Is this a good idea?

I think so; add_partials is internal/private, and so I don't have to worry about 
the caller providing wrong arguments, say a non-float. But I want some testing 
to detect coding errors. Using assert for this sort of internal pre-condition 
is exactly what assert is designed for.


> http://bugs.python.org/review/18606/diff/8927/Lib/test/test_statistics.py#newcode144
> Lib/test/test_statistics.py:144: assert data != sorted(data)
> Why not assertNotEqual?

I use bare asserts for testing code logic, even if the code is test code. So if 
I use self.assertSpam(...) then I'm performing a unit test of the module being 
tested. If I use a bare assert, I'm asserting something about the test logic 
itself.


> http://bugs.python.org/review/18606/diff/8927/Lib/test/test_statistics_approx.py
> File Lib/test/test_statistics_approx.py (right):
>
> http://bugs.python.org/review/18606/diff/8927/Lib/test/test_statistics_approx.py#newcode1
> Lib/test/test_statistics_approx.py:1: Numeric approximated equal
> comparisons and unit testing.
> Do I understand correctly that this is just an helper module used in
> test_statistics and that it doesn't actually test anything from the
> statistics module?

Correct.


> http://bugs.python.org/review/18606/diff/8927/Lib/test/test_statistics_approx.py#newcode137
> Lib/test/test_statistics_approx.py:137: # and avoid using
> TestCase.almost_equal, because it sucks
> Could you elaborate on this?

Ah, I misspelled TestCase.AlmostEqual.

- Using round() to test for equal-to-some-tolerance is IMO quite an 
idiosyncratic way of doing approx-equality tests. I've never seen anyone do it 
that way before. It surprises me.

- It's easy to think that ``places`` means significant figures, not decimal 
places.

- There's now a delta argument that is the same as my absolute error tolerance 
``tol``, but no relative error argument.

- You can't set a per-instance error tolerance.
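
A helper with both tolerances might look like this (a sketch; the names `approx_equal`, `tol` and `rel` mirror the discussion above, but this is not the implementation from the posted patch):

```python
def approx_equal(x, y, tol=1e-12, rel=1e-7):
    """True if x and y agree within absolute tolerance tol OR within
    relative tolerance rel, whichever test is more permissive.
    Default tolerance values here are illustrative assumptions."""
    if x == y:
        return True  # catches exact equality, including infinities
    delta = abs(x - y)
    return delta <= tol or delta <= rel * max(abs(x), abs(y))
```

Passing tol=0 disables the absolute test, and rel=0 the relative one, which is how a caller can exercise each tolerance independently.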


> http://bugs.python.org/review/18606/diff/8927/Lib/test/test_statistics_approx.py#newcode241
> Lib/test/test_statistics_approx.py:241: assert len(args1) == len(args2)
> Why not assertEqual?

As above, I use bare asserts to test the test logic, and assertSpam methods to 
perform the test. In this case, I'm confirming that I haven't created dodgy 
test data.


> http://bugs.python.org/review/18606/diff/8927/Lib/test/test_statistics_approx.py#newcode255
> Lib/test/test_statistics_approx.py:255: self.assertTrue(approx_equal(b,
> a, tol=0, rel=rel))
> Why not assertApproxEqual?

Because I'm testing the approx_equal function. I can't use assertApproxEqual to 
test its own internals.

--




[issue18606] Add statistics module to standard library

2013-08-18 Thread Antoine Pitrou

Antoine Pitrou added the comment:

A couple of comments about the test suite:
- I would like to see PEP8 test names, i.e. test_foo_and_bar rather than 
testFooAndBar
- I don't think we need two separate test modules, it makes things more 
confusing

--




[issue18606] Add statistics module to standard library

2013-08-18 Thread Steven D'Aprano

Steven D'Aprano added the comment:

Merged two test suites into one, and PEP-ified the test names (testSpam -> 
test_spam).

--
Added file: http://bugs.python.org/file31366/test_statistics.patch




[issue18606] Add statistics module to standard library

2013-08-18 Thread Steven D'Aprano

Steven D'Aprano added the comment:

Patch file for the stats module alone, without the tests.

--
Added file: http://bugs.python.org/file31367/statistics.patch




[issue18606] Add statistics module to standard library

2013-08-17 Thread Steven D'Aprano

Steven D'Aprano added the comment:

To anyone waiting for me to respond to rietveld reviews, I'm trying, I really 
am, but I keep getting a django traceback.

This seems to have been reported before, three months ago:

http://psf.upfronthosting.co.za/roundup/meta/issue517

--




[issue18606] Add statistics module to standard library

2013-08-14 Thread Mark Dickinson

Mark Dickinson added the comment:

Steven:  were you planning to start a discussion thread on python-dev for PEP 
450?  I see that there's some activity on python-list and on python-ideas, but 
I think most core devs would expect the main discussions to happen on the 
python-dev mailing list.  (And I suspect that many core devs don't pay 
attention to python-list very much.)

--




[issue18606] Add statistics module to standard library

2013-08-14 Thread Ethan Furman

Changes by Ethan Furman et...@stoneleaf.us:


--
nosy: +ethan.furman




[issue18606] Add statistics module to standard library

2013-08-13 Thread Steven D'Aprano

Steven D'Aprano added the comment:

Attached is a patch containing the statistics reference implementation, after 
considering feedback given here and on python-ideas, and tests.

--
keywords: +patch
Added file: http://bugs.python.org/file31286/statistics.patch




[issue18606] Add statistics module to standard library

2013-08-13 Thread Terry J. Reedy

Terry J. Reedy added the comment:

Revised patch with tests modified to pass, as described in pydev post.

1. test. added to test_statistics_approx import

2. delete test_main and change ending of both to
if __name__ == '__main__':
  unittest.main()

--
nosy: +terry.reedy
Added file: http://bugs.python.org/file31288/statistics2.diff




[issue18606] Add statistics module to standard library

2013-08-12 Thread Mark Dickinson

Mark Dickinson added the comment:

About the implementation of sum: it's worth noting that the algorithm you're 
using for floats depends on correct rounding of addition and subtraction, and 
that that's not guaranteed.  See the existing test (testFsum) in test_math for 
more information, and note that that test is skipped on machines that don't do 
correct rounding.

This isn't an uncommon problem:  last time I looked, most 32-bit Linux systems 
had problems with double rounding, thanks to evaluating first to 64-bit 
precision using the x87 FPU, and then rounding to 53-bit precision as usual.  
(Python builds on 64-bit Linux tend to use the SSE2 instructions in preference 
to the x87, so don't suffer from this problem.)

Steven: any thoughts about how to deal with this?  Options are (1) just ignore 
the problem and hope no-one runs into it, (2) document it / warn about it, (3) 
try to fix it.  Fixing it would be reasonably easy for a C implementation (with 
access to the FPU control word, in the same way that our float-string 
conversion already does), but not so easy in Python without switching algorithm 
altogether.
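For readers following along, the float path Mark is describing is the partials technique from the ASPN "full precision summation" recipe that both statistics.add_partial and math.fsum cite. A rough sketch of that recipe (not the patch's exact code) shows why correctly rounded addition and subtraction matter: the `lo = y - (hi - x)` step recovers the exact rounding error of `hi = x + y` only when each operation is correctly rounded.

```python
def add_partial(x, partials):
    """Accumulate x into partials, a list of non-overlapping floats.

    Sketch of the ASPN 'full precision summation' recipe; correctness
    relies on IEEE-754 add/subtract being correctly rounded, which is
    exactly what fails under x87 double rounding.
    """
    i = 0
    for y in partials:
        if abs(x) < abs(y):
            x, y = y, x
        hi = x + y
        lo = y - (hi - x)   # exact rounding error of hi = x + y
        if lo:
            partials[i] = lo
            i += 1
        x = hi
    partials[i:] = [x]

partials = []
for v in [1e100, 1.0, -1e100, 1e-100]:
    add_partial(v, partials)

# Builtin sum() of the same list loses the 1.0 to cancellation and
# returns 1e-100; the partials preserve it.
print(sum(partials))  # → 1.0
```

On a platform with double rounding, the `hi`/`lo` pair is no longer exact and the invariant breaks, which is the failure mode testFsum guards against.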

--




[issue18606] Add statistics module to standard library

2013-08-12 Thread Mark Dickinson

Mark Dickinson added the comment:

From the code:

# Also, like all dunder methods, we should call
# __float__ on the class, not the instance.

Why?  I've never encountered this recommendation before.  x.__float__() would 
be clearer, IMO.

--




[issue18606] Add statistics module to standard library

2013-08-12 Thread Mark Dickinson

Mark Dickinson added the comment:

> Why?  I've never encountered this recommendation before.  x.__float__()
> would be clearer, IMO.

Hmm;  it would be better if I engaged my brain before commenting.  I guess the 
point is that type(x).__float__(x) better matches the behaviour of the builtin 
float:


>>> class A:
...     def __float__(self): return 42.0
...
>>> a = A()
>>> a.__float__ = lambda: 1729.0
>>> float(a)
42.0
>>> a.__float__()
1729.0
>>> type(a).__float__(a)
42.0


When you get around to tests, it would be nice to have a test for this 
behaviour, just so that someone who comes along and wonders why the code is
written this way gets an immediate test failure when they try to incorrectly
simplify it.
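The regression test Mark asks for could be as small as the following sketch (hypothetical test code, not part of the posted patch; the test name is invented for illustration):

```python
import unittest

class TestFloatConversion(unittest.TestCase):
    def test_float_uses_class_dunder(self):
        # A __float__ attribute stored on the instance must not affect
        # conversion: float() and type(x).__float__(x) both look up the
        # dunder on the class, while plain attribute access does not.
        class A:
            def __float__(self):
                return 42.0
        a = A()
        a.__float__ = lambda: 1729.0  # instance-level shadow
        self.assertEqual(float(a), 42.0)
        self.assertEqual(type(a).__float__(a), 42.0)
        self.assertEqual(a.__float__(), 1729.0)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestFloatConversion)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Anyone "simplifying" `type(x).__float__(x)` to `x.__float__()` would then fail the first two assertions immediately.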

--




[issue18606] Add statistics module to standard library

2013-08-12 Thread Mark Dickinson

Mark Dickinson added the comment:

(We don't seem to care too much about the distinction in general, though:  
there are a good few places in the std. lib. where obj.__index__() is used 
instead of the more correct type(obj).__index__(obj).)

--




[issue18606] Add statistics module to standard library

2013-08-12 Thread Steven D'Aprano

Steven D'Aprano added the comment:

On 09/08/13 21:49, Oscar Benjamin wrote:

> I think that the argument `m` to variance, pvariance, stdev and pstdev
> should be renamed to `mu` for pvariance/pstdev and `xbar` for
> variance/stdev. The doc-strings should carefully distinguish that `mu`
> is the true/population mean and `xbar` is the estimated/sample mean
> and refer to this difference between the function variants.

Good thinking, and I agree.

--




[issue18606] Add statistics module to standard library

2013-08-12 Thread Steven D'Aprano

Steven D'Aprano added the comment:

On 12/08/13 19:21, Mark Dickinson wrote:

> About the implementation of sum: it's worth noting that the algorithm you're
> using for floats depends on correct rounding of addition and subtraction, and
> that that's not guaranteed.
[...]
> Steven: any thoughts about how to deal with this?  Options are (1) just
> ignore the problem and hope no-one runs into it, (2) document it / warn about
> it, (3) try to fix it.  Fixing it would be reasonably easy for a C
> implementation (with access to the FPU control word, in the same way that our
> float-string conversion already does), but not so easy in Python without
> switching algorithm altogether.

Document it and hope :-)

add_partial is no longer documented as a public function, so I'm open to 
switching algorithms in the future.

--




[issue18606] Add statistics module to standard library

2013-08-12 Thread Mark Dickinson

Mark Dickinson added the comment:

Okay, that works.  I agree that not documenting add_partial is probably a good 
plan.

--




[issue18606] Add statistics module to standard library

2013-08-12 Thread Oscar Benjamin

Oscar Benjamin added the comment:

On 12 August 2013 20:20, Steven D'Aprano <rep...@bugs.python.org> wrote:
> On 12/08/13 19:21, Mark Dickinson wrote:
>> About the implementation of sum:
> add_partial is no longer documented as a public function, so I'm open to
> switching algorithms in the future.

Along similar lines it might be good to remove the doc-test for using
decimal.ROUND_DOWN. I can't see any good reason for anyone to want
that behaviour when e.g. computing the mean() whereas I can see
reasons for wanting to reduce rounding error for decimal in
statistics.sum. It might be a good idea not to tie yourself to the
guarantee implied by that test.

I tried an alternative implementation of sum() that can also reduce
rounding error with decimals but it failed that test (by making the
result more accurate). Here's the sum() I wrote:

import numbers
import decimal

def sum(data, start=0):
    # (imports added above so the sketch is runnable as posted)
    if not isinstance(start, numbers.Number):
        raise TypeError('sum only accepts numbers')

    inexact_types = (float, complex, decimal.Decimal)
    def isexact(num):
        return not isinstance(num, inexact_types)

    if isexact(start):
        exact_total, inexact_total = start, 0
    else:
        exact_total, inexact_total = 0, start

    carrybits = 0

    for x in data:
        if isexact(x):
            exact_total = exact_total + x
        else:
            # Kahan-style compensated addition for the inexact part
            new_inexact_total = inexact_total + (x + carrybits)
            carrybits = -(((new_inexact_total - inexact_total) - x) - carrybits)
            inexact_total = new_inexact_total

    return (exact_total + inexact_total) + carrybits

It is more accurate for e.g. the following:
nums = [decimal.Decimal(10 ** n) for n in range(50)]
nums += [-n for n in reversed(nums)]
assert sum(nums) == 0

However there will also be other situations where it is less accurate such as
print(sum([-1e30, +1e60, 1, 3, -1e60, 1e30]))
so it may not be suitable as-is.

--




[issue18606] Add statistics module to standard library

2013-08-09 Thread Oscar Benjamin

Oscar Benjamin added the comment:

One small point:

I think that the argument `m` to variance, pvariance, stdev and pstdev
should be renamed to `mu` for pvariance/pstdev and `xbar` for
variance/stdev. The doc-strings should carefully distinguish that `mu`
is the true/population mean and `xbar` is the estimated/sample mean
and refer to this difference between the function variants.

--




[issue18606] Add statistics module to standard library

2013-08-09 Thread Steven D'Aprano

Steven D'Aprano added the comment:

On 31/07/13 20:23, Antoine Pitrou added the comment:

> I suppose you should write a PEP for the module inclusion proposal

Done.

http://www.python.org/dev/peps/pep-0450/

I hope to have an updated reference implementation, plus unittests, up later 
today or tomorrow.

--




[issue18606] Add statistics module to standard library

2013-08-08 Thread Stephen J. Turnbull

Stephen J. Turnbull added the comment:

A few small comments and nits.

1. I'm with the author on the question of a sum function in this module.  The 
arguments that builtin sum isn't accurate enough, and neither is math.fsum for 
cases where all data is of infinite precision, are enough for me.

2. A general percentile function should be high on the list of next additions.

A substantive question:

3. Can't add_partial be used in the one-pass algorithms?

Several typos and suggested style tweaks:

4. I would find the summary more readable if grouped by function:
add_partial, sum, StatisticsError; mean, median, mode; pstdev, pvariance, 
stdev, variance.  Maybe I'd like it better if the utilities came last.  IMO 
YMMV, of course.

5. In the big comment in add_partial, the inner loop is mentioned.  Indeed 
this is the inner loop in statistics.sum, but there's only one loop in 
add_partial.

6. In the "Limitations" section of sum's docstring it says "these limitations 
may change".  Is "these limitations may be relaxed" what is meant?  I would 
hope so, but the current phrasing makes me nervous.

7. In sum, there are two comments referring to the construct 
type(total).__float__(total), with the first being a forward reference to the 
second.  I would find a single comment above the isinstance(total, float) 
test more readable.  Eg,


First, accumulate a non-float sum. Until we find a float, we keep adding.
If we find a float, we exit this loop, convert the partial sum to float, and 
continue with the float code below. Non-floats are converted to float with 
'type(x).__float__(x)'. Don't call float() directly, as that converts strings 
and we don't want that. Also, like all dunder methods, we should call __float__ 
on the class, not the instance.


8. The docstrings for mean and variance say they are unbiased.  This depends on 
the strong assumption of a representative (typically i.i.d.) sample.  I think 
this should be mentioned.

9. Several docstrings say "this function should be used when ...".  In fact 
the choice of which function to use is somewhat delicate.  My personal 
preference would be to use "may" rather than "should".

10. In several of the mode functions, the value is "a sorted sequence".  The 
sort key should be specified, because it could be the data value or the score.

--
nosy: +sjt




[issue18606] Add statistics module to standard library

2013-08-06 Thread Oscar Benjamin

Changes by Oscar Benjamin oscar.j.benja...@gmail.com:


--
nosy: +oscarbenjamin




[issue18606] Add statistics module to standard library

2013-08-05 Thread Steven D'Aprano

Steven D'Aprano added the comment:

On 03/08/13 13:22, Alexander Belopolsky wrote:

> Alexander Belopolsky added the comment:
>
> The implementation of median and mode families of functions as classes is
> clever,

So long as it is not too clever.

> but I am not sure it is a good idea to return something other than an
> instance of the class from __new__().

Returning foreign instances is supported behaviour for __new__. (If the object 
returned from __new__ is not an instance, __init__ is not called.) I believe 
the current implementation is reasonable and prefer to keep it. If I use the 
traditional implementation, there will only be one instance, with no state, 
only methods. That's a rather poor excuse for an instance, and a class already 
is a singleton object with methods and (in this case) no state, so creating an 
instance as well adds nothing.

I will change the implementation if the consensus among senior devs is against 
it, but would prefer not to.

--




[issue18606] Add statistics module to standard library

2013-08-05 Thread Mark Dickinson

Changes by Mark Dickinson dicki...@gmail.com:


--
nosy: +mark.dickinson




[issue18606] Add statistics module to standard library

2013-08-05 Thread Mark Dickinson

Mark Dickinson added the comment:

I too find the use of a class that'll never be instantiated peculiar.

As you say, there's no state to be stored.  So why not simply have separate 
functions `median`, `median_low`, `median_high`, `median_grouped`, etc.?

--




[issue18606] Add statistics module to standard library

2013-08-05 Thread Steven D'Aprano

Steven D'Aprano added the comment:

On 06/08/13 03:08, Mark Dickinson wrote:

> I too find the use of a class that'll never be instantiated peculiar.

I'll accept "unusual", but not "peculiar". It's an obvious extension to classes 
being first-class objects. We use classes as objects very frequently, we call 
methods on classes directly (e.g. int.fromhex). This is just a trivial 
variation where I am using a class-as-object as a function.

But if this is really going to be a sticking point, I can avoid using a class. 
I'll make median a plain function. Will that be acceptable?

> As you say, there's no state to be stored.  So why not simply have separate
> functions `median`, `median_low`, `median_high`, `median_grouped`, etc.?

Why have a pseudo-namespace median_* when we could have a real namespace 
median.* ?

I discussed my reasons for this here:
http://mail.python.org/pipermail/python-ideas/2013-August/022612.html

--




[issue18606] Add statistics module to standard library

2013-08-05 Thread Alexander Belopolsky

Alexander Belopolsky added the comment:

On Mon, Aug 5, 2013 at 2:14 PM, Steven D'Aprano <rep...@bugs.python.org> wrote:

>> As you say, there's no state to be stored.  So why not simply have
>> separate functions `median`, `median_low`, `median_high`, `median_grouped`,
>> etc.?
>
> Why have a pseudo-namespace median_* when we could have a real namespace
> median.* ?

I am with Steven on this one.  Note that these functions are expected to be
used interactively, and with standard US keyboards "." is much easier to
type than "_".

My only objection is to having a class xyz such that isinstance(xyz(..),
xyz) is false.  While this works with CPython, it may present problems for
other implementations.

--




[issue18606] Add statistics module to standard library

2013-08-05 Thread Mark Dickinson

Mark Dickinson added the comment:

> My only objection is to having a class xyz such that isinstance(xyz(..),
> xyz) is false.

Yep.  Use a set of functions (median, median_low);  use an instance of a class 
as Alexander describes;  use a single median function that takes an optional 
method parameter;  create a statistics.median subpackage and put the various 
median functions in that.  Any of those options are fairly standard, 
unsurprising, and could reasonably be defended.

But having `median` be a class whose `__new__` returns a float really *is* 
nonstandard and peculiar.  There's just no need for such perversity in what 
should be a straightforward and uncomplicated module.  Special cases aren't 
special enough to break the rules and all that.

--




[issue18606] Add statistics module to standard library

2013-08-04 Thread Daniel Stutzbach

Daniel Stutzbach added the comment:

As the person originally trying to take the mean of timedelta objects, I'm 
personally fine with the workaround of:

py> m = statistics.mean([x.total_seconds() for x in data])
py> td(seconds=m)
datetime.timedelta(2, 43200)

At the time I was trying to take the mean of timedelta objects, even the 
total_seconds() method did not exist in the version of Python I was using.

On the flip side, wouldn't sum() work on timedelta objects if you simply 
removed the isinstance(start, numbers.Number) check?

--
nosy: +stutzbach




[issue18606] Add statistics module to standard library

2013-08-04 Thread Ronald Oussoren

Ronald Oussoren added the comment:

As noted before statistics.sum seems to have the same functionality as 
math.fsum. 

Statistics.add_partial, which does the majority of the work, also references 
the same cookbook recipe as the math.fsum documentation.

IMHO statistics.sum should be removed.

--




[issue18606] Add statistics module to standard library

2013-08-03 Thread Steven D'Aprano

Steven D'Aprano added the comment:

On 03/08/13 13:02, Alexander Belopolsky wrote:

> Alexander Belopolsky added the comment:
>
> Is there a reason why there is no "review" link?  Could it be because the
> file is uploaded as is rather than as a patch?

I cannot answer that question, sorry.

> In any case, I have a question about this code in sum:
>
>     # Convert running total to a float. See comment below for
>     # why we do it this way.
>     total = type(total).__float__(total)
>
> The comment below says:
>
>     # Don't call float() directly, as that converts strings and we
>     # don't want that. Also, like all dunder methods, we should call
>     # __float__ on the class, not the instance.
>     x = type(x).__float__(x)
>
> but this reason does not apply to total that cannot be a string unless you
> add instances of a really weird class in which case all bets are off and the
> dunder method won't help much.

My reasoning was that total may be a string if the start parameter is a string, 
but of course I explicitly check the type of start. So I think you are right.

--




[issue18606] Add statistics module to standard library

2013-08-03 Thread Vajrasky Kok

Vajrasky Kok added the comment:

Is there a reason why there is no 'review' link?  Could it be because the file 
is uploaded as is rather than as a patch?

I think I can answer this question. The answer is yes. You can have review 
only if you use diff not raw file.

The original poster, Steven D'Aprano, uploaded the raw file instead of diff. To 
upload the new file as a diff, (assuming he is using mercurial) he can do 
something like this:

hg add Lib/statistics.py
hg diff Lib/statistics.py > /tmp/statistics_diff.patch

Then he can upload the statistics_diff.patch.

Of course, this is just my hypothetical guess.

--
nosy: +vajrasky




[issue18606] Add statistics module to standard library

2013-08-03 Thread Tshepang Lekhonkhobe

Changes by Tshepang Lekhonkhobe tshep...@gmail.com:


--
nosy: +tshepang




[issue18606] Add statistics module to standard library

2013-08-03 Thread Alexander Belopolsky

Alexander Belopolsky added the comment:

Here is the use-case that was presented to support adding additional operations 
on timedelta objects:


I'm conducting a series of observation experiments where I
measure the duration of an event.  I then want to do various
statistical analysis such as computing the mean, median,
etc.  Originally, I tried using standard functions such as
lmean from the stats.py package.  However, these sorts of
functions divide by a float at the end, causing them to fail
on timedelta objects.  Thus, I have to either write my own
special functions, or convert the timedelta objects to
integers first (then convert them back afterwards).
  (Daniel Stutzbach, in msg26267 on issue1289118.)

The proposed statistics module does not support this use case:

>>> mean([timedelta(1)])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/sasha/Work/cpython-ro/Lib/statistics.py", line 387, in mean
    total = sum(data)
  File "/Users/sasha/Work/cpython-ro/Lib/statistics.py", line 223, in sum
    total += x
TypeError: unsupported operand type(s) for +=: 'int' and 'datetime.timedelta'
>>> sum([timedelta(1)], timedelta(0))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/sasha/Work/cpython-ro/Lib/statistics.py", line 210, in sum
    raise TypeError('sum only accepts numbers')
TypeError: sum only accepts numbers

--
nosy: +agthorr




[issue18606] Add statistics module to standard library

2013-08-03 Thread Steven D'Aprano

Steven D'Aprano added the comment:

On 04/08/13 05:31, Alexander Belopolsky wrote:

> Alexander Belopolsky added the comment:
>
> Here is the use-case that was presented to support adding additional
> operations on timedelta objects:
>
>> I'm conducting a series of observation experiments where I
>> measure the duration of an event.  I then want to do various
>> statistical analysis such as computing the mean, median,
>> etc.  Originally, I tried using standard functions such as
>> lmean from the stats.py package.  However, these sorts of
>> functions divide by a float at the end, causing them to fail
>> on timedelta objects.  Thus, I have to either write my own
>> special functions, or convert the timedelta objects to
>> integers first (then convert them back afterwards).
>   (Daniel Stutzbach, in msg26267 on issue1289118.)
>
> The proposed statistics module does not support this use case:
[...]
> TypeError: sum only accepts numbers

That's a nice use-case, but I'm not sure how to solve it, or whether it needs 
to be.

I'm not going to add support for timedelta objects as a special-case. Once we 
start special-casing types, where will it end?

At first I thought that registering timedelta as a numeric type would help, but 
that is a slightly dubious thing to do since timedelta doesn't support all 
numeric operations:

py> datetime.timedelta(1, 1, 1) + 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'datetime.timedelta' and 'int'

(What would that mean, anyway? Add two days, two seconds, or two milliseconds?)

Perhaps timedelta objects should be enhanced to be (Integral?) numbers. In the 
meantime, there's a simple way to do this:

py> from datetime import timedelta as td
py> data = [td(2), td(1), td(3), td(4)]
py> m = statistics.mean([x.total_seconds() for x in data])
py> m
216000.0
py> td(seconds=m)
datetime.timedelta(2, 43200)

And for standard deviation:

py> s = statistics.stdev([x.total_seconds() for x in data])
py> td(seconds=s)
datetime.timedelta(1, 25141, 920371)

median works without any wrapper:

py> statistics.median(data)
datetime.timedelta(2, 43200)

I'm now leaning towards will not fix for supporting timedelta objects. If 
they become proper numbers, then they should just work, and if they don't, 
supporting them just requires a tiny bit of extra code.

However, I will add documentation and tests for them.
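The total_seconds() idiom above packages naturally into a small helper. This is a hypothetical convenience wrapper, not part of the patch; the name `timedelta_mean` is invented here, and note that going through total_seconds() means the result can be off by a microsecond or so for very large durations, since it passes through float:

```python
import statistics  # the module under discussion, available from Python 3.4
from datetime import timedelta

def timedelta_mean(deltas):
    """Mean of an iterable of timedeltas, via total_seconds().

    Hypothetical helper; rounding at microsecond level is possible
    because total_seconds() converts to float.
    """
    seconds = [d.total_seconds() for d in deltas]
    return timedelta(seconds=statistics.mean(seconds))

data = [timedelta(2), timedelta(1), timedelta(3), timedelta(4)]
print(timedelta_mean(data))  # → 2 days, 12:00:00
```

If timedelta ever becomes a proper number, the wrapper becomes unnecessary and mean(data) would just work.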

--




[issue18606] Add statistics module to standard library

2013-08-03 Thread Christian Heimes

Changes by Christian Heimes li...@cheimes.de:


--
nosy: +christian.heimes




[issue18606] Add statistics module to standard library

2013-08-03 Thread Alexander Belopolsky

Alexander Belopolsky added the comment:

> Once we start special-casing types, where will it end?

At the point where all stdlib types are special-cased. :-)


> In the meantime, there's a simple way to do this:
>
> py> from datetime import timedelta as td
> py> data = [td(2), td(1), td(3), td(4)]
> py> m = statistics.mean([x.total_seconds() for x in data])
> py> td(seconds=m)
> datetime.timedelta(2, 43200)

Simple, but, as simple ways in this area often are, not correct.  Here is the right way:

py> td.resolution * statistics.mean(d//td.resolution for d in data)
datetime.timedelta(2, 43200)

I wish I had a solution to make sum() work properly on timedeltas without 
special-casing.  I thought that start could default to type(data[0])(0), but 
that would bring in strings.  Maybe statistics.mean() should support 
non-numbers that support addition and division by a number?  Will it be too 
confusing if mean() supports types that sum() does not?

--




[issue18606] Add statistics module to standard library

2013-08-02 Thread Gregory P. Smith

Gregory P. Smith added the comment:

note, 
http://docs.scipy.org/doc/scipy/reference/stats.html#statistical-functions is a 
very popular module for statistics in Python.

One of the more frequent things I see people include the entire beast of a code 
base (scipy and numpy) for is the Student's t-test functions.

I don't think we can include code from scipy due to license reasons so any 
implementor should NOT be looking at the scipy code, just the docs for API 
inspirations.

--
nosy: +gregory.p.smith




[issue18606] Add statistics module to standard library

2013-08-02 Thread Alexander Belopolsky

Alexander Belopolsky added the comment:

Is there a reason why there is no review link?  Could it be because the file 
is uploaded as is rather than as a patch?

In any case, I have a question about this code in sum:

# Convert running total to a float. See comment below for
# why we do it this way.
total = type(total).__float__(total)

The comment below says:

# Don't call float() directly, as that converts strings and we
# don't want that. Also, like all dunder methods, we should call
# __float__ on the class, not the instance.
x = type(x).__float__(x)

but this reason does not apply to total that cannot be a string unless you add 
instances of a really weird class in which case all bets are off and the dunder 
method won't help much.

--
nosy: +belopolsky




[issue18606] Add statistics module to standard library

2013-08-02 Thread Alexander Belopolsky

Alexander Belopolsky added the comment:

The implementation of the median and mode families of functions as classes is 
clever, but I am not sure it is a good idea to return something other than an 
instance of the class from __new__().  I would prefer to see a more traditional 
implementation along the lines of:

class _mode:
    def __call__(self, data, ...):
        ...

    def collate(self, data, ...):
        ...

mode = _mode()
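Filled out with a toy body (the mode semantics here are illustrative, not the proposed module's), the pattern runs like this:

```python
# Toy, runnable version of the pattern sketched above: a private
# class instantiated exactly once, exported under a function-like
# name, with collate() available as a secondary entry point.
from collections import Counter

class _mode:
    def __call__(self, data):
        # Most common value; error out on a tie for first place.
        counts = Counter(data).most_common()
        if len(counts) > 1 and counts[0][1] == counts[1][1]:
            raise ValueError("no unique mode")
        return counts[0][0]

    def collate(self, data):
        # Return all (value, count) pairs, most common first.
        return Counter(data).most_common()

mode = _mode()

print(mode([1, 1, 2, 3]))        # 1
print(mode.collate([1, 1, 2]))   # [(1, 2), (2, 1)]
```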

--




[issue18606] Add statistics module to standard library

2013-07-31 Thread Steven D'Aprano

New submission from Steven D'Aprano:

I proposed adding a statistics module to the standard library some time ago, 
and received some encouragement:

http://mail.python.org/pipermail/python-ideas/2011-September/011524.html

Real life intervened, plus a bad case of over-engineering, but over the last 
few weeks I have culled my earlier (private) attempt down to manageable size. I 
would like to propose the attached module for the standard library.

I also have a set of unit tests for this module. At the moment they cover about 
30-40% of the functions in the module, but I should be able to supply unit 
tests for the remaining functions over the next few days.

--
components: Library (Lib)
files: statistics.py
messages: 193988
nosy: stevenjd
priority: normal
severity: normal
status: open
title: Add statistics module to standard library
type: enhancement
versions: Python 3.4
Added file: http://bugs.python.org/file31097/statistics.py




[issue18606] Add statistics module to standard library

2013-07-31 Thread Antoine Pitrou

Antoine Pitrou added the comment:

I suppose you should write a PEP for the module inclusion proposal (and a 
summary of the API).

--
nosy: +pitrou




[issue18606] Add statistics module to standard library

2013-07-31 Thread Ronald Oussoren

Ronald Oussoren added the comment:

At first glance statistics.sum does the same as math.fsum (and 
statistics.add_partial seems to be a utility for implementing sum).
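The overlap is easy to see with plain stdlib code (nothing from the patch):

```python
# math.fsum tracks exact partial sums, so float rounding error does
# not accumulate the way it does with the built-in sum().
import math

data = [0.1] * 10
print(sum(data))        # 0.9999999999999999
print(math.fsum(data))  # 1.0
```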

I agree that a PEP would be useful.

--
nosy: +ronaldoussoren
