[issue45876] Improve accuracy of stdev functions in statistics

2021-11-30 Thread Raymond Hettinger
Raymond Hettinger added the comment: New changeset 0aa0bd056349f73de9577ccc38560c1d01864d51 by Raymond Hettinger in branch 'main': bpo-45876: Have stdev() also use decimal specific square root. (GH-29869) https://github.com/python/cpython/commit/0aa0bd056349f73de9577ccc38560c1d01864d51

[issue45876] Improve accuracy of stdev functions in statistics

2021-11-30 Thread Raymond Hettinger
Change by Raymond Hettinger : -- pull_requests: +28095 pull_request: https://github.com/python/cpython/pull/29869 ___ Python tracker ___

[issue45876] Improve accuracy of stdev functions in statistics

2021-11-30 Thread Raymond Hettinger
Raymond Hettinger added the comment: New changeset a39f46afdead515e7ac3722464b5ee8d7b0b2c9b by Raymond Hettinger in branch 'main': bpo-45876: Correctly rounded stdev() and pstdev() for the Decimal case (GH-29828)

[issue45876] Improve accuracy of stdev functions in statistics

2021-11-28 Thread Raymond Hettinger
Change by Raymond Hettinger : -- pull_requests: +28059 pull_request: https://github.com/python/cpython/pull/29828 ___ Python tracker ___

[issue45876] Improve accuracy of stdev functions in statistics

2021-11-26 Thread Raymond Hettinger
Raymond Hettinger added the comment: New changeset af9ee57b96cb872df6574e36027cc753417605f9 by Raymond Hettinger in branch 'main': bpo-45876: Improve accuracy for stdev() and pstdev() in statistics (GH-29736) https://github.com/python/cpython/commit/af9ee57b96cb872df6574e36027cc753417605f9

[issue45876] Improve accuracy of stdev functions in statistics

2021-11-26 Thread Raymond Hettinger
Raymond Hettinger added the comment: Thank you all for looking at this. It's unlikely that anyone will ever notice the improvement, but I'm happy with it and that's all the matters ;-) -- ___ Python tracker

[issue45876] Improve accuracy of stdev functions in statistics

2021-11-26 Thread Raymond Hettinger
Change by Raymond Hettinger : -- resolution: -> fixed stage: patch review -> resolved status: open -> closed ___ Python tracker ___

[issue45876] Improve accuracy of stdev functions in statistics

2021-11-26 Thread Steven D'Aprano
Steven D'Aprano added the comment: Raymond, Mark, Tim, I have been reading this whole thread. Thank you all. I am in awe and a little bit intimidated by how much I still have to learn about floating point arithmetic. -- ___ Python tracker

[issue45876] Improve accuracy of stdev functions in statistics

2021-11-26 Thread Tim Peters
Tim Peters added the comment: But I would like to leave it alone. Extended precision simply is not an issue on any current platform I'm aware of ("not even Windows"), and I would, e.g., hate trying to explain to users why 1 / 2731 != 1.0 / 2731.0 (assuming we're not also proposing to

[issue45876] Improve accuracy of stdev functions in statistics

2021-11-26 Thread Raymond Hettinger
Raymond Hettinger added the comment: > It won't affect _this_ application, but possibly we should > fix this anyway. I would like to see this fixed. It affects our ability to reason about int/int code. That comes up every time a fraction is fed into a math library function than converts

[issue45876] Improve accuracy of stdev functions in statistics

2021-11-26 Thread Mark Dickinson
Mark Dickinson added the comment: Concrete example of int/int not being correctly rounded on systems using x87 instructions: on those systems, I'd expect to see 1/2731 return a result of 0.00036616623947272064 (0x1.7ff4005ffd002p-12), as a result of first rounding to 64-bit precision and

[issue45876] Improve accuracy of stdev functions in statistics

2021-11-26 Thread Mark Dickinson
Mark Dickinson added the comment: > All the rounding has already happened at the point where ldexp is called, and > the result of the ldexp call is exact. Sketch of proof: [Here](https://github.com/python/cpython/blob/4ebde73b8e416eeb1fd5d2ca3283f7ddb534c5b1/Objects/longobject.c#L3929) we

[issue45876] Improve accuracy of stdev functions in statistics

2021-11-26 Thread Tim Peters
Tim Peters added the comment: > Objects/longobject.c::long_true_divide() uses ldexp() internally. > Will it suffer the same issues with subnormals on Windows? Doesn't look like it will. In context, looks like it's ensuring that ldexp can only lose trailing 0 bits, so that _whatever_ ldexp

[issue45876] Improve accuracy of stdev functions in statistics

2021-11-26 Thread Mark Dickinson
Mark Dickinson added the comment: > Will it suffer the same issues with subnormals on Windows? No, it should be fine. All the rounding has already happened at the point where ldexp is called, and the result of the ldexp call is exact. > Is CPython int/int true division guaranteed to be

[issue45876] Improve accuracy of stdev functions in statistics

2021-11-26 Thread Raymond Hettinger
Raymond Hettinger added the comment: [Tim] > Note that, on Windows, ldexp() in the presence of > denorms can truncate. Division rounds, so > >assert x / 2**i == ldexp(x, -i) > > can fail. Objects/longobject.c::long_true_divide() uses ldexp() internally. Will it suffer the same issues

[issue45876] Improve accuracy of stdev functions in statistics

2021-11-26 Thread Tim Peters
Tim Peters added the comment: Mark, ya, MS's Visual Studio's ldexp() has, far as I know, always worked this way. The code I showed was run under the 2019 edition, which we use to build the Windows CPython. Raymond, x = float(i) is screamingly obvious at first glance. x = i/1

[issue45876] Improve accuracy of stdev functions in statistics

2021-11-26 Thread Mark Dickinson
Mark Dickinson added the comment: Since I've failed to find a coherent statement and proof of the general principle I articulated anywhere online, I've included one below. (To be absolutely clear, the principle is not new - it's very well known, but oddly hard to find written down

[issue45876] Improve accuracy of stdev functions in statistics

2021-11-26 Thread Mark Dickinson
Mark Dickinson added the comment: [Raymond] > [...] perhaps do an int/int division to match the other code path [...] Sure, works for me. -- ___ Python tracker ___

[issue45876] Improve accuracy of stdev functions in statistics

2021-11-26 Thread Raymond Hettinger
Raymond Hettinger added the comment: Instead of calling float(), perhaps do an int/int division to match the other code path so that the function depends on only one mechanism for building the float result. -return float(_isqrt_frac_rto(n, m << 2 * q) << q) +(_isqrt_frac_rto(n, m <<

[issue45876] Improve accuracy of stdev functions in statistics

2021-11-26 Thread Mark Dickinson
Mark Dickinson added the comment: Related: https://stackoverflow.com/questions/32150888/should-ldexp-round-correctly -- ___ Python tracker ___

[issue45876] Improve accuracy of stdev functions in statistics

2021-11-25 Thread Mark Dickinson
Mark Dickinson added the comment: There's also a potential double-rounding issue with ldexp, when the first argument is an int: ldexp(n, e) will first round n to a float, and then (again for results in the subnormal range) potentially also need to round the result. >>> n = 2**53 + 1 >>> e =

[issue45876] Improve accuracy of stdev functions in statistics

2021-11-25 Thread Tim Peters
Tim Peters added the comment: Note that, on Windows, ldexp() in the presence of denorms can truncate. Division rounds, so assert x / 2**i == ldexp(x, -i) can fail. >>> import math >>> d = math.nextafter(0.0, 1.0) >>> d 5e-324 >>> d3 = 7 * d # ....0111 >>> d3 3.5e-323 >>> d3 / 4.0

[issue45876] Improve accuracy of stdev functions in statistics

2021-11-25 Thread Raymond Hettinger
Raymond Hettinger added the comment: Mark, would it preferable to use ldexp() to build the float? + return math.ldexp(isqrt_frac_rto(n << -2 * q, m), q) - return isqrt_frac_rto(n << -2 * q, m) / (1 << -q) -- ___ Python tracker

[issue45876] Improve accuracy of stdev functions in statistics

2021-11-24 Thread Raymond Hettinger
Raymond Hettinger added the comment: > Here's a reference for this use of round-to-odd: > https://www.lri.fr/~melquion/doc/05-imacs17_1-expose.pdf Thanks Mark. It looks like I'll be getting a little education over the Thanksgiving holiday :-) Shown below is the code that I'm thinking of

[issue45876] Improve accuracy of stdev functions in statistics

2021-11-24 Thread Mark Dickinson
Mark Dickinson added the comment: Here's a reference for this use of round-to-odd: https://www.lri.fr/~melquion/doc/05-imacs17_1-expose.pdf I'm happy to provide any proofs that anyone feels are needed. -- ___ Python tracker

[issue45876] Improve accuracy of stdev functions in statistics

2021-11-24 Thread Mark Dickinson
Mark Dickinson added the comment: > am still studying the new one Sorry - I should have added at least _some_ comments to it. Here's the underlying principle, which I think of as "The magic of round-to-odd": Suppose x is a positive real number and e is an integer satisfying 2^e <= x. Then

[issue45876] Improve accuracy of stdev functions in statistics

2021-11-23 Thread Raymond Hettinger
Raymond Hettinger added the comment: As a side effect of inlining the variance code, we also get to fix the error messages which were variance specific. -- ___ Python tracker

[issue45876] Improve accuracy of stdev functions in statistics

2021-11-23 Thread Raymond Hettinger
Raymond Hettinger added the comment: I've opened a PR to make this easy to experiment with. It also worked with my frac_sqrt() and deci_sqrt(), but having all integer arithmetic and always correct rounding are nice wins. The only downside is that I completely understood the first two

[issue45876] Improve accuracy of stdev functions in statistics

2021-11-23 Thread Raymond Hettinger
Change by Raymond Hettinger : -- keywords: +patch pull_requests: +27974 stage: -> patch review pull_request: https://github.com/python/cpython/pull/29736 ___ Python tracker

[issue45876] Improve accuracy of stdev functions in statistics

2021-11-23 Thread Mark Dickinson
Mark Dickinson added the comment: Hmm. isqrt_frac_rto is unnecessarily complicated. Here's a more streamlined version of the code. import math def isqrt_frac_rto(n, m): """ Square root of n/m, rounded to the nearest integer using round-to-odd. """ a = math.isqrt(n*m) // m

[issue45876] Improve accuracy of stdev functions in statistics

2021-11-23 Thread Mark Dickinson
Mark Dickinson added the comment: > Should the last line of sqrt_frac() be wrapped with float()? It's already a float - it's the result of an int / int division. -- ___ Python tracker

[issue45876] Improve accuracy of stdev functions in statistics

2021-11-23 Thread Raymond Hettinger
Raymond Hettinger added the comment: Should the last line of sqrt_frac() be wrapped with float()? -- ___ Python tracker ___ ___

[issue45876] Improve accuracy of stdev functions in statistics

2021-11-23 Thread Mark Dickinson
Mark Dickinson added the comment: Here's the float-and-Fraction-based code that I'm using to compare the integer-based code against: def sqrt_frac2(n, m): """ Square root of n/m as a float, correctly rounded. """ f = fractions.Fraction(n, m) # First approximation. x

[issue45876] Improve accuracy of stdev functions in statistics

2021-11-23 Thread Mark Dickinson
Mark Dickinson added the comment: > Does the technique you had in mind involve testing 1 ulp up or down to see > whether its square is closer to the input? Kinda sorta. Below is some code: it's essentially just pure integer operations, with a last minute conversion to float (implicit in the

[issue45876] Improve accuracy of stdev functions in statistics

2021-11-23 Thread Raymond Hettinger
Raymond Hettinger added the comment: > It wouldn't be hard to go for _always_ correctly rounded > and actually get it over. Yes, that would be the right thing to do. Does the technique you had in mind involve testing 1 ulp up or down to see whether its square is closer to the input? > an

[issue45876] Improve accuracy of stdev functions in statistics

2021-11-23 Thread Mark Dickinson
Mark Dickinson added the comment: > we've run the ball almost the full length of the field and then failed to put > the ball over the goal line But if we only go from "faithfully rounded" to "almost always correctly rounded", it seems to me that we're still a couple of millimetres away from

[issue45876] Improve accuracy of stdev functions in statistics

2021-11-23 Thread Raymond Hettinger
Raymond Hettinger added the comment: > I'm not sure this is worth worrying about ... Instead of writing simple, mostly accurate code with math.fsum(), these functions have already applied labor intensive measures to get an exact mean and exact sum of square differences expressed in

[issue45876] Improve accuracy of stdev functions in statistics

2021-11-23 Thread Mark Dickinson
Mark Dickinson added the comment: One thought: would the deci_sqrt approach help with value ranges where the values are well within float limits, but the squares of the values are not? E.g., on my machine, I currently get errors for both of the following: >>> xs = [random.normalvariate(0.0,

[issue45876] Improve accuracy of stdev functions in statistics

2021-11-23 Thread Mark Dickinson
Mark Dickinson added the comment: I'm not sure this is worth worrying about. We already have a very tight error bound on the result: if `x` is a (positive) fraction and `y` is the closest float to x, (and assuming IEEE 754 binary64, round-ties-to-even, no overflow or underflow, etc.) then

[issue45876] Improve accuracy of stdev functions in statistics

2021-11-23 Thread Mark Dickinson
Change by Mark Dickinson : -- nosy: +mark.dickinson ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe:

[issue45876] Improve accuracy of stdev functions in statistics

2021-11-23 Thread Raymond Hettinger
New submission from Raymond Hettinger : The standard deviation computation in the statistics module is still subject to error even though the mean and sum of square differences are computed exactly using fractions. The problem is that the exact fraction gets rounded to a float before going