On Jan 10, 2014, at 18:14 , Kevin Grittner <kgri...@ymail.com> wrote:
> Given that this is already the case with aggregates on floating
> point approximate numbers, why should we rule out an optimization
> which only makes rounding errors more likely to be visible? The
> real issue here is that if you are using an approximate data type
> and expecting exact answers, you will have problems.
Because without the optimization, only the values which you *actually*
process for a given result determine whether you lose precision or not.
With the optimization, OTOH, values which have *nothing* to do with the
result in question can nevertheless make it completely bogus.

SUM() is a good example. As long as all your values are positive, the
amount of precision you lose is bounded by the number of input values.
If I sum over 10 values, the worst that can happen is that the first
value is large enough to prevent the other 9 values from influencing
the result. That limits the relative error to something like 9*epsilon,
where epsilon is the relative precision of the floating point type,
i.e. 1e-15 or so for double. In other words, as long as your frames are
less than 10^13 rows long, the relative error stays below 1%.

But with the optimization, that is no longer true. If you sum from,
say, CURRENT ROW to UNBOUNDED FOLLOWING, the relative error of the
result in one row now depends on the magnitude of values *preceding*
that row, even though those values aren't in the frame. And since we
now internally subtract, not only add, the relative error is no longer
bounded by the number of rows in the frame.

Here's the corresponding SELECT (which is basically the same as Tom's
example upthread):

  select
    n, x::float,
    sum(x::float) over (
      order by n
      rows between current row and unbounded following
    )
  from (values
    (1, 1e20),
    (2, 1),
    (3, 2)
  ) as t(n, x)
  order by n;

Currently that returns

   n |   x   |  sum
  ---+-------+-------
   1 | 1e+20 | 1e+20
   2 |     1 |     3
   3 |     2 |     2

but with an inverse transition function, it may very well return

   n |   x   |  sum
  ---+-------+-------
   1 | 1e+20 | 1e+20
   2 |     1 |     0
   3 |     2 |    -1

> That's not to say that approximations are useless. If you
> represent the circumference of the earth with a double precision
> number you're dealing with an expected rounding error of about a
> foot. That's close enough for many purposes. The mistake is
> assuming that it will be exact or that rounding errors cannot
> accumulate. In situations where SQL does not promise particular
> ordering of operations, it should not be assumed; so any
> expectations of a specific or repeatable result from a sum or
> average of approximate numbers is misplaced.

But this isn't about ordering, it's about replacing one computation
with a completely different one that just happens to be equivalent
*algebraically*. To me, the proposed optimization for float is akin to
a C compiler which decided to evaluate

  a + b + c + … + z

as

  -a + (2a - b) + (2b - c) + … + (2y - z) + 2z

Algebraically, these are the same, but it'd still be insane to do that.

best regards,
Florian Pflug
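
PS: For anyone who wants to reproduce the effect without a patched
server, here's a minimal standalone C sketch (plain C, not actual
PostgreSQL code; the array and loop structure are made up purely for
illustration). It computes the frames above both ways: once by
re-summing only the values actually in each frame, and once by taking
the grand total and then subtracting each value as it leaves the frame,
which is roughly what the proposed inverse transition function would
do. On a machine with IEEE 754 doubles it should print 1e+20, 3, 2 for
the first method and 1e+20, 0, -1 for the second, matching the tables
above.

  #include <stdio.h>

  int main(void)
  {
      double x[] = {1e20, 1, 2};
      int    n = sizeof(x) / sizeof(x[0]);

      /* Re-sum the values actually contained in each frame. */
      for (int start = 0; start < n; start++)
      {
          double sum = 0;

          for (int i = start; i < n; i++)
              sum += x[i];
          printf("row %d: recomputed sum = %g\n", start + 1, sum);
      }

      /*
       * Inverse-transition style: compute the total once, then subtract
       * each value as it drops out of the frame. The subtraction cannot
       * bring back the precision already lost to 1e20.
       */
      double total = 0;

      for (int i = 0; i < n; i++)
          total += x[i];

      for (int start = 0; start < n; start++)
      {
          printf("row %d: subtracted sum  = %g\n", start + 1, total);
          total -= x[start];
      }

      return 0;
  }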