On Sun, Dec 15, 2013 at 2:00 PM, Tom Lane <[email protected]> wrote:
> Greg Stark <[email protected]> writes:
> > On 14 Dec 2013 15:40, "Tom Lane" <[email protected]> wrote:
> >> I think you *can't* cover them for the float types; roundoff error
> >> would mean you don't get the same answers as before.
>
> > I was going to say the same thing. But then I started to wonder....
> What's
> > so special about the answers we used to give? They are also subject to
> > round off and the results are already quite questionable in those cases.
>
> Well, we can't easily do better than the old answers, and the new ones
> might be arbitrarily worse. Example: sum or average across single-row
> windows ought to be exact in any case, but it might be arbitrarily wrong
> with the negative-transition technique.
>
> More generally, this is supposed to be a performance enhancement only;
> it's not supposed to change the results.
>
>
It came to me that it might be possible to implement inverse transitions
for floating point aggregates by just detecting if precision has been lost
during forward transitions.
I've written the test to do this as:
IF state.value + value = state.value AND value <> 0 THEN
newstate.precision_lost := true; newstate.value := state.value; ELSE
newstate.precision_lost := false; newstate.value := state.value + value;
END IF;
The inverse transition function checks the precision_lost and if it's true
it returns NULL. The core code is now implemented (thanks to Florian) to
re-aggregate when NULL is returned from the inverse transition function.
I've attached an implementation of this with the transition functions
written in plpgsql.
I don't really know for sure yet if it can handle all cases and give the
exact same results as it would without inverse transitions, but it
certainly fixes the error case which was presented
Using the attached on HEAD of
https://github.com/david-rowley/postgres/commits/invtrans
explain (analyze, verbose)
select mysum(v) over (order by i rows between current row and unbounded
following) from (values(1,1e20),(2,1)) b(i,v);
Gives me the expected results of 1e20 and 1, instead of my original attempt
which gave 1e20 and 0.
I guess the extra tracking on forward transition might mean this would not
be practical to implement in C for sum(float), but I just wanted to run the
idea through a few heads to see if anyone can present a case where it can
still produce wrong results.
If it seems sound enough, then I may implement it in C to see how much
overhead it adds to forward aggregation for floating point types, but even
if it did add too much overhead to forward aggregation it might be worth
allowing aggregates to have 2 forward transition functions and if the 2nd
one exists then it could be used in windowing functions where the frame
does not have "unbounded following".
Any thoughts?
Regards
David Rowley
BEGIN WORK;
CREATE TYPE float_state AS (precision_lost bool, value float);
CREATE OR REPLACE FUNCTION float_sum(state float_state, value float)
RETURNS float_state AS
$$
DECLARE newstate float_state;
BEGIN
IF state IS NULL THEN
IF value IS NULL THEN
RETURN NULL;
ELSE
newstate.value := value;
newstate.precision_lost := false;
return newstate;
END IF;
END IF;
IF state.value + value = state.value AND value <> 0 THEN
newstate.precision_lost := true;
newstate.value := state.value;
ELSE
newstate.precision_lost := false;
newstate.value := state.value + value;
END IF;
RETURN newstate;
END;
$$
LANGUAGE plpgsql IMMUTABLE;
CREATE OR REPLACE FUNCTION float_sum_inv(state float_state, value float)
RETURNS float_state AS
$$
DECLARE newstate float_state;
BEGIN
IF state.precision_lost = true THEN
RETURN NULL;
ELSE
newstate.value := state.value - value;
RETURN newstate;
END IF;
END;
$$
LANGUAGE plpgsql STRICT IMMUTABLE;
CREATE OR REPLACE FUNCTION float_sum_final(state float_state)
RETURNS float AS
$$
BEGIN
IF NOT(state IS NULL) THEN
RETURN state.value;
ELSE
RETURN NULL;
END IF;
END;
$$
LANGUAGE plpgsql IMMUTABLE;
CREATE AGGREGATE mysum (float) (
stype = float_state,
sfunc = float_sum,
invfunc = float_sum_inv,
finalfunc = float_sum_final
);
select mysum(v) from (values(1,1e20),(2,1)) b(i,v);
-- forces re-aggregate due to precision loss
--explain (analyze, verbose)
select mysum(v) over (order by i rows between current row and unbounded
following) from (values(1,1e20),(2,1)) b(i,v);
-- does not force reaggregate.
--explain (analyze, verbose)
select mysum(v) over (order by i rows between current row and unbounded
following) from (values(1,1),(2,2),(3,3)) b(i,v);
rollback;
--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers