Kevin Grittner <kgri...@ymail.com> writes:
> Aggregates on approximate (floating-point) numbers are not nearly
> as consistent as many people probably assume. Picture for a minute
> a table where a column contains positive floating point numbers
> happen to be located in the heap in increasing order, perhaps as
> the result of a CLUSTER on an index on the column. SELECT
> sum(colname) FROM tablename; would tend to give the most accurate
> answer possible when a sequence scan was run -- unless there
> happened to be a seqscan already half-way through the heap. Then
> the result would be different.
I don't think that argument holds any water. In the first place, somebody
could turn off synchronize_seqscans if they needed to have the calculation
done the same way every time (and I recall questions from users who ended
up doing exactly that, shortly after we introduced synchronize_seqscans).
In the second place, for most use-cases it'd be pretty foolish to rely on
physical heap order, so somebody who was really trying to sum float8s
accurately would likely do
select sum(x order by x) from ...
This is a well-defined, numerically stable calculation, and I don't want
to see us put in non-defeatable optimizations that break it.
> The real issue here is that if you are using an approximate data type
> and expecting exact answers, you will have problems.
That's a canard. People who know what they're doing (admittedly a
minority) do not expect exact answers, but they do expect to be able to
specify how to do the calculation in a way that minimizes roundoff errors.
The inverse-transition-function approach breaks that, and it does so at a
level where the user can't work around it, short of building his own
regards, tom lane
Sent via pgsql-hackers mailing list (firstname.lastname@example.org)
To make changes to your subscription: