Kevin Grittner <kgri...@ymail.com> writes:
> Aggregates on approximate (floating-point) numbers are not nearly
> as consistent as many people probably assume.  Picture for a minute
> a table where a column contains positive floating point numbers
> that happen to be located in the heap in increasing order, perhaps
> as the result of a CLUSTER on an index on the column.  SELECT
> sum(colname) FROM tablename; would tend to give the most accurate
> answer possible when a sequential scan was run -- unless there
> happened to be a seqscan already half-way through the heap.  Then
> the result would be different.
I don't think that argument holds any water.  In the first place,
somebody could turn off synchronize_seqscans if they needed to have
the calculation done the same way every time (and I recall questions
from users who ended up doing exactly that, shortly after we
introduced synchronize_seqscans).  In the second place, for most
use-cases it'd be pretty foolish to rely on physical heap order, so
somebody who was really trying to sum float8s accurately would
likely do

	select sum(x order by x) from ...

This is a well-defined, numerically stable calculation, and I don't
want to see us put in non-defeatable optimizations that break it.

> The real issue here is that if you are using an approximate data
> type and expecting exact answers, you will have problems.

That's a canard.  People who know what they're doing (admittedly a
minority) do not expect exact answers, but they do expect to be able
to specify how to do the calculation in a way that minimizes roundoff
errors.  The inverse-transition-function approach breaks that, and it
does so at a level where the user can't work around it, short of
building his own aggregates.

			regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
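[Editor's note: a minimal sketch, in Python rather than SQL and not part of the original message, of the roundoff-order effect under discussion. At magnitude 1e16 the spacing between adjacent IEEE doubles is 2.0, so small addends are silently absorbed unless they are accumulated first; summing in ascending order, which is what `sum(x order by x)` arranges in PostgreSQL, preserves them.]

```python
# Illustration (not from the original post): the same float values summed
# in two different orders give two different answers, because near 1e16
# the gap between adjacent doubles is 2.0 and adding 0.5 is a no-op.

values = [1e16] + [0.5] * 100

# Large value first: each 0.5 falls below half an ulp and is lost.
naive = sum(values)           # 1e16

# Ascending order: the hundred 0.5s accumulate to 50.0 before they
# meet the large value, so their contribution survives.
stable = sum(sorted(values))  # 1e16 + 50

print(naive, stable, stable - naive)
```

This is exactly the property the ordered-aggregate form protects: the user has specified an accumulation order, and an inverse-transition optimization that re-partitions the sum would change the result.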