On 14.02.2017 16:59, Jim Nasby wrote:
On 2/13/17 10:45 AM, Konstantin Knizhnik wrote:
That is not true - please notice the execution times of these two queries:
I bet you'd get even less difference if you simply cast to float8
instead of adding 0.0. Same result, no floating point addition.
The expectation for SUM(float4) is that you want speed and are
prepared to cope with the consequences. It's easy enough to cast your
input to float8 if you want a wider accumulator, or to numeric if
you'd like more stable (not necessarily more accurate :-() results.
I do not think it's the database's job to make those choices for you.
From my point of view, this is a strange and wrong expectation.
I choose the "float4" type for a column simply because it is enough to
represent the range of my data and I need to minimize record size.
In other words, you've decided to trade accuracy for performance...
I cannot agree with this...
1. If I choose the float4 type to store a bid price (which usually has 5-6
significant digits), I do not lose precision and accuracy does not suffer.
Accuracy matters when I calculate the sum of prices. But the assumption
that the accuracy of the sum should depend on the type of the column is
not obvious. It may be more or less clear to C programmers, but not to
SQL users.
In all databases I have tested, SUM of single-precision floats is
calculated using at least double-precision numbers (or the numeric type).
2. There is no huge performance gap between accumulating in float4 and
float8 - certainly not "orders of magnitude":
postgres=# select sum(l_quantity) from lineitem_projection;
Time: 4659.509 ms (00:04.660)
postgres=# select sum(l_quantity::float8) from lineitem_projection;
Time: 5465.320 ms (00:05.465)
So I do not think there is actually a compromise here between performance
and accuracy.
But the current implementation leads to much confusion and contradicts
users' expectations:
1. sum(l_quantity) and sum(l_quantity::float8) produce completely
different results (off by a factor of 1.5!!! - we lose half a billion dollars :)
2. avg(l_quantity)*count(l_quantity) is not equal to sum(l_quantity), but
when casting to float8 the results match.
3. The sum of per-group aggregates is not equal to the total sum (once
again, no problem with the float8 type).
But when I am calculating a sum, I expect to receive a more or less
precise result. Certainly I realize that even when using double it is
... but now you want to trade performance for accuracy? Why would you
expect the database to magically come to that conclusion?
See above. There is no trading here. Please notice that the current
Postgres implementation of the AVG aggregate calculates both the sum and
the sum of squares, even though the latter is not needed for AVG.
The comment in the code says:
* It might seem attractive to optimize this by having multiple accumulator
* functions that only calculate the sums actually needed. But on most
* modern machines, a couple of extra floating-point multiplies will be
* insignificant compared to the other per-tuple overhead, so I've chosen
* to minimize code space instead.
And it is true!
In addition to the results above, here are the timings for AVG:
postgres=# select avg(l_quantity) from lineitem_projection;
postgres=# select avg(l_quantity::float8) from lineitem_projection;
Please notice that avg for float4 is calculated using float4_accum, which
uses a float8 accumulator and also calculates sumX2!
Time: 6103.807 ms (00:06.104)
So I do not see any reasonable argument here for using float4pl for SUM(float4).
And I do not know of any database which has such strange behavior.
I know that "be like others", or especially "be like Oracle", is never a
good argument in the Postgres community, but doing something differently
(and, IMHO, wrongly) without any significant reason seems very strange.
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company