On 14.02.2017 16:59, Jim Nasby wrote:
On 2/13/17 10:45 AM, Konstantin Knizhnik wrote:
That is not true - please notice the execution times of these two queries:
I bet you'd get even less difference if you simply cast to float8
instead of adding 0.0. Same result, no floating point addition.
The expectation for SUM(float4) is that you want speed and are
prepared to cope with the consequences. It's easy enough to cast your
input to float8 if you want a wider accumulator, or to numeric if
you'd like more stable (not necessarily more accurate :-() results.
I do not think it's the database's job to make those choices for you.
From my point of view, this is a strange and wrong expectation.
I choose the "float4" type for a column simply because it is enough to
represent the range of my data and I need to minimize record size.
In other words, you've decided to trade accuracy for performance...
I cannot agree with this...
1. If I choose the float4 type to store a bid price (which usually has 5-6
significant digits), I do not lose precision and accuracy does not suffer.
Accuracy matters when I calculate the sum of prices. But the assumption
that the accuracy of the sum should depend on the type of the column is
not obvious. It may be more or less clear to C programmers, but not to
SQL users.
In all databases I have tested, SUM of single-precision floats is
calculated using at least double-precision numbers (or the numeric type).
2. There is no huge performance gap between accumulating in float4 and
float8 - certainly not "orders of magnitude":
postgres=# select sum(l_quantity) from lineitem_projection;
Time: 4659.509 ms (00:04.660)
postgres=# select sum(l_quantity::float8) from lineitem_projection;
Time: 5465.320 ms (00:05.465)
So I do not think there is actually a compromise here between performance
and accuracy.
But the current implementation leads to much confusion and contradicts
users' expectations:
1. sum(l_quantity) and sum(l_quantity::float8) produce completely
different results (off by a factor of 1.5!!! - we lose half a billion dollars :)
2. avg(l_quantity)*count(l_quantity) is not equal to sum(l_quantity), but
when casting to float8 the results match.
3. The sum of per-group aggregates is not equal to the total sum (once
again, no problem with the float8 type).
But when I am calculating a sum, I expect to receive a more or less
precise result. Certainly I realize that even when using double it is
... but now you want to trade performance for accuracy? Why would you
expect the database to magically come to that conclusion?
See above. There is no trading here. Please notice that the current
Postgres implementation of the AVG aggregate calculates both the sum and
the sum of squares, even though the latter is not needed for AVG.
The comment in the code says:
* It might seem attractive to optimize this by having multiple accumulator
* functions that only calculate the sums actually needed. But on most
* modern machines, a couple of extra floating-point multiplies will be
* insignificant compared to the other per-tuple overhead, so I've chosen
* to minimize code space instead.
And it is true!
In addition to the results above, here are the timings for AVG:
postgres=# select avg(l_quantity) from lineitem_projection;
postgres=# select avg(l_quantity::float8) from lineitem_projection;
Please notice that avg for float4 is calculated using float4_accum, which
uses a float8 accumulator and also calculates sumX2!
Time: 6103.807 ms (00:06.104)
So I do not see any reasonable argument here for using float4pl for SUM(float4).
And I do not know of any database which has such strange behavior.
I know that "be like others", or especially "be like Oracle", is never a
good argument in the Postgres community, but doing something differently
(and, IMHO, wrongly) without any significant reason seems very strange.
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company