Hi hackers,

We have found that in parallel mode the result of some queries is
non-deterministic when the columns involved are of type double precision
(floating-point).

Our example is based on TPC-H, but with the type of some NUMERIC columns
changed to DOUBLE PRECISION.

When running without parallelism

tpch=# set max_parallel_workers_per_gather to 0;
SET
tpch=# select sum(l_extendedprice) from lineitem where l_shipdate <= date
'1998-12-01' - interval '116 days';
       sum
------------------
 448157055361.319
(1 row)

the output is always the same.

But in parallel mode

tpch=# set max_parallel_workers_per_gather to 1;
SET
tpch=# select sum(l_extendedprice) from lineitem where l_shipdate <= date
'1998-12-01' - interval '116 days';
       sum
------------------
 448157055361.341
(1 row)

tpch=# select sum(l_extendedprice) from lineitem where l_shipdate <= date
'1998-12-01' - interval '116 days';
       sum
-----------------
 448157055361.348
(1 row)

the result differs between runs.

That is because floating-point addition is not necessarily associative.
That is, (a + b) + c is not necessarily equal to a + (b + c).
In parallel mode, the order in which the attribute values are added
(summed) changes between runs, which leads to non-deterministic results.
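
The effect can be reproduced outside the database. The following is only a
minimal Python sketch, not PostgreSQL's actual aggregation code: the helper
names left_to_right_sum and two_worker_sum and the three sample values are
made up for illustration. It models one leader combining the partial sums of
two workers, and shows that the total depends on where the rows are split.

```python
def left_to_right_sum(values):
    """Add IEEE 754 doubles strictly left to right, as a serial scan would."""
    total = 0.0
    for v in values:
        total += v
    return total


def two_worker_sum(values, split):
    """Hypothetical model of a parallel sum: two workers each add their own
    chunk, then the partial sums are combined."""
    first = left_to_right_sum(values[:split])
    second = left_to_right_sum(values[split:])
    return first + second


values = [0.1, 0.2, 0.3]

serial = left_to_right_sum(values)      # (0.1 + 0.2) + 0.3
split_at_1 = two_worker_sum(values, 1)  # 0.1 + (0.2 + 0.3)
split_at_2 = two_worker_sum(values, 2)  # (0.1 + 0.2) + 0.3

print(repr(serial))      # 0.6000000000000001
print(repr(split_at_1))  # 0.6
print(repr(split_at_2))  # 0.6000000000000001
```

The explicit loop is deliberate: recent Python versions use a compensated
algorithm in the built-in sum() for floats, which would hide the effect.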

Is this desirable behavior?

-- 

Best Regards,
Ruben. <ru...@ispras.ru>
ISP RAS.
