Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/20023
@gatorsmile, please refer to the [e-mail to the dev mailing list](https://mail-archives.apache.org/mod_mbox/spark-dev/201712.mbox/%3CCAEorWNAJ4TxJR9NBcgSFMD_VxTg8qVxusjP%2BAJP-x%2BJV9zH-yA%40mail.gmail.com%3E) for further details. I ran the script I added to the tests in this PR; the results are:
- Hive behaves exactly like Spark after this PR;
- SQLServer behaves the same, although on additions and subtractions it seems to keep one more digit of precision in some cases (I am running SQLServer 2017; since the Hive implementation, and therefore this one too, is inspired by SQLServer 2005, there may have been a small behavior change here). Unlike Hive and Spark, however, it throws an exception in case 3 described in the e-mail (this is compliant with the SQL standard; point 3 of the e-mail is out of scope for this PR, and I will create another PR for it once we agree on how to handle that case);
- Oracle and Postgres have nearly unlimited precision, so it is nearly impossible to provoke rounding on them. If we force a precision loss on them (point 3 of the e-mail, out of scope for this PR), they throw an exception (compliant with the SQL standard and with SQLServer).
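For reference, the SQLServer-2005-style result-type rules that Hive follows can be sketched roughly as below. This is an illustrative Python sketch of the multiply/divide cases only, not the actual Spark code; the constants (max precision 38, minimum scale 6) are the ones these rules use.

```python
MAX_PRECISION = 38
MIN_SCALE = 6  # minimum number of fractional digits kept on overflow

def adjust(precision, scale):
    """When the exact result type does not fit in 38 digits, keep the
    integral digits and cut fractional digits, but never below min(scale, 6)."""
    if precision <= MAX_PRECISION:
        return precision, scale
    int_digits = precision - scale
    min_scale = min(scale, MIN_SCALE)
    adjusted_scale = max(MAX_PRECISION - int_digits, min_scale)
    return MAX_PRECISION, adjusted_scale

def multiply_type(p1, s1, p2, s2):
    # exact result type of decimal(p1,s1) * decimal(p2,s2)
    return adjust(p1 + p2 + 1, s1 + s2)

def divide_type(p1, s1, p2, s2):
    # exact result type of decimal(p1,s1) / decimal(p2,s2)
    scale = max(MIN_SCALE, s1 + p2 + 1)
    return adjust(p1 - s1 + s2 + scale, scale)

# decimal(38,18) op decimal(38,18): both exact types overflow 38 digits
# and are adjusted to (38, 6), which matches the six fractional digits
# of a*b and a/b in the Hive output below.
print(multiply_type(38, 18, 38, 18))  # (38, 6)
print(divide_type(38, 18, 38, 18))    # (38, 6)
```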
Here are the outputs of the queries.
**Hive 2.3.0 (same as Spark after PR)**
```
0: jdbc:hive2://localhost:10000> create table decimals_test(id int, a decimal(38,18), b decimal(38,18));
No rows affected (2.085 seconds)
0: jdbc:hive2://localhost:10000> insert into decimals_test values (1, 100.0, 999.0), (2, 12345.123, 12345.123), (3, 0.1234567891011, 1234.1), (4, 123456789123456789.0, 1.123456789123456789);
No rows affected (14.054 seconds)
0: jdbc:hive2://localhost:10000> select id, a+b, a-b, a*b, a/b from decimals_test order by id;
+-----+---------------------------------------+---------------------------------------+----------------------------+----------------------------+
| id  | _c1                                   | _c2                                   | _c3                        | _c4                        |
+-----+---------------------------------------+---------------------------------------+----------------------------+----------------------------+
| 1   | 1099.00000000000000000                | -899.00000000000000000                | 99900.000000               | 0.100100                   |
| 2   | 24690.24600000000000000               | 0E-17                                 | 152402061.885129           | 1.000000                   |
| 3   | 1234.22345678910110000                | -1233.97654321089890000               | 152.358023                 | 0.000100                   |
| 4   | 123456789123456790.12345678912345679  | 123456789123456787.87654321087654321  | 138698367904130467.515623  | 109890109097814272.043109  |
+-----+---------------------------------------+---------------------------------------+----------------------------+----------------------------+
```
**SQLServer 2017**
```
1> create table decimals_test(id int, a decimal(38,18), b decimal(38,18));
2> insert into decimals_test values (1, 100.0, 999.0), (2, 12345.123, 12345.123), (3, 0.1234567891011, 1234.1), (4, 123456789123456789.0, 1.123456789123456789);
3> select id, a+b, a-b, a*b, a/b from decimals_test order by id;
4> GO
(4 rows affected)
id
----------- ---------------------------------------- ---------------------------------------- ---------------------------------------- ----------------------------------------
          1                  1099.000000000000000000                  -899.000000000000000000                             99900.000000                                  .100100
          2                 24690.246000000000000000                      .000000000000000000                         152402061.885129                                 1.000000
          3                  1234.223456789101100000                 -1233.976543210898900000                               152.358023                                  .000100
          4    123456789123456790.123456789123456789    123456789123456787.876543210876543211                138698367904130467.515623                109890109097814272.043109
```
**Postgres and Oracle**
```
postgres=# create table decimals_test(id int, a decimal(38,18), b decimal(38,18));
CREATE TABLE
postgres=# insert into decimals_test values (1, 100.0, 999.0), (2, 12345.123, 12345.123), (3, 0.1234567891011, 1234.1), (4, 123456789123456789.0, 1.123456789123456789);
INSERT 0 4
postgres=# select id, a+b, a-b, a*b, a/b from decimals_test order by id;
 id |               ?column?                |               ?column?                |                        ?column?                         |               ?column?
----+---------------------------------------+---------------------------------------+---------------------------------------------------------+---------------------------------------
  1 |               1099.000000000000000000 |               -899.000000000000000000 |              99900.000000000000000000000000000000000000 |                0.10010010010010010010
  2 |              24690.246000000000000000 |                  0.000000000000000000 |          152402061.885129000000000000000000000000000000 |                1.00000000000000000000
  3 |               1234.223456789101100000 |              -1233.976543210898900000 |                152.358023429667510000000000000000000000 |            0.000100037913541123085649
  4 | 123456789123456790.123456789123456789 | 123456789123456787.876543210876543211 | 138698367904130467.515622620750190521000000000000000000 |  109890109097814272.043109406191131436
(4 rows)
```
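As a side note, the near-unlimited precision that Postgres and Oracle show above is what any arbitrary-precision decimal library gives you; a minimal Python illustration (unrelated to Spark's implementation), assuming a context precision large enough that no exact result here is rounded:

```python
from decimal import Decimal, getcontext

# Mimic Postgres/Oracle: give the context far more digits than any
# exact result below needs, so nothing is ever forced to round.
getcontext().prec = 60

a = Decimal("123456789123456789.0")
b = Decimal("1.123456789123456789")

# Multiplication is exact: the product keeps all fractional digits,
# instead of being rounded to a fixed result scale as in Hive/SQLServer.
print(a * b)  # 138698367904130467.5156226207501905210

# Division of 100.0 by 999.0 yields the repeating expansion 0.100100100...,
# cut only at the 60-significant-digit context limit.
print(Decimal("100.0") / Decimal("999.0"))
```

Forcing such a value into a bounded type (point 3 of the e-mail) is exactly where these systems raise an error instead of rounding.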
**Spark before the PR**
```
scala> sql("create table decimals_test(id int, a decimal(38,18), b decimal(38,18)) using parquet")
res0: org.apache.spark.sql.DataFrame = []
scala> sql("insert into decimals_test values (1, 100.0, 999.0), (2, 12345.123, 12345.123), (3, 0.1234567891011, 1234.1), (4, 123456789123456789.0, 1.123456789123456789)")
res1: org.apache.spark.sql.DataFrame = []
scala> sql("select id, a+b, a-b, a*b, a/b from decimals_test order by id").show(truncate = false)
+---+-------------------------------------+-------------------------------------+-------+-------------------------------------+
|id |(a + b)                              |(a - b)                              |(a * b)|(a / b)                              |
+---+-------------------------------------+-------------------------------------+-------+-------------------------------------+
|1  |1099.000000000000000000              |-899.000000000000000000              |null   |0.100100100100100100                 |
|2  |24690.246000000000000000             |0E-18                                |null   |1.000000000000000000                 |
|3  |1234.223456789101100000              |-1233.976543210898900000             |null   |0.000100037913541123                 |
|4  |123456789123456790.123456789123456789|123456789123456787.876543210876543211|null   |109890109097814272.043109406191131436|
+---+-------------------------------------+-------------------------------------+-------+-------------------------------------+
```