Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/20023
@gatorsmile, please refer to the [e-mail to the dev mailing list](https://mail-archives.apache.org/mod_mbox/spark-dev/201712.mbox/%3CCAEorWNAJ4TxJR9NBcgSFMD_VxTg8qVxusjP%2BAJP-x%2BJV9zH-yA%40mail.gmail.com%3E) for further details. I ran the script I added to the tests in this PR; the results are:
- Hive behaves exactly like Spark after this PR;
- SQLServer behaves the same, although on additions and subtractions it seems to keep one more digit of precision in some cases (I am running SQLServer 2017; since the Hive implementation, and therefore this one too, is inspired by SQLServer 2005, there may have been a small behavior change here). Unlike Hive and Spark, however, it throws an exception in case 3 described in the e-mail (this is compliant with the SQL standard; point 3 of the e-mail is out of scope for this PR, and I will create another PR for it once we agree on how to handle that case);
- Oracle and Postgres have nearly unlimited precision, so it is nearly impossible to provoke rounding on them. If we force a precision loss on them (point 3 of the e-mail, out of scope for this PR), they throw an exception (compliant with the SQL standard and with SQLServer).
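For reference, the SQLServer-2005-style result-type rules that Hive follows can be sketched roughly as below. This is an illustrative Python sketch of the multiply/divide cases only, not the actual Spark code; the constants (max precision 38, minimum scale 6) are the ones these rules use.

```python
MAX_PRECISION = 38
MIN_SCALE = 6  # minimum number of fractional digits kept on overflow

def adjust(precision, scale):
    """When the exact result type does not fit in 38 digits, keep the
    integral digits and cut fractional digits, but never below min(scale, 6)."""
    if precision <= MAX_PRECISION:
        return precision, scale
    int_digits = precision - scale
    min_scale = min(scale, MIN_SCALE)
    adjusted_scale = max(MAX_PRECISION - int_digits, min_scale)
    return MAX_PRECISION, adjusted_scale

def multiply_type(p1, s1, p2, s2):
    # exact result type of decimal(p1,s1) * decimal(p2,s2)
    return adjust(p1 + p2 + 1, s1 + s2)

def divide_type(p1, s1, p2, s2):
    # exact result type of decimal(p1,s1) / decimal(p2,s2)
    scale = max(MIN_SCALE, s1 + p2 + 1)
    return adjust(p1 - s1 + s2 + scale, scale)

# decimal(38,18) op decimal(38,18): both exact types overflow 38 digits
# and are adjusted to (38, 6), which matches the six fractional digits
# of a*b and a/b in the Hive output below.
print(multiply_type(38, 18, 38, 18))  # (38, 6)
print(divide_type(38, 18, 38, 18))    # (38, 6)
```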
Here are the outputs of the queries.
**Hive 2.3.0 (same as Spark after PR)**
```
0: jdbc:hive2://localhost:10000> create table decimals_test(id int, a decimal(38,18), b decimal(38,18));
No rows affected (2.085 seconds)
0: jdbc:hive2://localhost:10000> insert into decimals_test values (1, 100.0, 999.0), (2, 12345.123, 12345.123), (3, 0.1234567891011, 1234.1), (4, 123456789123456789.0, 1.123456789123456789);
No rows affected (14.054 seconds)
0: jdbc:hive2://localhost:10000> select id, a+b, a-b, a*b, a/b from decimals_test order by id;
+-----+---------------------------------------+---------------------------------------+----------------------------+----------------------------+
| id  | _c1                                   | _c2                                   | _c3                        | _c4                        |
+-----+---------------------------------------+---------------------------------------+----------------------------+----------------------------+
| 1   | 1099.00000000000000000                | -899.00000000000000000                | 99900.000000               | 0.100100                   |
| 2   | 24690.24600000000000000               | 0E-17                                 | 152402061.885129           | 1.000000                   |
| 3   | 1234.22345678910110000                | -1233.97654321089890000               | 152.358023                 | 0.000100                   |
| 4   | 123456789123456790.12345678912345679  | 123456789123456787.87654321087654321  | 138698367904130467.515623  | 109890109097814272.043109  |
+-----+---------------------------------------+---------------------------------------+----------------------------+----------------------------+
```
**SQLServer 2017**
```
1> create table decimals_test(id int, a decimal(38,18), b decimal(38,18));
2> insert into decimals_test values (1, 100.0, 999.0), (2, 12345.123, 12345.123), (3, 0.1234567891011, 1234.1), (4, 123456789123456789.0, 1.123456789123456789);
3> select id, a+b, a-b, a*b, a/b from decimals_test order by id;
4> GO
(4 rows affected)
id
----------- ---------------------------------------- ---------------------------------------- ---------------------------------------- ----------------------------------------
          1                  1099.000000000000000000                  -899.000000000000000000                             99900.000000                                  .100100
          2                 24690.246000000000000000                      .000000000000000000                         152402061.885129                                 1.000000
          3                  1234.223456789101100000                 -1233.976543210898900000                               152.358023                                  .000100
          4    123456789123456790.123456789123456789    123456789123456787.876543210876543211                138698367904130467.515623                109890109097814272.043109
```
**Postgres and Oracle**
```
postgres=# create table decimals_test(id int, a decimal(38,18), b decimal(38,18));
CREATE TABLE
postgres=# insert into decimals_test values (1, 100.0, 999.0), (2, 12345.123, 12345.123), (3, 0.1234567891011, 1234.1), (4, 123456789123456789.0, 1.123456789123456789);
INSERT 0 4
postgres=# select id, a+b, a-b, a*b, a/b from decimals_test order by id;
 id |               ?column?                |               ?column?                |                        ?column?                         |               ?column?
----+---------------------------------------+---------------------------------------+---------------------------------------------------------+---------------------------------------
  1 |               1099.000000000000000000 |               -899.000000000000000000 |              99900.000000000000000000000000000000000000 |                0.10010010010010010010
  2 |              24690.246000000000000000 |                  0.000000000000000000 |          152402061.885129000000000000000000000000000000 |                1.00000000000000000000
  3 |               1234.223456789101100000 |              -1233.976543210898900000 |                152.358023429667510000000000000000000000 |            0.000100037913541123085649
  4 | 123456789123456790.123456789123456789 | 123456789123456787.876543210876543211 | 138698367904130467.515622620750190521000000000000000000 |  109890109097814272.043109406191131436
(4 rows)
```
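As a side note, the near-unlimited precision that Postgres and Oracle show above is what any arbitrary-precision decimal library gives you; a minimal Python illustration (unrelated to Spark's implementation), assuming a context precision large enough that no exact result here is rounded:

```python
from decimal import Decimal, getcontext

# Mimic Postgres/Oracle: give the context far more digits than any
# exact result below needs, so nothing is ever forced to round.
getcontext().prec = 60

a = Decimal("123456789123456789.0")
b = Decimal("1.123456789123456789")

# Multiplication is exact: the product keeps all fractional digits,
# instead of being rounded to a fixed result scale as in Hive/SQLServer.
print(a * b)  # 138698367904130467.5156226207501905210

# Division of 100.0 by 999.0 yields the repeating expansion 0.100100100...,
# cut only at the 60-significant-digit context limit.
print(Decimal("100.0") / Decimal("999.0"))
```

Forcing such a value into a bounded type (point 3 of the e-mail) is exactly where these systems raise an error instead of rounding.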
**Spark before the PR**
```
scala> sql("create table decimals_test(id int, a decimal(38,18), b decimal(38,18)) using parquet")
res0: org.apache.spark.sql.DataFrame = []
scala> sql("insert into decimals_test values (1, 100.0, 999.0), (2, 12345.123, 12345.123), (3, 0.1234567891011, 1234.1), (4, 123456789123456789.0, 1.123456789123456789)")
res1: org.apache.spark.sql.DataFrame = []
scala> sql("select id, a+b, a-b, a*b, a/b from decimals_test order by id").show(truncate = false)
+---+-------------------------------------+-------------------------------------+-------+-------------------------------------+
|id |(a + b)                              |(a - b)                              |(a * b)|(a / b)                              |
+---+-------------------------------------+-------------------------------------+-------+-------------------------------------+
|1  |1099.000000000000000000              |-899.000000000000000000              |null   |0.100100100100100100                 |
|2  |24690.246000000000000000             |0E-18                                |null   |1.000000000000000000                 |
|3  |1234.223456789101100000              |-1233.976543210898900000             |null   |0.000100037913541123                 |
|4  |123456789123456790.123456789123456789|123456789123456787.876543210876543211|null   |109890109097814272.043109406191131436|
+---+-------------------------------------+-------------------------------------+-------+-------------------------------------+
```