Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/20023
  
    @gatorsmile that is scenario 3. Let me explain why, and then I will correct the summary I gave in my last e-mail, because I made a mistake about how DB2 computes the result precision and scale, sorry for that.
    
    Anyway, what you showed is an example of point 3, because DB2 computes the result type as `DECIMAL( MIN(31, p1 + p2), MIN(31, s1 + s2) )`. Therefore, in your case the result type was `DECIMAL(31, 31)`. Since `DECIMAL(31, 31)` leaves no digits for the integer part and your result had a non-zero integer part, it was out of the range of the representable values and an overflow exception was thrown.
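    Here is a minimal sketch (in Scala, with made-up names, not actual DB2 code) of that type-derivation rule for a multiplication of `DECIMAL(p1, s1)` and `DECIMAL(p2, s2)`:
    ```
    case class DecType(precision: Int, scale: Int)

    // DB2's rule for multiplication: add precisions and scales,
    // capping both at 31.
    def db2MultiplyType(t1: DecType, t2: DecType): DecType =
      DecType(
        math.min(31, t1.precision + t2.precision),
        math.min(31, t1.scale + t2.scale))

    // DECIMAL(31, 31) * DECIMAL(31, 31) => DECIMAL(31, 31): no digit is
    // left for the integer part, so any product >= 1 overflows.
    db2MultiplyType(DecType(31, 31), DecType(31, 31))  // DecType(31, 31)
    ```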
    You can reproduce case 2 as follows:
    ```
    db2 => create table decimals_test (id int, a decimal(31,31), b decimal(31,31))
    DB20000I  The SQL command completed successfully.
    db2 => insert into decimals_test values(1, 0.12345678912345678912345689, 0.12345678912345678912345)
    DB20000I  The SQL command completed successfully.
    db2 => select a*b from decimals_test
    
    1                                
    ---------------------------------
     .0152415787806736785461049526020
    ```
    As you can see, a truncation occurred.
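    The difference between truncating and rounding can be reproduced with plain `java.math.BigDecimal` (a sketch, using scale 31 as in the DB2 example above):
    ```
    import java.math.{BigDecimal => JBigDecimal, RoundingMode}

    val a = new JBigDecimal("0.12345678912345678912345689")
    val b = new JBigDecimal("0.12345678912345678912345")
    val exact = a.multiply(b)  // 49 fractional digits, too many for scale 31

    // DB2-like behavior: drop the extra digits (matches the output above)
    val truncated = exact.setScale(31, RoundingMode.DOWN)
    // Oracle/Hive/SQLServer-like behavior (and Spark after the PR): round
    val rounded = exact.setScale(31, RoundingMode.HALF_UP)
    ```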
    
    Now, let me amend my table to summarize the behavior of the many DBs:
    1. **Rules to determine precision and scale**
         - *Hive, SQLServer (and Spark after the PR)*: I won't include the exact formulas; the relevant part is that when the needed precision is higher than the maximum value, we use the maximum available value (38) as the precision, and as the scale the maximum between the needed scale (computed according to the relevant formula) and a minimum value guaranteed for the scale, which is 6 (see the sketch after this list).
         - *DB2*: computes the result type as `DECIMAL( MIN(31, p1 + p2), MIN(31, s1 + s2) )`.
         - *Postgres and Oracle*: NA.
         - *SQL ANSI 2011*: no indication.
         - *Spark now*: if the needed precision is more than 38, use 38 as the precision; use the needed scale without any adjustment.
    
    2. **Behavior in case of precision loss but result in the range of the representable values**
         - *Oracle, Hive, SQLServer (and Spark after the PR)*: round the result.
         - *DB2*: truncates the result (and sets a warning flag).
         - *Postgres*: NA, it has infinite precision...
         - *SQL ANSI 2011*: either truncate or round the value.
         - *Spark now*: returns NULL.
    
    3. **Behavior in case of result out of the range of the representable values (i.e. overflow)**
         - *Oracle, DB2, SQLServer*: throw an exception.
         - *Postgres*: NA, it has nearly infinite precision...
         - *SQL ANSI 2011*: an exception should be raised.
         - *Spark now, Hive*: return NULL (for Hive, there is an open ticket to make it compliant with the SQL standard).
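    As for rule 1, here is a minimal sketch of the adjustment used by Hive/SQLServer (and Spark after the PR); the constant and function names are made up for illustration, this is not the exact Spark code:
    ```
    val MAX_PRECISION = 38
    val MINIMUM_ADJUSTED_SCALE = 6

    // When the needed precision exceeds 38, cap it at 38 and shrink the
    // scale accordingly, but never below a guaranteed minimum of 6 digits
    // (or below the needed scale itself, when that is smaller than 6).
    def adjustPrecisionScale(precision: Int, scale: Int): (Int, Int) =
      if (precision <= MAX_PRECISION) {
        (precision, scale)
      } else {
        val intDigits = precision - scale
        val minScale = math.min(scale, MINIMUM_ADJUSTED_SCALE)
        val adjustedScale = math.max(MAX_PRECISION - intDigits, minScale)
        (MAX_PRECISION, adjustedScale)
      }

    adjustPrecisionScale(44, 20)  // (38, 14): all 24 integer digits are kept
    ```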


