wangyum opened a new pull request #26697: [SPARK-28461][SQL] Pad Decimal 
numbers with trailing zeros to the scale of the column
URL: https://github.com/apache/spark/pull/26697
 
 
   ## What changes were proposed in this pull request?
   
   [HIVE-12063](https://issues.apache.org/jira/browse/HIVE-12063) improved padding of decimal numbers with trailing zeros to the scale of the column. The following description is copied from HIVE-12063:
   
   > HIVE-7373 was to address the problems of Hive trimming trailing zeros, which caused many issues, including treating 0.0, 0.00, and so on as 0, even though they have different precision/scale. Please refer to the HIVE-7373 description. However, HIVE-7373 was reverted by HIVE-8745 while the underlying problems remained. HIVE-11835 was resolved recently to address one of the problems, where 0.0, 0.00, and so on could not be read into decimal(1,1).
   > However, HIVE-11835 didn't address the problem of showing 0 in query results for decimal values such as 0.0, 0.00, etc. This causes confusion, as 0.0 and 0.00 have different precision/scale than 0.
   > The proposal here is to pad query results with zeros up to the type's scale. This not only removes the confusion described above, but also aligns with many other DBs. The internal decimal number representation doesn't change, however.
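   The padding described above only affects how a value is rendered, not how it is stored. A minimal sketch of the idea using plain `java.math.BigDecimal` (this is an illustration of the technique, not Spark's actual implementation):

   ```java
   import java.math.BigDecimal;

   public class PadDecimalSketch {
       // Render a decimal value padded with trailing zeros up to the
       // declared scale of the column, e.g. decimal(38, 18).
       static String padToScale(BigDecimal value, int columnScale) {
           // setScale with a larger scale only appends zeros; it never
           // rounds, so the numeric value is unchanged.
           return value.setScale(columnScale).toPlainString();
       }

       public static void main(String[] args) {
           // cast(1 as decimal(38, 18)) should display as 1.000000000000000000
           System.out.println(padToScale(new BigDecimal("1"), 18));
           // 0.0 keeps its own scale information rather than collapsing to 0
           System.out.println(padToScale(new BigDecimal("0.0"), 1));
       }
   }
   ```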
   
   **Spark SQL**:
   ```sql
   // bin/spark-sql
   spark-sql> select cast(1 as decimal(38, 18));
   1
   spark-sql>
   
   // bin/beeline
   0: jdbc:hive2://localhost:10000/default> select cast(1 as decimal(38, 18));
   +----------------------------+--+
   | CAST(1 AS DECIMAL(38,18))  |
   +----------------------------+--+
   | 1.000000000000000000       |
   +----------------------------+--+
   
   // bin/spark-shell
   scala> spark.sql("select cast(1 as decimal(38, 18))").show(false)
   +-------------------------+
   |CAST(1 AS DECIMAL(38,18))|
   +-------------------------+
   |1.000000000000000000     |
   +-------------------------+
   
   // bin/pyspark
   >>> spark.sql("select cast(1 as decimal(38, 18))").show()
   +-------------------------+
   |CAST(1 AS DECIMAL(38,18))|
   +-------------------------+
   |     1.000000000000000000|
   +-------------------------+
   
   // bin/sparkR
   > showDF(sql("SELECT cast(1 as decimal(38, 18))"))
   +-------------------------+
   |CAST(1 AS DECIMAL(38,18))|
   +-------------------------+
   |     1.000000000000000000|
   +-------------------------+
   ```
   
   **PostgreSQL**:
   ```sql
   postgres=# select cast(1 as decimal(38, 18));
          numeric
   ----------------------
    1.000000000000000000
   (1 row)
   ```
   **Presto**:
   ```sql
   presto> select cast(1 as decimal(38, 18));
           _col0
   ----------------------
    1.000000000000000000
   (1 row)
   ```
   
   ## How was this patch tested?
   
   Unit tests and a manual test:
   ```sql
   spark-sql> select cast(1 as decimal(38, 18));
   1.000000000000000000
   ```
   Spark SQL Upgrading Guide:
   
![image](https://user-images.githubusercontent.com/5399861/69649620-4405c380-10a8-11ea-84b1-6ee675663b98.png)
   
