dongjoon-hyun edited a comment on pull request #31319: URL: https://github.com/apache/spark/pull/31319#issuecomment-767337183
Thanks, @cloud-fan .

1. For the vectorized code path (if we take it), we would need to **adjust** values while building the columnar vectors, which would cause a performance regression similar to time-related rebasing.
2. For the concerns, I replied here (https://github.com/apache/spark/pull/31319#discussion_r564264345). The basic idea is that Spark's decimal object should be used as an object: to interpret its `longVal`, the user should rely on the object's own precision and scale instead of the upper layer's schema assumption (see the sketch after this list). Inside Apache Spark, I don't think we have such code in the non-vectorized path, but I'm not sure about 3rd-party libraries or downstream projects. Have you seen conflicts caused by this user assumption?

   ```scala
   final class Decimal extends Ordered[Decimal] with Serializable {
     import org.apache.spark.sql.types.Decimal._

     private var decimalVal: BigDecimal = null
     private var longVal: Long = 0L
     private var _precision: Int = 1
     private var _scale: Int = 0

     def precision: Int = _precision
     def scale: Int = _scale
   ```
3. As a side note, the AS-IS Spark implementation holds a wrong value in the decimal object, and we cannot recover from those wrong values. I believe this PR is better because, after this PR, we can recover in the upper layer by fixing the upper layer's usage or the users' assumptions.
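To illustrate the idea in point 2, here is a minimal sketch (the object name `DecimalScaleExample`, the concrete precision/scale values, and `schemaScaleAssumedByCaller` are hypothetical and only for illustration; the `Decimal` methods used are its existing public API). It shows that a consumer should interpret the unscaled value through the decimal object's own `precision`/`scale` (or simply `toBigDecimal`) rather than re-scaling it with a scale assumed from an upper layer's schema.

```scala
import org.apache.spark.sql.types.Decimal

object DecimalScaleExample {
  def main(args: Array[String]): Unit = {
    // A decimal value 1.23 stored as unscaled long 123 with precision 10, scale 2.
    // (These concrete numbers are just for illustration.)
    val d = Decimal(123L, 10, 2)

    // Correct: interpret the value via the object itself, either through
    // toBigDecimal or by combining the unscaled long with the object's own scale.
    val correct     = d.toBigDecimal                        // 1.23
    val alsoCorrect = BigDecimal(d.toUnscaledLong, d.scale) // 1.23

    // Wrong (the assumption this comment warns about): re-scaling the unscaled
    // long with a scale taken from an upper layer's schema instead of the object.
    val schemaScaleAssumedByCaller = 4 // hypothetical schema-level scale
    val wrong = BigDecimal(d.toUnscaledLong, schemaScaleAssumedByCaller) // 0.0123

    println(s"correct = $correct, alsoCorrect = $alsoCorrect, wrong = $wrong")
  }
}
```

Under this reading, a downstream consumer that honors `d.scale` keeps working regardless of how the upper layer declared the column, which is why the concern is limited to code that bypasses the object and trusts the schema directly.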
