dongjoon-hyun edited a comment on pull request #31319: URL: https://github.com/apache/spark/pull/31319#issuecomment-767337183
Thanks, @cloud-fan .

1. For the vectorized code path (if we take it), we would need to **adjust** values while building the columnar vectors, which would cause a performance regression similar to time-related rebasing.
2. For the concerns, I replied here (https://github.com/apache/spark/pull/31319#discussion_r564264345). The basic idea is that Spark's decimal object should be used as an object: to interpret its `longVal`, the user should rely on the object's own precision and scale instead of the upper layer's schema assumption (see the sketch after this list). Inside Apache Spark, I don't think we have such code in the non-vectorized path, but I'm not sure about 3rd-party libraries or downstream projects. Have you seen conflicts caused by this user assumption?

   ```scala
   final class Decimal extends Ordered[Decimal] with Serializable {
     import org.apache.spark.sql.types.Decimal._

     private var decimalVal: BigDecimal = null
     private var longVal: Long = 0L
     private var _precision: Int = 1
     private var _scale: Int = 0

     def precision: Int = _precision
     def scale: Int = _scale
   ```
3. As a side note, the AS-IS Spark implementation holds a wrong value in the decimal object, and we cannot recover from those wrong values. I believe this PR is better because, after this PR, we can recover in the upper layer by fixing the upper layer's usage or the users' assumptions.
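To illustrate the idea in point 2, here is a minimal sketch (the object name `DecimalScaleExample`, the concrete precision/scale values, and `schemaScaleAssumedByCaller` are hypothetical and only for illustration; the `Decimal` methods used are its existing public API). It shows that a consumer should interpret the unscaled value through the decimal object's own `precision`/`scale` (or simply `toBigDecimal`) rather than re-scaling it with a scale assumed from an upper layer's schema.

```scala
import org.apache.spark.sql.types.Decimal

object DecimalScaleExample {
  def main(args: Array[String]): Unit = {
    // A decimal value 1.23 stored as unscaled long 123 with precision 10, scale 2.
    // (These concrete numbers are just for illustration.)
    val d = Decimal(123L, 10, 2)

    // Correct: interpret the value via the object itself, either through
    // toBigDecimal or by combining the unscaled long with the object's own scale.
    val correct     = d.toBigDecimal                        // 1.23
    val alsoCorrect = BigDecimal(d.toUnscaledLong, d.scale) // 1.23

    // Wrong (the assumption this comment warns about): re-scaling the unscaled
    // long with a scale taken from an upper layer's schema instead of the object.
    val schemaScaleAssumedByCaller = 4 // hypothetical schema-level scale
    val wrong = BigDecimal(d.toUnscaledLong, schemaScaleAssumedByCaller) // 0.0123

    println(s"correct = $correct, alsoCorrect = $alsoCorrect, wrong = $wrong")
  }
}
```

Under this reading, a downstream consumer that honors `d.scale` keeps working regardless of how the upper layer declared the column, which is why the concern is limited to code that bypasses the object and trusts the schema directly.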
