revans2 commented on code in PR #38739:
URL: https://github.com/apache/spark/pull/38739#discussion_r1030661428
##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecisionSuite.scala:
##########
@@ -276,9 +276,9 @@ class DecimalPrecisionSuite extends AnalysisTest with
BeforeAndAfter {
val a = AttributeReference("a", DecimalType(3, -10))()
val b = AttributeReference("b", DecimalType(1, -1))()
val c = AttributeReference("c", DecimalType(35, 1))()
- checkType(Multiply(a, b), DecimalType(5, -11))
- checkType(Multiply(a, c), DecimalType(38, -9))
- checkType(Multiply(b, c), DecimalType(37, 0))
+ checkType(Multiply(a, b), DecimalType(16, 0))
Review Comment:
Sorry about the long explanation. I am not sure how to make it any shorter.
By definition a decimal multiply will add the scales.
https://docs.oracle.com/javase/8/docs/api/java/math/BigDecimal.html there is no
special case for negative scale vs positive scale. It adds them.
```
scala> val lhs = BigDecimal("123E10")
lhs: scala.math.BigDecimal = 1.23E+12
scala> lhs.scale
res0: Int = -10
scala> val rhs = BigDecimal("9E1")
rhs: scala.math.BigDecimal = 9E+1
scala> rhs.scale
res1: Int = -1
scala> val ret = rhs * lhs
ret: scala.math.BigDecimal = 1.107E+14
scala> ret.scale
res2: Int = -11
```
By definition the resulting precision of a multiply can at most be
`lhs.precision + rhs.precision + 1`. Again there is no call out for negative
scale or positive scale. This is in the SQL standard and also how decimal math
works. The problem is that you are resetting the values to have a scale of 0,
and effectively changing the precision of the result. This does not follow the
standard rules for decimal operations. You are effectively increasing the
precision of the value. You get an equivalent answer, but the cost of storing
the result is now much greater. So much so that many results will not fit and
result in an overflow when they could have fit if the result had a negative
scale.
I am -1 on this change as is. It is wrong to modify multiply in this way.
Divide is technically wrong in a number of ways with negative scale decimal
values even before 3.4.0. The scale of a divide is LHS.scale - RHS.scale. This
includes negative scale values. There is technically a great deal of scale loss
when doing a decimal divide. Spark follows Hive and most other SQL
implementations by having at least a scale of 6 in the output of the divide,
and some complicated math to calculate the output precision that goes with it.
But that does not deal with negative values cleanly. It would be really nice to
know what Hive does in these cases, or what MsSQL does, or really anything that
supports negative scale. Do they all return errors? What type do they return if
it is not an error? It appears that the SQL spec itself has the bug in it, and
I am not inclined to "fix" the spec without some understanding of what others
are doing too.
The reason `IntegralDivide` is failing now, where it didn't before, is
because the output result was never checked for overflow before being converted
into a Long. `IntegralDivide` returns a Long. The overflow check in 3.4.0 that
happens after the divide is totally skipped in previous versions because the
output is a long. Look at the rule in DecimalPrecision
https://github.com/apache/spark/blob/fbbcf9434ac070dd4ced4fb9efe32899c6db12a9/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecision.scala#L192-L202
If the output type is not a decimal do nothing. If you want to have
`IntegralDivide` act the same as before you need to just remove the overflow
check in it. I am not sure if that is a good thing or not.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]