[GitHub] [spark] revans2 commented on a diff in pull request #38739: [SPARK-41207][SQL] Fix BinaryArithmetic with negative scale

GitBox Wed, 23 Nov 2022 08:34:49 -0800


revans2 commented on code in PR #38739:
URL: https://github.com/apache/spark/pull/38739#discussion_r1030661428



##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecisionSuite.scala:
##########
@@ -276,9 +276,9 @@ class DecimalPrecisionSuite extends AnalysisTest with 
BeforeAndAfter {
       val a = AttributeReference("a", DecimalType(3, -10))()
       val b = AttributeReference("b", DecimalType(1, -1))()
       val c = AttributeReference("c", DecimalType(35, 1))()
-      checkType(Multiply(a, b), DecimalType(5, -11))
-      checkType(Multiply(a, c), DecimalType(38, -9))
-      checkType(Multiply(b, c), DecimalType(37, 0))
+      checkType(Multiply(a, b), DecimalType(16, 0))

Review Comment:
   Sorry about the long explanation. I am not sure how to make it any shorter. 
   
   By definition a decimal multiply will add the scales. 
https://docs.oracle.com/javase/8/docs/api/java/math/BigDecimal.html there is no 
special case for negative scale vs positive scale. It adds them.
   
   ```
   scala> val lhs = BigDecimal("123E10")
   lhs: scala.math.BigDecimal = 1.23E+12
   
   scala> lhs.scale
   res0: Int = -10
   
   scala> val rhs = BigDecimal("9E1")
   rhs: scala.math.BigDecimal = 9E+1
   
   scala> rhs.scale
   res1: Int = -1
   
   scala> val ret = rhs * lhs
   ret: scala.math.BigDecimal = 1.107E+14
   
   scala> ret.scale
   res2: Int = -11
   ```
   
   By definition the resulting precision of a multiply can at most be 
`lhs.precision + rhs.precision + 1`. Again there is no call out for negative 
scale or positive scale. This is in the SQL standard and also how decimal math 
works. The problem is that you are resetting the values to have a scale of 0, 
and effectively changing the precision of the result. This does not follow the 
standard rules for decimal operations. You are effectively increasing the 
precision of the value. You get an equivalent answer, but the cost of storing 
the result is now much greater. So much so that many results will not fit and 
result in an overflow when they could have fit if the result had a negative 
scale.
   
   I am -1 on this change as is.  It is wrong to modify multiply in this way.  
   
   Divide is technically wrong in a number of ways with negative scale decimal 
values even before 3.4.0. The scale of a divide is LHS.scale - RHS.scale. This 
includes negative scale values. There is technically a great deal of scale loss 
when doing a decimal divide. Spark follows Hive and most other SQL 
implementations by having at least a scale of 6 in the output of the divide, 
and some complicated math to calculate the output precision that goes with it. 
But that does not deal with negative values cleanly. It would be really nice to 
know what Hive does in these cases, or what MsSQL does, or really anything that 
supports negative scale. Do they all return errors? What type do they return if 
it is not an error? It appears that the SQL spec itself has the bug in it, and 
I am not inclined to "fix" the spec without some understanding of what others 
are doing too.
   
   The reason `IntegralDivide` is failing now, where it didn't before, is 
because the output result was never checked for overflow before being converted 
into a Long.  `IntegralDivide` returns a Long. The overflow check in 3.4.0 that 
happens after the divide is totally skipped in previous versions because the 
output is a long.  Look at the rule in DecimalPrecision
   
   
https://github.com/apache/spark/blob/fbbcf9434ac070dd4ced4fb9efe32899c6db12a9/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecision.scala#L192-L202
   
   If the output type is not a decimal do nothing.  If you want to have 
`IntegralDivide` act the same as before you need to just remove the overflow 
check in it. I am not sure if that is a good thing or not.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [spark] revans2 commented on a diff in pull request #38739: [SPARK-41207][SQL] Fix BinaryArithmetic with negative scale

Reply via email to