Robert Joseph Evans created SPARK-40129:
-------------------------------------------

             Summary: Decimal multiply can produce the wrong answer because it rounds twice
                 Key: SPARK-40129
                 URL: https://issues.apache.org/jira/browse/SPARK-40129
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.3.0, 3.2.0, 3.4.0
            Reporter: Robert Joseph Evans


This looks like it has been around for a long time, but I have reproduced it in 3.2.0+.

The example here multiplies a Decimal(38, 10) by another Decimal(38, 10), but I think it can be reproduced with other precision and scale combinations, and possibly with divide too.
{code:scala}
Seq("9173594185998001607642838421.5479932913").toDF
  .selectExpr("CAST(value as DECIMAL(38,10)) as a")
  .selectExpr("a * CAST(-12 as DECIMAL(38,10))")
  .show(truncate=false)
{code}
This produces an answer in Spark of {{-110083130231976019291714061058.575920}}, but if I do the same calculation with plain Java BigDecimal I get {{-110083130231976019291714061058.575919}}:
{code:java}
import java.math.BigDecimal;
import java.math.RoundingMode;

BigDecimal l = new BigDecimal("9173594185998001607642838421.5479932913");
BigDecimal r = new BigDecimal("-12.0000000000");
// The exact product is kept at full precision and rounded only once here.
BigDecimal prod = l.multiply(r);
BigDecimal rounded_prod = prod.setScale(6, RoundingMode.HALF_UP);
{code}
Spark does essentially the same operations, but it uses Decimal instead of Java's BigDecimal directly. Spark, by way of Decimal, sets a MathContext for the multiply operation with a max precision of 38 and HALF_UP rounding. That means the result of the multiply operation in Spark is {{-110083130231976019291714061058.57591950}}, whereas the plain Java BigDecimal code produces {{-110083130231976019291714061058.57591949560000000000}}. Then setScale is called (as part of Decimal.setPrecision), from CheckOverflow in 3.2.0 and 3.3.0 or from the Multiply expression itself in 3.4.0. At that point the already-rounded number is rounded yet again, resulting in what is arguably a wrong answer from Spark.
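
To make the double rounding concrete, here is a minimal standalone sketch of the two code paths in plain BigDecimal as I understand them (the MathContext with precision 38 and HALF_UP mirrors what Decimal sets up; the variable names are only for this illustration):
{code:java}
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

BigDecimal l = new BigDecimal("9173594185998001607642838421.5479932913");
BigDecimal r = new BigDecimal("-12.0000000000");

// Roughly what Decimal does: round the product to 38 significant digits first ...
BigDecimal sparkStyle = l.multiply(r, new MathContext(38, RoundingMode.HALF_UP));
// ... and then round again when setPrecision adjusts the scale.
System.out.println(sparkStyle.setScale(6, RoundingMode.HALF_UP));
// prints -110083130231976019291714061058.575920 (rounded twice)

// Rounding the exact product once gives the expected answer.
System.out.println(l.multiply(r).setScale(6, RoundingMode.HALF_UP));
// prints -110083130231976019291714061058.575919
{code}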

I have not fully tested this, but it looks like we could simply remove the MathContext in Decimal entirely, or set it to UNLIMITED; all of the decimal operations appear to do their own overflow checking and rounding anyway. If we want to reduce the memory used by the intermediate result, we could instead set the max precision of the MathContext to 39 and have it truncate (round down). That would still let us round the result correctly in setPrecision afterwards, as sketched below.
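
A rough BigDecimal sketch of that second option, under the assumption that the intermediate MathContext uses precision 39 with RoundingMode.DOWN (this is not existing Spark code, just an illustration of the idea):
{code:java}
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

BigDecimal l = new BigDecimal("9173594185998001607642838421.5479932913");
BigDecimal r = new BigDecimal("-12.0000000000");

// Keep one guard digit beyond the max precision of 38 and truncate instead of
// rounding, so enough information survives for the final HALF_UP rounding.
BigDecimal intermediate = l.multiply(r, new MathContext(39, RoundingMode.DOWN));
// intermediate is -110083130231976019291714061058.575919495

// The later setPrecision-style rounding now produces the expected value.
System.out.println(intermediate.setScale(6, RoundingMode.HALF_UP));
// prints -110083130231976019291714061058.575919
{code}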


