maropu commented on issue #27627: [WIP][SPARK-28067][SQL] Fix incorrect results 
for decimal aggregate sum by returning null on decimal overflow
URL: https://github.com/apache/spark/pull/27627#issuecomment-595108294
 
 
   For integral sums (e.g., int/long), overflow can already occur on the 
partial aggregate side (with ANSI mode enabled, `Math.addExact` throws). 
Shouldn't decimal sum follow the same behaviour, for consistency?
   ```
   scala> sql("SET spark.sql.ansi.enabled=true")
   res39: org.apache.spark.sql.DataFrame = [key: string, value: string]
   
   scala> spark.table("t").printSchema
   root
    |-- v: long (nullable = true)
   
   scala> sql("select * from t").show()
   +-------------------+
   |                  v|
   +-------------------+
   |9223372036854775807|
   |                  1|
   +-------------------+
   
   
   scala> sql("select sum(*) from t").show()
   // Throws an exception
   java.lang.ArithmeticException: long overflow
        at java.lang.Math.addExact(Math.java:809)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.agg_doConsume_0$(Unknown Source)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.agg_doAggregateWithoutKey_0$(Unknown Source)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
        at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
        at ...
   
   scala> sql("SET spark.sql.ansi.enabled=false")
   scala> sql("select sum(*) from t").show()
   // Wrong result
   +--------------------+
   |              sum(v)|
   +--------------------+
   |-9223372036854775808|
   +--------------------+
   ```
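
The two outcomes in the transcript above can be reproduced outside Spark with plain JVM arithmetic. The following is only a minimal sketch: `LongSumOverflow`, `wrappingSum`, and `exactSum` are hypothetical names chosen for illustration, not Spark code. It assumes the ANSI path behaves like `Math.addExact` (as the stack trace suggests) and that the non-ANSI path behaves like ordinary `Long` addition.

```scala
import scala.util.Try

// Hypothetical helper object for illustration only; not part of Spark.
object LongSumOverflow {
  // Ordinary `+` on Long wraps around silently on overflow (two's complement),
  // which matches the "wrong result" seen with spark.sql.ansi.enabled=false.
  def wrappingSum(values: Seq[Long]): Long =
    values.foldLeft(0L)(_ + _)

  // Math.addExact throws ArithmeticException on overflow instead of wrapping,
  // which matches the exception seen with spark.sql.ansi.enabled=true.
  // The exception is captured in a Try so that callers can inspect it.
  def exactSum(values: Seq[Long]): Try[Long] =
    Try(values.foldLeft(0L)((acc, v) => Math.addExact(acc, v)))
}
```

With the two rows from table `t`, `LongSumOverflow.wrappingSum(Seq(Long.MaxValue, 1L))` evaluates to `Long.MinValue` (-9223372036854775808), while `LongSumOverflow.exactSum(Seq(Long.MaxValue, 1L))` is a `Failure` wrapping an `ArithmeticException`.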
