skambha commented on issue #27627: [WIP][SPARK-28067][SQL] Fix incorrect 
results for decimal aggregate sum by returning null on decimal overflow
URL: https://github.com/apache/spark/pull/27627#issuecomment-594110207
 
 
   > > In the case of whole-stage codegen, you can see that the decimal ‘+’ expressions will at some point produce a result larger than what Decimal(38, 18) can hold, but it gets written out as null. This messes up the end result of the sum and you get wrong results.
   > 
   > Still a bit confused. If it gets written out as null, then null + decimal always returns null and the final result is null?
   
   So if we look at the aggregate Sum, both updateExpressions and mergeExpressions use **coalesce**, so it is not a pure null + decimal expression. For example, in updateExpressions, if the intermediate sum becomes null because of an overflow, then in the next iteration the coalesce resets that null sum to 0, and the sum is updated as 0 + decimal, silently dropping the overflow. A query shape like the sketch below could hit this.
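   For concreteness, a rough repro sketch; the values are made up, and the exact output depends on the Spark version and whether whole-stage codegen kicks in:
   ```scala
   // In spark-shell. Each value fits in Decimal(38, 18), but ten of them
   // exceed its 20 integral digits, so the running sum overflows mid-aggregation.
   import org.apache.spark.sql.functions.sum
   import spark.implicits._

   val df = Seq.fill(10)(BigDecimal("12345678901234567890.123456789")).toDF("d")
   df.agg(sum($"d")).show(false)
   // With the behavior described above, the overflowed intermediate is written
   // out as null, coalesce resets it to 0, and the displayed sum comes out
   // smaller than the true total instead of null.
   ```
   The coalesce that does the resetting is here: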
   
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Sum.scala#L76
   ```scala
    override lazy val updateExpressions: Seq[Expression]
   ...
           /* sum = */
           coalesce(coalesce(sum, zero) + child.cast(sumDataType), sum)
   ```
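   To make the reset concrete, here is a minimal sketch in plain Scala (not Spark internals) of how that update expression behaves, with Option standing in for SQL null and a made-up Max standing in for the Decimal(38, 18) range:
   ```scala
   object CoalesceOverflowDemo {
     val Max = BigDecimal("99999999999999999999") // hypothetical limit

     // Returns None (null) when the addition would exceed the limit,
     // mimicking the overflow-to-null behavior seen under codegen.
     def checkedAdd(a: BigDecimal, b: BigDecimal): Option[BigDecimal] = {
       val r = a + b
       if (r.abs > Max) None else Some(r)
     }

     // One call per input row, mirroring
     //   coalesce(coalesce(sum, zero) + child.cast(sumDataType), sum)
     def update(sum: Option[BigDecimal], child: Option[BigDecimal]): Option[BigDecimal] = {
       val base = sum.getOrElse(BigDecimal(0))          // coalesce(sum, zero)
       child.flatMap(c => checkedAdd(base, c)).orElse(sum) // outer coalesce
     }

     def main(args: Array[String]): Unit = {
       // Suppose an earlier iteration overflowed and nulled out the buffer:
       var sum: Option[BigDecimal] = None
       // On the next row, coalesce(sum, zero) resets the base to 0, so the
       // overflow is forgotten and the aggregation restarts from scratch.
       sum = update(sum, Some(BigDecimal(5)))
       println(sum) // Some(5) rather than None -> a wrong (too small) total
     }
   }
   ```
   That reset is exactly the wrong-result path this PR addresses by returning null on overflow.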
   Please let me know if this helps. 
