cloud-fan commented on issue #27627: [WIP][SPARK-28067][SQL] Fix incorrect results for decimal aggregate sum by returning null on decimal overflow
URL: https://github.com/apache/spark/pull/27627#issuecomment-597440834

@skambha great analysis! I agree with you that we need another boolean flag in the sum aggregate buffer, but I'd like to make it simpler and only change it for decimals.

How about we add a new expression `DecimalSum`? In which:
1. the buffer attributes are [sum, isEmpty]
2. initial value is [0, true]
3. the `updateExpression` should do:
   3.1 update `isEmpty` to false
   3.2 set `sum` to null if overflowed
   3.3 do nothing if `sum` is already null.
4. the `mergeExpression` should do:
   4.1 update `isEmpty` to false
   4.2 if the input buffer's `isEmpty` is true, keep sum unchanged
   4.3 if the input buffer's `isEmpty` is false, but `sum` is null, update its own `sum` to null
   4.4 do nothing if `sum` is already null.
   4.5 otherwise, add input buffer's `sum`
5. the `evaluateExpression` should do:
   5.1 output null if `isEmpty` is true
   5.2 fail if `sum` is null and ansi mode is on
   5.3 otherwise, output the sum.
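
To make the proposed buffer semantics concrete, here is a minimal sketch in plain Python. It is not Spark code and not the PR's implementation. The names `Buffer`, `update`, `merge`, `evaluate`, and `checked_add` are hypothetical, and so is the tiny DECIMAL(5, 2) result type, chosen only so that overflow is easy to trigger. The sketch makes three assumptions the comment does not state: NULL input values are skipped, an overflow during merge also sets `sum` to null, and step 4.1 applies only when the input buffer is non-empty, so that merging two empty buffers stays empty.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

# Hypothetical result type DECIMAL(PRECISION, SCALE), kept small on purpose.
PRECISION = 5
SCALE = 2
LIMIT = Decimal(10) ** (PRECISION - SCALE)  # valid values satisfy |v| < 1000


def checked_add(a: Decimal, b: Decimal) -> Optional[Decimal]:
    """Return a + b, or None (SQL NULL) if the result overflows the type."""
    total = a + b
    return total if abs(total) < LIMIT else None


@dataclass(frozen=True)
class Buffer:
    sum: Optional[Decimal] = Decimal(0)  # None means "overflowed" (steps 1, 2)
    is_empty: bool = True                # True until a value has been seen


def update(buf: Buffer, value: Optional[Decimal]) -> Buffer:
    if value is None:                    # assumption: NULL inputs are ignored
        return buf
    if buf.sum is None:                  # 3.3: already overflowed, stay null
        return Buffer(None, False)
    return Buffer(checked_add(buf.sum, value), False)  # 3.1 and 3.2


def merge(buf: Buffer, other: Buffer) -> Buffer:
    if other.is_empty:                   # 4.2: nothing to merge in
        return buf
    if buf.sum is None or other.sum is None:  # 4.3 and 4.4
        return Buffer(None, False)
    # 4.5, with the overflow check applied here too (assumption)
    return Buffer(checked_add(buf.sum, other.sum), False)


def evaluate(buf: Buffer, ansi_enabled: bool = False) -> Optional[Decimal]:
    if buf.is_empty:                     # 5.1: no rows seen, result is NULL
        return None
    if buf.sum is None and ansi_enabled: # 5.2: overflow is an error in ANSI mode
        raise ArithmeticError("decimal overflow in sum")
    return buf.sum                       # 5.3: the sum, or NULL on overflow
```

The point of the extra `isEmpty` flag shows up in `evaluate`: a null `sum` alone cannot distinguish "no input rows" from "overflowed", and only the second case should fail under ANSI mode.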
