GitHub user mgaido91 opened a pull request:
https://github.com/apache/spark/pull/21910
[SPARK-24957][SQL] Average with decimal followed by aggregation returns
wrong result
## What changes were proposed in this pull request?
An average is computed by dividing the sum of the values by their count.
When the result is a DecimalType, the way we cast and manage the precision
and scale is not optimal, and it is not consistent with how we normally
handle decimal arithmetic.
In particular, a problem can arise when the `Divide` operator returns a
result whose precision and scale differ from the ones declared as the output
type of the `Divide` operator. In the case reported in the JIRA, for instance,
the result of the `Divide` operator is a `Decimal(38, 36)`, while the output
data type of `Divide` is `Decimal(38, 22)`. This is not an issue when the
`Divide` is followed by a `CheckOverflow` or a `Cast` to the right data type,
as these operations return a decimal with the defined precision and scale.
Although the `Average` operator does contain a `Cast`, it may be bypassed if
the result type of `Divide` is the same as the type it is cast to, hence the
issue reported in the JIRA may arise.
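
To illustrate the mismatch described above, here is a minimal sketch in plain
Python (not Spark code). It uses the standard `decimal` module to model a value
whose actual scale is 36 while the declared type is (38, 22). The helper
`check_overflow` is a hypothetical stand-in that only approximates what a
guard such as `CheckOverflow` does: it rounds to the declared scale and
returns `None` if the value does not fit the declared precision. The rounding
mode and the `None`-on-overflow behaviour are assumptions made for the example.

```python
from decimal import Decimal, ROUND_HALF_UP, localcontext


def check_overflow(value, precision, scale):
    """Hypothetical guard: force `value` to the declared (precision, scale).

    Returns the value rounded to `scale` fractional digits, or None when the
    rounded value needs more than `precision` significant digits.
    """
    with localcontext() as ctx:
        ctx.prec = 100  # enough working digits so quantize itself never fails
        unit = Decimal(1).scaleb(-scale)  # e.g. 1E-22 for scale 22
        rounded = value.quantize(unit, rounding=ROUND_HALF_UP)
    if len(rounded.as_tuple().digits) > precision:
        return None
    return rounded


# A division result carrying scale 36, as in the case reported in the JIRA.
raw = Decimal("12." + "0" * 36)
# Without a guard, the value keeps a scale that disagrees with the declared type.
raw_scale = -raw.as_tuple().exponent

# With the guard, the value always matches the declared (38, 22) type.
guarded = check_overflow(raw, 38, 22)
guarded_scale = -guarded.as_tuple().exponent

# A value that cannot fit the declared precision is rejected.
overflowed = check_overflow(Decimal("12345.678"), 5, 2)
```

In this sketch, any later computation that trusts the declared scale of 22
would misread `raw`, whereas `guarded` is safe to pass on. This is why a
result that is always wrapped in a guard avoids the problem.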
This PR proposes to use the normal rules/handling of the arithmetic
operators for the Decimal data type, so that we both reuse the existing code
(having a single logic for operations between decimals) and fix this problem,
as the result is always guarded by `CheckOverflow`.
## How was this patch tested?
Added unit tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mgaido91/spark SPARK-24957
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21910.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21910
----
commit 1a1252d1db0af0485b98c9bca0d442c9235bd2a0
Author: Marco Gaido <marcogaido91@...>
Date: 2018-07-29T10:53:09Z
[SPARK-24957][SQL] Average with decimal followed by aggregation returns
wrong result
----