GitHub user huaxingao opened a pull request:
https://github.com/apache/spark/pull/19496
[SPARK-22271][SQL] Mean overflows and returns null for some decimal variables
## What changes were proposed in this pull request?
In Average.scala, we currently have:
```
override lazy val evaluateExpression = child.dataType match {
  case DecimalType.Fixed(p, s) =>
    // increase the precision and scale to prevent precision loss
    val dt = DecimalType.bounded(p + 14, s + 4)
    Cast(Cast(sum, dt) / Cast(count, dt), resultType)
  case _ =>
    Cast(sum, resultType) / Cast(count, resultType)
}
```
It is possible for `Cast(count, dt)` to push the precision of the division result past the maximum of 38, and this causes an overflow, so the mean comes back as null. Since count is an integral value and doesn't need a scale, I cast it using `DecimalType.bounded(38, 0)` instead.
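As a sketch of what the adjusted Decimal case would then look like (the exact change is in the attached commit; only the cast of count differs from the snippet above):
```
override lazy val evaluateExpression = child.dataType match {
  case DecimalType.Fixed(p, s) =>
    // increase the precision and scale to prevent precision loss
    val dt = DecimalType.bounded(p + 14, s + 4)
    // count is an integral row count, so cast it with scale 0;
    // this keeps the precision of the division result within 38 digits
    Cast(Cast(sum, dt) / Cast(count, DecimalType.bounded(38, 0)), resultType)
  case _ =>
    Cast(sum, resultType) / Cast(count, resultType)
}
```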
## How was this patch tested?
I added a test case in DataFrameSuite.
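A sketch of the kind of regression test this adds (the column name and values here are illustrative, not necessarily the committed test):
```
// inside DataFrameSuite, where testImplicits, functions._, Row, and
// DecimalType are already in scope
test("SPARK-22271: mean overflows and returns null for some decimal variables") {
  val d = 0.034567890
  val df = Seq(d, d, d, d, d, d, d, d, d, d).toDF("DecimalCol")
  val result = df
    .select(($"DecimalCol" cast DecimalType(38, 33)).as("col"))
    .agg(avg($"col"))
  // before this fix, the intermediate division overflowed precision 38
  // and the query returned Row(null)
  checkAnswer(result, Row(BigDecimal("0.034567890")))
}
```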
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/huaxingao/spark spark-22271
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19496.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19496
----
commit a3437ee4a87d1f51b362adeb20d4fcc264085ba7
Author: Huaxin Gao <[email protected]>
Date: 2017-10-14T04:45:27Z
[SPARK-22271][SQL]mean overflows and returns null for some decimal variables
----