HyukjinKwon opened a new pull request #32619:
URL: https://github.com/apache/spark/pull/32619
### What changes were proposed in this pull request?
This PR proposes to stop wrapping the constant literal arguments `percentage`
and `accuracy` of `percentile_approx` in if-else expressions when it is used
with pivot. Both arguments are expected to be literals (or foldable
expressions).
Pivot works as a two-phase aggregation: it rewrites each aggregate input so
that rows that do not match the pivot column and value become `null`.
Note that pivot also has an optimized version that does not change the input
to `null`, which is used for some types (basically non-nested types). So the
issue fixed by this PR only affects complex types.
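The rewrite described above can be sketched with a toy model in plain Scala. This is only an illustration of the idea, not the Catalyst code touched by this PR: `Expr`, `Col`, `Lit`, `NullSafeEq`, `If`, `Agg`, and `pivotRewrite` are all hypothetical names made up for the sketch. It assumes that `percentile_approx` takes exactly three arguments here (input, percentage, accuracy) and that only the first one should be guarded by the if-else.

```scala
object PivotRewriteSketch {
  // Toy expression tree; hypothetical stand-ins, not Spark's Catalyst classes.
  sealed trait Expr
  case class Col(name: String) extends Expr
  case class Lit(value: Any) extends Expr
  case class NullSafeEq(left: Expr, right: Expr) extends Expr
  case class If(cond: Expr, thenExpr: Expr, elseExpr: Expr) extends Expr

  // An aggregate call, e.g. percentile_approx(value, array(0.5), 10000).
  case class Agg(name: String, args: Seq[Expr])

  // Rewrite one aggregate for one pivot value: inputs from non-matching rows
  // become null, but the literal arguments of percentile_approx stay as-is.
  def pivotRewrite(agg: Agg, pivotCol: Expr, pivotValue: Expr): Agg = {
    def ifExpr(e: Expr): Expr = If(NullSafeEq(pivotCol, pivotValue), e, Lit(null))
    agg match {
      case Agg("percentile_approx", Seq(input, percentage, accuracy)) =>
        // Assumption for this sketch: only the input column is wrapped.
        agg.copy(args = Seq(ifExpr(input), percentage, accuracy))
      case _ =>
        // Every other aggregate has all of its arguments wrapped.
        agg.copy(args = agg.args.map(ifExpr))
    }
  }

  def main(args: Array[String]): Unit = {
    val agg = Agg("percentile_approx", Seq(Col("value"), Lit(Seq(0.5)), Lit(10000)))
    // One rewritten aggregate per pivot value.
    Seq("a", "b").foreach(v => println(pivotRewrite(agg, Col("type"), Lit(v))))
  }
}
```

In this model, wrapping every argument would turn the percentage and accuracy into `If(...)` nodes, which is the shape that the "must be a constant literal" check rejects.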
```scala
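import org.apache.spark.sql.functions._
import spark.implicits._  // assumes a SparkSession named `spark`, as in spark-shell

val df = Seq(("a", -1.0), ("a", 5.5), ("a", 2.5), ("b", 3.0), ("b", 5.2))
  .toDF("type", "value")
  .groupBy()
  .pivot("type", Seq("a", "b"))
  .agg(percentile_approx(col("value"), array(lit(0.5)), lit(10000)))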
df.show()
```
**Before:**
```
org.apache.spark.sql.AnalysisException: cannot resolve
'percentile_approx((IF((type <=> CAST('a' AS STRING)), value, CAST(NULL AS
DOUBLE))), (IF((type <=> CAST('a' AS STRING)), array(0.5D), NULL)), (IF((type
<=> CAST('a' AS STRING)), 10000, CAST(NULL AS INT))))' due to data type
mismatch: The accuracy or percentage provided must be a constant literal;
'Aggregate [percentile_approx(if ((type#7 <=> cast(a as string))) value#8
else cast(null as double), if ((type#7 <=> cast(a as string))) array(0.5) else
cast(null as array<double>), if ((type#7 <=> cast(a as string))) 10000 else
cast(null as int), 0, 0) AS a#16, percentile_approx(if ((type#7 <=> cast(b as
string))) value#8 else cast(null as double), if ((type#7 <=> cast(b as
string))) array(0.5) else cast(null as array<double>), if ((type#7 <=> cast(b
as string))) 10000 else cast(null as int), 0, 0) AS b#18]
+- Project [_1#2 AS type#7, _2#3 AS value#8]
+- LocalRelation [_1#2, _2#3]
```
**After:**
```
+-----+-----+
|    a|    b|
+-----+-----+
|[2.5]|[3.0]|
+-----+-----+
```
### Why are the changes needed?
To make `percentile_approx` work with pivot as expected.
### Does this PR introduce _any_ user-facing change?
Yes. Previously the query above threw an `AnalysisException`; now it returns
the correct result, as shown above.
### How was this patch tested?
Manually tested, and a unit test was added.