HyukjinKwon opened a new pull request #32619:
URL: https://github.com/apache/spark/pull/32619


   ### What changes were proposed in this pull request?
   
   This PR proposes to stop wrapping the `percentage` and `accuracy` arguments
   of `percentile_approx` in if-else expressions when it is used with pivot.
   These arguments are expected to be literals (or foldable expressions).
   
   Pivot works as a two-phase aggregation, and it sets the input to `null`
   for rows whose pivot column does not match the pivot value.
   
   Note that pivot has an optimized path for some types (basically non-nested
   types) that does not rely on changing the input to `null`. So the issue
   fixed by this PR only affects complex types.
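   
   The rewrite described above can be sketched with a toy expression tree. This is only an illustration of the idea, not the Catalyst code touched by this PR: `Expr`, `Lit`, `Col`, `IfMatch`, and `PivotRewriteSketch` are hypothetical names made up for the example.
   
   ```scala
   // Toy model of pivot's per-value rewrite; none of these are Spark classes.
   sealed trait Expr { def foldable: Boolean }
   case class Lit(value: Any) extends Expr { val foldable = true }
   case class Col(name: String) extends Expr { val foldable = false }
   // Stands for: if (pivotCol <=> pivotValue) thenExpr else null
   case class IfMatch(pivotCol: String, pivotValue: String, thenExpr: Expr) extends Expr {
     val foldable = false
   }
   
   object PivotRewriteSketch {
     // Old behavior: every aggregate argument is wrapped, so literal
     // arguments stop being foldable and the literal check fails.
     def wrapAll(args: Seq[Expr], pivotCol: String, pivotValue: String): Seq[Expr] =
       args.map(a => IfMatch(pivotCol, pivotValue, a))
   
     // Proposed behavior: foldable arguments are left untouched and only
     // the remaining arguments are wrapped.
     def wrapNonFoldable(args: Seq[Expr], pivotCol: String, pivotValue: String): Seq[Expr] =
       args.map {
         case a if a.foldable => a
         case a               => IfMatch(pivotCol, pivotValue, a)
       }
   }
   ```
   
   With the arguments `Seq(Col("value"), Lit(Seq(0.5)), Lit(10000))`, `wrapAll` leaves no foldable argument, while `wrapNonFoldable` wraps only `Col("value")` and keeps both literals as they are.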
   
   ```scala
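   // Assumes spark-shell, where spark.implicits._ (needed for toDF) is in scope.
   import org.apache.spark.sql.functions._

   val df = Seq(
     ("a", -1.0), ("a", 5.5), ("a", 2.5), ("b", 3.0), ("b", 5.2)).toDF("type", "value")
     .groupBy().pivot("type", Seq("a", "b")).agg(
       percentile_approx(col("value"), array(lit(0.5)), lit(10000)))
   df.show()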
   ```
   
   **Before:**
   
   ```
   org.apache.spark.sql.AnalysisException: cannot resolve 
'percentile_approx((IF((type <=> CAST('a' AS STRING)), value, CAST(NULL AS 
DOUBLE))), (IF((type <=> CAST('a' AS STRING)), array(0.5D), NULL)), (IF((type 
<=> CAST('a' AS STRING)), 10000, CAST(NULL AS INT))))' due to data type 
mismatch: The accuracy or percentage provided must be a constant literal;
   'Aggregate [percentile_approx(if ((type#7 <=> cast(a as string))) value#8 
else cast(null as double), if ((type#7 <=> cast(a as string))) array(0.5) else 
cast(null as array<double>), if ((type#7 <=> cast(a as string))) 10000 else 
cast(null as int), 0, 0) AS a#16, percentile_approx(if ((type#7 <=> cast(b as 
string))) value#8 else cast(null as double), if ((type#7 <=> cast(b as 
string))) array(0.5) else cast(null as array<double>), if ((type#7 <=> cast(b 
as string))) 10000 else cast(null as int), 0, 0) AS b#18]
   +- Project [_1#2 AS type#7, _2#3 AS value#8]
      +- LocalRelation [_1#2, _2#3]
   ```
   
   **After:**
   
   ```
   +-----+-----+
   |    a|    b|
   +-----+-----+
   |[2.5]|[3.0]|
   +-----+-----+
   ```
   
   ### Why are the changes needed?
   
   To make `percentile_approx` work with pivot as expected.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. Previously, the query above threw an `AnalysisException`; now it
   returns the correct result shown above.
   
   ### How was this patch tested?
   
   Manually tested, and a unit test was added.

