bersprockets opened a new pull request, #36072:
URL: https://github.com/apache/spark/pull/36072

   ### What changes were proposed in this pull request?
   
   Add checks to `ResolveFunctions#validateFunction` to ensure that each aggregate filter (the `FILTER (WHERE ...)` clause of an aggregate function):
   
   - has a boolean data type
   - does not contain an aggregate expression
   - does not contain a window expression
   
   `ExtractGenerator` already handles the case of a generator in an aggregate filter, so no new check is needed for it.
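The three checks listed above can be modeled on a toy expression tree. This is a hypothetical sketch, not the patch itself: `Expr`, `AnalysisError`, `validate_aggregate_filter` and `filter_error` are illustrative names rather than Spark's Catalyst classes, and the order in which the real Scala code in `ResolveFunctions#validateFunction` applies the checks is an assumption. It only shows the shape of the logic: reject a non-boolean filter, then reject any filter whose subtree contains an aggregate or a window expression.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


# Hypothetical toy expression node; NOT Spark's Catalyst Expression class.
@dataclass
class Expr:
    kind: str  # e.g. "column", "literal", "compare", "aggregate", "window"
    data_type: str  # e.g. "int", "string", "boolean"
    children: List["Expr"] = field(default_factory=list)

    def exists(self, pred: Callable[["Expr"], bool]) -> bool:
        """Return True if pred holds for this node or any descendant."""
        return pred(self) or any(c.exists(pred) for c in self.children)


class AnalysisError(Exception):
    """Stand-in for org.apache.spark.sql.AnalysisException."""


SUFFIX = " It cannot be used in an aggregate function"


def validate_aggregate_filter(flt: Optional[Expr]) -> None:
    """Apply the three FILTER checks; the check order here is an assumption."""
    if flt is None:
        return  # the aggregate has no FILTER clause
    if flt.data_type != "boolean":
        raise AnalysisError("FILTER expression is not of type boolean." + SUFFIX)
    if flt.exists(lambda e: e.kind == "aggregate"):
        raise AnalysisError("FILTER expression contains aggregate." + SUFFIX)
    if flt.exists(lambda e: e.kind == "window"):
        raise AnalysisError("FILTER expression contains window function." + SUFFIX)


def filter_error(flt: Optional[Expr]) -> Optional[str]:
    """Return the error message for an invalid filter, or None if it is valid."""
    try:
        validate_aggregate_filter(flt)
    except AnalysisError as err:
        return str(err)
    return None


# Models case 3: max(b) FILTER (WHERE max(a) > 1)
max_a_gt_1 = Expr("compare", "boolean", [
    Expr("aggregate", "int", [Expr("column", "int")]),
    Expr("literal", "int"),
])
print(filter_error(max_a_gt_1))
# FILTER expression contains aggregate. It cannot be used in an aggregate function
```

Because the aggregate and window checks walk the whole filter subtree, an offending expression nested inside the predicate (for example under the `>` comparison, as in the queries below) is caught, not only one at the root. The toy messages omit the `line 1 pos 7` position suffix that the real `AnalysisException` carries.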
   
   ### Why are the changes needed?
   
   There are three cases where a query with an aggregate filter produces an unhelpful error message.
   
   1) Window expression in aggregate filter
   
   ```
   select sum(a) filter (where nth_value(a, 2) over (order by b) > 1)
   from (select 1 a, '2' b);
   ```
   The above query should produce an analysis error, but instead fails with a stack overflow.
   
   With this PR, the query will instead produce
   ```
   org.apache.spark.sql.AnalysisException: FILTER expression contains window function. It cannot be used in an aggregate function; line 1 pos 7
   ```
   
   2) Non-boolean filter expression
   
   ```
   select sum(a) filter (where a) from (select 1 a, '2' b);
   ```
   This query should produce an analysis error, but instead fails with a projection compilation error or a whole-stage codegen error, depending on the data type of the filter expression.
   
   With this PR, the query will instead produce
   ```
   org.apache.spark.sql.AnalysisException: FILTER expression is not of type boolean. It cannot be used in an aggregate function; line 1 pos 7
   ```
   
   3) Aggregate expression in filter expression
   
   ```
   select max(b) filter (where max(a) > 1) from (select 1 a, '2' b);
   ```
   The above query should produce an analysis error, but instead fails with a projection compilation error or a whole-stage codegen error, depending on the data type of the expression being aggregated.
   
   With this PR, the query will instead produce
   ```
   org.apache.spark.sql.AnalysisException: FILTER expression contains aggregate. It cannot be used in an aggregate function; line 1 pos 7
   ```
   
   ### Does this PR introduce _any_ user-facing change?
   
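   No, except that queries with an invalid aggregate filter now fail with a clear `AnalysisException` instead of the errors described above.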
   
   
   ### How was this patch tested?
   
   New unit tests.
   

