xuanyuanking opened a new pull request #28294:
URL: https://github.com/apache/spark/pull/28294


   ### What changes were proposed in this pull request?
   Add a new logical node, AggregateWithHaving, which the parser now creates 
for HAVING clauses. The analyzer resolves it to Filter(..., Aggregate(...)).
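
A minimal sketch of the idea, written as a toy model in Python rather than Spark's Scala. The field names (`having_condition`, `aggregate`, `resolved`) and the `resolve_having` helper are hypothetical, and waiting for the aggregate to be resolved before rewriting is an assumption made for illustration, not something the description above states. The point is only that a dedicated node type is not matched by rules written for Filter.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Aggregate:
    grouping: Tuple[str, ...]
    outputs: Tuple[str, ...]
    resolved: bool = False


@dataclass(frozen=True)
class Filter:
    condition: str
    child: Aggregate


@dataclass(frozen=True)
class AggregateWithHaving:
    # Hypothetical shape: the HAVING predicate plus the aggregate it applies to.
    having_condition: str
    aggregate: Aggregate


Plan = Union[Aggregate, Filter, AggregateWithHaving]


def resolve_having(plan: Plan) -> Plan:
    """Rewrite AggregateWithHaving into Filter(..., Aggregate(...)).

    Assumption: the rewrite waits until the aggregate child is resolved.
    """
    if isinstance(plan, AggregateWithHaving) and plan.aggregate.resolved:
        return Filter(plan.having_condition, plan.aggregate)
    return plan


unresolved = AggregateWithHaving("b > 10", Aggregate(("b",), ("SUM(a) AS b",)))
ready = AggregateWithHaving(
    "b > 10", Aggregate(("b",), ("SUM(a) AS b",), resolved=True)
)
still_pending = resolve_having(unresolved)  # unchanged: child not resolved yet
rewritten = resolve_having(ready)           # becomes Filter(..., Aggregate(...))
```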
   
   ### Why are the changes needed?
   The SQL parser in Spark creates Filter(..., Aggregate(...)) for a HAVING 
query, and Spark has a special analyzer rule, ResolveAggregateFunctions, to 
resolve the aggregate functions and grouping columns in the Filter operator.
   
   This works for simple cases, but only in a fragile way, because it relies 
on rule execution order:
   1. Rule ResolveReferences hits the Aggregate operator and resolves 
attributes inside aggregate functions, but the function itself is still 
unresolved as it's an UnresolvedFunction. This stops resolving the Filter 
operator as the child Aggregate operator is still unresolved.
   2. Rule ResolveFunctions resolves UnresolvedFunction. This makes the 
Aggregate operator resolved.
   3. Rule ResolveAggregateFunctions resolves the Filter operator if its child 
is a resolved Aggregate. This rule can correctly resolve the grouping columns.
   
   In the example query below, I added a CAST, which needs to be resolved by 
rule ResolveTimeZone, which runs after ResolveAggregateFunctions. This breaks 
step 3, as the Aggregate operator is still unresolved at that time. The 
analyzer then starts the next round, and the Filter operator is resolved by 
ResolveReferences, which resolves the grouping columns incorrectly.
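
The ordering problem can be reproduced with a toy fixed-point loop. This is a sketch in Python, not Spark code: the `Plan` flags and the four functions are simplified stand-ins for the rules named above, and they model only the behavior described in this section.

```python
from dataclasses import dataclass, astuple
from typing import Optional


@dataclass
class Plan:
    has_cast: bool
    attrs_resolved: bool = False
    funcs_resolved: bool = False
    timezone_resolved: bool = False
    having_binds_to: Optional[str] = None  # how the Filter got resolved

    @property
    def aggregate_resolved(self) -> bool:
        cast_ok = self.timezone_resolved or not self.has_cast
        return self.attrs_resolved and self.funcs_resolved and cast_ok


def resolve_references(p: Plan) -> None:
    # Resolves attributes in the Aggregate first, then the Filter, but only
    # if the child Aggregate is already fully resolved.
    p.attrs_resolved = True
    if p.aggregate_resolved and p.having_binds_to is None:
        p.having_binds_to = "select alias"


def resolve_functions(p: Plan) -> None:
    p.funcs_resolved = True


def resolve_aggregate_functions(p: Plan) -> None:
    # Step 3: only fires when the child Aggregate is resolved.
    if p.aggregate_resolved and p.having_binds_to is None:
        p.having_binds_to = "grouping column"


def resolve_time_zone(p: Plan) -> None:
    if p.has_cast:
        p.timezone_resolved = True


# Rules run in this order within each round.
RULES = [
    resolve_references,
    resolve_functions,
    resolve_aggregate_functions,
    resolve_time_zone,
]


def analyze(has_cast: bool, max_rounds: int = 10) -> Optional[str]:
    plan = Plan(has_cast)
    for _ in range(max_rounds):
        before = astuple(plan)
        for rule in RULES:
            rule(plan)
        if astuple(plan) == before:  # fixed point reached
            break
    return plan.having_binds_to


without_cast = analyze(has_cast=False)
with_cast = analyze(has_cast=True)
print(without_cast)
print(with_cast)
```

In this model the query without a CAST is handled by the ResolveAggregateFunctions stand-in in the first round, while the query with a CAST misses that window and is picked up by the ResolveReferences stand-in in the second round.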
   
   See the demo below:
   ```
   SELECT SUM(a) AS b, '2020-01-01' AS fake FROM VALUES (1, 10), (2, 20) AS 
T(a, b) GROUP BY b HAVING b > 10
   ```
   The query's result is
   ```
   +---+----------+
   |  b|      fake|
   +---+----------+
   |  2|2020-01-01|
   +---+----------+
   ```
   But if we add a CAST, the query returns an empty result:
   ```
   SELECT SUM(a) AS b, CAST('2020-01-01' AS DATE) AS fake FROM VALUES (1, 10), 
(2, 20) AS T(a, b) GROUP BY b HAVING b > 10
   ```
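
To see why the two results differ, the following Python sketch evaluates the demo data by hand under both readings of `b` in `HAVING b > 10`. It only illustrates the semantics; the helper names and the `group_b`/`sum_a` keys are made up for this example.

```python
from collections import defaultdict

rows = [(1, 10), (2, 20)]  # VALUES (1, 10), (2, 20) AS T(a, b)


def aggregate(rows):
    """GROUP BY T.b, computing SUM(T.a) for each group."""
    sums = defaultdict(int)
    for a, b in rows:
        sums[b] += a
    # Keep both the grouping column and the aggregate for each group.
    return [{"group_b": b, "sum_a": s} for b, s in sorted(sums.items())]


def having(groups, key):
    """Apply HAVING b > 10, reading `b` from the given field."""
    return [g["sum_a"] for g in groups if g[key] > 10]


groups = aggregate(rows)
by_grouping_column = having(groups, "group_b")  # b refers to T.b
by_select_alias = having(groups, "sum_a")       # b refers to SUM(a) AS b
print(by_grouping_column)
print(by_select_alias)
```

Reading `b` as the grouping column keeps the group with `T.b = 20`, whose sum is 2, which matches the first result. Reading it as the alias of `SUM(a)` compares the sums 1 and 2 against 10 and keeps nothing, which matches the empty result.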
   
   ### Does this PR introduce any user-facing change?
   Yes. It fixes incorrect results for HAVING queries whose aggregate 
expressions contain a CAST.
   
   
   ### How was this patch tested?
   New unit test added.
   




