[GitHub] [spark] cloud-fan commented on a diff in pull request #38888: [SPARK-41405][SQL] Centralize the column resolution logic

GitBox Thu, 29 Dec 2022 20:12:34 -0800


cloud-fan commented on code in PR #38888:
URL: https://github.com/apache/spark/pull/38888#discussion_r1059235302



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##########
@@ -2819,34 +2825,39 @@ class Analyzer(override val catalogManager: CatalogManager)
    * This rule finds aggregate expressions that are not in an aggregate operator.  For example,
    * those in a HAVING clause or ORDER BY clause.  These expressions are pushed down to the
    * underlying aggregate operator and then projected away after the original operator.
+   *
+   * We need to make sure the expressions all fully resolved before looking for aggregate functions
+   * and group by expressions from them.
    */
   object ResolveAggregateFunctions extends Rule[LogicalPlan] {
     def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUpWithPruning(
       _.containsPattern(AGGREGATE), ruleId) {
-      // Resolve aggregate with having clause to Filter(..., Aggregate()). Note, to avoid wrongly
-      // resolve the having condition expression, here we skip resolving it in ResolveReferences
-      // and transform it to Filter after aggregate is resolved. Basically columns in HAVING should
-      // be resolved with `agg.child.output` first. See more details in SPARK-31519.
-      case UnresolvedHaving(cond, agg: Aggregate) if agg.resolved =>
+      case UnresolvedHaving(cond, agg: Aggregate) if agg.resolved && cond.resolved =>
         resolveOperatorWithAggregate(Seq(cond), agg, (newExprs, newChild) => {
-          Filter(newExprs.head, newChild)
+          val newCond = newExprs.head
+          if (newCond.resolved) {
+            Filter(newCond, newChild)
+          } else {
+            // The condition can be unresolved after the resolution, as we may mark
+            // `TempResolvedColumn` as unresolved if it's not aggregate function inputs or grouping
+            // expressions. We should remain `UnresolvedHaving` as the rule `ResolveReferences` can
+            // re-resolve `TempResolvedColumn` and `UnresolvedHaving` has a special column
+            // resolution order.
+            UnresolvedHaving(newCond, newChild)
+          }
         })
 
-      case Filter(cond, agg: Aggregate) if agg.resolved =>
-        // We should resolve the references normally based on child (agg.output) first.
-        val maybeResolved = resolveExpressionByPlanOutput(cond, agg)
-        resolveOperatorWithAggregate(Seq(maybeResolved), agg, (newExprs, newChild) => {
+      case Filter(cond, agg: Aggregate) if agg.resolved && cond.resolved =>

Review Comment:
   Yea, and I added a comment to mention this: https://github.com/apache/spark/pull/38888/files#diff-ed19f376a63eba52eea59ca71f3355d4495fad4fad4db9a3324aade0d4986a47R2829



