cloud-fan commented on code in PR #38888:
URL: https://github.com/apache/spark/pull/38888#discussion_r1040777568
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##########
@@ -1591,12 +1620,129 @@ class Analyzer(override val catalogManager: CatalogManager)
           notMatchedBySourceActions = newNotMatchedBySourceActions)
       }
-      // Skip the having clause here, this will be handled in ResolveAggregateFunctions.
-      case h: UnresolvedHaving => h
+      // For the 3 operators below, they can host grouping expressions and aggregate functions.
+      // We should resolve columns with `agg.output` and the rule `ResolveAggregateFunctions` will
+      // push them down to Aggregate later.
+      case u @ UnresolvedHaving(cond, agg: Aggregate) if !cond.resolved =>
+        u.mapExpressions { e =>
+          // Columns in HAVING should be resolved with `agg.child.output` first, to follow the SQL
+          // standard. See more details in SPARK-31519.
+          resolveExpressionByPlanOutput(resolveColWithAgg(e, agg), agg, allowOuter = true)
+        }
+      case f @ Filter(cond, agg: Aggregate) if !cond.resolved =>
+        f.mapExpressions { e =>
+          val resolvedNoOuter = resolveExpressionByPlanOutput(e, agg)
+          // Outer reference has lower priority than this. See the doc of `ResolveReferences`.
+          resolveOuterRef(resolveColWithAgg(resolvedNoOuter, agg))
+        }
+      case s @ Sort(orders, _, agg: Aggregate) if !orders.forall(_.resolved) =>
+        s.mapExpressions { e =>
+          val resolvedNoOuter = resolveExpressionByPlanOutput(e, agg)
+          // Outer reference has lower priority than this. See the doc of `ResolveReferences`.
+          resolveOuterRef(resolveColWithAgg(resolvedNoOuter, agg))
+        }
+
+      // For the 3 operators below, they can host missing attributes that are from descendant
+      // nodes. For example, `SELECT a FROM t ORDER BY b`. We can resolve `b` with table `t` even
+      // if there is a Project node between the table scan node and the Sort node. We also need to
+      // propagate the missing attributes from the descendant node to the current node, and
+      // project them away at the end via an extra Project.
+      case s @ Sort(order, _, child) if !s.resolved || s.missingInput.nonEmpty =>
+        val resolvedNoOuter = order.map(resolveExpressionByPlanOutput(_, child))
Review Comment:
I didn't use `resolveExpressionByPlanChildren`, to follow the previous code:
https://github.com/apache/spark/pull/38888/files#diff-ed19f376a63eba52eea59ca71f3355d4495fad4fad4db9a3324aade0d4986a47L1469.
I'm not sure whether it would make a difference, but I just want to be safe.
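
For context, a minimal sketch (not part of this PR) of the two query shapes the new cases above target, assuming a local SparkSession and a made-up temp view `t` with columns `a`, `b`, `k`:

```scala
// Illustrative only: the view and column names are made up for this sketch.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("resolution-sketch").getOrCreate()
import spark.implicits._

Seq((1, 10, "x"), (2, 20, "y")).toDF("a", "b", "k").createOrReplaceTempView("t")

// HAVING on an Aggregate: `k` should resolve against `agg.child.output` first
// (the grouping column), following the SQL standard (SPARK-31519).
spark.sql("SELECT k, COUNT(*) FROM t GROUP BY k HAVING k > 'x'").show()

// Sort hosting a missing attribute: `b` is not in the SELECT list, so it is resolved
// from the child and projected away at the end via an extra Project.
spark.sql("SELECT a FROM t ORDER BY b").show()
```

The first query goes through the `UnresolvedHaving` case and the second through the `Sort` case with missing input, per the comments in the diff above.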
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]