cloud-fan commented on code in PR #38888:
URL: https://github.com/apache/spark/pull/38888#discussion_r1059235302
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##########
@@ -2819,34 +2825,39 @@ class Analyzer(override val catalogManager: CatalogManager)
    * This rule finds aggregate expressions that are not in an aggregate operator. For example,
    * those in a HAVING clause or ORDER BY clause. These expressions are pushed down to the
    * underlying aggregate operator and then projected away after the original operator.
+   *
+   * We need to make sure the expressions all fully resolved before looking for aggregate functions
+   * and group by expressions from them.
    */
   object ResolveAggregateFunctions extends Rule[LogicalPlan] {
     def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUpWithPruning(
       _.containsPattern(AGGREGATE), ruleId) {
-      // Resolve aggregate with having clause to Filter(..., Aggregate()). Note, to avoid wrongly
-      // resolve the having condition expression, here we skip resolving it in ResolveReferences
-      // and transform it to Filter after aggregate is resolved. Basically columns in HAVING should
-      // be resolved with `agg.child.output` first. See more details in SPARK-31519.
-      case UnresolvedHaving(cond, agg: Aggregate) if agg.resolved =>
+      case UnresolvedHaving(cond, agg: Aggregate) if agg.resolved && cond.resolved =>
         resolveOperatorWithAggregate(Seq(cond), agg, (newExprs, newChild) => {
-          Filter(newExprs.head, newChild)
+          val newCond = newExprs.head
+          if (newCond.resolved) {
+            Filter(newCond, newChild)
+          } else {
+            // The condition can be unresolved after the resolution, as we may mark
+            // `TempResolvedColumn` as unresolved if it's not aggregate function inputs or grouping
+            // expressions. We should remain `UnresolvedHaving` as the rule `ResolveReferences` can
+            // re-resolve `TempResolvedColumn` and `UnresolvedHaving` has a special column
+            // resolution order.
+            UnresolvedHaving(newCond, newChild)
+          }
         })
-      case Filter(cond, agg: Aggregate) if agg.resolved =>
-        // We should resolve the references normally based on child (agg.output) first.
-        val maybeResolved = resolveExpressionByPlanOutput(cond, agg)
-        resolveOperatorWithAggregate(Seq(maybeResolved), agg, (newExprs, newChild) => {
+      case Filter(cond, agg: Aggregate) if agg.resolved && cond.resolved =>
Review Comment:
Yea, and I added a comment to mention this:
https://github.com/apache/spark/pull/38888/files#diff-ed19f376a63eba52eea59ca71f3355d4495fad4fad4db9a3324aade0d4986a47R2829