shrirangmhalgi opened a new pull request, #56056:
URL: https://github.com/apache/spark/pull/56056
### What changes were proposed in this pull request?
Fix `DetectAmbiguousSelfJoin` to not flag column references as ambiguous
when the root plan is a Project on top of a self-join with a foldable join
condition (e.g., `df.join(df, df("col") === 0).select(df("col"))`).
When the join condition compares a column to a literal, it doesn't matter
which side the column comes from - both sides have identical data. The
ambiguity check was incorrectly rejecting this pattern.
### Why are the changes needed?
`df.join(df, df("col") === 0).select(df("col"))` throws an `AnalysisException`
reporting the column as ambiguous under the regular resolver when
`spark.sql.analyzer.failAmbiguousSelfJoin` is true. The single-pass resolver
handles this query correctly, so the two resolvers disagree: multi-layer
self-join patterns that work with the single-pass resolver fail with the
regular one.
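A minimal reproduction sketch of the pattern described above, assuming a local Spark build on the classpath. The object name, app name, and the `spark.range(3).toDF("col")` input are illustrative choices, not taken from the patch. Whether the `catch` branch runs depends on whether the build includes this change.

```scala
import org.apache.spark.sql.{AnalysisException, SparkSession}

// Hypothetical driver object, used only to exercise the query from the description.
object SelfJoinRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("self-join-foldable-repro")
      // The check discussed in this PR is only active when this flag is true.
      .config("spark.sql.analyzer.failAmbiguousSelfJoin", "true")
      .getOrCreate()

    // Illustrative input: a single column named "col".
    val df = spark.range(3).toDF("col")

    try {
      // Self-join whose condition compares a column to a literal,
      // followed by a select on a column of the same DataFrame.
      df.join(df, df("col") === 0).select(df("col")).show()
    } catch {
      case e: AnalysisException =>
        println(s"Analysis failed: ${e.getMessage}")
    } finally {
      spark.stop()
    }
  }
}
```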
### Does this PR introduce _any_ user-facing change?
Yes. Self-join queries with a foldable join condition followed by a `select`
no longer throw a false ambiguity error.
### Design approach
The first attempt was a broader fix: skip the ambiguity check for any column
reference whose `exprId` matches the plan's output
(`outputExprIds.contains(attr.exprId)`). This was too permissive. It broke 4
existing tests (SPARK-28344: fail ambiguous self join - column ref in Project,
join three tables, SPARK-33071, SPARK-35454: join four tables) because it
suppressed legitimate ambiguity errors where the user genuinely needs to alias.
The fix was then narrowed to the specific case: skip the check only when the
root plan is a Project directly on top of a self-join (`leftId == rightId`)
with a foldable join condition. This targets the false positive without
affecting real ambiguity detection.
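The narrowed condition can be sketched as a toy model. This is not the Catalyst code from the patch: every type and function name below (`ColumnRef`, `Relation`, `skipAmbiguityCheck`, and so on) is hypothetical. The "foldable condition" is modeled only as the column-versus-literal comparison the description mentions, and multi-layer plans are not covered.

```scala
// Toy model only: these types are hypothetical and are not Catalyst classes.
sealed trait Expr
final case class ColumnRef(name: String, datasetId: Long) extends Expr
final case class Literal(value: Any) extends Expr
final case class EqualTo(left: Expr, right: Expr) extends Expr

sealed trait Plan
final case class Relation(datasetId: Long) extends Plan
final case class Join(left: Plan, right: Plan, condition: Option[Expr]) extends Plan
final case class Project(columns: Seq[ColumnRef], child: Plan) extends Plan

object NarrowedCheck {
  // Stand-in for the "foldable" case in the description:
  // a column compared against a literal, in either order.
  def comparesColumnToLiteral(e: Expr): Boolean = e match {
    case EqualTo(_: ColumnRef, _: Literal) | EqualTo(_: Literal, _: ColumnRef) => true
    case _ => false
  }

  // True only when the root is a Project directly over a self-join
  // (leftId == rightId) whose condition compares a column to a literal.
  def skipAmbiguityCheck(root: Plan): Boolean = root match {
    case Project(_, Join(Relation(leftId), Relation(rightId), Some(cond))) =>
      leftId == rightId && comparesColumnToLiteral(cond)
    case _ => false
  }
}
```

In this model, a Project over a self-join on `col = 0` is skipped, while a self-join on `col = col`, a join of two different relations, or a bare Join at the root still goes through the ambiguity check. That mirrors why the broader `exprId`-based skip was rejected.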
### How was this patch tested?
Added a test in `DataFrameSelfJoinSuite` verifying single-layer and
multi-layer self-joins with foldable conditions. All 23 existing self-join
tests are passing.
### Was this patch authored or co-authored using generative AI tooling?
Yes.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]