cloud-fan commented on code in PR #55606:
URL: https://github.com/apache/spark/pull/55606#discussion_r3173711754
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##########
@@ -1868,7 +1868,14 @@ class Analyzer(
// Only Project, Aggregate, CollectMetrics can host star
expressions.
case u @ (_: Project | _: Aggregate | _: CollectMetrics) =>
Try(s.expand(u.children.head, resolver)) match {
- case Success(expanded) => expanded.map(wrapOuterReference)
+ case Success(expanded) =>
+ expanded.map {
Review Comment:
Design: this `expand` helper is shared with the function-arg /
`CreateStruct` / `CreateArray` / `In` / `Murmur3Hash` / `XxHash64` callers
further down (lines 1969-1995). Only the project-list path
(`buildExpandedProjectList`) exposes `Project.output`, which is where the
ExprId leak you're fixing actually surfaces. The other callers don't need an
`Alias` wrap — and as the `selectExcept.sql.out` and `ResolveSubquerySuite`
diffs show, wrapping them changes the auto-generated column name (the inner `AS
c1, AS c2` gets baked into the surrounding alias's `toPrettySQL` before
`CleanupAliases` runs).
Could the wrapping be moved to `buildExpandedProjectList` (and similar
project-list call sites) so function-arg expansions keep bare
`OuterReference`s? That would scope the change to where the leak actually
exists and avoid the column-name regression.
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##########
@@ -1868,7 +1868,14 @@ class Analyzer(
// Only Project, Aggregate, CollectMetrics can host star
expressions.
case u @ (_: Project | _: Aggregate | _: CollectMetrics) =>
Try(s.expand(u.children.head, resolver)) match {
- case Success(expanded) => expanded.map(wrapOuterReference)
+ case Success(expanded) =>
+ expanded.map {
+ case alias: Alias =>
+
alias.withNewChildren(Seq(wrapOuterReference(alias.child)))
+ .asInstanceOf[Alias]
Review Comment:
Minor: `wrapOuterReference[E <: Expression](e: E): E` recursively transforms
`Attribute` nodes and preserves the rest of the tree, so applying it to the
whole alias produces the same `Alias(GetStructField(OuterReference(attr), i),
name)` and avoids the `withNewChildren` + cast dance.
```suggestion
wrapOuterReference(alias)
```
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##########
@@ -1868,7 +1868,14 @@ class Analyzer(
// Only Project, Aggregate, CollectMetrics can host star
expressions.
case u @ (_: Project | _: Aggregate | _: CollectMetrics) =>
Try(s.expand(u.children.head, resolver)) match {
- case Success(expanded) => expanded.map(wrapOuterReference)
+ case Success(expanded) =>
+ expanded.map {
+ case alias: Alias =>
+
alias.withNewChildren(Seq(wrapOuterReference(alias.child)))
+ .asInstanceOf[Alias]
+ case e =>
+ Alias(wrapOuterReference(e), toPrettySQL(e))()
Review Comment:
Nit: the inner `case e =>` shadows the outer `case e: AnalysisException`
from the surrounding catch. It still works (the outer `e` is referenced from
`case Failure(_) => throw e` at a different scope) but reads confusingly.
```suggestion
case other =>
Alias(wrapOuterReference(other),
toPrettySQL(other))()
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]