Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/14719
@cloud-fan, I was studying the ResolveSubquery code for my work on
SPARK-17348. I was at first puzzled by the code in `def rewriteSubQuery`:
```scala
    // Make sure the inner and the outer query attributes do not collide.
    val outputSet = outer.map(_.outputSet).reduce(_ ++ _)
    val duplicates = basePlan.outputSet.intersect(outputSet)
    val (plan, deDuplicatedConditions) = if (duplicates.nonEmpty) {
      val aliasMap = AttributeMap(duplicates.map { dup =>
        dup -> Alias(dup, dup.toString)()
      }.toSeq)
      val aliasedExpressions = basePlan.output.map { ref =>
        aliasMap.getOrElse(ref, ref)
      }
      val aliasedProjection = Project(aliasedExpressions, basePlan)
      val aliasedConditions = baseConditions.map(_.transform {
        case ref: Attribute => aliasMap.getOrElse(ref, ref).toAttribute
      })
      (aliasedProjection, aliasedConditions)
    } else {
      (basePlan, baseConditions)
    }
    // Remove outer references from the correlated predicates. We wait with
    // extracting these until collisions between the inner and outer query
    // attributes have been solved.
    val conditions = deDuplicatedConditions.map(_.transform {
      case OuterReference(ref) => ref
    })
    (plan, conditions)
  }
```
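To illustrate the collision that this step guards against, here is a minimal standalone sketch (hypothetical `Attr` type and `dedup` helper, not Spark's actual Catalyst classes): when the inner and outer plans expose an attribute with the same ExprId, the inner occurrence is re-assigned a fresh id so that join conditions can tell the two apart.

```scala
// Simplified model of the de-duplication above. In real Catalyst the
// aliasing happens via Alias/Project nodes; here we just remap ids.
object DedupSketch {
  final case class Attr(name: String, exprId: Long)

  private var nextId: Long = 100L
  private def freshId(): Long = { nextId += 1; nextId }

  // Re-alias any inner attribute whose ExprId collides with the outer set,
  // returning the de-duplicated output plus the old-id -> new-attr map that
  // would be used to rewrite the correlated conditions.
  def dedup(inner: Seq[Attr], outerIds: Set[Long]): (Seq[Attr], Map[Long, Attr]) = {
    val aliasMap = inner.collect {
      case a if outerIds.contains(a.exprId) => a.exprId -> a.copy(exprId = freshId())
    }.toMap
    val projected = inner.map(a => aliasMap.getOrElse(a.exprId, a))
    (projected, aliasMap)
  }

  def main(args: Array[String]): Unit = {
    // Same table on both sides: "id" carries ExprId 1 in both plans,
    // so the inner copy must be re-aliased under a fresh id.
    val inner = Seq(Attr("id", 1L), Attr("v", 2L))
    val (projected, aliasMap) = dedup(inner, outerIds = Set(1L))
    println(projected.map(a => s"${a.name}#${a.exprId}").mkString(","))
    println(aliasMap.keys.mkString(","))
  }
}
```

Without such a remap, a condition like `inner.id = outer.id` would compare an attribute with itself, which is exactly the kind of self-join confusion the aliasing avoids.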
It was not until I debugged a SQL statement that referenced the same table in
both the outer query and the subquery that I realized I had run into an issue
similar to the one we are trying to fix here. I think my proposal of
generating a new ExprId for each column would make this piece of code
unnecessary.