Github user ioana-delaney commented on a diff in the pull request:
https://github.com/apache/spark/pull/13867#discussion_r76094136
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -912,19 +912,24 @@ class Analyzer(
      * Resolve the correlated expressions in a subquery by using the an outer plans' references. All
      * resolved outer references are wrapped in an [[OuterReference]]
      */
-    private def resolveOuterReferences(plan: LogicalPlan, outer: LogicalPlan): LogicalPlan = {
+    private def resolveOuterReferences(
+        plan: LogicalPlan,
+        outers: Seq[LogicalPlan]): LogicalPlan = {
       plan transformDown {
         case q: LogicalPlan if q.childrenResolved && !q.resolved =>
           q transformExpressions {
             case u @ UnresolvedAttribute(nameParts) =>
               withPosition(u) {
-                try {
-                  outer.resolve(nameParts, resolver) match {
-                    case Some(outerAttr) => OuterReference(outerAttr)
-                    case None => u
-                  }
-                } catch {
-                  case _: AnalysisException => u
+                val outer = outers.iterator
+                var expr = Option.empty[NamedExpression]
+                while (expr.isEmpty && outer.hasNext) {
+                  expr = outer.next().resolve(nameParts, resolver)
+                }
+                expr match {
+                  case Some(outerAttr) => OuterReference(outerAttr)
+                  case None =>
--- End diff --
@hvanhovell The purpose of resolveOuterReferences() is to perform column
resolution as early as possible, using the entire set of available outer
tables. If a column of type UnresolvedAttribute cannot be resolved, the method
throws an error. Otherwise, analysis continues with the more general semantic
analysis in resolveSubquery(). For example, for the following query, which
uses a supported correlation, the execution flow is:
select * from t1
where c1 IN (select t2.c1 from t2
             where t2.c2 IN (select concat(v1.c1, 'a')
                             from (select t2.c1 as c1 from t3) v1))
1) The call to execute() in resolveSubquery() fails to resolve the plan
because of the correlated reference t2.c1.
2) resolveOuterReferences() resolves the correlated reference against the set
of outer tables, here t2.
3) The subquery is passed back to execute(), which resolves the expressions
that depend on the correlated reference and applies the remaining semantic
checks, e.g. the implicit cast in concat(cast(c1#289 as string), a).
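The three steps can be illustrated with a minimal Scala sketch on a toy model. ToyPlan, Expr, Unresolved, Attr, OuterRef, resolveLocal, resolveOuter and analyze are hypothetical names made up for this illustration, not Spark APIs, and the column set assumed for t3 is invented. Local resolution runs first, anything still unresolved is looked up in the outer plans (first match wins), and a name that no outer plan can resolve raises an error, following the description above rather than the truncated "case None =>" branch of the diff.

```scala
// Toy model for illustration only; none of these names are Spark APIs.
final case class ToyPlan(table: String, columns: Set[String]) {
  // Resolves "c" or "table.c" against this plan's columns.
  def resolve(nameParts: Seq[String]): Option[String] = nameParts match {
    case Seq(c) if columns(c)                  => Some(s"$table.$c")
    case Seq(q, c) if q == table && columns(c) => Some(s"$table.$c")
    case _                                     => None
  }
}

sealed trait Expr
final case class Unresolved(nameParts: Seq[String]) extends Expr
final case class Attr(name: String) extends Expr
final case class OuterRef(name: String) extends Expr

// Step 1: resolve against the subquery's own plan; keep anything else unresolved.
def resolveLocal(exprs: Seq[Expr], local: ToyPlan): Seq[Expr] = exprs.map {
  case u @ Unresolved(parts) => local.resolve(parts).map(Attr(_)).getOrElse(u)
  case other                 => other
}

// Step 2: resolve what is left against the outer plans, stopping at the first
// plan that knows the name; an unresolvable name is an error.
def resolveOuter(exprs: Seq[Expr], outers: Seq[ToyPlan]): Seq[Expr] = exprs.map {
  case Unresolved(parts) =>
    val found = outers.iterator.map(_.resolve(parts)).collectFirst { case Some(n) => n }
    found match {
      case Some(name) => OuterRef(name)
      case None =>
        throw new IllegalStateException(s"cannot resolve '${parts.mkString(".")}'")
    }
  case other => other
}

// The whole flow for one subquery's expressions.
def analyze(exprs: Seq[Expr], local: ToyPlan, outers: Seq[ToyPlan]): Seq[Expr] = {
  val afterStep1 = resolveLocal(exprs, local)
  val afterStep2 = resolveOuter(afterStep1, outers)
  // Step 3: Spark would re-run the remaining analysis (e.g. type coercion) here;
  // this toy has nothing left to do.
  afterStep2
}
```

With t2 = ToyPlan("t2", Set("c1", "c2")) as the only outer plan and t3 = ToyPlan("t3", Set("c3")) as the local plan, Unresolved(Seq("t2", "c1")) becomes OuterRef("t2.c1"), Unresolved(Seq("c3")) becomes Attr("t3.c3"), and Unresolved(Seq("t2", "c9")) throws. The cast insertion from step 3 is deliberately left out of the toy.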
For an unsupported correlation, e.g.
select * from t1
where c1 IN (select t2.c1 from t2
             where t2.c2 IN (select concat(v1.c1, 'a')
                             from (select t1.c1 as c1 from t3) v1))
resolveOuterReferences() throws an exception in step 2 above.
So the semantic analysis in the two methods, resolveOuterReferences() and
resolveSubquery(), is complementary. resolveOuterReferences() simply attempts
to resolve base column references as early as possible using the entire set of
outer plans, while resolveSubquery() does the remaining type resolution and
other semantic analysis.
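The search over outers in the diff can also be sketched in isolation. Here Resolve, plan, firstMatchLoop and firstMatch are hypothetical stand-ins (a "plan" is reduced to a function from name parts to a qualified name), not Spark's LogicalPlan.resolve. firstMatchLoop transcribes the while loop from the diff onto that stand-in type; firstMatch is an equivalent formulation with collectFirst. In both, the first plan in outers that resolves the name wins.

```scala
// Hypothetical stand-in for LogicalPlan.resolve: name parts in, qualified name out.
type Resolve = Seq[String] => Option[String]

// Builds a toy "plan" for a table with the given columns (illustrative helper).
def plan(table: String, cols: String*): Resolve = {
  case Seq(c) if cols.contains(c)                  => Some(s"$table.$c")
  case Seq(q, c) if q == table && cols.contains(c) => Some(s"$table.$c")
  case _                                           => None
}

// The loop from the diff, transcribed onto the stand-in type.
def firstMatchLoop(outers: Seq[Resolve], nameParts: Seq[String]): Option[String] = {
  val outer = outers.iterator
  var expr = Option.empty[String]
  while (expr.isEmpty && outer.hasNext) {
    expr = outer.next()(nameParts)
  }
  expr
}

// Same behavior without mutable state: the iterator is lazy, so plans after
// the first match are never consulted.
def firstMatch(outers: Seq[Resolve], nameParts: Seq[String]): Option[String] =
  outers.iterator.map(resolve => resolve(nameParts)).collectFirst { case Some(attr) => attr }
```

Because the search stops at the first match, an unqualified c1 defined by both plan("t2", "c1", "c2") and plan("t1", "c1") binds to whichever of them comes first in outers, while a qualified t1.c1 still reaches the t1 plan.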