Github user ioana-delaney commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13867#discussion_r76094136
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -912,19 +912,24 @@ class Analyzer(
          * Resolve the correlated expressions in a subquery by using the an outer plans' references. All
          * resolved outer references are wrapped in an [[OuterReference]]
          */
    -    private def resolveOuterReferences(plan: LogicalPlan, outer: LogicalPlan): LogicalPlan = {
    +    private def resolveOuterReferences(
    +        plan: LogicalPlan,
    +        outers: Seq[LogicalPlan]): LogicalPlan = {
           plan transformDown {
             case q: LogicalPlan if q.childrenResolved && !q.resolved =>
               q transformExpressions {
                 case u @ UnresolvedAttribute(nameParts) =>
                   withPosition(u) {
    -                try {
    -                  outer.resolve(nameParts, resolver) match {
    -                    case Some(outerAttr) => OuterReference(outerAttr)
    -                    case None => u
    -                  }
    -                } catch {
    -                  case _: AnalysisException => u
    +                val outer = outers.iterator
    +                var expr = Option.empty[NamedExpression]
    +                while (expr.isEmpty && outer.hasNext) {
    +                  expr = outer.next().resolve(nameParts, resolver)
    +                }
    +                expr match {
    +                  case Some(outerAttr) => OuterReference(outerAttr)
    +                  case None =>
    --- End diff --
    
    @hvanhovell The purpose of resolveOuterReferences() is to perform column
    resolution as early as possible, using the entire set of available outer
    tables. If an UnresolvedAttribute cannot be resolved, the method throws an
    error. Otherwise, analysis continues with the more general semantic checks
    in resolveSubquery(). For example, the execution flow for the following
    query, which uses a supported correlation, is:
    
    select * from t1
     where c1 IN ( select t2.c1 from t2
                   where t2.c2 IN (select concat(v1.c1, 'a')
                                   from (select t2.c1 as c1 from t3) v1))
    
    1) The call to execute() in resolveSubquery() fails to resolve the plan
       because of the correlated reference t2.c1.
    2) resolveOuterReferences() resolves the correlation against the set of
       outer tables, e.g. t2.
    3) The subquery is passed back to execute(), which resolves the expressions
       that depend on the correlation and applies the other semantic checks,
       e.g. concat(cast(c1#289 as string), a).
    
    For unsupported correlation, e.g.
    
    select * from t1
     where c1 IN ( select t2.c1 from t2
                   where t2.c2 IN (select concat(v1.c1, 'a')
                                   from (select t1.c1 as c1 from t3) v1))
    
    resolveOuterReferences() throws an exception in step 2 above.
    
    So the semantic analysis in the two methods, resolveOuterReferences() and
    resolveSubquery(), is complementary. resolveOuterReferences() simply
    attempts to resolve base column references as early as possible using the
    entire set of outer plans, and resolveSubquery() does the remaining type
    resolution and other semantic analysis.
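
    To make the "first outer plan that can resolve the name wins" idea from the
    diff concrete, here is a minimal, self-contained sketch. Attr, OuterRef,
    Scope and OuterResolution are hypothetical stand-ins and not the Catalyst
    classes; the sketch returns None when nothing matches and leaves the
    decision about what to do next to the caller, since that branch is cut off
    in the quoted diff.

```scala
// Simplified stand-ins for illustration only; these are NOT Catalyst classes.
final case class Attr(table: String, column: String)
final case class OuterRef(attr: Attr)

// Hypothetical "plan" that only knows which attributes it outputs.
final case class Scope(output: Seq[Attr]) {
  // Resolve "table.column" or a bare "column" against this scope's output.
  def resolve(nameParts: Seq[String]): Option[Attr] = nameParts match {
    case Seq(t, c) => output.find(a => a.table == t && a.column == c)
    case Seq(c)    => output.find(_.column == c)
    case _         => None
  }
}

object OuterResolution {
  // Try each outer scope in order and wrap the first match.
  // The iterator is lazy, so later scopes are not consulted after a hit.
  def resolveOuter(nameParts: Seq[String], outers: Seq[Scope]): Option[OuterRef] =
    outers.iterator
      .map(_.resolve(nameParts))
      .collectFirst { case Some(attr) => OuterRef(attr) }
}
```

    With one scope for t2 and one for t1, passed in that order, resolving
    Seq("t2", "c1") yields the t2 attribute wrapped in OuterRef, while a name
    that no scope outputs yields None.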


