Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/14719
@cloud-fan, I was studying the ResolveSubquery code for my work on
SPARK-17348. I was at first puzzled by the code in `def rewriteSubQuery`:
```scala
    // Make sure the inner and the outer query attributes do not collide.
    val outputSet = outer.map(_.outputSet).reduce(_ ++ _)
    val duplicates = basePlan.outputSet.intersect(outputSet)
    val (plan, deDuplicatedConditions) = if (duplicates.nonEmpty) {
      val aliasMap = AttributeMap(duplicates.map { dup =>
        dup -> Alias(dup, dup.toString)()
      }.toSeq)
      val aliasedExpressions = basePlan.output.map { ref =>
        aliasMap.getOrElse(ref, ref)
      }
      val aliasedProjection = Project(aliasedExpressions, basePlan)
      val aliasedConditions = baseConditions.map(_.transform {
        case ref: Attribute => aliasMap.getOrElse(ref, ref).toAttribute
      })
      (aliasedProjection, aliasedConditions)
    } else {
      (basePlan, baseConditions)
    }
    // Remove outer references from the correlated predicates. We wait with
    // extracting these until collisions between the inner and outer query
    // attributes have been solved.
    val conditions = deDuplicatedConditions.map(_.transform {
      case OuterReference(ref) => ref
    })
    (plan, conditions)
  }
```
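To illustrate the collision that this step guards against, here is a minimal standalone sketch (hypothetical `Attr` type and `dedup` helper, not Spark's actual Catalyst classes): when the inner and outer plans expose an attribute with the same ExprId, the inner occurrence is re-assigned a fresh id so that join conditions can tell the two apart.

```scala
// Simplified model of the de-duplication above. In real Catalyst the
// aliasing happens via Alias/Project nodes; here we just remap ids.
object DedupSketch {
  final case class Attr(name: String, exprId: Long)

  private var nextId: Long = 100L
  private def freshId(): Long = { nextId += 1; nextId }

  // Re-alias any inner attribute whose ExprId collides with the outer set,
  // returning the de-duplicated output plus the old-id -> new-attr map that
  // would be used to rewrite the correlated conditions.
  def dedup(inner: Seq[Attr], outerIds: Set[Long]): (Seq[Attr], Map[Long, Attr]) = {
    val aliasMap = inner.collect {
      case a if outerIds.contains(a.exprId) => a.exprId -> a.copy(exprId = freshId())
    }.toMap
    val projected = inner.map(a => aliasMap.getOrElse(a.exprId, a))
    (projected, aliasMap)
  }

  def main(args: Array[String]): Unit = {
    // Same table on both sides: "id" carries ExprId 1 in both plans,
    // so the inner copy must be re-aliased under a fresh id.
    val inner = Seq(Attr("id", 1L), Attr("v", 2L))
    val (projected, aliasMap) = dedup(inner, outerIds = Set(1L))
    println(projected.map(a => s"${a.name}#${a.exprId}").mkString(","))
    println(aliasMap.keys.mkString(","))
  }
}
```

Without such a remap, a condition like `inner.id = outer.id` would compare an attribute with itself, which is exactly the kind of self-join confusion the aliasing avoids.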
It was not until I debugged a SQL statement that referenced the same table in
both the outer query and the subquery that I realized I had run into an issue
similar to the one we are trying to fix here. I think my proposal of
generating a new ExprId for each column would make this piece of code
unnecessary.