[GitHub] spark pull request #23057: [SPARK-26078][SQL] Dedup self-join attributes on ...

viirya Sat, 17 Nov 2018 07:55:18 -0800

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23057#discussion_r234412635
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala ---
    @@ -119,7 +139,7 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
               // (A.A1 = B.B1 OR ISNULL(A.A1 = B.B1)) AND (B.B2 = A.A2) AND B.B3 > 1
               val finalJoinCond = (nullAwareJoinConds ++ conditions).reduceLeft(And)
               // Deduplicate conflicting attributes if any.
    -          dedupJoin(Join(outerPlan, sub, LeftAnti, Option(finalJoinCond)))
    +          dedupJoin(Join(outerPlan, newSub, LeftAnti, Option(finalJoinCond)))
             case (p, predicate) =>
               val (newCond, inputPlan) = rewriteExistentialExpr(Seq(predicate), p)
               Project(p.output, Filter(newCond.get, inputPlan))
    --- End diff --
    
    Can you try this test case?
    
    ```scala
    val df1 = spark.sql(
      """
        |SELECT id, num, source FROM (
        |  SELECT id, num, 'a' AS source FROM a
        |  UNION ALL
        |  SELECT id, num, 'b' AS source FROM b
        |) AS c WHERE c.id IN (SELECT id FROM b WHERE num = 2) OR
        |c.id IN (SELECT id FROM b WHERE num = 3)
      """.stripMargin)
    ```


