GitHub user maropu opened a pull request:

    https://github.com/apache/spark/pull/20345

    [SPARK-23172][SQL] Respect Project nodes in ReorderJoin

    ## What changes were proposed in this pull request?
    The current `ReorderJoin` optimizer rule cannot flatten a
    `Join -> Project -> Join` pattern because `ExtractFiltersAndInnerJoins`
    does not handle `Project` nodes. As a result, the current master cannot
    reorder the joins in the query below:
    ```
    val df1 = spark.range(100).selectExpr(
      "id % 10 AS k0", "id % 10 AS k1", "id % 10 AS k2", "id AS v1")
    val df2 = spark.range(10).selectExpr("id AS k0", "id AS v2")
    val df3 = spark.range(10).selectExpr("id AS k1", "id AS v3")
    val df4 = spark.range(10).selectExpr("id AS k2", "id AS v4")
    df1.join(df2, "k0").join(df3, "k1").join(df4, "k2").explain(true)
    
    == Analyzed Logical Plan ==
    k2: bigint, k1: bigint, k0: bigint, v1: bigint, v2: bigint, v3: bigint, v4: bigint
    Project [k2#5L, k1#4L, k0#3L, v1#6L, v2#16L, v3#24L, v4#32L]
    +- Join Inner, (k2#5L = k2#31L)
       :- Project [k1#4L, k0#3L, k2#5L, v1#6L, v2#16L, v3#24L]
       :  +- Join Inner, (k1#4L = k1#23L)
       :     :- Project [k0#3L, k1#4L, k2#5L, v1#6L, v2#16L]
       :     :  +- Join Inner, (k0#3L = k0#15L)
       :     :     :- Project [(id#0L % cast(10 as bigint)) AS k0#3L, (id#0L % cast(10 as bigint)) AS k1#4L, (id#0L % cast(10 as bigint)) AS k2#5L, id#0L AS v1#6L]
       :     :     :  +- Range (0, 100, step=1, splits=Some(4))
       :     :     +- Project [id#12L AS k0#15L, id#12L AS v2#16L]
       :     :        +- Range (0, 10, step=1, splits=Some(4))
       :     +- Project [id#20L AS k1#23L, id#20L AS v3#24L]
       :        +- Range (0, 10, step=1, splits=Some(4))
       +- Project [id#28L AS k2#31L, id#28L AS v4#32L]
          +- Range (0, 10, step=1, splits=Some(4))
    ```
    To reorder the joins in such a query, this PR adds handling of `Project`
    nodes to `ExtractFiltersAndInnerJoins`.
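    
    As a rough illustration of the idea (a toy model, not the code in this
    PR), the sketch below flattens a left-deep join chain while looking
    through `Project` nodes that sit between two joins. The types `Plan`,
    `Relation`, `Project`, and `Join`, and the object `FlattenJoins`, are
    hypothetical stand-ins defined here; they are not Catalyst classes, and
    join conditions are plain strings for brevity.
    ```scala
    // Toy logical plan, assumed for illustration only (not Spark's Catalyst API).
    sealed trait Plan
    case class Relation(name: String) extends Plan
    case class Project(columns: Seq[String], child: Plan) extends Plan
    case class Join(left: Plan, right: Plan, condition: String) extends Plan

    object FlattenJoins {
      // Collects the inputs and join conditions of a left-deep join chain.
      // A Project directly above a Join is skipped so that the joins on
      // both sides of it end up in the same flattened list.
      def flatten(plan: Plan): (Seq[Plan], Seq[String]) = plan match {
        case Join(left, right, cond) =>
          val (inputs, conds) = flatten(left)
          (inputs :+ right, conds :+ cond)
        case Project(_, child: Join) =>
          flatten(child)
        case other =>
          // Anything else (including a Project over a non-join) is a leaf input.
          (Seq(other), Seq.empty)
      }

      // Join -> Project -> Join, mirroring the shape of the plan above.
      val example: Plan = Join(
        Project(
          Seq("k0", "k1", "k2", "v1", "v2"),
          Join(Relation("df1"), Relation("df2"), "k0 = k0")),
        Relation("df3"),
        "k1 = k1")
    }
    ```
    Calling `FlattenJoins.flatten(FlattenJoins.example)` yields the three
    relations `df1`, `df2`, `df3` together with both join conditions.
    Without the `Project` case, flattening would stop at the intermediate
    `Project` and treat the inner join as a single opaque input. Note that
    this toy simply discards the skipped projections; an actual optimizer
    rule would also have to preserve the output columns of the plan.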
    
    ## How was this patch tested?
    Added new tests in `JoinOptimizationSuite` and modified some existing tests
    in `JoinReorderSuite` and `StarJoinReorderSuite`.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maropu/spark FixFlattenJoins

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20345.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20345
    
----
commit 8ad6a813818b34b3bdfd94f93c0a1f664945da34
Author: Takeshi Yamamuro <yamamuro@...>
Date:   2018-01-18T09:42:46Z

    Fix

----

