[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...

maropu Tue, 13 Mar 2018 02:25:02 -0700

Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/20345
  
    When I re-checked the code of the `ReorderJoin` rule, I found that
    `ExtractFiltersAndInnerJoins` is applied to the same join tree multiple
    times. IIUC we could use `OrderedJoin` to avoid this; is there any reason
    not to do so? I just made [a trivial
    patch](https://github.com/apache/spark/compare/master...maropu:GuardReAppliedReorder)
    for that and checked the metrics for the rule:
    ```
    scala> import org.apache.spark.sql.catalyst.rules.RuleExecutor
    scala> :paste
    RuleExecutor.resetMetrics()
    val numJoins = 9
    spark.range(1).selectExpr((0 until numJoins).map { i => s"id AS k$i" }: _*).write.saveAsTable("t")
    (0 until numJoins).foreach { i =>
      spark.range(1).selectExpr(s"id AS k$i").write.saveAsTable(s"t$i")
    }
    val joinSql = s"""
      SELECT *
        FROM t, ${ (0 until numJoins).map(i => s"t$i").mkString(", ") }
        WHERE ${(0 until numJoins).map(i => s"t.k$i = t$i.k$i").mkString(" AND ")}
    """
    sql(joinSql).explain
    println(RuleExecutor.dumpTimeSpent())
    
    -- master
    Rule                                                 Effective Time / Total Time  Effective Runs / Total Runs
    org.apache.spark.sql.catalyst.optimizer.ReorderJoin  97010505 / 126269245         2 / 26
    
    -- w/ the patch
    Rule                                                 Effective Time / Total Time  Effective Runs / Total Runs
    org.apache.spark.sql.catalyst.optimizer.ReorderJoin  20498471 / 34859643          2 / 26
    ```
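
    To illustrate the guard idea, here is a minimal sketch on a toy plan type.
    It does not use Catalyst classes and is not the code in the patch:
    `ToyPlan`, `ToyRelation`, `ToyJoin`, `ToyOrdered`, and `GuardedReorder` are
    all hypothetical names, and the "reordering" is just a sort by relation
    name. The only point is that a marker node (playing the role `OrderedJoin`
    would play) lets a later pass skip the flattening step entirely.

    ```scala
    // Toy stand-ins for a logical plan; none of these are Spark classes.
    sealed trait ToyPlan
    case class ToyRelation(name: String) extends ToyPlan
    case class ToyJoin(left: ToyPlan, right: ToyPlan) extends ToyPlan
    // Marker node: wraps a join tree that has already been reordered.
    case class ToyOrdered(child: ToyPlan) extends ToyPlan

    object GuardedReorder {
      // Counts how often the (potentially expensive) flattening step runs.
      var flattenCalls: Int = 0

      // Collect the relation names under a join tree, left to right.
      def flatten(plan: ToyPlan): Seq[String] = plan match {
        case ToyRelation(name) => Seq(name)
        case ToyJoin(l, r)     => flatten(l) ++ flatten(r)
        case ToyOrdered(child) => flatten(child)
      }

      // One pass of the toy rule over the root of the plan.
      def reorder(plan: ToyPlan): ToyPlan = plan match {
        // Already marked: return as-is without flattening again.
        case ordered: ToyOrdered => ordered
        case join: ToyJoin =>
          flattenCalls += 1
          // Stand-in for real reordering: sort relations by name and
          // rebuild a left-deep join tree.
          val relations: Seq[ToyPlan] = flatten(join).sorted.map(ToyRelation(_))
          ToyOrdered(relations.reduceLeft((l, r) => ToyJoin(l, r)))
        case other => other
      }
    }
    ```

    Calling `GuardedReorder.reorder` twice on the same tree returns the same
    marked plan both times, and `flattenCalls` stays at 1 because the second
    pass hits the marker case first.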


