Github user maropu commented on the issue:
https://github.com/apache/spark/pull/20345
When I re-checked the code of the `ReorderJoin` rule, I found that
`ExtractFiltersAndInnerJoins` is applied to the same join tree multiple times. IIUC,
we could use `OrderedJoin` to avoid this. Is there any reason not to do so? I
just made [a trivial
patch](https://github.com/apache/spark/compare/master...maropu:GuardReAppliedReorder)
for that and checked the metrics for the rule:
```
scala> import org.apache.spark.sql.catalyst.rules.RuleExecutor
scala> :paste
RuleExecutor.resetMetrics()
val numJoins = 9
spark.range(1).selectExpr((0 until numJoins).map { i => s"id AS k$i" }: _*).write.saveAsTable("t")
(0 until numJoins).foreach { i =>
spark.range(1).selectExpr(s"id AS k$i").write.saveAsTable(s"t$i")
}
val joinSql = s"""
SELECT *
FROM t, ${ (0 until numJoins).map(i => s"t$i").mkString(", ") }
WHERE ${(0 until numJoins).map(i => s"t.k$i = t$i.k$i").mkString(" AND ")}
"""
sql(joinSql).explain
println(RuleExecutor.dumpTimeSpent())
-- master
Rule                                                  Effective Time / Total Time   Effective Runs / Total Runs
org.apache.spark.sql.catalyst.optimizer.ReorderJoin   97010505 / 126269245          2 / 26
-- w/ the patch
Rule                                                  Effective Time / Total Time   Effective Runs / Total Runs
org.apache.spark.sql.catalyst.optimizer.ReorderJoin   20498471 / 34859643           2 / 26
```
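To illustrate the idea behind the patch, here is a minimal, self-contained sketch of guarding an already-reordered join tree with a marker node so that the extraction step is not repeated on later passes. This is a toy model, not Spark's code: `Plan`, `Relation`, `Join`, `Ordered`, `ReorderSketch`, and the sort-by-name "reordering" are all hypothetical stand-ins, and the loop in `run` only loosely imitates a rule being re-applied by the optimizer.

```scala
// Toy stand-ins for plan nodes; none of these are Spark's real classes.
sealed trait Plan
case class Relation(name: String) extends Plan
case class Join(left: Plan, right: Plan) extends Plan
// Hypothetical marker: wraps a join tree that has already been reordered.
case class Ordered(child: Plan) extends Plan

object ReorderSketch {
  // Counts how often the (assumed expensive) flattening step runs.
  var extractCalls = 0

  // Flattens a join tree into its leaves; anything that is not a Join is a leaf.
  def flatten(plan: Plan): Seq[Plan] = plan match {
    case Join(l, r) => flatten(l) ++ flatten(r)
    case other      => Seq(other)
  }

  // Reorders a join tree. With useMarker = true the result is wrapped in
  // Ordered, so later passes skip it instead of flattening it again.
  def reorder(plan: Plan, useMarker: Boolean): Plan = plan match {
    case o: Ordered => o // guard: already reordered, nothing to do
    case j: Join =>
      extractCalls += 1
      // Placeholder "reordering": sort leaves by name, rebuild left-deep.
      val leaves  = flatten(j).sortBy(_.toString)
      val rebuilt = leaves.reduceLeft[Plan]((l, r) => Join(l, r))
      if (useMarker) Ordered(rebuilt) else rebuilt
    case other => other
  }

  // Applies the rule `iterations` times and returns the number of extractions.
  def run(useMarker: Boolean, iterations: Int): Int = {
    extractCalls = 0
    var plan: Plan = Join(Join(Relation("t2"), Relation("t0")), Relation("t1"))
    for (_ <- 1 to iterations) plan = reorder(plan, useMarker)
    extractCalls
  }
}
```

In this toy model, `ReorderSketch.run(useMarker = false, iterations = 5)` flattens the tree on every pass, while `ReorderSketch.run(useMarker = true, iterations = 5)` flattens it only once. Any real marker node would also have to be removed after optimization, which the sketch does not cover.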