Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/22008#discussion_r208385422
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala
---
@@ -152,3 +153,45 @@ object EliminateOuterJoin extends Rule[LogicalPlan]
with PredicateHelper {
if (j.joinType == newJoinType) f else Filter(condition,
j.copy(joinType = newJoinType))
}
}
+
+/**
+ * Swaps right and left logical plans of a join when left is bigger than
right. This is useful
+ * because underlying cartesian product performs a nested loop, thus if
the outer table is
+ * smaller there are less iterator initialization.
--- End diff --
I have no idea that we have a good way so far to estimate the effort of
materializing elements in one RDD, especially before we materialize it. That is
why I think this optimization of swapping cross join doesn't always introduce
improvement but sometimes regression. Let us see if others have more ideas.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]