[GitHub] spark pull request #22008: [SPARK-24928][SQL] Optimize cross join according ...

viirya Tue, 07 Aug 2018 14:06:43 -0700

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22008#discussion_r208385422
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala ---
    @@ -152,3 +153,45 @@ object EliminateOuterJoin extends Rule[LogicalPlan] with PredicateHelper {
           if (j.joinType == newJoinType) f else Filter(condition, j.copy(joinType = newJoinType))
       }
     }
    +
    +/**
    + * Swaps right and left logical plans of a join when left is bigger than right. This is useful
    + * because underlying cartesian product performs a nested loop, thus if the outer table is
    + * smaller there are less iterator initialization.
    --- End diff --
    
    As far as I know, we do not yet have a good way to estimate the cost of 
materializing the elements of an RDD, especially before it has been 
materialized. That is why I think swapping the sides of a cross join will not 
always be an improvement and can sometimes cause a regression. Let's see if 
others have more ideas.
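
To make the concern concrete, here is a toy cost model. It is a sketch and not Spark code: `Side`, `initCost`, and `reinitCost` are hypothetical names, and the per-initialization costs are made-up numbers. It assumes only what the quoted doc comment states, namely that the inner side of the nested loop is re-initialized once per outer row.

```scala
// Toy model, NOT Spark code. All names and numbers here are hypothetical.
object CrossJoinCostSketch {
  // One side of the cross join: its row count and the (assumed) cost of
  // initializing an iterator over it, i.e. of materializing it once.
  final case class Side(rows: Long, initCost: Long)

  // In a nested loop the inner iterator is re-created once per outer row,
  // so the re-initialization cost is outer.rows * inner.initCost.
  def reinitCost(outer: Side, inner: Side): Long = outer.rows * inner.initCost

  def main(args: Array[String]): Unit = {
    val small = Side(rows = 10L, initCost = 1L)

    // Case 1: both sides equally cheap to initialize.
    // Putting the smaller side outside gives fewer initializations and wins.
    val bigCheap = Side(rows = 1000L, initCost = 1L)
    println(s"cheap inner, small outer: ${reinitCost(small, bigCheap)}")
    println(s"cheap inner, big outer:   ${reinitCost(bigCheap, small)}")

    // Case 2: the bigger side is much more expensive to initialize.
    // The same swap now re-materializes the expensive side and loses.
    val bigCostly = Side(rows = 1000L, initCost = 500L)
    println(s"costly inner, small outer: ${reinitCost(small, bigCostly)}")
    println(s"costly inner, big outer:   ${reinitCost(bigCostly, small)}")
  }
}
```

Under this model a rule that looks only at row counts picks the right order in the first case and the wrong one in the second, and the second case cannot be detected without knowing `initCost` ahead of time.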


