[GitHub] spark pull request: [SPARK-10484][SQL] Optimize the cartesian join...

chenghao-intel Sun, 25 Oct 2015 22:08:54 -0700

Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8652#discussion_r42960365
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlanner.scala ---
    @@ -44,8 +44,7 @@ class SparkPlanner(val sqlContext: SQLContext) extends SparkStrategies {
           EquiJoinSelection ::
           InMemoryScans ::
           BasicOperators ::
    -      CartesianProduct ::
    -      BroadcastNestedLoopJoin :: Nil)
    +      CartesianProduct :: Nil)
    --- End diff --
    
    `BroadcastNestedLoopJoin` actually supports equi-joins too, but we never 
reach it for that case, because the earlier rules already provide a better 
plan for it.
    
    On second thought, I am a little hesitant to combine the rules in 
`CartesianProduct` and `BroadcastNestedLoopJoin`, as the latter is supposed to 
be the last resort for JOIN: it works for every JOIN type, with or without a 
join condition, and the other strategies can be considered optimizations over it.
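
    A minimal sketch of the fallback ordering described above, assuming a 
simplified model: the types, the `ToyPlanner` object, and the operator names 
returned as strings are all hypothetical and are not Spark's actual classes. 
Specialized strategies decline joins they cannot handle, and the catch-all at 
the end of the list accepts everything.

```scala
// Toy model only: every type here is hypothetical, not Spark's API.
sealed trait JoinType
case object Inner extends JoinType
case object LeftOuter extends JoinType

// A join request: optional equi-join keys plus an optional extra condition.
final case class Join(
    joinType: JoinType,
    equiKeys: Seq[String],
    condition: Option[String])

object ToyPlanner {
  // A strategy either names a physical operator or declines with None.
  type Strategy = Join => Option[String]

  // Specialized rule: only handles joins that have equi-join keys.
  val equiJoinSelection: Strategy = j =>
    if (j.equiKeys.nonEmpty) Some("HashJoin") else None

  // Specialized rule (assumed scope): inner joins with no condition at all.
  val cartesianProduct: Strategy = j =>
    if (j.joinType == Inner && j.equiKeys.isEmpty && j.condition.isEmpty)
      Some("CartesianProduct")
    else None

  // Catch-all: accepts every join type, with or without a condition.
  val broadcastNestedLoopJoin: Strategy = _ => Some("BroadcastNestedLoopJoin")

  // Order matters: optimizations first, the catch-all last.
  val strategies: Seq[Strategy] =
    Seq(equiJoinSelection, cartesianProduct, broadcastNestedLoopJoin)

  // The first strategy that returns a plan wins.
  def plan(j: Join): String =
    strategies.iterator
      .map(strategy => strategy(j))
      .collectFirst { case Some(operator) => operator }
      .getOrElse(sys.error("no strategy matched"))
}
```

    In this toy model, dropping the catch-all from the list would leave a 
non-inner join without equi-join keys with no matching strategy, which is the 
kind of gap the ordering is meant to prevent.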
    
    I am going to revert the code change unless @yhuai strongly objects. 
Alternatively, we can refactor the JOIN strategy after this PR has been merged.
    
    What do you think, @yhuai?