Github user chenghao-intel commented on the pull request:
https://github.com/apache/spark/pull/1147#issuecomment-46800787
Thank you all for the comments, I will change some of the code accordingly.
This PR actually contains 2 parts:
- Code refactoring for Join
  - Removed `FilteredOperation` from patterns.scala, because the filters (WHERE condition & JOIN condition) have already been pushed down via `PushPredicateThroughJoin` in logical.Optimizer.scala. Discarding the combination of filters (WHERE and JOIN conditions) makes the join pattern matching cleaner and simpler (a simplified sketch of the resulting pattern match follows this list).
  - The pattern matching order is actually critical for join operator selection in SparkStrategies.scala, hence I merged the 3 join strategies into 1.
  - Added the trait `BinaryJoinNode`, which can be utilized by `HashJoin` / `SortMergeJoin` (to be implemented soon) / `CartesianProduct` (inner join) / `MapSide Join` (Left/Inner/LeftSemi, assuming the right table is the build table) for all of the join types; and if we want to add code generation for the join condition, the only thing we need to modify is the trait `BinaryJoinNode` (a rough sketch of the idea follows this list).
- Add outer join support for HashJoin
  - With `BinaryJoinNode`, adding hash-based outer join support is easy.
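Below is a minimal, self-contained sketch of what the planning can look like once `FilteredOperation` is gone. All class and type names here are stand-ins made up for illustration, not the actual Catalyst / Spark SQL classes touched by this PR; the point is only that, with predicates already pushed into the logical `Join` node by the optimizer, a single merged strategy can pattern match on `Join` directly and pick the physical operator by case order.

```scala
// Simplified stand-ins for the logical plan side (not the real Spark classes).
sealed trait LogicalPlan
case class Relation(name: String) extends LogicalPlan
case class Join(
    left: LogicalPlan,
    right: LogicalPlan,
    joinType: String,            // e.g. "Inner", "LeftOuter", "LeftSemi"
    condition: Option[String]    // condition already pushed down by the optimizer
) extends LogicalPlan

// Simplified stand-ins for the physical operators.
sealed trait PhysicalPlan
case class HashJoinExec(left: LogicalPlan, right: LogicalPlan,
                        condition: Option[String]) extends PhysicalPlan
case class CartesianProductExec(left: LogicalPlan, right: LogicalPlan) extends PhysicalPlan

object JoinStrategy {
  // One merged strategy: no FilteredOperation wrapper to peel off, and the
  // order of the cases decides which physical join gets selected.
  def plan(plan: LogicalPlan): Option[PhysicalPlan] = plan match {
    case Join(l, r, "Inner", cond @ Some(_)) => Some(HashJoinExec(l, r, cond))
    case Join(l, r, "Inner", None)           => Some(CartesianProductExec(l, r))
    case _                                   => None
  }
}
```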
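And a rough sketch of the `BinaryJoinNode` idea, again with hypothetical names and types rather than the trait actually added in this PR: the two children and the condition evaluation live in one trait, so concrete join operators only describe how rows are matched, and a later change (code generation for the condition, or null handling for outer joins) only needs to touch the trait.

```scala
// Hypothetical shape of the shared trait (illustration only).
trait BinaryJoinNode {
  type Row = Seq[Any]

  def leftRows: Iterator[Row]
  def rightRows: Iterator[Row]
  def condition: Option[(Row, Row) => Boolean]

  // Shared condition evaluation: a code-generated version, or outer-join
  // null padding, would be added here once instead of in every operator.
  protected def matches(l: Row, r: Row): Boolean =
    condition.forall(pred => pred(l, r))
}

// Example operator mixing the trait in: a naive nested-loop inner join that
// stands in for HashJoin / SortMergeJoin / CartesianProduct in this sketch.
class NaiveInnerJoin(
    left: Seq[Seq[Any]],
    right: Seq[Seq[Any]],
    val condition: Option[(Seq[Any], Seq[Any]) => Boolean]
) extends BinaryJoinNode {

  def leftRows: Iterator[Row] = left.iterator
  def rightRows: Iterator[Row] = right.iterator

  def execute(): Iterator[Row] =
    // A fresh iterator over the right side is taken for every left row.
    for (l <- leftRows; r <- right.iterator if matches(l, r)) yield l ++ r
}

// Usage: join on the first column of each row.
object Demo extends App {
  val join = new NaiveInnerJoin(
    left = Seq(Seq(1, "a"), Seq(2, "b")),
    right = Seq(Seq(1, "x"), Seq(3, "y")),
    condition = Some((l, r) => l.head == r.head)
  )
  join.execute().foreach(println)   // prints List(1, a, 1, x)
}
```

With the shared pieces isolated like this, the last point above follows naturally: a hash-based outer join mostly adds null-padding for unmatched rows on top of the same trait.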