Github user chenghao-intel commented on the pull request:
https://github.com/apache/spark/pull/1147#issuecomment-46800787
Thank you all for the comments, I will change some of the code accordingly.
This PR actually contains 2 parts:
- Code refactoring for Join
  - Removed `FilteredOperation` from patterns.scala, because the filters (WHERE condition & JOIN condition) have already been pushed down via `PushPredicateThroughJoin` in logical.Optimizer.scala. Discarding the combination of filters (WHERE and JOIN conditions) makes the join pattern matching cleaner and simpler (a simplified sketch of the resulting pattern match follows this list).
  - The pattern matching order is actually critical for join operator selection in SparkStrategies.scala, hence I merged the 3 join strategies into 1.
  - Added the trait `BinaryJoinNode`, which can be utilized by `HashJoin` / `SortMergeJoin` (to be implemented soon) / `CartesianProduct` (inner join) / `MapSide Join` (Left/Inner/LeftSemi, assuming the right table is the build table) for all of the join types; and if we want to add code generation for the join condition, the only thing we need to modify is the trait `BinaryJoinNode` (a rough sketch of the idea follows this list).
- Add outer join support for HashJoin
  - With `BinaryJoinNode`, adding hash-based outer join support is easy.
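Below is a minimal, self-contained sketch of what the planning can look like once `FilteredOperation` is gone. All class and type names here are stand-ins made up for illustration, not the actual Catalyst / Spark SQL classes touched by this PR; the point is only that, with predicates already pushed into the logical `Join` node by the optimizer, a single merged strategy can pattern match on `Join` directly and pick the physical operator by case order.

```scala
// Simplified stand-ins for the logical plan side (not the real Spark classes).
sealed trait LogicalPlan
case class Relation(name: String) extends LogicalPlan
case class Join(
    left: LogicalPlan,
    right: LogicalPlan,
    joinType: String,            // e.g. "Inner", "LeftOuter", "LeftSemi"
    condition: Option[String]    // condition already pushed down by the optimizer
) extends LogicalPlan

// Simplified stand-ins for the physical operators.
sealed trait PhysicalPlan
case class HashJoinExec(left: LogicalPlan, right: LogicalPlan,
                        condition: Option[String]) extends PhysicalPlan
case class CartesianProductExec(left: LogicalPlan, right: LogicalPlan) extends PhysicalPlan

object JoinStrategy {
  // One merged strategy: no FilteredOperation wrapper to peel off, and the
  // order of the cases decides which physical join gets selected.
  def plan(plan: LogicalPlan): Option[PhysicalPlan] = plan match {
    case Join(l, r, "Inner", cond @ Some(_)) => Some(HashJoinExec(l, r, cond))
    case Join(l, r, "Inner", None)           => Some(CartesianProductExec(l, r))
    case _                                   => None
  }
}
```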
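And a rough sketch of the `BinaryJoinNode` idea, again with hypothetical names and types rather than the trait actually added in this PR: the two children and the condition evaluation live in one trait, so concrete join operators only describe how rows are matched, and a later change (code generation for the condition, or null handling for outer joins) only needs to touch the trait.

```scala
// Hypothetical shape of the shared trait (illustration only).
trait BinaryJoinNode {
  type Row = Seq[Any]

  def leftRows: Iterator[Row]
  def rightRows: Iterator[Row]
  def condition: Option[(Row, Row) => Boolean]

  // Shared condition evaluation: a code-generated version, or outer-join
  // null padding, would be added here once instead of in every operator.
  protected def matches(l: Row, r: Row): Boolean =
    condition.forall(pred => pred(l, r))
}

// Example operator mixing the trait in: a naive nested-loop inner join that
// stands in for HashJoin / SortMergeJoin / CartesianProduct in this sketch.
class NaiveInnerJoin(
    left: Seq[Seq[Any]],
    right: Seq[Seq[Any]],
    val condition: Option[(Seq[Any], Seq[Any]) => Boolean]
) extends BinaryJoinNode {

  def leftRows: Iterator[Row] = left.iterator
  def rightRows: Iterator[Row] = right.iterator

  def execute(): Iterator[Row] =
    // A fresh iterator over the right side is taken for every left row.
    for (l <- leftRows; r <- right.iterator if matches(l, r)) yield l ++ r
}

// Usage: join on the first column of each row.
object Demo extends App {
  val join = new NaiveInnerJoin(
    left = Seq(Seq(1, "a"), Seq(2, "b")),
    right = Seq(Seq(1, "x"), Seq(3, "y")),
    condition = Some((l, r) => l.head == r.head)
  )
  join.execute().foreach(println)   // prints List(1, a, 1, x)
}
```

With the shared pieces isolated like this, the last point above follows naturally: a hash-based outer join mostly adds null-padding for unmatched rows on top of the same trait.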