GitHub user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/7904#issuecomment-127473413
The current plan is to have separate iterators for left, right, and full outer
join, reusing where possible the iterators currently defined in the
HashOuterJoin trait (I'll move them elsewhere). The key idea is that once
you've constructed the buffer for one side of a left outer join, it doesn't
matter whether that buffer came from a hash map or was built up by scanning
over the other sorted input. This should substantially reduce code complexity
and make it easier to spot the functionality that is only used for full outer
join.
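
To make the shared-buffer idea concrete, here is a minimal sketch in plain Python. It is not the Spark implementation. The names `left_outer_rows`, `hash_left_outer`, and `sort_merge_left_outer` are hypothetical, rows are assumed to be `(key, payload)` tuples with non-null keys, and the sort-merge variant assumes both inputs are already sorted by key. The point is only that the row-emitting logic takes a buffer and does not care how it was built.

```python
from collections import defaultdict


def left_outer_rows(left_row, right_buffer):
    """Emit joined rows for one left row, given its buffer of matching right rows.

    Hypothetical shared helper: it is agnostic to how the buffer was produced.
    """
    if not right_buffer:
        # No match on the right side: pad with None (the "outer" part).
        yield (left_row, None)
    else:
        for right_row in right_buffer:
            yield (left_row, right_row)


def hash_left_outer(left, right):
    """Buffer comes from a hash map built over the right input."""
    table = defaultdict(list)
    for right_row in right:
        table[right_row[0]].append(right_row)
    for left_row in left:
        yield from left_outer_rows(left_row, table.get(left_row[0], []))


def sort_merge_left_outer(left, right):
    """Buffer comes from scanning the right input; both inputs sorted by key."""
    right_iter = iter(right)
    current = next(right_iter, None)
    buffer, buffer_key = [], None
    for left_row in left:
        key = left_row[0]
        if key != buffer_key:
            # Skip right rows whose keys are smaller than the current left key.
            while current is not None and current[0] < key:
                current = next(right_iter, None)
            # Collect every right row that shares the current left key.
            buffer = []
            while current is not None and current[0] == key:
                buffer.append(current)
                current = next(right_iter, None)
            buffer_key = key
        # Consecutive left rows with the same key reuse the same buffer.
        yield from left_outer_rows(left_row, buffer)
```

Under these assumptions both functions yield the same rows for the same sorted inputs, and only the buffer construction differs. A right outer join could reuse the same helper with the sides swapped, which leaves full outer join as the only case needing extra bookkeeping for unmatched rows on both sides.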
I'll work on implementing this design tomorrow.