GitHub user marmbrus commented on the pull request:
https://github.com/apache/spark/pull/1134#issuecomment-48396637
I think there are major questions that will need to be answered before we
could merge this PR:
- Should skew be a hint rather than a join type, and if so, how do we
propagate that information through?
- @chenghao-intel asks a valid question about join keys. I'm not sure how
this could work without them.
- I think the current implementation of execute() is going to suffer from
serious performance issues. It makes many passes over the data, does a lot of
unnecessary string manipulation, and computes several Cartesian products. You
will need to run performance experiments on large datasets to show that this
operator actually has benefits.
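
To make the join-key and performance points concrete, here is a minimal sketch in plain Python (not Spark, and not this PR's implementation) of one common way to handle skew when join keys are available: find the hot keys in a single counting pass, spread their rows round-robin across partitions, replicate only the matching rows from the other side, and run an ordinary hash join per partition. All names here (`find_hot_keys`, `partition_with_skew`, `skew_join`, `threshold`) are hypothetical, and integer keys are assumed so that `key % num_partitions` can stand in for a hash partitioner.

```python
from collections import Counter, defaultdict


def find_hot_keys(rows, threshold):
    """Return the keys that occur more than `threshold` times (one pass)."""
    counts = Counter(key for key, _ in rows)
    return {key for key, n in counts.items() if n > threshold}


def partition_with_skew(left, right, num_partitions, hot_keys):
    """Partition both sides; hot keys are spread out instead of hashed."""
    left_parts = [[] for _ in range(num_partitions)]
    right_parts = [[] for _ in range(num_partitions)]
    next_slot = 0
    for key, value in left:
        if key in hot_keys:
            # Spread rows of a hot key round-robin over all partitions.
            left_parts[next_slot % num_partitions].append((key, value))
            next_slot += 1
        else:
            left_parts[key % num_partitions].append((key, value))
    for key, value in right:
        if key in hot_keys:
            # Replicate the matching right rows to every partition.
            for part in right_parts:
                part.append((key, value))
        else:
            right_parts[key % num_partitions].append((key, value))
    return left_parts, right_parts


def local_hash_join(left_part, right_part):
    """Plain hash join within one partition: build on right, probe with left."""
    index = defaultdict(list)
    for key, value in right_part:
        index[key].append(value)
    return [(key, lv, rv) for key, lv in left_part for rv in index.get(key, [])]


def skew_join(left, right, num_partitions=4, threshold=100):
    """Inner join of (key, value) rows that avoids one oversized partition."""
    hot_keys = find_hot_keys(left, threshold)
    left_parts, right_parts = partition_with_skew(
        left, right, num_partitions, hot_keys
    )
    result = []
    for left_part, right_part in zip(left_parts, right_parts):
        result.extend(local_hash_join(left_part, right_part))
    return result
```

Comparing the largest partition size with and without the hot-key handling on a skewed input is the kind of measurement that would show whether the operator pays off. Without join keys there is nothing to count or partition on, which is why the approach depends on them.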