[GitHub] spark pull request: [SPARK-2236][SQL]SparkSQL add SkewJoin

marmbrus Tue, 08 Jul 2014 13:40:40 -0700

Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/1134#issuecomment-48396637
  
    I think there are major questions that need to be answered before we
    could merge this PR:
     - Is skew just a hint rather than a join type, and if so, how do we
       propagate that information through?
     - @chenghao-intel asks a valid question about join keys.  I'm not sure
       how this could work without them.
     - I think the current implementation of execute() is going to suffer
       from serious performance issues.  It makes many passes over the data,
       does a lot of unnecessary string manipulation, and computes several
       Cartesian products.  You will need to run performance experiments on
       large datasets to show that this operator actually has benefits.
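
To make the join-key and Cartesian-product concerns concrete, here is a minimal plain-Python sketch of one common way to handle skew in an equi-join: salting the hot keys. It is not the code in this PR and does not use any Spark API. The names hash_join and salted_skew_join and the parameters hot_threshold and num_salts are hypothetical, chosen only for illustration. The sketch assumes both inputs are lists of (key, value) pairs, which is exactly why a join key is required.

```python
from collections import Counter, defaultdict


def hash_join(left, right):
    """Plain equi-join of (key, value) pairs on the key."""
    index = defaultdict(list)
    for key, rval in right:
        index[key].append(rval)
    # Only rows that share a key are paired, so no full Cartesian product.
    return [(key, lval, rval)
            for key, lval in left
            for rval in index.get(key, [])]


def salted_skew_join(left, right, hot_threshold=3, num_salts=4):
    """Equi-join that spreads frequent ("hot") left keys over several buckets.

    hot_threshold and num_salts are illustrative knobs, not tuned values.
    """
    # One extra pass over the left side to find the skewed keys.
    counts = Counter(key for key, _ in left)
    hot = {key for key, n in counts.items() if n >= hot_threshold}

    # Assign hot-key rows on the left to salts in round-robin order.
    next_salt = defaultdict(int)
    salted_left = []
    for key, lval in left:
        salt = 0
        if key in hot:
            salt = next_salt[key] % num_salts
            next_salt[key] += 1
        salted_left.append(((key, salt), lval))

    # Replicate right rows once per salt, but only for hot keys.
    salted_right = []
    for key, rval in right:
        salts = range(num_salts) if key in hot else (0,)
        for salt in salts:
            salted_right.append(((key, salt), rval))

    # Join on the composite (key, salt) and drop the salt again.
    joined = hash_join(salted_left, salted_right)
    return [(key, lval, rval) for (key, _salt), lval, rval in joined]
```

In a distributed setting each (key, salt) pair would map to its own partition, so a single hot key no longer lands on one task. The price is the extra counting pass and the replicated right-side rows, which is the kind of overhead the requested experiments on large datasets would need to measure against a plain join.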



