[GitHub] spark pull request: [SPARK-13225] [SQL] Support Intersect All/Dist...

rxin Wed, 10 Feb 2016 14:15:13 -0800

Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/11106#issuecomment-182603275
  
    I haven't actually looked at your pull request, but I'm fairly sure the 
implementation is wrong given the number of lines involved. Implementing 
intersect all probably requires a much larger change.
    
    Intersect all is not just a join. It is a multiset intersection, e.g.
    
    [1, 2, 2] intersect [1, 2] == [1, 2]
    [1, 2, 2] intersect_all [1, 2] == [1, 2]
    [1, 2, 2] intersect_all [1, 2, 2] == [1, 2, 2]
    
    i.e. in order to support intersect all, we'd need to count the number of 
times each row appears on each side and emit it the smaller number of times.
    
    Same thing with except all.
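
To make the counting semantics concrete, here is a minimal sketch in plain Python. It is not Spark code and not the approach in the pull request; the function names `intersect_distinct`, `intersect_all`, and `except_all` are hypothetical helpers for illustration, and rows are assumed to be hashable and sortable values.

```python
from collections import Counter


def intersect_distinct(left, right):
    """Set intersection: each common row appears exactly once."""
    return sorted(set(left) & set(right))


def intersect_all(left, right):
    """Multiset intersection: each row appears min(left count, right count) times."""
    # Counter & Counter keeps the minimum count of every common key.
    common = Counter(left) & Counter(right)
    return sorted(common.elements())


def except_all(left, right):
    """Multiset difference: each row appears max(left count - right count, 0) times."""
    # Counter - Counter subtracts counts and drops keys whose count is not positive.
    remaining = Counter(left) - Counter(right)
    return sorted(remaining.elements())


if __name__ == "__main__":
    print(intersect_distinct([1, 2, 2], [1, 2]))
    print(intersect_all([1, 2, 2], [1, 2]))
    print(intersect_all([1, 2, 2], [1, 2, 2]))
    print(except_all([1, 2, 2], [1, 2]))
```

The per-row counts are the key point: a plain join or a distinct intersection cannot tell `[1, 2, 2]` from `[1, 2]`, so some form of counting or row numbering would likely be needed in a real implementation.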




