Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/837#issuecomment-45418467
  
    I think this is looking pretty good.  One problem is that there are no tests for the nested loop version.  I tried adding this to SQLQuerySuite:
    
    ```scala
      test("left semi greater than predicate") {
        checkAnswer(
          sql("SELECT * FROM testData2 x JOIN testData2 y WHERE x.a >= y.a + 
2"),
          Seq((3,1), (3,2))
        )
      }
    ```
    
    However, this points out that we need to fix the other join strategies to avoid matching semi joins:
    ```scala
    [info] - left semi greater than predicate *** FAILED *** (174 milliseconds)
    [info]   Results do not match for query:
    ...
    [info] == Physical Plan ==
    [info] Project [a#18:0,b#19:1,a#20:2,b#21:3]
    [info]  Filter (a#18:0 >= (a#20:2 + 2))
    [info]   CartesianProduct 
    [info]    ExistingRdd [a#18,b#19], MapPartitionsRDD[4] at mapPartitions at basicOperators.scala:174
    [info]    ExistingRdd [a#20,b#21], MapPartitionsRDD[4] at mapPartitions at basicOperators.scala:174
    [info] 
    [info] == Results ==
    [info] !== Correct Answer - 2 ==   == Spark Answer - 4 ==
    [info] !Vector(3, 1)               [3,1,1,1]
    [info] !Vector(3, 2)               [3,1,1,2]
    [info] !                           [3,2,1,1]
    [info] !                           [3,2,1,2] (QueryTest.scala:54)
    ```
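    
    Concretely, the failure happens because the generic strategies (e.g. the one that plans a `CartesianProduct`) match a `Join` regardless of its join type.  One way to fix that (just a sketch of the idea, not tested against this branch, and the existing strategy code in `SparkStrategies.scala` may be shaped a bit differently) would be to have each generic join strategy explicitly decline `LeftSemi` plans so that only the dedicated semi join strategies can plan them:
    
    ```scala
    object CartesianProduct extends Strategy {
      def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
        // Sketch: decline semi joins here so they fall through to the semi join
        // strategies instead of being planned as a cartesian product + filter.
        case logical.Join(_, _, LeftSemi, _) => Nil
        case logical.Join(left, right, _, None) =>
          execution.CartesianProduct(planLater(left), planLater(right)) :: Nil
        case logical.Join(left, right, Inner, Some(condition)) =>
          execution.Filter(condition,
            execution.CartesianProduct(planLater(left), planLater(right))) :: Nil
        case _ => Nil
      }
    }
    ```
    
    The same kind of guard would be needed in the nested loop and hash join strategies (or a single check up front), so a semi join can never be planned by anything other than the semi join operators added in this PR.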

