GitHub user nsyca opened a pull request:

    https://github.com/apache/spark/pull/16467

    [SPARK-19017][SQL] NOT IN subquery with more than one column may return incorrect results

    ## What changes were proposed in this pull request?
    
    This PR fixes the Optimizer rule `RewritePredicateSubquery`, which expands the NULL-aware predicate of a NOT IN subquery incorrectly when the subquery returns more than one column.
    
    Example:
    The query
    
     select a1,b1
     from   t1
     where  (a1,b1) not in (select a2,b2
                            from   t2);
    
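    has its (a1, b1) = (a2, b2) comparison rewritten from (before this fix):
    
    Join LeftAnti, ((isnull((_1#2 = a2#16)) || isnull((_2#3 = b2#17))) || ((_1#2 = a2#16) && (_2#3 = b2#17)))
    
    to (after this fix):
    
    Join LeftAnti, (((_1#2 = a2#16) || isnull((_1#2 = a2#16))) && ((_2#3 = b2#17) || isnull((_2#3 = b2#17))))
    
    The old condition treats a pair of rows as a match as soon as any single column comparison is NULL, even when another column comparison is false. The new condition requires every column comparison to be either true or NULL.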
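
To make the difference concrete, here is a minimal Python sketch (not Spark code) that models SQL three-valued logic, with `None` standing in for NULL. The helper names (`eq`, `and3`, `or3`, `old_condition`, `new_condition`, `survives_anti_join`) are hypothetical and only mirror the two join conditions shown above.

```python
def eq(x, y):
    """SQL '=': the result is NULL (None) if either operand is NULL."""
    if x is None or y is None:
        return None
    return x == y


def and3(p, q):
    """Three-valued AND: False dominates, then NULL."""
    if p is False or q is False:
        return False
    if p is None or q is None:
        return None
    return True


def or3(p, q):
    """Three-valued OR: True dominates, then NULL."""
    if p is True or q is True:
        return True
    if p is None or q is None:
        return None
    return False


def is_null(p):
    return p is None


def old_condition(a1, b1, a2, b2):
    # (isnull(a1 = a2) || isnull(b1 = b2)) || ((a1 = a2) && (b1 = b2))
    e1, e2 = eq(a1, a2), eq(b1, b2)
    return or3(or3(is_null(e1), is_null(e2)), and3(e1, e2))


def new_condition(a1, b1, a2, b2):
    # ((a1 = a2) || isnull(a1 = a2)) && ((b1 = b2) || isnull(b1 = b2))
    e1, e2 = eq(a1, a2), eq(b1, b2)
    return and3(or3(e1, is_null(e1)), or3(e2, is_null(e2)))


def survives_anti_join(row, subquery_rows, condition):
    """Left anti join: keep `row` only if no subquery row satisfies `condition`."""
    return not any(condition(*row, *other) is True for other in subquery_rows)


# (1, 2) = (NULL, 3) is NULL AND FALSE = FALSE, so NOT IN must keep the row.
t1_row = (1, 2)
t2_rows = [(None, 3)]
print(survives_anti_join(t1_row, t2_rows, old_condition))  # False: row wrongly dropped
print(survives_anti_join(t1_row, t2_rows, new_condition))  # True: row correctly kept
```

When the non-NULL columns do match, for example `(1, 2)` against `(None, 2)`, the row comparison is NULL rather than FALSE, and both conditions drop the row, as NOT IN requires.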
    
    ## How was this patch tested?
    
    Ran sql/test and catalyst/test, and added new test cases to SQLQueryTestSuite.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/nsyca/spark 19017

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16467.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16467
    
----
commit b98865127a39bde885f9b1680cfe608629d59d51
Author: Nattavut Sutyanyong <[email protected]>
Date:   2016-07-29T21:43:56Z

    [SPARK-16804][SQL] Correlated subqueries containing LIMIT return incorrect results
    
    ## What changes were proposed in this pull request?
    
    This patch fixes the incorrect results in the rule ResolveSubquery in Catalyst's Analysis phase.
    
    ## How was this patch tested?
    ./dev/run-tests
    a new unit test on the problematic pattern.

commit 069ed8f8e5f14dca7a15701945d42fc27fe82f3c
Author: Nattavut Sutyanyong <[email protected]>
Date:   2016-07-29T21:50:02Z

    [SPARK-16804][SQL] Correlated subqueries containing LIMIT return incorrect results
    
    ## What changes were proposed in this pull request?
    
    This patch fixes the incorrect results in the rule ResolveSubquery in Catalyst's Analysis phase.
    
    ## How was this patch tested?
    ./dev/run-tests
    a new unit test on the problematic pattern.

commit edca333c081e6d4e53a91b496fba4a3ef4ee89ac
Author: Nattavut Sutyanyong <[email protected]>
Date:   2016-07-30T00:28:15Z

    New positive test cases

commit 64184fdb77c1a305bb2932e82582da28bb4c0e53
Author: Nattavut Sutyanyong <[email protected]>
Date:   2016-08-01T13:20:09Z

    Fix unit test case failure

commit 29f82b05c9e40e7934397257c674b260a8e8a996
Author: Nattavut Sutyanyong <[email protected]>
Date:   2016-08-05T17:42:01Z

    blocking TABLESAMPLE

commit ac43ab47907a1ccd6d22f920415fbb4de93d4720
Author: Nattavut Sutyanyong <[email protected]>
Date:   2016-08-05T21:10:19Z

    Fixing code styling

commit 631d396031e8bf627eb1f4872a4d3a17c144536c
Author: Nattavut Sutyanyong <[email protected]>
Date:   2016-08-07T18:39:44Z

    Correcting Scala test style

commit 7eb9b2dbba3633a1958e38e0019e3ce816300514
Author: Nattavut Sutyanyong <[email protected]>
Date:   2016-08-08T02:31:09Z

    One (last) attempt to correct the Scala style tests

commit 1387cf51541408ac20048064fa5e559836af932c
Author: Nattavut Sutyanyong <[email protected]>
Date:   2016-08-12T20:11:50Z

    Merge remote-tracking branch 'upstream/master'

commit 3faa2d5edc030495f8b870d2c017cb714c17b6a7
Author: Nattavut Sutyanyong <[email protected]>
Date:   2016-12-14T16:35:52Z

    Merge remote-tracking branch 'upstream/master'

commit a30863457ef49f99aff001b1987da75093c20f86
Author: Nattavut Sutyanyong <[email protected]>
Date:   2016-12-30T17:18:18Z

    Merge remote-tracking branch 'upstream/master'

commit 473c81bacda2b12e6b85fe3f609ba334460bf0fe
Author: Nattavut Sutyanyong <[email protected]>
Date:   2017-01-01T16:15:07Z

    first try on the fix

commit 278ebaea9ab52bc141e85e578416203107d38eda
Author: Nattavut Sutyanyong <[email protected]>
Date:   2017-01-03T22:07:35Z

    add/update test cases

commit f1524b99aff70e688e4763db7898da53286a321e
Author: Nattavut Sutyanyong <[email protected]>
Date:   2017-01-03T22:08:03Z

    Merge remote-tracking branch 'upstream/master'

commit 9e1b29e99f33a5f78f1edca80495ab33b2389d2a
Author: Nattavut Sutyanyong <[email protected]>
Date:   2017-01-03T22:09:26Z

    Merge branch 'master' into 19017

commit de655d0d00693a2bc98fddad7be6f55fb2690555
Author: Nattavut Sutyanyong <[email protected]>
Date:   2017-01-04T01:26:45Z

    Add descriptive comment

----

