GitHub user nsyca opened a pull request:

    https://github.com/apache/spark/pull/15936

    [SPARK-18504][SQL] Scalar subquery with extra group by columns returning incorrect result

    ## What changes were proposed in this pull request?
    
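    This PR blocks an incorrect-result scenario in a correlated scalar subquery whose
    GROUP BY column(s) are not part of the correlated predicate(s). Such a subquery can
    produce more than one row for a single outer row, which a scalar subquery must not do.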
    
    Example:
    // Returns an incorrect result.
    // Run in spark-shell, which has spark.implicits._ (for toDF) and spark.sql in scope.
    Seq(1).toDF("c1").createOrReplaceTempView("t1")
    Seq((1,1),(1,2)).toDF("c1","c2").createOrReplaceTempView("t2")
    sql("select (select sum(-1) from t2 where t1.c1=t2.c1 group by t2.c2) from t1").show
    
    // How can selecting a scalar subquery from a 1-row table return 2 rows?
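
A plain-Scala model (no Spark) of the example above can make the two-row result concrete. It is a sketch under stated assumptions: the case classes `T1`/`T2` and the object `ScalarSubqueryDemo` are hypothetical, and `decorrelated` is only a simplified approximation of how a decorrelating rewrite treats the subquery (group `t2` by the correlated column plus `c2`, then left-outer-join back to `t1`). It is not Spark's actual plan.

```scala
// Hypothetical row types standing in for the temp views t1 and t2.
final case class T1(c1: Int)
final case class T2(c1: Int, c2: Int)

object ScalarSubqueryDemo {
  val t1: Seq[T1] = Seq(T1(1))
  val t2: Seq[T2] = Seq(T2(1, 1), T2(1, 2))

  // Per-outer-row evaluation of
  //   select sum(-1) from t2 where t1.c1 = t2.c1 group by t2.c2
  // returning one aggregate value per t2.c2 group.
  def subquery(outer: T1): Seq[Int] =
    t2.filter(_.c1 == outer.c1)
      .groupBy(_.c2)
      .toSeq
      .sortBy(_._1)
      .map { case (_, rows) => rows.map(_ => -1).sum }

  // Scalar-subquery contract: no row -> NULL (None), one row -> its value,
  // more than one row -> error.
  def scalar(outer: T1): Option[Int] = subquery(outer) match {
    case Seq()  => None
    case Seq(v) => Some(v)
    case vs =>
      throw new IllegalStateException(
        s"scalar subquery returned ${vs.size} rows for outer row $outer")
  }

  // Simplified decorrelated form (assumption): group t2 by (c1, c2), then
  // left-outer-join the aggregate back to t1 on c1. Each extra c2 group
  // becomes an extra output row for the same t1 row.
  def decorrelated: Seq[(Int, Option[Int])] = {
    val agg: Seq[(Int, Int)] =
      t2.groupBy(r => (r.c1, r.c2))
        .toSeq
        .sortBy(_._1)
        .map { case ((c1, _), rows) => (c1, rows.map(_ => -1).sum) }
    t1.flatMap { o =>
      val matches = agg.filter(_._1 == o.c1).map(_._2)
      if (matches.isEmpty) Seq((o.c1, Option.empty[Int]))
      else matches.map(v => (o.c1, Option(v)))
    }
  }
}
```

In this model `ScalarSubqueryDemo.scalar(T1(1))` throws because the subquery yields one value per `t2.c2` group, while `decorrelated` silently returns two rows for the single `t1` row, which is the symptom the PR describes.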
    
    ## How was this patch tested?
    sql/test, catalyst/test
    A new test case covering the reported problem is added to SubquerySuite.scala.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/nsyca/spark scalarSubqueryIncorrect-1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15936.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #15936
    
----
commit b98865127a39bde885f9b1680cfe608629d59d51
Author: Nattavut Sutyanyong <[email protected]>
Date:   2016-07-29T21:43:56Z

    [SPARK-16804][SQL] Correlated subqueries containing LIMIT return incorrect results
    
    ## What changes were proposed in this pull request?
    
    This patch fixes the incorrect results in the rule ResolveSubquery in Catalyst's Analysis phase.
    
    ## How was this patch tested?
    ./dev/run-tests
    a new unit test on the problematic pattern.
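
The SPARK-16804 commit message only names the rule being fixed. As a hedged illustration of the kind of wrong result at stake (an assumption about the failure mode, not a description of Spark's actual plan), the plain-Scala sketch below compares a LIMIT applied per outer row with a LIMIT applied once after the correlated predicate has been pulled out of the subquery. The query, the tables, and the object `CorrelatedLimitDemo` are hypothetical.

```scala
object CorrelatedLimitDemo {
  // Single-column tables, modelled as the values of t1.c1 and t2.c1.
  val t1: Seq[Int] = Seq(1, 2)
  val t2: Seq[Int] = Seq(1, 2)

  // Intended semantics of
  //   select c1 from t1
  //   where exists (select 1 from t2 where t2.c1 = t1.c1 limit 1)
  // The LIMIT is applied separately for each outer row.
  def perRowLimit: Seq[Int] =
    t1.filter(o => t2.filter(_ == o).take(1).nonEmpty)

  // Naive decorrelation (hypothetical): the correlated predicate is pulled
  // above the LIMIT, so LIMIT 1 is applied once to all of t2 before the
  // semi-join on c1.
  def naiveDecorrelated: Seq[Int] = {
    val limited = t2.take(1)
    t1.filter(o => limited.contains(o))
  }
}
```

Here `perRowLimit` keeps both `t1` rows while `naiveDecorrelated` keeps only one. In a real engine the globally limited row is also arbitrary unless the input is ordered, so the wrong answer need not even be stable.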

commit 069ed8f8e5f14dca7a15701945d42fc27fe82f3c
Author: Nattavut Sutyanyong <[email protected]>
Date:   2016-07-29T21:50:02Z

    [SPARK-16804][SQL] Correlated subqueries containing LIMIT return incorrect results
    
    ## What changes were proposed in this pull request?
    
    This patch fixes the incorrect results in the rule ResolveSubquery in Catalyst's Analysis phase.
    
    ## How was this patch tested?
    ./dev/run-tests
    a new unit test on the problematic pattern.

commit edca333c081e6d4e53a91b496fba4a3ef4ee89ac
Author: Nattavut Sutyanyong <[email protected]>
Date:   2016-07-30T00:28:15Z

    New positive test cases

commit 64184fdb77c1a305bb2932e82582da28bb4c0e53
Author: Nattavut Sutyanyong <[email protected]>
Date:   2016-08-01T13:20:09Z

    Fix unit test case failure

commit 29f82b05c9e40e7934397257c674b260a8e8a996
Author: Nattavut Sutyanyong <[email protected]>
Date:   2016-08-05T17:42:01Z

    blocking TABLESAMPLE

commit ac43ab47907a1ccd6d22f920415fbb4de93d4720
Author: Nattavut Sutyanyong <[email protected]>
Date:   2016-08-05T21:10:19Z

    Fixing code styling

commit 631d396031e8bf627eb1f4872a4d3a17c144536c
Author: Nattavut Sutyanyong <[email protected]>
Date:   2016-08-07T18:39:44Z

    Correcting Scala test style

commit 7eb9b2dbba3633a1958e38e0019e3ce816300514
Author: Nattavut Sutyanyong <[email protected]>
Date:   2016-08-08T02:31:09Z

    One (last) attempt to correct the Scala style tests

commit 1387cf51541408ac20048064fa5e559836af932c
Author: Nattavut Sutyanyong <[email protected]>
Date:   2016-08-12T20:11:50Z

    Merge remote-tracking branch 'upstream/master'

commit 6d9bade4df8954987078c479274d90a7612cc772
Author: Nattavut Sutyanyong <[email protected]>
Date:   2016-11-04T03:51:54Z

    Merge remote-tracking branch 'upstream/master'

commit baf0e6084a838ce2d72eeeac9d7618ae4536ffb6
Author: Nattavut Sutyanyong <[email protected]>
Date:   2016-11-04T03:54:24Z

    First version of code+test cases

commit 4e6d99b92cd0908856371569479debc72e03703c
Author: Nattavut Sutyanyong <[email protected]>
Date:   2016-11-04T16:14:03Z

    Address rxin's comment: inline the call to report Analyzer exception

commit 9a1f80b12cdc9857f4b906688f8691a2db502fa5
Author: Nattavut Sutyanyong <[email protected]>
Date:   2016-11-04T16:14:25Z

    Merge remote-tracking branch 'upstream/master'

commit 217c0e955b55d10dd462e077d67097704ab86f61
Author: Nattavut Sutyanyong <[email protected]>
Date:   2016-11-04T16:14:41Z

    Merge branch 'master' into spark-17348

commit 3fe9429c009eb156ac89ef6732e9230d583ed5d0
Author: Nattavut Sutyanyong <[email protected]>
Date:   2016-11-05T00:52:45Z

    Merge remote-tracking branch 'upstream/master'

commit 1c1864caa764130f947be9ccd2b132d4ac75ec2d
Author: Nattavut Sutyanyong <[email protected]>
Date:   2016-11-05T00:56:34Z

    Merge with master

commit 0757b8134316f8b5c87ef1c023966304228a0eeb
Author: Nattavut Sutyanyong <[email protected]>
Date:   2016-11-11T16:31:36Z

    Merge remote-tracking branch 'upstream/master'

commit 89bb31c10314cca2473568716649f80f5e28781f
Author: Nattavut Sutyanyong <[email protected]>
Date:   2016-11-11T16:31:55Z

    Merge branch 'master' into spark-17348

commit 35b77f0ca477bf6427e18588c4514a3f0209f426
Author: Nattavut Sutyanyong <[email protected]>
Date:   2016-11-12T03:36:14Z

    Merge remote-tracking branch 'upstream/master'

commit c63b8c627cb13253b3776aec57b8a73d685d7bd1
Author: Nattavut Sutyanyong <[email protected]>
Date:   2016-11-14T15:29:09Z

    Merge remote-tracking branch 'upstream/master'

commit f3351d5aba8b5b52f5e1b12a8e068e0d4a4ece08
Author: Nattavut Sutyanyong <[email protected]>
Date:   2016-11-18T17:03:23Z

    Merge remote-tracking branch 'upstream/master'

commit 61880e42a5ca9be84b488ff05cc98cc74a31ba9f
Author: Nattavut Sutyanyong <[email protected]>
Date:   2016-11-18T17:03:48Z

    Merge branch 'master' into scalarSubqueryIncorrect-1

commit b05353ebe55cf8fee4d3f2e10f291a662c43909e
Author: Nattavut Sutyanyong <[email protected]>
Date:   2016-11-18T21:15:48Z

    Fix scalar subquery bug (SPARK-18504)

commit 9fc5c3305f7c23593b1ef93a43fd266b2d5bed5a
Author: Nattavut Sutyanyong <[email protected]>
Date:   2016-11-18T22:31:30Z

    Merge remote-tracking branch 'upstream/master'

commit 452f5d08c9017c2f08455c48bd76bf14c2f1d5fe
Author: Nattavut Sutyanyong <[email protected]>
Date:   2016-11-18T22:31:49Z

    Merge branch 'master' into scalarSubqueryIncorrect-1

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
