[ 
https://issues.apache.org/jira/browse/SPARK-46446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-46446:
-----------------------------------

    Assignee: Jack Chen

> Correctness bug in correlated subquery with OFFSET
> --------------------------------------------------
>
>                 Key: SPARK-46446
>                 URL: https://issues.apache.org/jira/browse/SPARK-46446
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Jack Chen
>            Assignee: Jack Chen
>            Priority: Major
>              Labels: pull-request-available
>
> Subqueries with correlation under LIMIT with OFFSET have a correctness bug, 
> introduced recently when support for correlation under OFFSET was enabled but 
> were not handled correctly. (So we went from unsupported, query throws error 
> -> wrong results.)
> It’s a bug in all types of correlated subqueries: scalar, lateral, IN, EXISTS
>  
> It's easy to repro with a query like
> {code:java}
> create table x(x1 int, x2 int);
> insert into x values (1, 1), (2, 2);
> create table y(y1 int, y2 int);
> insert into y values (1, 1), (1, 2), (2, 4);
> select * from x where exists (select * from y where x1 = y1 limit 1 offset 
> 2){code}
> Correct result: empty set, see postgres: 
> [https://www.db-fiddle.com/f/dtXNn7hwDnemiCTUhvwgYM/0] 
> Spark result: Array([2,2])
>  
> The 
> [PR|https://github.com/apache/spark/pull/43111/files/324a106611e6d62c31535cfc43863fdaa16e5dda#diff-583171e935b2dc349378063a5841c5b98b30a2d57ac3743a9eccfe7bffcb8f2aR1403]
>  where it was introduced added a test for it, but the golden file results for 
> the test actually were incorrect and we didn't notice. (The bug was initially 
> found by https://github.com/apache/spark/pull/44084)
> I'll work on both:
>  * Adding support for offset in DecorrelateInnerQuery (the transformation is 
> into a filter on row_number window function, similar to limit).
>  * Adding a feature flag to enable/disable offset in subquery support



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to