jchen5 opened a new pull request, #44401:
URL: https://github.com/apache/spark/pull/44401

   ### What changes were proposed in this pull request?
   Subqueries with correlation under LIMIT with OFFSET have a correctness bug. 
It was introduced recently, when support for correlation under OFFSET was 
enabled without being handled correctly. (So these queries went from 
unsupported, where the query throws an error, to returning wrong results.) 
The bug is on the master branch only and has not been released.
   
   This PR first disables correlated OFFSET. A follow-up PR will add correct 
support for it and re-enable it.
   
   The bug affects all types of correlated subqueries: scalar, lateral, IN, and EXISTS.
   
   Example repro:
   
   ```
   create table x(x1 int, x2 int);
   insert into x values (1, 1), (2, 2);
   create table y(y1 int, y2 int);
   insert into y values (1, 1), (1, 2), (2, 4);
   
   select * from x where exists (select * from y where x1 = y1 limit 1 offset 2)
   ```
   
   Correct result: empty set (no value of y1 matches more than two rows in y, 
so OFFSET 2 skips every match for each outer row)
   Spark result: Array([2,2])
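
To make the expected result concrete, here is a minimal Python sketch of the intended semantics, where LIMIT and OFFSET are applied to the matching rows of each outer row separately. It is only an illustration of the query's meaning, not Spark's implementation; the function name `correlated_exists` and the list-based tables are assumptions made for this example.

```python
# Rows from the repro: x(x1, x2) and y(y1, y2).
x_rows = [(1, 1), (2, 2)]
y_rows = [(1, 1), (1, 2), (2, 4)]


def correlated_exists(x_row, limit=1, offset=2):
    """Model EXISTS (SELECT * FROM y WHERE x1 = y1 LIMIT limit OFFSET offset)
    for a single outer row (hypothetical helper, not a Spark API)."""
    x1 = x_row[0]
    # The correlation predicate x1 = y1 is evaluated first, per outer row.
    matching = [y for y in y_rows if y[0] == x1]
    # OFFSET skips rows, then LIMIT caps what remains, within that group only.
    window = matching[offset:offset + limit]
    return len(window) > 0


result = [x for x in x_rows if correlated_exists(x)]
print(result)  # prints []
```

With OFFSET 2, x1 = 1 has two matching rows and x1 = 2 has one, so every match is skipped and no outer row qualifies.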
   
   ### Why are the changes needed?
   Correctness bug
   
   ### Does this PR introduce _any_ user-facing change?
   Disables the correlated OFFSET query shape, which was not handled correctly. 
(This shape was enabled on the master branch but has not been released.)
   
   ### How was this patch tested?
   Added tests.
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

