[GitHub] [spark] bersprockets opened a new pull request #35368: [SPARK-38075][SQL] Fix hasNext in HiveScriptTransformationExec's process output iterator

GitBox Sun, 30 Jan 2022 15:13:52 -0800


bersprockets opened a new pull request #35368:
URL: https://github.com/apache/spark/pull/35368



   ### What changes were proposed in this pull request?
   
   Fix `hasNext` in `HiveScriptTransformationExec`'s process output iterator so that, once it has returned false, it always returns false on subsequent calls.
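   The invariant being restored can be sketched in Java as a wrapper that records the first `false` from `hasNext` and returns `false` from then on. This illustrates the contract only and is not the PR's actual change. `LatchedIterator` is a hypothetical name, and the wrapped delegate is assumed to be any `java.util.Iterator`.

   ```java
   import java.util.Iterator;
   import java.util.NoSuchElementException;

   // Hypothetical wrapper (not Spark's code): once hasNext() has returned
   // false, it keeps returning false without consulting the delegate again.
   final class LatchedIterator<T> implements Iterator<T> {
       private final Iterator<T> delegate;
       private boolean finished = false; // set the first time the delegate is exhausted

       LatchedIterator(Iterator<T> delegate) {
           this.delegate = delegate;
       }

       @Override
       public boolean hasNext() {
           if (finished) {
               return false;        // latched: ignore whatever the delegate would say
           }
           if (!delegate.hasNext()) {
               finished = true;     // remember exhaustion permanently
               return false;
           }
           return true;
       }

       @Override
       public T next() {
           if (!hasNext()) {
               throw new NoSuchElementException();
           }
           return delegate.next();
       }
   }
   ```

   Latching a boolean keeps the check independent of any reused buffer state, so a consumer may call `hasNext` as many times as it likes after exhaustion.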
   
   ### Why are the changes needed?
   
   When `hasNext` on the process output iterator returns false, it leaves the iterator in a state (specifically, `scriptOutputWritable` is not null) in which the next call to `hasNext` returns true.
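   To make the failure mode concrete, here is a simplified Java model of the pattern described above. It is not Spark's code: `BuggyOutputIterator` is hypothetical, a `List<String>` stands in for the script's output, and `buffer` plays the role of `scriptOutputWritable`. `hasNext` assigns the buffer before attempting a read and leaves it set when the read hits end of input.

   ```java
   import java.util.Iterator;
   import java.util.List;
   import java.util.NoSuchElementException;

   // Hypothetical model of the bug: "buffer" stands in for scriptOutputWritable.
   class BuggyOutputIterator implements Iterator<String> {
       private final Iterator<String> source;          // stands in for the script's stdout
       private final StringBuilder reused = new StringBuilder();
       private StringBuilder buffer = null;            // null means "no row buffered"

       BuggyOutputIterator(List<String> lines) {
           this.source = lines.iterator();
       }

       @Override
       public boolean hasNext() {
           if (buffer == null) {
               buffer = reused;          // buffer becomes non-null before the read...
               reused.setLength(0);
               if (!source.hasNext()) {
                   return false;         // ...and stays non-null at EOF (the bug)
               }
               reused.append(source.next());
           }
           return true;                  // a later call skips the read and returns true
       }

       @Override
       public String next() {
           if (!hasNext()) {
               throw new NoSuchElementException();
           }
           String row = buffer.toString(); // empty after EOF: the "fake" row
           buffer = null;
           return row;
       }
   }
   ```

   Draining this iterator with `while (it.hasNext())` over `List.of("1", "2", "3")` yields the three rows, but a further `hasNext()` call then returns true and `next()` returns an empty string, the analogue of the `NULL` rows in the example below.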
   
   The Guava `Ordering` used in `TakeOrderedAndProjectExec` calls `hasNext` on the process output iterator even after an earlier call has returned false. As a result, spurious rows appear when a script transform is combined with `order by` and `limit`. For example:
   
   ```
   create or replace temp view t as
   select * from values
   (1),
   (2),
   (3)
   as t(a);
   
   select transform(a)
   USING 'cat' AS (a int)
   FROM t order by a limit 10;
   ```
   This returns three spurious `NULL` rows in addition to the expected ones:
   ```
   NULL
   NULL
   NULL
   1
   2
   3
   ```
   
   ### Does this PR introduce _any_ user-facing change?
   
   No, other than fixing the correctness issue.
   
   ### How was this patch tested?
   
   New unit test.
   


