wankunde commented on PR #42206: URL: https://github.com/apache/spark/pull/42206#issuecomment-1663191874
> It looks fine to me, except maybe check the code for left semi joins.
>
> I could not make the crash happen with left semi joins. I think the bug might actually exist in that code (within the same task, I see a call to `processRows` _after_ eager cleanup). However, it seems that for left semi joins, the optimizer moves the `Window` after the `Join` (that is, the windowing is performed on the joined result), so there is no X row to copy.
>
> By the way, there is a reason you see `processRows` called again even after `BufferedIterator.hasNext` returns false: `FileFormatWriter` calls `hasNext` to see if the iterator is empty. If it is, it instantiates an instance of `EmptyDirectoryDataWriter`, which also calls `hasNext`.

Thanks for your review. I have fixed this issue for LeftSemi SMJ.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
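
For illustration only, here is a minimal Java sketch of the double `hasNext` probe described in the quoted review: one caller checks whether the iterator is empty, and a second caller probes it again after eager cleanup has already run. `EagerCleanupIterator`, `HasNextDemo`, and the two counters are hypothetical names invented for this sketch. They are not Spark classes and do not reproduce the PR's actual fix. The sketch only shows why `hasNext` should stay safe to call after it has returned false.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a row source that releases its buffers eagerly.
// It is NOT Spark's BufferedRowIterator; it only models the call pattern.
final class EagerCleanupIterator implements Iterator<String> {
    private final Iterator<String> source;
    private boolean cleanedUp = false;
    int cleanupCalls = 0;        // how many times resources were released
    int probesAfterCleanup = 0;  // hasNext() calls that arrived after cleanup

    EagerCleanupIterator(List<String> rows) {
        this.source = rows.iterator();
    }

    @Override
    public boolean hasNext() {
        if (cleanedUp) {
            // A later caller probes again: answer false without touching
            // the already released state.
            probesAfterCleanup++;
            return false;
        }
        if (source.hasNext()) {
            return true;
        }
        // Input exhausted: release resources exactly once.
        cleanedUp = true;
        cleanupCalls++;
        return false;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return source.next();
    }
}

final class HasNextDemo {
    public static void main(String[] args) {
        EagerCleanupIterator rows = new EagerCleanupIterator(List.of());
        // First probe: a writer checks whether there is any input at all.
        boolean empty = !rows.hasNext();
        // Second probe: the writer chosen for empty input checks again.
        boolean stillEmpty = !rows.hasNext();
        System.out.println("empty=" + empty + ", stillEmpty=" + stillEmpty
            + ", cleanupCalls=" + rows.cleanupCalls
            + ", probesAfterCleanup=" + rows.probesAfterCleanup);
    }
}
```

The guard on `cleanedUp` is the design choice that matters here: without it, the second probe would re-enter the processing path after the state had been freed, which is the pattern the review points at.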
