Victsm commented on issue #25888: [SPARK-21492] [SQL] Fix memory leak issue in SMJ
URL: https://github.com/apache/spark/pull/25888#issuecomment-536175743

A bit more context on the two corner cases that were fixed in the recent commits:

The first corner case is when two SMJ inner joins are stacked together in a task, as in the following plan, and both drain the right table iterator but not the left table iterator.

        SMJ (inner)
        /        \
     Sort      SMJ (inner)
               /        \
            Sort        Sort

When this happens, the generated code for the top SMJ inner join might invoke hasNext() twice on the iterator of the bottom SMJ inner join after no more records can be retrieved. The first time this happens, the iterators of both the left and right children of the bottom SMJ inner join are freed. The second time, the call could lead to an NPE.

The second corner case is when one of an SMJ inner join's child operators is not codegened, e.g. an SMJ inner join on top of an SMJ left semi join:

        SMJ (inner)
        /        \
     Sort      SMJ (leftsemi)

In this case, when the top SMJ inner join finishes the join and attempts to release the resources held by the iterators of both child operators, the previous version of this PR would attempt to cast the iterators of both children to ScalaIteratorWithBufferedIterator. However, since the right child operator is not codegened, the cast would fail.
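Both corner cases can be illustrated with a small standalone sketch. This is not the code in the PR and does not use any Spark classes: ReleasableIterator and SmjCleanupSketch are hypothetical names made up for the example, and the way the PR actually guards against these cases may differ. The sketch only assumes the two general ideas implied above: releasing resources should be idempotent, so that a second hasNext() after exhaustion is harmless, and the cleanup path should check the iterator's type instead of casting unconditionally.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for an iterator that owns releasable resources. */
final class ReleasableIterator<T> implements Iterator<T> {
    private Iterator<T> underlying;   // dropped (set to null) once released
    private boolean released = false;
    int releaseCount = 0;             // exposed only so the sketch can be checked

    ReleasableIterator(Iterator<T> underlying) {
        this.underlying = underlying;
    }

    @Override
    public boolean hasNext() {
        // Corner case 1: a caller may ask again after exhaustion. Without this
        // guard, the second call would dereference the null 'underlying'.
        if (released) {
            return false;
        }
        if (underlying.hasNext()) {
            return true;
        }
        release();
        return false;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return underlying.next();
    }

    /** Idempotent: only the first call actually frees anything. */
    void release() {
        if (released) {
            return;
        }
        released = true;
        underlying = null;
        releaseCount++;
    }
}

final class SmjCleanupSketch {
    /**
     * Corner case 2: release a child iterator only if it is of the releasable
     * type, instead of casting every child unconditionally.
     * Returns true if the child was a ReleasableIterator.
     */
    static boolean releaseIfPossible(Iterator<?> child) {
        if (child instanceof ReleasableIterator) {
            ((ReleasableIterator<?>) child).release();
            return true;
        }
        return false; // e.g. the iterator of a child that is not codegened
    }

    public static void main(String[] args) {
        ReleasableIterator<Integer> it =
            new ReleasableIterator<>(Arrays.asList(1, 2).iterator());
        while (it.hasNext()) {
            it.next();
        }
        // A second hasNext() after exhaustion returns false without an NPE.
        System.out.println(it.hasNext());
        // A plain iterator is skipped rather than triggering a failed cast.
        System.out.println(releaseIfPossible(Collections.emptyIterator()));
    }
}
```

In this sketch the guard lives inside the iterator itself, so callers do not need to track whether cleanup has already happened. Whether the PR places the check in the iterator or in the generated code is not stated in the comment above.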
