ankurdave commented on PR #36425: URL: https://github.com/apache/spark/pull/36425#issuecomment-1116249393
From talking to @sadikovi, it sounds like the use-after-free that caused this crash does in fact occur in the Python writer thread, not the main task thread. And since `RDD#isEmpty()` is implemented using `limit(1)`, this is very similar to the situation described in https://github.com/apache/spark/pull/34245. The main difference appears to be the presence of a group-by with codegen enabled.

Given that, the question is why https://github.com/apache/spark/pull/34245 was not sufficient to fix this. I'm guessing [the task completion listener that frees the off-heap memory](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala#L489) is being registered in the wrong order relative to the `BasePythonRunner` task completion listener.

Anyway, even if that ordering were fixed, I think we would still need the fix in this PR for performance reasons: otherwise the writer thread could read an arbitrary amount of data before checking its interrupt status.
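For context on why registration order matters here: Spark's `TaskContext` invokes task completion listeners in *reverse* order of registration (LIFO), so whichever listener is registered last runs first. Below is a minimal toy sketch of that semantics; `ToyTaskContext` and `ListenerOrderDemo` are invented names for illustration, not Spark classes, and the registration order shown is the hypothetical bad ordering guessed at above, not a confirmed trace.

```scala
import scala.collection.mutable.ArrayBuffer

// Toy stand-in for Spark's TaskContextImpl, which invokes completion
// listeners in reverse registration order (LIFO).
class ToyTaskContext {
  private val listeners = ArrayBuffer.empty[() => Unit]

  def addTaskCompletionListener(f: () => Unit): Unit = listeners += f

  def markTaskCompleted(): Unit = listeners.reverseIterator.foreach(_.apply())
}

object ListenerOrderDemo {
  // Returns the order in which the two listeners actually fired.
  def run(): Seq[String] = {
    val fired = ArrayBuffer.empty[String]
    val ctx = new ToyTaskContext

    // Hypothetical bad ordering: the Python runner's listener (which should
    // stop the writer thread) happens to register first...
    ctx.addTaskCompletionListener(() => fired += "stop Python writer thread")
    // ...and the aggregate's free-memory listener registers second.
    ctx.addTaskCompletionListener(() => fired += "free off-heap hash map")

    ctx.markTaskCompleted()
    fired.toSeq
  }

  def main(args: Array[String]): Unit =
    // LIFO: the off-heap memory is freed BEFORE the writer thread is
    // stopped, leaving a use-after-free window for the writer thread.
    println(run().mkString(" -> "))
}
```

With that registration order, LIFO invocation frees the hash map first and only then stops the writer thread, which matches the crash pattern; registering the listeners the other way around would stop the writer before freeing the memory.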
