ankurdave commented on PR #36425:
URL: https://github.com/apache/spark/pull/36425#issuecomment-1116249393

   From talking to @sadikovi, it sounds like the use-after-free that caused 
this crash does in fact occur in the Python writer thread, not the main task 
thread. And since `RDD#isEmpty()` is implemented using `limit(1)`, this is a 
very similar situation to the one described in 
https://github.com/apache/spark/pull/34245. The main difference appears to be 
the presence of a group-by with codegen enabled.
   
   Given that, the question is why https://github.com/apache/spark/pull/34245 
was not sufficient to fix this. I'm guessing [the task completion listener that 
frees the off-heap 
memory](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala#L489)
 is being registered in the wrong order relative to the BasePythonRunner task 
completion listener.
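   The ordering hypothesis can be sketched without Spark. As I understand 
it, `TaskContext` invokes completion listeners in reverse order of 
registration (LIFO), so whichever listener registers last fires first. A toy 
model with illustrative listener names (not Spark's actual registration 
sequence, which is exactly what's in question here):

   ```scala
   // Toy model (no Spark dependency) of TaskContext completion-listener
   // ordering: listeners run in reverse order of registration (LIFO).
   // Listener names are illustrative.
   object ListenerOrderSketch {
     final class TaskContextModel {
       private var listeners: List[() => Unit] = Nil
       def addTaskCompletionListener(f: () => Unit): Unit =
         listeners = f :: listeners          // prepend => LIFO on completion
       def markTaskCompleted(): Unit = listeners.foreach(_.apply())
     }

     // Returns the order in which the two listeners fired.
     def run(): List[String] = {
       val fired = scala.collection.mutable.ListBuffer.empty[String]
       val ctx = new TaskContextModel
       // Suppose BasePythonRunner's listener (stops the writer) registers first...
       ctx.addTaskCompletionListener(() => fired += "stop python writer")
       // ...and HashAggregateExec's memory-freeing listener registers later.
       ctx.addTaskCompletionListener(() => fired += "free off-heap memory")
       ctx.markTaskCompleted()
       fired.toList
     }

     def main(args: Array[String]): Unit =
       // Memory is freed before the writer thread is stopped: the
       // use-after-free window discussed above.
       println(run().mkString(" -> "))
   }
   ```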
   
   Anyway, even if that were fixed, I think we would still need the fix in 
this PR for performance reasons: otherwise the writer thread could read an 
arbitrary amount of data before checking its interrupt status.
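   A small sketch of that performance argument (all names illustrative; the 
`interrupted` callback stands in for `Thread#isInterrupted`): the less often 
the writer loop polls the flag, the more rows it can push after cancellation.

   ```scala
   // Sketch: a writer loop that polls its interrupt status only every
   // `checkEvery` rows can write up to `checkEvery` rows after cancellation;
   // with no periodic check it writes everything. Names are illustrative.
   object InterruptCheckSketch {
     def write(rows: Iterator[Int], checkEvery: Int, interrupted: () => Boolean): Int = {
       var written = 0
       var stop = false
       while (rows.hasNext && !stop) {
         rows.next()                          // "write" one row downstream
         written += 1
         if (written % checkEvery == 0) stop = interrupted()
       }
       written
     }

     def main(args: Array[String]): Unit = {
       val alreadyCancelled = () => true      // task completed before we started
       println(write(Iterator.range(0, 1000), 10, alreadyCancelled))     // 10
       println(write(Iterator.range(0, 1000), 10000, alreadyCancelled))  // 1000
     }
   }
   ```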


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

