mridulm commented on PR #42426:
URL: https://github.com/apache/spark/pull/42426#issuecomment-1685199364

   To make sure I understand correctly: an OOM is thrown, which happens to occur 
within `initiateRetry`, and so the shuffle fetch stalls? I have not looked in 
detail at whether this is possible, but given that an OOM can be thrown 
anywhere, mitigating this would require careful analysis.
   
   In the meantime, if this is blocking you, you can simply run with 
`-XX:OnOutOfMemoryError` to kill the executor on OOM.
   This is what Spark on YARN does (see 
`YarnSparkHadoopUtil.addOutOfMemoryErrorArgument`); it looks like the other 
resource managers do not do this.
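   For reference, on other resource managers an equivalent can be set by hand through the executor JVM options. A minimal sketch (assuming a standard `spark-submit` invocation; `%p` is the JVM's placeholder for the dying process's PID, and the application jar/args are elided):

```shell
# Kill the executor JVM as soon as an OutOfMemoryError is thrown,
# so the resource manager can restart it instead of it limping along.
spark-submit \
  --conf "spark.executor.extraJavaOptions=-XX:OnOutOfMemoryError='kill -9 %p'" \
  ...
```

   On JDK 8u92+ the simpler `-XX:+ExitOnOutOfMemoryError` flag achieves much the same effect without an external kill command.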


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

