mridulm commented on PR #42426: URL: https://github.com/apache/spark/pull/42426#issuecomment-1685199364
To make sure I understand correctly: an OOM is thrown, it happens to land inside `initiateRetry`, and as a result the shuffle fetch stalls? I have not looked in detail at whether this is possible, but since an OOM can be thrown anywhere, mitigating it would require careful analysis. In the meantime, if this is blocking you, you can simply run with `-XX:OnOutOfMemoryError` to kill the executor on OOM. This is what Spark on YARN does (see `YarnSparkHadoopUtil.addOutOfMemoryErrorArgument`); it looks like this is not done for the other resource managers.
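For anyone wanting to try that workaround, here is a minimal sketch of passing the JVM flag through the executor options; the exact quoting/escaping may vary by shell and resource manager (on YARN, Spark already appends this flag itself via `YarnSparkHadoopUtil.addOutOfMemoryErrorArgument`):

```bash
# Kill the executor JVM as soon as an OutOfMemoryError is thrown,
# instead of letting a half-broken process stall shuffle fetches.
spark-submit \
  --conf "spark.executor.extraJavaOptions=-XX:OnOutOfMemoryError='kill %p'" \
  ... # rest of your submit arguments

# On JDK 8u92+ there is also a simpler built-in alternative:
#   -XX:+ExitOnOutOfMemoryError
```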
