anishshri-db commented on PR #42504: URL: https://github.com/apache/spark/pull/42504#issuecomment-1684519240
@JoshRosen - I addressed your comments, and your proposal seems fine to me as well. At the point where we throw this TaskKilled exception, the context should be non-null and should have the kill reason set even though `interruptThread` is passed as false. The question of whether the plugins will block on I/O remains; perhaps we should move that inside this `try`, after the `killIfInterrupted` check? In any case, the issues we have seen involve threads that run the actual task execution and get blocked on network I/O (remote RPC or otherwise) whose timeouts are effectively larger than the reaper timeout, which ties up task slots and eventually forces an executor JVM kill. I believe the majority of those cases should be handled by your proposed fix. Let me know what you think. Thx
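
For context, a minimal sketch of the kind of check being discussed, along the lines of what `TaskContext.killTaskIfInterrupted()` already does (the `TaskKillCheck` / `throwIfKilled` names below are purely illustrative, not the actual change in this PR):

```scala
import org.apache.spark.{TaskContext, TaskKilledException}

object TaskKillCheck {
  // Illustrative helper: surface a TaskKilledException based on the kill
  // reason recorded on the task context. The reason is set when the task is
  // killed even if interruptThread = false, so a thread that was blocked on
  // network I/O can still observe the kill once it reaches this check.
  def throwIfKilled(context: TaskContext): Unit = {
    context.getKillReason().foreach { reason =>
      throw new TaskKilledException(reason)
    }
  }
}
```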
