Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/2183#issuecomment-53902169
It looks like Spark core automatically registers JVM shutdown hooks for
several different components, so it might be okay to have similar global logic
here.
One (perhaps minor) concern with the global hook is that the functions we
register will contain strong references to the SparkContext, which might lead
to resource leaks in a long-running Python process that creates and destroys
many contexts.
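To illustrate the concern (just a sketch, not the actual patch; the helper name is hypothetical), a per-context hook registered via `atexit` keeps its context alive through the bound-method reference:

```python
import atexit

def _register_per_context_hook(sc):
    # The bound method sc.stop holds a strong reference to sc, so the
    # context object can't be garbage collected until interpreter exit,
    # even if the caller drops every other reference to it. Do this for
    # many short-lived contexts and the dead ones pile up.
    atexit.register(sc.stop)
```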
PySpark does not currently support running multiple SparkContexts at the
same time, so one option would be to define a single shutdown hook that stops
`SparkContext._active_spark_context`. There's currently a lock
(`SparkContext._lock`) guarding that field, and I'm not sure whether it's safe
to attempt to acquire it during a shutdown hook (it's fine for shutdown hooks
to throw exceptions, but they shouldn't block). To guard against this, maybe
we can attempt to acquire the lock and just throw an exception after a short
timeout. This is a super-rare edge case, though, and I'd be shocked if anyone
ran into it, since it requires a separate thread attempting to start or stop a
SparkContext while the Python interpreter is exiting.
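Roughly what I have in mind (a minimal sketch only, assuming the hook lives where `SparkContext` is importable and that `SparkContext._lock` supports Python 3's `acquire(timeout=...)`; the function name and timeout value are made up):

```python
import atexit

from pyspark import SparkContext

def _stop_active_context_on_exit():
    # Try to grab SparkContext._lock briefly; if another thread happens to
    # hold it while the interpreter is exiting, raise instead of blocking
    # shutdown (exceptions from atexit hooks are tolerable, blocking isn't).
    if not SparkContext._lock.acquire(timeout=1):
        raise Exception("Could not acquire SparkContext._lock during shutdown")
    try:
        if SparkContext._active_spark_context is not None:
            SparkContext._active_spark_context.stop()
    finally:
        SparkContext._lock.release()

atexit.register(_stop_active_context_on_exit)
```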