GitHub user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/2624#issuecomment-57850587
> Either we can do something minimal to just clear the reference, so that
> repeated sparkContext creation works from pySpark.
I'm not sure that there's an easy, minimal approach that's also correct,
though. The problem is that some threads obtain a SparkEnv by calling
`SparkEnv.get`, so those threads are prone to reading stale ThreadLocals that
haven't been cleaned up. For an approach that clears ThreadLocals to be safe,
I think we'd need some way to ensure that _any_ thread that sets the
ThreadLocal eventually clears it before the thread is re-used. I suppose we
could audit all calls to `SparkEnv.set()` and add the equivalent of a
`try ... finally` to ensure that the ThreadLocal is eventually cleared. This
is starting to get complex, though, and I'm not sure it's simpler than just
removing the ThreadLocals for now.
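
For what it's worth, here's a minimal sketch of the `try ... finally`
discipline I'm describing, using a stand-in `Env` holder rather than the real
`SparkEnv` (the names `Env`, `withEnv`, and `Demo` are illustrative, not from
the Spark codebase):

```scala
// Stand-in for SparkEnv's per-thread slot (hypothetical, for illustration).
object Env {
  private val threadLocal = new ThreadLocal[String]

  def set(env: String): Unit = threadLocal.set(env)
  def get: String = threadLocal.get
  def clear(): Unit = threadLocal.remove()

  // Every set is paired with a remove() on the same thread, so a pooled
  // thread can never observe a stale value left over from an earlier task.
  def withEnv[T](env: String)(body: => T): T = {
    set(env)
    try {
      body
    } finally {
      clear()
    }
  }
}

object Demo extends App {
  Env.withEnv("env-for-task-1") {
    println(s"inside task: ${Env.get}")
  }
  // After the block, the ThreadLocal has been removed:
  println(s"after task: ${Env.get}") // prints "after task: null"
}
```

A loan-pattern wrapper like `withEnv` would make the cleanup hard to forget,
but retrofitting it onto every existing call site of `SparkEnv.set()` is
exactly the audit work I'd rather avoid.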