GitHub user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/2624#issuecomment-57850587
  
    > Either we can do something minimal to just clear the reference, so that repeated sparkContext creation works from pySpark.

    I'm not sure that there's an easy, minimal approach that's also correct, though.  The problem is that some threads obtain a SparkEnv by calling `SparkEnv.get`, so those threads are prone to reading stale ThreadLocals that haven't been cleaned up.  For an approach that clears ThreadLocals to be safe, I think we'd need some way to ensure that _any_ thread that sets the ThreadLocal eventually clears it before the thread is reused.  I suppose that we could audit all calls of `SparkEnv.set()` and add the equivalent of a `try ... finally` to ensure that the ThreadLocal is eventually cleared.  This is starting to get complex, though, and I'm not sure it's any simpler than just removing the ThreadLocals for now.
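
    For illustration, here is roughly what that `try ... finally` auditing could look like if applied at each `SparkEnv.set()` call site.  This is a sketch only; the `runWithEnv` helper is hypothetical and not existing Spark code:

    ```scala
    import org.apache.spark.SparkEnv

    // Hypothetical wrapper around SparkEnv.set(): every code path that sets
    // the ThreadLocal would need to be audited and routed through something
    // like this so the ThreadLocal is always cleared afterwards.
    object SparkEnvUtil {
      def runWithEnv[T](env: SparkEnv)(body: => T): T = {
        SparkEnv.set(env)
        try {
          body
        } finally {
          // Clear the ThreadLocal so a reused pool thread can't observe a
          // stale SparkEnv left over from a previous task.
          SparkEnv.set(null)
        }
      }
    }
    ```

    Guaranteeing that every setter actually goes through such a wrapper is exactly the audit burden described above.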

