Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/4869#issuecomment-77005785
  
    >  In this case, if the cleaner is in the middle of cleaning a broadcast, 
for instance, it will do so through SparkEnv.get.blockManager, which could be 
one that belongs to a different SparkContext.
    
    In the old code, the race could happen even if we weren't in the middle of 
a cleanup task when SparkContext was stopped; there's about a 100 millisecond 
window where this race can occur.  One potential race looks something like this:
    
    - The original SparkContext's ContextCleaner thread is blocked in a 
`referenceQueue.remove()` call.  This is called with a 100ms timeout, hence the 
100ms window for a race.
    - SparkContext.stop() is called on the original context
    - A new SparkContext is created
    - A job begins running with the new SparkContext and creates new 
broadcast variables.  These broadcast variables' ids can overlap with the ones 
created by the old context, since broadcast ids are only unique within a 
SparkContext, not globally unique or unique within a JVM.
    - The old ContextCleaner finally unblocks from the `referenceQueue.remove` 
call.  Because the old SparkContext was destroyed, the RDDs and broadcasts that 
it created may have become garbage-collected, which means that this 
`referenceQueue.remove` call might actually return an old broadcast variable 
cleanup task.
    - This cleanup task runs in `doCleanupBroadcast()`, which calls methods on 
the original SparkContext's components, namely 
`broadcastManager.unbroadcast(broadcastId, true, blocking)`.
    - Through a chain of calls, this ends up calling the static 
`TorrentBroadcast.unpersist()` method, which calls 
`SparkEnv.get.blockManager.master.removeBroadcast`, causing it to remove the 
broadcast's blocks from the _new_ SparkContext.  Recall that SparkEnv is 
(effectively) global.
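
    The sequence above can be reproduced without threads in a small, 
deterministic simulation.  This is only a sketch in Python, not Spark code: 
`Env`, `Context`, `BlockManager`, `cleanup_via_global`, and `cleanup_via_owner` 
are hypothetical stand-ins for SparkEnv, SparkContext, the block manager, and 
the two ways a cleanup task could locate its block manager.  The owner-based 
variant is shown for contrast and is not necessarily what this pull request 
does.

```python
class BlockManager:
    """Hypothetical stand-in for a per-context block manager."""

    def __init__(self):
        self.blocks = {}  # broadcast id -> payload

    def remove_broadcast(self, broadcast_id):
        return self.blocks.pop(broadcast_id, None)


class Env:
    """Hypothetical stand-in for SparkEnv: a single process-wide slot."""

    _current = None

    def __init__(self, block_manager):
        self.block_manager = block_manager

    @classmethod
    def set(cls, env):
        cls._current = env

    @classmethod
    def get(cls):
        return cls._current


class Context:
    """Hypothetical stand-in for SparkContext."""

    def __init__(self):
        self.env = Env(BlockManager())
        Env.set(self.env)            # each new context overwrites the global
        self._next_broadcast_id = 0  # ids restart per context, so they overlap

    def broadcast(self, value):
        broadcast_id = self._next_broadcast_id
        self._next_broadcast_id += 1
        self.env.block_manager.blocks[broadcast_id] = value
        return broadcast_id


def cleanup_via_global(ctx, broadcast_id):
    # Racy: looks up the block manager through the global at call time,
    # so it may hit whichever context was created most recently.
    return Env.get().block_manager.remove_broadcast(broadcast_id)


def cleanup_via_owner(ctx, broadcast_id):
    # Uses the block manager of the context that created the broadcast.
    return ctx.env.block_manager.remove_broadcast(broadcast_id)


def simulate(cleanup):
    """Return True if the new context's broadcast survives the stale cleanup."""
    old = Context()
    stale_id = old.broadcast("old-data")
    # The old context is stopped here while its cleaner is still blocked
    # in the timed queue wait; a new context is then created.
    new = Context()
    live_id = new.broadcast("new-data")
    assert stale_id == live_id       # overlapping broadcast ids
    cleanup(old, stale_id)           # the old cleaner finally wakes up
    return live_id in new.env.block_manager.blocks
```

    In this sketch, `simulate(cleanup_via_global)` loses the new context's 
block, while `simulate(cleanup_via_owner)` leaves it intact, because only the 
first variant resolves its target through the shared global.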
    
    This was a really subtle race condition.

