GitHub user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/18695
  
    Thanks @HyukjinKwon and @viirya!  I updated the description.  To sum up 
this issue: `_prepare_for_python_RDD` pickles a command, which in turn pickles 
any broadcast variables that are part of that command.  When a broadcast 
variable is pickled, its `__reduce__` method adds it to a shared registry.  
After pickling, `_prepare_for_python_RDD` reads back all broadcast variables 
that were just pickled and uses them for that command.  If multiple threads 
write to the same registry at once, broadcast variables belonging to different 
commands get mixed together in it, and the first thread to read the registry 
takes them all, leaving nothing for the others.  Making the registry 
thread-local gives each thread its own view to write to and read from, so it 
is no longer a shared resource.
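
    For illustration, here is a minimal, self-contained sketch of the 
thread-local idea.  It is not the actual patch: `PickleRegistry`, 
`FakeBroadcast`, `prepare_command` and `run_demo` are hypothetical names made 
up for this example, and the pickled form is reduced to a plain id so the 
snippet runs without Spark.

```python
import pickle
import threading


class PickleRegistry(threading.local):
    """Hypothetical per-thread registry of broadcast stand-ins seen while pickling."""

    def __init__(self):
        # For threading.local subclasses, __init__ runs once in each thread
        # on first access, so every thread gets its own empty set.
        self._seen = set()

    def add(self, bcast):
        self._seen.add(bcast)

    def drain(self):
        # Hand back what this thread registered and reset its view.
        found, self._seen = self._seen, set()
        return found


_registry = PickleRegistry()


class FakeBroadcast:
    """Stand-in for a broadcast variable (assumption: only an id matters here)."""

    def __init__(self, bid):
        self.bid = bid

    def __reduce__(self):
        # Side effect during pickling: register with the current thread's view.
        _registry.add(self)
        # Sketch only: the pickled form is just the id, not a real reconstructor.
        return (int, (self.bid,))


def prepare_command(command):
    """Pickle a command, then collect the broadcasts registered by this thread."""
    payload = pickle.dumps(command)
    return payload, _registry.drain()


def run_demo(n_threads=4):
    """Pickle one command per thread and report which ids each thread collected."""
    results = {}
    barrier = threading.Barrier(n_threads)

    def worker(i):
        barrier.wait()  # line the threads up so their pickling overlaps
        _, bcasts = prepare_command(("cmd", FakeBroadcast(i)))
        results[i] = sorted(b.bid for b in bcasts)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

    With a single plain `set()` shared by all threads instead of 
`PickleRegistry`, one thread's `drain()` could pick up broadcasts registered 
by another thread's in-flight `pickle.dumps`; with the thread-local version, 
each thread only ever sees what it pickled itself.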

