Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/18695
Thanks @HyukjinKwon and @viirya! I updated the description. To sum up
this issue, `_prepare_for_python_RDD` pickles a command, which also pickles any
broadcast variables that are part of that command. When a broadcast var is
pickled, its `__reduce__` method adds the variable to a common registry. After pickling,
`_prepare_for_python_RDD` will get all broadcast vars that were just pickled
and use them for that command. If multiple threads write to the same pickled
registry at the same time, broadcast variables belonging to different commands
end up mixed together in it. The first thread to read from the registry then
gets all of them, including those pickled for other threads' commands. Making
the pickled registry thread-local gives each thread its own view to write to
and read from, so it's no longer a shared resource.
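
To illustrate the idea, here is a minimal standalone sketch of a thread-local registry. It is not the code from the PR: `PickleRegistry`, `FakeBroadcast`, `prepare_command` and `run_demo` are hypothetical names made up for this example, and `__reduce__` is simplified to reduce a variable to its bare id. The barrier only forces the interleaving where both threads have pickled before either one reads.

```python
import pickle
import threading


class PickleRegistry(threading.local):
    """Hypothetical registry; each thread sees only its own set."""

    def __init__(self):
        # threading.local runs __init__ once per thread, on first access.
        self._vars = set()

    def add(self, bcast):
        self._vars.add(bcast)

    def drain(self):
        # Hand back what this thread registered and reset its view.
        found, self._vars = self._vars, set()
        return found


registry = PickleRegistry()


class FakeBroadcast:
    """Stand-in for a broadcast variable (assumption, not PySpark's class)."""

    def __init__(self, bid):
        self.bid = bid

    def __reduce__(self):
        # Pickling registers the variable as a side effect.
        registry.add(self)
        # Simplification: serialize only the id.
        return (int, (self.bid,))


def prepare_command(bids, barrier, results, key):
    command = [FakeBroadcast(b) for b in bids]
    pickle.dumps(command)   # triggers __reduce__ for each variable
    barrier.wait()          # both threads have pickled before either reads
    results[key] = sorted(b.bid for b in registry.drain())


def run_demo():
    barrier = threading.Barrier(2)
    results = {}
    threads = [
        threading.Thread(target=prepare_command,
                         args=([1, 2], barrier, results, "a")),
        threading.Thread(target=prepare_command,
                         args=([3], barrier, results, "b")),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With a plain shared `set` in place of `PickleRegistry`, whichever thread drained first would receive ids 1, 2 and 3 together; with the thread-local version each thread gets back only the variables it pickled itself.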