GitHub user jsoltren commented on the issue:

    https://github.com/apache/spark/pull/17694
  
    This seems reasonable to me.
    
    There are some typos in the PR description. I think you meant "pickled" 
instead of "picked" in a few places.
    
    Using threading.Lock seems okay here from my admittedly limited 
understanding of the deep details of Python, and my reading of 
https://docs.python.org/2/library/threading.html#lock-objects.
    
    @vundela and I discussed this off-thread. The precise race is 
this: the call to _wrap_function will define a number of broadcast variables. 
In the time between when the _wrap_function call finishes and 
self.ctx._jvm.PythonRDD executes, the RDD itself can be modified, perhaps 
changing broadcast variables and introducing the "Broadcast variable '%s' not 
loaded!" exception.
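
    To make the race concrete, here is a minimal sketch of the two-step pattern and the lock around it. Every name in it (FakeContext, wrap_function, build_python_rdd, make_jrdd) is a hypothetical stand-in, not PySpark's actual code; the only thing it assumes is the shape described above: one step records the broadcast variables, a second step reads them, and both run under a single threading.Lock.

```python
import threading

# Hypothetical stand-ins for illustration only; not PySpark's real API.
_wrap_lock = threading.Lock()


class FakeContext(object):
    """Shared state that several threads mutate, like the context above."""

    def __init__(self):
        self.broadcasts = set()


def wrap_function(ctx, needed):
    # Step 1: record the broadcast variables this function depends on.
    ctx.broadcasts = set(needed)
    return set(needed)


def build_python_rdd(ctx, needed):
    # Step 2: fails if another thread changed the set since step 1.
    missing = needed - ctx.broadcasts
    if missing:
        raise RuntimeError(
            "Broadcast variable '%s' not loaded!" % sorted(missing)[0])
    return sorted(needed)


def make_jrdd(ctx, needed):
    # Holding the lock across both steps closes the window between them,
    # provided every caller goes through this function.
    with _wrap_lock:
        wrapped = wrap_function(ctx, needed)
        return build_python_rdd(ctx, wrapped)
```

    Without the `with _wrap_lock:` line, a second thread could run step 1 between the first thread's step 1 and step 2, which is the window described above.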
    
    My understanding is that, due to the Global Interpreter Lock, this lock 
will cause all other execution to cease while this block of code runs, 
implicitly preventing any races. This is a very coarse-grained lock for this 
action but it is as good as we can get. (Someone please correct me if I’m 
wrong here.)
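
    One caveat worth checking here: as far as I know, the GIL only serializes bytecode execution, and a threading.Lock excludes only those threads that try to acquire the same lock. A thread that never touches the lock keeps running. The sketch below (hypothetical names, standard library only) shows a second thread making progress while the first still holds the lock.

```python
import threading


def run_demo():
    lock = threading.Lock()
    holder_has_lock = threading.Event()
    bystander_ran = threading.Event()
    result = {}

    def holder():
        with lock:
            holder_has_lock.set()
            # Still holding the lock: wait for the other thread to progress.
            # Event.wait returns True if the event was set before the timeout.
            result["ran_while_locked"] = bystander_ran.wait(timeout=5)

    def bystander():
        holder_has_lock.wait(timeout=5)
        # This thread never acquires `lock`, so the lock does not stop it.
        bystander_ran.set()

    t1 = threading.Thread(target=holder)
    t2 = threading.Thread(target=bystander)
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return result["ran_while_locked"]
```

    If that is right, the lock prevents the race only when every code path that modifies the shared state acquires the same lock.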
    
    It would be good if the PR description captured some of the above 
discussion.

