Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/189#issuecomment-39604918
  
    Hey Will, I'd actually go even further on this one and *clone* the closure 
by serializing it when sc.clean() is called. That way it captures exactly the 
values of the variables it had when it was passed to Spark, not potentially 
changed values later.
    
    For example, consider something like this:
    ```scala
    val rdd = sc.parallelize(1 to 10)
    val data = Array(1, 2, 3)
    val mapped = rdd.map(x => data(0))
    data(0) = 4
    mapped.first
    ```
    
    Under the current version of Spark, as well as with this patch, this prints 
out "4", even though we called map() when data(0) was 1. Is this the behavior 
we want?
    
    I can see this being too big a change for some programs, in which case we 
could leave it to just check for serializability in 1.0, and make this change 
later if it takes some further consideration. But it's worth thinking about. 
CCing @pwendell, @joshrosen, @rxin, @aarondav.
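    
    For reference, here's a minimal sketch (outside Spark, with a hypothetical 
helper named `cloneViaSerialization`) of what "clone the closure by 
serializing it" would do: round-tripping the closure through Java 
serialization copies the captured array, so the clone keeps the values the 
variables had at clean time, while the live closure still sees later 
mutations.
    ```scala
    import java.io._
    
    object CloneDemo {
      // Hypothetical helper: deep-copies an object (here, a closure and
      // everything it captures) via a Java serialization round trip.
      def cloneViaSerialization[T <: AnyRef](obj: T): T = {
        val buf = new ByteArrayOutputStream()
        val out = new ObjectOutputStream(buf)
        out.writeObject(obj)
        out.close()
        val in = new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray))
        in.readObject().asInstanceOf[T]
      }
    
      def main(args: Array[String]): Unit = {
        val data = Array(1, 2, 3)
        val f: Int => Int = x => data(0)       // closure capturing data
        val snapshot = cloneViaSerialization(f) // "clean"-time clone
        data(0) = 4                             // mutate after the clone
        println(f(0))        // live closure sees the new value: 4
        println(snapshot(0)) // clone kept the snapshot value: 1
      }
    }
    ```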

