Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/189#issuecomment-39604918
Hey Will, I'd actually go even further on this one and *clone* the closure
by serializing it when sc.clean() is called. That way it captures exactly the
values the variables had when the closure was passed to Spark, rather than
values that might be mutated later.
For example, consider something like this:
```scala
val rdd = sc.parallelize(1 to 10)
val data = Array(1, 2, 3)
val mapped = rdd.map(x => data(0))
data(0) = 4
mapped.first
```
Under the current version of Spark, as well as with this patch, this prints
out "4", even though we called map() when data(0) was 1. Is this the behavior
we want?
I can see this being too big a change for some programs, in which case we
could leave it to just check for serializability in 1.0, and make this change
later if it takes some further consideration. But it's worth thinking about.
CCing @pwendell, @joshrosen, @rxin, @aarondav.