GitHub user ilganeli opened a pull request:
https://github.com/apache/spark/pull/4696
[SPARK-4423] Improve foreach() documentation to avoid confusion between
local- and cluster-mode behavior
Hi all - I've added a writeup on how closures work within Spark to help
clarify the general case for this problem and similar problems. I hope this
addresses the issue and would love any feedback.
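The confusion the writeup targets is that mutating a driver-side variable from inside `foreach()` appears to work in local mode but silently does nothing on a cluster, because Spark serializes the closure and ships a copy of the captured variables to each executor. A minimal sketch of that behavior in plain Python (no Spark required; `Counter`, `run_local`, and `run_cluster` are hypothetical names, and `copy.deepcopy` merely stands in for closure serialization):

```python
import copy

class Counter:
    def __init__(self):
        self.value = 0

def run_local(data, counter):
    # Local mode (conceptually): the closure runs in the driver process,
    # so it mutates the driver's counter directly.
    for x in data:
        counter.value += x

def run_cluster(partitions, counter):
    # Cluster mode (simulated): each task gets a deserialized copy of the
    # captured variables; its mutations never reach the driver.
    for part in partitions:
        task_counter = copy.deepcopy(counter)  # stands in for serializing the closure
        for x in part:
            task_counter.value += x
        # task_counter is discarded when the task finishes

local = Counter()
run_local([1, 2, 3, 4], local)
print(local.value)    # 10 -- local mode appears to work

cluster = Counter()
run_cluster([[1, 2], [3, 4]], cluster)
print(cluster.value)  # 0 -- the driver's counter is untouched
```

In real Spark code, the supported way to aggregate into a shared variable from an action is an Accumulator, not closure capture.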
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ilganeli/spark SPARK-4423
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/4696.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #4696
----
commit c768ab2ce4b371058daee1e676f3855b55a97b41
Author: Ilya Ganelin <[email protected]>
Date: 2015-02-12T15:20:38Z
Updated documentation to add a section on closures. This helps clarify the
confusing behavior of the foreach and map functions when attempting to modify
variables outside the scope of an RDD action or transformation
commit 26006688bfc7b4b80bb77d4fd1113bf33aab474b
Author: Ilya Ganelin <[email protected]>
Date: 2015-02-12T15:55:10Z
Minor edits
commit d374d3a8e1086ae315dd8c1ca3fcc0ff3c105fcc
Author: Ilya Ganelin <[email protected]>
Date: 2015-02-17T23:55:13Z
Merge remote-tracking branch 'upstream/master' into SPARK-4423
----