GitHub user ilganeli opened a pull request:

    https://github.com/apache/spark/pull/4696

    [SPARK-4423] Improve foreach() documentation to avoid confusion between 
local- and cluster-mode behavior

    Hi all - I've added a write-up on how closures work within Spark to help 
clarify the general case behind this problem and similar ones. I hope this 
addresses the issue, and I would love any feedback.
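
    The pitfall the write-up documents can be sketched without Spark at all. 
Below is a minimal, hypothetical Python simulation (none of these names are 
Spark APIs): `foreach_cluster` deep-copies the task object to stand in for 
serializing the closure to an executor, which is why driver-side mutations 
are lost in cluster mode but visible in local mode.

```python
import copy

def foreach_local(partitions, task):
    # Local mode: the same task object runs in the driver process,
    # so mutations to captured state are visible afterwards.
    for part in partitions:
        for x in part:
            task(x)

def foreach_cluster(partitions, task):
    # Simulated cluster mode: each task runs on a *copy* of the
    # closure (deepcopy stands in for serialize/deserialize), so
    # mutations land on the executor's copy, never on the driver's.
    for part in partitions:
        task_copy = copy.deepcopy(task)
        for x in part:
            task_copy(x)

class AddToCounter:
    # Stands in for a closure that captures a driver-side variable.
    def __init__(self, counter):
        self.counter = counter

    def __call__(self, x):
        self.counter["value"] += x

data = [[1, 2], [3, 4]]  # two "partitions"

local = {"value": 0}
foreach_local(data, AddToCounter(local))
print(local["value"])    # 10: the driver sees every update

cluster = {"value": 0}
foreach_cluster(data, AddToCounter(cluster))
print(cluster["value"])  # 0: updates happened on copies only
```

    In real Spark the fix is not to mutate captured variables from 
foreach()/map() at all, but to use an Accumulator for driver-visible counters.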

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ilganeli/spark SPARK-4423

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4696.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4696
    
----
commit c768ab2ce4b371058daee1e676f3855b55a97b41
Author: Ilya Ganelin <[email protected]>
Date:   2015-02-12T15:20:38Z

    Updated documentation to add a section on closures. This helps explain the 
confusing behavior of the foreach and map functions when attempting to modify 
variables outside the scope of an RDD action or transformation

commit 26006688bfc7b4b80bb77d4fd1113bf33aab474b
Author: Ilya Ganelin <[email protected]>
Date:   2015-02-12T15:55:10Z

    Minor edits

commit d374d3a8e1086ae315dd8c1ca3fcc0ff3c105fcc
Author: Ilya Ganelin <[email protected]>
Date:   2015-02-17T23:55:13Z

    Merge remote-tracking branch 'upstream/master' into SPARK-4423

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
