Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/4696#discussion_r25054127
--- Diff: docs/programming-guide.md ---
@@ -728,6 +728,61 @@ def doStuff(self, rdd):
</div>
+### Understanding closures
+One of the confusing things about Spark is understanding which variables and methods are within the closure of some executing code. Specifically, operations like `foreach()` may behave in an un-intuitive way. In our example, we look at `foreach()` but this same scenario will apply to any other RDD operations that modify variables outside of their scope.
--- End diff ---
It's probably better to word this as "It is important to understand the
scope and life cycle of variables and methods ..." instead of saying it is
unintuitive. Also, overall I think you'd want to point out that closures are
always executed on executors and should not be used to mutate state, and
that the only exception is when running in local testing mode. If some
global aggregation is needed, use an accumulator.
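
For the docs, a minimal sketch along these lines might make the point
concrete (a hypothetical example, not taken from this PR; it assumes the
Spark 1.x `sc.accumulator` API and a made-up app name):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ClosureExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ClosureExample"))
    val rdd = sc.parallelize(1 to 10)

    // Anti-pattern: `counter` is captured by the closure and serialized to
    // each executor, which increments its own deserialized copy; the
    // driver's copy is never updated. (In local mode this may appear to
    // work, which is exactly why it is confusing.)
    var counter = 0
    rdd.foreach(x => counter += x)
    println(s"counter: $counter") // still 0 on a cluster

    // Correct: use an accumulator for global aggregation.
    val sum = sc.accumulator(0)
    rdd.foreach(x => sum += x)
    println(s"sum: ${sum.value}") // 55, read back on the driver

    sc.stop()
  }
}
```

Note the accumulator's value is only read on the driver; tasks can only add
to it.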