Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/4696#discussion_r25054127
--- Diff: docs/programming-guide.md ---
@@ -728,6 +728,61 @@ def doStuff(self, rdd):
</div>
+### Understanding closures
+One of the confusing things about Spark is understanding which variables and methods are within the closure of some executing code. Specifically, operations like `foreach()` may behave in an un-intuitive way. In our example, we look at `foreach()` but this same scenario will apply to any other RDD operations that modify variables outside of their scope.
--- End diff ---
It's probably better to word this as "It is important to understand the
scope and life cycle of variables and methods ..." instead of saying it is
unintuitive. Also, overall I think you'd want to point out that closures are
always executed on executors and should not be used to mutate state, and
that the only exception is when running in local testing mode. If some
global aggregation is needed, use an accumulator.
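
For the docs, a minimal sketch along these lines might make the point
concrete (a hypothetical example, not taken from this PR; it assumes the
Spark 1.x `sc.accumulator` API and a made-up app name):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ClosureExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ClosureExample"))
    val rdd = sc.parallelize(1 to 10)

    // Anti-pattern: `counter` is captured by the closure and serialized to
    // each executor, which increments its own deserialized copy; the
    // driver's copy is never updated. (In local mode this may appear to
    // work, which is exactly why it is confusing.)
    var counter = 0
    rdd.foreach(x => counter += x)
    println(s"counter: $counter") // still 0 on a cluster

    // Correct: use an accumulator for global aggregation.
    val sum = sc.accumulator(0)
    rdd.foreach(x => sum += x)
    println(s"sum: ${sum.value}") // 55, read back on the driver

    sc.stop()
  }
}
```

Note the accumulator's value is only read on the driver; tasks can only add
to it.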