[jira] [Updated] (SPARK-12760) inaccurate description for difference between local vs cluster mode in closure handling

2016-01-23 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-12760:
--
Assignee: Mortada Mehyar

> inaccurate description for difference between local vs cluster mode in 
> closure handling
> ---
>
> Key: SPARK-12760
> URL: https://issues.apache.org/jira/browse/SPARK-12760
> Project: Spark
> Issue Type: Bug
> Components: Documentation
> Reporter: Mortada Mehyar
> Assignee: Mortada Mehyar
> Priority: Minor
>
> The Spark documentation has an example illustrating how `local` and 
> `cluster` mode can differ: 
> http://spark.apache.org/docs/latest/programming-guide.html#example
> "In local mode with a single JVM, the above code will sum the values within 
> the RDD and store it in counter. This is because both the RDD and the 
> variable counter are in the same memory space on the driver node." 
> However, this doesn't seem to be true. Even in `local` mode, the counter 
> value should still be 0: the values are summed in the executor's memory 
> space, but the final value in the driver's memory space is still 0. I tested 
> this snippet and verified that in `local` mode the value is indeed still 0. 
> Is the doc wrong, or am I missing something the doc is trying to say? 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12760) inaccurate description for difference between local vs cluster mode in closure handling

2016-01-12 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-12760:
--
  Priority: Minor  (was: Trivial)
Issue Type: Bug  (was: Question)
   Summary: inaccurate description for difference between local vs cluster 
mode in closure handling  (was: inaccurate description for difference between 
local vs cluster mode )

I think the example needs an update, but not for this reason. There's no 
separate "memory space" in local mode; it's one JVM. However, it's undefined 
whether the closure updates the driver's {{counter}} or a copy of it in this 
case. In practice, I find that a copy is serialized with the closure at this 
point, so the result is still 0.

I think the explanation should be changed to say that the result is undefined 
here and could be 0 or not, and to explain why. Do you want to try a PR?
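
To illustrate the point about a serialized copy, here is a minimal sketch in plain Python. It is not Spark code and does not reproduce Spark's actual closure handling. It only assumes that the captured state is round-tripped through a serializer before the task runs, which is the behavior described above. The names run_task and simulate are hypothetical, and pickle stands in for whatever serializer Spark uses.

```python
import pickle


def run_task(serialized_state, partition):
    """Hypothetical stand-in for a task: it works on a deserialized copy."""
    state = pickle.loads(serialized_state)  # the task gets its own copy
    for x in partition:
        state["counter"] += x  # updates the copy, not the driver's object
    return state["counter"]


def simulate(data):
    """Return (driver-side counter, task-side counter) after one task runs."""
    driver_state = {"counter": 0}
    # Assumption: the captured state is serialized before the task executes,
    # even though everything here runs in a single process.
    task_counter = run_task(pickle.dumps(driver_state), data)
    return driver_state["counter"], task_counter


driver_counter, task_counter = simulate([1, 2, 3, 4, 5])
print("driver counter:", driver_counter)
print("task counter:", task_counter)
```

Everything runs in one process, yet the driver-side value is untouched because the task only ever sees the deserialized copy. If the state were passed by reference instead of through the serializer, the driver would see the sum, which is why the outcome should be treated as undefined rather than relied on.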
