[GitHub] spark pull request: [SPARK-3356] [DOCS] Document when RDD elements...

2014-09-30 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/2508#issuecomment-57357513
  
Cool, thanks. Going to merge this as is then.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3356] [DOCS] Document when RDD elements...

2014-09-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/2508



[GitHub] spark pull request: [SPARK-3356] [DOCS] Document when RDD elements...

2014-09-25 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/2508#issuecomment-56829748
  
@mateiz Yeah, there's no mention of zip methods in the programming guide, 
so if the groupBy method note isn't so valuable, I think there's probably no 
useful note to be made in the docs that I can see. I reverted that (will see if 
I can get git to not think there is a single whitespace change as a result).



[GitHub] spark pull request: [SPARK-3356] [DOCS] Document when RDD elements...

2014-09-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2508#issuecomment-56830592
  
[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20809/consoleFull) for PR 2508 at commit [`b7c96fd`](https://github.com/apache/spark/commit/b7c96fd68ba5816e6bcb6334bef9b5b4c1a4b15b).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-3356] [DOCS] Document when RDD elements...

2014-09-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2508#issuecomment-56841671
  
[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20809/consoleFull) for PR 2508 at commit [`b7c96fd`](https://github.com/apache/spark/commit/b7c96fd68ba5816e6bcb6334bef9b5b4c1a4b15b).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-3356] [DOCS] Document when RDD elements...

2014-09-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2508#issuecomment-56841680
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20809/



[GitHub] spark pull request: [SPARK-3356] [DOCS] Document when RDD elements...

2014-09-24 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/2508#issuecomment-56651085
  
@mateiz Got it. On the zip methods, I want to capture the key point from 
https://issues.apache.org/jira/browse/SPARK-3098 , that the ordering is not 
only not guaranteed but also may change on reevaluation. I hope that wording is 
OK to retain and merge into yours.

I'll find some place in the programming guide to note this, and remove 
wording about persist and/or replace with suggestion to sort the RDD.



[GitHub] spark pull request: [SPARK-3356] [DOCS] Document when RDD elements...

2014-09-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2508#issuecomment-56661626
  
[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20750/consoleFull) for PR 2508 at commit [`ad4aeec`](https://github.com/apache/spark/commit/ad4aeec504ad07269511a2aad843a5b815dfcf5d).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-3356] [DOCS] Document when RDD elements...

2014-09-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2508#issuecomment-56669661
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20750/



[GitHub] spark pull request: [SPARK-3356] [DOCS] Document when RDD elements...

2014-09-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2508#issuecomment-56669653
  
[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20750/consoleFull) for PR 2508 at commit [`ad4aeec`](https://github.com/apache/spark/commit/ad4aeec504ad07269511a2aad843a5b815dfcf5d).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-3356] [DOCS] Document when RDD elements...

2014-09-24 Thread mateiz
Github user mateiz commented on a diff in the pull request:

https://github.com/apache/spark/pull/2508#discussion_r18012607
  
--- Diff: docs/programming-guide.md ---
@@ -882,7 +882,11 @@ for details.
 </tr>
 <tr>
   <td> <b>groupByKey</b>([<i>numTasks</i>]) </td>
-  <td> When called on a dataset of (K, V) pairs, returns a dataset of (K, Iterable&lt;V&gt;) pairs. <br />
+  <td> When called on a dataset of (K, V) pairs, returns a dataset of (K, Iterable&lt;V&gt;) pairs.
+<br />
+<b>Note:</b> The ordering of elements within each group is not guaranteed, and may even differ
+ each time the resulting RDD is evaluated.
--- End diff --

I don't think this is a good place to put this in the programming guide. If 
you can't find another place, maybe just leave it out. The other note here is a 
much more important and more common pitfall.
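The note proposed in the diff above can be illustrated with a toy model. The sketch below is plain Python, not Spark code. `toy_group_by_key` and `fetch_order` are hypothetical names. As a simplification, it assumes a reducer appends values in whatever order it happens to read the map outputs. Reading the same map outputs in two different orders yields the same groups with differently ordered values. Sorting each group restores a deterministic result.

```python
from collections import defaultdict


def toy_group_by_key(map_outputs, fetch_order):
    """Hypothetical toy shuffle: groups (key, value) pairs, reading the
    map outputs in the order given by fetch_order."""
    groups = defaultdict(list)
    for idx in fetch_order:
        for key, value in map_outputs[idx]:
            groups[key].append(value)
    return dict(groups)


def sorted_groups(groups):
    """Imposes a deterministic order on the values within each group."""
    return {key: sorted(values) for key, values in groups.items()}


map_outputs = [
    [("a", 1), ("b", 2)],  # output of (hypothetical) map task 0
    [("a", 3), ("b", 4)],  # output of (hypothetical) map task 1
]

# Two evaluations that happen to read the map outputs in different orders.
run1 = toy_group_by_key(map_outputs, fetch_order=[0, 1])
run2 = toy_group_by_key(map_outputs, fetch_order=[1, 0])
print(run1)  # {'a': [1, 3], 'b': [2, 4]}
print(run2)  # {'a': [3, 1], 'b': [4, 2]}

# After sorting within each group, both evaluations agree.
print(sorted_groups(run1) == sorted_groups(run2))  # True
```

In Spark itself, code that depends on the order within each group could sort the values in a follow-up `mapValues` step rather than assume any particular order.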



[GitHub] spark pull request: [SPARK-3356] [DOCS] Document when RDD elements...

2014-09-23 Thread srowen
GitHub user srowen opened a pull request:

https://github.com/apache/spark/pull/2508

[SPARK-3356] [DOCS] Document when RDD elements' ordering within partitions 
is nondeterministic

As suggested by @mateiz, and because it came up on the mailing list again 
last week, this attempts to document that ordering of elements is not 
guaranteed across RDD evaluations in groupBy, zip, and partition-wise RDD 
methods. Suggestions welcome about the wording, or other methods that need a 
note.
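Partition-wise methods such as `mapPartitions` hand the user function an iterator over a partition's elements. A function that depends on element order can therefore return a different result if the partition is recomputed in a different order. An order-insensitive function cannot. The plain-Python sketch below illustrates the distinction. `first_element` and `partition_sum` are hypothetical functions, and the two lists simply stand in for two evaluations of the same partition.

```python
def first_element(partition):
    """Order-sensitive: returns whichever element happens to come first."""
    return next(iter(partition), None)


def partition_sum(partition):
    """Order-insensitive: addition gives the same total for any ordering."""
    return sum(partition)


# Hypothetical: the same partition contents seen in two different orders,
# as could happen when a partition is recomputed.
eval1 = [5, 2, 9]
eval2 = [9, 5, 2]

print(first_element(eval1), first_element(eval2))  # 5 9
print(partition_sum(eval1), partition_sum(eval2))  # 16 16
```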

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/srowen/spark SPARK-3356

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2508.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2508


commit fce943b3401135074ec943c56653fbb36657804c
Author: Sean Owen so...@cloudera.com
Date:   2014-09-23T12:57:47Z

Note that ordering of elements is not guaranteed across RDD evaluations in 
groupBy, zip, and partition-wise RDD methods





[GitHub] spark pull request: [SPARK-3356] [DOCS] Document when RDD elements...

2014-09-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2508#issuecomment-56516002
  
[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20704/consoleFull) for PR 2508 at commit [`fce943b`](https://github.com/apache/spark/commit/fce943b3401135074ec943c56653fbb36657804c).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-3356] [DOCS] Document when RDD elements...

2014-09-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2508#issuecomment-56525327
  
[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20704/consoleFull) for PR 2508 at commit [`fce943b`](https://github.com/apache/spark/commit/fce943b3401135074ec943c56653fbb36657804c).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-3356] [DOCS] Document when RDD elements...

2014-09-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2508#issuecomment-56525341
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20704/



[GitHub] spark pull request: [SPARK-3356] [DOCS] Document when RDD elements...

2014-09-23 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/2508#issuecomment-56572221
  
Hey Sean, I don't think it makes sense to add "the ordering of elements 
within each partition is not guaranteed" to all the mapPartitions and zip 
methods. For some RDDs, ordering is guaranteed, and these methods might rely on 
that. It's better to leave it on the group-by methods instead, and add a 
note on just the zip methods saying "Note that some RDDs, such as those 
returned by groupBy, do not guarantee order of elements in a partition; in 
those cases you should sort the RDD with sortByKey or save it to a file."

You might also consider adding a section on this in the programming guide, 
if there's a good spot for it.

Finally, don't recommend persist as a way to preserve order because even 
persist is not guaranteed to prevent recomputation if there are faults. It's 
better to tell them to use something with a guaranteed order.
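The zip pitfall and the suggested remedy can be sketched in the same toy style. `RDD.zip` pairs elements by position. If one side is recomputed in a different order, elements get paired with the wrong partners. In this plain-Python sketch, `toy_zip` is a hypothetical stand-in for zip that ignores partitioning. Sorting both sides before zipping plays the role of sorting the RDD (for example with `sortByKey`), as suggested above.

```python
def toy_zip(left, right):
    """Hypothetical stand-in for zip: pairs elements purely by position."""
    if len(left) != len(right):
        raise ValueError("zip requires both sides to have the same number of elements")
    return list(zip(left, right))


# Hypothetical: the same unordered dataset evaluated twice, once for each
# side of the zip, with elements arriving in a different order each time.
first_eval = ["x", "y", "z"]
second_eval = ["y", "z", "x"]

# Intent: pair each element with its upper-cased form.
unsorted_pairs = toy_zip(first_eval, [s.upper() for s in second_eval])
print(unsorted_pairs)  # [('x', 'Y'), ('y', 'Z'), ('z', 'X')] (mismatched)

# Imposing a total order on both evaluations first makes the pairing stable.
sorted_pairs = toy_zip(sorted(first_eval), [s.upper() for s in sorted(second_eval)])
print(sorted_pairs)  # [('x', 'X'), ('y', 'Y'), ('z', 'Z')]
```

As noted above, `persist` is not a substitute for sorting here, because a cached partition can still be recomputed after a failure.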

