GitHub user mateiz commented on the pull request:

https://github.com/apache/spark/pull/2508#issuecomment-56572221

Hey Sean, I don't think it makes sense to add "the ordering of elements within each partition is not guaranteed" to all the mapPartitions and zip methods. For some RDDs, ordering is guaranteed, and these methods might rely on that. It's better to keep the note on the group-by methods and add a note on just the zip methods saying: "Note that some RDDs, such as those returned by groupBy, do not guarantee the order of elements in a partition; in those cases you should sort the RDD with sortByKey or save it to a file." You might also consider adding a section on this to the programming guide, if there's a good spot for it.

Finally, don't recommend persist as a way to preserve order, because even persist is not guaranteed to prevent recomputation if there are faults. It's better to tell users to use something with a guaranteed order.
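To make the zip concern concrete, here is a minimal pure-Python model. It is not Spark code. It assumes a partition whose element order can change each time it is recomputed, and a zip that pairs elements only by position. The helpers `recompute_partition` and `zip_positions` are hypothetical stand-ins for illustration. Sorting each computation by key plays the role that sortByKey would play in Spark.

```python
import random

def recompute_partition(elements, attempt):
    """Hypothetical model of recomputing a partition whose element order is
    NOT guaranteed (as with groupBy output): every attempt yields the same
    elements, possibly in a different order."""
    items = list(elements)
    random.Random(attempt).shuffle(items)
    return items

def zip_positions(left, right):
    """zip-style pairing: matches elements by position only."""
    return list(zip(left, right))

records = [("a", 1), ("b", 2), ("c", 3), ("d", 4)]

# Keys and values taken from two separate computations of the same
# partition, e.g. because one side was recomputed after a fault.
keys = [k for k, _ in recompute_partition(records, attempt=1)]
values = [v for _, v in recompute_partition(records, attempt=2)]
unsorted_pairs = zip_positions(keys, values)  # pairing may be inconsistent

# Sorting each computation by key first gives both sides the same
# deterministic order, so positional pairing lines up again.
keys_sorted = [k for k, _ in sorted(recompute_partition(records, attempt=1))]
values_sorted = [v for _, v in sorted(recompute_partition(records, attempt=2))]
sorted_pairs = zip_positions(keys_sorted, values_sorted)

print(sorted_pairs)  # [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
```

The model also shows why persist is not enough on its own: if a persisted partition is lost and recomputed, the new copy can come back in a different order, whereas an explicit sort yields the same order on every recomputation.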