GitHub user mateiz commented on the pull request:
https://github.com/apache/spark/pull/2508#issuecomment-56572221
Hey Sean, I don't think it makes sense to add "the ordering of elements
within each partition is not guaranteed" to all the mapPartitions and zip
methods. For some RDDs, ordering is guaranteed, and these methods might rely
on that. It's better to leave the warning on the group-by methods and add a
note on just the zip methods saying "note that some RDDs, such as those
returned by groupBy, do not guarantee order of elements in a partition; in
those cases you should sort the RDD with sortByKey or save it to a file".
You might also consider adding a section on this in the programming guide,
if there's a good spot for it.
Finally, don't recommend persist as a way to preserve order, because even
persist is not guaranteed to prevent recomputation if there are faults. It's
better to tell users to use something with a guaranteed order.
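
To make the concern concrete, here is a minimal sketch in plain Python rather
than Spark. It models a partition as a list. The helpers group_by_key,
sort_by_key and zip_partitions, and the "attempt" parameter, are hypothetical
stand-ins and not Spark APIs. The rotation only imitates a recomputation that
returns the same elements in a different order. The sketch shows why a
positional zip over group-by output is fragile, and why sorting first gives
the same pairing on every attempt.

```python
def group_by_key(pairs, attempt):
    """Model a group-by whose output order within a partition is arbitrary.

    `attempt` is a hypothetical stand-in for one (re)computation of the
    partition; rotating the result imitates an order that changes between
    computations while the set of elements stays the same.
    """
    groups = {}
    for key, value in pairs:
        groups.setdefault(key, []).append(value)
    items = list(groups.items())
    shift = attempt % len(items) if items else 0
    return items[shift:] + items[:shift]


def sort_by_key(items):
    """Impose a deterministic order, in the spirit of sorting the RDD by key."""
    return sorted(items, key=lambda kv: kv[0])


def zip_partitions(left, right):
    """Pair elements by position, as a positional zip would."""
    if len(left) != len(right):
        raise ValueError("partitions must have the same number of elements")
    return list(zip(left, right))


pairs = [("a", 1), ("b", 2), ("c", 3), ("a", 4), ("d", 5)]
labels = ["w", "x", "y", "z"]  # second partition to zip against

first = group_by_key(pairs, attempt=0)       # original computation
recomputed = group_by_key(pairs, attempt=1)  # recomputation after a fault

# Same elements, different order, so the positional pairing changes.
unsorted_pairing_is_stable = (
    zip_partitions(first, labels) == zip_partitions(recomputed, labels)
)

# After sorting by key, both attempts give the same pairing.
sorted_pairing_is_stable = (
    zip_partitions(sort_by_key(first), labels)
    == zip_partitions(sort_by_key(recomputed), labels)
)

print(unsorted_pairing_is_stable, sorted_pairing_is_stable)
```

In this model, caching the first result would hide the problem only until the
cached copy is lost and the partition is recomputed, which is the objection to
recommending persist.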