GitHub user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/2508#issuecomment-56572221
  
    Hey Sean, I don't think it makes sense to add "the ordering of elements 
within each partition is not guaranteed" to all the mapPartitions and zip 
methods. For some RDDs, ordering is guaranteed, and these methods might use 
that. It's better to leave it on the group-by methods instead, and add a 
note on just the zip methods saying "note that some RDDs, such as those 
returned by groupBy, do not guarantee the order of elements in a partition; in 
those cases you should sort the RDD with sortByKey or save it to a file".
    
    You might also consider adding a section on this in the programming guide, 
if there's a good spot for it.
    
    Finally, don't recommend persist as a way to preserve order, because even 
persist is not guaranteed to prevent recomputation if there are faults. It's 
better to tell users to use something with a guaranteed order.
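
    As a rough illustration of the zip and recomputation points above, here 
is a plain-Python sketch (not Spark code). It assumes a partition can come 
back in a different element order when it is recomputed, and models that by 
simply reversing a list; the names `records`, `first_eval` and `second_eval` 
are made up for the example.

```python
# Plain-Python stand-in for one partition of an RDD whose element order
# is not guaranteed (e.g. the result of a group-by). Hypothetical data.
records = [("a", 1), ("b", 2), ("c", 3), ("d", 4)]

# Two evaluations of the "same" partition. The second models a
# recomputation (for example after a fault) that yields another order.
first_eval = list(records)
second_eval = list(reversed(records))

# Zipping keys from one evaluation with values from the other pairs
# elements by position, so the pairs no longer match the original records.
keys = [k for k, _ in first_eval]
values = [v for _, v in second_eval]
unsafe_pairs = list(zip(keys, values))

# Sorting each evaluation first gives a guaranteed order, so the
# positional pairing is stable no matter how the partition came back.
sorted_keys = [k for k, _ in sorted(first_eval)]
sorted_values = [v for _, v in sorted(second_eval)]
safe_pairs = list(zip(sorted_keys, sorted_values))

print(unsafe_pairs)
print(safe_pairs)
```

    In Spark terms, the sort step stands in for something like sortByKey 
applied before the zip; caching the unsorted data would not help here, 
since a recomputation could still produce the second ordering.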


