Here's my understanding of the row order guarantees of RDDs in the context of
limit() and collect(). Can someone confirm this?
* sparkContext.parallelize(myList) returns an RDD that may have a different
row order than myList.
* Every RDD loaded from the same file in HDFS (e.g.
sparkContext.textFile("hdfs://path_to_file")) will collect rows in the same
order.
* Row order of an RDD is preserved through non-shuffling operations (e.g.
map, filter).
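
To make the last two points concrete, here is a rough toy model in plain Python of what I have in mind. It is not Spark code and does not claim to show how Spark is implemented: an "RDD" is just an ordered list of partitions, and every function name below (slice_into_partitions, map_partitions, filter_partitions, collect, limit) is made up for illustration. It only shows that if partitions keep a fixed order and map/filter work row by row inside each partition, then collect() and limit() see the rows in the original order.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def slice_into_partitions(rows: List[T], num_partitions: int) -> List[List[T]]:
    """Split rows into contiguous chunks, keeping the original order."""
    n = len(rows)
    bounds = [(i * n) // num_partitions for i in range(num_partitions + 1)]
    return [rows[bounds[i]:bounds[i + 1]] for i in range(num_partitions)]


def map_partitions(parts: List[List[T]], fn: Callable[[T], U]) -> List[List[U]]:
    """Apply fn to each row in place; no row moves between partitions."""
    return [[fn(row) for row in part] for part in parts]


def filter_partitions(parts: List[List[T]], pred: Callable[[T], bool]) -> List[List[T]]:
    """Drop rows per partition; the surviving rows keep their relative order."""
    return [[row for row in part if pred(row)] for part in parts]


def collect(parts: List[List[T]]) -> List[T]:
    """Concatenate partitions in partition order."""
    return [row for part in parts for row in part]


def limit(parts: List[List[T]], n: int) -> List[T]:
    """Take the first n rows in partition order (toy version, scans everything)."""
    return collect(parts)[:n]


my_list = list(range(10))
parts = slice_into_partitions(my_list, 3)
doubled = map_partitions(parts, lambda x: x * 2)
multiples_of_four = filter_partitions(doubled, lambda x: x % 4 == 0)

print(collect(parts) == my_list)    # True in this toy model
print(collect(multiples_of_four))   # [0, 4, 8, 12, 16]
print(limit(multiples_of_four, 3))  # [0, 4, 8]
```

A shuffle would correspond to moving rows between partitions, which this model deliberately leaves out. Whether real RDDs behave like this model is exactly what I would like confirmed.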
Mingyu

