Row order of RDDs

Mingyu Kim Wed, 29 Jan 2014 01:19:33 -0800

Here's my understanding of the row order guarantees of RDDs in the context of
limit() and collect(). Can someone confirm this?
* sparkContext.parallelize(myList) returns an RDD that may have a different
row order than myList.
* Every RDD loaded from the same file in HDFS (e.g.
sparkContext.textFile("hdfs://path_to_file")) will collect rows in the same
order.
* Row order of an RDD is preserved through non-shuffling operations (e.g.
map, filter).
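
To make the third point concrete, here is a toy model in plain Python. It is
not Spark and does not call any Spark API. It assumes that rows are split into
contiguous partitions and that collecting concatenates the partitions in index
order; both are assumptions about the behavior being asked about, not confirmed
facts. All helper names (partition, map_partitions, filter_partitions,
collect_rows, take_rows) are hypothetical.

```python
# Toy model of per-partition (non-shuffling) operations. Not Spark.

def partition(rows, num_partitions):
    """Split rows into contiguous slices, one per hypothetical partition."""
    n = len(rows)
    bounds = [(i * n) // num_partitions for i in range(num_partitions + 1)]
    return [rows[bounds[i]:bounds[i + 1]] for i in range(num_partitions)]

def map_partitions(parts, fn):
    """Apply fn to each row inside its own partition; no rows move."""
    return [[fn(x) for x in part] for part in parts]

def filter_partitions(parts, pred):
    """Drop rows inside each partition; surviving rows keep their order."""
    return [[x for x in part if pred(x)] for part in parts]

def collect_rows(parts):
    """Assumed collect(): concatenate partitions in index order."""
    return [x for part in parts for x in part]

def take_rows(parts, n):
    """Assumed limit(): the first n rows in partition order."""
    return collect_rows(parts)[:n]

my_list = list(range(10))
parts = partition(my_list, 3)

mapped = map_partitions(parts, lambda x: x * 10)
kept = filter_partitions(mapped, lambda x: x % 20 == 0)

result = collect_rows(kept)
expected = [x * 10 for x in my_list if (x * 10) % 20 == 0]
print(result == expected)
```

Under these assumptions the order survives because map and filter only touch
rows within a partition. An operation that moves rows between partitions is
outside what this model covers.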
Mingyu

