IIRC, the random number generator is seeded with the index, so it will always produce the same result for the same index. Maybe I don't totally follow, though. Could you give a small example of how this might change the RDD ordering in a way that you don't expect? In general, repartition() will not preserve the ordering of an RDD.
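
To make the question concrete, here is a minimal sketch in plain Scala (no Spark needed) of the behavior as I understand it. It is a simplified model, not Spark's actual source: the object name RepartitionSketch and the helper assign are hypothetical, and the assumption is that each input partition seeds a Random with its own index to pick a starting position, then hands out consecutive target partitions to its rows in iteration order.

```scala
import scala.util.Random

object RepartitionSketch {
  // Hypothetical helper modelling one input partition being redistributed.
  // Assumption: the start position depends only on the partition index,
  // and each row's target depends on its position in the iteration order.
  def assign[T](index: Int, rows: Seq[T], numPartitions: Int): Seq[(Int, T)] = {
    val start = new Random(index).nextInt(numPartitions)
    rows.zipWithIndex.map { case (row, i) =>
      ((start + 1 + i) % numPartitions, row)
    }
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq("a", "b", "c", "d", "e")

    // Same index and same row order: the assignment is identical on every call.
    println(assign(0, rows, 5) == assign(0, rows, 5))

    // Same index but reversed row order: most rows get a different target.
    val forward  = assign(0, rows, 5).map(_.swap).toMap
    val reversed = assign(0, rows.reverse, 5).map(_.swap).toMap
    println(forward == reversed)
  }
}
```

Under this model the seeding itself is deterministic, so any variation in the output partitions would have to come from the rows arriving in a different order within an input partition, not from the random number generator.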

On Wed, Oct 8, 2014 at 3:42 PM, Sung Hwan Chung <coded...@cs.stanford.edu> wrote:
> I noticed that repartition will result in non-deterministic lineage because
> it'll result in changed orders for rows.
>
> So for instance, if you do things like:
>
> val data = read(...)
> val k = data.repartition(5)
> val h = k.repartition(5)
>
> It seems that this results in different ordering of rows for 'k' each time
> you call it.
> And because of this different ordering, 'h' will result in different
> partitions even, because 'repartition' distributes through a random number
> generator with the 'index' as the key.