Re: Inconsistent behavior of randomSplit in YARN mode

2015-12-28 Thread Gaurav Kumar
Hi Ted,

I am using Spark 1.5.2. Without repartition in the picture, it works exactly as it's supposed to. With repartition, my guess is that when we call takeOrdered on train, it goes ahead and computes the RDD, which has the repartitioning on it, and prints out the numbers. With the next call to takeOrde
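The recomputation hypothesis can be illustrated without a cluster. The sketch below is a simplified, hypothetical model in plain Scala (no Spark). It assumes that each partition is sampled roughly the way RDD.randomSplit does it: one uniform draw per element from an RNG with a fixed per-partition seed, keeping the element when the draw falls in the split's [lower, upper) range. If the partition is evaluated twice and the second evaluation yields the same elements in a different order (as a recomputed shuffle may), train and test are no longer complementary. If both splits read one materialized copy, they are. The helper `sampleRange`, the seeds, and the reordering are illustrative only.

```scala
import scala.util.Random

// Hypothetical, simplified model of per-partition range sampling:
// one uniform draw per element, from an RNG seeded once per partition.
// An element is kept when its draw lies in [lower, upper).
def sampleRange[T](partition: Seq[T], lower: Double, upper: Double, seed: Long): Seq[T] = {
  val rng = new Random(seed)
  partition.filter { _ =>
    val x = rng.nextDouble()
    x >= lower && x < upper
  }
}

val seed = 42L
val data: List[Int] = (1 to 100).toList

// Two evaluations of the "same" partition: identical elements, different order,
// standing in for a shuffle output that is recomputed for each action.
val firstEval: List[Int] = data
val secondEval: List[Int] = new Random(7).shuffle(data)

// Recomputed case: train and test each see their own evaluation.
val trainRecomputed = sampleRange(firstEval, 0.0, 0.7, seed)
val testRecomputed = sampleRange(secondEval, 0.7, 1.0, seed)
val overlap = trainRecomputed.toSet intersect testRecomputed.toSet

// Materialized case (analogous to caching before the split):
// both splits read the same sequence, so each element gets the same draw.
val materialized: Vector[Int] = secondEval.toVector
val trainStable = sampleRange(materialized, 0.0, 0.7, seed)
val testStable = sampleRange(materialized, 0.7, 1.0, seed)

println(s"recomputed: train=${trainRecomputed.size}, test=${testRecomputed.size}, overlap=${overlap.size}")
println(s"materialized: train=${trainStable.size}, test=${testStable.size}, " +
  s"overlap=${(trainStable.toSet intersect testStable.toSet).size}")
```

In Spark terms, the materialized copy corresponds to calling persist() (or cache()) on rdd2 before randomSplit, so that both splits read the same cached blocks. This is a commonly suggested workaround rather than something confirmed in this thread, and it still assumes the cached blocks are not evicted and recomputed.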

Re: Inconsistent behavior of randomSplit in YARN mode

2015-12-28 Thread Ted Yu
bq. the train and test have overlap in the numbers being outputted

Can the call to repartition explain the above? Which release of Spark are you using?

Thanks

On Sun, Dec 27, 2015 at 9:56 PM, Gaurav Kumar wrote:
> Hi,
>
> I noticed an inconsistent behavior when using rdd.randomSplit when th

Inconsistent behavior of randomSplit in YARN mode

2015-12-27 Thread Gaurav Kumar
Hi,

I noticed inconsistent behavior when using rdd.randomSplit when the source RDD is repartitioned, but only in YARN mode. It works fine in local mode, though.

*Code:*
val rdd = sc.parallelize(1 to 100)
val rdd2 = rdd.repartition(64)
rdd.partitions.size
rdd2.partitions.size
val Array(train