Hi Ted,
I am using Spark 1.5.2.
Without repartition in the picture, it works exactly as it's supposed to.
With repartition, I am guessing that when we call takeOrdered on train, it goes
ahead and computes the rdd, which has repartitioning on it, and prints out
the numbers. With the next call to takeOrde
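One way to test this guess is to count the overlap directly, once with the repartitioned rdd cached and once without. The snippet below is only a sketch under assumptions that are not from this thread: the object name RandomSplitSketch, the 0.7/0.3 weights, and the seed 42 are made up for illustration, and cache() is a possible workaround rather than a confirmed fix, since cached blocks can still be evicted and recomputed.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical driver; the object name, weights and seed are illustrative assumptions.
object RandomSplitSketch {
  def main(args: Array[String]): Unit = {
    // The master is expected to come from spark-submit (e.g. --master yarn-client).
    val sc = new SparkContext(new SparkConf().setAppName("random-split-sketch"))

    // Cache the repartitioned rdd so both splits read the same materialized
    // partitions instead of re-running the shuffle for each action.
    val rdd2 = sc.parallelize(1 to 100).repartition(64).cache()
    rdd2.count() // force materialization before splitting

    val Array(train, test) = rdd2.randomSplit(Array(0.7, 0.3), seed = 42L)

    // The input values are distinct, so any common element means the splits overlap.
    val overlap = train.intersection(test).count()
    println(s"train=${train.count()} test=${test.count()} overlap=$overlap")

    sc.stop()
  }
}
```

Running it a second time with the .cache() call and the count() removed would show whether recomputing the repartitioned rdd is what produces the overlap.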
bq. the train and test have overlap in the numbers being outputted
Can the call to repartition explain the above?
Which release of Spark are you using?
Thanks
On Sun, Dec 27, 2015 at 9:56 PM, Gaurav Kumar
wrote:
Hi,
I noticed an inconsistent behavior when using rdd.randomSplit when the
source rdd is repartitioned, but only in YARN mode. It works fine in local
mode though.
*Code:*
val rdd = sc.parallelize(1 to 100)
val rdd2 = rdd.repartition(64)
rdd.partitions.size
rdd2.partitions.size
val Array(train