Hey folks, I am trying to split a data set into two parts. Since I am using Spark 1.0.0, I cannot use the randomSplit method. I found this SO question: http://stackoverflow.com/questions/24864828/spark-scala-shuffle-rdd-split-rdd-into-two-random-parts-randomly
which contains this implementation in Scala for Spark 1.0.0:

def randomSplit(weights: Array[Double], seed: Long = Utils.random.nextLong): Array[RDD[T]] = {
  val sum = weights.sum
  val normalizedCumWeights = weights.map(_ / sum).scanLeft(0.0d)(_ + _)
  normalizedCumWeights.sliding(2).map { x =>
    new PartitionwiseSampledRDD[T, T](this, new BernoulliSampler[T](x(0), x(1)), seed)
  }.toArray
}

I am using Java, and I tried porting the above code, but I have not been able to figure out how to do it. Any ideas?

Using: Spark 1.0.0 and Java 1.7

Thanks,
Samarth
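
P.S. In case it helps frame an answer: the closest I have come is to sidestep the internal PartitionwiseSampledRDD / BernoulliSampler classes and do a two-way split with only the public Java API, roughly as in the sketch below. The class name RandomSplitter and its randomSplit method are just placeholders I made up, and I have not verified that this behaves like the Scala version (for one thing, every task reuses the same seed), so please treat it as a rough sketch rather than a working solution:

import java.util.Random;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;

import scala.Tuple2;

// Placeholder helper class, not part of Spark.
public final class RandomSplitter {

    // Splits rdd into two parts whose expected sizes follow weights[0] : weights[1].
    // Idea: attach a uniform score in [0,1) to each element and cut the range at the
    // normalized cumulative weight, similar in spirit to the Scala snippet above.
    @SuppressWarnings("unchecked")
    public static <T> JavaRDD<T>[] randomSplit(JavaRDD<T> rdd,
                                               double[] weights,
                                               final long seed) {
        double sum = weights[0] + weights[1];
        final double cut = weights[0] / sum;   // boundary between the two parts

        // Score every element, then cache so both filters below see the same
        // score for a given element instead of recomputing a new random value.
        JavaRDD<Tuple2<T, Double>> scored = rdd.map(
            new Function<T, Tuple2<T, Double>>() {
                private transient Random rng;
                @Override
                public Tuple2<T, Double> call(T t) {
                    if (rng == null) {
                        rng = new Random(seed);  // same seed in every task; see caveat above
                    }
                    return new Tuple2<T, Double>(t, rng.nextDouble());
                }
            }).cache();

        JavaRDD<T> first = scored
            .filter(new Function<Tuple2<T, Double>, Boolean>() {
                @Override
                public Boolean call(Tuple2<T, Double> p) { return p._2() < cut; }
            })
            .map(new Function<Tuple2<T, Double>, T>() {
                @Override
                public T call(Tuple2<T, Double> p) { return p._1(); }
            });

        JavaRDD<T> second = scored
            .filter(new Function<Tuple2<T, Double>, Boolean>() {
                @Override
                public Boolean call(Tuple2<T, Double> p) { return p._2() >= cut; }
            })
            .map(new Function<Tuple2<T, Double>, T>() {
                @Override
                public T call(Tuple2<T, Double> p) { return p._1(); }
            });

        return new JavaRDD[] { first, second };
    }
}

If there is a cleaner way to call the Scala internals from Java 1.7, or if the 1.0.0 Java API exposes something like mapPartitionsWithIndex that would let me fold the partition index into the seed, I would love to hear about it.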