Hi,
I want to split an RDD by a certain percentage, e.g. 10% (i.e. split the RDD
into 10 pieces).
Ideally, the function would look like this:
def deterministicSplit[T](dataSet: RDD[T], nb: Int): Array[RDD[T]] = {
/* code */
}
Here "dataSet" is an RDD sorted by its key. For example, if "nb" = 10,
the function returns an array containing the first 10% of the data, the
second 10%, ..., and the tenth 10%.
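To make the intended behaviour concrete, here is a minimal sketch of what such a function might look like. It rests on a few assumptions: that RDD.zipWithIndex is available in the Spark version in use, that the input is already globally sorted (e.g. via sortByKey), and that a ClassTag context bound may be added to the signature above. It is an illustration of the expected result, not a tuned implementation.

```scala
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

// Sketch only: splits an already-sorted RDD into `nb` contiguous slices of
// (nearly) equal size, based on each element's position in the RDD.
// Assumption: the ClassTag bound is added here; the original signature has none.
def deterministicSplit[T: ClassTag](dataSet: RDD[T], nb: Int): Array[RDD[T]] = {
  require(nb > 0, "nb must be positive")

  // One pass over the data to learn its size.
  val total = dataSet.count()

  // Pair every element with its position; cached because it is reused nb times.
  // The caller may want to unpersist it once the slices are no longer needed.
  val indexed = dataSet.zipWithIndex().cache()

  (0 until nb).map { i =>
    val lo = i * total / nb          // inclusive lower bound of slice i
    val hi = (i + 1) * total / nb    // exclusive upper bound of slice i
    indexed.filter { case (_, idx) => idx >= lo && idx < hi }.map(_._1)
  }.toArray
}
```

With nb = 10 and an RDD of 100 sorted elements, slice 0 would hold positions 0 to 9, slice 1 positions 10 to 19, and so on. Each slice is a separate filter over the whole indexed RDD, so the cost grows with nb.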
I can already do this in an ugly way, but I would prefer to do it properly.
Any hints? Please share your ideas.
I have looked through the RDD API but could not find what I need.
Thank you.
Hao