Hi,

I want to split an RDD into pieces of a given percentage, for example 10%
each (that is, split the RDD into 10 pieces).

Ideally, the function would look like this:

def deterministicSplit[T](dataSet: RDD[T], nb: Int): Array[RDD[T]] = {
    /* code */
}

Here "dataSet" is an RDD sorted by its key. For example, if "nb" = 10, the
function returns an array containing the first 10% of the data, the second
10%, ..., and the tenth 10%.
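
To make the expected result concrete, here is a minimal sketch of the behaviour, not a proper solution. It assumes that `RDD.zipWithIndex` is available in the Spark version in use (it may be missing from older releases), that "first" means the RDD's existing element order, and it adds a `ClassTag` context bound that is not in the signature above.

```scala
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

// Sketch only: one filter per piece, so the indexed data is scanned nb times.
// The ClassTag bound is an assumption added so that map can rebuild RDD[T].
def deterministicSplit[T: ClassTag](dataSet: RDD[T], nb: Int): Array[RDD[T]] = {
  require(nb > 0, "nb must be positive")
  // Total number of elements; this runs a Spark job.
  val total = dataSet.count()
  // Pair every element with its position in the RDD's existing order.
  // Cached because every piece below reads it again.
  val indexed = dataSet.zipWithIndex().cache()
  Array.tabulate(nb) { i =>
    // Piece i covers positions [lower, upper). Integer arithmetic keeps the
    // boundaries exact, so the pieces are disjoint and cover everything.
    val lower = i.toLong * total / nb
    val upper = (i.toLong + 1) * total / nb
    indexed.filter { case (_, idx) => idx >= lower && idx < upper }.map(_._1)
  }
}
```

With a hypothetical SparkContext `sc`, `deterministicSplit(sc.parallelize(1 to 100), 10)` should give 10 RDDs of 10 elements each, the first one holding the elements at positions 0 to 9. When the element count is not a multiple of "nb", the piece sizes differ by at most one.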

I can already do this in an ugly way, but I would prefer to do it properly.
Any hints? Please share your ideas.

I have looked through the RDD API, but could not find what I need.

Thank you.

Hao



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/split-a-RDD-by-pencetage-tp333.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
