So, are you trying to "transpose" your data?
    val rdd = sc.parallelize(List(List(1,2,3,4,5), List(6,7,8,9,10))).repartition(2)

First you could pair each value with its position in its list:

    val withIndex = rdd.flatMap(_.zipWithIndex)

then group by that position, and discard the position:

    withIndex.groupBy(_._2).values.map(_.map(_._1))

Printing the RDD gives what you want (the order of the groups is arbitrary, since groupBy involves a shuffle):

    List(5, 10)
    List(1, 6)
    List(3, 8)
    List(2, 7)
    List(4, 9)

On Tue, Aug 12, 2014 at 5:42 AM, Kevin Jung <itsjb.j...@samsung.com> wrote:
> Hi
> It may be a simple question, but I cannot figure out the most efficient way.
> There is an RDD containing lists:
>
> RDD
> (
>   List(1,2,3,4,5)
>   List(6,7,8,9,10)
> )
>
> I want to transform this to
>
> RDD
> (
>   List(1,6)
>   List(2,7)
>   List(3,8)
>   List(4,9)
>   List(5,10)
> )
>
> And I want to achieve this without using the collect method, because a
> real-world RDD can have a lot of elements, so it may cause an out-of-memory
> error.
> Any ideas will be welcome.
>
> Best regards
> Kevin
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Transform-RDD-List-tp11948.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
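P.S. If it helps, the index-then-group logic above can be checked outside Spark with a plain-Python sketch (no RDDs; `rows` is a stand-in for the RDD's contents, and `itertools.groupby` plays the role of Spark's shuffle-based groupBy):

```python
from itertools import groupby

rows = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]

# Pair each value with its position in its list
# (the flatMap(_.zipWithIndex) step).
with_index = [(v, i) for row in rows for i, v in enumerate(row)]

# Group by position, then discard the position
# (the groupBy(_._2).values.map(_.map(_._1)) step).
# itertools.groupby needs its input sorted by the grouping key.
with_index.sort(key=lambda p: p[1])
transposed = [[v for v, _ in grp]
              for _, grp in groupby(with_index, key=lambda p: p[1])]

print(transposed)  # [[1, 6], [2, 7], [3, 8], [4, 9], [5, 10]]
```

Unlike Spark, the local sort here makes the output order deterministic; in the RDD version the groups come back in arbitrary order.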