
So, are you trying to "transpose" your data?

val rdd = sc.parallelize(List(List(1,2,3,4,5),List(6,7,8,9,10))).repartition(2)

First you could pair each value with its position in its list:

val withIndex = rdd.flatMap(_.zipWithIndex)

Then group by that position, and discard the position:

withIndex.groupBy(_._2).values.map(_.map(_._1))

Printing the RDD gives what you want (note that groupBy makes no ordering guarantee, so the rows may come out in any order):

List(5, 10)
List(1, 6)
List(3, 8)
List(2, 7)
List(4, 9)
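If you want the rows back in index order, you can sort by the position key before dropping it (RDD.sortBy should work for that, e.g. withIndex.groupBy(_._2).sortBy(_._1).values.map(_.map(_._1))). The same zipWithIndex/groupBy logic, sketched here on plain Scala collections so it runs without a cluster:

```scala
// Transpose via zipWithIndex + groupBy, on local collections for illustration.
val data = List(List(1, 2, 3, 4, 5), List(6, 7, 8, 9, 10))

// Pair each value with its position within its list.
val withIndex = data.flatMap(_.zipWithIndex)

// Group by position, sort by that position for a deterministic order,
// then keep only the values.
val transposed = withIndex
  .groupBy(_._2)
  .toList
  .sortBy(_._1)
  .map { case (_, pairs) => pairs.map(_._1) }
// transposed == List(List(1,6), List(2,7), List(3,8), List(4,9), List(5,10))
```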

On Tue, Aug 12, 2014 at 5:42 AM, Kevin Jung <itsjb.j...@samsung.com> wrote:
> Hi
> It may be simple question, but I can not figure out the most efficient way.
> There is a RDD containing list.
>
> RDD
> (
>  List(1,2,3,4,5)
>  List(6,7,8,9,10)
> )
>
> I want to transform this to
>
> RDD
> (
> List(1,6)
> List(2,7)
> List(3,8)
> List(4,9)
> List(5,10)
> )
>
> And I want to achieve this without using collect method because realworld
> RDD can have a lot of elements then it may cause out of memory.
> Any ideas will be welcome.
>
> Best regards
> Kevin
>
>
>
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/Transform-RDD-List-tp11948.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
