Re: Appending an incrental value to each RDD record
You would do:

    rdd.zipWithIndex

This gives you an RDD[(Original, Long)] where the second element is the index. To get an (index, original) tuple, map that RDD to the desired shape:

    rdd.zipWithIndex.map(_.swap)

-kr, Gerard.

On Tue, Dec 16, 2014 at 4:12 PM, bethesda swearinge...@mac.com wrote:

> I think this is sort of a newbie question, but I've checked the API closely
> and don't see an obvious answer:
>
> Given an RDD, how would I create a new RDD of Tuples where the first Tuple
> value is an incremented Int, e.g. 1,2,3 ..., and the second value of the
> Tuple is the original RDD record? I'm trying to simply assign a unique ID
> to each record in my RDD. (I want to stay in RDD land, and not convert to
> a List and back to RDD, since that seems unnecessary and probably bad form.)
>
> Thanks.
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Appending-an-incrental-value-to-each-RDD-record-tp20718.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
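
For anyone wondering how the indices in the zipWithIndex answer above are laid out: they follow partition order first, then the order of records within each partition. Here is a rough plain-Python model of that numbering plus the swap step. It is only a sketch of the semantics, not Spark's implementation; the function name zip_with_index and the list-of-lists "partitions" are made up for illustration.

```python
from itertools import accumulate


def zip_with_index(partitions):
    """Hypothetical model: pair each record with a global index that follows
    partition order first, then position within the partition."""
    sizes = [len(p) for p in partitions]
    # Each partition starts at the total size of all partitions before it.
    offsets = [0] + list(accumulate(sizes))[:-1]
    return [
        [(record, offset + i) for i, record in enumerate(part)]
        for part, offset in zip(partitions, offsets)
    ]


# Three made-up partitions standing in for an RDD's partitions.
partitions = [["a", "b"], ["c"], ["d", "e"]]
zipped = zip_with_index(partitions)

# Plain-Python counterpart of .map(_.swap): put the index first.
swapped = [[(index, record) for record, index in part] for part in zipped]
print(swapped)
```

Because each partition's starting offset depends on the sizes of the partitions before it, the partition sizes have to be known before any index can be handed out.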
Re: Appending an incrental value to each RDD record
You could try using zipWithIndex (links below to API docs). For example, in Python:

    # sc is the SparkContext (predefined in the pyspark shell)
    items = ['a', 'b', 'c']
    items2 = sc.parallelize(items)
    print(items2.first())
    # pair each record with a derived value
    items3 = items2.map(lambda x: (x, x + '!'))
    print(items3.first())
    # append the index: (record, index)
    items4 = items3.zipWithIndex()
    print(items4.first())
    # swap so the index comes first: (index, record)
    items5 = items4.map(lambda x: (x[1], x[0]))
    print(items5.first())

The last print gives (0, ('a', 'a!')), where the 0 is the index. You could also use a map to shift the index by a value (e.g. if you wanted to count from 1).

Links:
http://spark.apache.org/docs/latest/api/python/index.html
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD
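
The "count from 1" variation mentioned above might look something like the following. This is a minimal sketch: the helper name to_one_based is made up, and the (record, index) pairs are built with plain Python so it runs without a cluster. With a live SparkContext the same function should plug straight into map, as noted in the trailing comment.

```python
def to_one_based(pair):
    """Turn a (record, zero_based_index) pair, the shape zipWithIndex
    produces, into (one_based_id, record)."""
    record, index = pair
    return (index + 1, record)


# Plain-Python stand-in for the (record, index) pairs zipWithIndex yields.
zipped = list(zip(['a', 'b', 'c'], range(3)))

numbered = list(map(to_one_based, zipped))
print(numbered)  # [(1, 'a'), (2, 'b'), (3, 'c')]

# With a SparkContext `sc` (assumed here, not created in this sketch):
#   sc.parallelize(['a', 'b', 'c']).zipWithIndex().map(to_one_based).collect()
```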
Re: Appending an incrental value to each RDD record
Thanks! zipWithIndex() works well. I had overlooked it because the name 'zip' is rather odd.