Re: Appending an incremental value to each RDD record

2014-12-16 Thread Gerard Maas
You would do:

rdd.zipWithIndex

This gives you an RDD[(Original, Long)], where the second element of each
tuple is the index.
To get an (index, original) tuple instead, map that RDD to the desired shape:
rdd.zipWithIndex.map(_.swap)
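For context on what zipWithIndex does: it numbers records in partition order, so the first item of the first partition gets index 0 and the last item of the last partition gets the largest index. According to the Spark API docs, it has to trigger an extra Spark job when the RDD has more than one partition, because each partition's starting offset depends on how many records the earlier partitions hold. The sketch below models that idea in plain Python, using a list of lists as a stand-in for partitions. `zip_with_index`, `swap`, and the sample data are hypothetical illustrations, not Spark's actual implementation.

```python
# A plain-Python model (NOT Spark's real code) of how RDD.zipWithIndex
# numbers records: count each partition, turn the counts into starting
# offsets, then number the records inside each partition.
from itertools import accumulate
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def zip_with_index(partitions: List[List[T]]) -> List[List[Tuple[T, int]]]:
    # Per-partition sizes; in Spark, computing these is the extra job
    # that runs when the RDD has more than one partition.
    counts = [len(p) for p in partitions]
    # Exclusive prefix sum: partition k starts at the total size of
    # partitions 0..k-1.
    starts = [0] + list(accumulate(counts))[:-1]
    return [
        [(record, start + i) for i, record in enumerate(part)]
        for part, start in zip(partitions, starts)
    ]


def swap(pair: Tuple[T, int]) -> Tuple[int, T]:
    # Same reshaping as Scala's `_.swap`: (record, index) -> (index, record)
    record, index = pair
    return (index, record)


if __name__ == "__main__":
    # Hypothetical RDD contents split across three partitions.
    parts = [["a", "b"], ["c"], ["d", "e", "f"]]
    zipped = zip_with_index(parts)
    print([swap(p) for part in zipped for p in part])
    # prints [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e'), (5, 'f')]
```

Empty partitions simply contribute nothing to the offsets, so the numbering stays contiguous across them.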

-kr, Gerard.

On Tue, Dec 16, 2014 at 4:12 PM, bethesda swearinge...@mac.com wrote:

 I think this is sort of a newbie question, but I've checked the api closely
 and don't see an obvious answer:

 Given an RDD, how would I create a new RDD of Tuples where the first Tuple
 value is an incremented Int, e.g. 1, 2, 3 ..., and the second value of the
 Tuple is the original RDD record?  I'm trying to simply assign a unique ID to
 each record in my RDD.  (I want to stay in RDD land, and not convert to a List
 and back to RDD, since that seems unnecessary and probably bad form.)

 Thanks.



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Appending-an-incrental-value-to-each-RDD-record-tp20718.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




Re: Appending an incremental value to each RDD record

2014-12-16 Thread mj
You could try using zipWithIndex (links to the API docs below). For example,
in Python:

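# sc is the SparkContext (predefined in the pyspark shell)
items = ['a', 'b', 'c']
items2 = sc.parallelize(items)

print(items2.first())    # prints: a

# pair each record with a modified copy of itself
items3 = items2.map(lambda x: (x, x + '!'))
print(items3.first())    # prints: ('a', 'a!')

# zipWithIndex appends the index as the second element: (record, index)
items4 = items3.zipWithIndex()
print(items4.first())    # prints: (('a', 'a!'), 0)

# swap to (index, record)
items5 = items4.map(lambda x: (x[1], x[0]))
print(items5.first())    # prints: (0, ('a', 'a!'))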


The last print outputs (0, ('a', 'a!')), where 0 is the index. Indices start
at 0; if you want to count from 1 instead, add an offset in the same map
(e.g. lambda x: (x[1] + 1, x[0])).
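To count from 1 as suggested, the shift and the swap can happen in a single map over the output of zipWithIndex. A minimal sketch, assuming PySpark's zipWithIndex yields (record, index) pairs as in the example; `to_numbered` is a hypothetical helper name, and here it runs on plain tuples so it works without a cluster.

```python
# Hypothetical helper for PySpark: turn zipWithIndex's (record, index)
# pairs into (id, record) pairs whose ids start at `start` instead of 0.
from typing import Any, Tuple


def to_numbered(pair: Tuple[Any, int], start: int = 1) -> Tuple[int, Any]:
    record, index = pair
    return (index + start, record)


if __name__ == "__main__":
    # In PySpark (assuming items3 from the example above):
    #   numbered = items3.zipWithIndex().map(to_numbered)
    # Here the same function is applied to plain tuples instead.
    pairs = [(("a", "a!"), 0), (("b", "b!"), 1), (("c", "c!"), 2)]
    print([to_numbered(p) for p in pairs])
    # prints [(1, ('a', 'a!')), (2, ('b', 'b!')), (3, ('c', 'c!'))]
```

Because the function only looks at one pair at a time, it can be passed straight to map; for a different starting value, something like lambda p: to_numbered(p, start=100) would work the same way.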

Links
http://spark.apache.org/docs/latest/api/python/index.html
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Appending-an-incrental-value-to-each-RDD-record-tp20718p20720.html



Re: Appending an incremental value to each RDD record

2014-12-16 Thread bethesda
Thanks! zipWithIndex() works well. I had overlooked it because the name
'zip' is rather odd.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Appending-an-incrental-value-to-each-RDD-record-tp20718p20722.html