If you just want an arbitrary unique id attached to each record in a DStream (no ordering etc.), then why not generate and attach a UUID to each record?
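
A minimal sketch of that approach, assuming a DStream[String] named lines (the name is illustrative):

    import java.util.UUID

    // Tag each record with a freshly generated random UUID. UUIDs are
    // unique with overwhelming probability, but carry no ordering.
    val withIds = lines.map(record => (UUID.randomUUID().toString, record))

The UUIDs are generated on the executors, one per record, so no coordination across partitions or batches is needed.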
On Wed, Aug 27, 2014 at 4:18 PM, Soumitra Kumar <kumar.soumi...@gmail.com> wrote:
> I see an issue here.
>
> If rdd.id is 1000 then rdd.id * 1e9.toLong would be BIG.
>
> I wish there were a DStream mapPartitionsWithIndex.
>
> On Wed, Aug 27, 2014 at 3:04 PM, Xiangrui Meng <men...@gmail.com> wrote:
>> You can use the RDD id as the seed, which is unique within the same
>> Spark context. Suppose none of the RDDs contains more than 1 billion
>> records. Then you can use
>>
>> rdd.zipWithUniqueId().mapValues(uid => rdd.id * 1e9.toLong + uid)
>>
>> Just a hack ...
>>
>> On Wed, Aug 27, 2014 at 2:59 PM, Soumitra Kumar
>> <kumar.soumi...@gmail.com> wrote:
>>> So, I guess zipWithUniqueId will be similar.
>>>
>>> Is there a way to get a unique index?
>>>
>>> On Wed, Aug 27, 2014 at 2:39 PM, Xiangrui Meng <men...@gmail.com> wrote:
>>>> No. The indices start at 0 for every RDD. -Xiangrui
>>>>
>>>> On Wed, Aug 27, 2014 at 2:37 PM, Soumitra Kumar
>>>> <kumar.soumi...@gmail.com> wrote:
>>>>> Hello,
>>>>>
>>>>> If I do:
>>>>>
>>>>> dstream.transform { rdd =>
>>>>>   rdd.zipWithIndex.map { ... }
>>>>> }
>>>>>
>>>>> is the index guaranteed to be unique across all RDDs here?
>>>>>
>>>>> Thanks,
>>>>> -Soumitra.
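
For reference, a sketch of the rdd.id-based hack applied inside DStream.transform, assuming a DStream[String] named lines and fewer than 1 billion records per RDD. One detail worth noting: pull rdd.id into a local val on the driver before using it in the closure, so the closure does not have to serialize the RDD object itself:

    // Combine the RDD id (unique per SparkContext) with the per-RDD ids
    // from zipWithUniqueId to get ids that are unique across batches.
    val withIds = lines.transform { rdd =>
      val offset = rdd.id * 1000000000L  // evaluated once, on the driver
      rdd.zipWithUniqueId().map { case (record, uid) => (offset + uid, record) }
    }

On the "BIG" concern: a Long holds values up to about 9.2e18, so rdd.id * 1e9 only overflows once rdd.id exceeds roughly 9.2 billion. For realistic RDD ids the product is large but safe.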