Thanks; What I'm looking for is a way to see changes to the state of some variable during map(..) phase. I simplified the scenario in my example by making row_index() increment "incr" by 1 but in reality, the change to "incr" can be anything.
On Wed, Jan 27, 2016 at 6:25 PM, Ted Yu <yuzhih...@gmail.com> wrote: > Have you looked at this method ? > > * Zips this RDD with its element indices. The ordering is first based > on the partition index > ... > def zipWithIndex(): RDD[(T, Long)] = withScope { > > On Wed, Jan 27, 2016 at 6:03 PM, Krishna <research...@gmail.com> wrote: > >> Hi, >> >> I've a scenario where I need to maintain state that is local to a worker >> that can change during map operation. What's the best way to handle this? >> >> *incr = 0* >> *def row_index():* >> * global incr* >> * incr += 1* >> * return incr* >> >> *out_rdd = inp_rdd.map(lambda x: row_index()).collect()* >> >> "out_rdd" in this case only contains 1s but I would like it to have index >> of each row in "inp_rdd". >> > >