Thanks; What I'm looking for is a way to see changes to the state of some
variable during map(..) phase.
I simplified the scenario in my example by making row_index() increment
"incr" by 1 but in reality, the change to "incr" can be anything.

On Wed, Jan 27, 2016 at 6:25 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> Have you looked at this method ?
>
>    * Zips this RDD with its element indices. The ordering is first based
> on the partition index
> ...
>   def zipWithIndex(): RDD[(T, Long)] = withScope {
>
> On Wed, Jan 27, 2016 at 6:03 PM, Krishna <research...@gmail.com> wrote:
>
>> Hi,
>>
>> I've a scenario where I need to maintain state that is local to a worker
>> that can change during map operation. What's the best way to handle this?
>>
>> *incr = 0*
>> *def row_index():*
>> *  global incr*
>> *  incr += 1*
>> *  return incr*
>>
>> *out_rdd = inp_rdd.map(lambda x: row_index()).collect()*
>>
>> "out_rdd" in this case only contains 1s but I would like it to have index
>> of each row in "inp_rdd".
>>
>
>

Reply via email to