Hi All,

I have a Spark DataFrame which has 992 rows in it.
When I run a map over this DataFrame, I expect the mapper to be invoked
once for each of the 992 rows.

Since the mapper runs on executors across the cluster, I maintained a
distributed count (backed by ZooKeeper) of the number of rows the mapper is
invoked on:

dataframe.map { r =>
  // increment a distributed counter here, backed by ZooKeeper
}

I have found that this distributed count inside the mapper is not exactly
992, and that the number varies across runs.
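For context, the behaviour I am seeing can be reproduced with a minimal self-contained sketch (plain Scala, no Spark; an `AtomicLong` stands in for the ZooKeeper-backed counter, and the partition sizes are invented). If the scheduler ever re-executes a task, for example after a failure or due to speculative execution, any side-effecting count inside the mapper is incremented again for the re-run rows:

```scala
import java.util.concurrent.atomic.AtomicLong

// Stand-in for the external (ZooKeeper-backed) counter: every invocation
// of the mapper increments it, even if the row was already processed in
// an earlier task attempt.
val counter = new AtomicLong(0)

def mapper(row: Int): Int = {
  counter.incrementAndGet() // side effect, like the distributed count
  row * 2
}

// 992 rows split into 4 hypothetical partitions of 248.
val partitions = (1 to 992).grouped(248).toVector

// Normal run: each partition is processed once.
partitions.foreach(p => p.foreach(mapper))

// If the scheduler re-runs one task, that partition's rows hit the
// counter a second time.
partitions.head.foreach(mapper)

println(counter.get()) // 992 + 248 = 1240, not 992
```

This is only an illustration of how a side-effecting counter can drift away from the true row count; whether task re-execution is actually what is happening in my job is exactly my question.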

Does anybody have any idea what might be happening? By the way, I am using
Spark 1.6.1.

Thanks,
Shashank
