Hi All,

I have a Spark DataFrame with 992 rows. When I run a map over this DataFrame, I expect the map function to be applied to all 992 rows.
Since the mapper runs on executors across the cluster, I maintain a distributed counter (backed by ZooKeeper) inside the map function to count how many rows are actually processed:

    dataframe.map(r => {
      // distributed count inside here using ZooKeeper
    })

(A slightly fuller sketch of the counter logic is in the P.S. below.)

I have found that this distributed count is not exactly 992, and the number varies from run to run. Does anybody have any idea what might be happening?

By the way, I am using Spark 1.6.1.

Thanks,
Shashank
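P.S. For concreteness, here is a minimal sketch of the kind of counter logic I mean, written against Apache Curator's DistributedAtomicLong recipe. The connection string, retry policy, and counter path below are placeholders rather than my actual values, and in my real job the ZooKeeper client is reused rather than recreated per row; I've inlined everything here just to keep the example self-contained:

    import org.apache.curator.framework.CuratorFrameworkFactory
    import org.apache.curator.framework.recipes.atomic.DistributedAtomicLong
    import org.apache.curator.retry.ExponentialBackoffRetry

    val counted = dataframe.map { r =>
      // Placeholder ZooKeeper connection; a real job would share one
      // client per executor (e.g. via mapPartitions) instead of per row.
      val client = CuratorFrameworkFactory.newClient(
        "zk-host:2181", new ExponentialBackoffRetry(1000, 3))
      client.start()
      try {
        // Atomically increment a shared counter stored at a ZooKeeper path
        val counter = new DistributedAtomicLong(
          client, "/counters/rows-seen", new ExponentialBackoffRetry(1000, 3))
        counter.increment()
      } finally {
        client.close()
      }
      r
    }
    counted.count()  // the map is lazy, so an action is needed to force it to run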