Hi All,

I have a Spark DataFrame with 992 rows. When I run a map over this DataFrame, I expect the map function to be applied to all 992 rows.
Since the mapper runs on executors across the cluster, I maintain a distributed counter (backed by ZooKeeper) inside the map function to count how many rows are actually processed:

    dataframe.map(r => {
      // distributed count inside here using ZooKeeper
    })

(A slightly fuller sketch of the counter logic is in the P.S. below.)

I have found that this distributed count is not exactly 992, and the number varies from run to run. Does anybody have any idea what might be happening?

By the way, I am using Spark 1.6.1.

Thanks,
Shashank
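P.S. For concreteness, here is a minimal sketch of the kind of counter logic I mean, written against Apache Curator's DistributedAtomicLong recipe. The connection string, retry policy, and counter path below are placeholders rather than my actual values, and in my real job the ZooKeeper client is reused rather than recreated per row; I've inlined everything here just to keep the example self-contained:

    import org.apache.curator.framework.CuratorFrameworkFactory
    import org.apache.curator.framework.recipes.atomic.DistributedAtomicLong
    import org.apache.curator.retry.ExponentialBackoffRetry

    val counted = dataframe.map { r =>
      // Placeholder ZooKeeper connection; a real job would share one
      // client per executor (e.g. via mapPartitions) instead of per row.
      val client = CuratorFrameworkFactory.newClient(
        "zk-host:2181", new ExponentialBackoffRetry(1000, 3))
      client.start()
      try {
        // Atomically increment a shared counter stored at a ZooKeeper path
        val counter = new DistributedAtomicLong(
          client, "/counters/rows-seen", new ExponentialBackoffRetry(1000, 3))
        counter.increment()
      } finally {
        client.close()
      }
      r
    }
    counted.count()  // the map is lazy, so an action is needed to force it to run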