Hi,

Is the following guaranteed to always provide an exact count?

    foreachRDD(foreachFunc = rdd => { rdd.count() })

The documentation says: "However, output operations (like foreachRDD) have *at-least once* semantics, that is, the transformed data may get written to an external entity more than once in the event of a worker failure."

http://spark.apache.org/docs/latest/streaming-programming-guide.html#failure-of-a-worker-node

Thanks,
Josh