Hi,

Is the following guaranteed to always provide an exact count?

foreachRDD(foreachFunc = rdd => {
   rdd.count()
})

The programming guide says: "However, output operations (like foreachRDD)
have *at-least once* semantics, that is, the transformed data may get
written to an external entity more than once in the event of a worker
failure."

http://spark.apache.org/docs/latest/streaming-programming-guide.html#failure-of-a-worker-node
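
To make the concern concrete, here is a minimal plain-Scala sketch (no Spark
involved) of how I read that sentence. Batch, RunningTotal and PerBatchCounts
are hypothetical names made up for illustration, and the batch id only stands
in for the batch time. Whether Spark really re-runs the output operation like
this is exactly what I am unsure about.

```scala
// Plain-Scala simulation, not Spark code: one batch's output operation is
// executed several times, as might happen after a worker failure.
object AtLeastOnceSketch {
  // Hypothetical batch: an id (standing in for the batch time) plus records.
  final case class Batch(id: Long, records: Seq[String])

  // Naive external sink: adds every reported count to a running total.
  final class RunningTotal {
    private var total = 0L
    def add(n: Long): Unit = total += n
    def value: Long = total
  }

  // Idempotent external sink: keeps one count per batch id, so a replayed
  // batch overwrites its earlier entry instead of being added again.
  final class PerBatchCounts {
    private val counts = scala.collection.mutable.Map.empty[Long, Long]
    def put(batchId: Long, n: Long): Unit = counts(batchId) = n
    def value: Long = counts.values.sum
  }

  // Runs the same output operation `times` times and returns
  // (naive total, idempotent total).
  def replay(times: Int): (Long, Long) = {
    val batch = Batch(1L, Seq("a", "b", "c"))
    val naive = new RunningTotal
    val idempotent = new PerBatchCounts
    for (_ <- 1 to times) {
      val n = batch.records.size.toLong // stands in for rdd.count()
      naive.add(n)
      idempotent.put(batch.id, n)
    }
    (naive.value, idempotent.value)
  }

  def main(args: Array[String]): Unit = {
    val (naiveTotal, idempotentTotal) = replay(2)
    println(s"naive total: $naiveTotal")
    println(s"idempotent total: $idempotentTotal")
  }
}
```

In this sketch the count computed inside each run is the same every time; only
the sink that accumulates across runs ends up double counting.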

Thanks,
Josh
