Hi!

In the documentation it says:


   - By default, foreachBatch provides only at-least-once write guarantees.
   However, you can use the batchId provided to the function as way to
   deduplicate the output and get an exactly-once guarantee.


Taking the example snippet from the documentation:


streamingDF.writeStream.foreachBatch { (batchDF: DataFrame, batchId: Long) =>
  batchDF.persist()
  batchDF.write.format(...).save(...)  // location 1
  batchDF.write.format(...).save(...)  // location 2
  batchDF.unpersist()
}
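
To check that I'm reading the second sentence of that quote correctly, here is a minimal, Spark-free sketch of what I understand "deduplicate the output using the batchId" to mean. IdempotentSink and its methods are hypothetical names I made up for illustration, not part of any Spark API, and a real sink would have to store the rows and the batch id atomically:

```scala
import scala.collection.mutable

// Hypothetical stand-in for an external store; not a Spark class.
final class IdempotentSink {
  private val committed = mutable.Set.empty[Long]          // batch ids already written
  private val rows      = mutable.ArrayBuffer.empty[String]

  // Writes the batch unless this batchId was already committed.
  // Returns true if the batch was written, false if it was skipped as a replay.
  def writeBatch(batchId: Long, batch: Seq[String]): Boolean =
    if (committed.contains(batchId)) false
    else {
      // Assumption: in a real store these two steps happen in one transaction.
      rows ++= batch
      committed += batchId
      true
    }

  def contents: Seq[String] = rows.toList
}

object Demo {
  def main(args: Array[String]): Unit = {
    val sink = new IdempotentSink
    sink.writeBatch(0L, Seq("a", "b"))
    sink.writeBatch(1L, Seq("c"))
    sink.writeBatch(1L, Seq("c")) // same batch delivered again after a simulated failure
    println(sink.contents.mkString(",")) // prints "a,b,c"
  }
}
```

If that is right, the at-least-once behaviour is about a whole batch being handed to the function more than once, and the batchId check only protects against that kind of replay.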


Let's assume I'm reading from Kafka. Does that mean that, by default, *batchDF*
may or may not contain duplicates?

Thanks!
