Hi! In the documentation it says:
- By default, foreachBatch provides only at-least-once write guarantees. However, you can use the batchId provided to the function as a way to deduplicate the output and get an exactly-once guarantee.

Taking the example snippet:

    streamingDF.writeStream.foreachBatch { (batchDF: DataFrame, batchId: Long) =>
      batchDF.persist()
      batchDF.write.format(...).save(...)  // location 1
      batchDF.write.format(...).save(...)  // location 2
      batchDF.unpersist()
    }

Let's assume I'm reading from Kafka. Does that mean that, by default, *batchDF* may or may not contain duplicates?

Thanks!
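
For context, this is roughly how I read the "use the batchId to deduplicate" part of the quote. It is only a minimal sketch under my own assumptions, not something taken from the documentation: the output and checkpoint paths, the `batchPath` and `writeBatch` helpers, and the use of the built-in `rate` source in place of Kafka are all hypothetical choices made to keep the example runnable locally. The idea is that each micro-batch is written to a location derived only from `batchId`, so a retried batch overwrites its earlier attempt instead of appending a second copy.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ForeachBatchIdempotentSketch {
  // Hypothetical output root; not from the documentation.
  val outputBase = "/tmp/foreach-batch-sketch/out"

  // Each micro-batch gets its own directory, derived only from batchId.
  def batchPath(base: String, batchId: Long): String =
    s"$base/batch_id=$batchId"

  // If a batch is retried with the same batchId, "overwrite" replaces the
  // earlier attempt at that path instead of appending to it.
  def writeBatch(batchDF: DataFrame, batchId: Long): Unit = {
    batchDF.write.mode("overwrite").parquet(batchPath(outputBase, batchId))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("foreachBatch-sketch")
      .master("local[2]")
      .getOrCreate()

    // Assumption: the "rate" source stands in for Kafka so no broker is needed.
    val streamingDF = spark.readStream
      .format("rate")
      .option("rowsPerSecond", "5")
      .load()

    val query = streamingDF.writeStream
      .option("checkpointLocation", "/tmp/foreach-batch-sketch/checkpoint")
      .foreachBatch(writeBatch _)
      .start()

    query.awaitTermination(10000L) // let it run for about ten seconds
    query.stop()
    spark.stop()
  }
}
```

Passing `writeBatch _` as a named method rather than an inline lambda is deliberate: some Scala 2.12 builds report an ambiguous overload for `foreachBatch` when given a lambda directly. Whether this pattern is what the documentation has in mind is part of what I'm asking.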