Checkpoints record what has been processed for a specific query, so they
only need to be defined on the write side (writeStream is how you "start"
a query).
The DataFrame created with readStream can be used to start multiple
queries, so a single checkpoint on the read side wouldn't really make sense.
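
To illustrate why progress belongs to the query and not to the input, here
is a minimal conceptual sketch in plain Java. It does not use Spark; the
Source and Query classes are hypothetical stand-ins for the readStream
DataFrame and for a query started from writeStream, and the "checkpoint" is
assumed to be just a single offset.

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual model only: these classes are hypothetical, not Spark APIs.
final class CheckpointSketch {

    /** Stand-in for the readStream DataFrame: describes the input, holds no progress. */
    static final class Source {
        private final List<String> records;

        Source(List<String> records) {
            this.records = List.copyOf(records);
        }

        /** Returns up to max records starting at the given offset. */
        List<String> readFrom(int offset, int max) {
            int end = Math.min(records.size(), offset + max);
            return records.subList(Math.min(offset, end), end);
        }
    }

    /** Stand-in for a started query: it owns its checkpoint (here, one offset). */
    static final class Query {
        private final Source source;
        private final List<String> sink = new ArrayList<>();
        private int checkpointOffset = 0;

        Query(Source source) {
            this.source = source;
        }

        /** Processes one batch, then advances this query's own checkpoint. */
        void runBatch(int maxRecords) {
            List<String> batch = source.readFrom(checkpointOffset, maxRecords);
            sink.addAll(batch);
            checkpointOffset += batch.size();
        }

        int checkpointOffset() {
            return checkpointOffset;
        }

        List<String> sink() {
            return sink;
        }
    }

    public static void main(String[] args) {
        Source source = new Source(List.of("a", "b", "c", "d"));
        Query fast = new Query(source); // two queries share one source
        Query slow = new Query(source);
        fast.runBatch(3);
        slow.runBatch(1);
        // Each query has made different progress over the same input.
        System.out.println(fast.checkpointOffset() + " " + slow.checkpointOffset()); // prints "3 1"
    }
}
```

In Spark terms, this corresponds to giving each writeStream() its own
checkpointLocation, and not sharing one location between queries.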
On
Hi All,
I was wondering if we need to checkpoint both read and write streams when
reading from Kafka and inserting into a target store?
For example:
sparkSession.readStream().option("checkpointLocation", "hdfsPath").load()
vs
dataSet.writeStream().option("checkpointLocation", "hdfsPath")
Thank