I'll answer in the context of Structured Streaming (the new streaming API
built on DataFrames). When reading from files, the FileSource records
which files are included in each batch inside the given
checkpointLocation. If you fail in the middle of a batch, the streaming
engine will retry that batch with the same set of files.
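To make the idea concrete, here is a toy, stdlib-only sketch of the mechanism described above: the source logs each batch's file list to the checkpoint directory *before* processing, and marks it committed only after success, so an interrupted batch is replayed with exactly the same files on restart. This is an illustration of the write-ahead pattern, not Spark's actual implementation; all class and file names here are made up.

```python
import json
from pathlib import Path


class FileSourceLog:
    """Toy stand-in for the per-batch file lists Spark's FileSource
    keeps under checkpointLocation (names here are hypothetical)."""

    def __init__(self, checkpoint_dir):
        self.dir = Path(checkpoint_dir)
        self.dir.mkdir(parents=True, exist_ok=True)

    def start_batch(self, batch_id, files):
        # Written *before* processing, so the batch is replayable.
        (self.dir / f"{batch_id}.offsets").write_text(json.dumps(files))

    def commit_batch(self, batch_id):
        # Written only after the batch fully succeeds.
        (self.dir / f"{batch_id}.commit").write_text("")

    def pending_batch(self):
        # On restart: a batch with offsets but no commit must be retried.
        for offsets in sorted(self.dir.glob("*.offsets")):
            batch_id = offsets.stem
            if not (self.dir / f"{batch_id}.commit").exists():
                return batch_id, json.loads(offsets.read_text())
        return None


# Simulate a crash between starting and committing a batch.
import tempfile

with tempfile.TemporaryDirectory() as d:
    log = FileSourceLog(d)
    log.start_batch(0, ["hdfs:///data/a.txt", "hdfs:///data/b.txt"])
    # ... crash here, before commit_batch(0) ...
    # After restart, the same files are handed back for retry:
    print(log.pending_batch())
    # -> ('0', ['hdfs:///data/a.txt', 'hdfs:///data/b.txt'])
```

Note the ordering: because the file list is persisted before any processing starts, a failure at any point leaves either a fully committed batch or a replayable one, never a half-remembered one.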
Hello,
I'm planning to use the fileStream Spark Streaming API to stream data
from HDFS. My Spark job would essentially process these files and post
the results to an external endpoint.
How does the fileStream API handle checkpointing of the files it has
processed? In other words, if my Spark job failed while processing a
batch of files, would those files be re-processed when the job restarts?