Hi Dipayan,
You ought to maintain data-source consistency by minimising changes upstream.
Spark is not a Swiss Army knife :)
Anyhow, Spark Structured Streaming already handles this with the concept
of checkpointing. You can do so by implementing:
- Checkpointing
- Stateful processing
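As a rough illustration of the checkpointing suggestion above, here is a minimal Structured Streaming sketch in Scala. It assumes a Spark 3.x environment; the paths (`/data/upstream`, `/data/output`, `/tmp/checkpoints/demo`) and the Parquet format are placeholders, not anything from this thread.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

object CheckpointedStream {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("checkpointed-stream")
      .getOrCreate()

    // Treat the upstream directory as a file-source stream. Spark records
    // which files it has already processed in the checkpoint, so a restart
    // resumes from the last committed batch instead of re-reading everything.
    val input = spark.readStream
      .format("parquet")
      // File sources need an explicit schema; inferring it from a one-off
      // batch read is just a convenience for this sketch.
      .schema(spark.read.parquet("/data/upstream").schema)
      .load("/data/upstream")

    val query = input.writeStream
      .format("parquet")
      .option("path", "/data/output")
      // Offsets and progress are persisted here; this is what lets the job
      // recover after a failure rather than starting from scratch.
      .option("checkpointLocation", "/tmp/checkpoints/demo")
      .trigger(Trigger.ProcessingTime("1 minute"))
      .start()

    query.awaitTermination()
  }
}
```

The key design point is that the file source plus the checkpoint location together give exactly-once file tracking, so a refreshed upstream partition surfaces as new files in a later micro-batch rather than a mid-job failure.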
Hi Team,
One of the biggest pain points we're facing is that while Spark reads upstream
partition data, and during an Action, the upstream also gets refreshed and the
application fails with a 'File does not exist' error. It could happen that the
job has already spent a reasonable amount of time, and re-running