GitHub user stoader commented on the issue:
https://github.com/apache/spark/pull/21067
@mccheah
> But whether or not the driver should be relaunchable should be determined by the application submitter, and not necessarily done all the time. Can we make this behavior configurable?
This should be achievable by configuring the [Pod backoff failure policy](https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#pod-backoff-failure-policy) of the Job so that it runs the pod only once.
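For illustration, a minimal Job manifest along these lines would do it; the name, image, and command below are placeholders, not part of this PR:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: spark-driver-job              # hypothetical name, for illustration only
spec:
  backoffLimit: 0                     # no retries: the driver pod runs at most once
  template:
    spec:
      restartPolicy: Never            # never restart the driver container in place
      containers:
        - name: spark-driver
          image: spark-driver-image   # placeholder image
          command: ["/opt/spark/bin/spark-submit", "..."]   # placeholder command
```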
> We don't have a solid story for checkpointing streaming computation right now
We've done work on this to store checkpoints on a persistent volume, but we thought that should be a separate PR as it's not strictly linked to this change.
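For context, the rough shape of that approach is to back the checkpoint directory with a PersistentVolumeClaim mounted into the driver pod, so a relaunched driver finds the previous checkpoints at the same path. A minimal sketch only; the claim name, size, and mount path are hypothetical and not the contents of that future PR:

```yaml
# Hypothetical PVC backing the streaming checkpoint directory
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: spark-checkpoints
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
---
# In the driver pod template, mount the claim and point the
# application's checkpoint directory at the mount path:
#
#   volumes:
#     - name: checkpoints
#       persistentVolumeClaim:
#         claimName: spark-checkpoints
#   containers:
#     - ...
#       volumeMounts:
#         - name: checkpoints
#           mountPath: /checkpoints   # use this path as the checkpoint dir
```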
> you'll certainly lose all progress from batch jobs
Agreed that the batch job would be rerun from scratch. Still, I think there is value in being able to run a batch job unattended, without having to intervene on machine failure, since the batch job will be rescheduled to another node.