GitHub user foxish commented on the issue:
https://github.com/apache/spark/pull/21067
> ReadWriteOnce storage can only be attached to one node.
This is well known. Using the RWO volume for fencing here would work, but
it is not representative of all users. It breaks down if you include
checkpointing to object storage (S3), HDFS, or ReadWriteMany volumes like
NFS. In all of those cases, there will be a problem with correctness.
For folks who need it right away, the same restart feature can be
realized safely, and without any of this hassle, using an approach like the
[spark-operator](https://github.com/GoogleCloudPlatform/spark-on-k8s-operator).
So why are we trying to fit this into Spark with caveats around how volumes
should be used to ensure fencing? This seems more error-prone and harder to
explain, and I can't see the gain from it. One way forward is proposing to the
k8s community a new option on jobs that would allow us to get fencing from the
k8s apiserver through deterministic names. I think that would be a good way
forward.
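
To illustrate the idea of fencing through deterministic names, here is a
minimal sketch. It assumes only that the apiserver rejects a create for a name
that already exists in a namespace. `FakeApiServer`, `AlreadyExists`, and
`try_start_driver` are hypothetical names for a toy in-memory stand-in; this is
not the Kubernetes client API and not what the PR implements.

```python
class AlreadyExists(Exception):
    """Raised when a name is already taken, mimicking a 409 Conflict."""


class FakeApiServer:
    """Toy in-memory stand-in for a store with unique names (hypothetical)."""

    def __init__(self):
        self._objects = {}

    def create(self, namespace, name, spec):
        key = (namespace, name)
        if key in self._objects:
            # Names are unique per namespace, so a second create is rejected.
            raise AlreadyExists(f"{namespace}/{name} already exists")
        self._objects[key] = spec
        return key


def try_start_driver(api, app_id, attempt, owner):
    """Return True only for the caller that wins the create for this attempt."""
    # Deterministic name: every contender for the same attempt computes the
    # same string, so at most one create can succeed.
    name = f"{app_id}-driver-{attempt}"
    try:
        api.create("default", name, {"owner": owner})
        return True
    except AlreadyExists:
        return False


api = FakeApiServer()
first = try_start_driver(api, "spark-pi", 1, "controller-a")
second = try_start_driver(api, "spark-pi", 1, "controller-b")
print(first, second)
```

The point of the sketch is that the fencing decision lives in the name
collision itself, so it does not depend on where checkpoints are stored.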