GitHub user foxish commented on the issue:

    https://github.com/apache/spark/pull/21067
  
    > ReadWriteOnce storage can only be attached to one node.
    
    This is well known. Using the RWO volume for fencing here would work, but
    it is not representative of all users. It breaks down as soon as
    checkpointing goes to object storage (S3), HDFS, or ReadWriteMany volumes
    such as NFS. In all of those cases there is no fencing, so correctness can
    be violated.
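To make the fencing argument concrete, here is a toy Python model. It is not Kubernetes code, and `ToyVolume` and `can_start_driver` are hypothetical names. A restarted driver may only run if it can attach its checkpoint volume. With ReadWriteOnce, a replacement driver on a second node is refused while the old driver still holds the volume. With ReadWriteMany, and likewise with S3 or HDFS, which have no attachment step at all, nothing stops both drivers. The model deliberately ignores detach timing and node-failure handling.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVolume:
    """Toy model of a persistent volume (hypothetical, not a Kubernetes API)."""
    access_mode: str  # "ReadWriteOnce" or "ReadWriteMany"
    attached_nodes: set = field(default_factory=set)

    def attach(self, node: str) -> bool:
        # RWO: the volume may be attached read-write to a single node, so
        # refuse if it is already attached to any *other* node.
        if self.access_mode == "ReadWriteOnce" and self.attached_nodes - {node}:
            return False
        # Any other mode (RWX here; S3/HDFS have no attach step at all): no check.
        self.attached_nodes.add(node)
        return True


def can_start_driver(volume: ToyVolume, node: str) -> bool:
    """A (re)started driver proceeds only if it can attach its checkpoint volume."""
    return volume.attach(node)


# Old driver is still running on node-a (e.g. a network partition);
# the replacement driver is scheduled on node-b.
rwo = ToyVolume("ReadWriteOnce")
rwx = ToyVolume("ReadWriteMany")
print(can_start_driver(rwo, "node-a"), can_start_driver(rwo, "node-b"))  # True False
print(can_start_driver(rwx, "node-a"), can_start_driver(rwx, "node-b"))  # True True
```

In the RWX case both drivers end up writing checkpoints to the same storage, which is the correctness problem described above.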
    
    For folks who need it right away, the same restart feature can be realized
    safely, and without any of this hassle, using an approach like the
    [spark-operator](https://github.com/GoogleCloudPlatform/spark-on-k8s-operator).
    So why are we trying to fit this into Spark, with caveats about how volumes
    must be used to ensure fencing? That seems more error-prone and harder to
    explain, and I can't see what we gain from it. I think a good way forward
    would be to propose to the k8s community a new option for Jobs that lets us
    get fencing from the k8s API server through deterministic names.
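A minimal sketch of what fencing through deterministic names could look like. The API server rejects a create request for a name that already exists in the namespace (HTTP 409, reason AlreadyExists); the sketch models only that behavior with an in-memory `ToyApiServer`. The proposed Jobs option does not exist, and `try_start_driver` and the `<app>-driver` naming scheme are hypothetical.

```python
class AlreadyExists(Exception):
    """Stands in for the HTTP 409 AlreadyExists response from the API server."""


class ToyApiServer:
    """In-memory stand-in for the k8s API server: names are unique per namespace."""

    def __init__(self) -> None:
        self._objects: dict = {}  # (namespace, name) -> owner

    def create(self, namespace: str, name: str, owner: str) -> None:
        key = (namespace, name)
        if key in self._objects:
            raise AlreadyExists(f"{namespace}/{name} already exists")
        self._objects[key] = owner

    def delete(self, namespace: str, name: str) -> None:
        self._objects.pop((namespace, name), None)


def try_start_driver(api: ToyApiServer, app: str, attempt: str) -> bool:
    """Hypothetical restart logic: the driver object's name depends only on the
    application, never on the attempt, so two attempts collide on the same name."""
    try:
        api.create("default", f"{app}-driver", owner=attempt)
        return True
    except AlreadyExists:
        return False


api = ToyApiServer()
print(try_start_driver(api, "spark-pi", "attempt-1"))  # True
print(try_start_driver(api, "spark-pi", "attempt-2"))  # False: fenced
# Assumed: the object is deleted only after the first driver has really stopped.
api.delete("default", "spark-pi-driver")
print(try_start_driver(api, "spark-pi", "attempt-2"))  # True
```

Name uniqueness alone only prevents two objects from coexisting; the fencing guarantee additionally assumes the old object is not removed until the previous driver has actually terminated.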

