GitHub user stoader commented on the issue:
https://github.com/apache/spark/pull/21067
@mccheah
> But whether or not the driver should be relaunchable should be determined by the application submitter, and not necessarily done all the time. Can we make this behavior configurable?
This should be achievable by configuring the [Pod backoff failure policy](https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#pod-backoff-failure-policy) of the Job so that it runs the pod only once.
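For illustration, a minimal Job manifest along these lines would do it; the name, image, and command below are placeholders, not part of this PR:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: spark-driver-job              # hypothetical name, for illustration only
spec:
  backoffLimit: 0                     # no retries: the driver pod runs at most once
  template:
    spec:
      restartPolicy: Never            # never restart the driver container in place
      containers:
        - name: spark-driver
          image: spark-driver-image   # placeholder image
          command: ["/opt/spark/bin/spark-submit", "..."]   # placeholder command
```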
> We don't have a solid story for checkpointing streaming computation right now
We've done work on this to store checkpoints on a persistent volume, but we thought that should be a separate PR as it's not strictly linked to this change.
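For context, the rough shape of that approach is to back the checkpoint directory with a PersistentVolumeClaim mounted into the driver pod, so a relaunched driver finds the previous checkpoints at the same path. A minimal sketch only; the claim name, size, and mount path are hypothetical and not the contents of that future PR:

```yaml
# Hypothetical PVC backing the streaming checkpoint directory
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: spark-checkpoints
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
---
# In the driver pod template, mount the claim and point the
# application's checkpoint directory at the mount path:
#
#   volumes:
#     - name: checkpoints
#       persistentVolumeClaim:
#         claimName: spark-checkpoints
#   containers:
#     - ...
#       volumeMounts:
#         - name: checkpoints
#           mountPath: /checkpoints   # use this path as the checkpoint dir
```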
> you'll certainly lose all progress from batch jobs
Agreed that the batch job would be rerun from scratch. Still, I think there is value in being able to run a batch job unattended, without having to intervene on machine failure, since the batch job will be rescheduled to another node.