Github user baluchicken commented on the issue:
https://github.com/apache/spark/pull/21067
@foxish I just checked this on a Google Kubernetes Engine cluster running
Kubernetes version 1.10.4-gke.2. I created a two-node cluster and emulated a
"network partition" with iptables rules, so the node running the Spark
driver became NotReady. After a short, configurable delay the driver pod's
state changed to Unknown and the Job controller started a new Spark driver.
I then removed the iptables rules that prevented the kubelet from talking to
the master; the NotReady node became Ready again and the driver pod in the
Unknown state was terminated together with all of its executors. At no point
were two Spark drivers running in parallel, so I don't think we are
sacrificing correctness. Am I missing something?
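For reference, this is roughly how I would inspect the driver pod states
through the API while running such a test. It is only a sketch: the Python
kubernetes client, the default namespace and the spark-role=driver label are
assumptions about the setup, so adjust them to your deployment.

```python
# Sketch: list the Spark driver pods and print what the API reports for
# them, to confirm that only one driver is actually making progress.
# Assumes the Python `kubernetes` client, the `default` namespace and the
# `spark-role=driver` label on driver pods.
from kubernetes import client, config

config.load_kube_config()  # use the current kubectl context
v1 = client.CoreV1Api()

drivers = v1.list_namespaced_pod(
    namespace="default", label_selector="spark-role=driver"
)
for pod in drivers.items:
    # phase, reason and deletionTimestamp together show whether a driver
    # is healthy, Unknown (node unreachable) or already being terminated
    print(pod.metadata.name,
          pod.status.phase,
          pod.status.reason,
          pod.metadata.deletion_timestamp)
```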