Github user baluchicken commented on the issue: https://github.com/apache/spark/pull/21067

@foxish I just checked on a Google Kubernetes Engine cluster running Kubernetes 1.10.4-gke.2. I created a two-node cluster and emulated a network partition with iptables rules, so that the node running the Spark driver became NotReady. After a short, configurable delay the driver pod's state changed to Unknown and the Job controller started a new Spark driver. I then removed the iptables rules that were preventing the kubelet from talking to the master, and the NotReady node became Ready again. Once the node was Ready, the driver pod in the Unknown state was terminated, along with all of its executors. In this case there are no Spark drivers running in parallel, so I don't think we are sacrificing correctness. Am I missing something?
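For reference, the partition can be emulated with iptables rules along these lines. This is a minimal sketch, not the exact commands used above: the API server address is a placeholder, and the port 6443 is an assumption (GKE masters may listen on 443 instead).

```sh
# Run on the worker node hosting the Spark driver.
# APISERVER_IP is a hypothetical placeholder for the master's address.
APISERVER_IP=203.0.113.10

# Block kubelet <-> API server traffic so the node stops reporting
# status and, after the node-monitor grace period, goes NotReady.
sudo iptables -A OUTPUT -d "$APISERVER_IP" -p tcp --dport 6443 -j DROP
sudo iptables -A INPUT  -s "$APISERVER_IP" -p tcp --sport 6443 -j DROP

# Later, delete the same rules to "heal" the partition; the node
# returns to Ready and the stale Unknown pod gets cleaned up.
sudo iptables -D OUTPUT -d "$APISERVER_IP" -p tcp --dport 6443 -j DROP
sudo iptables -D INPUT  -s "$APISERVER_IP" -p tcp --sport 6443 -j DROP
```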