Dear Spark community,

I wanted to bring up an issue that was filed for Spark 1.6.1 (https://issues.apache.org/jira/browse/SPARK-15544) but also exists in Spark 2.3.0 (https://issues.apache.org/jira/browse/SPARK-23530).
We have run into this in production: the Spark Master shuts down if the ZooKeeper leader on another node is shut down during our upgrade procedure. In our opinion this is a serious issue that defeats the purpose of Spark being highly available. The other software components in our stack, such as Kafka, are not affected by the ZooKeeper leader shutting down.

The problem manifests in an unusual way: it affects not the node that is being rebooted or upgraded but some other node in the cluster, so it can go unnoticed unless we actively monitor the other nodes for it during the upgrade. (By "upgrade" we mean an upgrade of our application software stack, which might include changes to base operating system packages, not a Spark version upgrade.)

Can we increase the priority of these two JIRA issues or, better still, could someone pick this issue up, please?

Thank you,
Ashwin