Hi Todd,

We recently started using QJM for HA, and I have read the zkfc_design document. I have a question about it. As I understand it, each ZKFC holds a connection to ZooKeeper and owns an ephemeral node there. I am worried that the network between a ZKFC and the ZooKeeper nodes may not be very stable (for example, it drops for a moment and recovers soon after), which could cause that connection to be lost. According to the design, this would trigger a failover, but I think such a failover would be unnecessary.
All of the above is just my understanding. Am I wrong, or is there a retry mechanism for when the network is briefly unstable?

Regards and best wishes!
