Greetings Zookeepers, I'm investigating possible ways for Zookeeper to run safely on top of Kubernetes clusters.
When the Zookeeper containers come online, the value for $MYID is initially derived from the Kubernetes pod name. All active pod names are guaranteed to be unique within the cluster at any given point in time.

Example values:

- zookeeper-0
- zookeeper-1
- zookeeper-2

and the formula for $MYID is ((the trailing number of the pod name) + 1):

- zookeeper-0 => $MYID=1
- zookeeper-1 => $MYID=2
- zookeeper-2 => $MYID=3

The part I'm uncertain of is the relationship between $MYID and keeping each Zookeeper data set in sync with the rest of the cluster, particularly across container restarts. A restart can lead to a Zookeeper data set being launched with a different value of $MYID than on the previous run. That is, Zookeeper may already have run on a given data set in the past, when the myid file contained a different value.

Is myid part of the mechanism used to ensure all follower members are in sync with the current leader? It seems to me that if the leader (or the followers) keep track of their peers via myid and it changes, there could be problems.

Initial testing (without much load) has gone fine, and things seem to work when launched with updated $MYID values. I've also been perusing the ZK source code and inspecting how myid is used, and nothing stood out to indicate that this will lead to future problems.

However, experience dictates that with distributed systems the devil is often in the nuanced details, so I'm hoping the experts out there may be able to shed light on the internal dependencies on the value of myid.

Specific questions:

- Is myid relied on to never change, or does it only need to be unique within the cluster at any given time?
- What are the risks of changing myid in relation to ZK data set directories across runs?

Your insights will be greatly appreciated!

Kind regards,
Jay Taylor
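
P.S. For concreteness, the pod-name-to-myid mapping described above could be sketched roughly as follows. This is only an illustration in Python, not the actual container entrypoint: the function name myid_from_pod_name is hypothetical, and how the pod name reaches the process (for example via an environment variable) is left out as an assumption.

```python
import re


def myid_from_pod_name(pod_name: str) -> int:
    """Return (trailing number of the pod name) + 1.

    Hypothetical helper illustrating the formula in this message;
    it is not part of ZooKeeper or Kubernetes.
    """
    # Capture the run of digits at the very end of the pod name.
    match = re.search(r"(\d+)$", pod_name)
    if match is None:
        raise ValueError(f"pod name has no trailing number: {pod_name!r}")
    return int(match.group(1)) + 1


if __name__ == "__main__":
    # The three example pod names from this message.
    for name in ("zookeeper-0", "zookeeper-1", "zookeeper-2"):
        print(f"{name} => $MYID={myid_from_pod_name(name)}")
```

Note that the result depends only on the pod name, not on which data directory the pod ends up mounting, which is exactly why a data set can be started under a different myid than on its previous run.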
