We are looking to use ZooKeeper (ZK) as a discovery service. The ensemble will be read from and written to by Apache Ignite's event bus, which allows multiple running processes to share data across nodes. This is all running in a Kubernetes (K8s) cluster.
Our plan was to deploy a 5-7 instance ZK ensemble accessible via a fixed service IP address. Every K8s pod that started up would be given this IP and could thus register itself while discovering the other processes on the event bus.

When we proposed this, the software architects were very concerned that network traffic between the Kubernetes pods and the ZK ensemble must be minimized. As a result, they are requesting, and in effect requiring, that we run a ZK ensemble member on every node of our Kubernetes cluster.

Given this input, we changed plans: for each Kubernetes pod that gets started, a new Observer instance (running as a sidecar container) will dynamically join the "core" ensemble, giving the primary container localhost access to ZK. This means we would be running at least one ZK ensemble member on every node of our K8s cluster, and we intend to have several hundred nodes at least.

Our concern is that ZK does not seem to have been designed to scale horizontally in this fashion. Beyond that, the frequency with which members would be joining and leaving the ensemble is unknown.

My questions are:

What is the maximum number of members that can be run within a single ZK ensemble, given that most of those members will be Observers?
What kinds of problems might this many members cause?

Thanks for your feedback,
Jay
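
P.S. To make the sidecar idea concrete, here is a rough sketch of what one Observer's zoo.cfg might look like. It is only an illustration under assumptions: the host names, server IDs, and data directory are placeholders rather than our actual setup, and a three-member core is shown only to keep the listing short. The dynamic join itself is not shown here.

```
# zoo.cfg for one sidecar Observer (sketch; every name below is a placeholder)
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181

# Marks this instance as a non-voting Observer
peerType=observer

# Voting "core" ensemble members (hypothetical host names)
server.1=zk-core-0.example:2888:3888
server.2=zk-core-1.example:2888:3888
server.3=zk-core-2.example:2888:3888

# This sidecar, tagged as an observer in the server list.
# The myid file in dataDir would need to contain 100 to match.
server.100=observer-pod.example:2888:3888:observer
```

With a config along these lines, the primary container would connect to localhost:2181, and only the Observer would talk to the core ensemble over the network.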