James, we treat observers as not part of the ensemble's dynamic config, and they all use -1 as the server id. That's fine for us since we don't allow global sessions on observers.
If you don't need global sessions on observers, then you can probably adopt a similar solution here for now.

Thanks,
Fangmin

On Fri, Apr 10, 2020 at 2:58 PM James Arbo <ja...@tracelink.com> wrote:

> Thanks Fangmin. That's an interesting feature - allowing followers to host
> observers.
> But I assume the entire collection of servers is still considered part of
> the ensemble.
> If so, isn't the upper limit still capped at 256 - by the lowest 8 bits of
> the server id?
>
> On Fri, Apr 10, 2020 at 5:32 PM Fangmin Lv <lvfang...@gmail.com> wrote:
>
> > There is an ObserverMaster feature, contributed back in ZOOKEEPER-3140
> > <https://issues.apache.org/jira/browse/ZOOKEEPER-3140>, that could be used
> > to scale the number of observers and the traffic a single ensemble can
> > support.
> >
> > It allows followers to serve observers as well, which relieves the fan-out
> > load on the leader.
> >
> > But as Michael mentioned, there is a server id limit: the lowest 8 bits
> > are used to guarantee session id uniqueness, so the max number of servers
> > is limited to 255.
> >
> > Internally, we use local sessions only on observers, so we use a dynamic
> > observer id (-1) for all observers, which is not part of the dynamic
> > config. It helps us scale to more observers, but this may not be a good
> > solution for the community since there is a limitation here.
> >
> > Thanks,
> > Fangmin
> >
> > On Fri, Apr 10, 2020 at 1:43 PM Michael Han <h...@apache.org> wrote:
> >
> > > If you have 100s of 1000s of ZK clients then having an observer in each
> > > pod will presumably reduce traffic, as most of the fan-out traffic from
> > > server to clients is localized to each pod.
> > >
> > > Observers are not part of the quorum, and a quorum can't scale past a few
> > > servers (typically just 5 or 7). Observers can scale from 100s to 1000s
> > > (depending on whether only the leader hosts them, or followers can host
> > > them too), but the actual number depends on workload and hardware
> > > capacity. Although it's recommended that myid be in [0,255], I vaguely
> > > remember we can pass this limit; we just need to make sure the lower 8
> > > bits of the myid are always unique, as that's used to construct the
> > > session id.
> > >
> > > On Fri, Apr 10, 2020 at 12:09 PM James Arbo <ja...@tracelink.com> wrote:
> > >
> > > > That was my instinct as well. I *think* any ZK writes would require a
> > > > quorum before the transaction is committed. Getting a quorum over a
> > > > several hundred/thousand node ensemble seems like a lot of traffic.
> > > > Plus, from what I've read - though not 100% certain - it seems the
> > > > number of ZK nodes is capped at 255.
> > > >
> > > > On Fri, Apr 10, 2020 at 2:52 PM Bram Van Dam <bram.van...@intix.eu> wrote:
> > > >
> > > > > On 10/04/2020 20:13, James Arbo wrote:
> > > > > > When we proposed this, there was great concern from the software
> > > > > > architects that network traffic between the kubernetes pods and the
> > > > > > ZK ensemble must be minimized.
> > > > > >
> > > > > > This means that, at a minimum, we would be running at least 1 ZK
> > > > > > ensemble member on every node of our K8S cluster.
> > > > >
> > > > > Sounds to me like this would *increase* network traffic, not decrease
> > > > > it. Instead of having communication between the pod and ZK whenever
> > > > > needed (which likely isn't very frequently?), you'll now be having
> > > > > constant communication between the ensemble and your hundreds of
> > > > > observers in order to keep the observers in sync.
> > > > >
> > > > > Maybe I'm missing something?
> > > > >
> > > > > - Bram
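
For readers following the server id discussion in this thread, here is a minimal Python sketch of why only the lowest 8 bits of the server id matter when it is packed into a 64-bit session id. The function name `initial_session_id` and the exact bit layout (8 bits of server id on top, 40 bits of timestamp below it, the low 16 bits left for a counter) are assumptions made for illustration; this is not ZooKeeper's actual code.

```python
MASK_64 = (1 << 64) - 1  # emulate a 64-bit value with Python's unbounded ints


def initial_session_id(server_id: int, now_ms: int) -> int:
    """Hypothetical helper: build the starting session id for one server.

    Assumed layout (illustrative only):
      bits 56..63  low 8 bits of the server id
      bits 16..55  low 40 bits of a millisecond timestamp
      bits  0..15  left at zero, room for a per-server counter
    """
    # Keep the low 40 bits of the timestamp and place them at bits 16..55.
    timestamp_part = ((now_ms << 24) & MASK_64) >> 8
    # Only the low 8 bits of the server id fit into the top byte.
    server_part = (server_id & 0xFF) << 56
    return server_part | timestamp_part


now_ms = 1_586_556_000_000  # a fixed timestamp so the comparison is repeatable

# Server ids 1 and 257 share the same low 8 bits, so their ids collide.
collides = initial_session_id(1, now_ms) == initial_session_id(257, now_ms)

# Distinct low 8 bits give distinct ids for the same timestamp.
distinct = initial_session_id(1, now_ms) != initial_session_id(2, now_ms)

# A server id of -1 contributes 0xFF as its top byte under this layout.
observer_top_byte = initial_session_id(-1, now_ms) >> 56
```

Under this assumed layout, every observer that uses -1 as its server id produces the same top byte, which is consistent with the point made above that the approach only works when observers do not hold global sessions.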