I've got a few thousand clients connected to a ZK cluster, and each one retrieves a
~100KB JSON blob containing some configuration. Whenever any configuration value is
updated, there's a significant network spike as every client fetches the new JSON
blob. Multiple simultaneous updates can also cause client thrashing: clients try to
push their updated JSON, and the write is rejected because some other client was
pushing a different change at the same time.
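The thrashing above is what optimistic concurrency on a single shared node looks like. Below is a toy in-memory model, not the ZooKeeper client API: `VersionedNode` and `BadVersionError` are hypothetical stand-ins for a znode whose writes are conditional on the version the writer last read (in the same spirit as passing an expected version to ZooKeeper's `setData`). It shows two clients colliding on one blob, and the same two edits succeeding when each key lives in its own node.

```python
class BadVersionError(Exception):
    """Hypothetical stand-in for a rejected conditional write (stale version)."""


class VersionedNode:
    """Toy model of one znode: a payload plus a version counter (assumption, not the real API)."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.version = 0

    def get(self) -> tuple[bytes, int]:
        # A reader gets the payload together with the version it saw.
        return self.data, self.version

    def set(self, data: bytes, expected_version: int) -> int:
        # Conditional write: only succeeds if nobody wrote since the caller's read.
        if expected_version != self.version:
            raise BadVersionError(f"expected {expected_version}, node is at {self.version}")
        self.data = data
        self.version += 1
        return self.version


# One node holding the whole blob: two clients edit different keys and collide.
blob = VersionedNode(b'{"a": 1, "b": 1}')
_, v1 = blob.get()
_, v2 = blob.get()
blob.set(b'{"a": 2, "b": 1}', v1)          # first writer wins
rejected = False
try:
    blob.set(b'{"a": 1, "b": 2}', v2)      # second writer must re-read, re-merge, retry
except BadVersionError as err:
    rejected = True
    print("rejected:", err)

# One node per key: the same two edits touch different nodes and both succeed.
per_key = {"a": VersionedNode(b"1"), "b": VersionedNode(b"1")}
_, va = per_key["a"].get()
_, vb = per_key["b"].get()
per_key["a"].set(b"2", va)
per_key["b"].set(b"2", vb)
```

With one node per value, writers only conflict when they genuinely edit the same value.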

There are a variety of ways to work around these issues, but it got me curious
about how expensive watches are to keep around. Let's say I have ~10000
configuration values and ~1000 clients. Could I create one ZK node for each
configuration value and have each client place a watch on every node (for a total
of ~10M watches)? The docs[1] weren't super clear on how watches are implemented,
so I can't tell how expensive they are.
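To make the ~10M figure concrete, here is a simplified model of a server-side watch registry. The structure is an assumption (a map from path to watching clients, plus a reverse map for cleanup when a client goes away), not ZooKeeper's actual classes, and Python memory use says nothing about the real server's per-watch cost. It does follow the programmer's guide in treating watches as one-time triggers: firing removes the watch, and the client re-registers when it re-reads the node.

```python
from collections import defaultdict


class WatchTable:
    """Hypothetical watch registry; not ZooKeeper's real implementation."""

    def __init__(self) -> None:
        self.by_path: dict[str, set[int]] = defaultdict(set)    # path -> watching client ids
        self.by_client: dict[int, set[str]] = defaultdict(set)  # client id -> watched paths

    def add(self, path: str, client: int) -> None:
        self.by_path[path].add(client)
        self.by_client[client].add(path)

    def trigger(self, path: str) -> set[int]:
        # One-shot semantics: every watch on this path fires once and is removed.
        clients = self.by_path.pop(path, set())
        for client in clients:
            self.by_client[client].discard(path)
        return clients

    def size(self) -> int:
        return sum(len(watchers) for watchers in self.by_path.values())


N_KEYS, N_CLIENTS = 100, 10                 # small demo sizes
table = WatchTable()
for client in range(N_CLIENTS):
    for key in range(N_KEYS):
        table.add(f"/config/key-{key}", client)

print(table.size())                         # 1000 registrations
fired = table.trigger("/config/key-7")
print(len(fired), table.size())             # 10 notified, 990 remain

# Each notified client re-reads the node and re-registers its watch.
for client in fired:
    table.add("/config/key-7", client)

STEADY_STATE = 10_000 * 1_000               # values x clients from the question
print(STEADY_STATE)                         # 10000000
```

Updating one value only touches the ~1000 watchers on that path; the open question is the steady-state cost of holding all 10M entries.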

Using this method would make my updates MUCH cheaper, because I'd be updating a
trivially small node and sending that tiny amount of data to the same number of
clients. What I'm not sure about is whether the steady state of so many watches is
going to overload the ZK cluster. I'm also curious whether adding observer nodes to
the cluster would be a good idea in this scenario.
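A rough back-of-the-envelope comparison of the per-update fan-out, assuming every watching client re-reads the changed node exactly once. The 100-byte size for a single per-key node is a hypothetical figure, and protocol framing, notification packets, and TCP overhead are ignored.

```python
def fanout_bytes(payload_bytes: int, clients: int) -> int:
    """Payload bytes shipped when every watching client re-reads one node once."""
    return payload_bytes * clients


CLIENTS = 1_000
BLOB_BYTES = 100 * 1_000   # ~100KB JSON blob (decimal KB)
VALUE_BYTES = 100          # hypothetical size of one per-value node (assumption)

whole_blob = fanout_bytes(BLOB_BYTES, CLIENTS)
single_key = fanout_bytes(VALUE_BYTES, CLIENTS)
print(f"whole blob: {whole_blob / 1e6:.0f} MB per update")   # whole blob: 100 MB per update
print(f"single key: {single_key / 1e3:.0f} KB per update")   # single key: 100 KB per update
print(f"ratio: {whole_blob // single_key}x")                 # ratio: 1000x
```

The saving scales with the blob-to-value size ratio, so it shrinks if individual values are large or if one logical change touches many values at once.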

Any help would be appreciated!

-Jeff

[1] http://zookeeper.apache.org/doc/r3.1.2/zookeeperProgrammers.html#ch_zkWatches