i think there is a wiki page on this, but for the short answer:
the number of znodes impacts two things: memory footprint and recovery
time. each znode has a base overhead to store its path, pointers to
the data, pointers to the acl, etc. i believe that is around 100 bytes.
you can't just divide your memory by 100 bytes + 1K (for data) though,
because the GC needs to be able to run, collect things, and maintain
some free space. if you use 3/4 of your available memory, that would
mean with 4G you have 3G for znodes at roughly 1.1K each, so you can
store about three million of them. when there is a crash and you
recover, servers may need to read this data back off the disk or over
the network. for 3G that means about a minute to read it from the
disk and perhaps a bit more to read it over the network, so you will
need to adjust your initLimit accordingly.
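
to make the arithmetic above concrete, here is a rough sketch in python.
the function names, the 50 MB/s read throughput (which is what "a minute
for 3G" works out to), and the 2x safety factor are my own assumptions
for illustration, not measured values. the only zookeeper-specific fact
used is that initLimit is counted in ticks of tickTime milliseconds
(2000 in the sample config).

```python
import math


def estimate_znode_capacity(heap_bytes, data_bytes_per_znode,
                            overhead_bytes=100, usable_fraction=0.75):
    """Rough count of znodes that fit in the heap.

    overhead_bytes is the ~100 byte per-znode guess from the text;
    usable_fraction leaves headroom for the GC.
    """
    usable = heap_bytes * usable_fraction
    return int(usable // (overhead_bytes + data_bytes_per_znode))


def estimate_recovery_seconds(state_bytes, throughput_bytes_per_sec):
    """Time to read the stored state back from disk or the network."""
    return state_bytes / throughput_bytes_per_sec


def min_init_limit(recovery_seconds, tick_time_ms=2000, safety_factor=2.0):
    """Smallest initLimit (in ticks) covering the recovery time.

    safety_factor is an assumed margin, not a ZooKeeper recommendation.
    """
    return math.ceil(recovery_seconds * safety_factor * 1000 / tick_time_ms)


if __name__ == "__main__":
    GIB = 1024 ** 3
    heap = 4 * GIB

    znodes = estimate_znode_capacity(heap, data_bytes_per_znode=1024)
    print("approx. znodes:", znodes)

    # Assumed sequential read speed of 50 MB/s.
    seconds = estimate_recovery_seconds(0.75 * heap, 50e6)
    print("approx. recovery seconds:", round(seconds))

    print("suggested initLimit (ticks):", min_init_limit(seconds))
```

with 4G and 1K of data per znode this lands a bit under three million
znodes and a recovery time of roughly a minute, which matches the
estimate above. plug in your own measured throughput before trusting
the initLimit it suggests.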
of course this is all back-of-the-envelope. i would suggest doing some
quick benchmarks to test and make sure your results are in line with
On 07/15/2010 02:56 AM, Maarten Koopmans wrote:
I am mapping a filesystem to ZooKeeper, using it for locking and for mapping a
filesystem namespace to a flat data object space (like S3). So assuming proper
nesting and small ZooKeeper nodes (< 1KB), how many nodes could a cluster with
a few GBs of memory per instance realistically hold in total?