We're running ZooKeeper with about 2 million nodes. It's working, with one specific exception: when I try to get all children on one of the main node trees, I get an IOException out of ClientCnxn ("Packet len4648067 is out of range!"). There are 150329 children under the node in question. I should also mention that I can successfully ls other nodes with similarly high child counts, but this specific node always fails.
Googling led me to see that Mahadev dealt with this last year: http://www.mail-archive.com/zookeeper-comm...@hadoop.apache.org/msg00175.html

Source diving led me to see that ClientCnxn enforces a bound based on the jute.maxbuffer setting:

    packetLen = Integer.getInteger("jute.maxbuffer", 4096 * 1024);
    ...
    if (len < 0 || len >= packetLen) {
        throw new IOException("Packet len" + len + " is out of range!");
    }

So maybe I could bump this up in config... but I'm confused when reading the documentation on jute.maxbuffer: "It specifies the maximum size of the data that can be stored in a znode."

It's true we have an extremely high node count. However, we've been careful to keep each node's data very small -- e.g., no single data entry should be longer than 256 characters. The way I'm reading the docs, the jute.maxbuffer bound applies only to the data size of individual nodes and shouldn't relate to child count. Or does it relate to child count as well?

Here is a stat on the offending node:

    cZxid = 0x10000000e
    ctime = Mon May 03 17:40:58 PDT 2010
    mZxid = 0x10000000e
    mtime = Mon May 03 17:40:58 PDT 2010
    pZxid = 0x100315064
    cversion = 150654
    dataVersion = 0
    aclVersion = 0
    ephemeralOwner = 0x0
    dataLength = 0
    numChildren = 150372

Thanks for any insights...

Aaron
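
P.S. In case it helps frame the question, here's a rough sketch of how I'd try raising the limit on the client side. The connect string and znode path are placeholders for our setup, and I'm assuming jute.maxbuffer is read once when the client connection is created, so it has to be set before the ZooKeeper handle is constructed (or passed as -Djute.maxbuffer=... on the JVM command line); I'm not sure whether the server side would need the same bump. Back-of-envelope: 4,648,067 bytes across ~150k children works out to roughly 30 bytes per child name, which is why I suspect the getChildren reply itself might be what's exceeding the 4 MB default.

    import java.util.List;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class BigChildList {
        public static void main(String[] args) throws Exception {
            // Set before the client is created, since ClientCnxn reads the property
            // once via Integer.getInteger("jute.maxbuffer", 4096 * 1024).
            // Equivalent to passing -Djute.maxbuffer=8388608 on the JVM command line.
            System.setProperty("jute.maxbuffer", Integer.toString(8 * 1024 * 1024));

            // No-op watcher; we only care about the synchronous getChildren call here.
            Watcher noop = new Watcher() {
                public void process(WatchedEvent event) { }
            };

            // Connect string and path below are placeholders, not our real ones.
            ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, noop);
            List<String> children = zk.getChildren("/path/to/offending/node", false);
            System.out.println("children: " + children.size());
            zk.close();
        }
    }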