We're running ZooKeeper with about 2 million nodes. It's working, with one specific exception: when I try to get all children on one of the main node trees, I get an IOException out of ClientCnxn ("Packet len4648067 is out of range!"). There are 150329 children under the node in question. I should also mention that I can successfully ls other nodes with similarly high child counts, but this specific node always fails.
Googling led me to see that Mahadev dealt with this last year: http://www.mail-archive.com/zookeeper-comm...@hadoop.apache.org/msg00175.html

Source diving led me to see that ClientCnxn enforces a bound based on the jute.maxbuffer setting:

    packetLen = Integer.getInteger("jute.maxbuffer", 4096 * 1024);
    ...
    if (len < 0 || len >= packetLen) {
        throw new IOException("Packet len" + len + " is out of range!");
    }

So maybe I could bump this up in config... but I'm confused when reading the documentation on jute.maxbuffer: "It specifies the maximum size of the data that can be stored in a znode."

It's true we have an extremely high node count. However, we've been careful to keep each node's data very small -- e.g., no single data entry should be longer than 256 characters. The way I'm reading the docs, the jute.maxbuffer bound applies only to the data size of individual nodes and shouldn't relate to child count. Or does it relate to child count as well?

Here is a stat on the offending node:

    cZxid = 0x10000000e
    ctime = Mon May 03 17:40:58 PDT 2010
    mZxid = 0x10000000e
    mtime = Mon May 03 17:40:58 PDT 2010
    pZxid = 0x100315064
    cversion = 150654
    dataVersion = 0
    aclVersion = 0
    ephemeralOwner = 0x0
    dataLength = 0
    numChildren = 150372

Thanks for any insights...

Aaron
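
P.S. In case it helps frame the question, here's a rough sketch of how I'd try raising the limit on the client side. The connect string and znode path are placeholders for our setup, and I'm assuming jute.maxbuffer is read once when the client connection is created, so it has to be set before the ZooKeeper handle is constructed (or passed as -Djute.maxbuffer=... on the JVM command line); I'm not sure whether the server side would need the same bump. Back-of-envelope: 4,648,067 bytes across ~150k children works out to roughly 30 bytes per child name, which is why I suspect the getChildren reply itself might be what's exceeding the 4 MB default.

    import java.util.List;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class BigChildList {
        public static void main(String[] args) throws Exception {
            // Set before the client is created, since ClientCnxn reads the property
            // once via Integer.getInteger("jute.maxbuffer", 4096 * 1024).
            // Equivalent to passing -Djute.maxbuffer=8388608 on the JVM command line.
            System.setProperty("jute.maxbuffer", Integer.toString(8 * 1024 * 1024));

            // No-op watcher; we only care about the synchronous getChildren call here.
            Watcher noop = new Watcher() {
                public void process(WatchedEvent event) { }
            };

            // Connect string and path below are placeholders, not our real ones.
            ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, noop);
            List<String> children = zk.getChildren("/path/to/offending/node", false);
            System.out.println("children: " + children.size());
            zk.close();
        }
    }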