Hi All, I’m experiencing an issue on multiple hosts with ZooKeeper 4.6 where Apache Solr overfilled the /overseer/queue node and can no longer read from it, so I’m trying to run “rmr /overseer/queue” to get things working again. On both systems the node at fault has 200k+ child nodes.
On both systems I set -Djute.maxbuffer=5242880 in zkServer.sh on every server in the cluster and -Djute.maxbuffer=10000000 in zkCli.sh. On the first system I couldn’t get this to work until I set zkCli’s value substantially higher than zkServer’s, but I *did* get it to work and have since cleared the queue on that system.
However, I’m beating my head against a wall on our other system. I’ve applied exactly the same settings and am having no luck rmr’ing the node. I’ve tried bumping the maxbuffer settings to 2-4x as high and still no luck. Every attempt from zkCli results in "ConnectionLossException: KeeperErrorCode = ConnectionLoss for /overseer/queue".
I’m at my wits’ end here. I’ve checked everything over and over and cannot see any reason why this should not be working. The setting shows up as a correctly set JVM arg when I grep the ZooKeeper process.
Any advice from anyone is appreciated!
-- James Hardwick
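For a sense of scale, here is a rough back-of-the-envelope sketch of how large a single child listing for a node like this might be, compared with the two jute.maxbuffer values mentioned above. It is only an estimate under stated assumptions: it assumes child names of about 13 characters (for example a "qn-" prefix plus a 10-digit sequence number) and a 4-byte length prefix per serialized string. The function names are made up for illustration, and it does not explain why one system fails while the other works.

```python
# Rough sizing sketch. The child-name length and the per-string overhead
# are assumptions, not values measured on the affected clusters.
LENGTH_PREFIX_BYTES = 4  # assumed: each serialized string has a 4-byte length prefix


def estimate_children_payload(child_count, name_length=13):
    """Approximate bytes needed to return `child_count` child names in one reply."""
    # 4 bytes for the element count, then (length prefix + name bytes) per child.
    return LENGTH_PREFIX_BYTES + child_count * (LENGTH_PREFIX_BYTES + name_length)


def fits(child_count, max_buffer, name_length=13):
    """True if the estimated listing is no larger than the given buffer limit."""
    return estimate_children_payload(child_count, name_length) <= max_buffer


if __name__ == "__main__":
    # Compare a few hypothetical child counts against the two limits from the post.
    for count in (200_000, 300_000, 400_000):
        size = estimate_children_payload(count)
        print(count, size, fits(count, 5_242_880), fits(count, 10_000_000))
```

Under these assumptions, 200,000 children come to roughly 3.4 MB, which is below both limits, while around 400,000 children would exceed 5242880 but still fit under 10000000. If the second system's queue is noticeably larger than the first's, that alone could call for a higher limit, though this is a guess rather than a diagnosis.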
