Awesome nugget Shawn, I also faced a similar issue a while ago while I was doing a full re-index. It would be great if tips like this were added to FAQ-type documentation on cwiki. I love the Solr forum; every day I learn something new :-)
Thanks,
Ravi Kiran Bhaskar

On Fri, Oct 2, 2015 at 1:58 AM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 10/1/2015 1:26 PM, Rallavagu wrote:
> > Solr 4.6.1 single shard with 4 nodes. Zookeeper 3.4.5 ensemble of 3.
> >
> > See the following errors in ZK and Solr; they are connected.
> >
> > When I see the following error in Zookeeper,
> >
> > unexpected error, closing socket connection and attempting reconnect
> > java.io.IOException: Packet len11823809 is out of range!
>
> This is usually caused by the overseer queue (stored in zookeeper)
> becoming extraordinarily huge, because it is being flooded with work
> entries far faster than the overseer can process them. This causes the
> znode where the queue is stored to become larger than the maximum size
> for a znode, which defaults to about 1MB. In this case (reading your
> log message that says len11823809), something in zookeeper has gotten
> to be 11MB in size, so the zookeeper client cannot read it.
>
> I think the zookeeper server code must be handling the addition of
> children to the queue znode through a code path that doesn't pay
> attention to the maximum buffer size; it just goes ahead and adds
> them, probably by simply appending data. I'm unfamiliar with how the
> ZK database works, so I'm guessing here.
>
> If I'm right about where the problem is, there are two workarounds for
> your immediate issue.
>
> 1) Delete all the entries in your overseer queue using a zookeeper
> client that lets you edit the DB directly. If you haven't changed the
> cloud structure and all your servers are working, this should be safe.
>
> 2) Set the jute.maxbuffer system property on the startup commandline
> for all ZK servers and all ZK clients (Solr instances) to a size
> that's large enough to accommodate the huge znode. In order to do the
> deletion mentioned in option 1 above, you might need to increase
> jute.maxbuffer on the servers and on the client you use for the
> deletion.
>
> These are just workarounds. Whatever caused the huge queue in the
> first place must be addressed; it is frequently a performance issue.
> If you go to the following link, you will see that jute.maxbuffer is
> considered an unsafe option:
>
> http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html#Unsafe+Options
>
> In Jira issue SOLR-7191, I wrote the following in one of my comments:
>
> "The giant queue I encountered was about 850000 entries, and resulted
> in a packet length of a little over 14 megabytes. If I divide 850000
> by 14, I know that I can have about 60000 overseer queue entries in
> one znode before jute.maxbuffer needs to be increased."
>
> https://issues.apache.org/jira/browse/SOLR-7191?focusedCommentId=14347834
>
> Thanks,
> Shawn
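
For anyone who lands on this thread later: a quick way to confirm the
diagnosis is to check numChildren on the queue znode. A stat/exists call
returns only metadata, so it should work even when the child list itself
is too large to fetch. Here is a minimal sketch using the plain
ZooKeeper Java client; the class name, connect string, and session
timeout are placeholders to adapt to your setup, and /overseer/queue is
where SolrCloud keeps the overseer queue:

    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    public class OverseerQueueDepth {
        public static void main(String[] args) throws Exception {
            // Placeholder connect string; use your own ZK ensemble.
            ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 30000, null);
            try {
                // exists() returns only the znode's metadata, so it is
                // not affected by the oversized children list.
                Stat stat = zk.exists("/overseer/queue", false);
                int count = (stat == null) ? 0 : stat.getNumChildren();
                System.out.println("overseer queue entries: " + count);
            } finally {
                zk.close();
            }
        }
    }

A count in the tens or hundreds of thousands points at the flooding
Shawn describes.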
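
For workaround 1, the same client API can do the cleanup. Another
sketch under the same assumptions; note the launch comment, because
getChildren() on the flooded znode hits the very same "Packet len ...
is out of range" error unless the client's buffer is raised first (the
client-side half of workaround 2):

    import java.util.List;
    import org.apache.zookeeper.ZooKeeper;

    public class ClearOverseerQueue {
        public static void main(String[] args) throws Exception {
            // Launch with jute.maxbuffer raised past the packet length
            // seen in the log (11823809 bytes), for example:
            //   java -Djute.maxbuffer=20000000 ClearOverseerQueue
            // The ZK servers need the property raised as well.
            ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 30000, null);
            try {
                List<String> entries = zk.getChildren("/overseer/queue", false);
                for (String entry : entries) {
                    // Version -1 deletes regardless of znode version.
                    zk.delete("/overseer/queue/" + entry, -1);
                }
                System.out.println("deleted " + entries.size() + " entries");
            } finally {
                zk.close();
            }
        }
    }

As Shawn says, this should only be done when the cloud structure hasn't
changed and all servers are working, and the root cause of the flooding
still has to be found.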
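
Finally, to turn the numbers from the SOLR-7191 comment into something
you can apply to the count from the first sketch: roughly 60000 queue
entries fit per megabyte of packet, and the ZK admin docs give the
default jute.maxbuffer as 0xfffff, just under 1MB. A hypothetical
back-of-the-envelope helper:

    public class MaxBufferEstimate {
        public static void main(String[] args) {
            // usage: java MaxBufferEstimate <numChildren>
            long entries = Long.parseLong(args[0]);
            // From the SOLR-7191 comment: ~850000 entries produced a
            // packet of a little over 14MB, i.e. ~60000 entries per MB.
            long estimatedBytes = entries * 14000000L / 850000L;
            long defaultMaxBuffer = 0xfffffL; // ZK default, just under 1MB
            System.out.println("estimated packet length: " + estimatedBytes + " bytes");
            if (estimatedBytes > defaultMaxBuffer) {
                System.out.println("exceeds default jute.maxbuffer ("
                        + defaultMaxBuffer + "); raise it before reading the queue");
            }
        }
    }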