Awesome nugget, Shawn. I faced a similar issue a while ago while I was
doing a full re-index. It would be great if tips like this were added to
FAQ-style documentation on the cwiki. I love the Solr forum; every day I
learn something new :-)

Thanks

Ravi Kiran Bhaskar

On Fri, Oct 2, 2015 at 1:58 AM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 10/1/2015 1:26 PM, Rallavagu wrote:
> > Solr 4.6.1 single shard with 4 nodes. Zookeeper 3.4.5 ensemble of 3.
> >
> > I see the following errors in ZK and Solr, and they appear to be connected.
> >
> > When I see the following error in Zookeeper,
> >
> > unexpected error, closing socket connection and attempting reconnect
> > java.io.IOException: Packet len11823809 is out of range!
>
> This is usually caused by the overseer queue (stored in zookeeper)
> becoming extraordinarily huge, because it's being flooded with work
> entries far faster than the overseer can process them.  This causes the
> znode where the queue is stored to become larger than the maximum size
> for a znode, which defaults to about 1MB.  In this case (reading your
> log message that says len11823809), something in zookeeper has gotten to
> be 11MB in size, so the zookeeper client cannot read it.
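>
> If you want to gauge how big that queue actually is (a sketch -- adjust
> the host and port, and prefix the path with your chroot if you use one),
> zkCli.sh can print the child count without transferring the whole list:
>
>   bin/zkCli.sh -server localhost:2181
>   stat /overseer/queue      (check numChildren in the output)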
>
> I think the zookeeper server code must be handling the addition of
> children to the queue znode through a code path that doesn't pay
> attention to the maximum buffer size, just goes ahead and adds it,
> probably by simply appending data.  I'm unfamiliar with how the ZK
> database works, so I'm guessing here.
>
> If I'm right about where the problem is, there are two workarounds to
> your immediate issue.
>
> 1) Delete all the entries in your overseer queue using a zookeeper
> client that lets you edit the DB directly.  If you haven't changed the
> cloud structure and all your servers are working, this should be safe.
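>
> With the zkCli.sh shell that ships with ZooKeeper 3.4.x, the deletion
> would look roughly like this (a sketch -- add your chroot to the path if
> you use one, and pause indexing while you do it):
>
>   bin/zkCli.sh -server zk1:2181
>   rmr /overseer/queue
>
> Solr should recreate the queue znode on its own.  If the children list
> is already too big to read, you may have to start zkCli.sh with
> something like CLIENT_JVMFLAGS="-Djute.maxbuffer=16777216" so the
> client can fetch it, which is the same caveat noted under option 2.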
>
> 2) Set the jute.maxbuffer system property on the startup command line
> for all ZK servers and all ZK clients (Solr instances) to a size that's
> large enough to accommodate the huge znode.  In order to do the deletion
> mentioned in option 1 above, you might need to increase jute.maxbuffer
> on the servers and on the client you use for the deletion.
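>
> As a rough sketch (exact file names vary by install): on the ZK servers
> you can put the property in conf/java.env, e.g.
>
>   JVMFLAGS="$JVMFLAGS -Djute.maxbuffer=16777216"
>
> and on the Solr side, add -Djute.maxbuffer=16777216 to the JVM options
> you already pass when starting each instance.  16MB is just an example
> value; it only needs to be larger than the packet length reported in
> the error.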
>
> These are just workarounds.  Whatever caused the huge queue in the first
> place must be addressed.  It is frequently a performance issue.  If you
> go to the following link, you will see that jute.maxbuffer is considered
> an unsafe option:
>
> http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html#Unsafe+Options
>
> In Jira issue SOLR-7191, I wrote the following in one of my comments:
>
> "The giant queue I encountered was about 850000 entries, and resulted in
> a packet length of a little over 14 megabytes. If I divide 850000 by 14,
> I know that I can have about 60000 overseer queue entries in one znode
> before jute.maxbuffer needs to be increased."
>
> https://issues.apache.org/jira/browse/SOLR-7191?focusedCommentId=14347834
>
> Thanks,
> Shawn
>
>
