Re: Ensemble fails when one node loses connectivity

Steph van Schalkwyk Thu, 01 Mar 2018 17:59:59 -0800

Does the log say anything about timing out on init?
Your initLimit is already pretty big, but then we don't know anything about
your setup.
Are you storing more than 1MB in a znode? If so, increase jute.maxbuffer (in
java.env, as -Djute.maxbuffer=xxxxxx).
I've recently run into that with Fusion 3.1.
Post more details, if you would.
Good luck.
Steph

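Steph's remark that initLimit is "already pretty big" is easier to judge in seconds. ZooKeeper counts initLimit (how long followers have to connect and sync to the leader) and syncLimit (how far a follower may lag before it is dropped) in ticks of tickTime milliseconds. The helper below is a hypothetical sketch, not part of ZooKeeper: it parses the key=value lines of the zoo.cfg Jim quotes and converts both limits to seconds.

```python
# Hypothetical helper (not part of ZooKeeper): convert the tick-based limits
# in a zoo.cfg file into wall-clock seconds.


def parse_zoo_cfg(text: str) -> dict[str, str]:
    """Parse key=value lines, skipping blank lines and '#' comments."""
    cfg: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        cfg[key.strip()] = value.strip()
    return cfg


def limits_in_seconds(cfg: dict[str, str]) -> dict[str, float]:
    """initLimit and syncLimit are counted in ticks of tickTime milliseconds."""
    tick_ms = int(cfg["tickTime"])
    return {name: int(cfg[name]) * tick_ms / 1000 for name in ("initLimit", "syncLimit")}


# The config quoted in Jim's message.
JIM_ZOO_CFG = """
clientPort=2181
dataDir=/var/opt/zookeeper/data
tickTime=2000
autopurge.purgeInterval=24
initLimit=100
syncLimit=5
server.1=172.31.86.130:2888:3888
server.2=172.31.16.234:2888:3888
server.3=172.31.73.122:2888:3888
"""

if __name__ == "__main__":
    for name, seconds in limits_in_seconds(parse_zoo_cfg(JIM_ZOO_CFG)).items():
        print(f"{name}: {seconds:g} s")
```

For Jim's values (tickTime=2000, initLimit=100, syncLimit=5) this prints 200 s for initLimit and 10 s for syncLimit.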

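Steph's first question, whether any znode holds more than 1MB, can be answered by walking the znode tree. The sketch below is a hypothetical helper, not a ZooKeeper or Solr tool. It assumes a client object exposing `exists(path)` and `get_children(path)` in the style of the kazoo Python library, and compares each znode's `dataLength` against ZooKeeper's default `jute.maxbuffer` of 0xfffff bytes (just under 1 MB).

```python
# Hypothetical diagnostic (not part of ZooKeeper or Solr): walk the znode tree
# and report znodes whose data is larger than the default jute.maxbuffer.
# `client` is duck-typed: it needs exists(path), returning None or an object
# with a dataLength attribute, and get_children(path), returning child names.
# kazoo's KazooClient provides both methods.

DEFAULT_JUTE_MAXBUFFER = 0xFFFFF  # 1048575 bytes, ZooKeeper's default limit


def child_path(parent: str, name: str) -> str:
    """Join a znode path and a child name without doubling the slash."""
    return parent.rstrip("/") + "/" + name


def find_large_znodes(client, root: str = "/", limit: int = DEFAULT_JUTE_MAXBUFFER):
    """Yield (path, dataLength) for each znode under root whose data exceeds limit.

    Only the data length is compared; real requests and responses also carry
    path and header bytes, so treat results close to the limit as suspect too.
    """
    stack = [root]
    while stack:
        path = stack.pop()
        stat = client.exists(path)  # metadata only; the data itself is not fetched
        if stat is None:  # znode was deleted while we were walking
            continue
        if stat.dataLength > limit:
            yield path, stat.dataLength
        stack.extend(child_path(path, name) for name in client.get_children(path))
```

To run it against the ensemble, one could pass a kazoo `KazooClient(hosts="172.31.86.130:2181")` after calling its `start()` method, and `stop()` it when done. Using `exists()` rather than `get()` keeps the walk from transferring the oversized data it is looking for.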

On Thu, Mar 1, 2018 at 7:43 PM, Jim Keeney <j...@fitterweb.com> wrote:

> I'm using ZooKeeper with Solr to create a cluster, and I have come across
> what seems like unexpected behavior. The cluster is set up on AWS using
> OpsWorks. I am using a 3-node ZooKeeper ensemble. The ZooKeeper config
> on all three nodes is:
>
> clientPort=2181
> dataDir=/var/opt/zookeeper/data
> tickTime=2000
> autopurge.purgeInterval=24
> initLimit=100
> syncLimit=5
> server.1=172.31.86.130:2888:3888
> server.2=172.31.16.234:2888:3888
> server.3=172.31.73.122:2888:3888
>
>
> Here is the issue:
>
> If one node in the ensemble fails or is shut down, the ensemble carries on.
> However, when the node is restarted, its attempts to connect to the other
> members of the cluster are rejected. The only way that I have found to
> restore the ensemble is to restart all of the nodes within a short time
> span of each other.
>
> If I do that, they are able to discover each other, carry out a proper
> leader election, and restore order.
>
> Once they are restored, everything is fine, but if one of the nodes goes
> down, we are faced with the same problem.
>
> How do I ensure that if a node goes down, it can restart and rejoin the
> ensemble without having to manually restart all the other nodes?
>
> Any help appreciated.
>
> Thanks.
>
> Jim K.
>
>
>
>
> --
> Jim Keeney
> President, FitterWeb
> E: j...@fitterweb.com
> M: 703-568-5887
>
> *FitterWeb Consulting*
> *Are you lean and agile enough? *
>
