Hi,

The reason we are not using the architecture you proposed is our application requirements. We have the following hard requirements:

- No static hardware / servers can be used besides the hardware that is on board the vehicles.
- It must be possible to shut down any (non-majority) subset of vehicles without the system ceasing to work.

Therefore we have 1 ZK server per vehicle, and every vehicle also runs the client code, which is connected to its "local" ZK server (a minimal sketch of that connection logic is below).
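Our actual client code is application-specific, but what runs on each vehicle is in essence no more than the following sketch. To be clear, the class name, the 15-second session timeout and the latch-based wait here are illustrative placeholders, not our real code; the only real detail is that the connect string never names another vehicle, so each client reaches the ensemble through the server on its own machine (clientPort 2181 from the zoo.cfg below).

import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class LocalZkClient {
    public static void main(String[] args) throws Exception {
        final CountDownLatch connected = new CountDownLatch(1);

        // Each vehicle only ever talks to the ZK server on its own machine,
        // so the connect string is always the local address.
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 15000, new Watcher() {
            @Override
            public void process(WatchedEvent event) {
                if (event.getState() == Event.KeeperState.SyncConnected) {
                    connected.countDown();
                }
            }
        });

        // Block until the session is actually established before using it.
        connected.await();
        System.out.println("Connected, session 0x"
                + Long.toHexString(zk.getSessionId()));
        zk.close();
    }
}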
Filip

> On 25 Jun 2015, at 11:51, Guy Moshkowich <guy.moshkow...@gmail.com> wrote:
>
> Are you using ZK clients on your vehicles, or ZK servers?
> You mentioned 8 vehicles below, and I see 8 servers defined in the config.
> I would expect you to have 8 clients (running on your vehicles) communicating
> with 1 or 3 ZK servers, as that would be more than enough for 8 clients.
> Guy
>
> On Thursday, June 25, 2015, Filip Deleersnijder <fi...@motum.be> wrote:
>
>> Hi,
>>
>> Thanks for your response.
>>
>> Our application consists of 8 automatic vehicles in a warehouse setting.
>> Those vehicles need some consensus decisions, and that is what we use
>> Zookeeper for.
>> Because vehicles can come and go at random, we installed a ZK participant
>> on every vehicle. The ZK client is another piece of software that also
>> runs on the vehicles.
>>
>> Therefore:
>> - We cannot choose the number of ZK participants, because it simply
>> depends on the number of vehicles.
>> - The participants communicate over Wifi.
>> - The client runs on the same machine, so it communicates over the
>> local network.
>>
>> We are running Zookeeper version 3.4.6.
>>
>> Our zoo.cfg can be found below this e-mail.
>>
>> Thanks in advance!
>>
>> Filip
>>
>> # The number of milliseconds of each tick
>> tickTime=2000
>> # The number of ticks that the initial
>> # synchronization phase can take
>> initLimit=10
>> # The number of ticks that can pass between
>> # sending a request and getting an acknowledgement
>> syncLimit=5
>> # the directory where the snapshot is stored.
>> # do not use /tmp for storage, /tmp here is just
>> # example sakes.
>> dataDir=c:/motum/config/MASS/ZK
>> # the port at which the clients will connect
>> clientPort=2181
>>
>> server.1=172.17.35.11:2888:3888
>> server.2=172.17.35.12:2888:3888
>> server.3=172.17.35.13:2888:3888
>> server.4=172.17.35.14:2888:3888
>> server.5=172.17.35.15:2888:3888
>> server.6=172.17.35.16:2888:3888
>> server.7=172.17.35.17:2888:3888
>> server.8=172.17.35.18:2888:3888
>>
>> # The number of snapshots to retain in dataDir
>> # Purge task interval in hours
>> # Set to "0" to disable auto purge feature
>> autopurge.snapRetainCount=3
>> autopurge.purgeInterval=1
>>
>>> On 24 Jun 2015, at 18:54, Raúl Gutiérrez Segalés <r...@itevenworks.net> wrote:
>>>
>>> Hi,
>>>
>>> On 24 June 2015 at 06:05, Filip Deleersnijder <fi...@motum.be> wrote:
>>>
>>>> Hi,
>>>>
>>>> Let's start with some description of our system:
>>>>
>>>> - We are using a Zookeeper cluster with 8 participants for an application
>>>> with mobile nodes (connected over Wifi).
>>>
>>> You mean the participants talk over wifi, or the clients?
>>>
>>>> (The IPs of the nodes follow this structure: node X has IP 172.17.35.1X.)
>>>
>>> Why 8 and not an odd number of machines (i.e.:
>>> http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html#sc_zkMulitServerSetup
>>> )?
>>>
>>>> - It is not unusual for a node to be shut down or restarted.
>>>> - We haven't benchmarked the number of write operations yet, but I would
>>>> estimate fewer than 10 writes/second.
>>>
>>> What version of ZK are you using?
>>>
>>>> The problem we are having, however, is that sometimes(*) some instances
>>>> seem to have problems with leader election.
>>>> Under the header "Attachment 1" below you can find the leader election
>>>> times that were needed over 24h (from 1 node). On average it took more
>>>> than 1 minute!
>>>> I assume this is not normal behaviour? (If somebody could confirm that
>>>> in an 8-node cluster these are not normal leader election times, that
>>>> would be nice.)
>>>>
>>>> In attachment 2 I included an extract from the logging during a leader
>>>> election that took 101874 ms for 1 node (server 2).
>>>>
>>>> Any help is greatly appreciated.
>>>> If further or more specific logging is required, please ask!
>>>
>>> Do you mind sharing a copy of your config file (zoo.cfg)? Thanks!
>>>
>>> -rgs
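A side note for anyone measuring election times in a setup like this: one crude way to put numbers on the election windows from outside the cluster is to poll every server with the standard "stat" four-letter command and log when the reported mode changes. Below is a throwaway sketch of that idea, not code from this thread; the IPs come from the zoo.cfg quoted above, and the 5-second poll interval is arbitrary.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ElectionWatch {

    // The ensemble members from the zoo.cfg earlier in this thread.
    private static final String[] HOSTS = {
        "172.17.35.11", "172.17.35.12", "172.17.35.13", "172.17.35.14",
        "172.17.35.15", "172.17.35.16", "172.17.35.17", "172.17.35.18"
    };

    public static void main(String[] args) throws Exception {
        while (true) {
            StringBuilder line = new StringBuilder();
            for (String host : HOSTS) {
                line.append(host).append('=').append(mode(host)).append("  ");
            }
            System.out.println(System.currentTimeMillis() + "  " + line);
            Thread.sleep(5000); // poll interval is arbitrary
        }
    }

    // Sends the standard "stat" four-letter command and returns the Mode line.
    private static String mode(String host) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, 2181), 2000);
            s.setSoTimeout(2000);
            s.getOutputStream().write("stat".getBytes());
            s.getOutputStream().flush();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(s.getInputStream()));
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith("Mode:")) {
                    return line.substring("Mode:".length()).trim();
                }
            }
            // During an election the server answers "This ZooKeeper instance
            // is not currently serving requests" and prints no Mode line.
            return "electing?";
        } catch (Exception e) {
            return "down";
        }
    }
}

While the ensemble is healthy, each host reports leader or follower; during an election the stat response carries no Mode line, so the length of the "electing?" stretch approximates the outage as clients experience it.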