>> we should measure the total time more accurately
+1 - it would be good to have a new metric that measures reconfiguration
time, leaving the existing LE time metric dedicated to measuring the
conventional FLE time. Mixing both (as of today) will provide some confusing
insights into how long the
Please see comments inline.
Thanks,
Alex
On Mon, Jul 29, 2019 at 5:29 PM Karolos Antoniadis
wrote:
> Hi ZooKeeper developers,
>
> ZooKeeper seems to be logging a "*LEADER ELECTION TOOK*" message even
> though no leader election takes place during a reconfiguration.
>
> This can be reproduced
>> Can we reduce this time by configuring "syncLimit" and "tickTime" to
>> let's say 5 seconds? Can we have a strong guarantee on this time bound?
It's not possible to guarantee the time bound, because of the FLP
impossibility result (reliable failure detection is not possible in an
asynchronous environment). Though
Thanks a lot for sharing the design, Ted. It is very helpful. Will check
what is applicable to our case and let you know in case of questions.
On Mon, Dec 10, 2018 at 23:37 Ted Dunning wrote:
> One very useful way to deal with this is the method used in MapR FS. The
> idea is that ZK should
One very useful way to deal with this is the method used in MapR FS. The
idea is that ZK should only be used rarely and short periods of two leaders
must be tolerated, but other data has to be written with absolute
consistency.
The method that we chose was to associate an epoch number with every
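The epoch approach being described can be sketched in isolation. A minimal, hypothetical illustration (names invented here, not the MapR implementation): each newly elected leader is handed a strictly increasing epoch number, and the store rejects any write stamped with an older epoch, which fences off a deposed leader that still believes it is in charge.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of epoch-based fencing: a store accepts a write
// only if the writer's epoch is at least the highest epoch seen so far,
// so a stale (partitioned or paused) leader is rejected.
class FencedStore {
    private long highestEpoch = 0;           // highest leader epoch observed
    private final Map<String, String> data = new HashMap<>();

    // Returns true if the write is accepted, false if the writer is fenced.
    synchronized boolean write(long epoch, String key, String value) {
        if (epoch < highestEpoch) {
            return false;                    // stale leader: reject the write
        }
        highestEpoch = epoch;                // remember the newest epoch
        data.put(key, value);
        return true;
    }

    synchronized String read(String key) {
        return data.get(key);
    }
}
```

Once a new leader with epoch 2 has written, any write still carrying epoch 1 is refused, so the brief two-leader window cannot corrupt the data.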
Thanks, Maciej. That sounds good. We will try playing with the parameters
and have at least a known upper limit on the inconsistency interval.
On Fri, Dec 7, 2018 at 2:11 AM Maciej Smoleński wrote:
> On Fri, Dec 7, 2018 at 3:03 AM Michael Borokhovich
> wrote:
>
> > We are planning to run
Yes, I agree, our system should be able to tolerate two leaders for a short
and bounded period of time.
Thank you for the help!
On Thu, Dec 6, 2018 at 11:09 AM Jordan Zimmerman
wrote:
> > it seems like the
> > inconsistency may be caused by the partition of the Zookeeper cluster
> > itself
>
>
Makes sense. Thanks, Ted. We will design our system to cope with the short
periods where we might have two leaders.
On Thu, Dec 6, 2018 at 11:03 PM Ted Dunning wrote:
> ZK is able to guarantee that there is only one leader for the purposes of
> updating ZK data. That is because all commits have
On Fri, Dec 7, 2018 at 3:03 AM Michael Borokhovich
wrote:
> We are planning to run Zookeeper nodes embedded with the client nodes.
> I.e., each client runs also a ZK node. So, network partition will
> disconnect a ZK node and not only the client.
> My concern is about the following statement
ZK is able to guarantee that there is only one leader for the purposes of
updating ZK data. That is because all commits have to originate with the
current quorum leader and then be acknowledged by a quorum of the current
cluster. If the leader can't get enough acks, then it has de facto lost
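The quorum-commit rule described here comes down to simple arithmetic: a proposal commits only when acknowledged by a strict majority of the current cluster. A minimal sketch (class and method names hypothetical):

```java
// Hypothetical sketch of the majority-quorum rule: a commit needs acks
// from more than half of the current cluster (leader's ack included).
class QuorumCheck {
    static boolean committed(int acks, int clusterSize) {
        return acks > clusterSize / 2;   // strict majority
    }
}
```

This is why a partitioned leader "de facto" loses leadership: on the minority side of a partition it can never gather a strict majority of acks, so none of its proposals commit.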
We are planning to run Zookeeper nodes embedded with the client nodes.
I.e., each client also runs a ZK node. So, a network partition will
disconnect a ZK node, not only the client.
My concern is about the following statement from the ZK documentation:
"Timeliness: The clients view of the system
Tweaking timeouts is tempting, as your solution might work most of the time
yet fail in certain cases (which others have pointed out). If the goal is
absolute correctness then we should avoid timeouts, which do not guarantee
correctness; they only make the problem harder to manifest. Fencing is the
> Old service leader will detect network partition max 15 seconds after it
> happened.
If the old service leader is in a very long GC it will not detect the
partition. In the face of VM pauses, etc., it's not possible to avoid having
2 leaders for a short period of time.
-JZ
Hello,
Ensuring reliability requires using consensus directly in your service or
changing the service to use a distributed log/journal (e.g. BookKeeper).
However, the following idea is simple and in many situations good enough.
If you configure the session timeout to 15 seconds - then the zookeeper client will
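As background for the session-timeout tuning suggested above: the server negotiates the client's requested session timeout within configurable bounds, which by default are 2x and 20x tickTime, so a 15-second timeout is only honored if it falls inside those bounds. A zoo.cfg sketch (the values are illustrative, not a recommendation):

```properties
tickTime=2000            # base time unit: 2 s
# Defaults when unset: minSessionTimeout = 2 * tickTime,
#                      maxSessionTimeout = 20 * tickTime
minSessionTimeout=4000   # 4 s floor on negotiated session timeouts
maxSessionTimeout=20000  # 20 s ceiling; a client asking for more gets this
```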
> it seems like the
> inconsistency may be caused by the partition of the Zookeeper cluster
> itself
Yes - there are many ways in which you can end up with 2 leaders. However, if
properly tuned and configured, it will be for a few seconds at most. During a
GC pause no work is being done anyway.
Thanks Jordan,
Yes, I will try Curator.
Also, beyond the problem described in the Tech Note, it seems like the
inconsistency may be caused by the partition of the Zookeeper cluster
itself. E.g., if a "leader" client is connected to the partitioned ZK node,
it may be notified not at the same time
It is not possible to achieve the level of consistency you're after in an
eventually consistent system such as ZooKeeper. There will always be an edge
case where two ZooKeeper clients will believe they are leaders (though for a
short period of time). In terms of how it affects Apache Curator,
tor.java#L340
it can guarantee exactly one leader at any time (via an EPHEMERAL_SEQUENTIAL
zk-node), which does not have much correlation with network partitions of the
zk ensemble itself.
I guess, haha!
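The EPHEMERAL_SEQUENTIAL recipe alluded to here makes the leader the client holding the znode with the lowest sequence suffix under the election parent. A sketch of just that selection step, without a live ensemble (znode names illustrative):

```java
import java.util.Collections;
import java.util.List;

// Sketch of the selection step of the ZooKeeper leader-election recipe:
// each candidate creates a child znode with CreateMode.EPHEMERAL_SEQUENTIAL,
// and the candidate owning the lowest sequence number is the leader.
class ElectionRecipe {
    static String electLeader(List<String> children) {
        return Collections.min(children,
            (a, b) -> Integer.compare(seq(a), seq(b)));
    }

    // Sequence counters are zero-padded 10-digit suffixes, e.g. "n_0000000042".
    private static int seq(String znode) {
        return Integer.parseInt(znode.substring(znode.length() - 10));
    }
}
```

Because the znodes are ephemeral, a crashed or disconnected leader's znode disappears when its session expires, and the next-lowest candidate takes over; during a partition, though, the old session may linger for up to the session timeout, which is exactly the brief two-leader window discussed in this thread.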
- Original Message -
From: Michael Borokhovich
To: dev@zookeeper.apache.org, maoling199210...@sina.com
Michael,
Leader election is not enough.
You must have some mechanism to fence off the partitioned leader.
If you are building a replicated state machine, Apache ZooKeeper + Apache
BookKeeper can be a good choice.
See this, just as an example:
https://github.com/ivankelly/bookkeeper-tutorial
This is
Thanks, I will check it out.
However, do you know if it gives any better guarantees?
Can it happen that we end up with 2 leaders or 0 leaders for some period of
time (for example, during network delays/partitions)?
On Wed, Dec 5, 2018 at 10:54 PM 毛蛤丝 wrote:
> suggest you use the ready-made