[
https://issues.apache.org/jira/browse/ZOOKEEPER-790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Flavio Junqueira updated ZOOKEEPER-790:
---------------------------------------
Attachment: ZOOKEEPER-790.v2.patch
Thanks for pointing this issue out, Sergei. It sounds like the previous patch
solved the issue discussed without making sure that the leader was ready to
process messages when learner handlers started to read them in. This v2 patch
does a number of things:
# It moves the call to startup() into processAck. This way we make sure that we
start up the leader as soon as we have a quorum of acks for the newleader message;
# It moves the initialization of the database in startup to a method startdata.
There are two reasons for doing it. First, it didn't sound like a good idea to
throw exceptions or catch exceptions in processAck, and they were only
necessary because of the call to startup(). Second, the method startup() in
ZooKeeperServer throws these exceptions because of loadData(), which is called
separately in Leader.lead(), so it is not necessary to call it in processAck
after hearing from a quorum;
# It waits in LearnerHandler.run() until the leader is ready before it starts the
while(true) loop. I also had to receive an ack before executing the code to
wait, otherwise the leader would never receive acks and form a quorum, thus
causing the system to halt.
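
To make the first and third points concrete, here is a minimal Java sketch of the idea. It is not the code in the patch: the class LeaderReadinessSketch and the methods handleLearner, awaitReady, isReady and startupCalls are hypothetical names used only for illustration, and the quorum rule is assumed to be a simple majority of the ensemble.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the leader; not the actual ZooKeeper Leader class. */
class LeaderReadinessSketch {
    private final int ensembleSize;
    private final Set<Long> ackers = new HashSet<>();
    private final CountDownLatch ready = new CountDownLatch(1);
    private int startupCalls = 0;

    LeaderReadinessSketch(int ensembleSize) {
        this.ensembleSize = ensembleSize;
    }

    /** Records one ack for the newleader message; starts up once a majority has acked. */
    synchronized void processAck(long serverId) {
        ackers.add(serverId); // a set, so duplicate acks from one server count once
        if (ready.getCount() > 0 && ackers.size() > ensembleSize / 2) {
            startup();
            ready.countDown(); // releases every handler blocked in awaitReady
        }
    }

    /** Placeholder for the real startup work; assumed not to throw. */
    private void startup() {
        startupCalls++;
    }

    /** Blocks until the leader is ready or the timeout expires; returns whether it is ready. */
    boolean awaitReady(long timeoutMillis) {
        try {
            return ready.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** What a handler thread would do before entering its message loop. */
    boolean handleLearner(long learnerId, long timeoutMillis) {
        // Deliver the ack first: if the handler waited before acking,
        // no quorum could ever form and every handler would block.
        processAck(learnerId);
        return awaitReady(timeoutMillis);
    }

    boolean isReady() {
        return ready.getCount() == 0;
    }

    synchronized int startupCalls() {
        return startupCalls;
    }
}
```

With an ensemble of three, the first handler to call handleLearner times out without the leader being ready, and the second one completes the quorum, triggers startup exactly once, and returns immediately.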
To get some feedback on the changes implemented in this patch, I have discussed
them with Ben. Thanks, Ben!
Sergei, I would appreciate it if you could give it a try and tell me whether
it works for you.
> Last processed zxid set prematurely while establishing leadership
> -----------------------------------------------------------------
>
> Key: ZOOKEEPER-790
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-790
> Project: Zookeeper
> Issue Type: Bug
> Components: quorum
> Affects Versions: 3.3.1
> Reporter: Flavio Junqueira
> Assignee: Flavio Junqueira
> Priority: Blocker
> Fix For: 3.3.2, 3.4.0
>
> Attachments: ZOOKEEPER-790-3.3.patch, ZOOKEEPER-790-3.3.patch,
> ZOOKEEPER-790-follower-request-NPE.log, ZOOKEEPER-790.patch,
> ZOOKEEPER-790.patch, ZOOKEEPER-790.patch, ZOOKEEPER-790.patch,
> ZOOKEEPER-790.patch, ZOOKEEPER-790.travis.log.bz2, ZOOKEEPER-790.v2.patch
>
>
> The leader code sets the last processed zxid to the first zxid of the new
> epoch before connecting to a quorum of followers (Leader.java:281), and the
> follower code throws an IOException (Follower.java:73) if the leader epoch is
> smaller. As a result, when the false leader drops leadership and becomes a
> follower, it finds a smaller epoch and kills itself.