[jira] Commented: (ZOOKEEPER-869) Support for election of leader with arbitrary zxid
[ https://issues.apache.org/jira/browse/ZOOKEEPER-869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910531#action_12910531 ]

Diogo commented on ZOOKEEPER-869:

While trying to implement this, I found an interesting issue. Say we have an ensemble with 3 nodes. Say we start all nodes together and all have their state synchronized, meaning that all replicas return the same value from ZKDatabase().getLastLoggedZxid(). It seems that the leader will send a snapshot to all followers, although that is not necessary: they need no state transfer.

The leader (quorum/Leader.java:283) reads its lastLoggedZxid(), adds a new epoch to it, and stores the result as lastProposed. In LearnerHandler.java:308 the thread decides whether the replica needs an empty DIFF or a SNAP (I am assuming the system state described above). But startForwarding will return lastProposed, which is necessarily larger than any other zxid, so SNAP will be selected and sent.

Here is part of the output from a run in which 2 replicas have the same state stored and one is behind.
2010-09-17 12:11:27,296 [myid:3] - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2183:files...@82] - Reading snapshot /tmp/zoo3/version-2/snapshot.7
2010-09-17 12:11:27,298 [myid:3] - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2183:files...@82] - Reading snapshot /tmp/zoo3/version-2/snapshot.7
2010-09-17 12:11:27,301 [myid:3] - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2183:filetxnsnap...@208] - Snapshotting: 7
2010-09-17 12:11:27,303 [myid:3] - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2183:lea...@285] - lastLoggedZxid = 7 lastProposed = 8 -- added line just after leader sets its lastProposed
2010-09-17 12:11:27,309 [myid:3] - INFO [LearnerHandler-/127.0.0.1:48318:learnerhand...@247] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.quorumpeer$quorumser...@12d3205
2010-09-17 12:11:27,310 [myid:3] - WARN [LearnerHandler-/127.0.0.1:48318:learnerhand...@326] - Sending snapshot last zxid of peer is 0x7 zxid of leader is 0x8 -- snapshot being sent!
2010-09-17 12:11:27,312 [myid:3] - WARN [LearnerHandler-/127.0.0.1:48318:lea...@474] - Commiting zxid 0x8 from /127.0.0.1:2890 not first!
2010-09-17 12:11:27,313 [myid:3] - WARN [LearnerHandler-/127.0.0.1:48318:lea...@476] - First is 0
2010-09-17 12:11:27,313 [myid:3] - INFO [LearnerHandler-/127.0.0.1:48318:lea...@500] - Have quorum of supporters; starting up and setting last processed zxid: 34359738368
2010-09-17 12:11:28,290 [myid:3] - INFO [LearnerHandler-/127.0.0.1:48319:learnerhand...@247] - Follower sid: 2 : info : org.apache.zookeeper.server.quorum.quorumpeer$quorumser...@1319c
2010-09-17 12:11:28,291 [myid:3] - WARN [LearnerHandler-/127.0.0.1:48319:learnerhand...@326] - Sending snapshot last zxid of peer is 0x6 zxid of leader is 0x8 -- this follower needs the snapshot.

Am I understanding something wrong?
Support for election of leader with arbitrary zxid
--------------------------------------------------

Key: ZOOKEEPER-869
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-869
Project: Zookeeper
Issue Type: New Feature
Reporter: Diogo
Priority: Minor

Currently, the implemented leader election algorithm guarantees that the leader has the maximum zxid of the ensemble. The state synchronization after the election was built on this assumption. However, other leader election algorithms might elect leaders with arbitrary zxids. To support other leader election algorithms, the state synchronization should allow the leader to have an arbitrary zxid.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
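A note for readers following the log in the comment above: a zxid is a 64-bit value whose high 32 bits hold the epoch and whose low 32 bits hold a counter within that epoch. Under that layout, the value 34359738368 reported as the last processed zxid decodes to epoch 8, counter 0. The sketch below only illustrates that arithmetic and why any zxid from an older epoch compares as smaller than the first zxid of a newer epoch. The class and method names are hypothetical and are not ZooKeeper's API, and the (7, 42) value is made up for the example.

```java
/** Illustrative helper for the epoch/counter layout of a zxid (hypothetical, not ZooKeeper's API). */
final class ZxidSketch {
    private ZxidSketch() {}

    /** High 32 bits: the epoch. */
    static long epoch(long zxid) {
        return zxid >>> 32;
    }

    /** Low 32 bits: the counter within the epoch. */
    static long counter(long zxid) {
        return zxid & 0xFFFFFFFFL;
    }

    /** Builds a zxid from an epoch and a counter. */
    static long make(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xFFFFFFFFL);
    }

    /** First zxid of the epoch that follows the one containing lastLogged. */
    static long firstZxidOfNextEpoch(long lastLogged) {
        return make(epoch(lastLogged) + 1, 0);
    }

    public static void main(String[] args) {
        long logged = 34359738368L; // value taken from the log output above
        // prints "epoch=8 counter=0"
        System.out.println("epoch=" + epoch(logged) + " counter=" + counter(logged));

        // A follower that holds exactly the leader's last logged zxid still
        // compares as smaller than the leader's first zxid in the new epoch.
        long lastLogged = make(7, 42); // made-up value for illustration
        long lastProposed = firstZxidOfNextEpoch(lastLogged);
        System.out.println(lastLogged < lastProposed); // prints "true"
    }
}
```

This is why a plain numeric comparison against a zxid taken from the new epoch cannot tell an up-to-date follower from a stale one.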
[jira] Commented: (ZOOKEEPER-869) Support for election of leader with arbitrary zxid
[ https://issues.apache.org/jira/browse/ZOOKEEPER-869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910643#action_12910643 ]

Benjamin Reed commented on ZOOKEEPER-869:

this is a good observation diogo, but i think you may be characterizing it improperly. the problem is that when we establish leadership we increment the epoch and propose a new leader, so the zxids of all other processes will be much lower than the leader's. when a follower connects we figure out how far behind the follower is by comparing the lastProposed zxids and taking the difference. we should really be using the recent history to do the comparison.

as a side note, if we were to choose not to take the maximum zxid during recovery, we need to make sure that we still cover all committed messages.
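One possible reading of "using the recent history to do the comparison" is that the leader would check a follower's last zxid against the range of transactions it still retains, rather than against lastProposed. The following is a minimal sketch under that assumption. SyncMode, chooseSyncMode, minCommitted, and maxCommitted are hypothetical names chosen for the example, and the fallback to SNAP for a follower that is ahead of the leader is a simplification of this sketch, not something stated in the thread.

```java
/** Hypothetical sketch: pick a sync mode from the leader's recent history. */
final class SyncChoiceSketch {
    private SyncChoiceSketch() {}

    enum SyncMode { DIFF, SNAP }

    /**
     * @param peerLastZxid last zxid reported by the follower
     * @param minCommitted oldest zxid still held in the leader's recent history
     * @param maxCommitted newest zxid held in the leader's recent history
     */
    static SyncMode chooseSyncMode(long peerLastZxid, long minCommitted, long maxCommitted) {
        // The follower's position falls inside the retained window (or it is
        // exactly up to date): the leader can replay what is missing, which
        // may be nothing at all.
        if (peerLastZxid >= minCommitted && peerLastZxid <= maxCommitted) {
            return SyncMode.DIFF;
        }
        // Too far behind, or holding transactions the leader does not have:
        // this sketch simply falls back to a full snapshot.
        return SyncMode.SNAP;
    }
}
```

With a retained window of zxids 5 through 7, a follower at zxid 7 would get an empty DIFF, while a follower at zxid 2 would get a SNAP.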
[jira] Created: (ZOOKEEPER-874) FileTxnSnapLog.restore does not call listener
FileTxnSnapLog.restore does not call listener
---------------------------------------------

Key: ZOOKEEPER-874
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-874
Project: Zookeeper
Issue Type: Bug
Components: leaderElection
Reporter: Diogo
Priority: Trivial

FileTxnSnapLog.restore() does not call the listener passed as a parameter. As a result, the commitLogs list is empty. When a follower connects to the leader, the leader is forced to send a snapshot to the follower instead of a couple of requests and commits.
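To illustrate the behavior this report expects, here is a small self-contained sketch of a replay loop that notifies a listener for every transaction it loads, so that the caller can rebuild its list of recent commits. It is not the FileTxnSnapLog code and not the attached patch: TxnListener, restore, and the use of bare zxids in place of full transaction records are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a log replay that reports each transaction to a listener. */
final class RestoreSketch {
    private RestoreSketch() {}

    /** Callback invoked for every transaction replayed from the log (hypothetical interface). */
    interface TxnListener {
        void onTxnLoaded(long zxid);
    }

    /**
     * Replays the given zxids in order, notifying the listener for each one,
     * and returns the highest zxid seen (or -1 if the log is empty).
     */
    static long restore(long[] txnLog, TxnListener listener) {
        long highest = -1L;
        for (long zxid : txnLog) {
            // A real implementation would apply the transaction to the
            // in-memory data tree here.
            listener.onTxnLoaded(zxid); // the step the report says is skipped
            highest = Math.max(highest, zxid);
        }
        return highest;
    }

    public static void main(String[] args) {
        List<Long> committedLog = new ArrayList<>();
        long last = restore(new long[] {5L, 6L, 7L}, zxid -> committedLog.add(zxid));
        System.out.println(last + " " + committedLog); // prints "7 [5, 6, 7]"
    }
}
```

If the listener call is left out, the returned zxid is still correct but the caller's list stays empty, which matches the symptom described in the report.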
[jira] Updated: (ZOOKEEPER-874) FileTxnSnapLog.restore does not call listener
[ https://issues.apache.org/jira/browse/ZOOKEEPER-874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Diogo updated ZOOKEEPER-874:

Status: Patch Available (was: Open)
Affects Version/s: 3.3.1
Fix Version/s: 3.4.0
[jira] Updated: (ZOOKEEPER-874) FileTxnSnapLog.restore does not call listener
[ https://issues.apache.org/jira/browse/ZOOKEEPER-874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Diogo updated ZOOKEEPER-874:

Attachment: commitlog-listener.patch
build.xml from 3.3.1 distribution has version=3.3.2-dev
Ended up chasing our tail for a while today because the version in the ant build.xml shipped with the 3.3.1 distribution on the website is actually 3.3.2-dev. Any particular reason for that? The jar included in the distribution is obviously labeled correctly, and the version number in its manifest is 3.3.1, so it appears that the build.xml was modified and bundled up for distribution after the binary was built. Not sure if it's worth fixing in the distribution, but thought I'd at least mention it.

-Dave Wright
Re: build.xml from 3.3.1 distribution has version=3.3.2-dev
Hi Dave, while it may appear that way, it's not the case. When building a release we run the following command:

    ant -Dversion=3.3.1 ...

which overrides any setting in build.xml. This is documented in our release process (we pretty much follow what hadoop does, although some of the maven repo details are a bit different):

http://wiki.apache.org/hadoop/ZooKeeper/HowToRelease

Patrick

On Fri, Sep 17, 2010 at 8:55 AM, Dave Wright wrig...@gmail.com wrote:

> Ended up chasing our tail for a while today because the ant build.xml in the 3.3.1 distribution on the website has version is actually 3.3.2-dev. Any particular reason for that? The jar included in the distribution is obviously labeled correctly, and the version number in its manifest is 3.3.1, so it appears that the build.xml was modified and bundled up for distribution after the binary was built. Not sure if it's worth fixing in the distribution, but thought I'd at least mention it.
>
> -Dave Wright
[jira] Updated: (ZOOKEEPER-831) BookKeeper: Throttling improved for reads
[ https://issues.apache.org/jira/browse/ZOOKEEPER-831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Reed updated ZOOKEEPER-831:

Status: Resolved (was: Patch Available)
Resolution: Fixed

Committed revision 998200. thanks for the fix flavio, and ivan for the reviews!

BookKeeper: Throttling improved for reads
-----------------------------------------

Key: ZOOKEEPER-831
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-831
Project: Zookeeper
Issue Type: Bug
Components: contrib-bookkeeper
Affects Versions: 3.3.1
Reporter: Flavio Junqueira
Assignee: Flavio Junqueira
Fix For: 3.4.0
Attachments: ZOOKEEPER-831.patch, ZOOKEEPER-831.patch, ZOOKEEPER-831.patch, ZOOKEEPER-831.patch

Reads and writes in BookKeeper are asymmetric: a write request writes one entry, whereas a read request may read multiple entries. The current implementation of throttling only counts the number of read requests instead of counting the number of entries being read. Consequently, a few read requests reading a large number of entries each will spawn a large number of read-entry requests.
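The difference between throttling by request and throttling by entry can be sketched with a counting semaphore: each read takes as many permits as the number of entries it covers, and one permit is returned as each entry completes. This is only an illustration of the idea in the description, not the committed patch. ReadThrottleSketch, beforeRead, entryCompleted, and the inclusive entry range are hypothetical names and conventions.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical sketch: throttle reads by the number of entries, not the number of requests. */
final class ReadThrottleSketch {
    private final Semaphore permits;

    ReadThrottleSketch(int maxOutstandingEntries) {
        this.permits = new Semaphore(maxOutstandingEntries);
    }

    /** Number of entries in the inclusive range [firstEntry, lastEntry]. */
    static int entryCount(long firstEntry, long lastEntry) {
        return Math.toIntExact(lastEntry - firstEntry + 1);
    }

    /** Blocks until there is room for every entry in [firstEntry, lastEntry]. */
    void beforeRead(long firstEntry, long lastEntry) {
        // One permit per entry, so a single large read consumes as much of
        // the budget as many small ones.
        permits.acquireUninterruptibly(entryCount(firstEntry, lastEntry));
    }

    /** Called once per entry as each read-entry response arrives. */
    void entryCompleted() {
        permits.release();
    }

    /** Remaining budget, exposed for inspection. */
    int available() {
        return permits.availablePermits();
    }
}
```

A caller would invoke beforeRead(0, 3) before issuing a four-entry read and entryCompleted() for each response. Note that in this sketch a single read larger than the whole budget would block forever, so a real implementation would need to split or cap such reads.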
[jira] Created: (ZOOKEEPER-875) ResponderThread and udpSocket should be moved from QuorumPeer to LeaderElection
ResponderThread and udpSocket should be moved from QuorumPeer to LeaderElection
-------------------------------------------------------------------------------

Key: ZOOKEEPER-875
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-875
Project: Zookeeper
Issue Type: Improvement
Components: leaderElection
Affects Versions: 3.3.1
Reporter: Diogo
Priority: Trivial

Part of the algorithm implemented in the class LeaderElection lives inside QuorumPeer. Is there any reason for that? ResponderThread and udpSocket belong to the LeaderElection class and should be moved into LeaderElection.java. That would make the code cleaner.
[jira] Commented: (ZOOKEEPER-794) Callbacks are not invoked when the client is closed
[ https://issues.apache.org/jira/browse/ZOOKEEPER-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910684#action_12910684 ]

Hadoop QA commented on ZOOKEEPER-794:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12454780/ZOOKEEPER-794_4.patch.txt
against trunk revision 998200.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

-1 patch. The patch command could not apply the patch.

Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/116/console

This message is automatically generated.

Callbacks are not invoked when the client is closed
---------------------------------------------------

Key: ZOOKEEPER-794
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-794
Project: Zookeeper
Issue Type: Bug
Components: java client
Affects Versions: 3.3.1
Reporter: Alexis Midon
Assignee: Alexis Midon
Fix For: 3.3.2, 3.4.0
Attachments: ZOOKEEPER-794.patch.txt, ZOOKEEPER-794.txt, ZOOKEEPER-794_2.patch, ZOOKEEPER-794_3.patch, ZOOKEEPER-794_4.patch.txt

I noticed that ZooKeeper behaves differently when synchronous and asynchronous actions are called on a closed ZooKeeper client. A synchronous call throws a session expired exception, while an asynchronous call does nothing: no exception, no callback invocation. Even if the EventThread receives the Packet with the session expired error code, the packet is never processed, since the thread has already been killed by the eventOfDeath. So the callback is not invoked.
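The behavior the report asks for, that an asynchronous call on a closed client still gets its callback invoked with an error, can be sketched independently of the client internals. The class below is a hypothetical, single-lock illustration and not the ClientCnxn code or any of the attached patches. CallbackDispatcherSketch, its methods, and the CLOSED placeholder code are invented for the example.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.IntConsumer;

/** Hypothetical sketch: never drop a callback, even after the client is closed. */
final class CallbackDispatcherSketch {
    static final int OK = 0;
    static final int CLOSED = -1; // placeholder error code for this sketch

    private final Queue<IntConsumer> pending = new ArrayDeque<>();
    private boolean closed;

    /** Registers the callback of an asynchronous call. */
    synchronized void submit(IntConsumer callback) {
        if (closed) {
            // The client is already closed: report the error right away
            // instead of queueing a callback nobody will ever run.
            callback.accept(CLOSED);
            return;
        }
        pending.add(callback);
    }

    /** Completes the oldest pending call successfully; returns false if there is none. */
    synchronized boolean completeNext() {
        IntConsumer callback = pending.poll();
        if (callback == null) {
            return false;
        }
        callback.accept(OK);
        return true;
    }

    /** Closes the dispatcher and fails every callback that is still waiting. */
    synchronized void close() {
        closed = true;
        IntConsumer callback;
        while ((callback = pending.poll()) != null) {
            callback.accept(CLOSED);
        }
    }
}
```

The point of the sketch is the ordering in close(): the queue is drained with an error code before the dispatcher stops, so both calls that were pending at close time and calls made afterwards receive a callback.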
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flavio Junqueira updated ZOOKEEPER-822:

Attachment: ZOOKEEPER-822.patch

I'm adding a test to the patch. It tries to send a message to an address for which a connection request receives no response, so it has to time out. The test then checks that the amount of time elapsed is less than 6s (the timeout value is hardcoded to 5s). Raising the timeout from 5s to, say, 7s makes the test fail.

Leader election taking a long time to complete
----------------------------------------------

Key: ZOOKEEPER-822
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
Project: Zookeeper
Issue Type: Bug
Components: quorum
Affects Versions: 3.3.0
Reporter: Vishal K
Assignee: Vishal K
Priority: Blocker
Fix For: 3.3.2, 3.4.0
Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1

Created a 3 node cluster.

1. Fail the ZK leader
2. Let leader election finish. Restart the leader and let it join the
3. Repeat

After a few rounds leader election takes anywhere from 25 to 60 seconds to finish. Note: we didn't have any ZK clients and no new znodes were created.

zoo.cfg is shown below:

#Mon Jul 19 12:15:10 UTC 2010
server.1=192.168.4.12\:2888\:3888
server.0=192.168.4.11\:2888\:3888
clientPort=2181
dataDir=/var/zookeeper
syncLimit=2
server.2=192.168.4.13\:2888\:3888
initLimit=5
tickTime=2000

I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyway, so logs from that node shouldn't matter. Look for START HERE. Logs after that point should be of interest.
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flavio Junqueira updated ZOOKEEPER-822:

Status: Patch Available (was: Open)
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flavio Junqueira updated ZOOKEEPER-822:

Attachment: ZOOKEEPER-822-3.3.2.patch

Attaching patch for 3.3.2.
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flavio Junqueira updated ZOOKEEPER-822:

Attachment: ZOOKEEPER-822.patch