date:20100917

[jira] Commented: (ZOOKEEPER-869) Support for election of leader with arbitrary zxid

2010-09-17 Thread Diogo (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12910531#action_12910531
 ] 

Diogo commented on ZOOKEEPER-869:
-

While trying to implement this, I found an interesting issue. Say we have an 
ensemble with 3 nodes, we start all nodes together, and all have their state 
synchronized, meaning all replicas return the same value from 
ZKDatabase().getLastLoggedZxid(). It seems that the leader will send a snapshot 
to all followers, although that is not necessary: they need no state transfer.

The leader (quorum/Leader.java:283) reads its lastLoggedZxid(), adds a new 
epoch to it, and stores the result as lastProposed. In LearnerHandler.java:308 
the thread decides whether the replica needs an empty DIFF or a SNAP. (I am 
assuming the state of the system described above.) But startForwarding will 
return lastProposed, which is necessarily larger than any other zxid, so SNAP 
will be selected and sent.
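
The effect described above can be reproduced in a small model. This is only a
sketch, not the Leader or LearnerHandler code: the class, method, and enum
names are hypothetical. It assumes the usual zxid layout (epoch in the high 32
bits, counter in the low 32 bits) and that an empty DIFF is chosen only when
the follower's zxid equals the value the leader reports.

```java
// Simplified model of the sync decision; all names here are hypothetical.
final class SyncDecisionSketch {
    enum Packet { DIFF, SNAP }

    // Assumed zxid layout: epoch in the high 32 bits, counter in the low 32 bits.
    static long firstZxidOfNextEpoch(long lastLoggedZxid) {
        long epoch = lastLoggedZxid >>> 32;
        return (epoch + 1) << 32;
    }

    // Empty DIFF only if the follower's zxid equals what the leader reports.
    static Packet choose(long peerLastZxid, long leaderReportedZxid) {
        return peerLastZxid == leaderReportedZxid ? Packet.DIFF : Packet.SNAP;
    }

    public static void main(String[] args) {
        long lastLogged = (7L << 32) | 5L;   // identical on leader and follower
        long lastProposed = firstZxidOfNextEpoch(lastLogged);
        System.out.println(choose(lastLogged, lastProposed)); // SNAP
        System.out.println(choose(lastLogged, lastLogged));   // DIFF
    }
}
```

Because the reported value always belongs to a newer epoch, the equality test
cannot succeed even when the follower is fully caught up. The same layout is
consistent with the value 34359738368 in the output from the run: it equals
8 << 32, that is, epoch 8 with counter 0.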

Here is part of the output from a run where 2 replicas have the same state 
stored and one is behind.

2010-09-17 12:11:27,296 [myid:3] - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2183:files...@82] - Reading snapshot /tmp/zoo3/version-2/snapshot.7
2010-09-17 12:11:27,298 [myid:3] - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2183:files...@82] - Reading snapshot /tmp/zoo3/version-2/snapshot.7
2010-09-17 12:11:27,301 [myid:3] - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2183:filetxnsnap...@208] - Snapshotting: 7
2010-09-17 12:11:27,303 [myid:3] - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2183:lea...@285] - lastLoggedZxid = 7 lastProposed = 8   -- added line just after leader sets its lastProposed
2010-09-17 12:11:27,309 [myid:3] - INFO  [LearnerHandler-/127.0.0.1:48318:learnerhand...@247] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.quorumpeer$quorumser...@12d3205
2010-09-17 12:11:27,310 [myid:3] - WARN  [LearnerHandler-/127.0.0.1:48318:learnerhand...@326] - Sending snapshot last zxid of peer is 0x7  zxid of leader is 0x8   -- snapshot being sent!
2010-09-17 12:11:27,312 [myid:3] - WARN  [LearnerHandler-/127.0.0.1:48318:lea...@474] - Commiting zxid 0x8 from /127.0.0.1:2890 not first!
2010-09-17 12:11:27,313 [myid:3] - WARN  [LearnerHandler-/127.0.0.1:48318:lea...@476] - First is 0
2010-09-17 12:11:27,313 [myid:3] - INFO  [LearnerHandler-/127.0.0.1:48318:lea...@500] - Have quorum of supporters; starting up and setting last processed zxid: 34359738368
2010-09-17 12:11:28,290 [myid:3] - INFO  [LearnerHandler-/127.0.0.1:48319:learnerhand...@247] - Follower sid: 2 : info : org.apache.zookeeper.server.quorum.quorumpeer$quorumser...@1319c
2010-09-17 12:11:28,291 [myid:3] - WARN  [LearnerHandler-/127.0.0.1:48319:learnerhand...@326] - Sending snapshot last zxid of peer is 0x6  zxid of leader is 0x8   -- this follower needs the snapshot.


Am I understanding something wrong?

 Support for election of leader with arbitrary zxid
 --

 Key: ZOOKEEPER-869
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-869
 Project: Zookeeper
  Issue Type: New Feature
Reporter: Diogo
Priority: Minor

 Currently, the implemented leader election algorithm guarantees that the 
 leader has the maximum zxid of the ensemble. The state synchronization after 
 the election was built on this assumption. However, other leader election 
 algorithms might elect leaders with an arbitrary zxid. 
 To support other leader election algorithms, the state synchronization should 
 allow the leader to have an arbitrary zxid.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-869) Support for election of leader with arbitrary zxid

2010-09-17 Thread Benjamin Reed (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12910643#action_12910643
]

Benjamin Reed commented on ZOOKEEPER-869:
-

this is a good observation diogo, but i think you may be characterizing it
improperly. the problem is that when we do a leadership change we increment the
epoch and propose a new leader, so the zxids of all other processes will be
much lower than the leader's. when a follower connects we figure out how far
behind the follower is by comparing the lastProposed zxids and taking the
difference. we should really be using the recent history to do the comparison.

as a side note, if we were to choose not to take the maximum zxid during
recovery, we need to make sure that we still cover all committed messages.
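
One possible reading of "using the recent history", as a sketch only: compare
the follower's last zxid against the range of recently committed transactions
the leader still holds, rather than against lastProposed. The class name, the
committedLog parameter, and the fallback rules below are hypothetical and are
not taken from the ZooKeeper sources.

```java
import java.util.List;

// Hypothetical model: decide DIFF vs SNAP from the leader's recent history.
final class HistorySyncSketch {
    enum Packet { DIFF, SNAP }

    // committedLog: zxids of recently committed proposals, in ascending order.
    static Packet choose(long peerLastZxid, List<Long> committedLog) {
        if (committedLog.isEmpty()) {
            return Packet.SNAP; // no history to diff against
        }
        long min = committedLog.get(0);
        long max = committedLog.get(committedLog.size() - 1);
        // Inside the known range: send only the missing transactions, possibly none.
        // Outside the range (too far behind, or ahead of the leader): this sketch
        // simply falls back to a snapshot.
        return (peerLastZxid >= min && peerLastZxid <= max) ? Packet.DIFF : Packet.SNAP;
    }
}
```

With a history of zxids 5, 6, 7, a follower at 7 or 6 would get a DIFF, while a
follower at 3 would still get a snapshot. The new epoch never enters the
comparison.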


[jira] Created: (ZOOKEEPER-874) FileTxnSnapLog.restore does not call listener

2010-09-17 Thread Diogo (JIRA)

FileTxnSnapLog.restore does not call listener
-

 Key: ZOOKEEPER-874
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-874
 Project: Zookeeper
  Issue Type: Bug
  Components: leaderElection
Reporter: Diogo
Priority: Trivial


FileTxnSnapLog.restore() does not call the listener passed as a parameter. As a 
result, the commitLogs list is empty. When a follower connects to the leader, 
the leader is forced to send a snapshot to the follower instead of a couple of 
requests and commits.
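
The behaviour the report asks for can be sketched as follows. This is a
simplified model and not the FileTxnSnapLog code: the listener interface, the
method signature, and the use of bare zxids in place of transactions are all
assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a restore that notifies a listener per replayed txn.
final class RestoreListenerSketch {
    // Hypothetical callback, invoked once per transaction replayed from the log.
    interface PlaybackListener {
        void onTxnLoaded(long zxid);
    }

    // Replays logged zxids newer than the snapshot (assumed ascending) and
    // notifies the listener for each; returns the highest zxid seen.
    static long restore(long snapshotZxid, List<Long> txnLog, PlaybackListener listener) {
        long highest = snapshotZxid;
        for (long zxid : txnLog) {
            if (zxid <= snapshotZxid) {
                continue; // already covered by the snapshot
            }
            highest = zxid;
            listener.onTxnLoaded(zxid); // the call the report says is missing
        }
        return highest;
    }

    public static void main(String[] args) {
        List<Long> committedLog = new ArrayList<>();
        long last = restore(5L, Arrays.asList(4L, 5L, 6L, 7L), zxid -> committedLog.add(zxid));
        System.out.println(last + " " + committedLog); // 7 [6, 7]
    }
}
```

With the listener invoked, the in-memory list holds the transactions replayed
after the snapshot, which is what would let a leader send a follower the
missing requests instead of a full snapshot.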


[jira] Updated: (ZOOKEEPER-874) FileTxnSnapLog.restore does not call listener

2010-09-17 Thread Diogo (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Diogo updated ZOOKEEPER-874:


   Status: Patch Available  (was: Open)
Affects Version/s: 3.3.1
Fix Version/s: 3.4.0


[jira] Updated: (ZOOKEEPER-874) FileTxnSnapLog.restore does not call listener

2010-09-17 Thread Diogo (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Diogo updated ZOOKEEPER-874:


Attachment: commitlog-listener.patch


build.xml from 3.3.1 distribution has version=3.3.2-dev

2010-09-17 Thread Dave Wright

Ended up chasing our tail for a while today because the version in the ant
build.xml of the 3.3.1 distribution on the website is actually 3.3.2-dev.
Any particular reason for that? The jar included in the distribution is
obviously labeled correctly, and the version number in its manifest is
3.3.1, so it appears that the build.xml was modified and bundled up for
distribution after the binary was built. Not sure if it's worth fixing in
the distribution, but thought I'd at least mention it.

-Dave Wright

Re: build.xml from 3.3.1 distribution has version=3.3.2-dev

2010-09-17 Thread Patrick Hunt

Hi Dave, while it may appear that way, it's not the case. When building a
release we run the following command:

ant -Dversion=3.3.1 ...

which overrides any setting in build.xml. This is documented in our release
process (we pretty much follow what hadoop does, although some of the maven
repo details are a bit different)
http://wiki.apache.org/hadoop/ZooKeeper/HowToRelease

Patrick


[jira] Updated: (ZOOKEEPER-831) BookKeeper: Throttling improved for reads

2010-09-17 Thread Benjamin Reed (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-831:


Status: Resolved  (was: Patch Available)
Resolution: Fixed

Committed revision 998200.

thanx for the fix flavio and ivan for the reviews!

 BookKeeper: Throttling improved for reads
 -

 Key: ZOOKEEPER-831
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-831
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bookkeeper
Affects Versions: 3.3.1
Reporter: Flavio Junqueira
Assignee: Flavio Junqueira
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-831.patch, ZOOKEEPER-831.patch, 
 ZOOKEEPER-831.patch, ZOOKEEPER-831.patch


 Reads and writes in BookKeeper are asymmetric: a write request writes one 
 entry, whereas a read request may read multiple entries. The current 
 implementation of throttling only counts the number of read requests instead 
 of counting the number of entries being read. Consequently, a few read 
 requests reading a large number of entries each will spawn a large number of 
 read-entry requests. 
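
A minimal sketch of throttling on entries rather than requests, assuming a
fixed budget of outstanding entries. The class and method names are
hypothetical and this is not the BookKeeper client code or the committed patch.

```java
import java.util.concurrent.Semaphore;

// Hypothetical throttle that budgets outstanding entries, not requests.
final class EntryThrottleSketch {
    private final Semaphore permits;

    EntryThrottleSketch(int maxOutstandingEntries) {
        this.permits = new Semaphore(maxOutstandingEntries);
    }

    // A write carries one entry; a read of [first, last] carries last - first + 1.
    static int entriesInRead(long firstEntry, long lastEntry) {
        return (int) (lastEntry - firstEntry + 1);
    }

    // Non-blocking: returns false when the remaining budget is too small.
    boolean tryStartRead(long firstEntry, long lastEntry) {
        return permits.tryAcquire(entriesInRead(firstEntry, lastEntry));
    }

    // Called once per entry whose read has completed.
    void entryCompleted() {
        permits.release();
    }

    int available() {
        return permits.availablePermits();
    }
}
```

With a budget of 10, a read of entries 0 to 7 consumes 8 permits, so a second
read of 5 entries is refused until at least 3 entries complete. A read larger
than the whole budget would never be admitted by this sketch; a real
implementation would have to split it or block.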


[jira] Created: (ZOOKEEPER-875) ResponderThread and udpSocket should be move from QuorumPeer to LeaderElection

2010-09-17 Thread Diogo (JIRA)

ResponderThread and udpSocket should be move from QuorumPeer to LeaderElection
--

 Key: ZOOKEEPER-875
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-875
 Project: Zookeeper
  Issue Type: Improvement
  Components: leaderElection
Affects Versions: 3.3.1
Reporter: Diogo
Priority: Trivial


Part of the algorithm implemented in the class LeaderElection is inside 
QuorumPeer. Is there any reason for that? ResponderThread and udpSocket belong 
to the LeaderElection class and should be moved into LeaderElection.java. That 
would make the code cleaner.


[jira] Commented: (ZOOKEEPER-794) Callbacks are not invoked when the client is closed

2010-09-17 Thread Hadoop QA (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12910684#action_12910684
]

Hadoop QA commented on ZOOKEEPER-794:
-

-1 overall. Here are the results of testing the latest attachment

http://issues.apache.org/jira/secure/attachment/12454780/ZOOKEEPER-794_4.patch.txt
against trunk revision 998200.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

-1 patch. The patch command could not apply the patch.

Console output:
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/116/console

This message is automatically generated.

Callbacks are not invoked when the client is closed
---

Key: ZOOKEEPER-794
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-794
Project: Zookeeper
Issue Type: Bug
Components: java client
Affects Versions: 3.3.1
Reporter: Alexis Midon
Assignee: Alexis Midon
Fix For: 3.3.2, 3.4.0

Attachments: ZOOKEEPER-794.patch.txt, ZOOKEEPER-794.txt,
ZOOKEEPER-794_2.patch, ZOOKEEPER-794_3.patch, ZOOKEEPER-794_4.patch.txt

I noticed that ZooKeeper behaves differently when synchronous and
asynchronous actions are called on a closed ZooKeeper client.
A synchronous call will throw a session expired exception, while an
asynchronous call will do nothing: no exception, no callback invocation.
Even if the EventThread receives the Packet with the session expired error
code, the packet is never processed since the thread has been killed by the
eventOfDeath. So the callback is not invoked.
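
The mechanism can be illustrated with a small single-threaded model of an
event queue that stops at a shutdown marker. This is a sketch under
assumptions, not the ClientCnxn code: the class and its methods are
hypothetical, and draining the queue after the marker is just one conceivable
remedy, not necessarily what the attached patches do.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical model of an event queue terminated by a shutdown marker.
final class EventQueueSketch {
    private static final Object EVENT_OF_DEATH = new Object();
    private final Queue<Object> queue = new ArrayDeque<>();

    void queueCallback(Runnable callback) {
        queue.add(callback);
    }

    void queueEventOfDeath() {
        queue.add(EVENT_OF_DEATH);
    }

    // Processes queued events and returns how many callbacks were invoked.
    // Without draining, anything queued after the marker is silently dropped.
    int run(boolean drainAfterDeath) {
        int invoked = 0;
        Object event;
        while ((event = queue.poll()) != null) {
            if (event == EVENT_OF_DEATH) {
                if (!drainAfterDeath) {
                    return invoked; // later packets are never processed
                }
                continue;
            }
            ((Runnable) event).run();
            invoked++;
        }
        return invoked;
    }
}
```

Queueing one callback, then the marker, then a second callback yields one
invocation when the loop stops at the marker and two when it drains the rest.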


[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete

2010-09-17 Thread Flavio Junqueira (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Flavio Junqueira updated ZOOKEEPER-822:
---

Attachment: ZOOKEEPER-822.patch

I'm adding a test to the patch. It tries to send a message to an address for
which a connection request receives no response, so it has to time out. The test
then checks that the amount of time elapsed is less than 6s (the timeout value
is hardcoded to 5s). Raising the timeout from 5s to, say, 7s makes the test fail.
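
The idea behind such a check can be sketched as follows. This is not the test
from the patch: the class and method names are hypothetical, and the address
in the usage note is only assumed to be one that does not answer.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: time a connect attempt that is bounded by a timeout.
final class ConnectTimeoutSketch {
    // Attempts a TCP connect bounded by timeoutMs and returns the elapsed time
    // in milliseconds, whether the attempt succeeded, timed out, or failed.
    static long elapsedConnectMillis(InetSocketAddress addr, int timeoutMs) {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(addr, timeoutMs);
        } catch (IOException e) {
            // SocketTimeoutException (a subclass) lands here when nobody answers.
        }
        return (System.nanoTime() - start) / 1_000_000L;
    }
}
```

A check in the spirit of the comment would call
elapsedConnectMillis(new InetSocketAddress("10.255.255.1", 3888), 5000) and
require the result to be below 6000. If the connect call were not bounded by
the timeout, the elapsed time could exceed that limit.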

Leader election taking a long time to complete
---

Key: ZOOKEEPER-822
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
Project: Zookeeper
Issue Type: Bug
Components: quorum
Affects Versions: 3.3.0
Reporter: Vishal K
Assignee: Vishal K
Priority: Blocker
Fix For: 3.3.2, 3.4.0

Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log,
test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz,
ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1

Created a 3 node cluster.
1. Fail the ZK leader
2. Let leader election finish. Restart the leader and let it join the
3. Repeat
After a few rounds, leader election takes anywhere from 25 to 60 seconds to
finish.
Note: we didn't have any ZK clients and no new znodes were created.
zoo.cfg is shown below:
#Mon Jul 19 12:15:10 UTC 2010
server.1=192.168.4.12\:2888\:3888
server.0=192.168.4.11\:2888\:3888
clientPort=2181
dataDir=/var/zookeeper
syncLimit=2
server.2=192.168.4.13\:2888\:3888
initLimit=5
tickTime=2000
I have attached logs from two nodes that took a long time to form the cluster
after failing the leader. The leader was down anyway, so logs from that node
shouldn't matter.
Look for START HERE. Logs after that point are the ones of interest.


[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete

2010-09-17 Thread Flavio Junqueira (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Flavio Junqueira updated ZOOKEEPER-822:
---

Status: Patch Available (was: Open)


[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete

2010-09-17 Thread Flavio Junqueira (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Flavio Junqueira updated ZOOKEEPER-822:
---

Attachment: ZOOKEEPER-822-3.3.2.patch

Attaching patch for 3.3.2.


[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete

2010-09-17 Thread Flavio Junqueira (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Flavio Junqueira updated ZOOKEEPER-822:
---

Attachment: ZOOKEEPER-822.patch

