date:20091110

[jira] Commented: (ZOOKEEPER-368) Observers

2009-11-10 Thread Flavio Paiva Junqueira (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12775947#action_12775947
]

Flavio Paiva Junqueira commented on ZOOKEEPER-368:
--

Henry, good job so far. Please bear with me for a little longer:

# Could you please update the changes to the test files? Due to a few recently
committed patches they don't apply to trunk any longer;
# Could you make sure to remove all unnecessary LOG statements? Some of them
look like messages you used for your own debugging (they start with HNR) and
others are commented out. I think I've seen a TODO comment as well;
# It sounds like this feature works with both majority and hierarchical
quorums. Is that correct? Can I have observers with hierarchical quorums?

This might be a little late for this patch now, but for future patches that
introduce features like this, it is probably a good idea to have a brief design
document explaining changes to the protocol and to ensemble configuration.

Observers
-

Key: ZOOKEEPER-368
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
Project: Zookeeper
Issue Type: New Feature
Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
Attachments: obs-refactor.patch, observer-refactor.patch, observers
sync benchmark.png, observers.patch, ZOOKEEPER-368.patch,
ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch,
ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch

Currently, all servers of an ensemble participate actively in reaching
agreement on the order of ZooKeeper transactions. That is, all followers
receive proposals, acknowledge them, and receive commit messages from the
leader. A leader issues commit messages once it receives acknowledgments from
a quorum of followers. For cross-colo operation, it would be useful to have a
third role: observer. Using Paxos terminology, observers are similar to
learners. An observer does not participate actively in the agreement step of
the atomic broadcast protocol. Instead, it only commits proposals that have
been accepted by some quorum of followers.
One simple way to implement observers is to have the leader forward
commit messages not only to followers but also to observers, and have
observers apply transactions in the order the followers agreed upon.
In the current implementation of the protocol, however, commit messages do
not carry their corresponding transaction payload, because all servers
other than the leader are followers, and followers first receive that payload
through a proposal message. Forwarding commit messages to an observer as they
currently are is therefore not sufficient. We have a couple
of options:
1- Include the transaction payload in commit messages to observers;
2- Send proposals to observers as well.
Number 2 is simpler to implement because it doesn't require changing the
protocol implementation, but it increases traffic slightly. The performance
impact of such an increase might be insignificant, though.
For scalability purposes, we may consider having followers also forward
commit messages to observers. With this option, observers can connect to
followers and receive messages from them. This choice is important to
keep the load on the leader from growing with the number of observers.
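To make option 1 concrete, here is a minimal Java sketch of a leader that counts acknowledgments only from voting members and, once a quorum has acknowledged, sends plain commits to followers but commits carrying the transaction payload to observers. All class, enum, and method names are hypothetical; this is not ZooKeeper's actual Leader code, and the in-memory outbox stands in for real network sends.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical message kinds for this sketch only.
enum Kind { PROPOSAL, COMMIT, COMMIT_WITH_PAYLOAD }

final class Msg {
    final Kind kind;
    final long zxid;
    final String payload; // null when the message carries no transaction data

    Msg(Kind kind, long zxid, String payload) {
        this.kind = kind;
        this.zxid = zxid;
        this.payload = payload;
    }
}

// Hypothetical leader: voters (including the leader) agree on order; observers only learn it.
final class SketchLeader {
    private final long myId;
    private final Set<Long> voters;
    private final Set<Long> observers;
    private final Map<Long, String> pending = new HashMap<>();   // zxid -> transaction
    private final Map<Long, Set<Long>> acks = new HashMap<>();   // zxid -> voters that acked
    private final Map<Long, List<Msg>> outbox = new HashMap<>(); // stands in for the network

    SketchLeader(long myId, Set<Long> voters, Set<Long> observers) {
        this.myId = myId;
        this.voters = voters;
        this.observers = observers;
    }

    void propose(long zxid, String txn) {
        pending.put(zxid, txn);
        acks.put(zxid, new HashSet<>(Set.of(myId))); // the leader acknowledges its own proposal
        for (long v : voters) {
            if (v != myId) send(v, new Msg(Kind.PROPOSAL, zxid, txn)); // observers get no proposal
        }
    }

    void onAck(long from, long zxid) {
        Set<Long> acked = acks.get(zxid);
        if (acked == null || !voters.contains(from)) return; // already committed/unknown, or an observer
        acked.add(from);
        if (acked.size() > voters.size() / 2) {
            String txn = pending.remove(zxid);
            acks.remove(zxid);
            for (long v : voters) {
                if (v != myId) send(v, new Msg(Kind.COMMIT, zxid, null));
            }
            // Option 1: observers never saw the proposal, so their commit carries the payload.
            for (long o : observers) send(o, new Msg(Kind.COMMIT_WITH_PAYLOAD, zxid, txn));
        }
    }

    List<Msg> sentTo(long id) {
        return outbox.getOrDefault(id, List.of());
    }

    private void send(long to, Msg m) {
        outbox.computeIfAbsent(to, k -> new ArrayList<>()).add(m);
    }
}
```

Option 2 would instead add observers to the proposal loop while still ignoring their acks; the trade-off is the extra traffic mentioned above.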

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-368) Observers

2009-11-10 Thread Patrick Hunt (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12775956#action_12775956
]

Patrick Hunt commented on ZOOKEEPER-368:

Henry, I don't see any docs for this in src/docs. I suggest that you start a
new document (new xml file) for this feature; it should explain why and how (to
run) at the very least, so that potential users can come up to speed.

Flavio, could you also review the comments on this JIRA as part of your commit
review? We should make sure that either all of the issues are addressed,
or at the very least new JIRAs are created (Henry, could you do this?) for the
pending items, so that we don't lose the comments/concerns/issues that have been
identified previously (this is a major new/visible feature, so I think it
warrants the extra time/effort).


[jira] Commented: (ZOOKEEPER-368) Observers

2009-11-10 Thread Henry Robinson (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12776004#action_12776004
]

Henry Robinson commented on ZOOKEEPER-368:
--

Hi Flavio / Patrick -

Thanks for your comments!

Design document - there's a brief writeup at
http://wiki.apache.org/hadoop/ZooKeeper/Observers which very broadly covers the
design. I will update it when I get a moment to do so.

User documentation - yes, will do, already on my to do list. There is a section
in the above wiki page that will be a good start.

Quorums - yes, it should work with all quorum mechanisms. The only caveat is
that it only works with the simple LeaderElection protocol, which presumes a
majority quorum approach (there are lines where votes > quorum.size() / 2 is
hardcoded rather than using the verifier - I think this is the source of at
least one of the to-dos).
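The difference between a hardcoded majority test and a pluggable verifier can be sketched in Java. ZooKeeper's quorum code does have a verifier abstraction with majority and hierarchical implementations, but the interface and classes below are hypothetical, simplified stand-ins rather than its real API, and the hierarchical rule used here (a weighted majority within each of a majority of groups) is an assumption modeled on the general idea of hierarchical quorums. MajorityCheck is exactly what a hardcoded votes > size / 2 test computes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-in for a pluggable quorum check.
interface QuorumCheck {
    boolean containsQuorum(Set<Long> ids);
}

// Equivalent to the hardcoded "votes > size / 2" test, restricted to voting members.
final class MajorityCheck implements QuorumCheck {
    private final Set<Long> voters;

    MajorityCheck(Set<Long> voters) { this.voters = voters; }

    public boolean containsQuorum(Set<Long> ids) {
        long count = ids.stream().filter(voters::contains).count(); // observers are ignored
        return count > voters.size() / 2;
    }
}

// Assumed hierarchical rule: more than half of the groups must each contribute
// more than half of that group's total weight.
final class HierarchicalCheck implements QuorumCheck {
    private final Map<Long, Integer> groupOf;                // server id -> group id
    private final Map<Long, Long> weightOf;                  // server id -> weight
    private final Map<Integer, Long> groupWeight = new HashMap<>(); // group id -> total weight

    HierarchicalCheck(Map<Long, Integer> groupOf, Map<Long, Long> weightOf) {
        this.groupOf = groupOf;
        this.weightOf = weightOf;
        for (Map.Entry<Long, Integer> e : groupOf.entrySet()) {
            groupWeight.merge(e.getValue(), weightOf.get(e.getKey()), Long::sum);
        }
    }

    public boolean containsQuorum(Set<Long> ids) {
        Map<Integer, Long> present = new HashMap<>(); // group id -> weight present in ids
        for (long id : ids) {
            Integer g = groupOf.get(id);
            if (g != null) present.merge(g, weightOf.get(id), Long::sum); // non-members ignored
        }
        int satisfied = 0;
        for (Map.Entry<Integer, Long> e : present.entrySet()) {
            if (2 * e.getValue() > groupWeight.get(e.getKey())) satisfied++;
        }
        return 2 * satisfied > groupWeight.size();
    }
}
```

With nine unit-weight servers in three groups ({1,2,3}, {4,5,6}, {7,8,9}), the set {1, 2, 3, 4, 7} is a plain majority but satisfies only one group, while {1, 2, 4, 5} is not a majority yet satisfies two groups, so a hardcoded majority test gives the wrong answer under hierarchical quorums in both directions.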

Debug messages: ugh, sorry about that. Will update the patch to build against
trunk shortly and remove those messages.


[jira] Updated: (ZOOKEEPER-472) Making DataNode not instantiate a HashMap when the node is ephmeral

2009-11-10 Thread Patrick Hunt (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-472:
---

Status: Patch Available  (was: Open)

Is this ready? I see some updates, so I'm throwing it into the patch queue.
Reviewer: please be sure all the comments are addressed.

 Making DataNode not instantiate a HashMap when the node is ephmeral
 ---

 Key: ZOOKEEPER-472
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-472
 Project: Zookeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.2.0, 3.1.1
Reporter: Erik Holstad
Assignee: Erik Holstad
Priority: Minor
 Fix For: 3.3.0

 Attachments: zookeeper-472.patch, zookeeper-472.patch, 
 zookeeper-472.patch, zookeeper-472.patch


 Looking at the code, there is the overhead of a HashSet object for the node's
 children, even though the node might be an ephemeral node, which cannot have
 children.
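The improvement can be illustrated with a minimal Java sketch that allocates the child set lazily, only when the first child is added, so ephemeral nodes (and any leaf node) never pay for an empty HashSet. SketchNode and its methods are hypothetical and do not reproduce ZooKeeper's DataNode; the patches attached to the issue may take a different approach.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified node; not ZooKeeper's actual DataNode class.
final class SketchNode {
    private final boolean ephemeral;
    private Set<String> children; // stays null until the first child is added

    SketchNode(boolean ephemeral) {
        this.ephemeral = ephemeral;
    }

    boolean addChild(String name) {
        if (ephemeral) {
            throw new IllegalStateException("ephemeral nodes cannot have children");
        }
        if (children == null) {
            children = new HashSet<>(4); // allocate on first use, with a small initial capacity
        }
        return children.add(name);
    }

    boolean removeChild(String name) {
        return children != null && children.remove(name);
    }

    // Callers always get a non-null view, whether or not the set was ever allocated.
    Set<String> getChildren() {
        return children == null ? Collections.emptySet() : Collections.unmodifiableSet(children);
    }

    boolean hasChildSet() { // exposed only to demonstrate the lazy allocation
        return children != null;
    }
}
```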


[jira] Commented: (ZOOKEEPER-368) Observers

2009-11-10 Thread Henry Robinson (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12776233#action_12776233
]

Henry Robinson commented on ZOOKEEPER-368:
--

I just put up a set of notes on the patch on the wiki here:
http://wiki.apache.org/hadoop/ZooKeeper/Observers/ReviewGuide to help make the
review a little less painful - although non-comprehensive, it should help
explain most of the major code changes.

An updated patch will follow very shortly.


[jira] Updated: (ZOOKEEPER-368) Observers

2009-11-10 Thread Henry Robinson (JIRA)

[
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Henry Robinson updated ZOOKEEPER-368:
-

Attachment: ZOOKEEPER-368.patch

Updated patch - removed some erroneous debugging logs, made a slight
improvement to one test.

Please see review guide at
http://wiki.apache.org/hadoop/ZooKeeper/Observers/ReviewGuide - comments on any
further tests required would be particularly welcome.


[jira] Created: (ZOOKEEPER-575) remove System.exit calls to make the server more container friendly

2009-11-10 Thread Patrick Hunt (JIRA)

remove System.exit calls to make the server more container friendly
---

 Key: ZOOKEEPER-575
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-575
 Project: Zookeeper
  Issue Type: Improvement
  Components: server
Reporter: Patrick Hunt
 Fix For: 3.3.0


There are a handful of places left in the code that still use System.exit; we
should remove these to make the server more container-friendly.

There are some legitimate places for the exits - in *Main.java, for example,
they should be fine, since these are the command-line main routines.
Containers should be embedding code that runs just below this layer (or we
should refactor so that it would).

The tricky bit is ensuring that the server shuts down when an unrecoverable
error occurs; AFAIK these are the locations where we still have System.exit
calls.
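One possible shape for this refactoring, sketched in Java under the assumption that the server can be handed a callback at construction time: code that used to call System.exit instead shuts the server down and reports the error to a handler, and only the command-line handler (the *Main layer) actually exits the JVM. FatalErrorHandler, SketchServer, and CommandLineHandler are hypothetical names, not existing ZooKeeper classes.

```java
// Hypothetical hook: decides what happens to the JVM after an unrecoverable error.
interface FatalErrorHandler {
    void onFatalError(String reason, int exitCode);
}

final class SketchServer {
    private final FatalErrorHandler handler;
    private volatile boolean running = true;

    SketchServer(FatalErrorHandler handler) {
        this.handler = handler;
    }

    boolean isRunning() {
        return running;
    }

    void shutdown() {
        running = false; // a real server would also close sockets, flush logs, stop threads
    }

    // Called where the code previously did System.exit(exitCode).
    void unrecoverable(String reason, int exitCode) {
        shutdown();                             // always stop the server first
        handler.onFatalError(reason, exitCode); // then let the embedder decide about the JVM
    }
}

// Installed only by the command-line main routine, where exiting is legitimate.
final class CommandLineHandler implements FatalErrorHandler {
    public void onFatalError(String reason, int exitCode) {
        System.err.println("Fatal error: " + reason);
        System.exit(exitCode);
    }
}
```

A container would pass its own handler (for example one that logs and restarts the embedded server) instead of CommandLineHandler, so the JVM hosting other applications keeps running.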


Re: Possible race in LETest.java

2009-11-10 Thread Patrick Hunt

Closing the loop - what's the status on this? Can one of you open a 
JIRA and provide a patch for this?


Thanks,

Patrick

Flavio Junqueira wrote:
Hi Henry, apologies for the delay. Your observation sounds right to
me. Here is how I'm reading it; let me know if it makes sense.


If everyone votes for 3 in the second round and 3 has crashed, then in
countVotes we will remove all votes for 3 and there will be no vote left.
In such a case, there will be no winner as a result of the call to
countVotes, and lookForLeader won't change the current vote
(LeaderElection.java:201). This is a situation in which we are stuck.


Does it sound reasonable to add an else to the if statement at
LeaderElection.java:201 to reset the vote? This modification would
implement resetting the vote when countVotes returns no winner, which
should happen only when the replica itself votes for a dead leader.


-Flavio
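The livelock and the proposed fix can be modeled in a few lines of Java. This is a hypothetical, heavily simplified model, not LeaderElection.java: votes are tallied only for candidates that have been heard from, ties go to the higher id (the real code also compares zxids), and the applyFix flag toggles the proposed else branch that resets the vote to the node's own id when no candidate survives the tally.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical, simplified model of one leader-election round; not LeaderElection.java.
final class ElectionSketch {
    // Candidate with the most votes among servers we have heard from, or empty if
    // every vote was for a silent server. Ties go to the higher id.
    static Optional<Long> leadingCandidate(Map<Long, Long> votes, Set<Long> alive) {
        Map<Long, Integer> tally = new HashMap<>();
        for (long candidate : votes.values()) {
            if (alive.contains(candidate)) {
                tally.merge(candidate, 1, Integer::sum);
            }
        }
        Long best = null;
        int bestCount = 0;
        for (Map.Entry<Long, Integer> e : tally.entrySet()) {
            int count = e.getValue();
            if (count > bestCount || (count == bestCount && e.getKey() > best)) {
                best = e.getKey();
                bestCount = count;
            }
        }
        return Optional.ofNullable(best);
    }

    // The vote this node casts in the next round.
    static long nextVote(long myId, long myVote, Map<Long, Long> votes, Set<Long> alive, boolean applyFix) {
        Optional<Long> leading = leadingCandidate(votes, alive);
        if (leading.isPresent()) {
            return leading.get();
        }
        // No candidate survived the tally. Without the fix the node keeps voting for the
        // silent server forever; with the fix it starts over by voting for itself.
        return applyFix ? myId : myVote;
    }

    // True if the candidate holds votes from a majority of the whole ensemble.
    static boolean elected(long candidate, Map<Long, Long> votes, int ensembleSize) {
        long count = votes.values().stream().filter(v -> v == candidate).count();
        return count > ensembleSize / 2;
    }
}
```

In the scenario from this thread (3 elected, then killed), nodes 1 and 2 keep voting for 3 without the fix. With it, they first vote for themselves, then both pick 2 as the highest remaining id, and 2 then holds 2 of 3 votes, a majority.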

On Oct 28, 2009, at 7:44 AM, Henry Robinson wrote:

[ Sending this direct since the Apache mailserver is rejecting my 
e-mails at the moment ]


As I understand it, 1 and 2 receive a vote for 3 in the first round, 
which causes them to vote for 3 in the second round. So in the second 
round, all votes cast are for 3. But 3 has died, so all votes for it 
are discounted. 1 and 2 continue to vote for 3 ad infinitum, never 
resetting their vote.


Does this sound plausible, or am I missing something?

cheers,
Henry

On Tue, Oct 27, 2009 at 3:48 PM, Flavio Junqueira f...@yahoo-inc.com 
wrote:
Hi Henry, I don't understand how 1 and 2 do not end up electing 2 in 
your situation. If they exclude 3 in countVotes, then countVotes will 
end up returning 2 and not 3, assuming there is a vote for 2. What am 
I missing?


The problem with QuorumPeer you're pointing at was also an issue with 
the FLE tests, and I couldn't see an easy way around it other than 
timing out and restarting leader election.


Cheers,
-Flavio


On Oct 27, 2009, at 6:35 AM, Henry Robinson wrote:

I've been working on adding a TCPResponderThread to the leader election
process so that if a deployment needs to be TCP only, it can be and still
use all election types. Testing this has exposed what might be a race
condition in the leader election code that prevents a leader from being
elected.

Here's the behaviour I see in LETest occasionally. With three nodes (reduced
from 30 for ease of debugging), node 3 gets elected before either node 1 or
node 2 finishes its election (there is one round where each node learns that
3 has the highest id, and then 3 completes its second round by receiving
votes for itself from 1 and 2, but 1 and 2 do not receive votes from 3).

Now 3 is killed by the test harness. 1 and 2 are still voting for it, but
every time they try, the vote tally excludes 3 since it hasn't been heard
from. They then spin round the voting process, unable to reset their vote. I
expect that the heartbeat mechanism in a running QuorumPeer takes care of
this when the leader is lost, but the associated QuorumPeers aren't running.


If this is the case, then there is a simple fix: reset the nodes' votes to
themselves if they are voting for a node that hasn't been heard from. I
don't know why using TCP instead of UDP for the responder thread is
exacerbating this (and we can't rule out my introducing a bug :)), but as
it's a race condition, the different timings associated with waiting on a
TCP socket might just be enough to expose the issue.

Can someone verify this might be possible / figure out what I missed?

cheers,
Henry
