[jira] Commented: (ZOOKEEPER-368) Observers
[ https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731711#action_12731711 ]

Raghu S commented on ZOOKEEPER-368:
-----------------------------------

Henry, I see a new compile error with your new patch:

    [javac] Compiling 2 source files to C:\EclipseWorkspace\ZK320\ZooKeeper320Source\build\classes
    [javac] C:\EclipseWorkspace\ZK320\ZooKeeper320Source\src\java\main\org\apache\zookeeper\server\quorum\Observer.java:165: cannot find symbol
    [javac] symbol  : class Record
    [javac] location: class org.apache.zookeeper.server.quorum.Observer
    [javac]     Record txn2 = SerializeUtils.deserializeTxn(ia2, hdr2);
    [javac]     ^

Observers
---------

Key: ZOOKEEPER-368
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
Project: Zookeeper
Issue Type: New Feature
Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch

Currently, all servers of an ensemble participate actively in reaching agreement on the order of ZooKeeper transactions. That is, all followers receive proposals, acknowledge them, and receive commit messages from the leader. A leader issues commit messages once it receives acknowledgments from a quorum of followers.

For cross-colo operation, it would be useful to have a third role: observer. Using Paxos terminology, observers are similar to learners. An observer does not participate actively in the agreement step of the atomic broadcast protocol. Instead, it only commits proposals that have been accepted by some quorum of followers.

One simple way to implement observers is to have the leader forward commit messages not only to followers but also to observers, and have observers apply transactions in the order the followers agreed upon. In the current implementation of the protocol, however, commit messages do not carry their corresponding transaction payload, because all servers other than the leader are followers, and followers receive the payload first through a proposal message. Consequently, just forwarding commit messages to an observer as they currently are is not sufficient. We have a couple of options:

1- Include the transaction payload in commit messages to observers;
2- Send proposals to observers as well.

Number 2 is simpler to implement because it doesn't require changing the protocol implementation, but it increases traffic slightly. The performance impact of such an increase might be insignificant, though.

For scalability purposes, we may consider having followers also forward commit messages to observers. With this option, observers can connect to followers and receive messages from them. This choice is important to avoid increasing the load on the leader as the number of observers grows.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
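To make option 2 in the description above concrete, here is a minimal sketch in Java. It is not the code from the ZOOKEEPER-368 patch; the class and method names (`ObserverSketch`, `onProposal`, `onCommit`) are hypothetical, and transactions are reduced to strings. It assumes the observer receives the payload through a proposal, never acknowledges it, and applies it only once the matching commit arrives.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of option 2; not taken from the actual patch.
final class ObserverSketch {
    // Proposals received but not yet committed, keyed by zxid.
    private final Map<Long, String> pending = new HashMap<>();
    // Payloads applied so far, in the order the commits arrived.
    private final List<String> applied = new ArrayList<>();

    // The observer learns the payload from the proposal.
    // Returning false models "no ACK is sent back to the leader".
    boolean onProposal(long zxid, String payload) {
        pending.put(zxid, payload);
        return false;
    }

    // A commit carries only the zxid; the payload comes from the buffer.
    void onCommit(long zxid) {
        String payload = pending.remove(zxid);
        if (payload == null) {
            throw new IllegalStateException("commit without proposal: " + zxid);
        }
        applied.add(payload);
    }

    // Defensive copy of the applied transactions.
    List<String> applied() {
        return new ArrayList<>(applied);
    }
}
```

Under option 1 the buffer would not be needed, since the commit message itself would carry the payload, at the cost of a protocol change.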
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731776#action_12731776 ]

Raghu S commented on ZOOKEEPER-107:
-----------------------------------

@henry, sorry if this sounds like a repeat; I thought I would summarize the error handling during a view change. Could you comment on whether this makes sense?

1. A configuration change succeeds if the change is successfully committed in both the old view and the new view. An observer is promoted to a follower only after it receives a COMMIT for the new view.

2. Each peer could have two views of the cluster: the last committed view and the last proposed view (which is created after a VIEWCHANGE proposal is received). The latter can be NULL if there is no view change attempt in progress.
2.A. Each peer will always attempt an election with the last committed view. Proposed views will be converted to committed views (or deleted) after leader election.
2.B. The proposal record of a peer contains (in addition to the last logged ZXID and server ID) the last committed view of the peer.

3. During election, if the last committed view of the peer with the smaller ZXID (P(ZXLOW)) is different from the last committed view of the peer with the higher ZXID (P(ZXHIGH)), then P(ZXLOW) adopts P(ZXHIGH)'s last committed view and broadcasts the adopted view to all other peers.
3.A. Two nodes with the same ZXID should have the same committed views.
3.B. If the last committed views of P(ZXLOW) and P(ZXHIGH) are the same, but P(ZXHIGH) has a proposed new view (not committed yet), that view will not be considered by either peer during the election. Similarly, if P(ZXLOW) has a proposed view, that will not be considered either.
3.C. If P(ZXLOW) adopts P(ZXHIGH)'s last committed view and that view doesn't include P(ZXLOW), P(ZXLOW) drops out of the election (should it self-destruct?).

4. Once a leader is elected, it will sync up the logs of the followers that are lagging behind, just as it is done today:
- If there is a follower whose last committed view is different from the leader's, log synchronization will make sure the follower's last committed view gets updated to be in sync with the leader's. The follower doesn't do anything when its last committed view changes (the new view MUST include the follower, since 3.C prevents a follower that is not in the leading candidate's committed view from successfully completing an election).
- If there is an observer who learns upon log synchronization that the committed view includes the observer, the observer will promote itself to a follower.
- If a follower with a proposed view joins an already established leader who doesn't know about that proposed view, the follower's proposed view will be erased when the leader synchronizes the follower's log.
- If the leader has a proposed new view in its log, the leader will send a COMMIT for the new view after a majority of peers in the old view and in the new view have synced their logs to the leader's log.
4.A. The view change COMMIT doesn't mean much for the followers that are not impacted by the view change.
4.B. The observer that gets the view change COMMIT will promote itself to a follower if the new view includes the observer.
4.C. The follower that gets the view change COMMIT will drop out of the cluster if the new view doesn't include the follower.
4.D. The leader will drop out of the cluster once the COMMIT is delivered locally if the new view doesn't include the leader. This will result in a new election.
4.E. Otherwise, the leader will adjust the quorum size as per the new view.
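The commit condition in point 4 and the role changes in 4.A through 4.D can be sketched roughly as follows. This is only an illustration of the rules as summarized above, not code from any patch; `ViewChangeSketch`, `canCommitView`, `onViewCommit` and the `Action` values are hypothetical names, and peers are identified by plain strings.

```java
import java.util.Set;

// Hypothetical sketch of the view-change rules in point 4.
final class ViewChangeSketch {
    enum Role { OBSERVER, FOLLOWER, LEADER }
    enum Action { NONE, PROMOTE_TO_FOLLOWER, LEAVE_CLUSTER }

    // True if `synced` contains a strict majority of the members of `view`.
    static boolean hasMajority(Set<String> view, Set<String> synced) {
        long count = view.stream().filter(synced::contains).count();
        return count > view.size() / 2;
    }

    // The leader sends the view-change COMMIT only once a majority of the
    // old view AND a majority of the new view have synced to its log.
    static boolean canCommitView(Set<String> oldView, Set<String> newView,
                                 Set<String> synced) {
        return hasMajority(oldView, synced) && hasMajority(newView, synced);
    }

    // Reaction of a peer when the view-change COMMIT is delivered (4.A to 4.D).
    static Action onViewCommit(String self, Role role, Set<String> newView) {
        boolean included = newView.contains(self);
        if (role == Role.OBSERVER) {
            // 4.B: an observer named in the new view becomes a follower.
            return included ? Action.PROMOTE_TO_FOLLOWER : Action.NONE;
        }
        // 4.C and 4.D: followers and the leader leave if they are excluded.
        return included ? Action.NONE : Action.LEAVE_CLUSTER;
    }
}
```

For example, with an old view {A, B, C} and a new view {A, B, C, D}, peers A and B alone satisfy the old view's majority but not the new view's, so the COMMIT would have to wait for a third synced peer.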
Allow dynamic changes to server cluster membership
--------------------------------------------------

Key: ZOOKEEPER-107
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
Project: Zookeeper
Issue Type: Improvement
Components: server
Reporter: Patrick Hunt
Assignee: Henry Robinson
Attachments: SimpleAddition.rtf

Currently cluster membership is statically defined; adding/removing hosts to/from the server cluster dynamically needs to be supported.
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730384#action_12730384 ]

Raghu S commented on ZOOKEEPER-107:
-----------------------------------

Sorry to jump around a bit; I thought I would mention this in case we haven't already talked about it. How do we plan to deal with a situation where a set of nodes can form a majority but can't form an ensemble because one or more peers have a grossly outdated configuration? Say an ensemble of ABCDE moved to EFGHI while E was offline, and only EFG are up. They form a majority but can't form an ensemble, since E doesn't know about any of the other servers yet.

One way to address this is to implement an out-of-band synchronization mechanism in which E realizes that the ensemble has changed when F and G try to connect to it, and one of them synchronizes E's logs, since their last known zxids are ahead of E's. E can then attempt to restart an election. It is also possible that F and G see different ensembles (F is a bit outdated, G is the most up to date), in which case E might first sync up from F, and then both E and F sync up from G if G comes online a bit later.

Any simpler solutions?
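The ABCDE to EFGHI scenario can be checked with a small sketch. Everything here is illustrative: `StaleViewSketch` and its methods are hypothetical names, and the `shouldSyncFrom` rule is just one possible reading of the out-of-band synchronization idea in the comment above (sync when a server outside my view contacts me with a newer zxid).

```java
import java.util.Set;

// Hypothetical sketch of the stale-configuration scenario.
final class StaleViewSketch {
    // True if the live peers include a strict majority of the given view.
    static boolean liveMajority(Set<String> view, Set<String> live) {
        long count = live.stream().filter(view::contains).count();
        return count > view.size() / 2;
    }

    // Assumed rule: a peer should sync its log before electing when a server
    // that is not in its own view contacts it with a newer zxid.
    static boolean shouldSyncFrom(Set<String> myView, long myZxid,
                                  String peer, long peerZxid) {
        return !myView.contains(peer) && peerZxid > myZxid;
    }
}
```

With live peers {E, F, G}, `liveMajority` holds for the new view EFGHI (3 of 5) but not for E's stale view ABCDE (1 of 5), which is the gap the comment describes; `shouldSyncFrom` is the point at which E would notice it.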
[jira] Created: (ZOOKEEPER-451) ZK should enforce quota
ZK should enforce quota
-----------------------

Key: ZOOKEEPER-451
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-451
Project: Zookeeper
Issue Type: Improvement
Components: server
Affects Versions: 3.3.0
Reporter: Raghu S

Email exchange with Mahadev:

Mahadev Konar wrote:

Hi Raghu,
We do have plans to enforce quota in future. Enforcing requires some more work than just reporting. Reporting is a good enough tool for operations to manage a zookeeper cluster, but we would certainly like to enforce it in the near future.
Thanks
mahadev

On 6/18/09 7:01 PM, rag...@yahoo.com rag...@yahoo.com wrote:

Is there a reason why the node count/byte quota is not actually enforced, and ZK just warns instead? Are there any plans to enforce the quota in a future release?
Thanks
Raghu
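The difference between reporting and enforcing a node-count quota can be illustrated with a small sketch. This is not ZooKeeper's quota code; `QuotaSketch`, its `Mode` values and `QuotaExceededException` are hypothetical, and only the node-count quota (not the byte quota) is modeled.

```java
// Hypothetical sketch: warn-only versus enforced node-count quota.
final class QuotaSketch {
    enum Mode { WARN_ONLY, ENFORCE }

    static final class QuotaExceededException extends RuntimeException {
        QuotaExceededException(String message) {
            super(message);
        }
    }

    private final Mode mode;
    private final int maxNodes;
    private int nodeCount;
    private int warnings;

    QuotaSketch(Mode mode, int maxNodes) {
        this.mode = mode;
        this.maxNodes = maxNodes;
    }

    // Called for each node created under the subtree the quota applies to.
    void createNode() {
        if (nodeCount + 1 > maxNodes) {
            if (mode == Mode.ENFORCE) {
                // Enforcing: the create is rejected and the count is unchanged.
                throw new QuotaExceededException(
                        "node count quota of " + maxNodes + " exceeded");
            }
            // Reporting: record a warning, but let the create go through.
            warnings++;
        }
        nodeCount++;
    }

    int nodeCount() {
        return nodeCount;
    }

    int warnings() {
        return warnings;
    }
}
```

In `WARN_ONLY` mode a third create against a quota of two succeeds and only bumps the warning counter, which matches the behavior the question is about; in `ENFORCE` mode the same call fails.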
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721554#action_12721554 ]

Raghu S commented on ZOOKEEPER-107:
-----------------------------------

That sounds great! I know this is a complex task and a lot of work; I can live with the kinks in the beginning.
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721566#action_12721566 ]

Raghu S commented on ZOOKEEPER-107:
-----------------------------------

Henry, the JIRA is unassigned. You might want to assign it to yourself.
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720334#action_12720334 ]

Raghu S commented on ZOOKEEPER-107:
-----------------------------------

Ben, to be honest, I wasn't thinking of batch addition/deletion. I was thinking we would allow only one node to join or leave the cluster at a time, in which case we won't end up in a split brain.

One thing I am still missing is how we plan to reconcile the divergence in configuration info during leader election if we use ZAB. With ZAB, we go ahead and write to the log as soon as a PROPOSAL is sent. COMMIT is used only to notify the servers that a majority have logged the update and that clients can start reading the new update. So I am not really seeing how this will help a configuration change.

Now, in the example that you bring up, if D, E and F have logged the new view and all the nodes are brought up after a power cycle, a split brain could still occur, no? Should we allow only one node to be added/deleted at a time?
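The point about ZAB logging on PROPOSAL can be sketched as follows. This is a simplification for illustration, not ZooKeeper's implementation; `ZabLogSketch` and its methods are hypothetical names, and the log holds bare zxids instead of transactions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: entries are logged on PROPOSAL, exposed on COMMIT.
final class ZabLogSketch {
    // zxids written to the log as soon as a PROPOSAL is received.
    private final List<Long> log = new ArrayList<>();
    // Highest zxid that clients are allowed to read.
    private long lastCommitted = 0;

    void onProposal(long zxid) {
        log.add(zxid);
    }

    // COMMIT does not write anything new; it only advances visibility.
    void onCommit(long zxid) {
        if (!log.contains(zxid)) {
            throw new IllegalStateException("commit for unlogged zxid " + zxid);
        }
        lastCommitted = Math.max(lastCommitted, zxid);
    }

    long lastLogged() {
        return log.isEmpty() ? 0 : log.get(log.size() - 1);
    }

    long lastCommitted() {
        return lastCommitted;
    }

    // True if the log ends with entries that were never committed, for
    // example a view change that was proposed just before a power cycle.
    boolean hasUncommittedTail() {
        return lastLogged() > lastCommitted;
    }
}
```

A peer that restarts with `hasUncommittedTail()` true may be holding a new view that other peers never logged, which is the configuration divergence the comment asks about.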
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720411#action_12720411 ]

Raghu S commented on ZOOKEEPER-107:
-----------------------------------

Ben, I still believe the split brain won't occur:

A. After (2), A and C have config version X + 1; B and D are at X.
B. After A dies, a leader election is not possible without C. During LE, B and D discover that C is at X + 1. This will force B and D to update their configuration to X + 1 and restart the election. This is what I refer to when I say reconciling configuration divergence in my write-up. D now leaves the cluster, since it just learnt that it was deleted.
C. A new quorum is formed with B and C.
D. When A comes back, the config versions of A, B and C are the same. A will simply join the leader. If A were still at X, then it would first update its configuration to X + 1 when it starts an election and then restart the election.
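Step B above can be sketched as a single decision taken when a peer sees another peer's configuration during leader election. The names (`ConfigReconcileSketch`, `Config`, `Outcome`) are hypothetical, and the member sets used in the example (X = {A, B, C, D}, X + 1 = {A, B, C}) are an assumption inferred from the comment, which only states that D was deleted.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of configuration reconciliation during leader election.
final class ConfigReconcileSketch {
    enum Outcome { KEEP, ADOPT_AND_RESTART, ADOPT_AND_LEAVE }

    static final class Config {
        final long version;
        final Set<String> members;

        Config(long version, Set<String> members) {
            this.version = version;
            this.members = new HashSet<>(members);
        }
    }

    // What `self` does on learning another peer's configuration.
    static Outcome onPeerConfig(String self, Config mine, Config theirs) {
        if (theirs.version <= mine.version) {
            return Outcome.KEEP; // nothing newer to learn
        }
        // A newer configuration is adopted; a peer that is no longer a
        // member leaves, every other peer restarts the election.
        return theirs.members.contains(self)
                ? Outcome.ADOPT_AND_RESTART
                : Outcome.ADOPT_AND_LEAVE;
    }
}
```

Under the assumed member sets, B seeing C's X + 1 yields `ADOPT_AND_RESTART`, D seeing it yields `ADOPT_AND_LEAVE`, and C seeing B's older X yields `KEEP`.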
[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12706697#action_12706697 ]

Raghu S commented on ZOOKEEPER-107:
-----------------------------------

I think there are some corner cases that may make leader election impossible during a node addition. Say the current config is A,B,C and the new config is A,B,C,D. While the leader is trying to commit the new configuration, the power goes out, and it comes back on when only A and B have logged the new configuration. The peer count as seen by A,B,C,D is now 4,4,3,3. An election is not possible if C is down, because A and B think the majority is 3 peers and D can't participate in the election since it hasn't joined the cluster yet.

It sounds like some out-of-band communication between an existing peer and a new peer is needed to make this work. If a peer restarts or notices quorum loss, and the last logged update is a node addition, the peer should try to contact the newly added server so that it can push its log to the new peer (if the new peer doesn't already have an up-to-date log) and ask the new peer to restart. Until A or B does that in the above case, an election may not be possible.
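The arithmetic behind the 4,4,3,3 example can be written out as a short sketch, assuming a simple strict-majority quorum of `n / 2 + 1` peers. `MajoritySketch` and its method names are hypothetical.

```java
// Hypothetical sketch of the majority arithmetic in the example above.
final class MajoritySketch {
    // Smallest number of peers that forms a strict majority of peerCount.
    static int majority(int peerCount) {
        return peerCount / 2 + 1;
    }

    // A peer can complete an election only if the voters it can reach
    // (itself included) reach the majority of its own view of the cluster.
    static boolean canElect(int peerCountInMyView, int reachableVoters) {
        return reachableVoters >= majority(peerCountInMyView);
    }
}
```

A and B each see 4 peers, so they need `majority(4) = 3` voters; with C down and D not yet participating they can reach only 2, and `canElect(4, 2)` is false. Once D has been contacted and joins, `canElect(4, 3)` is true.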