Re: [Zookeeper-user] Leader election

2008-07-11 Thread Flavio Junqueira
Hi Avinash, getChildren returns a list in lexicographic order, so if you are updating the children of the election node concurrently, then you may get a different first node with different clients. If you are using the sequence flag to create nodes, then you may consider stripping the prefix of
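
The snippet above is cut off, so the following is only one plausible reading of the advice: a minimal sketch that picks the election candidate by the numeric suffix ZooKeeper appends to sequential znodes, rather than by the full name. The class name, method names, and sample child names are hypothetical; the only assumption about ZooKeeper itself is that the sequence suffix is a 10-digit, zero-padded counter.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ElectionOrder {
    // Parse the trailing 10-digit sequence counter, ignoring whatever prefix precedes it.
    static long sequenceOf(String childName) {
        return Long.parseLong(childName.substring(childName.length() - 10));
    }

    // Return the child with the smallest sequence number (input list is not modified).
    static String lowest(List<String> children) {
        List<String> sorted = new ArrayList<>(children);
        sorted.sort(Comparator.comparingLong(ElectionOrder::sequenceOf));
        return sorted.get(0);
    }

    public static void main(String[] args) {
        // Hypothetical child names; plain lexicographic order would put "a-0000000003" first.
        List<String> children = List.of("b-0000000002", "c-0000000001", "a-0000000003");
        System.out.println(lowest(children));
    }
}
```

Comparing only the suffix means that clients using different prefixes still agree on the same lowest node.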

RE: Migrating from sourceforge 2.2.1 to Apache trunk - QuorumPeers failing to find each other

2008-08-27 Thread Flavio Junqueira
Mark, Please use a port for electionPort different from the one you're using in the server configuration. Thanks, -Flavio > -Original Message- > From: mark harwood [mailto:[EMAIL PROTECTED] > Sent: Wednesday, August 27, 2008 1:12 PM > To: zookeeper-user@hadoop.apache.org > Subject: Migrat

RE: Migrating from sourceforge 2.2.1 to Apache trunk - QuorumPeers failing to find each other

2008-08-28 Thread Flavio Junqueira
With the new leader election, we require a third port. So, there is the clientPort, there is the port servers use for communication upon regular ZooKeeper operation, and a third port for leader election among the ZooKeeper servers. The previous leader election algorithm used UDP, and the regular co

RE: Leader election stalled

2008-09-16 Thread Flavio Junqueira
Austin, Please check: https://issues.apache.org/jira/browse/ZOOKEEPER-140 Thanks, -Flavio > -Original Message- > From: Austin Shoemaker [mailto:[EMAIL PROTECTED] > Sent: Tuesday, September 16, 2008 12:22 PM > To: zookeeper-user@hadoop.apache.org > Subject: Re: Leader election stalled >

RE: Dynamic server management?

2008-11-17 Thread Flavio Junqueira
Hi Thomas, We currently don't have such a feature of adding and removing servers dynamically, although we would like to, so we'll have it eventually. Without a dynamic mechanism for adding and removing servers, your example is problematic. Suppose that you configure your ensemble to have 3 servers

RE: Dynamic server management?

2008-11-17 Thread Flavio Junqueira
HiPath Applications > > SEN LIP DA 11 > Schertlinstr. 8 > 81379 Munich, Germany > > -Original Message- > From: Flavio Junqueira [mailto:[EMAIL PROTECTED] > Sent: Monday, November 17, 2008 13:49 > To: zookeeper-user@hadoop.apache.org > Subject: RE: Dynamic server m

Re: myid....

2009-01-04 Thread Flavio Junqueira
Hi Kevin, The admin doesn't need to create "myid". This is created automatically after parsing the configuration file (check QuorumPeerConfig.parse(String[]) if you are interested in the internals). -Flavio On Jan 5, 2009, at 1:10 AM, Kevin Burton wrote: This wasn't clear in the documentat

Re: myid....

2009-01-05 Thread Flavio Junqueira
uorumPeerConfig code tomorrow.. Kevin On Sun, Jan 4, 2009 at 11:58 PM, Flavio Junqueira wrote: Hi Kevin, The admin doesn't need to create "myid". This is created automatically after parsing the configuration file (check QuorumPeerConfig.parse(String[]) if you are in

Re: Reconnecting to another host on failure but before session expires...

2009-01-05 Thread Flavio Junqueira
new ClientCnxn(host, sessionTimeout, this, watchManager, sessionId, sessionPasswd); } On Mon, Jan 5, 2009 at 1:51 AM, Flavio Junqueira wrote: Are you guys passing one server to the ZooKeeper constructor or a list of servers? If possible, could you provide your part of the

Re: Reconnecting to another host on failure but before session expires...

2009-01-05 Thread Flavio Junqueira
Are you guys passing one server to the ZooKeeper constructor or a list of servers? If possible, could you provide your part of the code in which you create a ZooKeeper object? Thanks, -Flavio On Jan 5, 2009, at 10:46 AM, David Yee wrote: I'm seeing this behavior as well, and I'm dealing wit

Re: group messaging, empheral nodes on zookeeper

2009-01-06 Thread Flavio Junqueira
If I understand it correctly, you propose two mechanisms: 1- Have one single node, and modify the data of that znode; 2- Have a znode, say "/broadcast", and have clients creating a new child znode under "/broadcast" for every new message they want to broadcast. In case 1), if you are propos

Re: group messaging, empheral nodes on zookeeper

2009-01-06 Thread Flavio Junqueira
ages ahead. To avoid synchronization, clients have to start at different points of the message sequence. This might not be useful to you, but I wonder if this is useful to others in this list. -Flavio On Jan 6, 2009, at 6:48 PM, Flavio Junqueira wrote: If I understand it correctly, you propose two

Re: group messaging, empheral nodes on zookeeper

2009-01-06 Thread Flavio Junqueira
On Jan 6, 2009, at 6:55 PM, Kevin Burton wrote: In case 1), if you are proposing to overwrite the content of the znode, then you would need first to make sure that all receivers have already received the previous message. This doesn't seem a good solution to me because a client that want

Re: Distributed queue: how to ensure no lost items?

2009-01-08 Thread Flavio Junqueira
You can't simply leave an element in the queue until a consumer finishes processing it, otherwise multiple consumers may end up processing it. What about the following: - Use a failure detector to detect which consumers are up; - Before removing an element from the queue, a consumer creates a

Re: SyncRequestProcessor Possible Bug

2009-01-21 Thread Flavio Junqueira
Andrew, It sounds right to me that this is a bug. In any case, I'd like to suggest that you open a jira issue and propose your patch. -Flavio On Jan 21, 2009, at 6:52 AM, Andrew Carman wrote: We were looking through the ZK server code and came across a possible bug that occurs when shutti

Re: [ANNOUNCE] Apache ZooKeeper 3.1.0

2009-02-20 Thread Flavio Junqueira
Hi Bill, I'm sorry, I missed this message initially. I'm sending below a table that gives you throughput figures for BookKeeper. The rows correspond to distinct BookKeeper configurations (ensemble size, quorum size, entry type), and the columns to different values for the length of an entry

Re: [ANNOUNCE] Apache ZooKeeper 3.1.0

2009-02-20 Thread Flavio Junqueira
Also, you may consider checking a graph that we posted comparing the performance of BookKeeper with that of HDFS using a local file system and local+NFS in the jira issue 5189 (https://issues.apache.org/jira/browse/HADOOP-5189). -Flavio On Feb 20, 2009, at 10:05 AM, Flavio Junqueira

Sharing logs for research purposes

2009-04-02 Thread Flavio Junqueira
I was wondering if anyone in this list would be willing to share ZooKeeper logs for research purposes, even if anonymized. As there are a few research groups in different universities conducting research using ZooKeeper, it would be useful for them to have workloads they can work with. If y

Re: Leader Elections

2009-07-18 Thread Flavio Junqueira
Todd, I don't think there is a feature right now that allows you to do exactly what you're requesting. However, we have been working on a couple of features that might give you what you want: 1- Hierarchical quorums: this feature allows you to split servers into groups (perhaps mapping groups to com

Re: Leader Elections

2009-07-20 Thread Flavio Junqueira
requirements. Do bear in mind that the patch on the jira is only for discussion purposes; I would not consider it currently fit for production use. I hope to put up a much better patch this week. Henry On Sat, Jul 18, 2009 at 7:38 PM, Ted Dunning wrote: Can you submit updates via an observ

Re: Leader Elections

2009-07-20 Thread Flavio Junqueira
better patch this week. Henry On Sat, Jul 18, 2009 at 7:38 PM, Ted Dunning wrote: Can you submit updates via an observer? On Sat, Jul 18, 2009 at 6:38 AM, Flavio Junqueira wrote: 2- Observers: you could have one computing center containing an ensemble and observers around the edge just learning committed values. -- Ted Dunning, CTO DeepDyve

Re: Zookeeper WAN Configuration

2009-07-24 Thread Flavio Junqueira
Servers in a quorum need to be able to talk to each other to elect a leader. Once a leader is elected, followers only talk to the leader. Of course, if the leader fails, servers in some quorum will need to talk to each other again. If no quorum can be formed, the system is stalled. -Flavi
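
As a rough illustration of the quorum requirement described above, here is a minimal sketch that assumes simple majority quorums (more than half of the ensemble). The class and method names are hypothetical, and this is not ZooKeeper's own election code.

```java
class MajorityQuorum {
    // Smallest number of servers that forms a strict majority of the ensemble.
    static int quorumSize(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    // A leader can be elected only if a majority of servers can talk to each other.
    static boolean canElectLeader(int ensembleSize, int reachableServers) {
        return reachableServers >= quorumSize(ensembleSize);
    }

    public static void main(String[] args) {
        // With 5 servers, 3 reachable servers can elect a leader; with 2 the system is stalled.
        System.out.println(canElectLeader(5, 3));
        System.out.println(canElectLeader(5, 2));
    }
}
```

Under this assumption, an ensemble of 5 tolerates 2 failures, while an ensemble of 4 still needs 3 servers and so tolerates only 1.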

Re: Zookeeper WAN Configuration

2009-07-24 Thread Flavio Junqueira
Just a few quick observations: On Jul 24, 2009, at 4:40 PM, Ted Dunning wrote: On Fri, Jul 24, 2009 at 4:23 PM, Todd Greenwood wrote: Could you explain the idea behind the Observers feature, what this concept is supposed to address, and how it applies to the WAN configuration problem in parti

Re: Zookeeper WAN Configuration

2009-07-25 Thread Flavio Junqueira
hat Flavio mentions below. -Original Message----- From: Flavio Junqueira [mailto:f...@yahoo-inc.com] Sent: Friday, July 24, 2009 4:50 PM To: zookeeper-user@hadoop.apache.org Subject: Re: Zookeeper WAN Configuration Just a few quick observations: On Jul 24, 2009, at 4:40 PM, Ted Dunning wrote

Re: Zookeeper WAN Configuration

2009-07-26 Thread Flavio Junqueira
Todd, Answers inline: On Jul 26, 2009, at 11:05 AM, Todd Greenwood wrote: Flavio, thank you for the suggestion. I have looked at the documentation (relevant snippets pasted in below), and looked at the presentations (http://wiki.apache.org/hadoop/ZooKeeper/ZooKeeperPresentations), but I sti

Re: Zookeeper WAN Configuration

2009-07-27 Thread Flavio Junqueira
Todd, Some more answers. Please check out carefully the information at the bottom of this message. On Jul 27, 2009, at 4:02 PM, Todd Greenwood wrote: I'm assuming that you're setting the weight of ZooKeeper servers in PODs to zero, which means that their votes when ordering updates do not co

Re: test failures in branch-3.2

2009-07-30 Thread Flavio Junqueira
Todd, On Jul 30, 2009, at 5:08 PM, Todd Greenwood wrote: The build succeeds, but not all of the tests. In previous test runs, I noticed an error in org.apache.zookeeper.test.FLETest. It was not able to bind to a port or something. Now, after a machine reboot, I'm getting different fai

Re: Unending Leader Elections in WAN deploy

2009-07-31 Thread Flavio Junqueira
You're missing 491 from your set of patches. -Flavio On Jul 31, 2009, at 7:15 PM, Todd Greenwood wrote: This repro's in both branch-3.2, and branch-3.2+patches(473, 479, 481). Basically, it seems like the nodes are electing pd4-zook02 to be the leader. However, pd4-zook02 seems to realize i

Re: Unending Leader Elections in WAN deploy

2009-07-31 Thread Flavio Junqueira
er Elections in WAN deploy Ok, I'll apply that patch and report back. -Todd -Original Message- From: Flavio Junqueira [mailto:f...@yahoo-inc.com] Sent: Friday, July 31, 2009 7:18 PM To: zookeeper-user@hadoop.apache.org Subject: Re: Unending Leader Elections in WAN deploy You're

Re: Unending Leader Elections in WAN deploy

2009-07-31 Thread Flavio Junqueira
: Flavio Junqueira [mailto:f...@yahoo-inc.com] Sent: Friday, July 31, 2009 7:48 PM To: zookeeper-user@hadoop.apache.org Subject: Re: Unending Leader Elections in WAN deploy It should be in 479. Perhaps you have a stale version of the patch. -Flavio On Jul 31, 2009, at 7:46 PM, Todd Greenwood

Re: Question about ephemeral nodes

2009-08-13 Thread Flavio Junqueira
Hi Qian, It would be useful to have access to the log and to know which version of ZooKeeper you're using. You may want to open a jira and attach the log there. Thanks, -Flavio On Aug 13, 2009, at 7:51 AM, Qian Ye wrote: Hi all: My friend encountered a problem when using ZooKeeper. He bui

Re: Watches

2009-08-31 Thread Flavio Junqueira
I agree with Mahadev that it sounds like a stretch. I just wanted to point out that we have been working on a new feature that would certainly help in this case: Observers (ZOOKEEPER-368). I don't think we have decided yet in which release we will include it, but my current guess is 4.0.0 a

Re: Watches

2009-08-31 Thread Flavio Junqueira
I forgot to mention this. You may also consider adding more zookeeper servers and setting the weight of such servers to zero. We will be introducing this possibility in 3.2.1 (the upcoming release). Zero-weight servers simulate observers, but they do not behave exactly as observers, since
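
To make the effect of a zero weight concrete, here is a simplified sketch of a weighted-majority check. It assumes a single group in which a quorum needs strictly more than half of the total weight; the class name, method name, server ids, and weights are all hypothetical, and this is not ZooKeeper's actual QuorumVerifier implementation.

```java
import java.util.Map;
import java.util.Set;

class WeightedQuorum {
    // True if the acknowledging servers hold strictly more than half of the total weight.
    static boolean hasQuorum(Map<Integer, Long> weights, Set<Integer> acks) {
        long total = 0;
        long acked = 0;
        for (Map.Entry<Integer, Long> e : weights.entrySet()) {
            total += e.getValue();
            if (acks.contains(e.getKey())) {
                acked += e.getValue();
            }
        }
        return 2 * acked > total;
    }

    public static void main(String[] args) {
        // Servers 1-3 vote with weight 1; servers 4 and 5 have weight 0.
        Map<Integer, Long> weights = Map.of(1, 1L, 2, 1L, 3, 1L, 4, 0L, 5, 0L);
        System.out.println(hasQuorum(weights, Set.of(1, 2)));    // two voting servers
        System.out.println(hasQuorum(weights, Set.of(1, 4, 5))); // one voting server plus zero-weight ones
    }
}
```

In this model, acknowledgements from zero-weight servers never help form a quorum, which is the sense in which they resemble observers.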

Re: The idea behind 'myid'

2009-09-30 Thread Flavio Junqueira
We just need a unique identifier for every server. If such an identifier "magically" appears somehow, then I believe our protocols will be equally happy. Now, a mechanism to assign ids would also have to take into consideration the group scheme we have for hierarchical quorums. To assign se

Re: Killing a zookeeper server

2010-01-13 Thread Flavio Junqueira
Hi Nick, Your assessment sounds correct; the issue seems to be caused by the bug described in ZOOKEEPER-427. Can't you upgrade to a newer release? Killing the leader should do it, but the bug will still be there, so I recommend upgrading. Thanks, -Flavio On Jan 12, 2010, at 10:52 PM, Nick

Re: Question regarding Membership Election

2010-01-14 Thread Flavio Junqueira
Hi Vijay, I'm just curious: why exactly do you want all voting nodes in a single data center? Are you concerned about latency? It might not be possible in your case, but if you have a third location available, you would be able to tolerate one location going down. -Flavio On Jan 14, 2010, a

Re: Question regarding Membership Election

2010-01-15 Thread Flavio Junqueira
2010, at 11:00 PM, Vijay wrote: Hi Flavio, Yes, I am concerned about the latency between the DCs (across continents). We actually have 6 locations, but how exactly are we going to do it if we have the 3rd DC? Regards, On Thu, Jan 14, 2010 at 1:46 PM, Flavio Junqueira wrot

Re: Namespace partitioning ?

2010-01-15 Thread Flavio Junqueira
Hi, Mahadev said it all, we have been thinking about it for a while, but haven't had time to work on it. I also don't think we have a jira open for it; at least I couldn't find one. But, we did put together some comments: http://wiki.apache.org/hadoop/ZooKeeper/PartitionedZookeeper

Re: Managing multi-site clusters with Zookeeper

2010-03-15 Thread Flavio Junqueira
On top of Ben's description, you probably need to set initLimit to several minutes to transfer 700MB (worst case). The value of syncLimit, however, does not need to be that large. -Flavio On Mar 15, 2010, at 7:24 PM, Benjamin Reed wrote: it is a bit confusing but initLimit is the timer tha
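
As a back-of-the-envelope aid for the advice above, here is a minimal sketch that estimates how many ticks initLimit would need to cover a snapshot transfer. It relies on initLimit being expressed in ticks and tickTime in milliseconds; the class name, method names, the 5 MB/s bandwidth, and the 2000 ms tickTime are assumptions made for illustration only.

```java
class InitLimitEstimate {
    // Integer division rounded up, for positive operands.
    static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }

    // Number of ticks needed to cover transferring snapshotBytes at bytesPerSecond.
    static int initLimitTicks(long snapshotBytes, long bytesPerSecond, int tickTimeMs) {
        long transferMs = ceilDiv(snapshotBytes * 1000, bytesPerSecond);
        return (int) ceilDiv(transferMs, tickTimeMs);
    }

    public static void main(String[] args) {
        long snapshot = 700L * 1024 * 1024; // 700 MB, the worst case mentioned above
        long bandwidth = 5L * 1024 * 1024;  // assumed 5 MB/s between sites
        System.out.println(initLimitTicks(snapshot, bandwidth, 2000));
    }
}
```

With these assumed numbers the transfer takes 140 seconds, so initLimit would need to be at least 70 ticks, plus some headroom.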

Re: zookeeper consistency model?

2010-04-29 Thread Flavio Junqueira
Hi Chen, Let's say that the value of a znode "/test" is initially v and client A writes value v' to znode "/test". If the server that client B is connected to has not persisted the update operation of A, it will read v. If it submits sync before the read, client B will read v'. -Flavio O
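
The scenario above can be mimicked with a toy model. This is only an illustrative sketch, not the ZooKeeper client API: the class and its methods are hypothetical stand-ins for a leader, a lagging server that client B is connected to, and the effect of calling sync before reading.

```java
class LaggingReplica {
    private String leaderValue = "v";   // latest committed value of "/test" at the leader
    private String followerValue = "v"; // value applied so far by client B's server

    // Client A's write is committed at the leader but has not reached B's server yet.
    void write(String value) {
        leaderValue = value;
    }

    // sync brings client B's server up to date with the leader.
    void sync() {
        followerValue = leaderValue;
    }

    // Reads are served locally by the server the client is connected to.
    String read() {
        return followerValue;
    }

    public static void main(String[] args) {
        LaggingReplica znode = new LaggingReplica();
        znode.write("v'");
        System.out.println(znode.read()); // may still be the old value
        znode.sync();
        System.out.println(znode.read()); // now reflects client A's write
    }
}
```

With the real client, the corresponding pattern would be to issue sync on the path and then read the data.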

Re: Problems with ZooKeeper apache wiki.

2010-05-14 Thread Flavio Junqueira
Hi Sudipto, I'm sorry but I don't have slides I can share that include a description of leader election. I'll work on it when I have a chance. In any case, you may consider inspecting the code if you need it urgently. -Flavio On May 13, 2010, at 10:54 PM, Sudipto Das wrote: Thanks Mahade

BookKeeper Performance Figures

2010-05-18 Thread Flavio Junqueira
Just in case anyone is interested, I've posted some BookKeeper performance figures here: http://wiki.apache.org/hadoop/BookKeeperPerfPage I'll be adding more numbers soon. -Flavio

Re: Concurrent reads and writes on BookKeeper

2010-05-19 Thread Flavio Junqueira
Hi Andre, To guarantee that two clients that read from a ledger will read the same sequence of entries, we need to make sure that there is agreement on the end of the sequence. A client is still able to read from an open ledger, though. We have an open jira about informing clients of the pr

Re: Concurrent reads and writes on BookKeeper

2010-05-20 Thread Flavio Junqueira
t I'd certainly be happy to hear a different perspective. Having said that, we have interesting projects to get folks involved with BK, but it is not clear to me that this is one of them. -Flavio On May 20, 2010, at 1:36 AM, Patrick Hunt wrote: On 05/19/2010 01:23 PM, Flavio Junqu

Re: zookeeper crash

2010-06-02 Thread Flavio Junqueira
Hi Charity, This is certainly not expected. It would be very useful if you could provide us with as much information about your issue as possible. I would suggest that either you create a new jira and link it to ZOOKEEPER-335, or that you add to 335 directly. We'll be looking further into w

Re: zookeeper crash

2010-06-16 Thread Flavio Junqueira
I would recommend opening a separate jira issue. I'm not convinced the issues are the same, so I'd rather keep them separate and link the issues if it is the case. -Flavio On Jun 17, 2010, at 12:16 AM, Patrick Hunt wrote: We are unable to reproduce this issue. If you can provide the server

Re: Zookeeper outage recap & questions

2010-06-30 Thread Flavio Junqueira
Hi Travis, Do you think it would be possible for you to open a jira and upload your logs? Thanks, -Flavio On Jul 1, 2010, at 8:13 AM, Travis Crawford wrote: Hey zookeepers - We just experienced a total zookeeper outage, and here's a quick post-mortem of the issue, and some questions about preventing it g

Re: Achieving quorum with only half of the nodes

2010-07-14 Thread Flavio Junqueira
Hi Sergei, I'm not sure what the implementation of QuorumVerifier you have in mind would look like to make your setting work. Even if you don't have partitions, variation in message delays can cause inconsistencies in your ZooKeeper cluster. Keep in mind that we make the assumption that quorums int

Re: Achieving quorum with only half of the nodes

2010-07-15 Thread Flavio Junqueira
? Would it result in considerable performance impact due to network latency? I hope that at least in theory since quorum can be reached without ack from EC2 node performance impact might be manageable. Regards, Sergei On 07/14/2010 04:52 PM, Flavio Junqueira wrote: Hi Sergei, I'm not sure wha

Re: ZooDefs.Sync

2010-09-18 Thread Flavio Junqueira
Avinash, The sync() call flushes the pending updates between the leader and a follower. Check the "consistency guarantees" section here: http://hadoop.apache.org/zookeeper/docs/r3.3.1/zookeeperProgrammers.html -Flavio On Sep 17, 2010, at 10:29 PM, Avinash Lakshman wrote: What is a Sync OpCode stand for

Re: BookKeeper newbie question

2010-10-01 Thread Flavio Junqueira
Thanks for your questions, Amit. On Sep 28, 2010, at 6:37 PM, amit jaiswal wrote: Hi, I am experimenting with BookKeeper and have a question on LedgerHandler class. The readEntries(firstEntry, lastEntry) method takes the indexes of first and last entries. Also, the LedgerSequence object r

Re: Changing configuration

2010-10-07 Thread Flavio Junqueira
We don't have dynamic configuration yet, but it is on our todo list: http://wiki.apache.org/hadoop/ZooKeeper/ClusterMembership so for now I believe you would have to reconfigure manually and restart the cluster. For Zab, you should be looking at org.apache.zookeeper.server.quorum. Cheers, -Flavio On Oc

Re: Is it possible to read/write a ledger concurrently

2010-10-22 Thread Flavio Junqueira
I thought we had agreed at some point that the application should do it in the case it needs this feature. That is, every so often the app writer either writes to ZooKeeper its last confirmed write or it sends directly to the reader. Knowing a confirmed write x enables the reader to read up to x. -F
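
The idea of bounding reads by the last confirmed write can be sketched with a toy in-memory ledger. This is a hedged illustration only: the class, the method, and the list standing in for a ledger are hypothetical and do not use the BookKeeper API.

```java
import java.util.ArrayList;
import java.util.List;

class ConfirmedReads {
    // Return entries 0..lastConfirmed (inclusive); later entries are not yet safe to read.
    // A lastConfirmed of -1 means the writer has confirmed nothing so far.
    static List<String> readable(List<String> ledger, long lastConfirmed) {
        long end = Math.min(lastConfirmed + 1, ledger.size());
        return new ArrayList<>(ledger.subList(0, (int) Math.max(end, 0)));
    }

    public static void main(String[] args) {
        // The writer has appended three entries but has only published x = 1 as confirmed.
        List<String> ledger = List.of("e0", "e1", "e2");
        System.out.println(readable(ledger, 1));
    }
}
```

How the writer publishes x (through a znode or a direct message to the reader) is left to the application, as the message suggests.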

Re: Zookeeper leader stop and restart question

2010-11-01 Thread Flavio Junqueira
Hi Ruifang, It is not clear to me if you verified that leader1 restarted correctly. Was it able to join the ensemble by following leader2? -Flavio On Nov 1, 2010, at 8:09 PM, Ruifang Ge wrote: Hi, I started a 5-node zookeeper cluster, then killed the leader (leader1). One of the other server became th