Re: Running cluster behind load balancer
DNS round-robin works as well.

On Wed, Nov 3, 2010 at 3:45 PM, Benjamin Reed wrote:
> it would have to be a TCP-based load balancer to work with ZooKeeper
> clients, but other than that it should work really well. The clients will
> be doing heartbeats, so the TCP connections will be long-lived. The client
> library does random connection load balancing anyway.
>
> ben
>
> On 11/03/2010 12:19 PM, Luka Stojanovic wrote:
>> What would be the expected behavior if a three-node cluster is put behind
>> a load balancer? It would ease deployment, because all clients would be
>> configured to target zookeeper.example.com regardless of the actual
>> cluster configuration, but I have the impression that the client-server
>> connection is stateful and that jumping randomly from server to server
>> could bring strange behavior.
>>
>> Cheers,
>>
>> --
>> Luka Stojanovic
>> lu...@vast.com
>> Platform Engineering
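A minimal sketch of what DNS round-robin gives you: a single name (like the zookeeper.example.com in this thread) can resolve to several server addresses, so clients need only one hostname in their configuration. The hostname here is "localhost" purely so the sketch is self-contained; in practice you'd publish one A record per ZooKeeper server.

```python
import socket

def resolve_all(host, port):
    """Return every distinct address a DNS name resolves to; with
    round-robin DNS, one name can map to all servers in the ensemble."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry's sockaddr starts with the IP address string.
    return sorted({info[4][0] for info in infos})

# "localhost" stands in for zookeeper.example.com in this sketch.
addrs = resolve_all("localhost", 2181)
print(addrs)
```

Note this only spreads the *initial* connection; the client still holds one long-lived TCP session to whichever server it picked.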
Re: Running cluster behind load balancer
it would have to be a TCP-based load balancer to work with ZooKeeper clients, but other than that it should work really well. The clients will be doing heartbeats, so the TCP connections will be long-lived. The client library does random connection load balancing anyway.

ben

On 11/03/2010 12:19 PM, Luka Stojanovic wrote:
> What would be the expected behavior if a three-node cluster is put behind
> a load balancer? It would ease deployment, because all clients would be
> configured to target zookeeper.example.com regardless of the actual
> cluster configuration, but I have the impression that the client-server
> connection is stateful and that jumping randomly from server to server
> could bring strange behavior.
>
> Cheers,
>
> --
> Luka Stojanovic
> lu...@vast.com
> Platform Engineering
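The "random connection load balancing" Ben mentions can be pictured with a simplified sketch (this is not the actual client code, just the idea: the client splits its connect string into a host list and shuffles it, so different clients spread their sessions across the ensemble; the zk1/zk2/zk3 hostnames are placeholders):

```python
import random

def connect_order(connect_string):
    """Shuffle the servers from a ZooKeeper-style connect string, the way
    the client library randomizes which server it tries first."""
    servers = connect_string.split(",")
    random.shuffle(servers)  # each client instance gets its own random order
    return servers

order = connect_order("zk1:2181,zk2:2181,zk3:2181")
print(order)  # some permutation of the three servers
```

Once a connection is established it stays pinned to that server (the session is stateful), which is why a balancer that re-routes packets mid-session would cause trouble, while one that only spreads initial TCP connects is harmless.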
Re: Getting a "node exists" code on a sequence create
We're building a product for users who aren't used to deploying distributed systems, and we're trying to make it as easy to configure and use as possible. That means not requiring the full list of IP addresses for every node at configuration time; instead, each node can be configured with a single IP of an existing node, and the cluster can grow dynamically as new nodes come up. Furthermore, if a new node needs to be added to the cluster, it can be done without changing the configurations of the existing nodes.

As you might be able to guess, we're really hoping for progress in the future on a version of ZooKeeper that can properly support dynamic joins, but for now this approach seems to work OK for us.

Jeremy

On 11/03/2010 11:37 AM, Benjamin Reed wrote:
> yes, i think you have summarized the problem nicely jeremy. i'm curious
> about your reasoning for running servers in standalone mode and then
> merging. can you explain that a bit more?
>
> thanx
> ben
>
> On 11/01/2010 04:51 PM, Jeremy Stribling wrote:
>> I think this is caused by stupid behavior on our application's part, and
>> the error message just confused me. Here's what I think is happening:
>>
>> 1) 3 servers are up and accepting data, creating sequential znodes under
>> /zkrsm.
>> 2) 1 server dies; the other 2 continue creating sequential znodes.
>> 3) The 1st server restarts, but instead of joining the other 2 servers,
>> it starts an instance by itself, knowing only about the znodes created
>> before it died. [This is a bug in our application -- it is supposed to
>> join the other 2 servers in their cluster.]
>> 4) Another server (#2) dies and restarts, joining the cluster of server
>> #1. It knows about more sequential znodes under /zkrsm than server #1.
>> 5) At this point, trying to create a new znode in the #1-#2 cluster can
>> be problematic, because servers #1 and #2 know about different sets of
>> znodes. If #1 allocates what it thinks is a new sequential number for a
>> new znode, it could be one already used by server #2, and hence a "node
>> exists" code might be returned.
>>
>> So, in summary, our application is almost certainly using ZooKeeper
>> wrong. Sorry to waste time on the list, but maybe this thread can help
>> someone in the future. (If this explanation sounds totally off-base
>> though, let me know. I'm not 100% certain this is what's happening, but
>> it definitely seems likely.)
>>
>> Thanks,
>> Jeremy
>>
>> On 11/01/2010 02:56 PM, Jeremy Stribling wrote:
>>> Yes, every znode in /zkrsm was created with the sequence flag. We bring
>>> up a cluster of three nodes, though we do it in a slightly odd manner
>>> to support dynamism: each node starts up as a single-node instance
>>> knowing only itself, and then each node is contacted by a coordinator
>>> that kills the ZooKeeperServer object and starts a new QuorumPeer
>>> object using the full list of three servers. I know this is weird;
>>> perhaps this has something to do with it.
>>>
>>> Other than the weird setup behavior, we are just writing a few
>>> sequential records into the system (which all seems to work fine),
>>> killing one of the nodes (one that has been elected leader via the
>>> standard recommended ZK leader election algorithm), restarting it, and
>>> then trying to create more sequential znodes. I'm guessing this is
>>> pretty well-tested behavior, so there must be something weird or wrong
>>> about the way I have things set up. I'm happy to provide whatever logs
>>> or snapshots might help someone track this down.
>>>
>>> Thanks,
>>> Jeremy
>>>
>>> On 11/01/2010 02:42 PM, Benjamin Reed wrote:
>>>> how were you able to reproduce it? all the znodes in /zkrsm were
>>>> created with the sequence flag, right?
>>>>
>>>> ben
>>>>
>>>> On 11/01/2010 02:28 PM, Jeremy Stribling wrote:
>>>>> We were able to reproduce it. A "stat" on all three servers looks
>>>>> identical:
>>>>>
>>>>> [zk:(CONNECTED) 0] stat /zkrsm
>>>>> cZxid = 9
>>>>> ctime = Mon Nov 01 13:01:57 PDT 2010
>>>>> mZxid = 9
>>>>> mtime = Mon Nov 01 13:01:57 PDT 2010
>>>>> pZxid = 12884902218
>>>>> cversion = 177
>>>>> dataVersion = 0
>>>>> aclVersion = 0
>>>>> ephemeralOwner = 0
>>>>> dataLength = 0
>>>>> numChildren = 177
>>>>>
>>>>> Creating a sequential node through the command line also fails:
>>>>>
>>>>> [zk:(CONNECTED) 1] create -s /zkrsm/_record testdata
>>>>> Node already exists: /zkrsm/_record
>>>>>
>>>>> One potentially interesting thing is that numChildren above is 177,
>>>>> though I have sequence numbers on that record prefix up to 214 or so.
>>>>> There seem to be some gaps, though -- I think "ls /zkrsm" only shows
>>>>> about 177. Not sure if that's relevant or not.
>>>>>
>>>>> Thanks,
>>>>> Jeremy
>>>>>
>>>>> On 11/01/2010 12:06 PM, Jeremy Stribling wrote:
>>>>>> Thanks for the reply. It happened every time we called create, not
>>>>>> just once. More than that, we tried restarting each of the nodes in
>>>>>> the system (one by one), including the new master, and the problem
>>>>>> continued. Unfortunately we cleaned everything up, and it's not in
>>>>>> that state anymore. We haven't yet tried to reproduce, but I will
>>>>>> try, and report back if I can get any "cversion" info.
>>>>>>
>>>>>> Jeremy
>>>>>>
>>>>>> On 11/01/2010 11:33 AM, Patrick Hunt wrote:
>>>>>>> Hi Jeremy, this sounds like a bug to me, I don't think you should
>>>>>>> be getting the
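Patrick's point about the parent's cversion makes the collision in this thread easy to reproduce on paper. A sketch of the mechanism (assuming, as the quoted explanation says, that a sequential name is the prefix plus a counter derived from the parent's cversion; the 10-digit zero padding matches the sequential-znode names zkCli displays, and the 214-vs-177 numbers come from this thread):

```python
def seq_name(prefix, counter):
    # Sequential znode names carry a 10-digit, zero-padded counter.
    return "%s%010d" % (prefix, counter)

# One server's view: 214 sequential children were created under /zkrsm.
existing = {seq_name("/zkrsm/_record", n) for n in range(214)}

# A restarted server whose parent cversion is stuck at 177 allocates
# what it believes is a fresh name:
attempt = seq_name("/zkrsm/_record", 177)
print(attempt)            # /zkrsm/_record0000000177
print(attempt in existing)  # True -- hence the "node exists" error
```

This is why two servers with divergent histories of the same parent znode (as in Jeremy's steps 3-5) can hand out a "new" sequence number that was already consumed.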
Running cluster behind load balancer
What would be the expected behavior if a three-node cluster is put behind a load balancer? It would ease deployment, because all clients would be configured to target zookeeper.example.com regardless of the actual cluster configuration, but I have the impression that the client-server connection is stateful and that jumping randomly from server to server could bring strange behavior.

Cheers,

--
Luka Stojanovic
lu...@vast.com
Platform Engineering
Re: Getting a "node exists" code on a sequence create
yes, i think you have summarized the problem nicely jeremy. i'm curious about your reasoning for running servers in standalone mode and then merging. can you explain that a bit more?

thanx
ben

On 11/01/2010 04:51 PM, Jeremy Stribling wrote:
> I think this is caused by stupid behavior on our application's part, and
> the error message just confused me. Here's what I think is happening:
>
> 1) 3 servers are up and accepting data, creating sequential znodes under
> /zkrsm.
> 2) 1 server dies; the other 2 continue creating sequential znodes.
> 3) The 1st server restarts, but instead of joining the other 2 servers,
> it starts an instance by itself, knowing only about the znodes created
> before it died. [This is a bug in our application -- it is supposed to
> join the other 2 servers in their cluster.]
> 4) Another server (#2) dies and restarts, joining the cluster of server
> #1. It knows about more sequential znodes under /zkrsm than server #1.
> 5) At this point, trying to create a new znode in the #1-#2 cluster can
> be problematic, because servers #1 and #2 know about different sets of
> znodes. If #1 allocates what it thinks is a new sequential number for a
> new znode, it could be one already used by server #2, and hence a "node
> exists" code might be returned.
>
> So, in summary, our application is almost certainly using ZooKeeper
> wrong. Sorry to waste time on the list, but maybe this thread can help
> someone in the future. (If this explanation sounds totally off-base
> though, let me know. I'm not 100% certain this is what's happening, but
> it definitely seems likely.)
>
> Thanks,
> Jeremy
>
> On 11/01/2010 02:56 PM, Jeremy Stribling wrote:
>> Yes, every znode in /zkrsm was created with the sequence flag. We bring
>> up a cluster of three nodes, though we do it in a slightly odd manner to
>> support dynamism: each node starts up as a single-node instance knowing
>> only itself, and then each node is contacted by a coordinator that kills
>> the ZooKeeperServer object and starts a new QuorumPeer object using the
>> full list of three servers. I know this is weird; perhaps this has
>> something to do with it.
>>
>> Other than the weird setup behavior, we are just writing a few
>> sequential records into the system (which all seems to work fine),
>> killing one of the nodes (one that has been elected leader via the
>> standard recommended ZK leader election algorithm), restarting it, and
>> then trying to create more sequential znodes. I'm guessing this is
>> pretty well-tested behavior, so there must be something weird or wrong
>> about the way I have things set up. I'm happy to provide whatever logs
>> or snapshots might help someone track this down.
>>
>> Thanks,
>> Jeremy
>>
>> On 11/01/2010 02:42 PM, Benjamin Reed wrote:
>>> how were you able to reproduce it? all the znodes in /zkrsm were
>>> created with the sequence flag, right?
>>>
>>> ben
>>>
>>> On 11/01/2010 02:28 PM, Jeremy Stribling wrote:
>>>> We were able to reproduce it. A "stat" on all three servers looks
>>>> identical:
>>>>
>>>> [zk:(CONNECTED) 0] stat /zkrsm
>>>> cZxid = 9
>>>> ctime = Mon Nov 01 13:01:57 PDT 2010
>>>> mZxid = 9
>>>> mtime = Mon Nov 01 13:01:57 PDT 2010
>>>> pZxid = 12884902218
>>>> cversion = 177
>>>> dataVersion = 0
>>>> aclVersion = 0
>>>> ephemeralOwner = 0
>>>> dataLength = 0
>>>> numChildren = 177
>>>>
>>>> Creating a sequential node through the command line also fails:
>>>>
>>>> [zk:(CONNECTED) 1] create -s /zkrsm/_record testdata
>>>> Node already exists: /zkrsm/_record
>>>>
>>>> One potentially interesting thing is that numChildren above is 177,
>>>> though I have sequence numbers on that record prefix up to 214 or so.
>>>> There seem to be some gaps, though -- I think "ls /zkrsm" only shows
>>>> about 177. Not sure if that's relevant or not.
>>>>
>>>> Thanks,
>>>> Jeremy
>>>>
>>>> On 11/01/2010 12:06 PM, Jeremy Stribling wrote:
>>>>> Thanks for the reply. It happened every time we called create, not
>>>>> just once. More than that, we tried restarting each of the nodes in
>>>>> the system (one by one), including the new master, and the problem
>>>>> continued. Unfortunately we cleaned everything up, and it's not in
>>>>> that state anymore. We haven't yet tried to reproduce, but I will
>>>>> try, and report back if I can get any "cversion" info.
>>>>>
>>>>> Jeremy
>>>>>
>>>>> On 11/01/2010 11:33 AM, Patrick Hunt wrote:
>>>>>> Hi Jeremy, this sounds like a bug to me, I don't think you should be
>>>>>> getting the "node exists" when the sequence flag is set. Looking at
>>>>>> the code briefly, we use the parent's "cversion" (incremented each
>>>>>> time the child list is changed, added/removed). Did you see this
>>>>>> error each time you called create, or just once? If you look at the
>>>>>> cversion in the Stat of the znode "/zkrsm" on each of the servers,
>>>>>> what does it show? You can use the Java CLI to connect to each of
>>>>>> your servers and access this information. It would be interesting to
>>>>>> see if the data was out of sync only for a short period of time, or
>>>>>> forever. Is this repeatable? Ben/Flavio, do you see anything here?
>>>>>>
>>>>>> Patrick
>>>>>>
>>>>>> On Thu, Oct 28, 2010 at 6:06 PM, Jeremy Stribling wrote:
>>>>>>> Hi everyone,
>>>>>>>
>>>>>>> Is there any situation in which creating a new ZK node with the
>>>>>>> SEQUENCE flag should result in a "node exists" error?
Re: question about watcher
Definitely check out the 4-letter words then (wch*). Keep in mind getting this data may be expensive (if there's a lot of it), and that watches are local, so servers only know about the watches from sessions established through them (server 1 doesn't know about the watches of sessions connected to servers 2, 3, etc.).

Patrick

On Wed, Nov 3, 2010 at 1:13 AM, Qian Ye wrote:
> thanks Patrick, I want to know all watches set by all clients.
> I'll open a JIRA and write up some design thinking about it later.
>
> On Tue, Nov 2, 2010 at 11:53 PM, Patrick Hunt wrote:
>> Hi Qian Ye, yes you should open a JIRA for this. If you want to work on
>> a patch we could advise you. One thing not clear to me: are you
>> interested in just the watches set by the particular client, or all
>> watches set by all clients? The first should be relatively easy to get;
>> the second would be more involved (the difference between getting local
>> watches and having to talk to the server to get all watches). Does this
>> have to be a client API, or more administrative in nature? Also see
>> http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html#sc_zkCommands
>> specifically the "wchs", "wchp", and "wchc" 4-letter words.
>>
>> Regards,
>> Patrick
>>
>> On Tue, Nov 2, 2010 at 4:11 AM, Qian Ye wrote:
>>> Hi all,
>>>
>>> Is there any progress on this issue? Should we open a new JIRA for it?
>>> We really need a way to know who set watchers on a specific node.
>>>
>>> thanks~
>>>
>>> On Thu, Aug 6, 2009 at 11:01 PM, Qian Ye wrote:
>>>> Thanks Mahadev, I think it is a useful feature for many scenarios.
>>>>
>>>> On Thu, Aug 6, 2009 at 12:59 PM, Mahadev Konar wrote:
>>>>> Hi Qian,
>>>>> There isn't any such API. We have been thinking about adding an API
>>>>> for cancelling a client's watches. We have been thinking about adding
>>>>> a proc filesystem wherein a client will have a list of all the
>>>>> watches. This data can be used to know which clients are watching
>>>>> what znode, but this has always been in the future discussions for
>>>>> us. We DO NOT have anything planned in the near future for this.
>>>>>
>>>>> Thanks
>>>>> mahadev
>>>>>
>>>>> On 8/5/09 6:57 PM, "Qian Ye" wrote:
>>>>>> Hi all:
>>>>>>
>>>>>> Is there a client API for querying the watchers' owners for a
>>>>>> specific znode? In some situations, we want to find out who set
>>>>>> watchers on the znode.
>>>>>>
>>>>>> thx
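Since watches are local to the server a session is connected to, answering "who is watching what" means asking every server in the ensemble. A sketch of issuing a 4-letter word over a plain TCP socket (the wchs/wchp/wchc command names are the real ones from the admin docs; the zk1/zk2/zk3 host list in the usage comment is a placeholder):

```python
import socket

def four_letter_word(host, port, cmd=b"wchs"):
    """Send a ZooKeeper 4-letter word (e.g. wchs/wchp/wchc) and
    return the server's full text reply."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(cmd)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode()

# Watches are per-server, so aggregate across the whole ensemble:
# for host in ("zk1", "zk2", "zk3"):
#     print(host, four_letter_word(host, 2181, b"wchp"))
```

As Patrick notes, wchp/wchc can be expensive on servers with many watches, so this is better suited to debugging than to routine monitoring.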
Re: question about watcher
thanks Patrick, I want to know all watches set by all clients. I'll open a JIRA and write up some design thinking about it later.

On Tue, Nov 2, 2010 at 11:53 PM, Patrick Hunt wrote:
> Hi Qian Ye, yes you should open a JIRA for this. If you want to work on a
> patch we could advise you. One thing not clear to me: are you interested
> in just the watches set by the particular client, or all watches set by
> all clients? The first should be relatively easy to get; the second would
> be more involved (the difference between getting local watches and having
> to talk to the server to get all watches). Does this have to be a client
> API, or more administrative in nature? Also see
> http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html#sc_zkCommands
> specifically the "wchs", "wchp", and "wchc" 4-letter words.
>
> Regards,
> Patrick
>
> On Tue, Nov 2, 2010 at 4:11 AM, Qian Ye wrote:
>> Hi all,
>>
>> Is there any progress on this issue? Should we open a new JIRA for it?
>> We really need a way to know who set watchers on a specific node.
>>
>> thanks~
>>
>> On Thu, Aug 6, 2009 at 11:01 PM, Qian Ye wrote:
>>> Thanks Mahadev, I think it is a useful feature for many scenarios.
>>>
>>> On Thu, Aug 6, 2009 at 12:59 PM, Mahadev Konar wrote:
>>>> Hi Qian,
>>>> There isn't any such API. We have been thinking about adding an API
>>>> for cancelling a client's watches. We have been thinking about adding
>>>> a proc filesystem wherein a client will have a list of all the
>>>> watches. This data can be used to know which clients are watching what
>>>> znode, but this has always been in the future discussions for us. We
>>>> DO NOT have anything planned in the near future for this.
>>>>
>>>> Thanks
>>>> mahadev
>>>>
>>>> On 8/5/09 6:57 PM, "Qian Ye" wrote:
>>>>> Hi all:
>>>>>
>>>>> Is there a client API for querying the watchers' owners for a
>>>>> specific znode? In some situations, we want to find out who set
>>>>> watchers on the znode.
>>>>>
>>>>> thx

--
With Regards!
Ye, Qian