subscribe ted.dunn...@gmail.com

2009-03-03 Thread Ted Dunning
-- Ted Dunning, CTO DeepDyve

Re: How large an ensemble can one build with Zookeeper?

2009-03-03 Thread Ted Dunning
Zookeeper is not really what you would call a scalable system because all transactions that are updates go through the leader for serialization. Zookeeper is, instead, a high throughput HA system. That said, the throughput of a modest zookeeper cluster is fairly prodigious so for the

Re: How large an ensemble can one build with Zookeeper?

2009-03-06 Thread Ted Dunning
Chubby and Zookeeper have very different ways of getting to similar purposes. Chubby is a locking service, while zookeeper is all about avoiding locks. Zookeeper is better described as a coordination service. Regarding performance, I am pretty sure that Zookeeper could keep up with some pretty

Re: problems on EC2?

2009-04-14 Thread Ted Dunning
=SessionExpired#query:SessionExpired+page:1+mid:gt4c2kn4n4f5s5kw+state:results Perhaps this might have something to do with what you're seeing. Cheers, -n On Tue, Apr 14, 2009 at 5:48 PM, Ted Dunning ted.dunn...@gmail.com wrote: We have been using EC2 as a substrate for our search cluster

Re: Running ZooKeeper inside my web app

2009-04-16 Thread Ted Dunning
Absolutely. Katta did this, at least initially. Just spawn a thread and mimic the launching of a Zookeeper server. On Thu, Apr 16, 2009 at 6:38 AM, David Pollak feeder.of.the.be...@gmail.com wrote: Is it possible to start ZooKeeper programmatically from inside my web app? -- Ted Dunning

Re: problems on EC2?

2009-04-16 Thread Ted Dunning
Patrick, Thanks enormously. This hasn't helped yet, but that is just because it was a very large bite of the apple. Once I digest it, I can tell that it will be very helpful. I did have a chance to look at the stat output and maximum latency was 300ms. How that connects with what you are

Re: Unique Id Generation

2009-04-24 Thread Ted Dunning
I would expect Ben's method to be slightly faster, but they should be comparable. And, of course you are correct about rewind. Such are the perils of writing code in the email program. On Fri, Apr 24, 2009 at 10:01 AM, Satish Bhatti cthd2...@gmail.com wrote: ... Your approach appears to be

Re: Dynamic server addition/deletion

2009-05-01 Thread Ted Dunning
on this that they could share with us. Also, we don't want to duplicate the effort, so we would appreciate if you let us know anyone is already working on a design proposal for this feature. Thanks Raghu -- Ted Dunning, CTO DeepDyve 111 West Evelyn Ave. Ste. 202 Sunnyvale, CA 94086 www.deepdyve.com 858

Re: Moving ZooKeeper Servers

2009-05-04 Thread Ted Dunning
to be delicate for other reasons as well. On Mon, May 4, 2009 at 2:35 PM, Mahadev Konar maha...@yahoo-inc.com wrote: So, zookeeper would work fine if you are careful with above but I would vote against doing this for production since the above is pretty easy to mess up. -- Ted Dunning, CTO DeepDyve

Re: NodeChildrenChanged WatchedEvent

2009-05-08 Thread Ted Dunning
On Fri, May 8, 2009 at 1:31 PM, Javier Vegas jav...@beboinc.com wrote: Sorry, what I meant is issuing the new method watchChildren() on the parent node (basically the same as getChildren() but returning just a boolean instead of a list of children, because I already know the paths of the

Re: NodeChildrenChanged WatchedEvent

2009-05-09 Thread Ted Dunning
that do these repeated mundane tasks for you to handle those use cases where the verbosity of the API is a hindrance to quality and productivity. -- Ted Dunning, CTO DeepDyve 111 West Evelyn Ave. Ste. 202 Sunnyvale, CA 94086 www.deepdyve.com 858-414-0013 (m) 408-773-0220 (fax)

Re: ZooKeeper viewer

2009-06-03 Thread Ted Dunning
Thanks very much. I have found a few oversights in the code as well and will post a new version shortly (with your suggested changes). On Wed, Jun 3, 2009 at 8:17 AM, Eric Bowman ebow...@boboco.ie wrote: Ted Dunning wrote: Please add comments, suggestions and improvements to the JIRA ticket

Re: ConnectionLoss (node too big?)

2009-06-03 Thread Ted Dunning
Isn't the max file size a megabyte? On Wed, Jun 3, 2009 at 9:01 AM, Eric Bowman ebow...@boboco.ie wrote: On the client, I see this when trying to write a node with 7,641,662 bytes: -- Ted Dunning, CTO DeepDyve

Re: Authentification for Zookeeper Server

2009-06-16 Thread Ted Dunning
Remember that the patch is almost trivial. Add a configuration option acceptConnectionsOnlyFromLocalHost, and then in the server connect logic reject non-localhost attempts (and log a security note). On Tue, Jun 16, 2009 at 2:53 PM, Gustavo Niemeyer gust...@niemeyer.net wrote: but the stunnel
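The check described in this message can be sketched in a few lines. This is only an illustration in Python, not ZooKeeper code: the option name comes from the message, while `is_local_address` and `accept_connection` are hypothetical helpers invented for the sketch, and the choice to refuse host names rather than resolve them is an assumption.

```python
import ipaddress
import logging

log = logging.getLogger("connection-filter")


def is_local_address(host: str) -> bool:
    """Return True when host is a loopback IP literal such as 127.0.0.1 or ::1."""
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not a literal IP address; this sketch refuses rather than resolving names.
        return False


def accept_connection(remote_host: str, accept_only_local: bool) -> bool:
    """Hypothetical connect-time hook: reject non-localhost peers when the option is on."""
    if accept_only_local and not is_local_address(remote_host):
        # The "security note" mentioned in the message.
        log.warning("security: rejected non-localhost connection from %s", remote_host)
        return False
    return True
```

With the option off, every peer is accepted, so existing deployments would be unaffected by such a flag.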

Re: Some questions about Zookeeper 3.2.0

2009-06-27 Thread Ted Dunning
In general for changes like this, you need to be running more than one server in a cluster to avoid losing state such as the ephemeral nodes. I can't say for certain that the 3.1.1 to 3.2 change can be done this way, but most upgrades can be done by stopping one server at a time, changing the

Re: Some questions about Zookeeper 3.2.0

2009-06-28 Thread Ted Dunning
I don't think you should be very nervous at all. There are two questions: 1) can 3.1.1 go to 3.2 with no down time. This is very likely, but a wiser head than mine should have final say 2) can 3.1.1 go to 3.2 with 1 minute of downtime. This is for sure. Neither option involves data loss. ZK

Re: Some questions about Zookeeper 3.2.0

2009-06-29 Thread Ted Dunning
A rolling update works very well for that. You can also change the number of nodes in the cluster. To do this, you replace the config files on the surviving servers and on the new server. Then take down the one that is leaving the cluster and then one by one restart the servers that will remain
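Why a rolling update is safe for some ensemble sizes and not others can be checked with a small calculation. The sketch below is Python written for this listing, not ZooKeeper code; the function names are made up, and it assumes the usual strict-majority quorum rule and that exactly one server is down at any moment during the update.

```python
def has_quorum(up: int, ensemble_size: int) -> bool:
    """An ensemble can keep serving while a strict majority of servers is up."""
    return up > ensemble_size // 2


def rolling_restart_keeps_quorum(ensemble_size: int) -> bool:
    """Restart servers one at a time and report whether quorum holds throughout."""
    for _ in range(ensemble_size):
        up = ensemble_size - 1  # exactly one server is down during each step
        if not has_quorum(up, ensemble_size):
            return False
    return True
```

Under these assumptions an ensemble of three or five stays available during the whole update, while an ensemble of one or two loses quorum as soon as a server is stopped.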

Re: zookeeper on ec2

2009-07-06 Thread Ted Dunning
On Mon, Jul 6, 2009 at 12:58 PM, Gustavo Niemeyer gust...@niemeyer.net wrote: can make the ZK servers appear a bit less connected. You have to plan for ConnectionLoss events. Interesting. Note that most of these seem to be related to client issues, especially GC. If you configure in such

Re: Queue code

2009-07-17 Thread Ted Dunning
for extremely large queues of pending tasks. On Fri, Jul 17, 2009 at 1:20 PM, Mahadev Konar maha...@yahoo-inc.com wrote: Also are there any performance numbers of zookeeper based queues. How does it compare with JMS. -- Ted Dunning, CTO DeepDyve

Re: Zookeeper WAN Configuration

2009-07-24 Thread Ted Dunning
to...@audiencescience.com wrote: Ted, could you elaborate a bit more on this? I was under the (mis) impression that each ZK server in an ensemble only needed connectivity to another member in the ensemble, not to each member in the ensemble. It sounds like you are saying the latter is true. -- Ted Dunning, CTO

Re: Zookeeper WAN Configuration

2009-07-26 Thread Ted Dunning
the performance of the ensemble, provided large blobs of traffic were not being sent across the network. -- Ted Dunning, CTO DeepDyve

Re: zkclient now has a mailing list

2009-08-13 Thread Ted Dunning
That would be a great way to get really good feedback. On Thu, Aug 13, 2009 at 4:13 PM, Stefan Groschupf s...@101tec.com wrote: If we have something clean and stable running we might contribute it back to the apache zk project. -- Ted Dunning, CTO DeepDyve

Re: Watches

2009-08-29 Thread Ted Dunning
in receiving notifications. Cheers Avinash -- Ted Dunning, CTO DeepDyve

Re: zookeeper on ec2

2009-09-01 Thread Ted Dunning
) It's been running for about 48 hours. On Tue, Sep 1, 2009 at 5:12 PM, Ted Dunning ted.dunn...@gmail.com wrote: Do you have long GC delays? On Tue, Sep 1, 2009 at 4:51 PM, Satish Bhatti cthd2...@gmail.com wrote: Session timeout is 30 seconds. On Tue, Sep 1, 2009 at 4

Re: Start problem of Running Replicated ZooKeeper

2009-09-23 Thread Ted Dunning
in mailing list archives, but got nothing helpful. I need your help, thanks and best regards! -- Ted Dunning, CTO DeepDyve

Re: Start problem of Running Replicated ZooKeeper

2009-09-23 Thread Ted Dunning
Good points. On the other hand, it could still be firewall issues. On Wed, Sep 23, 2009 at 8:30 AM, Benjamin Reed br...@yahoo-inc.com wrote: The connection refused message as opposed to no route to host, or unknown host, indicate that zookeeper has not been started on the other machines. are

Re: The idea behind 'myid'

2009-09-25 Thread Ted Dunning
there is a good reason for using this approach, but it is the first time I have come over this type of non-automatic way for administrating replicas. Regards, Orjan -- Ted Dunning, CTO DeepDyve

Re: feedback zkclient

2009-10-01 Thread Ted Dunning
) somewhere that totally ignores that this would reset the interrupt flag, if e is an InterruptedException. Therefore we better avoid having all of the methods throwing that exception. -- Ted Dunning, CTO DeepDyve

Re: feedback zkclient

2009-10-01 Thread Ted Dunning
is back and check if the znode is there. There is no way of knowing whether it was us who created the node or somebody else, right? -- Ted Dunning, CTO DeepDyve

Re: feedback zkclient

2009-10-01 Thread Ted Dunning
sessionid. As you say, it's highly implementation dependent. It's also something we recognize is a problem for users, we've slated it for 3.3.0 http://issues.apache.org/jira/browse/ZOOKEEPER-22 -- Ted Dunning, CTO DeepDyve

Re: How do we find the Server the client is connected to?

2009-10-01 Thread Ted Dunning
but that is not exposed. Rob Baccus 425-201-3812 -- Ted Dunning, CTO DeepDyve

Re: Cluster Configuration Issues

2009-10-22 Thread Ted Dunning
, I know it makes more sense to run an odd number of zookeeper nodes but I just want to make sure it works first). Any suggestions? -- Ted Dunning, CTO DeepDyve

Re: Restarting a single zookeeper Server on the same port within the process

2009-10-22 Thread Ted Dunning
a delay and restarting it on the same port. But the server doesn't startup. When I re-start on a different port, it starts up correctly. Can you let me know how I can make this one work. Thank you. Regards, Siddharth -- Ted Dunning, CTO DeepDyve

Re: zookeeper viewer

2009-10-25 Thread Ted Dunning
/24/09 4:18 PM, Hamoun gh hamoun...@gmail.com wrote: I am looking for the zookeeper viewer. seems the link is broken. can somebody please help? Thank you, Hamoun Ghanbari -- Ted Dunning, CTO DeepDyve

Re: API for node entry to the cluster.

2009-11-05 Thread Ted Dunning
in the future to do this? TIA A -- Ted Dunning, CTO DeepDyve

Re: API for node entry to the cluster.

2009-11-05 Thread Ted Dunning
not restarting. Start/Stop the new/old process and then start a round of consensus for adding/removing a machine. I guess if one can do that then there is stopping of process required. Am I missing something here? A On Thu, Nov 5, 2009 at 11:14 AM, Ted Dunning ted.dunn...@gmail.com wrote

Re: ZK on EC2

2009-11-09 Thread Ted Dunning
the experience there? Are there more timeouts, lead re-election, etc? Thanks, Jun IBM Almaden Research Center K55/B1, 650 Harry Road, San Jose, CA 95120-6099 jun...@almaden.ibm.com -- Ted Dunning, CTO DeepDyve

Re: ZK on EC2

2009-11-09 Thread Ted Dunning
...@almaden.ibm.com Ted Dunning ted.dunn...@gmail.com wrote on 11/09/2009 04:24:16 PM: [image removed] Re: ZK on EC2 Ted Dunning to: zookeeper-user 11/09/2009 04:25 PM Please respond to zookeeper-user Worked pretty well for me. We did extend all of our timeouts. The biggest

Re: ZK on EC2

2009-11-09 Thread Ted Dunning
in the wiki page on say, EC2 small/large nodes? I'd do it myself but I've not used ec2. If anyone could try these and report I'd appreciate it. Patrick Ted Dunning wrote: Worked pretty well for me. We did extend all of our timeouts. The biggest worry for us was timeouts on the client side

Re: ZK on EC2

2009-11-10 Thread Ted Dunning
to get double what I got for incoming transfer. On Mon, Nov 9, 2009 at 9:47 PM, Patrick Hunt ph...@apache.org wrote: Could you test networking - scping data between hosts? (I was seeing 64.1MB/s for a 512mb file - the one created by dd, random data) -- Ted Dunning, CTO DeepDyve

Re: ZK on EC2

2009-11-10 Thread Ted Dunning
on the wiki for others interested in running in EC2. -- Ted Dunning, CTO DeepDyve

Re: ZK on EC2

2009-11-10 Thread Ted Dunning
collector? Patrick Ted Dunning wrote: The server side is a fairly standard (but old) config: tickTime=2000 dataDir=/home/zookeeper/ clientPort=2181 initLimit=5 syncLimit=2 Most of our clients now use 5 seconds as the timeout, but I think that we went to longer timeouts in the past. Without
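The settings quoted in this message are expressed in ticks, so it can help to convert them to milliseconds. The Python sketch below is only an illustration: the helper names are invented here, and the session-timeout bounds of 2 and 20 times tickTime are the defaults described in the ZooKeeper documentation, assumed to apply to this setup.

```python
CONFIG = """\
tickTime=2000
dataDir=/home/zookeeper/
clientPort=2181
initLimit=5
syncLimit=2
"""


def parse_config(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and comments."""
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            key, value = line.split("=", 1)
            cfg[key.strip()] = value.strip()
    return cfg


def derived_timeouts_ms(cfg: dict) -> dict:
    """Convert tick-based limits to milliseconds."""
    tick = int(cfg["tickTime"])
    return {
        "init_ms": tick * int(cfg["initLimit"]),  # time allowed for followers to connect and sync
        "sync_ms": tick * int(cfg["syncLimit"]),  # how far a follower may lag behind the leader
        "min_session_ms": 2 * tick,               # assumed default lower bound for session timeouts
        "max_session_ms": 20 * tick,              # assumed default upper bound for session timeouts
    }


def negotiated_session_timeout(requested_ms: int, cfg: dict) -> int:
    """Clamp a client's requested session timeout into the server's allowed range."""
    limits = derived_timeouts_ms(cfg)
    return max(limits["min_session_ms"], min(requested_ms, limits["max_session_ms"]))
```

With these values, the 5 second client timeout mentioned in the message falls inside the assumed 4 to 40 second range and would be granted unchanged.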

Re: Observers!

2009-11-18 Thread Ted Dunning
13:06:39 -0600 (Wed, 18 Nov 2009) | 1 line ZOOKEEPER-368. Observers: core functionality (henry robinson via mahadev) Sweet! Congratulations, and thanks Henry. -- Gustavo Niemeyer http://niemeyer.net -- Ted Dunning, CTO DeepDyve

Re: SLF4J for logging

2009-12-04 Thread Ted Dunning
? Solr now uses it, as does Avro I believe, and other parts of Hadoop. -Yonik http://www.lucidimagination.com -- Ted Dunning, CTO DeepDyve

Re: Share Zookeeper instance and Connection Limits

2009-12-16 Thread Ted Dunning
). Well, the disk IO or network first limits the throughput? Thanks for your quick response. I'm studying Zookeeper in my master thesis, for coordinating distributed index structures. -- Ted Dunning, CTO DeepDyve

Re: Share Zookeeper instance and Connection Limits

2009-12-18 Thread Ted Dunning
only an idea. The world is changing to SSDs too! -- Ted Dunning, CTO DeepDyve

Re: Can zookeeper achive the IBM TSA function?

2010-01-19 Thread Ted Dunning
, but with a database, I would wonder if there are others. On Tue, Jan 19, 2010 at 11:30 PM, xeoshow xeos...@gmail.com wrote: I am wondering can this monitor part be replaced by zookeeper, using zookeeper watch or something else? -- Ted Dunning, CTO DeepDyve

Re: Can zookeeper achive the IBM TSA function?

2010-01-20 Thread Ted Dunning
Take a look here at the recipes: http://hadoop.apache.org/zookeeper/docs/r3.0.0/recipes.html On Wed, Jan 20, 2010 at 12:15 AM, xeoshow xeos...@gmail.com wrote: Ted, thank you very much for your reply. I think A will exit and so ZK can help .. Not sure if any further link can help on how to

Re: Using zookeeper to assign a bunch of long-running tasks to nodes (without unhandled tasks and double-handled tasks)

2010-01-23 Thread Ted Dunning
processing the corresponding task (if something goes wrong, just kill itself and the node will be gone) if not, we go back to wait for watcher. Will this work? -- Ted Dunning, CTO DeepDyve

Re: Server exception when closing session

2010-01-25 Thread Ted Dunning
-- Ted Dunning, CTO DeepDyve

Re: Q about ZK internal: how commit is being remembered

2010-01-28 Thread Ted Dunning
according to Zab's FIFO nature...just want to hear some clarification about it. Thanks a lot! -- With Regards! Ye, Qian Made in Zhejiang University -- Ted Dunning

Re: question regarding connectionloss

2010-02-01 Thread Ted Dunning
!? Thanks for any help. Cheers, Michael -- Michael Bauland michael.baul...@knipp.de bauland.tel -- Ted Dunning, CTO DeepDyve

Re: question regarding connectionloss

2010-02-02 Thread Ted Dunning
: For example Hardware misconfiguration - NIC caused one system to basically work, but with huge numbers of connection loss, esp whenever there was load (and I've seen this particular issue twice now). -- Ted Dunning, CTO DeepDyve

Re: ephemeral node after server bounce

2010-02-04 Thread Ted Dunning
On Thu, Feb 4, 2010 at 2:20 PM, Yonik Seeley yo...@lucidimagination.com wrote: There's no way to hand over responsibility for an ephemeral znode, right? Right. We have solr nodes create ephemeral znodes (name based on host and port). The ephemeral znode takes some time to remove of course,

Re: ephemeral node after server bounce

2010-02-04 Thread Ted Dunning
, Patrick Hunt ph...@apache.org wrote: Ah, excellent idea [jvm shutdownhooks], won't always work but may help. I think in this case (ephemerals) all Yonik would need to do is close the session. That will remove all ephemerals. -- Ted Dunning, CTO DeepDyve

Re: ZooKeeper packages for Ubuntu

2010-02-16 Thread Ted Dunning
/+archive/ppa https://launchpad.net/%7Ettx/+archive/ppa This is a Personal Package Archive at the moment, but these packages may end up being promoted depending on how relevant they are. Please let me know if these work or do not work for you. -- Gustavo Niemeyer http://niemeyer.net -- Ted

Re: Bit of help debugging a TIMED OUT session please

2010-02-22 Thread Ted Dunning
Not sure this helps at all, but these times are remarkably asymmetrical. I would expect members of a ZK cluster to have very comparable times. Additionally, 345 ms is nowhere near large enough to cause a session to expire. My take is that ZK doesn't think it caused the timeout. On Mon, Feb

Re: how to lock one-of-many ?

2010-02-23 Thread Ted Dunning
for a relatively short time (1 second on average), and by the time I have blundered through all the possible locks, ids that were locked at the start might be available by the time I finished. -- Ted Dunning, CTO DeepDyve

Re: how to lock one-of-many ?

2010-02-24 Thread Ted Dunning
. No feature but it does sound interesting. Are there any tools that allow one to setup slow pipes ala stunnel but here for latency not encryp? I believe freebsd has this feature at the os (firewall?) level, I don't know if linux does. -- Ted Dunning, CTO DeepDyve

Re: how to lock one-of-many ?

2010-02-24 Thread Ted Dunning
. -- Ted Dunning, CTO DeepDyve

Re: how to lock one-of-many ?

2010-02-24 Thread Ted Dunning
Waite waite@googlemail.com wrote: I really do not follow the delegator approach. Is this something I would patch into Zookeeper ? Or the client ? -- Ted Dunning, CTO DeepDyve

Re: is there a good pattern for leases ?

2010-02-24 Thread Ted Dunning
locks to keep the size of the lock table small. The trouble with managing these locks in a database is that the tables are getting hot and becoming one of the main sources of contention. Also, SQL is not necessarily fast for doing the required updates. -- Ted Dunning, CTO DeepDyve

Re: is there a good pattern for leases ?

2010-02-25 Thread Ted Dunning
to ensure the node FN was up-to-date - assuming I do not know if I am connected to a primary ZK instance ? Would 10K sync calls within a 2 minute period be excessive ? -- Ted Dunning, CTO DeepDyve

Re: is there a good pattern for leases ?

2010-02-25 Thread Ted Dunning
That is one of the strengths of ZK. Your client would do this: 1) create node, if success client has lock 2) get current node (you get the current version when you do this), if lease is current and ours, we have the lock, if lease is current and not ours, we have failed to get the lock 3) try to
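Steps 1 and 2 of this recipe can be sketched against an in-memory stand-in for a znode store. The Python below is an illustration only: `FakeStore`, `Lease`, and `try_lease` are hypothetical names, no real ZooKeeper client is involved, and because the snippet is cut off at step 3 the sketch stops at reporting an expired lease rather than guessing what follows.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Lease:
    owner: str
    expires_at: float


class NodeExists(Exception):
    """Raised when creating a node that is already present."""


class FakeStore:
    """In-memory stand-in for a znode store that keeps a version per node."""

    def __init__(self) -> None:
        self._nodes: Dict[str, Tuple[Lease, int]] = {}

    def create(self, path: str, lease: Lease) -> None:
        if path in self._nodes:
            raise NodeExists(path)
        self._nodes[path] = (lease, 0)

    def get(self, path: str) -> Tuple[Lease, int]:
        return self._nodes[path]


def try_lease(store: FakeStore, path: str, me: str, now: float, ttl: float) -> str:
    """Return 'acquired', 'held-by-other', or 'expired' following steps 1 and 2."""
    # Step 1: try to create the node; success means we hold the lock.
    try:
        store.create(path, Lease(me, now + ttl))
        return "acquired"
    except NodeExists:
        pass
    # Step 2: read the current lease together with its version.
    lease, _version = store.get(path)  # the version would matter for the truncated step 3
    if lease.expires_at > now:
        return "acquired" if lease.owner == me else "held-by-other"
    # The lease has lapsed; what happens next is the part the snippet cuts off.
    return "expired"
```

Passing `now` in explicitly keeps the sketch deterministic; a real client would also have to think about clock differences between machines.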

Re: zookeeper utils

2010-03-02 Thread Ted Dunning
What other examples are you looking for? On Tue, Mar 2, 2010 at 1:04 PM, David Rosenstrauch dar...@darose.net wrote: Is there a library of higher-level zookeeper utilities that people have contributed, beyond the barrier and queue examples provided in the docs? -- Ted Dunning, CTO DeepDyve

Re: Managing multi-site clusters with Zookeeper

2010-03-06 Thread Ted Dunning
I taking Zookeeper out of its application domain and just asking for trouble ? -- Ted Dunning, CTO DeepDyve

Re: network requirements

2010-03-06 Thread Ted Dunning
Your network admin is correct. Multicast often doesn't work. ZK does not use multicast at the network level. Where events or notifications must go to many places (that SOUNDS like multicast, I know) it uses very standard TCP connections. For almost any known modern network, ZK should be just

Re: Managing multi-site clusters with Zookeeper

2010-03-07 Thread Ted Dunning
If you can stand the latency for updates then zk should work well for you. It is unlikely that you will be able to do better than zk does and still maintain correctness. Do note that you can probably bias the client to use a local server. That should make things more efficient. Sent from my

Re: Ok to share ZK nodes with Hadoop nodes?

2010-03-08 Thread Ted Dunning
I have used 5 and 3 in different clusters. A moderate amount of sharing is reasonable, but sharing with less intensive applications is definitely better. Sharing with the job tracker, for instance, is likely fine since it doesn't abuse disk so much. The namenode is similar, but not quite as nice.

Re: java heap size

2010-03-15 Thread Ted Dunning
Your understanding is correct. But if you set a heap size nearly as big as your physical memory (or larger) then java may allocate that heap which will cause swapping. So swapping is definitely done by the OS, but it is the applications like Java that can cause the OS to do it. On Mon, Mar 15,

Re: persistent storage and node recovery

2010-03-15 Thread Ted Dunning
I don't think that you have considered the impact of ordered updates here. On Mon, Mar 15, 2010 at 6:19 PM, Maxime Caron maxime.ca...@gmail.com wrote: So this is all about the operation log so if a node is in minority but have more recent committed value this node is in Veto over the other

Re: persistent storage and node recovery

2010-03-15 Thread Ted Dunning
I like to say that the cost of now goes up dramatically with diameter. On Mon, Mar 15, 2010 at 7:50 PM, Henry Robinson he...@cloudera.com wrote: There is a fundamental tension between synchronicity of updates and scale.

Re: permanent ZSESSIONMOVED

2010-03-16 Thread Ted Dunning
Hmm... this inspires me to have a thought as well. Łukasz, there isn't any fancy network stuff going on here is there? No NATing or fancy load balancing or reassignment of IP addresses of servers, right? On Tue, Mar 16, 2010 at 4:51 PM, Patrick Hunt ph...@apache.org wrote: It will be good to

Re: Modify ZooKeeper Java client to hold weak references to Watcher objects

2010-03-18 Thread Ted Dunning
This kind of sounds strange to me. My typical idiom is to create a watcher but not retain any references to it outside the client. It sounds to me like your change will cause my watchers to be collected and deactivated when GC happens. On Thu, Mar 18, 2010 at 3:32 AM, Dominic Williams

Re: How to ensure trasaction create-and-update

2010-03-29 Thread Ted Dunning
This is not a good thing. ZK gains lots of its power and reliability by not trying to do atomic updates to multiple znodes at once. Can you say more about the update that you want to do? It is common for updates like this to be such that you can order the updates and do without a truly atomic

Re: How to ensure trasaction create-and-update

2010-03-29 Thread Ted Dunning
I perhaps should not have said power, except insofar as ZK's strengths are in reliability which derives from simplicity. There are essentially two common ways to implement multi-node update. The first is the traditional db style with begin-transaction paired with either a commit or a rollback

Re: Re: How to ensure trasaction create-and-update

2010-03-29 Thread Ted Dunning
as a whole. 2010-03-30 Will From: Ted Dunning ted.dunn...@gmail.com Sent: 2010-03-30 10:11 Subject: Re: How to ensure trasaction create-and-update To: zookeeper-user@hadoop.apache.org This is not a good thing. ZK gains lots of its power and reliability by not trying to do atomic updates

Re: How to ensure trasaction create-and-update

2010-03-30 Thread Ted Dunning
As usual, Ben says better what I was trying to say. Henry's point that a very limited multi-update would be useful is also true, though. If somebody can come up with a way to do that without making things unreasonably complicated, it would be really nice to have. In the meantime, I will try to

Re: the error

2010-03-31 Thread Ted Dunning
Suppose a machine has probability of soft-failure p_1 and catastrophic p_2 p_1. Assume that two machines have independent failure modes. Probability of soft failure of a one machine cluster = p_1, two machine cluster = probability of soft failure of 1 or 2 machines + probability of one machine

Re: the error

2010-03-31 Thread Ted Dunning
As I pointed out in my response, you should distinguish hard and soft failures. If one machine fails even catastrophically, you can provide a new machine to replace it, thus converting a hard failure into a soft one. The conclusion is the same. Three machines is vastly better than one or two.
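The conclusion that three machines beat one or two can be illustrated with a simplified availability model. The Python sketch below is not the calculation from the earlier message, which distinguishes soft and catastrophic failures and is cut off in this listing. It assumes a single per-machine probability `p` of being down, independent failures, and a strict-majority quorum; `p_no_quorum` is a name made up for the example.

```python
from math import comb


def p_no_quorum(n: int, p: float) -> float:
    """Probability that fewer than a strict majority of n machines are up.

    p is the assumed probability that any single machine is down, independently.
    """
    need = n // 2 + 1  # machines that must be up for a quorum
    # Sum the binomial probabilities of having k machines up for every k below quorum.
    return sum(comb(n, k) * (1 - p) ** k * p ** (n - k) for k in range(need))
```

For a small `p`, a two-machine ensemble comes out slightly worse than a single machine, because both machines must be up, while a three-machine ensemble is far better than either.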

Re: user cousult

2010-04-01 Thread Ted Dunning
On Thu, Apr 1, 2010 at 7:27 PM, li li liqiyuan...@gmail.com wrote: Now I can handle about 300 clients with one server, when I set the session time out is 3. In your opinion, the session time out is set in which value more suitable? 5-30 seconds is a much more typical value.

odd error message

2010-04-20 Thread Ted Dunning
We have just done an upgrade of ZK to 3.3.0. Previous to this, ZK has been up for about a year with no problems. On two nodes, we killed the previous instance and started the 3.3.0 instance. The first node was a follower and the second a leader. All went according to plan and no clients seemed

Re: Would this work?

2010-04-20 Thread Ted Dunning
I can't comment on the details of your code (but I have run in-process ZK's in the past without problem). Operationally, however, this isn't a great idea. The problem is two-fold: a) firstly, somebody would probably like to look at Zookeeper to understand the state of your service. If the

Re: Embedding ZK in another application

2010-04-23 Thread Ted Dunning
It is, of course, your decision, but a key coordination function is to determine whether your application is up or not. That is very hard to do if Zookeeper is inside your application. On Fri, Apr 23, 2010 at 10:28 AM, Asankha C. Perera asan...@apache.org wrote: However, I believe that both the

Re: Using Zookeeper to distribute tasks

2010-04-27 Thread Ted Dunning
The general way to do this is either a) have lots of watchers who all try to create a single file when a watched file changes. This is very simple to code, but leads to a lot of notifications when you have thousands of watchers. b) arrange the watchers in a chain. This is similar to the
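Option (b) is cut off in this listing, so the following is one possible reading rather than the author's design: each watcher registers a sequentially numbered node and watches only the node immediately ahead of it, so a change wakes a single watcher instead of thousands. The Python sketch assumes names of the form `prefix-NNNNNNNNNN`; `sequence_number` and `node_to_watch` are hypothetical helpers and no ZooKeeper client is used.

```python
from typing import List, Optional


def sequence_number(name: str) -> int:
    """Extract the numeric suffix from a name such as 'w-0000000003'."""
    return int(name.rsplit("-", 1)[1])


def node_to_watch(my_node: str, all_nodes: List[str]) -> Optional[str]:
    """Return the node immediately ahead of my_node in the chain, or None for the head."""
    ordered = sorted(all_nodes, key=sequence_number)
    idx = ordered.index(my_node)
    return ordered[idx - 1] if idx > 0 else None
```

The head of the chain has nothing to watch and is the one that acts; when it goes away, only its immediate successor is notified.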

Re: Bizarre ZooKeeper Client Behaviour

2010-04-27 Thread Ted Dunning
Lei, A contrary question for you is why you don't just share zk sessions within a single process. On Tue, Apr 27, 2010 at 5:17 PM, Lei Zhang lzvoya...@gmail.com wrote: I am in the process of changing to each thread of each daemon maintaining a zk session. That means we will hit this 10

Re: zookeeper consistency model?

2010-04-29 Thread Ted Dunning
In general, the guarantee is that B will do exactly as you say: it will read the new value or the old value. Your question depends on a definition of "now" that spans several machines. That is a dangerous concept and if your reasoning requires it, you are headed for trouble. On Thu, Apr 29,

Re: zookeeper consistency model?

2010-04-29 Thread Ted Dunning
, this is my browser homepage ;-) http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing Patrick On 04/29/2010 09:14 AM, Ted Dunning wrote: In general, the guarantee is that B will do exactly as you say it will read the new value or the old value. Your question depends

Re: Question on maintaining leader/membership status in zookeeper

2010-04-30 Thread Ted Dunning
and Slave(s) are broken while all other connections are still alive, would my system hang after some point? Because no new leader election will be initiated by slaves and the leader can't get the work to slave(s). Thanks, Lei On 4/30/10 1:54 PM, Ted Dunning ted.dunn...@gmail.com

Re: ZKClient

2010-05-04 Thread Ted Dunning
This is used as part of katta where it gets a fair bit of exercise at low update rates with small data. It is used for managing the state of the search cluster. I don't think it has had much external review or use for purposes apart from katta. Katta generally has pretty decent code, though.

Re: ZKClient

2010-05-04 Thread Ted Dunning
I don't think that zk is hard to get right. What is hard is to layer a very different model on top of ZK that changes the semantics significantly and to get that translation right. One of the very cool things about ZK is how easy it is to write correct code. I know that Ben and co put a lot of

Re: ZKClient

2010-05-04 Thread Ted Dunning
, 2010 at 2:21 PM, Ted Dunning ted.dunn...@gmail.com wrote: In general, writing this sort of layer on top of ZK is very, very hard to get really right for general use. In a simple use-case, you can probably nail it but distributed systems are a Zoo, to coin a phrase. The problem

Re: Pathological ZK cluster: 1 server verbosely WARN'ing, other 2 servers pegging CPU

2010-05-12 Thread Ted Dunning
Impressive number here, especially at your quoted few per second rate. Are you sure that you haven't inadvertently synchronized GC on multiple machines? On Wed, May 12, 2010 at 8:30 PM, Aaron Crow dirtyvagab...@yahoo.com wrote: Right now we're at 1.9 million. This isn't a bug of our

Re: Pathological ZK cluster: 1 server verbosely WARN'ing, other 2 servers pegging CPU

2010-05-12 Thread Ted Dunning
Yes. That is roughly what I mean. If one server starts a GC, it can effectively go offline. That might pressure the other servers enough that one of them starts a GC. This is unlikely with your GC settings, but you should turn on the verbose GC logging to be sure. On Wed, May 12, 2010 at

Re: Ping and client session timeouts

2010-05-21 Thread Ted Dunning
You may actually be swapping. That can be even worse than GC! On Fri, May 21, 2010 at 11:32 AM, Stephen Green eelstretch...@gmail.com wrote: Right. The system can be very memory-intensive, but at the time these are occurring, it's not under a really heavy load, and there's plenty of heap

Re: Zookeeper, Maven and dependencies on javax jar files

2010-05-24 Thread Ted Dunning
Which version of maven do you have? I have heard some versions don't follow redirects well. You can try deleting these defective files in your local repository under .m2 and try again. You may need to try with a newer maven to get things right. Another option is to explicitly remove those

Re: Zookeeper, Maven and dependencies on javax jar files

2010-05-24 Thread Ted Dunning
The only one that I think is important is the jmx which enables monitoring of the servers. On Mon, May 24, 2010 at 2:51 PM, Jack Orenstein j...@akiban.com wrote: This at least gets me through the build/install phase. My usage of zookeeper is pretty minimal right now -- just one a single node.

Re: Zookeeper, Maven and dependencies on javax jar files

2010-05-24 Thread Ted Dunning
Same version I use. On Mon, May 24, 2010 at 2:51 PM, Jack Orenstein j...@akiban.com wrote: Ted Dunning wrote: Which version of maven do you have? 2.2.1.

Re: zookeeper crash

2010-06-02 Thread Ted Dunning
This looks a bit like a small bobble we had when upgrading a bit ago. I THINK that the answer here is to mind-wipe the misbehaving node and have it resynch from scratch from the other nodes. Wait for confirmation from somebody real. On Wed, Jun 2, 2010 at 11:11 AM, Charity Majors

Re: zookeeper crash

2010-06-02 Thread Ted Dunning
I knew Patrick would remember to add an important detail. On Wed, Jun 2, 2010 at 11:49 AM, Patrick Hunt ph...@apache.org wrote: As Ted suggested you can remove the datadir -- *only on the affected server* -- and then restart it.
