Re: Return data size in zkpython
Hey Henry,

Thanks for the quick response and patch. I was going to cobble together a quick patch myself, but this is a much more robust solution. I'll apply this fix and let you know if I run into any problems with it. Thanks again!

Rich

On Dec 15, 2009, at 6:41 PM, Henry Robinson wrote:

Hi -

See https://issues.apache.org/jira/browse/ZOOKEEPER-627 and the attached patch. I've upped the limit to a 1 MB buffer. I've also added a fourth parameter to zookeeper.get: if you set this integer parameter to the size of the buffer you are expecting, zkpython will return no more than that many bytes. Thanks again for flagging this up.

cheers,
Henry

On Tue, Dec 15, 2009 at 4:43 PM, Henry Robinson he...@cloudera.com wrote:

Hey Rich -

That's a really dumb restriction :) I'll open a JIRA and get it fixed asap. Thanks for the report!

Henry

On Tue, Dec 15, 2009 at 4:38 PM, Rich Schumacher rich.s...@gmail.com wrote:

Hey all,

I'm working on using ZooKeeper for an internal application at Digg. I've been using the zkpython package and I just noticed that the data I was receiving from a zookeeper.get() call was being truncated. After some quick digging I found that zookeeper.c limits the data returned to 512 characters (see http://svn.apache.org/viewvc/hadoop/zookeeper/tags/release-3.2.2/src/contrib/zkpython/src/c/zookeeper.c?view=markup, line 855). Is there a reason for this? The only guidance I've read regarding node size is that it should not exceed 1 MB, so this limit seems a bit arbitrary and restrictive.

Thanks for the great work!

Rich
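[Archive note] The truncation Rich describes can be sketched in plain Python without a live ensemble. The `zookeeper.get` call in the trailing comment follows Henry's description of the ZOOKEEPER-627 patch (fourth parameter = expected buffer size) and is illustrative, not verified against a released build:

```python
# Sketch of the pre-patch behaviour: the old zkpython bindings copied node
# data into a fixed 512-byte C buffer, so anything longer was silently cut.
def truncate_like_old_zkpython(data: bytes, bufsize: int = 512) -> bytes:
    """Mimic the fixed-buffer copy in the old zookeeper.c."""
    return data[:bufsize]

payload = b"x" * 1024                         # a 1 KB znode value
received = truncate_like_old_zkpython(payload)
assert len(received) == 512                   # data silently truncated

# With the ZOOKEEPER-627 patch the limit is 1 MB, and zookeeper.get takes
# an optional size hint (hypothetical usage, per Henry's description):
#   data, stat = zookeeper.get(handle, "/mynode", None, 1024 * 1024)
```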
Share Zookeeper instance and Connection Limits
I've read the documentation on the ZooKeeper site and can't find anything about sharing client connections or the limits on them. I only see the parameter in the .conf file for the max number of connections per client.

Can someone point me to documentation about sharing ZooKeeper connections? Can I do this among different threads? And what about client connection limits — how much does throughput decrease as the number of connections increases?

Thanks,

-- Thiago Borges
Re: Share Zookeeper instance and Connection Limits
Thiago Borges wrote:
> I read the documentation at zoo site and can't find some text about sharing/limits of zoo clients connections.

No limits particular to ZK itself (given enough memory) - usually the limitation is the maximum number of file descriptors the host OS allows. Often this is on the order of 1-8k; check your ulimit.

> I only see the parameter in .conf file about the max number of connections per client.

This is to limit DoS attacks - it was added after we saw issues with buggy client implementations that would create unbounded numbers of sessions with the ZK service, eventually running into the FD limit problem I mentioned.

> Can someone point me some documentation about sharing the zookeeper connections? Can I do this among different threads?

The API docs have those details: http://hadoop.apache.org/zookeeper/docs/current/api/index.html Generally, though, the client interface is thread safe.

> And about client connections limits and how much throughput decreases when the number of connections increase?

This test has 910 clients (sessions) involved: http://hadoop.apache.org/zookeeper/docs/current/zookeeperOver.html#Performance

We have users with 10k sessions accessing a single 5-node ZK ensemble. That's the largest I know of in production. I've personally tested up to 20k sessions attaching to a 3-node ensemble with a 10-second session timeout, and it was fine (although I didn't do much other than test session establishment and teardown).

Also see this: http://bit.ly/4ekN8G

Patrick
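[Archive note] The per-client connection cap Patrick refers to is a server-side setting in zoo.cfg. A minimal sketch, with illustrative values (the parameter name is `maxClientCnxns` in the shipped configuration; 0 means unlimited):

```
# zoo.cfg (illustrative values)
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
# Limit concurrent connections from a single client IP, to guard
# against buggy clients opening unbounded numbers of sessions.
maxClientCnxns=60
```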
Re: Share Zookeeper instance and Connection Limits
On 16/12/2009 16:45, Patrick Hunt wrote:
> This test has 910 clients (sessions) involved: http://hadoop.apache.org/zookeeper/docs/current/zookeeperOver.html#Performance We have users with 10k sessions accessing a single 5 node ZK ensemble. That's the largest I know about that's in production. I've personally tested up to 20k sessions attaching to a 3 node ensemble with 10 second session timeout and it was fine (although I didn't do much other than test session establishment and teardown). Also see this: http://bit.ly/4ekN8G

Was the network in that test gigabit ethernet? Do you know of anyone running ensembles on 100 Mbit/s ethernet?

Can a ZooKeeper ensemble run only in memory, rather than writing to both memory and disk? Does this make sense if I have a highly reliable system? (Of course, at some point we need a dump so we can shut down and restart the entire system.) Which limits throughput first, disk I/O or the network?

Thanks for your quick response. I'm studying ZooKeeper for my master's thesis, on coordinating distributed index structures.

-- Thiago Borges
Re: Share Zookeeper instance and Connection Limits
I think that this would be a very bad idea because of restart issues. As it stands, ZK reads from disk snapshots on startup to avoid moving as much data from other members of the cluster. You might consider putting the snapshots and log on a tmpfs file system if you really, really want this.

On Wed, Dec 16, 2009 at 1:08 PM, Thiago Borges thbor...@gmail.com wrote:
> Can a ZooKeeper ensemble run only in memory, rather than writing to both memory and disk? Does this make sense if I have a highly reliable system? (Of course, at some point we need a dump so we can shut down and restart the entire system.) Which limits throughput first, disk I/O or the network? Thanks for your quick response. I'm studying ZooKeeper for my master's thesis, on coordinating distributed index structures.

-- Ted Dunning, CTO, DeepDyve
Re: Share Zookeeper instance and Connection Limits
I agree with Ted; it doesn't seem like a good idea to do in practice. However, you do have a couple of options if you are just testing things:

1) use tmpfs
2) set forceSync to no in the configuration file to disable syncing to disk before acknowledging responses
3) if you really want to make the disk writes go away entirely, modify the SyncRequestProcessor in the code

ben

Ted Dunning wrote:
> I think that this would be a very bad idea because of restart issues. As it stands, ZK reads from disk snapshots on startup to avoid moving as much data from other members of the cluster. You might consider putting the snapshots and log on a tmpfs file system if you really, really want this.

On Wed, Dec 16, 2009 at 1:08 PM, Thiago Borges thbor...@gmail.com wrote:
> Can a ZooKeeper ensemble run only in memory, rather than writing to both memory and disk? Does this make sense if I have a highly reliable system? (Of course, at some point we need a dump so we can shut down and restart the entire system.) Which limits throughput first, disk I/O or the network? Thanks for your quick response. I'm studying ZooKeeper for my master's thesis, on coordinating distributed index structures.
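[Archive note] Options 1 and 2 above can be sketched as configuration; paths, sizes, and the exact forceSync spelling are illustrative (its handling has varied between releases, e.g. as the zookeeper.forceSync system property), so treat this as a test-only sketch, not production advice:

```
# Option 1: put the ZK data directory on tmpfs (example mount):
#   mount -t tmpfs -o size=512m tmpfs /var/lib/zookeeper/data
# then point zoo.cfg at it:
dataDir=/var/lib/zookeeper/data

# Option 2: skip the fsync before acknowledging writes.
# Risks losing acknowledged updates on crash; benchmarks only.
forceSync=no
```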