Re: Return data size in zkpython

2009-12-16 Thread Rich Schumacher
Hey Henry,

Thanks for the quick response and patch.  I was going to cobble together a 
quick patch but this is a much more robust solution.  I'll implement this fix 
and let you know if I run into any problems with it.

Thanks again!

Rich

On Dec 15, 2009, at 6:41 PM, Henry Robinson wrote:

 Hi -
 
 See https://issues.apache.org/jira/browse/ZOOKEEPER-627, and the attached
 patch. I've upped the limit to a 1Mb buffer. Also I've added a fourth
 parameter to zookeeper.get - if you set this integer parameter to the size
 of the buffer you are expecting, zkpython will return no more than this many
 bytes.
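 For illustration, here is a small pure-Python model of the behaviour described
 above (the function and names are hypothetical stand-ins, not the actual
 zkpython API -- the real fix lives in the C code attached to ZOOKEEPER-627):

```python
def get_with_buffer(stored, bufferlen=512):
    """Toy model of zkpython's get(): return at most bufferlen bytes of
    node data. The old code used a fixed 512-byte buffer; the patch raises
    the default and lets callers pass the expected size as a parameter."""
    return stored[:bufferlen]

node_data = b"x" * 2000                              # a node larger than 512 bytes
print(len(get_with_buffer(node_data)))               # old behaviour: truncated to 512
print(len(get_with_buffer(node_data, 1024 * 1024)))  # patched: full 2000 bytes
```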
 
 Thanks again for flagging this up.
 
 cheers,
 Henry
 
 On Tue, Dec 15, 2009 at 4:43 PM, Henry Robinson he...@cloudera.com wrote:
 
 Hey Rich -
 
 That's a really dumb restriction :) I'll open a JIRA and get it fixed asap.
 
 Thanks for the report!
 
 Henry
 
 
 On Tue, Dec 15, 2009 at 4:38 PM, Rich Schumacher rich.s...@gmail.com wrote:
 
 Hey all,
 
 I'm working on using ZooKeeper for an internal application at Digg.  I've
 been using the zkpython package and I just noticed that the data I was
 receiving from a zookeeper.get() call was being truncated.  After some quick
 digging I found that zookeeper.c limits the data returned to 512 characters
 (see
 http://svn.apache.org/viewvc/hadoop/zookeeper/tags/release-3.2.2/src/contrib/zkpython/src/c/zookeeper.c?view=markup,
 line 855).
 
 Is there a reason for this?  The only information regarding node size that
 I've read is that it should not exceed 1MB so this limit seems a bit
 arbitrary and restrictive.
 
 Thanks for the great work!
 
 Rich
 
 
 



Share Zookeeper instance and Connection Limits

2009-12-16 Thread Thiago Borges
I read the documentation on the ZooKeeper site and can't find any text about 
sharing/limits of ZooKeeper client connections.


I only see the parameter in the .conf file for the max number of 
connections per client.


Can someone point me to some documentation about sharing ZooKeeper 
connections? Can I do this among different threads?


And what about client connection limits, and how much does throughput 
decrease as the number of connections increases?


Thanks,

--
Thiago Borges


Re: Share Zookeeper instance and Connection Limits

2009-12-16 Thread Patrick Hunt


Thiago Borges wrote:
I read the documentation on the ZooKeeper site and can't find any text about 
sharing/limits of ZooKeeper client connections.


No limits in particular to ZK itself (given enough memory) - usually the 
limitations are due to the max number of file descriptors the host OS 
allows. Often this is on the order of 1-8k, check your ulimit.
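To check that limit on a Unix host from Python (a generic OS check, not 
ZK-specific):

```python
import resource

# Per-process file-descriptor limits; each ZK client connection consumes
# one descriptor on the server, so the soft limit caps concurrent sessions.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft FD limit: {soft}, hard FD limit: {hard}")
```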


I only see the parameter in the .conf file for the max number of 
connections per client.


This is to limit DOS attacks - it was added after we saw issues with 
buggy client implementations that would create infinite numbers of 
sessions with the ZK service, eventually running into the FD limit 
problem I mentioned.


Can someone point me to some documentation about sharing ZooKeeper 
connections? Can I do this among different threads?


The API docs have those details:
http://hadoop.apache.org/zookeeper/docs/current/api/index.html
Generally, though, the client interface is thread-safe.
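A sketch of the sharing pattern, with a stand-in object in place of a real 
handle so it runs without a server (the class and method names here are 
invented for illustration): one session, created once, is used by all 
worker threads.

```python
import threading

class FakeZKHandle:
    """Stand-in for a ZooKeeper client handle. The real client library
    serializes its requests internally, which is what makes sharing one
    handle across threads safe; the lock here plays that role."""
    def __init__(self):
        self._lock = threading.Lock()
        self.calls = 0

    def get(self, path):
        with self._lock:              # real client does its own synchronization
            self.calls += 1
            return b"data-for-" + path.encode()

handle = FakeZKHandle()               # one session, created once

def worker(n):
    for _ in range(100):
        handle.get(f"/node-{n}")

threads = [threading.Thread(target=worker, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(handle.calls)                   # 800: all threads shared the single session
```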

And what about client connection limits, and how much does throughput 
decrease as the number of connections increases?


This test has 910 clients (sessions) involved:
http://hadoop.apache.org/zookeeper/docs/current/zookeeperOver.html#Performance

We have users with 10k sessions accessing a single 5 node ZK ensemble. 
That's the largest I know about that's in production. I've personally 
tested up to 20k sessions attaching to a 3 node ensemble with 10 second 
session timeout and it was fine (although I didn't do much other than 
test session establishment and teardown).


Also see this: http://bit.ly/4ekN8G

Patrick


Re: Share Zookeeper instance and Connection Limits

2009-12-16 Thread Thiago Borges

On 16/12/2009 16:45, Patrick Hunt wrote:

This test has 910 clients (sessions) involved:
http://hadoop.apache.org/zookeeper/docs/current/zookeeperOver.html#Performance 



We have users with 10k sessions accessing a single 5 node ZK ensemble. 
That's the largest I know about that's in production. I've personally 
tested up to 20k sessions attaching to a 3 node ensemble with 10 
second session timeout and it was fine (although I didn't do much 
other than test session establishment and teardown).


Also see this: http://bit.ly/4ekN8G


The network in this test is gigabit Ethernet, right? Do you know of 
anyone who has run ensembles on 100 Mbit/s Ethernet?


Can a ZooKeeper ensemble run only in memory, rather than writing to both 
memory and disk? This would make sense since I have a highly reliable 
system, right? (Of course, at some point we need a dump to shut down and 
restart the entire system.)


Well, which limits throughput first: the disk I/O or the network?

Thanks for your quick response. I'm studying ZooKeeper in my master's 
thesis, for coordinating distributed index structures.


--
Thiago Borges


Re: Share Zookeeper instance and Connection Limits

2009-12-16 Thread Ted Dunning
I think that this would be a very bad idea because of restart issues.  As it
stands, ZK reads from disk snapshots on startup to avoid moving as much data
from other members of the cluster.

You might consider putting the snapshots and log on a tmpfs file system if
you really, really want this.

On Wed, Dec 16, 2009 at 1:08 PM, Thiago Borges thbor...@gmail.com wrote:

 Can a ZooKeeper ensemble run only in memory, rather than writing to both memory
 and disk? This would make sense since I have a highly reliable system, right?
 (Of course, at some point we need a dump to shut down and restart the entire system.)

 Well, which limits throughput first: the disk I/O or the network?

 Thanks for your quick response. I'm studying ZooKeeper in my master's thesis,
 for coordinating distributed index structures.




-- 
Ted Dunning, CTO
DeepDyve


Re: Share Zookeeper instance and Connection Limits

2009-12-16 Thread Benjamin Reed
I agree with Ted; it doesn't seem like a good idea to do in practice. 
However, you do have a couple of options if you are just testing things:


1) use tmpfs
2) you can set forceSync to no in the configuration file to disable 
syncing to disk before acknowledging responses
3) if you really want to make the disk write go away, you can modify the 
SyncRequestProcessor in the code
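For option 1, a zoo.cfg sketch (the tmpfs mount point is an assumed path; 
everything on tmpfs vanishes on reboot, so this is for testing only, per 
the caveats above):

```properties
# zoo.cfg sketch: keep snapshots and the transaction log on tmpfs.
# /mnt/zk-tmpfs is an assumed tmpfs mount point.
tickTime=2000
clientPort=2181
dataDir=/mnt/zk-tmpfs/data
dataLogDir=/mnt/zk-tmpfs/log
```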


ben

Ted Dunning wrote:

I think that this would be a very bad idea because of restart issues.  As it
stands, ZK reads from disk snapshots on startup to avoid moving as much data
from other members of the cluster.

You might consider putting the snapshots and log on a tmpfs file system if
you really, really want this.

On Wed, Dec 16, 2009 at 1:08 PM, Thiago Borges thbor...@gmail.com wrote:

  

Can a ZooKeeper ensemble run only in memory, rather than writing to both memory
and disk? This would make sense since I have a highly reliable system, right?
(Of course, at some point we need a dump to shut down and restart the entire system.)

Well, which limits throughput first: the disk I/O or the network?

Thanks for your quick response. I'm studying ZooKeeper in my master's thesis,
for coordinating distributed index structures.