Re: How large an ensemble can one build with Zookeeper?

2009-03-06 Thread Benjamin Reed
I realize this discussion is over, but I did want to make one quick
clarification. When we talk about ensembles, we are talking about the
servers that make up the ZooKeeper service. We refer to the servers that
use the ZooKeeper service as clients. We have systems here that use
ensembles of five servers to provide ZooKeeper service to thousands of
client servers without problems.
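
To make the ensemble/client distinction concrete, here is a minimal sketch of how a client might name a five-server ensemble. ZooKeeper clients take a comma-separated list of host:port pairs as their connect string. The hostnames below are hypothetical, the helper function is illustrative only, and 2181 is assumed as the client port.

```python
# Hypothetical hostnames for a five-server ensemble (not from the thread).
ENSEMBLE_HOSTS = [f"zk{i}.example.com" for i in range(1, 6)]


def connect_string(hosts, port=2181):
    """Build the comma-separated host:port list a ZooKeeper client connects with.

    The port default of 2181 is an assumption; use whatever clientPort
    the ensemble is actually configured with.
    """
    return ",".join(f"{host}:{port}" for host in hosts)


# Every one of the thousands of client machines would use the same string;
# only the five listed servers are members of the ensemble.
print(connect_string(ENSEMBLE_HOSTS))
```

The size of the ensemble is fixed by the server list, while the number of clients holding that string can grow independently.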


ben

Chad Harrington wrote:

Clearly Zookeeper can handle ensembles of a dozen or so servers.  How large
an ensemble can one build with Zookeeper?  100 servers?  10,000 servers?
Are there limitations that make the system unusable at large numbers of
servers?

Thanks,

  




Re: How large an ensemble can one build with Zookeeper?

2009-03-06 Thread Ted Dunning
Chubby and Zookeeper have very different ways of getting to similar
purposes.  Chubby is a locking service, while Zookeeper is all about
avoiding locks.  Zookeeper is better described as a coordination service.

Regarding performance, I am pretty sure that Zookeeper could keep up with
some pretty enormous clusters quite easily.  I would expect that the
performance of the underlying file system is more likely to be the critical
performance issue.

On Wed, Mar 4, 2009 at 6:00 AM, David Pollak
feeder.of.the.be...@gmail.com wrote:

 I understand that Google uses Chubby (a ZooKeeper clone... or vice versa :-)
 ) as the coordination mechanism for Big Table.  Do you have any insight into
 Chubby's performance characteristics... and if it would be possible to build
 a Big Table clone that had scalability characteristics of Big Table with
 ZooKeeper as the underlying coordinator?




Re: How large an ensemble can one build with Zookeeper?

2009-03-04 Thread David Pollak
On Tue, Mar 3, 2009 at 9:33 PM, Ted Dunning ted.dunn...@gmail.com wrote:

 zookeeper is not really what you would call a scalable system because all
 transactions that are updates go through the leader for serialization.
 Zookeeper is, instead, a high throughput HA system. That said, the
 throughput of a modest zookeeper cluster is fairly prodigious so for the
 normal application of coordinating a large cluster, these limits are beyond
 what just about anyone needs.

 For other uses, though, 50 K updates per second wouldn't cut it.


I understand that Google uses Chubby (a ZooKeeper clone... or vice versa :-)
) as the coordination mechanism for Big Table.  Do you have any insight into
Chubby's performance characteristics... and if it would be possible to build
a Big Table clone that had scalability characteristics of Big Table with
ZooKeeper as the underlying coordinator?









-- 
Lift, the simply functional web framework http://liftweb.net
Beginning Scala http://www.apress.com/book/view/1430219890
Follow me: http://twitter.com/dpp
Git some: http://github.com/dpp


Re: How large an ensemble can one build with Zookeeper?

2009-03-04 Thread Jean-Daniel Cryans
David,

This is exactly what we are doing in the HBase project (www.hbase.org).
Zookeeper is currently being integrated for our next major version and some
parts are already in place.

Regards,

J-D

On Wed, Mar 4, 2009 at 9:00 AM, David Pollak
feeder.of.the.be...@gmail.com wrote:



 I understand that Google uses Chubby (a ZooKeeper clone... or vice versa :-)
 ) as the coordination mechanism for Big Table.  Do you have any insight into
 Chubby's performance characteristics... and if it would be possible to build
 a Big Table clone that had scalability characteristics of Big Table with
 ZooKeeper as the underlying coordinator?


 
 
 



Re: How large an ensemble can one build with Zookeeper?

2009-03-04 Thread David Pollak
JD,
When I last looked at HBase (about a year ago), the performance was lacking.
Have there been material improvements in HBase's performance in the last
year?

Thanks,

David

PS -- If this is not the correct list for such questions, I pre-apologize.
Just whack me with a 2x4 and I'll take the discussion off the ZooKeeper
list.

On Wed, Mar 4, 2009 at 6:02 AM, Jean-Daniel Cryans jdcry...@apache.org wrote:

 David,

 This is exactly what we are doing in the HBase project (www.hbase.org).
 Zookeeper is currently being integrated for our next major version and some
 parts are already in place.

 Regards,

 J-D

 






How large an ensemble can one build with Zookeeper?

2009-03-03 Thread Chad Harrington
Clearly Zookeeper can handle ensembles of a dozen or so servers.  How large
an ensemble can one build with Zookeeper?  100 servers?  10,000 servers?
Are there limitations that make the system unusable at large numbers of
servers?

Thanks,

-- 
Chad Harrington
CEO
DataScaler, Inc.
charring...@datascaler.com
201A Ravendale Dr.
Mountain View, CA  94043
Phone: 650-515-3437
Fax: 650-887-1544


Re: How large an ensemble can one build with Zookeeper?

2009-03-03 Thread Mahadev Konar
Hi Chad,
The maximum number of ZooKeeper servers we have tested with is 13. Even
with 13, performance starts to degrade very quickly (compared to ensembles
of 5 and 7). I am not sure we have current numbers (we have made 3x or
so performance improvements since), but the old numbers are in zookeeper.pdf at
http://wiki.apache.org/hadoop/ZooKeeper/ZooKeeperPresentations

The slide is at the end.

You can see that performance drops with 13 servers. We usually suggest 5
or 7 servers for ZooKeeper. We can get around 20K-30K writes per second and
more than 50K reads per second from an ensemble of 5 servers (as of now, with
the performance enhancements). With 5 servers you can tolerate the failure of 2
nodes.
To find out more about ZooKeeper, please take a look at the ZooKeeper
presentations:
http://wiki.apache.org/hadoop/ZooKeeper/ZooKeeperPresentations
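
The "5 servers tolerate 2 failures" figure follows from majority-quorum arithmetic. The sketch below assumes the default majority quorum (more than half of the ensemble must be up and must acknowledge each write); the function names are illustrative and not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """Servers that can fail while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


# Ensemble sizes mentioned in this thread, plus 3 and 6 for comparison.
for n in (3, 5, 6, 7, 13):
    print(f"{n} servers: quorum of {quorum_size(n)}, "
          f"tolerates {tolerated_failures(n)} failures")
```

Under this assumption an even-sized ensemble tolerates no more failures than the odd size just below it, and a larger ensemble needs more acknowledgements per write, which is consistent with suggesting 5 or 7 servers rather than 13.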

What is the rationale behind having such a large number of ZooKeeper servers?

Thanks
mahadev


On 3/3/09 5:30 PM, Chad Harrington charring...@datascaler.com wrote:

 Clearly Zookeeper can handle ensembles of a dozen or so servers.  How large
 an ensemble can one build with Zookeeper?  100 servers?  10,000 servers?
 Are there limitations that make the system unusable at large numbers of
 servers?
 
 Thanks,



Re: How large an ensemble can one build with Zookeeper?

2009-03-03 Thread Ted Dunning
Zookeeper is not really what you would call a scalable system because
all transactions that are updates go through the leader for
serialization. Zookeeper is, instead, a high-throughput HA system.
That said, the throughput of a modest Zookeeper cluster is fairly
prodigious, so for the normal application of coordinating a large
cluster, these limits are beyond what just about anyone needs.


For other uses, though, 50 K updates per second wouldn't cut it.


Sent from my iPhone

On Mar 3, 2009, at 17:30, Chad Harrington charring...@datascaler.com wrote:

 Clearly Zookeeper can handle ensembles of a dozen or so servers.  How large
 an ensemble can one build with Zookeeper?  100 servers?  10,000 servers?
 Are there limitations that make the system unusable at large numbers of
 servers?

 Thanks,
