On Feb 28, 2011, at 3:39 AM, niall el-assaad wrote:

> Hi Isaac,
>
> Thanks for the reply. I've put some more info inline:
>
>> * Replication topology: is the plan to have replication from the branch
>> office nodes to your centralized data center? (n:1)
>
> We would have a single box at each branch office replicating with two boxes
> at the data centre (for resilience).
> The data centre boxes would then replicate with each other.
>
> So if some data was inserted at a remote branch, it would be replicated to
> the data centre, and the data centre would replicate it to the other 1,999
> branches.
>
>> * Replication type: continuous or triggered manually/programmatically?
>
> The ideal would be continuous.
>
>> * Scope of data set: I would be more concerned with writes than reads.
>> You'll need to have an idea of what your current aggregate average and
>> peak writes per second are, how much data is written for a given
>> period of time, and how far you think you will need this rate to scale in
>> the future.
>
> I would expect the average to be around 10 writes per second, with the peak
> at about 100 writes per second.
>
>> * Why Couch: is CouchDB going to be addressing a brand new need, or
>> is it going to replace existing systems for known reasons? If it's the
>> latter, what is it about your current systems that isn't meeting your
>> demands, and what do you hope Couch will provide to fill the gap?
>> (Specifically looking for performance data that you might have already
>> collected, and whether Couch is going to be living on your existing
>> hardware or new hardware.)
>
> It's for a completely new project. The main driver for looking at CouchDB is
> the ability to have a very large-scale cluster with write capabilities in
> each branch, mainly so that if there is a communications failure between the
> branch and the data centre, everything can continue to work and then sync up
> later.
>
>> I haven't dealt with large distributed Couch systems, but my instinct
>> would be that Couch wouldn't have any problem with a 2000:1 replicated
>> system. (See Ubuntu One as an example of a large CouchDB system with
>> many external replicators.) The ability to handle it would come down
>> to how well the aggregate data set matches the size of hardware and
>> replication layout in your data center, and of course available
>> ingress bandwidth.
>
> Understood; we would scale the hardware and bandwidth accordingly, based
> upon testing of the application.
>
> thanks,
>
> niall
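[Editor's note: for concreteness, below is a minimal Python sketch of the topology Niall describes, assuming the stock CouchDB /_replicate endpoint with "continuous": true. The hostnames, port, and the branch_data database name are made up for illustration; the requests library handles the HTTP.]

    # A sketch only: made-up hostnames and database name, assuming CouchDB's
    # standard /_replicate endpoint.
    import requests

    BRANCH = "http://branch-0001.example:5984"   # one of the 2,000 branch nodes
    DC_NODES = ["http://dc-a.example:5984",
                "http://dc-b.example:5984"]      # the two data centre nodes
    DB = "branch_data"                           # hypothetical database name

    def replicate(couch_url, source, target):
        # Ask the CouchDB instance at couch_url to run a continuous replication.
        resp = requests.post(couch_url + "/_replicate",
                             json={"source": source, "target": target,
                                   "continuous": True, "create_target": True})
        resp.raise_for_status()
        return resp.json()

    # Branch <-> data centre: push local changes up, pull merged changes down.
    for dc in DC_NODES:
        replicate(BRANCH, BRANCH + "/" + DB, dc + "/" + DB)
        replicate(BRANCH, dc + "/" + DB, BRANCH + "/" + DB)

    # The two data centre nodes keep each other in sync.
    replicate(DC_NODES[0], DC_NODES[0] + "/" + DB, DC_NODES[1] + "/" + DB)
    replicate(DC_NODES[1], DC_NODES[1] + "/" + DB, DC_NODES[0] + "/" + DB)

[Note that replications started via /_replicate do not survive a server restart; on newer releases the same documents can be stored in the _replicator database to make them durable.]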
Hi Niall,

I think the key part is that with this topology your central servers are
going to need to support a sustained throughput of 20,000 reads/second in
order to distribute the updates to all 2,000 branch servers. Granted, each
update is read 2,000 times, so you'll mostly be serving from page cache, but
a cached read from CouchDB is not nearly as cheap as a cached read from,
e.g., Varnish.

Adam
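[Editor's note: the back-of-the-envelope arithmetic behind that 20,000 figure, using the branch count and the average/peak write rates quoted earlier in the thread:]

    # Fan-out estimate; numbers are taken from the thread above.
    branches = 2000
    avg_writes_per_sec = 10    # aggregate average across all branches
    peak_writes_per_sec = 100  # aggregate peak

    # Every update written anywhere is read back out roughly once per branch.
    avg_reads_per_sec = avg_writes_per_sec * branches    # 20,000
    peak_reads_per_sec = peak_writes_per_sec * branches  # 200,000

    print(avg_reads_per_sec, peak_reads_per_sec)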
