On Sun, Feb 27, 2011 at 4:13 PM, niall el-assaad <[email protected]> wrote:
> I'm looking at developing an application that will have a couple of central
> nodes (data centre) and around 2,000 remote nodes (in branch offices).
>
> I'd like to know if couchdb can scale to having this many nodes working in a
> cluster.

For the list to give a more meaningful answer, it would help if you gave more detail about the nature of the problem you want to solve.

* Replication topology: is the plan to have replication from the branch office nodes to your centralized data center? (n:1)

* Replication type: continuous, or triggered manually/programmatically?

* Scope of data set: I would be more concerned with writes than reads. You'll need to have an idea of what your current aggregate average and peak writes per second are, how much data is written in a given period of time, and how far you think this rate will need to scale in the future.

* Why Couch: is CouchDB going to address a brand new need, or is it going to replace existing systems for known reasons? If it's the latter, what is it about your current systems that isn't meeting your demands, and what do you hope Couch will provide to fill the gap? (Specifically, I'm looking for any performance data you have already collected, and whether Couch will live on your existing hardware or on new hardware.)

I haven't dealt with large distributed Couch systems, but my instinct is that Couch wouldn't have any problem with a 2000:1 replicated system. (See Ubuntu One as an example of a large CouchDB system with many external replicators.) The ability to handle it would come down to how well the aggregate data set matches the hardware and replication layout in your data center, and of course the available ingress bandwidth.

-Isaac
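
P.S. To make the sizing question concrete, here is a rough back-of-the-envelope sketch in Python of how you might turn per-branch write rates into an aggregate write rate and an ingress bandwidth figure for the data center. Every number in it (0.5 writes per second per node, a 4x peak factor, 2 KB documents) is a made-up placeholder, and the names `BranchLoad`, `aggregate_writes_per_sec` and `ingress_megabits_per_sec` are hypothetical. It only counts raw document bytes and ignores HTTP and replication protocol overhead, revisions and attachments, so treat the result as a lower bound rather than a prediction.

```python
from dataclasses import dataclass


@dataclass
class BranchLoad:
    """Assumed write load for a fleet of branch nodes (all values are guesses)."""
    nodes: int                 # number of branch-office nodes replicating inward
    avg_writes_per_sec: float  # average document writes per second, per node
    peak_factor: float         # peak write rate as a multiple of the average
    avg_doc_bytes: int         # average size of one written document, in bytes


def aggregate_writes_per_sec(load: BranchLoad, peak: bool = False) -> float:
    """Total writes per second arriving at the central nodes."""
    rate = load.nodes * load.avg_writes_per_sec
    return rate * load.peak_factor if peak else rate


def ingress_megabits_per_sec(load: BranchLoad, peak: bool = False) -> float:
    """Raw document payload arriving per second, in megabits (10^6 bits)."""
    bytes_per_sec = aggregate_writes_per_sec(load, peak) * load.avg_doc_bytes
    return bytes_per_sec * 8 / 1_000_000


if __name__ == "__main__":
    # Placeholder figures for a 2000:1 layout; substitute measured values.
    load = BranchLoad(nodes=2000, avg_writes_per_sec=0.5,
                      peak_factor=4.0, avg_doc_bytes=2048)
    for label, peak in (("average", False), ("peak", True)):
        print(f"{label}: {aggregate_writes_per_sec(load, peak):.0f} writes/s, "
              f"{ingress_megabits_per_sec(load, peak):.1f} Mbit/s ingress")
```

Plugging in your own measured averages and peaks gives a first estimate to compare against the write throughput of the central nodes and the bandwidth of the data center link.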
