I'm part of the core Federal Learning Registry dev team [http://www.learningregistry.org], and we're using CouchDB to store and replicate the contents of the registry within our network.
One of the questions that has come up as we start planning our initial production release is CouchDB's scalability strategy. We expect that, long term, we're going to have an enormous amount of data from activity streams and metadata inserted into the network, and I'd like an idea of what we need to work towards now so there's no big surprise when we start getting close to hitting some limits.

As part of our infrastructure strategy, we've chosen Amazon Web Services EC2 & EBS as our hosting provider for the first rollout. EBS currently has an upper limit of 1TB per volume; other cloud or non-cloud solutions may have similar or different limitations, but right now I'm only concerned with how we might deal with this on EC2 and EBS.

1. Are there CouchDB limits that we're going to run into before we hit 1TB?

2. Is there a strategy for disk spanning to go beyond the 1TB limit by incorporating multiple volumes, or do we need to leverage a solution like BigCouch, which seems to require us to spin up multiple CouchDB instances and do some sort of sharding/partitioning of the data? I'm curious how queries that span shards/partitions work, or whether this is transparent (see the rough sketch in the P.S. below for what I mean).

Thanks,

- Jim

Jim Klo
Senior Software Engineer
Center for Software Engineering
SRI International
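
P.S. To make question 2 concrete: if the sharding/partitioning is not transparent, I'm picturing us having to write something like the following ourselves. This is only a rough sketch, assuming Python and the 'requests' HTTP library, with made-up node URLs, database name, and placement scheme; I'd love to hear that BigCouch (or something else) makes this unnecessary.

    # Hypothetical sketch: client-side partitioning across several CouchDB
    # nodes (one ~1TB EBS-backed volume each), assuming sharding is NOT
    # handled for us. Node URLs and database name are made up.
    import hashlib
    import requests  # assumes the 'requests' HTTP library is installed

    SHARD_URLS = [
        "http://couch-node-1:5984/registry",
        "http://couch-node-2:5984/registry",
        "http://couch-node-3:5984/registry",
    ]

    def shard_for(doc_id):
        """Pick the node that owns this document id (simple hash-modulo placement)."""
        digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
        return SHARD_URLS[int(digest, 16) % len(SHARD_URLS)]

    def put_doc(doc_id, doc):
        """Write a document to whichever shard owns its id."""
        resp = requests.put("%s/%s" % (shard_for(doc_id), doc_id), json=doc)
        resp.raise_for_status()
        return resp.json()

    def get_doc(doc_id):
        """Read the document back from the same shard."""
        resp = requests.get("%s/%s" % (shard_for(doc_id), doc_id))
        resp.raise_for_status()
        return resp.json()

    def query_all_shards(view_path, **params):
        """Fan a view query out to every node and concatenate the rows."""
        rows = []
        for base in SHARD_URLS:
            resp = requests.get("%s/%s" % (base, view_path), params=params)
            resp.raise_for_status()
            rows.extend(resp.json().get("rows", []))
        return rows

The scatter-gather part (query_all_shards above) is what worries me most; merging and re-sorting view results across nodes by hand seems error-prone, so I'd really like to know how much of that BigCouch or a similar layer takes care of.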
