Jeffrey,

Randall makes several good points and covers many of the issues you will need to handle; however, I'd like to chime in with some of the lessons I've learned from my own experience.

The estimate that your maximum database size should be less than 1/2 of your free disk space is a good starting point, but you also need to consider the disk space consumed by your views. They, too, will require up to twice their size to compact. If your view sizes are on the same order as your database size, then you can expect your maximum database size to be 1/4 of your free disk space. This doesn't take into account the current issue in CouchDB where some initial view sizes may be 10-20 times their final compacted size.
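To make the arithmetic above concrete, here is a small back-of-the-envelope sketch. The 2x multipliers and the view-to-database ratio are my assumptions based on the reasoning in this thread, not measured CouchDB figures:

```python
# Worst-case disk budget for CouchDB compaction (rule-of-thumb sketch).
# Assumption: compacting the database needs up to 2x its size, and
# compacting views of size (ratio * db) needs up to 2x theirs as well.

def max_safe_db_size(free_disk_gb, view_to_db_ratio=1.0):
    """Largest database (GB) you can still compact safely.

    Worst case we need: db_size * 2 * (1 + view_to_db_ratio) <= free disk,
    so solve for db_size.
    """
    return free_disk_gb / (2.0 * (1.0 + view_to_db_ratio))

print(max_safe_db_size(100))        # views ~ same size as db -> 25.0 GB
print(max_safe_db_size(100, 0.0))   # no views at all -> 50.0 GB
```

With 100 GB free and views comparable to the database, this lands on the 1/4 figure above; with no views it reduces to Randall's 1/2 rule.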

Regularly compacting your database *and* views is critical to limiting your maximum disk usage. Until the issue where compaction leaves file handles open to deleted old copies of files is resolved, you will also need to periodically restart your CouchDB server to free the space held by those old versions. It is important to monitor not only the database and view sizes but also the actual free space reported by the system. If you see free space continuing to decrease to a dangerous level after repeated compactions, you need to restart the database or risk running out of space on the entire machine.
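As a sketch of what that monitoring might look like, the helpers below wrap CouchDB's compaction endpoints (`POST /{db}/_compact` and `POST /{db}/_compact/{ddoc}`) and the `disk_size` field from `GET /{db}`. The hostname, database name, and the restart heuristic are illustrative assumptions of mine:

```python
import json
import urllib.request

COUCH = "http://localhost:5984"  # assumed local CouchDB instance

def disk_size(db):
    """Read the on-disk size in bytes from GET /{db}."""
    with urllib.request.urlopen(f"{COUCH}/{db}") as resp:
        return json.load(resp)["disk_size"]

def compact(db, ddoc=None):
    """Trigger POST /{db}/_compact, or /{db}/_compact/{ddoc} for a view group."""
    url = f"{COUCH}/{db}/_compact" + (f"/{ddoc}" if ddoc else "")
    req = urllib.request.Request(
        url, data=b"", method="POST",
        headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req)

def needs_restart(free_before, free_after, floor_bytes):
    """My heuristic for the open-file-handle symptom described above:
    free space did not recover after compaction AND it has fallen
    below a safety floor -> time to restart CouchDB."""
    return free_after <= free_before and free_after < floor_bytes
```

You would feed `needs_restart` the values from `shutil.disk_usage(path).free` measured before and after a compaction pass.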

The replication strategy to bigger machines will work up to a point (see below), as long as the load on your database is not too great and the database and views do not need to be compacted too often. However, replicating a large database with millions of documents takes a long time, and you may not have enough time to move to a larger machine before you run out of space if the database and views need to be compacted several times during the replication.
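A rough way to reason about that race is to compare the time replication needs against the time until the disk fills. This is purely my own heuristic, not a CouchDB feature, and all the rates are numbers you would have to measure yourself:

```python
# Will replication to a bigger machine finish before the source disk
# fills? A crude estimate: ignores compaction wins, assumes steady rates.

def replication_is_safe(free_gb, growth_gb_per_hr, transfer_gb_per_hr,
                        db_gb, compaction_overhead_gb=0.0):
    """Compare hours-to-replicate against hours-until-disk-full."""
    hours_to_replicate = db_gb / transfer_gb_per_hr
    usable_gb = free_gb - compaction_overhead_gb  # headroom eaten by compaction
    hours_to_full = (usable_gb / growth_gb_per_hr
                     if growth_gb_per_hr > 0 else float("inf"))
    return hours_to_replicate < hours_to_full

# 40 GB database, 50 GB free, growing 1 GB/hr, replicating at 5 GB/hr:
print(replication_is_safe(free_gb=50, growth_gb_per_hr=1,
                          transfer_gb_per_hr=5, db_gb=40))  # True
```

The pessimistic `compaction_overhead_gb` term is there because, as noted above, each compaction pass temporarily consumes extra space that replication cannot use.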

Finally, once your views grow large enough, you will run into the issue where CouchDB crashes after compacting your views, deleting the view and forcing it to be rebuilt from the beginning. This creation-compaction-crash-recreation cycle can take more than a day with a large database, leaves any parts of your application that depend on those views unusable, and cannot be resolved by replicating to a machine with a larger disk.

In summary, I think the initial free disk space should be 4 times the expected size of your database and, depending on your views, that there is currently an absolute limit beyond which CouchDB becomes unusable. In my case it was a compacted database of 40 GB holding about 10 million documents.

bc

On 1/8/11 12:31 PM, Randall Leeds wrote:
It's hard to estimate how big the compacted database will be given the
size of the original. In the worst case (when your database is already
compacted), compacting it again will double your usage, since it
creates a whole new, optimized copy of the database file.

More likely is that the original is not compact and so the new file
will be much smaller.

Clearly, then, the answer is that if you want to be ultra safe no
single database should exceed 50% of your capacity. However, it is
safe to have many small databases such that the total disk consumption
is much higher.

The best solution is to regularly compact your databases and track the
usage and size differences so you get a good sense of how fast you're
growing. And remember, if you find yourself in a sticky situation
where you can't compact you probably still have plenty of time to
replicate to a bigger machine or a hosted cluster such as offered by
Cloudant. Good monitoring is the best way to avoid disaster.

On Sat, Jan 8, 2011 at 10:39, Jeffrey M. Barber<[email protected]>  wrote:
If I'm running CouchDB with 100GB of disk space, what is the maximum CouchDB
database size such that I'm still able to optimize?

I remember running out of room on a rackspace machine, and I got the
strangest of error codes when trying to run CouchDB.

-J


