Jeffrey,
Randall makes several good points and covers many of the issues you will
need to handle; however, I'd like to chime in with some of the lessons I
have learned from my own experience.
The estimate that your maximum database size should be less than 1/2 of
your free disk space is a good starting point, but you also need to
consider the disk space consumed by your views. They, too, will require
up to twice their size to compact. If your view sizes are on the same
order as your database size, then you can expect your maximum database
size to be 1/4 of your free disk space. This doesn't take into account
the current issue in CouchDB where some initial view sizes may be 10-20
times their final compacted size.
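To make that arithmetic concrete, here is a tiny sketch; the function
name and the assumption that views are comparable in size to the
database are mine, not anything built into CouchDB:

    # Worst-case sizing sketch: compacting a file can temporarily need a
    # second copy of it, so the database plus its views can claim up to
    # 2 * (db_size + view_size) of disk at once.
    def max_safe_db_size(capacity_bytes, views_comparable=True):
        # Views on the same order as the database halve the budget again.
        return capacity_bytes // 4 if views_comparable else capacity_bytes // 2

    print(max_safe_db_size(100 * 1024**3))  # ~25 GB on a 100 GB disk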
Regularly compacting your database *and* views is critical to limiting
your maximum disk usage. Until the issue where compaction leaves file
handles open for deleted old copies of files is resolved, you will also
need to periodically restart your CouchDB server in order to free the
space held by the old versions of the files. Monitoring not only the
database and view sizes but also the actual free space reported by the
system is important. If you see the free space continuing to decrease to
a dangerous level after repeated compactions, you need to restart the
database or risk running out of space on the entire machine.
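For illustration, a minimal sketch of that routine against CouchDB's
HTTP API; the server URL, database name, and design document name are
placeholders for your own setup:

    import json, os, urllib.request

    COUCH = "http://localhost:5984"   # assumed server address
    DB = "mydb"                       # assumed database name

    def post(path):
        # CouchDB requires a JSON Content-Type on compaction POSTs.
        req = urllib.request.Request(
            COUCH + path, data=b"",
            headers={"Content-Type": "application/json"})
        return json.load(urllib.request.urlopen(req))

    def free_bytes(path="/"):
        # Actual free space as reported by the OS, not by CouchDB.
        st = os.statvfs(path)
        return st.f_bavail * st.f_frsize

    # Compact the database itself, then each design document's views,
    # then clean up old view index files.
    post("/%s/_compact" % DB)
    for ddoc in ("app",):             # assumed design document name(s)
        post("/%s/_compact/%s" % (DB, ddoc))
    post("/%s/_view_cleanup" % DB)

    print("free space after compaction: %d bytes" % free_bytes())

If the number printed at the end keeps shrinking run after run, that's
the signal to schedule a restart.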
The replication strategy to bigger machines will work up to a point (see
below) as long as the load on your database is not too great and the
database and views do not need to be compacted too often. However,
replicating a large database with millions of documents will take a long
time, and you may not have sufficient time to move to a larger machine
before you run out of space if the database and views need to be
compacted several times during the replication.
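Kicking that off is a single POST to _replicate; a sketch, with the
target hostname and database names assumed:

    import json, urllib.request

    body = json.dumps({
        "source": "mydb",
        "target": "http://bigger-host:5984/mydb",  # assumed new machine
        "create_target": True,   # create the db on the target if absent
    }).encode()
    req = urllib.request.Request(
        "http://localhost:5984/_replicate", data=body,
        headers={"Content-Type": "application/json"})
    # Note: a non-continuous replication like this blocks until it
    # finishes, which for millions of documents can be a very long time.
    print(json.load(urllib.request.urlopen(req)))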
Finally, once your database views grow large enough, you will run into
the issue where CouchDB will crash after compacting your views,
resulting in the view being deleted and having to be recreated from
scratch. This view creation-compaction-crash-creation cycle can take
more than a day with a large database, will leave any parts of your
application that depend on these views unusable, and won't be resolved
by replicating to a machine with a larger disk.
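If you want early warning before you reach that point, the view index
size is visible through the design document's _info resource; a small
sketch, with database and design doc names assumed:

    import json, urllib.request

    def view_info(db, ddoc, couch="http://localhost:5984"):
        # GET /{db}/_design/{ddoc}/_info reports the view index state.
        url = "%s/%s/_design/%s/_info" % (couch, db, ddoc)
        return json.load(urllib.request.urlopen(url))

    info = view_info("mydb", "app")["view_index"]
    print("view index on disk:", info["disk_size"], "bytes;",
          "compaction running:", info["compact_running"])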
In summary, I think the initial free disk space should be 4 times the
expected size of your database and that, depending on your views, there
is currently an absolute limit beyond which CouchDB will become
unusable. In my case it was a compacted database of 40 GB holding about
10 million documents.
bc
On 1/8/11 12:31 PM, Randall Leeds wrote:
It's hard to estimate how big the compacted database will be given the
size of the original. In the worst case (when your database is already
compacted), compacting it again will double your usage, since it
creates a whole new, optimized copy of the database file.
More likely is that the original is not compact and so the new file
will be much smaller.
Clearly, then, the answer is that if you want to be ultra safe no
single database should exceed 50% of your capacity. However, it is
safe to have many small databases such that the total disk consumption
is much higher.
The best solution is to regularly compact your databases and track the
usage and size differences so you get a good sense of how fast you're
growing. And remember, if you find yourself in a sticky situation
where you can't compact you probably still have plenty of time to
replicate to a bigger machine or a hosted cluster such as offered by
Cloudant. Good monitoring is the best way to avoid disaster.
On Sat, Jan 8, 2011 at 10:39, Jeffrey M. Barber <[email protected]> wrote:
If I'm running CouchDB with 100GB of disk space, what is the maximum CouchDB
database size such that I'm still able to optimize?
I remember running out of room on a rackspace machine, and I got the
strangest of error codes when trying to run CouchDB.
-J