Thank you both.

On Sun, Jan 9, 2011 at 4:05 PM, Randall Leeds <[email protected]> wrote:
> On Sun, Jan 9, 2011 at 12:44, Bob Clary <[email protected]> wrote:
> > Jeffrey,
> >
> > Randall makes several good points and covers many of the issues you will
> > need to handle; however, I'd like to chime in with some of the lessons I
> > have learned from my experiences.
> >
> > The estimate that your maximum database size should be less than 1/2 of
> > your free disk space is a good starting point, but you also need to
> > consider the disk space consumed by your views. They too will require a
> > maximum of twice their size to compact. If your view sizes are on the
> > same order as your database size, then you can expect your maximum
> > database size to be 1/4 of your free disk space. This doesn't take into
> > account the current issue in CouchDB where some initial view sizes may be
> > 10-20 times their final compacted size.
> >
> > Regularly compacting your database *and* views is critical to limiting
> > your maximum disk usage. Until the issue where compaction leaves file
> > handles open for deleted old copies of files is resolved, you will also
> > need to periodically restart your CouchDB server in order to free the
> > space from the old versions of the files. Monitoring not only the
> > database and view sizes but also the actual free space reported by the
> > system is important. If you see the free space continuing to decrease to
> > a dangerous level after repeated compactions, you need to restart the
> > database or risk running out of space on the entire machine.
>
> The issue you refer to is here[1] and it's been fixed for the upcoming
> 1.0.2 and 1.1 releases.
>
> > The replication strategy to bigger machines will work up to a point (see
> > below) as long as the load on your database is not too great and the
> > database and views do not need to be compacted too often.
> > However, replicating a large database with millions of documents will
> > take a long time, and you may not have sufficient time to move to a
> > larger machine before you run out of space if the database and views
> > need to be compacted several times during the replication.
> >
> > Finally, once your database views grow large enough you will run into
> > the issue where CouchDB will crash after compacting your views,
> > resulting in the view being deleted and having to be recreated from the
> > beginning. This view creation-compaction-crash-creation cycle can take
> > more than a day with a large database, will leave any parts of your
> > application which depend on these views unusable, and won't be resolved
> > through replication to a machine with a larger disk.
>
> That's a more disturbing issue and it looks like no one's addressed it
> yet. I'll comment on the JIRA ticket and see if we can get some movement
> on it. I know it hasn't been around forever, since older releases did not
> exhibit this behavior. I bet we can track it down.
>
> > In summary, I think the initial free disk space should be 4 times the
> > expected size of your database and, depending on your views, that there
> > is currently an absolute limit beyond which CouchDB will become
> > unusable. In my case it was a compacted database of 40G of about 10
> > million documents.
> >
> > bc
> >
> > On 1/8/11 12:31 PM, Randall Leeds wrote:
> >>
> >> It's hard to estimate how big the compacted database will be given the
> >> size of the original. In the worst case (when your database is already
> >> compacted), compacting it again will double your usage, since it
> >> creates a whole new, optimized copy of the database file.
> >>
> >> More likely, the original is not compact, and so the new file will be
> >> much smaller.
> >>
> >> Clearly, then, the answer is that if you want to be ultra safe, no
> >> single database should exceed 50% of your capacity.
> >> However, it is safe to have many small databases such that the total
> >> disk consumption is much higher.
> >>
> >> The best solution is to regularly compact your databases and track the
> >> usage and size differences so you get a good sense of how fast you're
> >> growing. And remember, if you find yourself in a sticky situation where
> >> you can't compact, you probably still have plenty of time to replicate
> >> to a bigger machine or a hosted cluster such as the one offered by
> >> Cloudant. Good monitoring is the best way to avoid disaster.
> >>
> >> On Sat, Jan 8, 2011 at 10:39, Jeffrey M. Barber <[email protected]> wrote:
> >>>
> >>> If I'm running CouchDB with 100GB of disk space, what is the maximum
> >>> CouchDB database size such that I'm still able to optimize?
> >>>
> >>> I remember running out of room on a Rackspace machine, and I got the
> >>> strangest of error codes when trying to run CouchDB.
> >>>
> >>> -J
>
> [1] https://issues.apache.org/jira/browse/COUCHDB-926
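Bob's sizing rule of thumb above can be sketched as a quick back-of-the-envelope calculation. The 100GB figure is Jeffrey's; everything else follows from the reasoning in the thread: database compaction needs up to 2x the database size, and view compaction up to another 2x the view size, so with views comparable in size to the database you should plan for roughly free/4.

```shell
# Bob's rule of thumb: compaction can temporarily double disk usage for the
# database, and again for views of comparable size, so keep the compacted
# database under about 1/4 of the free space.
FREE_GB=100                       # Jeffrey's example machine
MAX_DB_GB=$((FREE_GB / 4))        # headroom for db + view compaction copies
echo "With ${FREE_GB}GB free, keep the compacted database under ~${MAX_DB_GB}GB"
```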
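The "monitor the actual free space reported by the system" advice can be sketched as a small cron-style check. A minimal sketch only: `DATA_DIR`, `DB_KB`, and the 4x multiplier are assumptions drawn from Bob's summary, not anything prescribed by CouchDB itself.

```shell
# Sketch of a free-space check against Bob's 4x guideline.
# DATA_DIR and DB_KB are placeholders: point DATA_DIR at the volume holding
# your .couch files, and feed DB_KB from the database's reported disk size.
DATA_DIR="${DATA_DIR:-/}"
DB_KB="${DB_KB:-1048576}"          # stand-in: 1GB compacted database

# POSIX df: column 4 of the second line is available space in KB.
FREE_KB=$(df -Pk "$DATA_DIR" | awk 'NR==2 {print $4}')

if [ "$FREE_KB" -lt $((DB_KB * 4)) ]; then
  echo "WARNING: less than 4x database size free; compact/restart or migrate"
else
  echo "OK: ${FREE_KB} KB free on ${DATA_DIR}"
fi
```

If the warning keeps firing after repeated compactions, that is the symptom Bob describes of old file handles pinning deleted copies, and a server restart is the workaround until the fix lands.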
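The "replicate to a bigger machine" escape hatch goes through CouchDB's `_replicate` endpoint. A hedged sketch, assuming hypothetical host and database names (`old-host`, `big-host`, `mydb`); the curl call is left commented out since it needs live servers.

```shell
# Hypothetical endpoints -- substitute your own hosts and database name.
SOURCE="http://old-host:5984/mydb"
TARGET="http://big-host:5984/mydb"

# One-shot replication request body; create_target makes the destination
# database if it doesn't exist yet.
BODY=$(printf '{"source":"%s","target":"%s","create_target":true}' \
  "$SOURCE" "$TARGET")

# Post it to the target server's _replicate endpoint (requires live servers):
# curl -X POST -H 'Content-Type: application/json' \
#      -d "$BODY" http://big-host:5984/_replicate
echo "$BODY"
```

As Bob warns, a one-shot copy of millions of documents takes time, so start it well before compaction headroom runs out on the source machine.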
