Re: Why do my CouchDB databases grow so fast?

Nicholas Orr Mon, 31 May 2010 17:40:25 -0700

Ahh yes, I see that now...

Is that what the grey faint line on the end of those graphs represent?
I actually didn't notice that the first time, just thought the black
line going up was it...


On Tue, Jun 1, 2010 at 10:13 AM, Randall Leeds <[email protected]> wrote:
> If you look at the script the compaction is only performed at the end
> and not on each iteration.
>
> On Mon, May 31, 2010 at 16:58, Nicholas Orr <[email protected]> wrote:
>> This all makes sense except the OP says a compaction step is being
>> performed.
>> A compaction is essentially a copy/paste/delete/rename operation, so the on
>> disk size should be fairly constant as the data copied is just the info
>> required isn't it?
>>
>> Nick
>>
>> On Tue, Jun 1, 2010 at 9:39 AM, Randall Leeds <[email protected]>wrote:
>>
>>> Hi Konrad,
>>>
>>> I'll take a stab at this and if I'm wrong hopefully someone will correct
>>> me.
>>>
>>> The on disk BTree is written in an append only fashion rather than
>>> modified in place. Append only updates mean that every inner node of
>>> the BTree along the path from the root to the new update has to be
>>> re-written each time. Initially, when there are very few inner nodes,
>>> the amount of disk space used for each new update is relatively
>>> constant. Since the tree has a large fan-out the depth does not change
>>> much at first. In the second graph you are seeing a tree that has a
>>> depth of 1 (just the root) being written over an over again to disk
>>> and the corresponding expected linear growth results. However, when
>>> you have a higher revision limit the old revisions are kept in the
>>> tree and the tree grows taller and fatter with each update. As you
>>> make more updates more inner nodes need to be rewritten for each
>>> update which causes the growth to accelerate. Eventually, you hit the
>>> revision limit and old revisions are discarded, the tree stops getting
>>> any taller or fatter and the number of inner nodes that need to be
>>> changed for each update remains relatively constant (but greater than
>>> in the case of rev_limit=1). I suspect that the first graph becomes
>>> linear above 1000 updates and does not continue to accelerate.
>>>
>>> Cheers,
>>> Randall
>>>
>>> 2010/5/31 Konrad Förstner <[email protected]>:
>>> > Hi,
>>> >
>>> > I have an issue with CouchDB and posted the question on stackoverflow
>>> > [1] but did not get any helpful answer. I would be great if somebody
>>> > could answer this here or a stackoverflow (there I also had a problem
>>> > with the compaction which was just a timing issue as explaint in the
>>> > comment)
>>> >
>>> > I was wondering why my CouchDB database was growing to fast so I wrote
>>> > a little test script [2]. This script changes an attributed of a CouchDB
>>> > document 1200 times and takes the size of the database after each
>>> > change. After performing these 1200 writing steps the database is
>>> > doing a compaction step and the db size is measured again. In the end
>>> > the script plots the databases size against the revision numbers. The
>>> > benchmarking is run twice:
>>> >
>>> > * The first time the default number of document revision (=1000) is used
>>> (_revs_limit).
>>> >
>>> > * The second time the number of document revisions is set to 1.
>>> >
>>> > The first run produces the following plot
>>> > http://www.flickr.com/photos/konradfoerstner/4656011444/
>>> >
>>> > The second run produces this plot second run
>>> > http://www.flickr.com/photos/konradfoerstner/4656012732/
>>> >
>>> > For me this is quite an unexpected behavior. In the first run I would
>>> > have expected a linear growth as every change produces a new
>>> > revision. When the 1000 revisions are reached the size value should be
>>> > constant as the older revisions are discarded.
>>> >
>>> > In the second run the first revision should result in certain database
>>> > size that is then keeps during the following writing steps as every
>>> > new revision leads to the deletion of the previous one.
>>> >
>>> > I could understand if there is a little bit of overhead needed to
>>> > manage the changes but this growth behavior seems weird to me. Can
>>> > anybody explain this phenomenon or correct my assumptions that lead to
>>> > the wrong expectations?
>>> >
>>> > Many thanks in advance
>>> >
>>> > Konrad
>>> >
>>> > [1]
>>> http://stackoverflow.com/questions/2921151/why-do-my-couchdb-databases-grow-so-fast
>>> > [2] http://github.com/konrad/couchdb-benchmarking
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>>
>>
>

Re: Why do my CouchDB databases grow so fast?

Reply via email to