On May 26, 2009, at 6:10 PM, Jeff Macdonald wrote:

On Tue, May 26, 2009 at 5:36 PM, Chris Anderson <[email protected]> wrote:

On Tue, May 26, 2009 at 2:31 PM, Jeff Macdonald <[email protected]> wrote:
Hi all,
I've been experimenting with CouchDB. I'm using Net::CouchDB to batch insert
20 docs at a time, and I'm simply setting _id to a sequence that is
incremented for each doc. For just over 9 million rows, where each row is
just 6 small fields, the resulting DB is 3.4G. When I was letting CouchDB
set the _id, the resulting database was over 20G. The input source, a
tab-delimited file, is just over 500MB.
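
(For illustration, the sketch below shows the same batch/sequential-_id approach against CouchDB's HTTP _bulk_docs endpoint directly rather than through Net::CouchDB; the host, port, database name, field layout, and zero-padded id format are assumptions for the example, not the actual script described above.)

    # Hedged sketch: bulk insert with application-assigned sequential _ids.
    # Assumes a local CouchDB on port 5984 and an already-created database.
    import json
    import urllib.request

    COUCH = "http://localhost:5984"
    DB = "import_test"  # hypothetical database name

    def bulk_insert(docs):
        """POST one batch of documents to the _bulk_docs endpoint."""
        req = urllib.request.Request(
            f"{COUCH}/{DB}/_bulk_docs",
            data=json.dumps({"docs": docs}).encode(),
            headers={"Content-Type": "application/json"},
        )
        return json.load(urllib.request.urlopen(req))

    batch, seq = [], 0
    for line in open("input.tsv"):            # the tab-delimited source file
        fields = line.rstrip("\n").split("\t")
        seq += 1
        # Monotonically increasing _id instead of a server-assigned random id.
        batch.append({"_id": f"{seq:012d}", "fields": fields})
        if len(batch) == 20:                  # 20 docs per batch, as above
            bulk_insert(batch)
            batch = []
    if batch:
        bulk_insert(batch)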

So is it normal for CouchDB to create such a large database file when it
assigns document ids?


Yes, currently CouchDB docids are random, which means more of the btree
must be rewritten than if the ids were concentrated, as you see with
sequential ids. For high-performance applications, sequential ids are
faster as well.

Compacting may shrink your databases so they are roughly equal in size.
You can trigger compaction from Futon. I'd be interested to see what
results you get.
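
(Compaction can also be triggered over the HTTP API rather than from Futon; a minimal sketch, again assuming a local CouchDB on port 5984 and the same hypothetical database name as above.)

    # Hedged sketch: kick off compaction and note how to watch its progress.
    import json
    import urllib.request

    req = urllib.request.Request(
        "http://localhost:5984/import_test/_compact",
        data=b"",                                  # POST with an empty body
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    print(json.load(urllib.request.urlopen(req)))  # e.g. {'ok': True}

    # Compaction runs in the background; GET the database info document
    # (http://localhost:5984/import_test) and watch "compact_running"
    # to see when it finishes.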


Well, it took over a day to do it before. I was, however, only inserting 10
docs at a time then. So right now I'm not motivated to find out how well
the compaction would go. :)

I'd be _very_ surprised if the two compacted DBs differed substantially in size. They should both weigh in smaller than 3.4G, since the compactor writes documents in larger blocks than you appear to be doing.

I don't know anything about your server setup, but an order-of-magnitude estimate for compacting a DB that size these days would be 1 hour, not 1 day.

Best,

Adam
