Issues with terabytes databases

Jean-Yves Moulin Tue, 22 Apr 2014 07:19:15 -0700

Hi everybody,

we have been using CouchDB in production for more than two years now, and we
are almost happy with it :-) We have a write-heavy workload with very few
updates, and we never delete data. Some of our databases are terabytes in size,
with billions of documents (sometimes 20 million documents per day). But we are
experiencing some issues, and the only solution was to split our data: today we
create a new database each week, with even and odd weeks on two different
servers (thus we have on-line and off-line servers). This is not perfect, and
we are looking forward to BigCouch :-)
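
As a rough illustration of the weekly split described above, here is a minimal
sketch in Python. The host names, the database name prefix, and the use of ISO
week numbers to decide "even" and "odd" are all assumptions made for the
example, not details of the actual setup.

```python
from datetime import date
from typing import Tuple

# Hypothetical host names: the post only says that even and odd weeks
# live on two different servers.
SERVERS = {
    0: "http://couch-even.example.com:5984",
    1: "http://couch-odd.example.com:5984",
}


def weekly_database(day: date, prefix: str = "events") -> Tuple[str, str]:
    """Return (server_url, database_name) for the week containing `day`.

    Assumption: weeks are ISO weeks, and the parity of the ISO week
    number selects the server.
    """
    iso_year, iso_week, _weekday = day.isocalendar()
    db_name = f"{prefix}_{iso_year}_w{iso_week:02d}"
    return SERVERS[iso_week % 2], db_name
```

With this scheme, a writer would call `weekly_database(date.today())` before
each batch of inserts, so that the server holding last week's database is free
for off-line work such as compaction.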


Below are some of our current problems with these big databases. For the
record, we use couchdb-1.2 and couchdb-1.4 on twelve servers running FreeBSD
(because we like ZFS).

I don't know if these issues are known or not (or specific to us).

* Overall speed: we are far from the real performance of our servers; it seems
that CouchDB is not able to use the full potential of the system. Even with 24
disks in RAID10, we can't go faster than 2000 doc/sec (with an average document
size of 1k, that's only a few MB/s on disk) during replication or compaction.
CPU and disks are almost idle. Tweaking the number of Erlang I/O threads
doesn't help.

* Insert time: at 1000 PUT/sec the insert time is good, even without bulk
inserts. But it collapses when a view calculation, replication or compaction is
running. So we use stale views in our applications, and the views are updated
regularly by a cron script. We avoid compaction on live servers: compactions
are launched manually on off-line servers only. We also avoid replication on
heavily loaded servers.

* Compaction: as the size of a database increases, compaction time can become
really, really long. It would be great if the compaction process could run
faster on already compressed docs. This is our biggest drawback, and it is the
reason for the weekly database split. The speed also decreases over time:
compaction starts fast (>2000 doc/sec) but slows down to ~100 doc/sec after
hundreds of millions of documents.
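
The workarounds in the list above (stale views, a cron job that refreshes the
views, manually launched compaction) map onto a few CouchDB HTTP calls. Here is
a minimal sketch in Python that only builds the requests and does not send
them. The base URL and the database, design document, and view names are
hypothetical; the `limit=0` trick for refreshing a view is a common convention,
not necessarily what the cron script mentioned here does.

```python
import urllib.request
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:5984"  # assumption: local node, default port


def view_request(db, ddoc, view, stale=True, limit=None):
    """Build a GET request for a view.

    With stale=True the query carries ?stale=ok, so CouchDB answers from
    the existing index instead of updating it first.
    """
    params = {}
    if stale:
        params["stale"] = "ok"
    if limit is not None:
        params["limit"] = limit
    url = "{}/{}/_design/{}/_view/{}".format(
        BASE_URL, quote(db, safe=""), quote(ddoc, safe=""), quote(view, safe="")
    )
    if params:
        url += "?" + urlencode(params)
    return urllib.request.Request(url, method="GET")


def view_refresh_request(db, ddoc, view):
    """Build the request a cron job could send to bring a view up to date.

    Querying without stale=ok makes CouchDB update the index before it
    answers; limit=0 keeps the response small.
    """
    return view_request(db, ddoc, view, stale=False, limit=0)


def compaction_request(db):
    """Build the POST /{db}/_compact request that starts a compaction."""
    url = "{}/{}/_compact".format(BASE_URL, quote(db, safe=""))
    return urllib.request.Request(
        url,
        data=b"",
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Each request can be sent with `urllib.request.urlopen(req)`. On a server with
admin accounts configured, `_compact` needs admin credentials, and a running
compaction reports its progress under `/_active_tasks`.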

Are there other people using CouchDB with this kind of database? How do you
handle a write-heavy workload?

Sorry for my English, and thank you for reading.

Best,
