As some know from my pestering on IRC, I've been trying to figure out
how fast CouchDB is for inserts.
I've tried using programs in JS, Ruby and Java, and a script using
curl, and each time I can't do better than 6-8 inserts / sec. I get
about 6/sec on my laptop, and 8/sec on my 8GB, 8-core desktop Mac (so
I don't think the problem is a lack of hardware...)
At first I thought I was an idiot, and was doing something wrong. But
now I'm not so sure.
On IRC, it was mentioned that Erlang 5.6.5 fixed some fsync()
problem, so now it does a full sync to be sure data is safe. OK (more
on this in a moment).
I tried to duplicate this rate using a simple C program, and compare
fsync(), which only flushes to devices, with fcntl(F_FULLFSYNC), which
is supposed to get the storage devices to flush their own buffers and
do a physical write.
In a loop, I'd do a fprintf() with a 100 or so byte string, and then
either a fsync() or a fcntl() right after.
I'd loop 1000 times.
For fsync(), the loop time was trivial - on the order of 50k usec.
For fcntl(), the loop time was painful - between 3 and 4 seconds.
So the "deep fsync()" is slow - but I was still getting > 250 / sec.
I realize that my trivial program isn't doing everything that Couch
might be doing, but still - it's more than 30 times faster (over
250/sec vs. 6-8/sec).
So the questions are:
1) What exactly is CouchDB doing to get only 6 inserts per second?
2) Is anyone using CouchDB in a manner that really requires this level
of data security? I appreciate having the *option* to turn on a mode
like this, but I don't think I need it all the time. I can use RAID
systems that give me battery-backed cache, or I can make the design
decision that I am happy to lose X seconds of data in a tradeoff (e.g.
do the deep fcntl() every X seconds...)
Has this been discussed anywhere? Always happy to catch up on a mail
archive...
geir