As some know from my pestering on IRC, I've been trying to figure out how fast CouchDB is for inserts.

I've tried programs in JS, Ruby and Java, and a script using curl, and each time I can't do better than 6-8 inserts/sec. I get about 6/sec on my laptop and 8/sec on my 8GB, 8-core desktop Mac (so I don't think the problem is that I haven't thrown enough hardware at it...)

At first I thought I was an idiot, and was doing something wrong. But now I'm not so sure.

On IRC, it was mentioned that Erlang 5.6.5 fixed some fsync() problem, so now it does a full sync to be sure data is safe. Ok (more on this in a moment).

I tried to duplicate this rate using a simple C program, and compared fsync(), which only flushes to devices, with fcntl(F_FULLFSYNC), which is supposed to get the storage devices to flush their own buffers and do a physical write.

In a loop, I'd do an fprintf() with a 100 or so byte string, and then either an fsync() or an fcntl() right after.

I'd loop 1000 times.

For fsync(), the loop time was trivial - on the order of 50k usec.

For fcntl(), the loop time was painful - between 3 and 4 seconds.

So the "deep fsync()" is slow - but I was still getting > 250 / sec.
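
A minimal sketch of a timing loop like the one described above. It is not the exact program I ran, and several things in it are assumptions: it uses write() rather than fprintf() so that no stdio buffering sits between the write and the sync, the names sync_fd() and time_sync_loop() are made up for illustration, and on platforms that don't define F_FULLFSYNC it quietly falls back to fsync().

```c
#include <fcntl.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

/* Sync one descriptor. If `full` is non-zero and the platform defines
 * F_FULLFSYNC (Mac OS X does), ask the drive to flush its own cache too.
 * Otherwise fall back to plain fsync(). Returns 0 on success, -1 on error. */
static int sync_fd(int fd, int full)
{
#ifdef F_FULLFSYNC
    if (full)
        return fcntl(fd, F_FULLFSYNC) == -1 ? -1 : 0;
#else
    (void)full; /* no deep sync available here (assumption: fsync is the best we have) */
#endif
    return fsync(fd) == -1 ? -1 : 0;
}

/* Write `count` 100-byte records to `path`, syncing after every record.
 * Returns the elapsed time in microseconds, or -1 on any error. */
static long time_sync_loop(const char *path, int count, int full)
{
    char record[100];
    struct timeval start, end;
    int i;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return -1;

    memset(record, 'x', sizeof record - 1);
    record[sizeof record - 1] = '\n';

    gettimeofday(&start, NULL);
    for (i = 0; i < count; i++) {
        if (write(fd, record, sizeof record) != (ssize_t)sizeof record ||
            sync_fd(fd, full) != 0) {
            close(fd);
            return -1;
        }
    }
    gettimeofday(&end, NULL);
    close(fd);

    return (end.tv_sec - start.tv_sec) * 1000000L +
           (end.tv_usec - start.tv_usec);
}
```

Calling time_sync_loop() with a scratch file path, a count of 1000, and full set to 0 and then 1 gives the two numbers to compare; dividing the count by the elapsed seconds gives writes per second.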

I realize that my trivial program isn't doing everything that Couch might be doing, but still - at > 250/sec against 6-8/sec, it's more than an order of magnitude faster.

So the questions are :

1) What exactly is CouchDB doing to get only 6 inserts per second?

2) Is anyone using CouchDB in a manner that really requires this level of data security? I appreciate having the *option* to turn on a mode like this, but I don't think I need it all the time. I can use RAID systems that give me battery-backed cache, or I can make the design decision that I am happy to lose X seconds of data in a tradeoff (e.g. do the deep fcntl() every X seconds.....)
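
To make the "every X seconds" idea concrete, here is a rough sketch of what I mean. It says nothing about how CouchDB is actually structured; struct sync_policy and sync_if_due() are hypothetical names, the caller passes in the current time, and where F_FULLFSYNC isn't defined it falls back to fsync().

```c
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical policy: do a deep sync at most once per `interval` seconds. */
struct sync_policy {
    time_t interval;  /* how many seconds of data we accept losing */
    time_t last_full; /* when the last deep sync completed */
};

/* Call after each write. Returns 1 if a deep sync was done, 0 if it was
 * skipped because the interval has not elapsed yet, -1 on error. */
static int sync_if_due(int fd, struct sync_policy *p, time_t now)
{
    if (now - p->last_full < p->interval)
        return 0; /* still inside the window we agreed to risk */

#ifdef F_FULLFSYNC
    if (fcntl(fd, F_FULLFSYNC) == -1)
        return -1;
#else
    if (fsync(fd) == -1) /* assumption: no deep sync on this platform */
        return -1;
#endif

    p->last_full = now;
    return 1;
}
```

With interval set to X, a writer would call sync_if_due(fd, &policy, time(NULL)) after each insert, so the expensive flush happens once per window instead of once per document.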

Has this been discussed anywhere? Always happy to catch up on a mail archive...

geir
