After a couple more days of benchmarking and trying all the suggestions, here is what I found out:
On a dual-core Pentium at 3.0GHz with Erlang 5.6 and CouchDB 0.8.0, using bulk writes, I get a throughput of 95 writes/second. I didn't get the 2000 per second that Michael did, but that is likely because his documents are considerably smaller than mine (each of my docs has a 4K-10K attachment). Upgrading to the latest CouchDB from svn improved this from 95/second to about 150/second.

On a quad-core Xeon at 3.0GHz with Erlang 5.6 and CouchDB 0.8.0, using bulk writes, I get a throughput of 120 writes/second. Upgrading to svn head pushed this up to about 200/second. (Woohoo, we are down from 2 weeks to 4 days to import my data.)

On both boxes I tried using a RAM disk to verify that writes were not bounded by I/O, and got exactly the same performance. I also tried parallelizing the writes among multiple databases in the same couch instance on both boxes, and again got exactly the same throughput. However, on the quad core, if I fired up two copies of couch running on two separate ports and parallelized across the two ports, throughput rose to just under 400/second and all 4 cores were utilized. (I've put a simplified sketch of that setup at the end of this mail.)

I understand why CouchDB serializes writes through a single updater thread for a single database file; clearly, letting two threads write to the same file can break consistency. However, it seems to me (and I make this comment knowing very little about Erlang) that each database should be able to get its own updater thread, or at least that there should be as many updater threads as there are CPUs on the box. Is there a reason this wasn't the design?

Also, are there any major gotchas I should be concerned about in terms of file formats between versions? If we start importing data using the 0.8 branch, how hard will an upgrade to 0.9 be? The reason I ask is that I am dealing with about 300 GB of data. If the upgrade will require running some conversion process over the old tables, I would like to start putting together an estimate of how much time that will take.

Thanks again for the response.

Josh

Chris Anderson wrote:
> On Wed, Jan 7, 2009 at 5:47 PM, Josh Bryan <[email protected]> wrote:
>
>> if I partition the data into two DBs and fire up two copies of couch, I
>> should be able to make use of another processor on the same machine? I'll
>> test this tomorrow along with the newer versions.
>
> Please do share your results. I am aware of some multi-core testing
> that's been done on Solaris and exotic Sun boxes, but knowing how this
> works for you (and making it work better) is important to us.
>
> Community: this is the perfect time where a standard benchmarking
> suite would be sweet. If anyone steps up to the plate on this, they
> win log(1000) internets.
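
P.S. In case it helps anyone reproduce the two-port numbers: here is a simplified sketch of the kind of harness I'm describing, not my actual import code. The hostnames, ports, batch size, and attachment size are placeholders, and it assumes two CouchDB instances already running with no authentication configured. It spawns one writer thread per instance, POSTs batches of docs (each with an inline base64 attachment) to _bulk_docs, and reports overall writes/second.

#!/usr/bin/env python3
# Sketch of a parallel bulk-write benchmark against two CouchDB instances
# on separate ports. Servers, batch size, and document shape are made-up
# placeholders for illustration.
import base64
import json
import os
import threading
import time
import urllib.error
import urllib.request

SERVERS = ["http://localhost:5984", "http://localhost:5985"]  # two couch instances
BATCH_SIZE = 100              # docs per _bulk_docs request
BATCHES = 50                  # batches per writer thread
ATTACHMENT_BYTES = 8 * 1024   # roughly matches my 4K-10K attachments

def request(method, url, body=None):
    # Minimal JSON-over-HTTP helper using only the standard library.
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(url, data=data, method=method,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def make_doc(i):
    # Each doc carries an inline base64 attachment, like my real documents.
    blob = base64.b64encode(os.urandom(ATTACHMENT_BYTES)).decode()
    return {"_id": "doc-%06d" % i,
            "type": "benchmark",
            "_attachments": {"blob.bin": {"content_type": "application/octet-stream",
                                          "data": blob}}}

def writer(server, db, counts, idx):
    try:
        request("PUT", "%s/%s" % (server, db))   # create the db
    except urllib.error.HTTPError:
        pass                                     # ignore "already exists"
    written = 0
    for b in range(BATCHES):
        docs = [make_doc(b * BATCH_SIZE + i) for i in range(BATCH_SIZE)]
        request("POST", "%s/%s/_bulk_docs" % (server, db), {"docs": docs})
        written += len(docs)
    counts[idx] = written

counts = [0] * len(SERVERS)
threads = [threading.Thread(target=writer, args=(s, "bench_%d" % i, counts, i))
           for i, s in enumerate(SERVERS)]
start = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start
print("%d docs in %.1fs -> %.0f writes/sec" % (sum(counts), elapsed, sum(counts) / elapsed))

Varying the number of entries in SERVERS should show how throughput scales with the number of couch processes; a single entry reproduces the single-instance case.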
