Josh, I found that bulk loading is significantly faster if you can format your documents into a file. Sometimes that is not so handy to do. Of course, substitute your own IP address for 127.0.0.1 and your database name for 'test' in the command below.
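Since formatting documents into a file by hand is not always handy, here is a minimal Python sketch that writes a file in the _bulk_docs shape, {"docs": [ ... ]}, and can optionally POST it. This is an illustration under assumptions, not part of the original post. The helper names are hypothetical, and the file name file_of_docs and the URL http://127.0.0.1:5984/test/_bulk_docs are placeholders. The sketch also sets Content-Type: application/json, which I believe later CouchDB releases expect for _bulk_docs.

```python
import json
import urllib.request

def make_bulk_payload(docs):
    """Wrap an iterable of dicts in the {"docs": [...]} envelope used by _bulk_docs."""
    return {"docs": list(docs)}

def write_bulk_file(docs, path):
    """Write the payload as plain JSON; json.dump handles any escaping inside strings."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(make_bulk_payload(docs), f)

def post_bulk_file(path, url="http://127.0.0.1:5984/test/_bulk_docs"):
    """POST a previously written file (hypothetical helper; adjust host and db name)."""
    with open(path, "rb") as f:
        body = f.read()
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},  # assumption: required by newer CouchDB
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    # 10,000 short docs, similar to the ones in the example file below.
    docs = [{"name": f"test_{i}", "date": "Sun Jan 4, 2008", "place": "Portland"}
            for i in range(1, 10001)]
    write_bulk_file(docs, "file_of_docs")
    # post_bulk_file("file_of_docs")  # uncomment to send it to a running server
```

Letting json.dump produce the file avoids hand-quoting mistakes. The resulting file can be sent either with post_bulk_file or with the curl command.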
$ curl -X POST --data @file_of_docs http://127.0.0.1:5984/test/_bulk_docs

where file_of_docs looks, literally, like

{ "docs" : [ {"name":"test_one" , "date":"Sun Jan 4, 2008" , "place":"Portland" } ,
more documents ...
{"name":"test_n" , "date":"Tue Jan 6, 2008" , "place":"Portland" } ] }

I say "literally" in the sense that the quotes you see are the quotes you need: no escaping, and no extra quotes before the leading { or after the trailing }. Nothing in the above needed escaping. Some characters inside string values (such as a double quote or a backslash) would need escaping, but whitespace does not.

Using Erlang R12B-5 and CouchDB (Apache CouchDB 0.9.0a730600-incubating), I loaded 10,000 of the above short docs in about 5 seconds. Both Erlang and CouchDB were compiled from scratch on the following machine:

Linux version 2.6.24-21-server (bui...@palmer) (gcc version 4.2.3 (Ubuntu 4.2.3-2ubuntu7)) #1 SMP Wed Oct 22 00:18:13 UTC 2008 (Ubuntu 2.6.24-21.43-server)

dmesg says the server has:

Memory: 1538064k/1563840k available
CPU0: Intel(R) Pentium(R) 4 CPU 3.00GHz stepping 01
CPU1: Intel(R) Pentium(R) 4 CPU 3.00GHz stepping 01
Total of 2 processors activated (11973.88 BogoMIPS)

~Michael

On Wed, Jan 07, 2009 at 06:37:32PM -0600, Josh Bryan wrote:
> Hi,
>
> I am looking into CouchDB as a solution to store a bunch (approx 70
> million) archived documents. While planning for the import process, I
> did some benchmarking to figure out how long the import will take. I
> get about 50-70 inserts per second on average. However, when I looked
> for the bottleneck, I couldn't figure it out. I am connected to the
> database via a fast lan and can verify that the network is not
> saturated. I can also verify that disk IO is not saturated. The only
> clue is that of the 4 cpus on the server, it seems that only one is
> getting fully loaded. Also, of the 5 erlang processes I can see
> running, only one of them seems to be getting most of the cpu time. I
> know that erlang is built with smp enabled, so if it is cpu bound, why
> can't it make use of the other 3 processors?
>
> I thought that perhaps there was some internal write lock issue per
> database that allowed only one thread to write to a db at a time, so I
> tried running the benchmarks while hitting multiple databases, but still
> got the same write rate across the databases. Is there some globally
> shared resource in couchdb that limits all writes to a single thread?
>
> Thanks,
> Josh

-- 
Michael McDaniel
Portland, Oregon, USA
http://trip.autosys.us
http://autosys.us
