> Just to follow up on this. I've pushed a test case written in Scala to
> GitHub (https://github.com/jlcheng/couchdb-test). With this test case, I am
> using Apache HttpClient to perform testing. Since I am more familiar with
> Java, I was able to optimize the test case more. With Scala and CouchDB
> 0.10.0 (default install from Ubuntu 10.04 LTS), I am seeing 94 inserts per
> second, document size of 100 KB (using batch=ok). I think this is pretty
> close to being optimized.
>
> However, going to a custom-compiled CouchDB 1.1.0, I am seeing a mere 20
> inserts per second. I also see disk activity drop from ~7 MB of writes per
> second to ~2 MB/sec. There must be something wrong with the way I have it
> compiled or configured. Any ideas where I should start?

By luck, I noticed that Apache provides an async HttpClient library. By switching to the async library, where the code does not wait on shared HTTP connections, I was able to get more reasonable performance out of CouchDB 1.1.0. Now I am seeing 200 inserts/sec at a 100 KB doc size, and my CPU and disks are both heavily utilized.
This makes me wonder what changed between 0.10.0 and 1.1.0 that requires such a change in how clients should access CouchDB. Why did the traditional, blocking HttpClient code get only 20 inserts per second? Where was the bottleneck? It also suggests that efficient client code for CouchDB may not be easy to implement (I'm not familiar with the Python equivalent of the java.nio API, for example). It makes me hesitant to recommend using Python with CouchDB right now.
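
On the Python question: as far as I can tell, the closest standard-library counterpart to java.nio is the asyncio module. Below is a minimal sketch of non-blocking inserts with batch=ok, assuming Python 3.7 or later. It is not the Scala test case from the repository and it does not use Apache's async HttpClient; the function names (put_doc, insert_many), the concurrency parameter, and the one-connection-per-request approach are my own assumptions for illustration, and I have not benchmarked it against CouchDB 1.1.0.

```python
import asyncio
import json
import uuid


async def put_doc(host, port, db, doc, sem):
    """PUT one JSON document with ?batch=ok and return the HTTP status code.

    Hypothetical helper: it opens a fresh connection per request and asks
    the server to close it afterwards, which keeps the sketch simple.
    """
    body = json.dumps(doc).encode("utf-8")
    doc_id = uuid.uuid4().hex  # client-generated document ID
    request = (
        f"PUT /{db}/{doc_id}?batch=ok HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "Content-Type: application/json\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii") + body

    async with sem:  # cap the number of requests in flight
        reader, writer = await asyncio.open_connection(host, port)
        try:
            writer.write(request)
            await writer.drain()
            status_line = await reader.readline()  # "HTTP/1.1 <code> <reason>"
            await reader.read()  # discard the rest until the server closes
            return int(status_line.split()[1])
        finally:
            writer.close()
            await writer.wait_closed()


async def insert_many(host, port, db, docs, concurrency=16):
    """Insert all docs concurrently; returns status codes in input order."""
    sem = asyncio.Semaphore(concurrency)
    tasks = (put_doc(host, port, db, doc, sem) for doc in docs)
    return await asyncio.gather(*tasks)
```

Against a local CouchDB this would be called as something like asyncio.run(insert_many("localhost", 5984, "testdb", docs)), where the host, port, and database name are placeholders. With batch=ok, CouchDB acknowledges each write with 202 Accepted rather than 201 Created. The point of the semaphore is that no request waits for a free shared connection; the event loop simply keeps up to `concurrency` requests outstanding at once.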
