Hi Huw, thanks for this detailed report. I'll respond with a few suggestions inline:

On Dec 1, 2010, at 6:30 AM, Huw Selley wrote:

> Hi,
>
> I have been doing some performance testing with couch and am hoping someone
> here will be able to help me ascertain if/how I can get higher throughput.
>
> Scenario:
>
> I am trying to measure max couch throughput - for these tests I'm happy with
> just repeatedly requesting the same document.
> I have some reasonable boxes to perform these tests - they have dual quad
> core X5550 CPUs with HyperThreading enabled and 24GB RAM.

With two quad-core sockets and HyperThreading enabled that's 16 logical CPUs, so the Erlang VM starts 16 schedulers by default, right? Some people have reported improvements in Erlang application performance with HyperThreading disabled, but I've not heard of any CouchDB-specific tests of that option yet.

> These boxes have a stock install of oracle enterprise linux 5 on them (which
> is pretty much RHEL5).
> The oracle supplied erlang version is R12B5 and I am using couch 1.0.1 built
> from source.

Newer versions of Erlang have much better symmetric multiprocessing performance, so it's not too surprising that you saw a big boost when you upgraded.

> The database is pretty small (just under 100K docs) and I am querying a view
> that includes some other docs (the request contains include_docs=true) and
> using jmeter on another identical box to generate the traffic.

include_docs=true is definitely more work at read time than embedding the docs in the view index. I'm not sure about your application design constraints, but given that your database and index seem to fit entirely in RAM at the moment, you could experiment with emitting the doc in your map function instead ...

> The total amount of data returned from the request is 1467 bytes.

... especially when the documents are this small.

> For all of my tests I capture system state using sadc and there is nothing
> else happening on these boxes.
>
> In my initial round of testing I found that I was only getting ~126
> requests/s throughput which surprised me somewhat.
> Looking at the generated graphs from the test run there were plenty of
> resources to go round - the disk controller was nowhere near busy and
> neither was the cpu.
>
> Before coming here to question my findings I took a 3rd box (same spec) and
> built couch from the tip of the 1.1.x branch (rev 1040477). After compiling
> couch and installing it I found that it didn't start up (or log anything
> useful). After a bit of digging I figured it's probably due to the age of the
> erlang version being used - I upgraded to OTP R14B and rebuilt couch against
> it. This gave me a working install again.

Hmm, I've heard that we did something to break compatibility with R12B-5 recently. We should either fix it or bump the required version. Thanks for the note.

> I got an immediate throughput increase to ~500 requests/s which was nice but
> the data being collected via sadc still showed that the cpu was at most 20%
> utilised and the disk controller was doing next to nothing (I assume the OS
> cache already has the data requested so no trip to disk required?)
>
> At this point I started to wonder if jmeter is unable to send in enough
> requests to stress couch so I started up another jmeter instance on another
> box and had it also send in requests to couch. What I noticed was that the
> total throughput didn't increase - it was just split over both jmeter
> instances.

How many concurrent requests (threads) is each jmeter instance submitting?

> This made me start to think maybe there is something going on in the erlang
> vm that's stopping me getting higher throughput. Did some digging around and
> read this:
>
> http://erlang.2086793.n4.nabble.com/Some-facts-about-Erlang-and-SMP-td2108770.html
>
> Granted the information is a bit stale but that post made me start thinking
> that maybe I am seeing contention around the run-queue.
> I see that in R14B I can pass the erlang vm the '+S N:N' flag to control the
> number of run-queues and how many of them are active.
> I did a bit of tweaking and ended up getting 700 requests/s by using
> '+S 16:2'. I don't seem to be able to get any more than this though and the
> system is still not really stressed - CPU is just under 20% and very little
> disk i/o.
>
> Can anyone offer up any advice/suggestions on where to go next?

Do you know if the CPU load was spread across cores or concentrated on a single one? One thing Kenneth did not mention in that thread is that you can now bind Erlang schedulers to specific cores. By default the schedulers are unbound; maybe RHEL is doing a poor job of distributing them. You can bind them using the default strategy for your CPUs by starting the VM with the "+sbt db" option.

So, in short, I would experiment with disabling HyperThreading and binding schedulers on the OS side of things, and, if it makes sense for your application, try emitting the document body in the view index. Let us know how it goes.

Regards,

Adam

>
> Thanks in advance
> Huw
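The include_docs=true versus emit-the-doc suggestion above can be sketched as a design document holding two variants of the same view. This is a minimal sketch, not Huw's actual view: the design doc name `_design/bench`, the view names, the `doc.group` field and the helpers `design_doc` / `view_path` are all hypothetical. Variant A emits a null value and relies on include_docs=true, so CouchDB fetches each document body per row at read time; variant B emits the document itself, so the body is stored in the view index and no per-row lookup is needed. Note that a map function only sees the document it is mapping, so any "other docs" that are currently pulled in via include_docs would each need to emit themselves under the shared key.

```python
import json
from urllib.parse import urlencode


def design_doc():
    """Hypothetical design document with two variants of the same view."""
    return {
        "_id": "_design/bench",  # hypothetical design doc name
        "language": "javascript",
        "views": {
            # Variant A: emit a null value; the client queries with
            # include_docs=true, so each doc body is looked up at read time.
            "by_group_ref": {
                "map": "function(doc) { if (doc.group) { emit(doc.group, null); } }"
            },
            # Variant B: emit the whole doc as the value, so it is stored in
            # the view index itself (larger index, no per-row doc lookup).
            "by_group_embedded": {
                "map": "function(doc) { if (doc.group) { emit(doc.group, doc); } }"
            },
        },
    }


def view_path(db, view, key, include_docs):
    """Build a query path against the hypothetical _design/bench views."""
    params = {"key": json.dumps(key)}  # view keys must be JSON-encoded
    if include_docs:
        params["include_docs"] = "true"
    return f"/{db}/_design/bench/_view/{view}?{urlencode(params)}"


if __name__ == "__main__":
    print(json.dumps(design_doc(), indent=2))
    print(view_path("mydb", "by_group_ref", "g1", include_docs=True))
    print(view_path("mydb", "by_group_embedded", "g1", include_docs=False))
```

Benchmarking the two paths against each other with the same load generator would show how much of the per-request cost comes from the include_docs lookups; the trade-off is a larger index, which matters less while everything fits in RAM.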
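The question about how many concurrent requests each jmeter instance submits matters because a closed-loop client (each thread waits for a response before sending its next request) can only reach roughly concurrency / mean latency requests per second. If the client concurrency is low, adding a second client box may simply split a fixed rate rather than raise it. The sketch below is a hypothetical, minimal closed-loop load generator (`run_closed_loop` is an illustrative name, not part of jmeter or CouchDB) for checking how throughput changes as concurrency grows. Python threads and urllib add their own overhead, so the absolute numbers will be lower than a tuned client's; the shape of the curve is what's interesting.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def run_closed_loop(url, concurrency, duration_s):
    """Closed-loop load test: `concurrency` workers each keep exactly one
    request in flight, sending the next as soon as the previous response
    has been read. Returns (requests_completed, requests_per_second)."""
    deadline = time.monotonic() + duration_s

    def worker():
        done = 0
        while time.monotonic() < deadline:
            with urllib.request.urlopen(url) as resp:
                resp.read()  # drain the body so the full response is timed
            done += 1
        return done

    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(worker) for _ in range(concurrency)]
        total = sum(f.result() for f in futures)
    elapsed = time.monotonic() - start
    return total, total / elapsed
```

For example, running it against the same view URL with concurrency 1, 4, 16 and 64 for a fixed duration each: if requests/s keeps climbing, the original client was probably underdriving the server; if it plateaus while per-request latency rises, the limit is more likely on the server side (for example in the VM scheduler settings discussed above).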
