Hello,
I chose to do some benchmarking/profiling on view calls to try to
understand the slowdown we are experiencing. We are trying to better
understand the numbers CouchDB is putting out so we can tune and
optimize it. include_docs is a big subject of our curiosity, as it
causes a 5x slowdown for us. I am interested in any reading material
on scaling CouchDB, hardware tips, or any good discussion of
performance. I am also curious whether any effort has been put into
native profiling within Erlang, since my measurements were aimed at
system-level diagnostics rather than the code level; I do not know
Erlang very well, and I hear the learning curve is steep.
Our setup:
We are making calls from an isolated load-test environment, from a
web server to a CouchDB server, and I am doing system-level profiling.
The web-server machine has a 100 Mbps link to the CouchDB server,
accessible via a single network hop. Both machines have identical
hardware: an 8-core 2.13 GHz Xeon, 50 GB of memory, and a disk array
capable of 330 MB/s of disk reads (roughly 7 GB/s from cache). The
machines were completely idle between requests. I chose sar, so please
do not expect any pretty pictures =) I can provide the raw output
files on request if anyone wants to graph them. The sar command used
was something like: sar -n ALL -P ALL -urRwWbBd -p.
Our couchdb:
The database holds 170 thousand documents in total, amounting to
301.2 MB of disk usage, currently with a single view, getByModel, a
very simple example-level view.
Here is what the view we benchmarked looks like, getByModel:
function (doc) {
  if (doc.model) {
    emit(doc.model, null);
  } else {
    // emit takes (key, value); the original emit(null) was missing the value
    emit(null, null);
  }
}
Here is an example of a document; this is test data with no real meaning:
{
  "_id": "<ID>",
  "_rev": "<REV>",
  "data": {
    "C365": "GORGEOUS BINGO",
    "C6263": "A Action-Packed Display of a Sumo Wrestler And a Car who must Overcome a Waitress in A Baloon Factory",
    "C9636": "2006",
    "C12589": "4",
    "C15835": "2.99",
    "C1842": "108",
    "C21963": "26.99",
    "C24935": "R",
    "C27126": "Deleted Scenes,Behind the Scenes",
    "C30950": "Children"
  },
  "type": "model",
  "model": "<UNIQID>"
}
Calling our view returns 13.7 MB of data and takes roughly 7.7
seconds on average, giving us a throughput of about 1.78 MB/s. This is
a reasonable number, even if we would like slightly more. Below is the
system data gathered over the lifespan of the call, with a half-second
buffer at the beginning and end of the request. A couple of things to
notice: the very large amount of CPU usage, and a reasonable amount of
network throughput (in comparison to our next test).
04:59:50 PM   cswch/s
Average:     12130.19

04:59:50 PM  CPU  %user  %nice  %system  %iowait  %steal  %idle
Average:     all  11.55   0.00     6.67     1.87    0.00  79.91
Average:       0  12.87   0.00     2.33     0.00    0.00  84.79
Average:       1  14.44   0.00    32.33    15.00    0.00  38.22
Average:       2   7.43   0.00     2.55     0.00    0.00  90.02
Average:       3  11.10   0.00     5.77     0.00    0.00  83.13
Average:       4  13.01   0.00     1.89     0.00    0.00  85.09
Average:       5  26.50   0.00     7.10     0.00    0.00  66.41
Average:       6   6.89   0.00     1.22     0.00    0.00  91.89
Average:       7   0.22   0.00     0.44     0.00    0.00  99.33

04:59:50 PM       tps  rtps      wtps  bread/s    bwrtn/s
Average:     14599.33  0.00  14599.33     0.00  223554.27

04:59:50 PM  frmpg/s  bufpg/s  campg/s
Average:      -10.77     0.44    -9.99

Average:  IFACE  rxpck/s   txpck/s    rxbyt/s     txbyt/s  rxcmp/s  txcmp/s  rxmcst/s
Average:   eth0  7241.51  19118.98  506945.06  2871571.70     0.00     0.00      0.00

Average:  IFACE  rxerr/s  txerr/s  coll/s  rxdrop/s  txdrop/s  txcarr/s  rxfram/s  rxfifo/s  txfifo/s
Average:   eth0     0.00     0.00    0.00      0.00      0.00      0.00      0.00      0.00      0.00

04:59:57 PM   DEV      tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz  await  svctm  %util
Average:     sda2  2642.29      0.00  74518.09     28.20      0.27   0.10   0.10  26.80

04:59:50 PM  call/s  retrans/s  read/s  write/s  access/s  getatt/s
Average:       0.00       0.00    0.00     0.00      0.00      0.00

04:59:50 PM  scall/s  badcall/s  packet/s  udp/s  tcp/s  hit/s  miss/s  sread/s  swrite/s  saccess/s  sgetatt/s
Average:        0.00       0.00      0.00   0.00   0.00   0.00    0.00     0.00      0.00       0.00       0.00

04:59:50 PM  pgpgin/s  pgpgout/s  fault/s  majflt/s
Average:         0.00   37266.59  6216.32      0.00

04:59:50 PM  kbmemfree  kbmemused  %memused  kbbuffers  kbcached  kbswpfree  kbswpused  %swpused  kbswpcad
Average:        724271   48709493     98.53     324688  46261891    8388432        168      0.00         0
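As a sanity check on the numbers above, here is a minimal Python sketch of how a view call can be timed and its throughput computed. The host, database, and design-document names are placeholders, not our real setup:

```python
# Minimal sketch: fetch a view over HTTP and compute throughput.
# The URL below is a placeholder, not our real host or database.
import time
import urllib.request

VIEW_URL = "http://couch-host:5984/benchdb/_design/app/_view/getByModel"

def mb_per_s(nbytes, seconds):
    """Throughput in MB/s (decimal megabytes)."""
    return nbytes / seconds / 1e6

def timed_get(url):
    """Fetch url and return (bytes_received, elapsed_seconds)."""
    start = time.time()
    with urllib.request.urlopen(url) as resp:
        body = resp.read()
    return len(body), time.time() - start

# Plugging in the figures measured above: 13.7 MB in ~7.7 s.
print(round(mb_per_s(13_700_000, 7.7), 2))  # ~1.78
```

Running timed_get against a real CouchDB instance reproduces the measurement; the last line just checks the arithmetic behind the 1.78 MB/s figure.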
Below is the same view, but with include_docs=true. We are still
extremely hungry for CPU, but notice the extreme drop in network
throughput: our download speed fell to roughly 400 KB/s.
05:03:47 PM  cswch/s
Average:     2185.17

05:03:47 PM  CPU  %user  %nice  %system  %iowait  %steal  %idle
Average:     all  11.76   0.00     1.62     0.00    0.00  86.61
Average:       0   9.89   0.00     0.50     0.00    0.00  89.60
Average:       1   0.15   0.00     0.45     0.00    0.00  99.39
Average:       2   9.32   0.00     0.45     0.00    0.00  90.23
Average:       3   7.65   0.00     0.94     0.00    0.00  91.41
Average:       4  44.35   0.00     2.36     0.00    0.00  53.30
Average:       5   5.78   0.00     7.33     0.00    0.00  86.90
Average:       6  16.90   0.00     0.79     0.00    0.00  82.31
Average:       7   0.06   0.00     0.16     0.00    0.00  99.77

05:03:47 PM  pswpin/s  pswpout/s
Average:         0.00       0.00

05:03:47 PM    tps  rtps   wtps  bread/s  bwrtn/s
Average:     25.42  0.00  25.42     0.00   363.22

05:03:47 PM  frmpg/s  bufpg/s  campg/s
Average:        0.30     0.38     1.35

05:03:47 PM  IFACE  rxpck/s  txpck/s   rxbyt/s    txbyt/s  rxcmp/s  txcmp/s  rxmcst/s
Average:      eth0  1401.63  1403.10  98196.90  722379.00     0.00     0.00      0.00

05:03:47 PM  IFACE  rxerr/s  txerr/s  coll/s  rxdrop/s  txdrop/s  txcarr/s  rxfram/s  rxfifo/s  txfifo/s
Average:      eth0     0.00     0.00    0.00      0.00      0.00      0.00      0.00      0.00      0.00

05:03:47 PM   DEV   tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz  await  svctm  %util
Average:     sda2  5.14      0.00    121.07     23.55      0.00   0.26   0.05   0.02

05:03:47 PM  call/s  retrans/s  read/s  write/s  access/s  getatt/s
Average:       0.00       0.00    0.00     0.00      0.00      0.00

05:03:47 PM  scall/s  badcall/s  packet/s  udp/s  tcp/s  hit/s  miss/s  sread/s  swrite/s  saccess/s  sgetatt/s
Average:        0.00       0.00      0.00   0.00   0.00   0.00    0.00     0.00      0.00       0.00       0.00

05:03:47 PM  pgpgin/s  pgpgout/s  fault/s  majflt/s
Average:         0.00      60.54    68.39      0.00

05:03:47 PM  kbmemfree  kbmemused  %memused  kbbuffers  kbcached  kbswpfree  kbswpused  %swpused  kbswpcad
Average:        730009   48703755     98.52     324807  46262164    8388432        168      0.00         0
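Comparing the two runs, the include_docs slowdown can be read straight off the measured download rates; a rough back-of-the-envelope check using the figures quoted above:

```python
# Back-of-the-envelope check on the include_docs slowdown, using the
# download rates measured above.
plain_rate_kb_s = 1780        # ~1.78 MB/s without include_docs
include_docs_rate_kb_s = 400  # ~400 KB/s with include_docs=true

slowdown = plain_rate_kb_s / include_docs_rate_kb_s
print(round(slowdown, 2))  # ~4.45, in the neighborhood of the 5x mentioned earlier
```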
This next test runs four of the above include_docs requests in
parallel, and it worries us the most. We skyrocket to 190% CPU
utilization across all cores, but notice that the total throughput
appears to be shared among all the requests, with only about 100 KB/s
more total bandwidth available than for a single request. That is,
where a single request got roughly 400 KB/s, we are now down to
roughly 125 KB/s per connection, and this saturation increases and
persists at 8, 16, etc. concurrent requests.
05:10:56 PM   cswch/s
Average:     55929.10

05:10:56 PM  CPU  %user  %nice  %system  %iowait  %steal  %idle
Average:     all  21.32   0.00     4.49     0.00    0.00  74.18
Average:       0  25.81   0.00     2.65     0.00    0.00  71.54
Average:       1   9.74   0.00     4.31     0.02    0.00  85.93
Average:       2  22.06   0.00     2.18     0.00    0.00  75.76
Average:       3  22.78   0.00     7.32     0.00    0.00  69.90
Average:       4  22.13   0.00     2.08     0.00    0.00  75.79
Average:       5  22.93   0.00     7.88     0.00    0.00  69.19
Average:       6  22.59   0.00     2.20     0.00    0.00  75.21
Average:       7  22.56   0.00     7.33     0.00    0.00  70.11

05:10:56 PM  pswpin/s  pswpout/s
Average:         0.00       0.00

05:10:56 PM    tps  rtps  wtps  bread/s  bwrtn/s
Average:     -1.98  0.37  1.37

05:10:56 PM  IFACE  rxpck/s  txpck/s    rxbyt/s    txbyt/s  rxcmp/s  txcmp/s  rxmcst/s
Average:      eth0  1642.34  1648.99  115044.53  847697.31     0.00     0.00      0.00

05:10:56 PM  IFACE  rxerr/s  txerr/s  coll/s  rxdrop/s  txdrop/s  txcarr/s  rxfram/s  rxfifo/s  txfifo/s
Average:      eth0     0.00     0.00    0.00      0.00      0.00      0.00      0.00      0.00      0.00

05:10:56 PM   DEV   tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz  await  svctm  %util
Average:     sda2  5.14      0.00    120.68     23.47      0.00   0.31   0.15   0.08

05:10:56 PM  call/s  retrans/s  read/s  write/s  access/s  getatt/s
Average:       0.00       0.00    0.00     0.00      0.00      0.00

05:10:56 PM  scall/s  badcall/s  packet/s  udp/s  tcp/s  hit/s  miss/s  sread/s  swrite/s  saccess/s  sgetatt/s
Average:        0.00       0.00      0.00   0.00   0.00   0.00    0.00     0.00      0.00       0.00       0.00

05:10:56 PM  pgpgin/s  pgpgout/s  fault/s  majflt/s
Average:         0.00      60.34    77.00      0.00

05:17:38 PM  kbmemfree  kbmemused  %memused  kbbuffers  kbcached  kbswpfree  kbswpused  %swpused  kbswpcad
Average:        726990   48706774     98.53     325232   46263715    8388432        168      0.00         0
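The per-connection figure quoted for the parallel test is just the shared pipe divided among the clients, spelled out here with the numbers observed above:

```python
# Shared-throughput arithmetic for the 4-way parallel include_docs test.
single_request_kb_s = 400  # download rate of one include_docs request
extra_total_kb_s = 100     # only ~100 KB/s more total bandwidth was observed
connections = 4

total_kb_s = single_request_kb_s + extra_total_kb_s
per_connection_kb_s = total_kb_s / connections
print(per_connection_kb_s)  # 125.0
```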
I am happy to provide other statistics or benchmarks on request.
Kind Regards,
-Chris