Hello,
I chose to do some benchmarking/profiling on view calls to try to
understand the slowdown we are experiencing. We are trying to better
understand the numbers CouchDB is putting out so we can tune and
optimize it. include_docs is a big subject of our curiosity, as it
causes a 5x slowdown for us. I am interested in any reading material
on scaling CouchDB, hardware tips, or any good discussion of
performance. I am also curious whether any effort has been put into
native profiling within Erlang, since my measurements were aimed at
system-level diagnostics rather than the code level; I do not know
Erlang very well, and I hear the learning curve is steep.
Our setup:
We are making calls from an isolated load-test environment, from a
web server to a CouchDB server, and I am doing system-level profiling.
The web-server machine has a 100 Mbps link to the CouchDB server,
accessible via a single network hop. Both machines have identical
hardware: an 8-core 2.13 GHz Xeon, 50 GB of memory, and a disk array
capable of 330 MB/s of disk reads (roughly 7 GB/s from cache). The
machines were completely idle between requests. I chose sar, so please
do not expect any pretty pictures =) I can provide the raw output
files on request if anyone wants to graph them. The sar command used
was something like: sar -n ALL -P ALL -urRwWbBd -p.
Our couchdb:
The database holds 170 thousand documents in total, amounting to
301.2 MB of disk usage, currently with a single view, getByModel, a
very simple example-level view.
Here is what the view we benchmarked looks like, getByModel:
function (doc) {
  if (doc.model) {
    emit(doc.model, null);
  } else {
    // emit takes (key, value); the original emit(null) was missing the value
    emit(null, null);
  }
}
Here is an example of a document; this is test data with no real meaning:
{
  "_id": "<ID>",
  "_rev": "<REV>",
  "data": {
    "C365": "GORGEOUS BINGO",
    "C6263": "A Action-Packed Display of a Sumo Wrestler And a Car who must Overcome a Waitress in A Baloon Factory",
    "C9636": "2006",
    "C12589": "4",
    "C15835": "2.99",
    "C1842": "108",
    "C21963": "26.99",
    "C24935": "R",
    "C27126": "Deleted Scenes,Behind the Scenes",
    "C30950": "Children"
  },
  "type": "model",
  "model": "<UNIQID>"
}
Calling our view returns 13.7 MB of data and takes roughly 7.7
seconds on average, giving us a throughput of about 1.78 MB/s. This is
a reasonable number, even if we would like slightly more. Below is the
system data gathered over the lifespan of the call, with a half-second
buffer at the beginning and end of the request. A couple of things to
notice: the very large amount of CPU usage, and a reasonable amount of
network throughput (in comparison to our next test).
04:59:50 PM   cswch/s
Average:     12130.19

04:59:50 PM  CPU  %user  %nice  %system  %iowait  %steal  %idle
Average:     all  11.55   0.00     6.67     1.87    0.00  79.91
Average:       0  12.87   0.00     2.33     0.00    0.00  84.79
Average:       1  14.44   0.00    32.33    15.00    0.00  38.22
Average:       2   7.43   0.00     2.55     0.00    0.00  90.02
Average:       3  11.10   0.00     5.77     0.00    0.00  83.13
Average:       4  13.01   0.00     1.89     0.00    0.00  85.09
Average:       5  26.50   0.00     7.10     0.00    0.00  66.41
Average:       6   6.89   0.00     1.22     0.00    0.00  91.89
Average:       7   0.22   0.00     0.44     0.00    0.00  99.33

04:59:50 PM       tps  rtps      wtps  bread/s    bwrtn/s
Average:     14599.33  0.00  14599.33     0.00  223554.27

04:59:50 PM  frmpg/s  bufpg/s  campg/s
Average:      -10.77     0.44    -9.99

Average:  IFACE  rxpck/s   txpck/s    rxbyt/s     txbyt/s  rxcmp/s  txcmp/s  rxmcst/s
Average:   eth0  7241.51  19118.98  506945.06  2871571.70     0.00     0.00      0.00

Average:  IFACE  rxerr/s  txerr/s  coll/s  rxdrop/s  txdrop/s  txcarr/s  rxfram/s  rxfifo/s  txfifo/s
Average:   eth0     0.00     0.00    0.00      0.00      0.00      0.00      0.00      0.00      0.00

04:59:57 PM   DEV      tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz  await  svctm  %util
Average:     sda2  2642.29      0.00  74518.09     28.20      0.27   0.10   0.10  26.80

04:59:50 PM  call/s  retrans/s  read/s  write/s  access/s  getatt/s
Average:       0.00       0.00    0.00     0.00      0.00      0.00

04:59:50 PM  scall/s  badcall/s  packet/s  udp/s  tcp/s  hit/s  miss/s  sread/s  swrite/s  saccess/s  sgetatt/s
Average:        0.00       0.00      0.00   0.00   0.00   0.00    0.00     0.00      0.00       0.00       0.00

04:59:50 PM  pgpgin/s  pgpgout/s  fault/s  majflt/s
Average:         0.00   37266.59  6216.32      0.00

04:59:50 PM  kbmemfree  kbmemused  %memused  kbbuffers  kbcached  kbswpfree  kbswpused  %swpused  kbswpcad
Average:        724271   48709493     98.53     324688  46261891    8388432        168      0.00         0
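As a sanity check on the numbers above, here is a minimal Python sketch of how a view call can be timed and its throughput computed. The host, database, and design-document names are placeholders, not our real setup:

```python
# Minimal sketch: fetch a view over HTTP and compute throughput.
# The URL below is a placeholder, not our real host or database.
import time
import urllib.request

VIEW_URL = "http://couch-host:5984/benchdb/_design/app/_view/getByModel"

def mb_per_s(nbytes, seconds):
    """Throughput in MB/s (decimal megabytes)."""
    return nbytes / seconds / 1e6

def timed_get(url):
    """Fetch url and return (bytes_received, elapsed_seconds)."""
    start = time.time()
    with urllib.request.urlopen(url) as resp:
        body = resp.read()
    return len(body), time.time() - start

# Plugging in the figures measured above: 13.7 MB in ~7.7 s.
print(round(mb_per_s(13_700_000, 7.7), 2))  # ~1.78
```

Running timed_get against a real CouchDB instance reproduces the measurement; the last line just checks the arithmetic behind the 1.78 MB/s figure.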
Below is the same view, but with include_docs=true. We are still
extremely hungry for CPU, but notice the extreme drop in network
throughput: our download speed fell to roughly 400 KB/s.
05:03:47 PM  cswch/s
Average:     2185.17

05:03:47 PM  CPU  %user  %nice  %system  %iowait  %steal  %idle
Average:     all  11.76   0.00     1.62     0.00    0.00  86.61
Average:       0   9.89   0.00     0.50     0.00    0.00  89.60
Average:       1   0.15   0.00     0.45     0.00    0.00  99.39
Average:       2   9.32   0.00     0.45     0.00    0.00  90.23
Average:       3   7.65   0.00     0.94     0.00    0.00  91.41
Average:       4  44.35   0.00     2.36     0.00    0.00  53.30
Average:       5   5.78   0.00     7.33     0.00    0.00  86.90
Average:       6  16.90   0.00     0.79     0.00    0.00  82.31
Average:       7   0.06   0.00     0.16     0.00    0.00  99.77

05:03:47 PM  pswpin/s  pswpout/s
Average:         0.00       0.00

05:03:47 PM    tps  rtps   wtps  bread/s  bwrtn/s
Average:     25.42  0.00  25.42     0.00   363.22

05:03:47 PM  frmpg/s  bufpg/s  campg/s
Average:        0.30     0.38     1.35

05:03:47 PM  IFACE  rxpck/s  txpck/s   rxbyt/s    txbyt/s  rxcmp/s  txcmp/s  rxmcst/s
Average:      eth0  1401.63  1403.10  98196.90  722379.00     0.00     0.00      0.00

05:03:47 PM  IFACE  rxerr/s  txerr/s  coll/s  rxdrop/s  txdrop/s  txcarr/s  rxfram/s  rxfifo/s  txfifo/s
Average:      eth0     0.00     0.00    0.00      0.00      0.00      0.00      0.00      0.00      0.00

05:03:47 PM   DEV   tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz  await  svctm  %util
Average:     sda2  5.14      0.00    121.07     23.55      0.00   0.26   0.05   0.02

05:03:47 PM  call/s  retrans/s  read/s  write/s  access/s  getatt/s
Average:       0.00       0.00    0.00     0.00      0.00      0.00

05:03:47 PM  scall/s  badcall/s  packet/s  udp/s  tcp/s  hit/s  miss/s  sread/s  swrite/s  saccess/s  sgetatt/s
Average:        0.00       0.00      0.00   0.00   0.00   0.00    0.00     0.00      0.00       0.00       0.00

05:03:47 PM  pgpgin/s  pgpgout/s  fault/s  majflt/s
Average:         0.00      60.54    68.39      0.00

05:03:47 PM  kbmemfree  kbmemused  %memused  kbbuffers  kbcached  kbswpfree  kbswpused  %swpused  kbswpcad
Average:        730009   48703755     98.52     324807  46262164    8388432        168      0.00         0
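Comparing the two runs, the include_docs slowdown can be read straight off the measured download rates; a rough back-of-the-envelope check using the figures quoted above:

```python
# Back-of-the-envelope check on the include_docs slowdown, using the
# download rates measured above.
plain_rate_kb_s = 1780        # ~1.78 MB/s without include_docs
include_docs_rate_kb_s = 400  # ~400 KB/s with include_docs=true

slowdown = plain_rate_kb_s / include_docs_rate_kb_s
print(round(slowdown, 2))  # ~4.45, in the neighborhood of the 5x mentioned earlier
```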
This next test runs four of the above include_docs requests in
parallel, and it worries us the most. We skyrocket to 190% CPU
utilization across all cores, but notice that the total throughput
appears to be shared among all the requests, with only about 100 KB/s
more total bandwidth available than for a single request. That is,
where a single request got roughly 400 KB/s, we are now down to
roughly 125 KB/s per connection, and this saturation increases and
persists at 8, 16, etc. concurrent requests.
05:10:56 PM   cswch/s
Average:     55929.10

05:10:56 PM  CPU  %user  %nice  %system  %iowait  %steal  %idle
Average:     all  21.32   0.00     4.49     0.00    0.00  74.18
Average:       0  25.81   0.00     2.65     0.00    0.00  71.54
Average:       1   9.74   0.00     4.31     0.02    0.00  85.93
Average:       2  22.06   0.00     2.18     0.00    0.00  75.76
Average:       3  22.78   0.00     7.32     0.00    0.00  69.90
Average:       4  22.13   0.00     2.08     0.00    0.00  75.79
Average:       5  22.93   0.00     7.88     0.00    0.00  69.19
Average:       6  22.59   0.00     2.20     0.00    0.00  75.21
Average:       7  22.56   0.00     7.33     0.00    0.00  70.11

05:10:56 PM  pswpin/s  pswpout/s
Average:         0.00       0.00

05:10:56 PM    tps  rtps  wtps  bread/s  bwrtn/s
Average:     -1.98  0.37  1.37

05:10:56 PM  IFACE  rxpck/s  txpck/s    rxbyt/s    txbyt/s  rxcmp/s  txcmp/s  rxmcst/s
Average:      eth0  1642.34  1648.99  115044.53  847697.31     0.00     0.00      0.00

05:10:56 PM  IFACE  rxerr/s  txerr/s  coll/s  rxdrop/s  txdrop/s  txcarr/s  rxfram/s  rxfifo/s  txfifo/s
Average:      eth0     0.00     0.00    0.00      0.00      0.00      0.00      0.00      0.00      0.00

05:10:56 PM   DEV   tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz  await  svctm  %util
Average:     sda2  5.14      0.00    120.68     23.47      0.00   0.31   0.15   0.08

05:10:56 PM  call/s  retrans/s  read/s  write/s  access/s  getatt/s
Average:       0.00       0.00    0.00     0.00      0.00      0.00

05:10:56 PM  scall/s  badcall/s  packet/s  udp/s  tcp/s  hit/s  miss/s  sread/s  swrite/s  saccess/s  sgetatt/s
Average:        0.00       0.00      0.00   0.00   0.00   0.00    0.00     0.00      0.00       0.00       0.00

05:10:56 PM  pgpgin/s  pgpgout/s  fault/s  majflt/s
Average:         0.00      60.34    77.00      0.00

05:17:38 PM  kbmemfree  kbmemused  %memused  kbbuffers  kbcached  kbswpfree  kbswpused  %swpused  kbswpcad
Average:        726990   48706774     98.53     325232   46263715    8388432        168      0.00         0
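The per-connection figure quoted for the parallel test is just the shared pipe divided among the clients, spelled out here with the numbers observed above:

```python
# Shared-throughput arithmetic for the 4-way parallel include_docs test.
single_request_kb_s = 400  # download rate of one include_docs request
extra_total_kb_s = 100     # only ~100 KB/s more total bandwidth was observed
connections = 4

total_kb_s = single_request_kb_s + extra_total_kb_s
per_connection_kb_s = total_kb_s / connections
print(per_connection_kb_s)  # 125.0
```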
I am happy to provide other statistics or benchmarks on request.
Kind Regards,
-Chris