While the performance numbers indicate something, a MacBook is a poor
environment for performance testing. There is interference from other
desktop apps, hyperthreading, etc. Also, a 1 Gbps network can be a
bottleneck. Every benchmark case should have a matching performance
analysis and point to the reason for the bottleneck - CPU/networking/context
switching/locking/filesystem/...
Just hyperthread vs. a thread on a different physical core is a very
significant change.
You need to pin the QEMU threads on the host to the right physical threads.
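A minimal sketch of one way to do that (assuming QEMU is started with
-name guest=osv,debug-threads=on so vCPU threads get recognizable names,
and assuming cores 0-3 are distinct physical cores - check
/sys/devices/system/cpu/cpu*/topology/thread_siblings_list first; the
guest name here is illustrative):

  QEMU_PID=$(pgrep -f 'qemu-system-x86_64.*guest=osv')
  CORE=0
  for COMM in /proc/$QEMU_PID/task/*/comm; do
    if grep -q '^CPU' "$COMM"; then        # vCPU threads are named "CPU <n>/KVM"
      TID=$(echo "$COMM" | cut -d/ -f5)    # task id from the /proc path
      taskset -cp "$CORE" "$TID"           # pin this vCPU to one physical core
      CORE=$((CORE + 1))
    fi
  done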

It is better to run on a good physical server (like i3.metal on AWS or
similar; it could be smaller, but not 2 cores) and track all the metrics
appropriately. Best is to isolate workloads in terms of cpu/mem/net/disk
(and make sure they scale linearly too), and only then show how a more
complex workload performs.

On Tue, Mar 26, 2019 at 3:29 PM Waldek Kozaczuk <jwkozac...@gmail.com>
wrote:

> Last week I spent some time investigating OSv performance and comparing it
> to Docker and Linux guests. To that end I adapted the
> "unikernels-v-containers" repo by Tom Goethals and extended it with 2 new
> apps (Rust and Node.js) and new scripts to build and deploy OSv apps on
> QEMU/KVM - https://github.com/wkozaczuk/unikernels-v-containers. So as
> you can see, my focus was on OSv on QEMU/KVM and firecracker vs Linux on
> firecracker vs Docker, whereas Tom's paper compared OSv on Xen vs
> Docker (the details of the discussion around it and a link to the paper can
> be found here - https://groups.google.com/forum/#!topic/osv-dev/lhkqFfzbHwk).
>
> Specifically, I wanted to compare networking performance in terms of the
> number of REST API requests per second processed by a typical microservice
> app implemented in Rust (built using hyper), Golang, and Java (built using
> vertx.io), running on the following:
>
>    - OSv on QEMU/KVM
>    - OSv on firecracker
>    - Docker container
>    - Linux on firecracker
>
> Each app in essence implements a simple todo REST API returning a JSON
> payload 100-200 characters long (for example, see the Java one -
> https://github.com/wkozaczuk/unikernels-v-containers/blob/master/restapi/java-osv/src/main/java/rest/SimpleREST.java).
> The source code of all the apps is under this subtree -
> https://github.com/wkozaczuk/unikernels-v-containers/blob/master/restapi.
> One thing to note is that each request always returns the same payload
> (I wonder if that may cause the response to get cached and affect results).
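>
> For reference, each test request is just a GET against the todos endpoint,
> e.g. (using the guest IP visible in the wrk runs below):
>
>   curl -s http://192.168.1.73:8080/todos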
>
> The test setup looked like this:
>
> *Host:*
>
>    - MacBook Pro with an Intel i7 4-core CPU with hyperthreading (8 CPUs
>    reported by lscpu) and 16GB of RAM, running Ubuntu 18.10
>    - firecracker 0.15.0
>    - QEMU 2.12.0
>
>
> *Client machine:*
>
>    - similar to the one above, with wrk as the test client firing requests
>    using 10 threads and 100 open connections for 30 seconds, in 3 series
>    one after another (please see this test script, and the sketch of the
>    invocation after this list -
>    
> https://github.com/wkozaczuk/unikernels-v-containers/blob/master/test-restapi-with-wrk.sh
>    ).
>    - wrk by default uses Keep-Alive for HTTP connections, so the TCP
>    handshake overhead is minimal
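>
> The wrk invocation is roughly (a sketch; the exact script is linked above):
>
>   for i in 1 2 3; do
>     wrk -t10 -c100 -d30s http://192.168.1.73:8080/todos
>   done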
>
> The host and client machines were connected directly to a 1 GBit ethernet
> switch, and the host exposed the guest IP using a bridged TAP NIC (please
> see the script used -
> https://raw.githubusercontent.com/cloudius-systems/osv/master/scripts/setup-external-bridge.sh
> ).
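>
> Roughly, such a setup creates a bridge enslaving the physical NIC plus a
> TAP device for the guest; the interface names below are illustrative, not
> necessarily what the script uses:
>
>   ip link add name br0 type bridge
>   ip link set eth0 master br0
>   ip tuntap add dev tap0 mode tap
>   ip link set tap0 master br0
>   ip link set dev tap0 up && ip link set dev br0 up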
>
> You can find the scripts to start the applications on OSv and Docker here -
> https://github.com/wkozaczuk/unikernels-v-containers (the run* scripts).
> Please note the --cpu-set parameter used in the Docker script to limit the
> number of CPUs.
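>
> (In stock Docker the flag is spelled --cpuset-cpus; e.g. a 4-CPU run would
> look something like: docker run --cpuset-cpus 0-3 <image>.)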
>
> You can find detailed results under
> https://github.com/wkozaczuk/unikernels-v-containers/tree/master/test_results/remote
> .
>
> Here are just the requests-per-second numbers (full example -
> https://raw.githubusercontent.com/wkozaczuk/unikernels-v-containers/master/test_results/remote/docker/rust_docker_4_cpu.wrk
> )
>
> OSv on QEMU
> *Golang*
> *1 CPU*
> Requests/sec:  24313.06
> Requests/sec:  23874.74
> Requests/sec:  23300.26
>
> *2 CPUs*
> Requests/sec:  37089.26
> Requests/sec:  35475.22
> Requests/sec:  33581.87
>
> *4 CPUs*
> Requests/sec:  42747.11
> Requests/sec:  43057.99
> Requests/sec:  42346.27
>
> *Java*
> *1 CPU*
> Requests/sec:  41049.41
> Requests/sec:  43622.81
> Requests/sec:  44777.60
> *2 CPUs*
> Requests/sec:  46245.95
> Requests/sec:  45746.48
> Requests/sec:  46224.42
> *4 CPUs*
> Requests/sec:  48128.33
> Requests/sec:  45467.53
> Requests/sec:  45776.45
>
> *Rust*
>
> *1 CPU*
> Requests/sec:  43455.34
> Requests/sec:  43927.73
> Requests/sec:  41100.07
>
> *2 CPUs*
> Requests/sec:  49120.31
> Requests/sec:  49298.28
> Requests/sec:  48076.98
> *4 CPUs*
> Requests/sec:  51477.57
> Requests/sec:  51587.92
> Requests/sec:  49118.68
>
> OSv on firecracker
> *Golang*
>
> *1 cpu*
> Requests/sec:  16721.56
> Requests/sec:  16422.33
> Requests/sec:  16540.24
>
> *2 cpus*
> Requests/sec:  28538.35
> Requests/sec:  26676.68
> Requests/sec:  28100.00
>
> *4 cpus*
> Requests/sec:  36448.57
> Requests/sec:  33808.45
> Requests/sec:  34383.20
>
>
> *Java*
> *1 cpu*
> Requests/sec:  20191.95
> Requests/sec:  21384.60
> Requests/sec:  21705.82
>
> *2 cpus*
> Requests/sec:  40876.17
> Requests/sec:  40625.69
> Requests/sec:  43766.45
> *4 cpus*
> Requests/sec:  46336.07
> Requests/sec:  45933.35
> Requests/sec:  45467.22
>
>
> *Rust*
> *1 cpu*
> Requests/sec:  23604.27
> Requests/sec:  23379.86
> Requests/sec:  23477.19
>
> *2 cpus*
> Requests/sec:  46973.84
> Requests/sec:  46590.41
> Requests/sec:  46128.15
>
> *4 cpus*
> Requests/sec:  49491.98
> Requests/sec:  50255.20
> Requests/sec:  50183.11
>
> Linux on firecracker
> *Golang*
>
> *1 CPU*
> Requests/sec:  14498.02
> Requests/sec:  14373.21
> Requests/sec:  14213.61
>
> *2 CPU*
> Requests/sec:  28201.27
> Requests/sec:  28600.92
> Requests/sec:  28558.33
>
> *4 CPU*
> Requests/sec:  48983.83
> Requests/sec:  47590.97
> Requests/sec:  45758.82
>
> *Java*
>
> *1 CPU*
> Requests/sec:  18217.58
> Requests/sec:  17709.30
> Requests/sec:  19829.01
>
> *2 CPU*
> Requests/sec:  33188.75
> Requests/sec:  33233.55
> Requests/sec:  36951.05
>
> *4 CPU*
> Requests/sec:  47718.13
> Requests/sec:  46456.51
> Requests/sec:  48408.99
>
> *Rust*
> Could not get the same Rust app running on Alpine Linux, which uses musl
>
> Docker
> *Golang*
>
> *1 CPU*
> Requests/sec:  24568.70
> Requests/sec:  24621.82
> Requests/sec:  24451.52
>
> *2 CPU*
> Requests/sec:  49366.54
> Requests/sec:  48510.87
> Requests/sec:  43809.97
>
> *4 CPU*
> Requests/sec:  53613.09
> Requests/sec:  53033.38
> Requests/sec:  51422.59
>
> *Java*
>
> *1 CPU*
> Requests/sec:  40078.52
> Requests/sec:  43850.54
> Requests/sec:  44588.22
>
> *2 CPUs*
> Requests/sec:  48792.39
> Requests/sec:  51170.05
> Requests/sec:  52033.04
>
> *4 CPUs*
> Requests/sec:  51409.24
> Requests/sec:  52756.73
> Requests/sec:  47126.19
>
> *Rust*
>
> *1 CPU*
> Requests/sec:  40220.04
> Requests/sec:  44601.38
> Requests/sec:  44419.06
>
> *2 CPUs*
> Requests/sec:  53420.56
> Requests/sec:  53490.33
> Requests/sec:  53320.99
>
> *4 CPUs*
> Requests/sec:  53892.23
> Requests/sec:  52814.93
> Requests/sec:  54050.13
>
> Full example (Rust 4 CPUs -
> https://raw.githubusercontent.com/wkozaczuk/unikernels-v-containers/master/test_results/remote/docker/rust_docker_4_cpu.wrk
> ):
> [{"name":"Write
> presentation","completed":false,"due":"2019-03-23T15:30:40.579556117+00:00"},{"name":"Host
> meetup","completed":false,"due":"2019-03-23T15:30:40.579599959+00:00"},{"name":"Run
> tests","completed":false,"due":"2019-03-23T15:30:40.579600610+00:00"},{"name":"Stand
> in
> traffic","completed":false,"due":"2019-03-23T15:30:40.579601081+00:00"},{"name":"Learn
> Rust","completed":false,"due":"2019-03-23T15:30:40.579601548+00:00"}]-----------------------------------
> Running 30s test @ http://192.168.1.73:8080/todos
>   10 threads and 100 connections
>   Thread Stats   Avg      Stdev     Max   +/- Stdev
>     Latency     1.86ms    1.20ms  30.81ms   62.92%
>     Req/Sec     5.42k   175.14     5.67k    87.71%
>   1622198 requests in 30.10s, 841.55MB read
> Requests/sec:  53892.23
> Transfer/sec:     27.96MB
> -----------------------------------
> Running 30s test @ http://192.168.1.73:8080/todos
>   10 threads and 100 connections
>   Thread Stats   Avg      Stdev     Max   +/- Stdev
>     Latency     1.90ms    1.19ms   8.98ms   58.18%
>     Req/Sec     5.31k   324.18     5.66k    90.10%
>   1589778 requests in 30.10s, 824.73MB read
> Requests/sec:  52814.93
> Transfer/sec:     27.40MB
> -----------------------------------
> Running 30s test @ http://192.168.1.73:8080/todos
>   10 threads and 100 connections
>   Thread Stats   Avg      Stdev     Max   +/- Stdev
>     Latency     1.85ms    1.14ms   8.39ms   54.70%
>     Req/Sec     5.44k   204.22     7.38k    92.12%
>   1626902 requests in 30.10s, 843.99MB read
> Requests/sec:  54050.13
> Transfer/sec:     28.04MB
>
> I am also enclosing an example of an iperf run between the client and
> server machines to illustrate the raw network bandwidth (BTW, whether I
> tested against iperf running natively on the host, on OSv on QEMU, or on
> OSv on firecracker, I got pretty much identical results of ~940 Mbits/sec -
> see
> https://github.com/wkozaczuk/unikernels-v-containers/tree/master/test_results/remote
> ).
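>
> The run below is a plain 30-second TCP test, roughly:
>
>   iperf3 -c 192.168.1.102 -t 30
>
> (iperf3 inferred from the default port 5201 visible in the output.)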
>
> Connecting to host 192.168.1.102, port 5201
> [  5] local 192.168.1.98 port 65179 connected to 192.168.1.102 port 5201
> [ ID] Interval           Transfer     Bitrate
> [  5]   0.00-1.00   sec   111 MBytes   930 Mbits/sec
> [  5]   1.00-2.00   sec   111 MBytes   932 Mbits/sec
> [  5]   2.00-3.00   sec   112 MBytes   938 Mbits/sec
> [  5]   3.00-4.00   sec   112 MBytes   939 Mbits/sec
> [  5]   4.00-5.00   sec   112 MBytes   940 Mbits/sec
> [  5]   5.00-6.00   sec   111 MBytes   933 Mbits/sec
> [  5]   6.00-7.00   sec   112 MBytes   940 Mbits/sec
> [  5]   7.00-8.00   sec   112 MBytes   940 Mbits/sec
> [  5]   8.00-9.00   sec   112 MBytes   941 Mbits/sec
> [  5]   9.00-10.00  sec   112 MBytes   941 Mbits/sec
> [  5]  10.00-11.00  sec   112 MBytes   939 Mbits/sec
> [  5]  11.00-12.00  sec   112 MBytes   941 Mbits/sec
> [  5]  12.00-13.00  sec   112 MBytes   941 Mbits/sec
> [  5]  13.00-14.00  sec   112 MBytes   942 Mbits/sec
> [  5]  14.00-15.00  sec   112 MBytes   941 Mbits/sec
> [  5]  15.00-16.00  sec   111 MBytes   927 Mbits/sec
> [  5]  16.00-17.00  sec   112 MBytes   941 Mbits/sec
> [  5]  17.00-18.00  sec   112 MBytes   942 Mbits/sec
> [  5]  18.00-19.00  sec   112 MBytes   941 Mbits/sec
> [  5]  19.00-20.00  sec   112 MBytes   941 Mbits/sec
> [  5]  20.00-21.00  sec   112 MBytes   936 Mbits/sec
> [  5]  21.00-22.00  sec   112 MBytes   940 Mbits/sec
> [  5]  22.00-23.00  sec   112 MBytes   941 Mbits/sec
> [  5]  23.00-24.00  sec   112 MBytes   941 Mbits/sec
> [  5]  24.00-25.00  sec   112 MBytes   941 Mbits/sec
> [  5]  25.00-26.00  sec   112 MBytes   941 Mbits/sec
> [  5]  26.00-27.00  sec   112 MBytes   940 Mbits/sec
> [  5]  27.00-28.00  sec   112 MBytes   941 Mbits/sec
> [  5]  28.00-29.00  sec   112 MBytes   940 Mbits/sec
> [  5]  29.00-30.00  sec   112 MBytes   941 Mbits/sec
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate
> [  5]   0.00-30.00  sec  3.28 GBytes   939 Mbits/sec
> sender
> [  5]   0.00-30.00  sec  3.28 GBytes   939 Mbits/sec
> receiver
>
> iperf Done.
>
>
> Observations/Conclusions
>
>    - OSv fares a little better on QEMU/KVM than on firecracker, by a margin
>    that varies from ~5% to ~20% (Golang). Also please note the vast
>    difference between the 1-CPU test results on firecracker and QEMU
>    (hyperthreading is handled differently). On QEMU there is only a small
>    bump from 1 to 2 to 4 CPUs except for Golang; on firecracker there is an
>    almost ~90-100% bump from 1 to 2 CPUs.
>       - To that end I have opened a firecracker issue -
>       https://github.com/firecracker-microvm/firecracker/issues/1034.
>    - When you compare OSv on firecracker vs Linux on firecracker (comparing
>    against OSv on QEMU would, I guess, be unfair) you can see that:
>       - the Golang app on OSv was ~15% faster than on Linux with 1 CPU,
>       almost identical with 2 CPUs, and ~30% slower than on Linux with 4
>       CPUs (I did check that the Golang runtime properly detects the
>       number of CPUs)
>       - the Java app on OSv was ~5% faster with 1 CPU, ~20% faster with 2
>       CPUs, and slightly slower with 4 CPUs
>       - I could not run the Rust app on Linux because the image was an
>       Alpine distribution built with musl, and I did not have time to get
>       Rust to build properly for that scenario
>    - When you compare OSv on QEMU/KVM vs Docker you can see that:
>       - all apps running with a single CPU fare almost the same, with OSv
>       sometimes being a little faster
>       - the Java and Rust apps performed only a little better (2-10%) on
>       Docker than on OSv
>       - the Golang app scaled well with the number of CPUs on Docker but
>       performed much worse (20-30%) on OSv with 2 and 4 CPUs
>    - There seems to be a bottleneck around 40-50K requests per second
>    somewhere. Looking at one result, the raw network rate reported was
>    around 26-28 MB per second. Given that HTTP requires sending both a
>    request and a response, possibly that is the maximum the network - the
>    combination of the ethernet switch and the server and client machines -
>    can handle? (See the back-of-envelope check after this list.)
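>
> A quick back-of-envelope check on that (a sketch, using the Rust/Docker/
> 4-CPU wrk run and the iperf result shown elsewhere in this message):
>
>   841.55 MB / 1,622,198 requests ≈ 544 bytes read per response
>   940 Mbit/s ≈ 112 MB/s of raw TCP throughput
>   112 MB/s / 544 bytes ≈ 215K responses/sec of pure bandwidth headroom
>
> So raw bandwidth alone would allow roughly 4x the observed ~54K req/s; if
> the network is the limit, per-request/per-packet overhead (interrupts,
> context switches) seems a more likely culprit than bytes on the wire.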
>
>
> Questions
>
>    - Are there any flaws in this test setup?
>    - Why does OSv not scale in some scenarios - especially when going from
>    2 to 4 CPUs? A networking bottleneck? The scheduler? Locks?
>    - Could we further optimize OSv running with a single CPU (skip the
>    global cross-CPU page allocator, etc.)?
>
>
> To get even more insight, I also compared how OSv on QEMU would fare
> against the same apps running in Docker, with wrk running on the host and
> firing requests locally. You can find the results under
> https://github.com/wkozaczuk/unikernels-v-containers/tree/master/test_results/host
> .
>
> OSv on QEMU
> *Golang*
>
> *1 CPU*
> Requests/sec:  25188.60
> Requests/sec:  24664.43
> Requests/sec:  23935.77
> *2 CPUs*
> Requests/sec:  37118.95
> Requests/sec:  37108.96
> Requests/sec:  35997.58
>
> *4 CPUs*
> Requests/sec:  49987.20
> Requests/sec:  48710.74
> Requests/sec:  44789.96
>
>
> *Java*
> *1 CPU*
> Requests/sec:  43648.02
> Requests/sec:  45457.98
> Requests/sec:  41818.13
>
> *2 CPUs*
> Requests/sec:  76224.39
> Requests/sec:  75734.63
> Requests/sec:  70597.35
>
> *4 CPUs*
> Requests/sec:  80543.30
> Requests/sec:  75187.46
> Requests/sec:  72986.93
>
>
> *Rust*
> *1 CPU*
> Requests/sec:  42392.75
> Requests/sec:  39679.21
> Requests/sec:  37871.49
>
> *2 CPUs*
> Requests/sec:  82484.67
> Requests/sec:  83272.65
> Requests/sec:  71671.13
>
> *4 CPUs*
> Requests/sec:  95910.23
> Requests/sec:  86811.76
> Requests/sec:  83213.93
>
>
> Docker
>
> *Golang*
> *1 CPU*
> Requests/sec:  24191.63
> Requests/sec:  23574.89
> Requests/sec:  23716.33
>
> *2 CPUs*
> Requests/sec:  34889.01
> Requests/sec:  34487.01
> Requests/sec:  34468.03
>
> *4 CPUs*
> Requests/sec:  48850.24
> Requests/sec:  48690.09
> Requests/sec:  48356.66
>
>
> *Java*
> *1 CPU*
> Requests/sec:  32267.09
> Requests/sec:  34670.41
> Requests/sec:  34828.68
>
> *2 CPUs*
> Requests/sec:  47533.94
> Requests/sec:  50734.05
> Requests/sec:  50203.98
>
> *4 CPUs*
> Requests/sec:  69644.61
> Requests/sec:  72704.40
> Requests/sec:  70805.84
>
>
> *Rust*
> *1 CPU*
> Requests/sec:  37061.52
> Requests/sec:  36637.62
> Requests/sec:  33154.57
>
> *2 CPUs*
> Requests/sec:  51743.94
> Requests/sec:  51476.78
> Requests/sec:  50934.27
>
> *4 CPUs*
> Requests/sec:  75125.41
> Requests/sec:  74051.27
> Requests/sec:  74434.78
>
>    - Does this test even make sense?
>    - As you can see, OSv outperforms Docker in this scenario to varying
>    degrees, by 5-20%. Can anybody explain why? Is it because in this case
>    both wrk and the apps are on the same machine, and fewer context
>    switches between kernel and user mode are needed, in favor of OSv? Does
>    it mean that we could benefit from a setup with a load balancer (for
>    example haproxy or squid) running on the same host in user mode and
>    forwarding to single-CPU OSv instances, vs a single OSv instance with
>    multiple CPUs? (A rough sketch of that idea follows this list.)
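>
> A hypothetical haproxy snippet for that setup (the guest IPs and backend
> names are made up for illustration):
>
>   defaults
>       mode http
>       timeout connect 5s
>       timeout client  30s
>       timeout server  30s
>   frontend todos
>       bind *:8080
>       default_backend osv_guests
>   backend osv_guests
>       balance roundrobin
>       server osv1 192.168.1.201:8080
>       server osv2 192.168.1.202:8080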
>
> Looking forward to hearing what others think.
>
> Waldek
>

-- 
You received this message because you are subscribed to the Google Groups "OSv 
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to osv-dev+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
