Hello Waldek,

The experiments are very interesting. I showed something similar at OSSummit'18 (see https://github.com/torokernel/papers/blob/master/OSSummit18.pdf). What I do not understand from your conclusions is why you expect OSv to scale with the number of cores. Maybe I did not understand something.

Matias
On Tue, Mar 26, 2019 at 11:29 PM, Waldek Kozaczuk (<[email protected]>) wrote:

> Last week I spent some time investigating OSv performance and comparing it to
> Docker and Linux guests. To that end I adapted Tom Goethals'
> "unikernels-v-containers" repo and extended it with 2 new apps (Rust and
> Node.js) and new scripts to build and deploy OSv apps on QEMU/KVM -
> https://github.com/wkozaczuk/unikernels-v-containers. So, as you can see, my
> focus was on OSv on QEMU/KVM and firecracker vs Linux on firecracker vs
> Docker, whereas Tom's paper was comparing OSv on Xen vs Docker (details of
> the discussion around it and a link to the paper can be found here -
> https://groups.google.com/forum/#!topic/osv-dev/lhkqFfzbHwk).
>
> Specifically, I wanted to compare networking performance in terms of the
> number of REST API requests per second processed by a typical microservice
> app implemented in Rust (built using hyper), Golang and Java (built using
> vertx.io) and running on the following:
>
> - OSv on QEMU/KVM
> - OSv on firecracker
> - Docker container
> - Linux on firecracker
>
> Each app in essence implements a simple todo REST API returning a JSON
> payload 100-200 characters long (for example, see the Java one -
> https://github.com/wkozaczuk/unikernels-v-containers/blob/master/restapi/java-osv/src/main/java/rest/SimpleREST.java).
> The source code of all the apps is under this subtree -
> https://github.com/wkozaczuk/unikernels-v-containers/blob/master/restapi.
> One thing to note is that each request always returns the same payload (I
> wonder if that may cause the response to get cached and affect the results).
>
> The test setup looked like this:
>
> *Host:*
>
> - MacBook Pro with a 4-core Intel i7 CPU with hyperthreading (8 CPUs
>   reported by lscpu) and 16 GB of RAM, running Ubuntu 18.10
> - firecracker 0.15.0
> - QEMU 2.12.0
>
> *Client machine:*
>
> - similar to the one above, with wrk as the test client firing requests
>   using 10 threads and 100 open connections for 30 seconds, in 3 series run
>   one by one (please see the test script -
>   https://github.com/wkozaczuk/unikernels-v-containers/blob/master/test-restapi-with-wrk.sh)
> - wrk uses HTTP keep-alive by default, so TCP handshake overhead is minimal
>
> The host and the client machine were connected directly to a 1 Gbit Ethernet
> switch, and the host exposed the guest IP using a bridged TAP NIC (please
> see the script used -
> https://raw.githubusercontent.com/cloudius-systems/osv/master/scripts/setup-external-bridge.sh).
>
> You can find scripts to start the applications on OSv and Docker here -
> https://github.com/wkozaczuk/unikernels-v-containers (the run* scripts).
> Please note the --cpuset-cpus parameter used in the Docker script to limit
> the number of CPUs.
>
> You can find detailed results under
> https://github.com/wkozaczuk/unikernels-v-containers/tree/master/test_results/remote.
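>
> Just to give a feel for how simple the endpoint under test is, the Go
> variant boils down to roughly the following (a simplified sketch rather than
> the exact code in the repo - the handler and field names here are
> illustrative):
>
>   package main
>
>   import (
>       "encoding/json"
>       "log"
>       "net/http"
>       "time"
>   )
>
>   // Todo mirrors the kind of item the apps return.
>   type Todo struct {
>       Name      string    `json:"name"`
>       Completed bool      `json:"completed"`
>       Due       time.Time `json:"due"`
>   }
>
>   func main() {
>       http.HandleFunc("/todos", func(w http.ResponseWriter, r *http.Request) {
>           // The same small list on every request; only the timestamps change.
>           todos := []Todo{
>               {Name: "Write presentation", Due: time.Now()},
>               {Name: "Host meetup", Due: time.Now()},
>               {Name: "Run tests", Due: time.Now()},
>           }
>           w.Header().Set("Content-Type", "application/json")
>           json.NewEncoder(w).Encode(todos)
>       })
>       log.Fatal(http.ListenAndServe(":8080", nil))
>   }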
>
> Here are just the requests-per-second numbers (full example -
> https://raw.githubusercontent.com/wkozaczuk/unikernels-v-containers/master/test_results/remote/docker/rust_docker_4_cpu.wrk):
>
> OSv on QEMU (requests/sec, 3 runs each)
>
>   Golang  1 CPU:  24313.06 / 23874.74 / 23300.26
>           2 CPUs: 37089.26 / 35475.22 / 33581.87
>           4 CPUs: 42747.11 / 43057.99 / 42346.27
>
>   Java    1 CPU:  41049.41 / 43622.81 / 44777.60
>           2 CPUs: 46245.95 / 45746.48 / 46224.42
>           4 CPUs: 48128.33 / 45467.53 / 45776.45
>
>   Rust    1 CPU:  43455.34 / 43927.73 / 41100.07
>           2 CPUs: 49120.31 / 49298.28 / 48076.98
>           4 CPUs: 51477.57 / 51587.92 / 49118.68
>
> OSv on firecracker (requests/sec, 3 runs each)
>
>   Golang  1 CPU:  16721.56 / 16422.33 / 16540.24
>           2 CPUs: 28538.35 / 26676.68 / 28100.00
>           4 CPUs: 36448.57 / 33808.45 / 34383.20
>
>   Java    1 CPU:  20191.95 / 21384.60 / 21705.82
>           2 CPUs: 40876.17 / 40625.69 / 43766.45
>           4 CPUs: 46336.07 / 45933.35 / 45467.22
>
>   Rust    1 CPU:  23604.27 / 23379.86 / 23477.19
>           2 CPUs: 46973.84 / 46590.41 / 46128.15
>           4 CPUs: 49491.98 / 50255.20 / 50183.11
>
> Linux on firecracker (requests/sec, 3 runs each)
>
>   Golang  1 CPU:  14498.02 / 14373.21 / 14213.61
>           2 CPUs: 28201.27 / 28600.92 / 28558.33
>           4 CPUs: 48983.83 / 47590.97 / 45758.82
>
>   Java    1 CPU:  18217.58 / 17709.30 / 19829.01
>           2 CPUs: 33188.75 / 33233.55 / 36951.05
>           4 CPUs: 47718.13 / 46456.51 / 48408.99
>
>   Rust    (could not get the same Rust app to run on the Alpine Linux
>           image, which uses musl)
>
> Docker (requests/sec, 3 runs each)
>
>   Golang  1 CPU:  24568.70 / 24621.82 / 24451.52
>           2 CPUs: 49366.54 / 48510.87 / 43809.97
>           4 CPUs: 53613.09 / 53033.38 / 51422.59
>
>   Java    1 CPU:  40078.52 / 43850.54 / 44588.22
>           2 CPUs: 48792.39 / 51170.05 / 52033.04
>           4 CPUs: 51409.24 / 52756.73 / 47126.19
>
>   Rust    1 CPU:  40220.04 / 44601.38 / 44419.06
>           2 CPUs: 53420.56 / 53490.33 / 53320.99
>           4 CPUs: 53892.23 / 52814.93 / 54050.13
>
> Full example (Rust on Docker, 4 CPUs -
> https://raw.githubusercontent.com/wkozaczuk/unikernels-v-containers/master/test_results/remote/docker/rust_docker_4_cpu.wrk):
presentation","completed":false,"due":"2019-03-23T15:30:40.579556117+00:00"},{"name":"Host > meetup","completed":false,"due":"2019-03-23T15:30:40.579599959+00:00"},{"name":"Run > tests","completed":false,"due":"2019-03-23T15:30:40.579600610+00:00"},{"name":"Stand > in > traffic","completed":false,"due":"2019-03-23T15:30:40.579601081+00:00"},{"name":"Learn > Rust","completed":false,"due":"2019-03-23T15:30:40.579601548+00:00"}]----------------------------------- > Running 30s test @ http://192.168.1.73:8080/todos > 10 threads and 100 connections > Thread Stats Avg Stdev Max +/- Stdev > Latency 1.86ms 1.20ms 30.81ms 62.92% > Req/Sec 5.42k 175.14 5.67k 87.71% > 1622198 requests in 30.10s, 841.55MB read > Requests/sec: 53892.23 > Transfer/sec: 27.96MB > ----------------------------------- > Running 30s test @ http://192.168.1.73:8080/todos > 10 threads and 100 connections > Thread Stats Avg Stdev Max +/- Stdev > Latency 1.90ms 1.19ms 8.98ms 58.18% > Req/Sec 5.31k 324.18 5.66k 90.10% > 1589778 requests in 30.10s, 824.73MB read > Requests/sec: 52814.93 > Transfer/sec: 27.40MB > ----------------------------------- > Running 30s test @ http://192.168.1.73:8080/todos > 10 threads and 100 connections > Thread Stats Avg Stdev Max +/- Stdev > Latency 1.85ms 1.14ms 8.39ms 54.70% > Req/Sec 5.44k 204.22 7.38k 92.12% > 1626902 requests in 30.10s, 843.99MB read > Requests/sec: 54050.13 > Transfer/sec: 28.04MB > > I am also enclosing an example of iperf run between client and server > machine to illustrate type of raw network bandwidth (BTW I test against > iperf running on host natively and on OSv on qemu and firecracker I got > pretty much identical results ~ 940 MBits/sec - see > https://github.com/wkozaczuk/unikernels-v-containers/tree/master/test_results/remote > ). 
>
> Connecting to host 192.168.1.102, port 5201
> [  5] local 192.168.1.98 port 65179 connected to 192.168.1.102 port 5201
> [ ID] Interval           Transfer     Bitrate
> [  5]   0.00-1.00   sec   111 MBytes   930 Mbits/sec
> [  5]   1.00-2.00   sec   111 MBytes   932 Mbits/sec
> [  5]   2.00-3.00   sec   112 MBytes   938 Mbits/sec
> [  5]   3.00-4.00   sec   112 MBytes   939 Mbits/sec
> [  5]   4.00-5.00   sec   112 MBytes   940 Mbits/sec
> [  5]   5.00-6.00   sec   111 MBytes   933 Mbits/sec
> [  5]   6.00-7.00   sec   112 MBytes   940 Mbits/sec
> [  5]   7.00-8.00   sec   112 MBytes   940 Mbits/sec
> [  5]   8.00-9.00   sec   112 MBytes   941 Mbits/sec
> [  5]   9.00-10.00  sec   112 MBytes   941 Mbits/sec
> [  5]  10.00-11.00  sec   112 MBytes   939 Mbits/sec
> [  5]  11.00-12.00  sec   112 MBytes   941 Mbits/sec
> [  5]  12.00-13.00  sec   112 MBytes   941 Mbits/sec
> [  5]  13.00-14.00  sec   112 MBytes   942 Mbits/sec
> [  5]  14.00-15.00  sec   112 MBytes   941 Mbits/sec
> [  5]  15.00-16.00  sec   111 MBytes   927 Mbits/sec
> [  5]  16.00-17.00  sec   112 MBytes   941 Mbits/sec
> [  5]  17.00-18.00  sec   112 MBytes   942 Mbits/sec
> [  5]  18.00-19.00  sec   112 MBytes   941 Mbits/sec
> [  5]  19.00-20.00  sec   112 MBytes   941 Mbits/sec
> [  5]  20.00-21.00  sec   112 MBytes   936 Mbits/sec
> [  5]  21.00-22.00  sec   112 MBytes   940 Mbits/sec
> [  5]  22.00-23.00  sec   112 MBytes   941 Mbits/sec
> [  5]  23.00-24.00  sec   112 MBytes   941 Mbits/sec
> [  5]  24.00-25.00  sec   112 MBytes   941 Mbits/sec
> [  5]  25.00-26.00  sec   112 MBytes   941 Mbits/sec
> [  5]  26.00-27.00  sec   112 MBytes   940 Mbits/sec
> [  5]  27.00-28.00  sec   112 MBytes   941 Mbits/sec
> [  5]  28.00-29.00  sec   112 MBytes   940 Mbits/sec
> [  5]  29.00-30.00  sec   112 MBytes   941 Mbits/sec
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate
> [  5]   0.00-30.00  sec  3.28 GBytes   939 Mbits/sec   sender
> [  5]   0.00-30.00  sec  3.28 GBytes   939 Mbits/sec   receiver
>
> iperf Done.
>
> Observations/Conclusions
>
> - OSv fares a little better on QEMU/KVM than on firecracker, by anywhere from
>   ~5% to ~20% (Golang). Also please note the vast difference between the
>   1-CPU results on firecracker and on QEMU (hyperthreading is handled
>   differently). On QEMU there is only a small bump from 1 to 2 to 4 CPUs
>   except for Golang; on firecracker there is an almost ~90-100% bump from
>   1 to 2 CPUs.
> - To that end I have opened a firecracker issue -
>   https://github.com/firecracker-microvm/firecracker/issues/1034.
> - When you compare OSv on firecracker vs Linux on firecracker (comparing OSv
>   on QEMU would, I guess, be unfair) you can see that:
>   - the Golang app on OSv was ~15% faster than on Linux with 1 CPU, almost
>     identical with 2 CPUs, while with 4 CPUs the app was ~30% faster on
>     Linux (I did check that the Golang runtime properly detects the number
>     of CPUs)
>   - the Java app on OSv was ~5% faster with 1 CPU, ~20% faster with 2 CPUs,
>     and slightly slower with 4 CPUs
>   - I could not run the Rust app on Linux because the guest was an Alpine
>     distribution built with musl, and I did not have time to get Rust to
>     build properly for that scenario
> - When you compare OSv on QEMU/KVM vs Docker you can see that:
>   - all apps running with a single CPU fare almost the same, with OSv
>     sometimes being a little faster
>   - the Java and Rust apps performed only a little better (2-10%) on Docker
>     than on OSv
>   - the Golang app on OSv scaled with the number of CPUs but performed much
>     worse than on Docker (20-30%) with 2 and 4 CPUs
> - There seems to be a bottleneck somewhere around 40-50K requests per
>   second. Looking at one result, the raw network rate reported was around
>   26-28 MB per second. Given that each HTTP request requires sending both a
>   request and a response, possibly that is the maximum the network - the
>   combination of the Ethernet switch and the server and client machines -
>   can handle? (See the rough check below.)
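>
> A rough back-of-the-envelope check on that, using the Rust/Docker 4-CPU run
> above (and assuming wrk reports MB as 2^20 bytes):
>
>   841.55 MB / 1622198 requests  ~ 544 bytes per response (payload + HTTP headers)
>   54050 req/s * 544 bytes       ~ 28 MB/s  ~ 235 Mbit/s of response traffic
>   1 Gbit/s link                 ~ 119 MB/s
>
> So the responses alone use only about a quarter of the link, and the
> requests are much smaller still. That would suggest the ~50K req/s ceiling
> is more likely a per-request/per-packet cost (the client, the virtio/TAP
> path, interrupts, locking) than raw bandwidth saturation, though I have not
> verified this.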
>
> Questions
>
> - Are there any flaws in this test setup?
> - Why does OSv not scale in some scenarios - especially when bumping from 2
>   to 4 CPUs? Networking bottleneck? Scheduler? Locks?
> - Could we further optimize OSv running with a single CPU (skip the global
>   cross-CPU page allocator, etc.)?
>
> To get even more insight, I also compared how OSv on QEMU would fare against
> the same apps running in Docker, with wrk running on the host and firing
> requests locally. You can find the results under
> https://github.com/wkozaczuk/unikernels-v-containers/tree/master/test_results/host.
>
> OSv on QEMU (requests/sec, 3 runs each)
>
>   Golang  1 CPU:  25188.60 / 24664.43 / 23935.77
>           2 CPUs: 37118.95 / 37108.96 / 35997.58
>           4 CPUs: 49987.20 / 48710.74 / 44789.96
>
>   Java    1 CPU:  43648.02 / 45457.98 / 41818.13
>           2 CPUs: 76224.39 / 75734.63 / 70597.35
>           4 CPUs: 80543.30 / 75187.46 / 72986.93
>
>   Rust    1 CPU:  42392.75 / 39679.21 / 37871.49
>           2 CPUs: 82484.67 / 83272.65 / 71671.13
>           4 CPUs: 95910.23 / 86811.76 / 83213.93
>
> Docker (requests/sec, 3 runs each)
>
>   Golang  1 CPU:  24191.63 / 23574.89 / 23716.33
>           2 CPUs: 34889.01 / 34487.01 / 34468.03
>           4 CPUs: 48850.24 / 48690.09 / 48356.66
>
>   Java    1 CPU:  32267.09 / 34670.41 / 34828.68
>           2 CPUs: 47533.94 / 50734.05 / 50203.98
>           4 CPUs: 69644.61 / 72704.40 / 70805.84
>
>   Rust    1 CPU:  37061.52 / 36637.62 / 33154.57
>           2 CPUs: 51743.94 / 51476.78 / 50934.27
>           4 CPUs: 75125.41 / 74051.27 / 74434.78
>
> - Does this test even make sense?
> - As you can see, OSv outperforms Docker in this scenario to varying degrees,
>   by 5-20%. Can anybody explain why? Is it because in this case both wrk and
>   the apps are on the same machine and the number of context switches between
>   kernel and user mode is lower in OSv's favor? Does it mean that we could
>   benefit from a setup with a load balancer (for example haproxy or squid)
>   running on the same host in user mode and forwarding to single-CPU OSv
>   instances, vs a single OSv with multiple CPUs? (A rough sketch of what I
>   mean follows below.)
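>
> Something along those lines - not haproxy itself, just a rough Go stand-in
> to illustrate the idea of a host-side, user-mode proxy fanning requests out
> to several single-CPU OSv guests (the backend addresses below are made up
> for illustration):
>
>   package main
>
>   import (
>       "log"
>       "net/http"
>       "net/http/httputil"
>       "net/url"
>       "sync/atomic"
>   )
>
>   func main() {
>       // Hypothetical bridged-TAP addresses of two single-CPU OSv instances.
>       backends := []string{
>           "http://192.168.1.73:8080",
>           "http://192.168.1.74:8080",
>       }
>
>       // One reverse proxy per backend.
>       proxies := make([]*httputil.ReverseProxy, len(backends))
>       for i, b := range backends {
>           u, err := url.Parse(b)
>           if err != nil {
>               log.Fatal(err)
>           }
>           proxies[i] = httputil.NewSingleHostReverseProxy(u)
>       }
>
>       // Round-robin each incoming request across the OSv guests.
>       var next uint64
>       http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
>           i := atomic.AddUint64(&next, 1) % uint64(len(proxies))
>           proxies[i].ServeHTTP(w, r)
>       })
>
>       log.Fatal(http.ListenAndServe(":8080", nil))
>   }
>
> haproxy or squid would of course do this with less overhead; the point is
> only that the proxy stays in host user space while each guest keeps a single
> vCPU.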
>
> Looking forward to hearing what others think.
>
> Waldek

-- 
You received this message because you are subscribed to the Google Groups "OSv Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/d/optout.
