Last week I spent some time investigating OSv performance and comparing it 
to Docker and Linux guests. To that end I adapted the 
"unikernels-v-containers" repo by Tom Goethals and extended it with two new 
apps (Rust and Node.js) and new scripts to build and deploy OSv apps on 
QEMU/KVM - https://github.com/wkozaczuk/unikernels-v-containers. So my 
focus was on OSv on QEMU/KVM and firecracker vs Linux on firecracker vs 
Docker, whereas Tom's paper compared OSv on Xen vs Docker (you can find the 
discussion around it and a link to the paper here - 
https://groups.google.com/forum/#!topic/osv-dev/lhkqFfzbHwk).

Specifically, I wanted to compare networking performance in terms of the 
number of REST API requests per second processed by a typical microservice 
app implemented in Rust (built using hyper), Golang and Java (built using 
vertx.io), running on the following:

   - OSv on QEMU/KVM
   - OSv on firecracker
   - Docker container
   - Linux on firecracker

Each app in essence implements a simple todo REST API returning a JSON 
payload 100-200 characters long (for example, see the Java one - 
https://github.com/wkozaczuk/unikernels-v-containers/blob/master/restapi/java-osv/src/main/java/rest/SimpleREST.java). 
The source code of all apps is under this subtree - 
https://github.com/wkozaczuk/unikernels-v-containers/blob/master/restapi. 
One thing to note is that each request always returns the same payload (I 
wonder if that may cause the response to get cached and affect the 
results).
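
For illustration, hitting a running instance looks roughly like this (IP 
and port as in the test runs below; response abridged):

  curl http://192.168.1.73:8080/todos
  # -> [{"name":"Write presentation","completed":false,"due":"..."}, ...]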

The test setup looked like this:

*Host:*

   - MacBook Pro with a 4-core Intel i7 CPU with hyperthreading (8 CPUs 
   reported by lscpu) and 16GB of RAM, running Ubuntu 18.10 
   - firecracker 0.15.0
   - QEMU 2.12.0


*Client machine:*

   - similar to the one above, with wrk as the test client firing requests 
   using 10 threads and 100 open connections for 30 seconds, in 3 series 
   one after another (see the test script - 
   https://github.com/wkozaczuk/unikernels-v-containers/blob/master/test-restapi-with-wrk.sh 
   - and the sketch right after this list) 
   - wrk by default uses Keep-Alive for HTTP connections, so the TCP 
   handshake overhead is minimal 
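
Each series boils down to an invocation like this (a sketch of what the 
linked script does, not its exact contents):

  for i in 1 2 3; do
    wrk -t10 -c100 -d30s http://192.168.1.73:8080/todos
  done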

The host and the client machine were connected directly to a 1 GBit 
Ethernet switch, and the host exposed the guest IP using a bridged TAP NIC 
(please see the script used - 
https://raw.githubusercontent.com/cloudius-systems/osv/master/scripts/setup-external-bridge.sh).
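
In essence the bridging amounts to something like the following (a minimal 
sketch assuming eth0 is the host-side NIC; the linked script is the 
authoritative version):

  ip link add name br0 type bridge   # create the bridge
  ip link set eth0 master br0        # enslave the physical NIC
  ip tuntap add dev tap0 mode tap    # tap device handed to the guest
  ip link set tap0 master br0
  ip link set br0 up                 # br0, not eth0, then carries the host IP
  ip link set tap0 up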

You can find scripts to start the applications on OSv and Docker here 
- https://github.com/wkozaczuk/unikernels-v-containers (the run* scripts). 
Please note the --cpuset-cpus parameter used in the docker script to limit 
the number of CPUs.
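
Sketches of such invocations (the image name is hypothetical; the repo's 
run* scripts are the authoritative versions):

  # Docker: pin the container to 4 host CPUs and publish the REST port
  docker run --cpuset-cpus 0-3 -p 8080:8080 rust-rest-api
  # OSv on QEMU/KVM via upstream OSv's run.py: -c sets the vCPU count,
  # -nv attaches a bridged vhost NIC
  sudo ./scripts/run.py -nv -c 4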

You can find detailed results under 
https://github.com/wkozaczuk/unikernels-v-containers/tree/master/test_results/remote.

Here are just the requests-per-second numbers (full example - 
https://raw.githubusercontent.com/wkozaczuk/unikernels-v-containers/master/test_results/remote/docker/rust_docker_4_cpu.wrk)

OSv on QEMU
*Golang*
*1 CPU*
Requests/sec:  24313.06
Requests/sec:  23874.74
Requests/sec:  23300.26

*2 CPUs*
Requests/sec:  37089.26
Requests/sec:  35475.22
Requests/sec:  33581.87

*4 CPUs*
Requests/sec:  42747.11
Requests/sec:  43057.99
Requests/sec:  42346.27

*Java*
*1 CPU*
Requests/sec:  41049.41
Requests/sec:  43622.81
Requests/sec:  44777.60
*2 CPUs*
Requests/sec:  46245.95
Requests/sec:  45746.48
Requests/sec:  46224.42
*4 CPUs*
Requests/sec:  48128.33
Requests/sec:  45467.53
Requests/sec:  45776.45

*Rust*

*1 CPU*
Requests/sec:  43455.34
Requests/sec:  43927.73
Requests/sec:  41100.07

*2 CPUs*
Requests/sec:  49120.31
Requests/sec:  49298.28
Requests/sec:  48076.98
*4 CPUs*
Requests/sec:  51477.57
Requests/sec:  51587.92
Requests/sec:  49118.68

OSv on firecracker
*Golang*

*1 CPU*
Requests/sec:  16721.56
Requests/sec:  16422.33
Requests/sec:  16540.24

*2 CPUs*
Requests/sec:  28538.35
Requests/sec:  26676.68
Requests/sec:  28100.00

*4 CPUs*
Requests/sec:  36448.57
Requests/sec:  33808.45
Requests/sec:  34383.20


*Java*
*1 CPU*
Requests/sec:  20191.95
Requests/sec:  21384.60
Requests/sec:  21705.82

*2 CPUs*
Requests/sec:  40876.17
Requests/sec:  40625.69
Requests/sec:  43766.45
*4 CPUs*
Requests/sec:  46336.07
Requests/sec:  45933.35
Requests/sec:  45467.22


*Rust*
*1 CPU*
Requests/sec:  23604.27
Requests/sec:  23379.86
Requests/sec:  23477.19

*2 CPUs*
Requests/sec:  46973.84
Requests/sec:  46590.41
Requests/sec:  46128.15

*4 CPUs*
Requests/sec:  49491.98
Requests/sec:  50255.20
Requests/sec:  50183.11

Linux on firecracker
*Golang*

*1 CPU*
Requests/sec:  14498.02
Requests/sec:  14373.21
Requests/sec:  14213.61

*2 CPUs*
Requests/sec:  28201.27
Requests/sec:  28600.92
Requests/sec:  28558.33

*4 CPUs*
Requests/sec:  48983.83
Requests/sec:  47590.97
Requests/sec:  45758.82

*Java*

*1 CPU*
Requests/sec:  18217.58
Requests/sec:  17709.30
Requests/sec:  19829.01

*2 CPUs*
Requests/sec:  33188.75
Requests/sec:  33233.55
Requests/sec:  36951.05

*4 CPUs*
Requests/sec:  47718.13
Requests/sec:  46456.51
Requests/sec:  48408.99

*Rust*
Could not get the same Rust app running on Alpine Linux, which uses musl 
(see the observations below)

Docker
*Golang*

*1 CPU*
Requests/sec:  24568.70
Requests/sec:  24621.82
Requests/sec:  24451.52

*2 CPUs*
Requests/sec:  49366.54
Requests/sec:  48510.87
Requests/sec:  43809.97

*4 CPUs*
Requests/sec:  53613.09
Requests/sec:  53033.38
Requests/sec:  51422.59

*Java*

*1 CPU*
Requests/sec:  40078.52
Requests/sec:  43850.54
Requests/sec:  44588.22

*2 CPUs*
Requests/sec:  48792.39
Requests/sec:  51170.05
Requests/sec:  52033.04

*4 CPUs*
Requests/sec:  51409.24
Requests/sec:  52756.73
Requests/sec:  47126.19

*Rust*

*1 CPU*
Requests/sec:  40220.04
Requests/sec:  44601.38
Requests/sec:  44419.06

*2 CPUs*
Requests/sec:  53420.56
Requests/sec:  53490.33
Requests/sec:  53320.99

*4 CPUs*
Requests/sec:  53892.23
Requests/sec:  52814.93
Requests/sec:  54050.13

Full example (Rust 4 CPUs - 
https://raw.githubusercontent.com/wkozaczuk/unikernels-v-containers/master/test_results/remote/docker/rust_docker_4_cpu.wrk):

[{"name":"Write presentation","completed":false,"due":"2019-03-23T15:30:40.579556117+00:00"},{"name":"Host meetup","completed":false,"due":"2019-03-23T15:30:40.579599959+00:00"},{"name":"Run tests","completed":false,"due":"2019-03-23T15:30:40.579600610+00:00"},{"name":"Stand in traffic","completed":false,"due":"2019-03-23T15:30:40.579601081+00:00"},{"name":"Learn Rust","completed":false,"due":"2019-03-23T15:30:40.579601548+00:00"}]
-----------------------------------
Running 30s test @ http://192.168.1.73:8080/todos
  10 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.86ms    1.20ms  30.81ms   62.92%
    Req/Sec     5.42k   175.14     5.67k    87.71%
  1622198 requests in 30.10s, 841.55MB read
Requests/sec:  53892.23
Transfer/sec:     27.96MB
-----------------------------------
Running 30s test @ http://192.168.1.73:8080/todos
  10 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.90ms    1.19ms   8.98ms   58.18%
    Req/Sec     5.31k   324.18     5.66k    90.10%
  1589778 requests in 30.10s, 824.73MB read
Requests/sec:  52814.93
Transfer/sec:     27.40MB
-----------------------------------
Running 30s test @ http://192.168.1.73:8080/todos
  10 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.85ms    1.14ms   8.39ms   54.70%
    Req/Sec     5.44k   204.22     7.38k    92.12%
  1626902 requests in 30.10s, 843.99MB read
Requests/sec:  54050.13
Transfer/sec:     28.04MB

I am also enclosing an example of an iperf run between the client and 
server machines to illustrate the raw network bandwidth available (BTW, 
testing against iperf running natively on the host, and on OSv on QEMU and 
on firecracker, gave pretty much identical results, ~940 Mbits/sec - see 
https://github.com/wkozaczuk/unikernels-v-containers/tree/master/test_results/remote).
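
The run below was produced with a plain iperf3 client/server pair, roughly 
like this (IPs as in my setup):

  iperf3 -s                        # on the server (192.168.1.102)
  iperf3 -c 192.168.1.102 -t 30    # on the client, 30-second run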
 

Connecting to host 192.168.1.102, port 5201
[  5] local 192.168.1.98 port 65179 connected to 192.168.1.102 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   111 MBytes   930 Mbits/sec
[  5]   1.00-2.00   sec   111 MBytes   932 Mbits/sec
[  5]   2.00-3.00   sec   112 MBytes   938 Mbits/sec
[  5]   3.00-4.00   sec   112 MBytes   939 Mbits/sec
[  5]   4.00-5.00   sec   112 MBytes   940 Mbits/sec
[  5]   5.00-6.00   sec   111 MBytes   933 Mbits/sec
[  5]   6.00-7.00   sec   112 MBytes   940 Mbits/sec
[  5]   7.00-8.00   sec   112 MBytes   940 Mbits/sec
[  5]   8.00-9.00   sec   112 MBytes   941 Mbits/sec
[  5]   9.00-10.00  sec   112 MBytes   941 Mbits/sec
[  5]  10.00-11.00  sec   112 MBytes   939 Mbits/sec
[  5]  11.00-12.00  sec   112 MBytes   941 Mbits/sec
[  5]  12.00-13.00  sec   112 MBytes   941 Mbits/sec
[  5]  13.00-14.00  sec   112 MBytes   942 Mbits/sec
[  5]  14.00-15.00  sec   112 MBytes   941 Mbits/sec
[  5]  15.00-16.00  sec   111 MBytes   927 Mbits/sec
[  5]  16.00-17.00  sec   112 MBytes   941 Mbits/sec
[  5]  17.00-18.00  sec   112 MBytes   942 Mbits/sec
[  5]  18.00-19.00  sec   112 MBytes   941 Mbits/sec
[  5]  19.00-20.00  sec   112 MBytes   941 Mbits/sec
[  5]  20.00-21.00  sec   112 MBytes   936 Mbits/sec
[  5]  21.00-22.00  sec   112 MBytes   940 Mbits/sec
[  5]  22.00-23.00  sec   112 MBytes   941 Mbits/sec
[  5]  23.00-24.00  sec   112 MBytes   941 Mbits/sec
[  5]  24.00-25.00  sec   112 MBytes   941 Mbits/sec
[  5]  25.00-26.00  sec   112 MBytes   941 Mbits/sec
[  5]  26.00-27.00  sec   112 MBytes   940 Mbits/sec
[  5]  27.00-28.00  sec   112 MBytes   941 Mbits/sec
[  5]  28.00-29.00  sec   112 MBytes   940 Mbits/sec
[  5]  29.00-30.00  sec   112 MBytes   941 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-30.00  sec  3.28 GBytes   939 Mbits/sec                  sender
[  5]   0.00-30.00  sec  3.28 GBytes   939 Mbits/sec                  receiver

iperf Done.


Observations/Conclusions

   - OSv fares a little better on QEMU/KVM than on firecracker, by anywhere 
   from ~5% to ~20% (Golang). Also please note the vast difference between 
   the 1-CPU test results on firecracker and on QEMU (hyperthreading is 
   handled differently). On QEMU there is only a small bump from 1 to 2 to 
   4 CPUs except for Golang; on firecracker there is an almost 90-100% bump 
   from 1 to 2 CPUs. 
      - To that end I have opened a firecracker issue 
      - https://github.com/firecracker-microvm/firecracker/issues/1034.
   - When you compare OSv on firecracker vs Linux on firecracker (comparing 
   against OSv on QEMU would, I guess, be unfair) you can see that: 
      - the Golang app on OSv was ~15% faster than on Linux with 1 CPU, 
      almost identical with 2 CPUs, and ~30% slower than on Linux with 4 
      CPUs (I did check that the Golang runtime properly detects the number 
      of CPUs)
      - the Java app on OSv was ~5% faster with 1 CPU, ~20% faster with 2 
      CPUs, and slightly slower with 4 CPUs
      - I could not run the Rust app on Linux because the guest was an 
      Alpine distribution built with musl, and I did not have time to get 
      Rust to build properly for that scenario
   - When you compare OSv on QEMU/KVM vs Docker you can see that: 
      - all apps running with a single CPU fare almost the same, with OSv 
      sometimes being a little faster
      - the Java and Rust apps performed only a little better (2-10%) on 
      Docker than on OSv
      - the Golang app scaled well with the number of CPUs, but performed 
      much worse on OSv (20-30%) than on Docker with 2 and 4 CPUs
   - There seems to be a bottleneck somewhere around 40-50K requests per 
   second. Looking at one result, the raw network rate reported was around 
   26-28MB per second. Given that HTTP requires sending both a request and 
   a response, possibly that is the maximum the network - the combination 
   of the Ethernet switch and the server and client machines - can handle? 
   See the back-of-the-envelope numbers right below.
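
As a back-of-the-envelope check (numbers taken from the Rust/Docker 4-CPU 
wrk output above, treating wrk's MB as 2^20 bytes):

  # 841.55 MB read / 1622198 requests -> ~544 bytes per response
  echo '841.55 * 1048576 / 1622198' | bc
  # response traffic at the observed peak, in Mbit/s (prints ~234)
  echo '53892 * 544 * 8 / 1000000' | bc

That is well below the ~940 Mbits/sec iperf reports, so if the network is 
the limit, packets per second seems a more likely culprit than raw bytes.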


Questions

   - Are there any flaws in this test setup?
   - Why does OSv not scale in some scenarios - especially when bumping 
   from 2 to 4 CPUs? Networking bottleneck? Scheduler? Locks?
   - Could we further optimize OSv running with a single CPU (skip the 
   global cross-CPU page allocator, etc.)?


To get even more insight, I also compared how OSv on QEMU fares against the 
same apps running in Docker, with wrk running on the host itself and firing 
requests locally. You can find the results under 
https://github.com/wkozaczuk/unikernels-v-containers/tree/master/test_results/host.

OSv on QEMU
*Golang*

*1 CPU*
Requests/sec:  25188.60
Requests/sec:  24664.43
Requests/sec:  23935.77
*2 CPUs*
Requests/sec:  37118.95
Requests/sec:  37108.96
Requests/sec:  35997.58

*4 CPUs*
Requests/sec:  49987.20
Requests/sec:  48710.74
Requests/sec:  44789.96


*Java*
*1 CPU*
Requests/sec:  43648.02
Requests/sec:  45457.98
Requests/sec:  41818.13

*2 CPUs*
Requests/sec:  76224.39
Requests/sec:  75734.63
Requests/sec:  70597.35

*4 CPUs*
Requests/sec:  80543.30
Requests/sec:  75187.46
Requests/sec:  72986.93


*Rust*
*1 CPU*
Requests/sec:  42392.75
Requests/sec:  39679.21
Requests/sec:  37871.49

*2 CPUs*
Requests/sec:  82484.67
Requests/sec:  83272.65
Requests/sec:  71671.13

*4 CPUs*
Requests/sec:  95910.23
Requests/sec:  86811.76
Requests/sec:  83213.93


Docker

*Golang*
*1 CPU*
Requests/sec:  24191.63
Requests/sec:  23574.89
Requests/sec:  23716.33

*2 CPUs*
Requests/sec:  34889.01
Requests/sec:  34487.01
Requests/sec:  34468.03

*4 CPUs*
Requests/sec:  48850.24
Requests/sec:  48690.09
Requests/sec:  48356.66


*Java*
*1 CPU*
Requests/sec:  32267.09
Requests/sec:  34670.41
Requests/sec:  34828.68

*2 CPUs*
Requests/sec:  47533.94
Requests/sec:  50734.05
Requests/sec:  50203.98

*4 CPUs*
Requests/sec:  69644.61
Requests/sec:  72704.40
Requests/sec:  70805.84


*Rust*
*1 CPU*
Requests/sec:  37061.52
Requests/sec:  36637.62
Requests/sec:  33154.57

*2 CPUs*
Requests/sec:  51743.94
Requests/sec:  51476.78
Requests/sec:  50934.27

*4 CPUs*
Requests/sec:  75125.41
Requests/sec:  74051.27
Requests/sec:  74434.78

   - Does this test even make sense?
   - As you can see, OSv outperforms Docker in this scenario to various 
   degrees, by 5-20%. Can anybody explain why? Is it because both wrk and 
   the apps are on the same machine in this case, so there are fewer 
   context switches between kernel and user mode, which favors OSv? Does 
   it mean that we could benefit from a setup with a load balancer (for 
   example haproxy or squid) running on the same host in user mode and 
   forwarding to single-CPU OSv instances, vs a single OSv guest with 
   multiple CPUs? 

Looking forward to hearing what others think. 

Waldek



