On Wed, Mar 27, 2019 at 3:59 AM Matias Vara <[email protected]> wrote:

> Hello Waldek,
>
> The experiments are very interesting. I showed something similar at
> OSSummit'18 (see
> https://github.com/torokernel/papers/blob/master/OSSummit18.pdf). What I
> do not understand from your conclusions is why you expect OSv to scale
> with the number of cores. Maybe I did not understand something.
>

Because it is designed to scale, and with a properly tuned setup it usually
does. There are occasional issues where spin locks interact badly with
scheduling and hurt scaling a lot, but OSv should handle them well; in the
past we did a good amount of testing and shared the results.


>
> Matias
>
> On Tue, Mar 26, 2019 at 11:29 PM, Waldek Kozaczuk (<[email protected]>)
> wrote:
>
>> Last week I spent some time investigating OSv performance and comparing
>> it to Docker and Linux guests. To that end I adapted the
>> "unikernels-v-containers" repo by Tom Goethals and extended it with 2 new
>> apps (Rust and Node.js) and new scripts to build and deploy OSv apps on
>> QEMU/KVM - https://github.com/wkozaczuk/unikernels-v-containers. As you
>> can see, my focus was OSv on QEMU/KVM and firecracker vs Linux on
>> firecracker vs Docker, whereas Tom's paper compared OSv on Xen vs Docker
>> (the discussion around it and a link to the paper can be found here -
>> https://groups.google.com/forum/#!topic/osv-dev/lhkqFfzbHwk).
>>
>> Specifically, I wanted to compare networking performance in terms of the
>> number of REST API requests per second processed by a typical microservice
>> app implemented in Rust (built using hyper), Golang, and Java (built using
>> vertx.io), running on the following:
>>
>>    - OSv on QEMU/KVM
>>    - OSv on firecracker
>>    - Docker container
>>    - Linux on firecracker
>>
>> Each app in essence implements a simple todo REST API returning a JSON
>> payload 100-200 characters long (for example, see the Java one here -
>> https://github.com/wkozaczuk/unikernels-v-containers/blob/master/restapi/java-osv/src/main/java/rest/SimpleREST.java).
>> The source code of all apps is under this subtree -
>> https://github.com/wkozaczuk/unikernels-v-containers/blob/master/restapi.
>> One thing to note is that each request always returns the same payload
>> (I wonder if that may cause the response to get cached and affect the results).
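>>
>> To give an idea of what these apps do, here is a minimal sketch in Go of
>> the kind of endpoint each of them exposes (illustrative only - the names
>> and payload below are made up; the actual sources live in the restapi
>> subtree linked above):
>>
>> package main
>>
>> import (
>>     "encoding/json"
>>     "log"
>>     "net/http"
>>     "time"
>> )
>>
>> // Todo mirrors the shape of the JSON items the test apps return.
>> type Todo struct {
>>     Name      string    `json:"name"`
>>     Completed bool      `json:"completed"`
>>     Due       time.Time `json:"due"`
>> }
>>
>> func main() {
>>     todos := []Todo{
>>         {Name: "Write presentation", Due: time.Now()},
>>         {Name: "Host meetup", Due: time.Now()},
>>     }
>>     http.HandleFunc("/todos", func(w http.ResponseWriter, r *http.Request) {
>>         w.Header().Set("Content-Type", "application/json")
>>         // The payload is effectively static per request, which is why
>>         // caching of the response could in principle skew the numbers.
>>         json.NewEncoder(w).Encode(todos)
>>     })
>>     log.Fatal(http.ListenAndServe(":8080", nil))
>> }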
>>
>> The test setup looked like this:
>>
>> *Host:*
>>
>>    - MacBook Pro with a 4-core Intel i7 CPU with hyperthreading (8 cpus
>>    reported by lscpu), 16GB of RAM, running Ubuntu 18.10
>>    - firecracker 0.15.0
>>    - QEMU 2.12.0
>>
>>
>> *Client machine:*
>>
>>    - similar to the one above, with wrk as the test client firing requests
>>    using 10 threads and 100 open connections for 30 seconds, in 3 consecutive
>>    runs (please see this test script -
>>    https://github.com/wkozaczuk/unikernels-v-containers/blob/master/test-restapi-with-wrk.sh
>>    ).
>>    - wrk by default uses Keep-Alive for HTTP connections, so TCP
>>    handshake overhead is minimal
>>
>> The host and client machines were connected directly to a 1 Gbit Ethernet
>> switch, and the host exposed the guest IP using a bridged TAP NIC (please
>> see the script used -
>> https://raw.githubusercontent.com/cloudius-systems/osv/master/scripts/setup-external-bridge.sh
>> ).
>>
>> You can find the scripts to start the applications on OSv and Docker here -
>> https://github.com/wkozaczuk/unikernels-v-containers (run* scripts).
>> Please note the --cpu-set parameter used in the docker script to limit the
>> number of CPUs.
>>
>> You can find detailed results under
>> https://github.com/wkozaczuk/unikernels-v-containers/tree/master/test_results/remote
>> .
>>
>> Here are just the requests-per-second numbers (full example -
>> https://raw.githubusercontent.com/wkozaczuk/unikernels-v-containers/master/test_results/remote/docker/rust_docker_4_cpu.wrk
>> )
>>
>> OSv on QEMU
>> *Golang*
>> *1 CPU*
>> Requests/sec:  24313.06
>> Requests/sec:  23874.74
>> Requests/sec:  23300.26
>>
>> *2 CPUs*
>> Requests/sec:  37089.26
>> Requests/sec:  35475.22
>> Requests/sec:  33581.87
>>
>> *4 CPUs*
>> Requests/sec:  42747.11
>> Requests/sec:  43057.99
>> Requests/sec:  42346.27
>>
>> *Java*
>> *1 CPU*
>> Requests/sec:  41049.41
>> Requests/sec:  43622.81
>> Requests/sec:  44777.60
>> *2 CPUs*
>> Requests/sec:  46245.95
>> Requests/sec:  45746.48
>> Requests/sec:  46224.42
>> *4 CPUs*
>> Requests/sec:  48128.33
>> Requests/sec:  45467.53
>> Requests/sec:  45776.45
>>
>> *Rust*
>>
>> *1 CPU*
>> Requests/sec:  43455.34
>> Requests/sec:  43927.73
>> Requests/sec:  41100.07
>>
>> *2 CPUs*
>> Requests/sec:  49120.31
>> Requests/sec:  49298.28
>> Requests/sec:  48076.98
>> *4 CPUs*
>> Requests/sec:  51477.57
>> Requests/sec:  51587.92
>> Requests/sec:  49118.68
>>
>> OSv on firecracker
>> *Golang*
>>
>> *1 cpu*
>> Requests/sec:  16721.56
>> Requests/sec:  16422.33
>> Requests/sec:  16540.24
>>
>> *2 cpus*
>> Requests/sec:  28538.35
>> Requests/sec:  26676.68
>> Requests/sec:  28100.00
>>
>> *4 cpus*
>> Requests/sec:  36448.57
>> Requests/sec:  33808.45
>> Requests/sec:  34383.20
>>
>>
>> *Java*
>> *1 cpu*
>> Requests/sec:  20191.95
>> Requests/sec:  21384.60
>> Requests/sec:  21705.82
>>
>> *2 cpus*
>> Requests/sec:  40876.17
>> Requests/sec:  40625.69
>> Requests/sec:  43766.45
>> *4 cpus*
>> Requests/sec:  46336.07
>> Requests/sec:  45933.35
>> Requests/sec:  45467.22
>>
>>
>> *Rust*
>> *1 cpu*
>> Requests/sec:  23604.27
>> Requests/sec:  23379.86
>> Requests/sec:  23477.19
>>
>> *2 cpus*
>> Requests/sec:  46973.84
>> Requests/sec:  46590.41
>> Requests/sec:  46128.15
>>
>> *4 cpus*
>> Requests/sec:  49491.98
>> Requests/sec:  50255.20
>> Requests/sec:  50183.11
>>
>> Linux on firecracker
>> *Golang*
>>
>> *1 CPU*
>> Requests/sec:  14498.02
>> Requests/sec:  14373.21
>> Requests/sec:  14213.61
>>
>> *2 CPU*
>> Requests/sec:  28201.27
>> Requests/sec:  28600.92
>> Requests/sec:  28558.33
>>
>> *4 CPU*
>> Requests/sec:  48983.83
>> Requests/sec:  47590.97
>> Requests/sec:  45758.82
>>
>> *Java*
>>
>> *1 CPU*
>> Requests/sec:  18217.58
>> Requests/sec:  17709.30
>> Requests/sec:  19829.01
>>
>> *2 CPU*
>> Requests/sec:  33188.75
>> Requests/sec:  33233.55
>> Requests/sec:  36951.05
>>
>> *4 CPU*
>> Requests/sec:  47718.13
>> Requests/sec:  46456.51
>> Requests/sec:  48408.99
>>
>> *Rust*
>> Could not get the same Rust app built for Alpine Linux, which uses musl
>>
>> Docker
>> *Golang*
>>
>> *1 CPU*
>> Requests/sec:  24568.70
>> Requests/sec:  24621.82
>> Requests/sec:  24451.52
>>
>> *2 CPU*
>> Requests/sec:  49366.54
>> Requests/sec:  48510.87
>> Requests/sec:  43809.97
>>
>> *4 CPU*
>> Requests/sec:  53613.09
>> Requests/sec:  53033.38
>> Requests/sec:  51422.59
>>
>> *Java*
>>
>> *1 CPU*
>> Requests/sec:  40078.52
>> Requests/sec:  43850.54
>> Requests/sec:  44588.22
>>
>> *2 CPUs*
>> Requests/sec:  48792.39
>> Requests/sec:  51170.05
>> Requests/sec:  52033.04
>>
>> *4 CPUs*
>> Requests/sec:  51409.24
>> Requests/sec:  52756.73
>> Requests/sec:  47126.19
>>
>> *Rust*
>>
>> *1 CPU*
>> Requests/sec:  40220.04
>> Requests/sec:  44601.38
>> Requests/sec:  44419.06
>>
>> *2 CPUs*
>> Requests/sec:  53420.56
>> Requests/sec:  53490.33
>> Requests/sec:  53320.99
>>
>> *4 CPUs*
>> Requests/sec:  53892.23
>> Requests/sec:  52814.93
>> Requests/sec:  54050.13
>>
>> Full example (Rust 4 CPUs -
>> https://raw.githubusercontent.com/wkozaczuk/unikernels-v-containers/master/test_results/remote/docker/rust_docker_4_cpu.wrk
>> ):
>> [{"name":"Write
>> presentation","completed":false,"due":"2019-03-23T15:30:40.579556117+00:00"},{"name":"Host
>> meetup","completed":false,"due":"2019-03-23T15:30:40.579599959+00:00"},{"name":"Run
>> tests","completed":false,"due":"2019-03-23T15:30:40.579600610+00:00"},{"name":"Stand
>> in
>> traffic","completed":false,"due":"2019-03-23T15:30:40.579601081+00:00"},{"name":"Learn
>> Rust","completed":false,"due":"2019-03-23T15:30:40.579601548+00:00"}]-----------------------------------
>> Running 30s test @ http://192.168.1.73:8080/todos
>>   10 threads and 100 connections
>>   Thread Stats   Avg      Stdev     Max   +/- Stdev
>>     Latency     1.86ms    1.20ms  30.81ms   62.92%
>>     Req/Sec     5.42k   175.14     5.67k    87.71%
>>   1622198 requests in 30.10s, 841.55MB read
>> Requests/sec:  53892.23
>> Transfer/sec:     27.96MB
>> -----------------------------------
>> Running 30s test @ http://192.168.1.73:8080/todos
>>   10 threads and 100 connections
>>   Thread Stats   Avg      Stdev     Max   +/- Stdev
>>     Latency     1.90ms    1.19ms   8.98ms   58.18%
>>     Req/Sec     5.31k   324.18     5.66k    90.10%
>>   1589778 requests in 30.10s, 824.73MB read
>> Requests/sec:  52814.93
>> Transfer/sec:     27.40MB
>> -----------------------------------
>> Running 30s test @ http://192.168.1.73:8080/todos
>>   10 threads and 100 connections
>>   Thread Stats   Avg      Stdev     Max   +/- Stdev
>>     Latency     1.85ms    1.14ms   8.39ms   54.70%
>>     Req/Sec     5.44k   204.22     7.38k    92.12%
>>   1626902 requests in 30.10s, 843.99MB read
>> Requests/sec:  54050.13
>> Transfer/sec:     28.04MB
>>
>> I am also enclosing an example of an iperf run between the client and
>> server machines to illustrate the raw network bandwidth (BTW, I tested
>> against iperf running natively on the host as well as on OSv on QEMU and
>> firecracker, and got pretty much identical results, ~940 Mbits/sec - see
>> https://github.com/wkozaczuk/unikernels-v-containers/tree/master/test_results/remote
>> ).
>>
>> Connecting to host 192.168.1.102, port 5201
>> [  5] local 192.168.1.98 port 65179 connected to 192.168.1.102 port 5201
>> [ ID] Interval           Transfer     Bitrate
>> [  5]   0.00-1.00   sec   111 MBytes   930 Mbits/sec
>> [  5]   1.00-2.00   sec   111 MBytes   932 Mbits/sec
>> [  5]   2.00-3.00   sec   112 MBytes   938 Mbits/sec
>> [  5]   3.00-4.00   sec   112 MBytes   939 Mbits/sec
>> [  5]   4.00-5.00   sec   112 MBytes   940 Mbits/sec
>> [  5]   5.00-6.00   sec   111 MBytes   933 Mbits/sec
>> [  5]   6.00-7.00   sec   112 MBytes   940 Mbits/sec
>> [  5]   7.00-8.00   sec   112 MBytes   940 Mbits/sec
>> [  5]   8.00-9.00   sec   112 MBytes   941 Mbits/sec
>> [  5]   9.00-10.00  sec   112 MBytes   941 Mbits/sec
>> [  5]  10.00-11.00  sec   112 MBytes   939 Mbits/sec
>> [  5]  11.00-12.00  sec   112 MBytes   941 Mbits/sec
>> [  5]  12.00-13.00  sec   112 MBytes   941 Mbits/sec
>> [  5]  13.00-14.00  sec   112 MBytes   942 Mbits/sec
>> [  5]  14.00-15.00  sec   112 MBytes   941 Mbits/sec
>> [  5]  15.00-16.00  sec   111 MBytes   927 Mbits/sec
>> [  5]  16.00-17.00  sec   112 MBytes   941 Mbits/sec
>> [  5]  17.00-18.00  sec   112 MBytes   942 Mbits/sec
>> [  5]  18.00-19.00  sec   112 MBytes   941 Mbits/sec
>> [  5]  19.00-20.00  sec   112 MBytes   941 Mbits/sec
>> [  5]  20.00-21.00  sec   112 MBytes   936 Mbits/sec
>> [  5]  21.00-22.00  sec   112 MBytes   940 Mbits/sec
>> [  5]  22.00-23.00  sec   112 MBytes   941 Mbits/sec
>> [  5]  23.00-24.00  sec   112 MBytes   941 Mbits/sec
>> [  5]  24.00-25.00  sec   112 MBytes   941 Mbits/sec
>> [  5]  25.00-26.00  sec   112 MBytes   941 Mbits/sec
>> [  5]  26.00-27.00  sec   112 MBytes   940 Mbits/sec
>> [  5]  27.00-28.00  sec   112 MBytes   941 Mbits/sec
>> [  5]  28.00-29.00  sec   112 MBytes   940 Mbits/sec
>> [  5]  29.00-30.00  sec   112 MBytes   941 Mbits/sec
>> - - - - - - - - - - - - - - - - - - - - - - - - -
>> [ ID] Interval           Transfer     Bitrate
>> [  5]   0.00-30.00  sec  3.28 GBytes   939 Mbits/sec    sender
>> [  5]   0.00-30.00  sec  3.28 GBytes   939 Mbits/sec    receiver
>>
>> iperf Done.
>>
>>
>> Observations/Conclusions
>>
>>    - OSv fares a little better on QEMU/KVM than on firecracker; the
>>    difference varies from ~5% to ~20% (Golang). Also please note the vast
>>    difference between the 1-cpu results on firecracker and QEMU
>>    (hyperthreading is handled differently). On QEMU there is only a small
>>    bump from 1 to 2 to 4 cpus except for Golang; on firecracker there is an
>>    almost ~90-100% bump from 1 to 2 cpus.
>>       - To that end I have opened a firecracker issue -
>>       https://github.com/firecracker-microvm/firecracker/issues/1034.
>>    - When you compare OSv on firecracker vs Linux on firecracker
>>    (comparing OSv on QEMU would I guess be unfair) you can see that:
>>       - The Golang app on OSv was ~15% better than on Linux with 1 cpu,
>>       almost identical with 2 cpus, and ~30% slower than on Linux with 4 CPUs
>>       (I did check that the Golang runtime properly detects the number of
>>       cpus - see the small check sketched after this list)
>>       - The Java app on OSv was ~5% faster with 1 CPU, ~20% faster with 2
>>       CPUs and slightly slower with 4 CPUs
>>       - Could not run the Rust app on Linux because the guest was an Alpine
>>       distribution built with musl and I did not have time to get Rust to
>>       build properly for that scenario
>>    - When you compare OSv on QEMU/KVM vs Docker you can see that:
>>       - All apps running with a single CPU fare almost the same, with OSv
>>       sometimes being a little faster
>>       - The Java and Rust apps performed only a little better (2-10%) on
>>       Docker than on OSv
>>       - The Golang app scaled well with the number of CPUs on OSv but still
>>       performed much worse (20-30%) on OSv than on Docker with 2 and 4 cpus
>>    - There seems to be a bottleneck somewhere around 40-50K requests per
>>    second. Looking at one result, the raw network rate reported was around
>>    26-28MB per second. Given that each HTTP request requires sending both a
>>    request and a response, possibly that is the maximum the network - the
>>    combination of the Ethernet switch and the server and client machines -
>>    can handle?
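>>
>> (Regarding the Golang CPU-detection check mentioned above: it boils down
>> to logging something like the following from inside the guest - a trivial
>> sketch, not the exact code used:)
>>
>> package main
>>
>> import (
>>     "fmt"
>>     "runtime"
>> )
>>
>> func main() {
>>     // NumCPU should report the number of vCPUs the guest was started
>>     // with; GOMAXPROCS(0) reads the current setting without changing it.
>>     fmt.Println("NumCPU:", runtime.NumCPU())
>>     fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
>> }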
>>
>>
>> Questions
>>
>>    - Are there any flaws in this test setup?
>>    - Why does OSv not scale in some scenarios - especially when bumping
>>    from 2 to 4 cpus? Networking bottleneck? Scheduler? Locks?
>>    - Could we further optimize OSv running with a single CPU (skip the
>>    global cross-CPU page allocator, etc.)?
>>
>>
>> To get even more insight, I also compared how OSv on QEMU would fare
>> against the same apps running in Docker, with wrk running on the host and
>> firing requests locally. You can find the results under
>> https://github.com/wkozaczuk/unikernels-v-containers/tree/master/test_results/host
>> .
>>
>> OSv on QEMU
>> *Golang*
>>
>> *1 CPU*
>> Requests/sec:  25188.60
>> Requests/sec:  24664.43
>> Requests/sec:  23935.77
>> *2 CPUs*
>> Requests/sec:  37118.95
>> Requests/sec:  37108.96
>> Requests/sec:  35997.58
>>
>> *4 CPUs*
>> Requests/sec:  49987.20
>> Requests/sec:  48710.74
>> Requests/sec:  44789.96
>>
>>
>> *Java*
>> *1 CPU*
>> Requests/sec:  43648.02
>> Requests/sec:  45457.98
>> Requests/sec:  41818.13
>>
>> *2 CPUs*
>> Requests/sec:  76224.39
>> Requests/sec:  75734.63
>> Requests/sec:  70597.35
>>
>> *4 CPUs*
>> Requests/sec:  80543.30
>> Requests/sec:  75187.46
>> Requests/sec:  72986.93
>>
>>
>> *Rust*
>> *1 CPU*
>> Requests/sec:  42392.75
>> Requests/sec:  39679.21
>> Requests/sec:  37871.49
>>
>> *2 CPUs*
>> Requests/sec:  82484.67
>> Requests/sec:  83272.65
>> Requests/sec:  71671.13
>>
>> *4 CPUs*
>> Requests/sec:  95910.23
>> Requests/sec:  86811.76
>> Requests/sec:  83213.93
>>
>>
>> Docker
>>
>> *Golang*
>> *1 CPU*
>> Requests/sec:  24191.63
>> Requests/sec:  23574.89
>> Requests/sec:  23716.33
>>
>> *2 CPUs*
>> Requests/sec:  34889.01
>> Requests/sec:  34487.01
>> Requests/sec:  34468.03
>>
>> *4 CPUs*
>> Requests/sec:  48850.24
>> Requests/sec:  48690.09
>> Requests/sec:  48356.66
>>
>>
>> *Java*
>> *1 CPU*
>> Requests/sec:  32267.09
>> Requests/sec:  34670.41
>> Requests/sec:  34828.68
>>
>> *2 CPUs*
>> Requests/sec:  47533.94
>> Requests/sec:  50734.05
>> Requests/sec:  50203.98
>>
>> *4 CPUs*
>> Requests/sec:  69644.61
>> Requests/sec:  72704.40
>> Requests/sec:  70805.84
>>
>>
>> *Rust*
>> *1 CPU*
>> Requests/sec:  37061.52
>> Requests/sec:  36637.62
>> Requests/sec:  33154.57
>>
>> *2 CPUs*
>> Requests/sec:  51743.94
>> Requests/sec:  51476.78
>> Requests/sec:  50934.27
>>
>> *4 CPUs*
>> Requests/sec:  75125.41
>> Requests/sec:  74051.27
>> Requests/sec:  74434.78
>>
>>    - Does this test even make sense?
>>    - As you can see, OSv outperforms Docker in this scenario to varying
>>    degrees, by 5-20%. Can anybody explain why? Is it because in this case
>>    both wrk and the apps are on the same machine and fewer context switches
>>    between kernel and user mode work in OSv's favor? Does it mean that we
>>    could benefit from a setup with a load balancer (for example haproxy or
>>    squid) running on the same host in user mode and forwarding to
>>    single-CPU OSv instances, vs a single OSv instance with multiple CPUs
>>    (see the sketch after this list)?
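>>
>> To make that last idea concrete, the topology I have in mind is roughly
>> the following (a Go sketch of a round-robin reverse proxy; in practice
>> haproxy or similar would play this role, and the backend addresses are
>> made up):
>>
>> package main
>>
>> import (
>>     "log"
>>     "net/http"
>>     "net/http/httputil"
>>     "net/url"
>>     "sync/atomic"
>> )
>>
>> func main() {
>>     // Hypothetical addresses of several single-CPU OSv instances.
>>     backends := []*url.URL{
>>         {Scheme: "http", Host: "192.168.1.201:8080"},
>>         {Scheme: "http", Host: "192.168.1.202:8080"},
>>     }
>>     var next uint64
>>     proxy := &httputil.ReverseProxy{
>>         Director: func(r *http.Request) {
>>             // Pick the next backend in round-robin order.
>>             b := backends[atomic.AddUint64(&next, 1)%uint64(len(backends))]
>>             r.URL.Scheme = b.Scheme
>>             r.URL.Host = b.Host
>>         },
>>     }
>>     log.Fatal(http.ListenAndServe(":8080", proxy))
>> }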
>>
>> Looking forward to hearing what others think.
>>
>> Waldek
>>
>>
>>
>>