Hello Waldek,

The experiments are very interesting. I showed something similar at OSSummit'18 (see https://github.com/torokernel/papers/blob/master/OSSummit18.pdf). What I do not understand from your conclusions is why you expect OSv to scale with the number of cores. Maybe I did not understand something.

Matias
On Tue, Mar 26, 2019 at 11:29 PM, Waldek Kozaczuk (<[email protected]>) wrote:

> Last week I spent some time investigating OSv performance and comparing it to
> Docker and Linux guests. To that end I adapted Tom Goethals'
> "unikernels-v-containers" repo and extended it with 2 new apps (Rust and
> Node.js) and new scripts to build and deploy OSv apps on QEMU/KVM -
> https://github.com/wkozaczuk/unikernels-v-containers. So, as you can see, my
> focus was on OSv on QEMU/KVM and firecracker vs Linux on firecracker vs
> Docker, whereas Tom's paper was comparing OSv on Xen vs Docker (details of
> the discussion around it and a link to the paper can be found here -
> https://groups.google.com/forum/#!topic/osv-dev/lhkqFfzbHwk).
>
> Specifically, I wanted to compare networking performance in terms of the
> number of REST API requests per second processed by a typical microservice
> app implemented in Rust (built using hyper), Golang and Java (built using
> vertx.io) and running on the following:
>
> - OSv on QEMU/KVM
> - OSv on firecracker
> - Docker container
> - Linux on firecracker
>
> Each app in essence implements a simple todo REST API returning a JSON
> payload 100-200 characters long (for example, see the Java one -
> https://github.com/wkozaczuk/unikernels-v-containers/blob/master/restapi/java-osv/src/main/java/rest/SimpleREST.java).
> The source code of all the apps is under this subtree -
> https://github.com/wkozaczuk/unikernels-v-containers/blob/master/restapi.
> One thing to note is that each request always returns the same payload (I
> wonder if that may cause the response to get cached and affect the results).
>
> The test setup looked like this:
>
> *Host:*
>
> - MacBook Pro with a 4-core Intel i7 CPU with hyperthreading (8 CPUs
>   reported by lscpu) and 16 GB of RAM, running Ubuntu 18.10
> - firecracker 0.15.0
> - QEMU 2.12.0
>
> *Client machine:*
>
> - similar to the one above, with wrk as the test client firing requests
>   using 10 threads and 100 open connections for 30 seconds, in 3 series run
>   one by one (please see the test script -
>   https://github.com/wkozaczuk/unikernels-v-containers/blob/master/test-restapi-with-wrk.sh)
> - wrk uses HTTP keep-alive by default, so TCP handshake overhead is minimal
>
> The host and the client machine were connected directly to a 1 Gbit Ethernet
> switch, and the host exposed the guest IP using a bridged TAP NIC (please
> see the script used -
> https://raw.githubusercontent.com/cloudius-systems/osv/master/scripts/setup-external-bridge.sh).
>
> You can find scripts to start the applications on OSv and Docker here -
> https://github.com/wkozaczuk/unikernels-v-containers (the run* scripts).
> Please note the --cpuset-cpus parameter used in the Docker script to limit
> the number of CPUs.
>
> You can find detailed results under
> https://github.com/wkozaczuk/unikernels-v-containers/tree/master/test_results/remote.
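>
> Just to give a feel for how simple the endpoint under test is, the Go
> variant boils down to roughly the following (a simplified sketch rather than
> the exact code in the repo - the handler and field names here are
> illustrative):
>
>   package main
>
>   import (
>       "encoding/json"
>       "log"
>       "net/http"
>       "time"
>   )
>
>   // Todo mirrors the kind of item the apps return.
>   type Todo struct {
>       Name      string    `json:"name"`
>       Completed bool      `json:"completed"`
>       Due       time.Time `json:"due"`
>   }
>
>   func main() {
>       http.HandleFunc("/todos", func(w http.ResponseWriter, r *http.Request) {
>           // The same small list on every request; only the timestamps change.
>           todos := []Todo{
>               {Name: "Write presentation", Due: time.Now()},
>               {Name: "Host meetup", Due: time.Now()},
>               {Name: "Run tests", Due: time.Now()},
>           }
>           w.Header().Set("Content-Type", "application/json")
>           json.NewEncoder(w).Encode(todos)
>       })
>       log.Fatal(http.ListenAndServe(":8080", nil))
>   }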
>
> Here are just the requests-per-second numbers (full example -
> https://raw.githubusercontent.com/wkozaczuk/unikernels-v-containers/master/test_results/remote/docker/rust_docker_4_cpu.wrk):
>
> OSv on QEMU (requests/sec, 3 runs each)
>
>   Golang  1 CPU:  24313.06 / 23874.74 / 23300.26
>           2 CPUs: 37089.26 / 35475.22 / 33581.87
>           4 CPUs: 42747.11 / 43057.99 / 42346.27
>
>   Java    1 CPU:  41049.41 / 43622.81 / 44777.60
>           2 CPUs: 46245.95 / 45746.48 / 46224.42
>           4 CPUs: 48128.33 / 45467.53 / 45776.45
>
>   Rust    1 CPU:  43455.34 / 43927.73 / 41100.07
>           2 CPUs: 49120.31 / 49298.28 / 48076.98
>           4 CPUs: 51477.57 / 51587.92 / 49118.68
>
> OSv on firecracker (requests/sec, 3 runs each)
>
>   Golang  1 CPU:  16721.56 / 16422.33 / 16540.24
>           2 CPUs: 28538.35 / 26676.68 / 28100.00
>           4 CPUs: 36448.57 / 33808.45 / 34383.20
>
>   Java    1 CPU:  20191.95 / 21384.60 / 21705.82
>           2 CPUs: 40876.17 / 40625.69 / 43766.45
>           4 CPUs: 46336.07 / 45933.35 / 45467.22
>
>   Rust    1 CPU:  23604.27 / 23379.86 / 23477.19
>           2 CPUs: 46973.84 / 46590.41 / 46128.15
>           4 CPUs: 49491.98 / 50255.20 / 50183.11
>
> Linux on firecracker (requests/sec, 3 runs each)
>
>   Golang  1 CPU:  14498.02 / 14373.21 / 14213.61
>           2 CPUs: 28201.27 / 28600.92 / 28558.33
>           4 CPUs: 48983.83 / 47590.97 / 45758.82
>
>   Java    1 CPU:  18217.58 / 17709.30 / 19829.01
>           2 CPUs: 33188.75 / 33233.55 / 36951.05
>           4 CPUs: 47718.13 / 46456.51 / 48408.99
>
>   Rust    (could not get the same Rust app to run on the Alpine Linux
>           image, which uses musl)
>
> Docker (requests/sec, 3 runs each)
>
>   Golang  1 CPU:  24568.70 / 24621.82 / 24451.52
>           2 CPUs: 49366.54 / 48510.87 / 43809.97
>           4 CPUs: 53613.09 / 53033.38 / 51422.59
>
>   Java    1 CPU:  40078.52 / 43850.54 / 44588.22
>           2 CPUs: 48792.39 / 51170.05 / 52033.04
>           4 CPUs: 51409.24 / 52756.73 / 47126.19
>
>   Rust    1 CPU:  40220.04 / 44601.38 / 44419.06
>           2 CPUs: 53420.56 / 53490.33 / 53320.99
>           4 CPUs: 53892.23 / 52814.93 / 54050.13
>
> Full example (Rust on Docker, 4 CPUs -
> https://raw.githubusercontent.com/wkozaczuk/unikernels-v-containers/master/test_results/remote/docker/rust_docker_4_cpu.wrk):
presentation","completed":false,"due":"2019-03-23T15:30:40.579556117+00:00"},{"name":"Host > meetup","completed":false,"due":"2019-03-23T15:30:40.579599959+00:00"},{"name":"Run > tests","completed":false,"due":"2019-03-23T15:30:40.579600610+00:00"},{"name":"Stand > in > traffic","completed":false,"due":"2019-03-23T15:30:40.579601081+00:00"},{"name":"Learn > Rust","completed":false,"due":"2019-03-23T15:30:40.579601548+00:00"}]----------------------------------- > Running 30s test @ http://192.168.1.73:8080/todos > 10 threads and 100 connections > Thread Stats Avg Stdev Max +/- Stdev > Latency 1.86ms 1.20ms 30.81ms 62.92% > Req/Sec 5.42k 175.14 5.67k 87.71% > 1622198 requests in 30.10s, 841.55MB read > Requests/sec: 53892.23 > Transfer/sec: 27.96MB > ----------------------------------- > Running 30s test @ http://192.168.1.73:8080/todos > 10 threads and 100 connections > Thread Stats Avg Stdev Max +/- Stdev > Latency 1.90ms 1.19ms 8.98ms 58.18% > Req/Sec 5.31k 324.18 5.66k 90.10% > 1589778 requests in 30.10s, 824.73MB read > Requests/sec: 52814.93 > Transfer/sec: 27.40MB > ----------------------------------- > Running 30s test @ http://192.168.1.73:8080/todos > 10 threads and 100 connections > Thread Stats Avg Stdev Max +/- Stdev > Latency 1.85ms 1.14ms 8.39ms 54.70% > Req/Sec 5.44k 204.22 7.38k 92.12% > 1626902 requests in 30.10s, 843.99MB read > Requests/sec: 54050.13 > Transfer/sec: 28.04MB > > I am also enclosing an example of iperf run between client and server > machine to illustrate type of raw network bandwidth (BTW I test against > iperf running on host natively and on OSv on qemu and firecracker I got > pretty much identical results ~ 940 MBits/sec - see > https://github.com/wkozaczuk/unikernels-v-containers/tree/master/test_results/remote > ). 
>
> Connecting to host 192.168.1.102, port 5201
> [  5] local 192.168.1.98 port 65179 connected to 192.168.1.102 port 5201
> [ ID] Interval           Transfer     Bitrate
> [  5]   0.00-1.00   sec   111 MBytes   930 Mbits/sec
> [  5]   1.00-2.00   sec   111 MBytes   932 Mbits/sec
> [  5]   2.00-3.00   sec   112 MBytes   938 Mbits/sec
> [  5]   3.00-4.00   sec   112 MBytes   939 Mbits/sec
> [  5]   4.00-5.00   sec   112 MBytes   940 Mbits/sec
> [  5]   5.00-6.00   sec   111 MBytes   933 Mbits/sec
> [  5]   6.00-7.00   sec   112 MBytes   940 Mbits/sec
> [  5]   7.00-8.00   sec   112 MBytes   940 Mbits/sec
> [  5]   8.00-9.00   sec   112 MBytes   941 Mbits/sec
> [  5]   9.00-10.00  sec   112 MBytes   941 Mbits/sec
> [  5]  10.00-11.00  sec   112 MBytes   939 Mbits/sec
> [  5]  11.00-12.00  sec   112 MBytes   941 Mbits/sec
> [  5]  12.00-13.00  sec   112 MBytes   941 Mbits/sec
> [  5]  13.00-14.00  sec   112 MBytes   942 Mbits/sec
> [  5]  14.00-15.00  sec   112 MBytes   941 Mbits/sec
> [  5]  15.00-16.00  sec   111 MBytes   927 Mbits/sec
> [  5]  16.00-17.00  sec   112 MBytes   941 Mbits/sec
> [  5]  17.00-18.00  sec   112 MBytes   942 Mbits/sec
> [  5]  18.00-19.00  sec   112 MBytes   941 Mbits/sec
> [  5]  19.00-20.00  sec   112 MBytes   941 Mbits/sec
> [  5]  20.00-21.00  sec   112 MBytes   936 Mbits/sec
> [  5]  21.00-22.00  sec   112 MBytes   940 Mbits/sec
> [  5]  22.00-23.00  sec   112 MBytes   941 Mbits/sec
> [  5]  23.00-24.00  sec   112 MBytes   941 Mbits/sec
> [  5]  24.00-25.00  sec   112 MBytes   941 Mbits/sec
> [  5]  25.00-26.00  sec   112 MBytes   941 Mbits/sec
> [  5]  26.00-27.00  sec   112 MBytes   940 Mbits/sec
> [  5]  27.00-28.00  sec   112 MBytes   941 Mbits/sec
> [  5]  28.00-29.00  sec   112 MBytes   940 Mbits/sec
> [  5]  29.00-30.00  sec   112 MBytes   941 Mbits/sec
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate
> [  5]   0.00-30.00  sec  3.28 GBytes   939 Mbits/sec   sender
> [  5]   0.00-30.00  sec  3.28 GBytes   939 Mbits/sec   receiver
>
> iperf Done.
>
> Observations/Conclusions
>
> - OSv fares a little better on QEMU/KVM than on firecracker, by anywhere from
>   ~5% to ~20% (Golang). Also please note the vast difference between the
>   1-CPU results on firecracker and on QEMU (hyperthreading is handled
>   differently). On QEMU there is only a small bump from 1 to 2 to 4 CPUs
>   except for Golang; on firecracker there is an almost ~90-100% bump from
>   1 to 2 CPUs.
> - To that end I have opened a firecracker issue -
>   https://github.com/firecracker-microvm/firecracker/issues/1034.
> - When you compare OSv on firecracker vs Linux on firecracker (comparing OSv
>   on QEMU would, I guess, be unfair) you can see that:
>   - the Golang app on OSv was ~15% faster than on Linux with 1 CPU, almost
>     identical with 2 CPUs, while with 4 CPUs the app was ~30% faster on
>     Linux (I did check that the Golang runtime properly detects the number
>     of CPUs)
>   - the Java app on OSv was ~5% faster with 1 CPU, ~20% faster with 2 CPUs,
>     and slightly slower with 4 CPUs
>   - I could not run the Rust app on Linux because the guest was an Alpine
>     distribution built with musl, and I did not have time to get Rust to
>     build properly for that scenario
> - When you compare OSv on QEMU/KVM vs Docker you can see that:
>   - all apps running with a single CPU fare almost the same, with OSv
>     sometimes being a little faster
>   - the Java and Rust apps performed only a little better (2-10%) on Docker
>     than on OSv
>   - the Golang app on OSv scaled with the number of CPUs but performed much
>     worse than on Docker (20-30%) with 2 and 4 CPUs
> - There seems to be a bottleneck somewhere around 40-50K requests per
>   second. Looking at one result, the raw network rate reported was around
>   26-28 MB per second. Given that each HTTP request requires sending both a
>   request and a response, possibly that is the maximum the network - the
>   combination of the Ethernet switch and the server and client machines -
>   can handle? (See the rough check below.)
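>
> A rough back-of-the-envelope check on that, using the Rust/Docker 4-CPU run
> above (and assuming wrk reports MB as 2^20 bytes):
>
>   841.55 MB / 1622198 requests  ~ 544 bytes per response (payload + HTTP headers)
>   54050 req/s * 544 bytes       ~ 28 MB/s  ~ 235 Mbit/s of response traffic
>   1 Gbit/s link                 ~ 119 MB/s
>
> So the responses alone use only about a quarter of the link, and the
> requests are much smaller still. That would suggest the ~50K req/s ceiling
> is more likely a per-request/per-packet cost (the client, the virtio/TAP
> path, interrupts, locking) than raw bandwidth saturation, though I have not
> verified this.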
>
> Questions
>
> - Are there any flaws in this test setup?
> - Why does OSv not scale in some scenarios - especially when bumping from 2
>   to 4 CPUs? Networking bottleneck? Scheduler? Locks?
> - Could we further optimize OSv running with a single CPU (skip the global
>   cross-CPU page allocator, etc.)?
>
> To get even more insight, I also compared how OSv on QEMU would fare against
> the same apps running in Docker, with wrk running on the host and firing
> requests locally. You can find the results under
> https://github.com/wkozaczuk/unikernels-v-containers/tree/master/test_results/host.
>
> OSv on QEMU (requests/sec, 3 runs each)
>
>   Golang  1 CPU:  25188.60 / 24664.43 / 23935.77
>           2 CPUs: 37118.95 / 37108.96 / 35997.58
>           4 CPUs: 49987.20 / 48710.74 / 44789.96
>
>   Java    1 CPU:  43648.02 / 45457.98 / 41818.13
>           2 CPUs: 76224.39 / 75734.63 / 70597.35
>           4 CPUs: 80543.30 / 75187.46 / 72986.93
>
>   Rust    1 CPU:  42392.75 / 39679.21 / 37871.49
>           2 CPUs: 82484.67 / 83272.65 / 71671.13
>           4 CPUs: 95910.23 / 86811.76 / 83213.93
>
> Docker (requests/sec, 3 runs each)
>
>   Golang  1 CPU:  24191.63 / 23574.89 / 23716.33
>           2 CPUs: 34889.01 / 34487.01 / 34468.03
>           4 CPUs: 48850.24 / 48690.09 / 48356.66
>
>   Java    1 CPU:  32267.09 / 34670.41 / 34828.68
>           2 CPUs: 47533.94 / 50734.05 / 50203.98
>           4 CPUs: 69644.61 / 72704.40 / 70805.84
>
>   Rust    1 CPU:  37061.52 / 36637.62 / 33154.57
>           2 CPUs: 51743.94 / 51476.78 / 50934.27
>           4 CPUs: 75125.41 / 74051.27 / 74434.78
>
> - Does this test even make sense?
> - As you can see, OSv outperforms Docker in this scenario to varying degrees,
>   by 5-20%. Can anybody explain why? Is it because in this case both wrk and
>   the apps are on the same machine and the number of context switches between
>   kernel and user mode is lower in OSv's favor? Does it mean that we could
>   benefit from a setup with a load balancer (for example haproxy or squid)
>   running on the same host in user mode and forwarding to single-CPU OSv
>   instances, vs a single OSv with multiple CPUs? (A rough sketch of what I
>   mean follows below.)
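>
> Something along those lines - not haproxy itself, just a rough Go stand-in
> to illustrate the idea of a host-side, user-mode proxy fanning requests out
> to several single-CPU OSv guests (the backend addresses below are made up
> for illustration):
>
>   package main
>
>   import (
>       "log"
>       "net/http"
>       "net/http/httputil"
>       "net/url"
>       "sync/atomic"
>   )
>
>   func main() {
>       // Hypothetical bridged-TAP addresses of two single-CPU OSv instances.
>       backends := []string{
>           "http://192.168.1.73:8080",
>           "http://192.168.1.74:8080",
>       }
>
>       // One reverse proxy per backend.
>       proxies := make([]*httputil.ReverseProxy, len(backends))
>       for i, b := range backends {
>           u, err := url.Parse(b)
>           if err != nil {
>               log.Fatal(err)
>           }
>           proxies[i] = httputil.NewSingleHostReverseProxy(u)
>       }
>
>       // Round-robin each incoming request across the OSv guests.
>       var next uint64
>       http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
>           i := atomic.AddUint64(&next, 1) % uint64(len(proxies))
>           proxies[i].ServeHTTP(w, r)
>       })
>
>       log.Fatal(http.ListenAndServe(":8080", nil))
>   }
>
> haproxy or squid would of course do this with less overhead; the point is
> only that the proxy stays in host user space while each guest keeps a single
> vCPU.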
>
> Looking forward to hearing what others think.
>
> Waldek

-- 
You received this message because you are subscribed to the Google Groups "OSv Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/d/optout.
