Thanks for the article. I have definitely heard about Toro. I have not had time to thoroughly read all the slides, but it looks like a unikernel, though I did not see it advertised as such. Can it run unmodified Linux executables, like the JVM?
On Wednesday, March 27, 2019 at 6:59:30 AM UTC-4, Matias Vara wrote:

Hello Waldek,

The experiments are very interesting. I showed something similar at OSSummit'18 (see https://github.com/torokernel/papers/blob/master/OSSummit18.pdf). What I do not understand from your conclusions is why you expect OSv to scale with the number of cores. Maybe I did not understand something.

Matias

On Tue, Mar 26, 2019 at 23:29, Waldek Kozaczuk ([email protected]) wrote:

Last week I spent some time investigating OSv performance and comparing it to Docker and Linux guests. To that end I adapted the "unikernels-v-containers" repo by Tom Goethals and extended it with 2 new apps (Rust and Node.js) and new scripts to build and deploy OSv apps on QEMU/KVM - https://github.com/wkozaczuk/unikernels-v-containers. So, as you can see, my focus was on OSv on QEMU/KVM and firecracker vs Linux on firecracker vs Docker, whereas Tom's paper compared OSv on Xen vs Docker (you can find details of the discussion around it, and a link to the paper, here - https://groups.google.com/forum/#!topic/osv-dev/lhkqFfzbHwk).

Specifically, I wanted to compare networking performance in terms of the number of REST API requests per second processed by a typical microservice app implemented in Rust (built using hyper), Golang, and Java (built using vertx.io), running on the following:

- OSv on QEMU/KVM
- OSv on firecracker
- Docker container
- Linux on firecracker

Each app in essence implements a simple todo REST API returning a JSON payload 100-200 characters long (for example, see the Java one - https://github.com/wkozaczuk/unikernels-v-containers/blob/master/restapi/java-osv/src/main/java/rest/SimpleREST.java). The source code of all the apps is under this subtree - https://github.com/wkozaczuk/unikernels-v-containers/blob/master/restapi. One thing to note is that each request always returns the same payload (I wonder if that may cause the response to get cached and affect the results).

The test setup looked like this:

Host:
- MacBook Pro with an Intel i7 4-core CPU with hyperthreading (8 CPUs reported by lscpu) and 16GB of RAM, running Ubuntu 18.10
- firecracker 0.15.0
- QEMU 2.12.0

Client machine:
- similar to the one above, with wrk as the test client firing requests using 10 threads and 100 open connections for 30 seconds, in 3 series one by one (please see this test script - https://github.com/wkozaczuk/unikernels-v-containers/blob/master/test-restapi-with-wrk.sh)
- wrk by default uses Keep-Alive for HTTP connections, so TCP handshake overhead is minimal

The host and client machine were connected directly to a 1 GBit Ethernet switch, and the host exposed the guest IP using a bridged TAP NIC (please see the script used - https://raw.githubusercontent.com/cloudius-systems/osv/master/scripts/setup-external-bridge.sh).

You can find scripts to start the applications on OSv and Docker here - https://github.com/wkozaczuk/unikernels-v-containers (the run* scripts). Please note the --cpuset-cpus parameter used in the docker script to limit the number of CPUs.

You can find detailed results under https://github.com/wkozaczuk/unikernels-v-containers/tree/master/test_results/remote.
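For context, each of the benchmarked apps boils down to a handler like the following (a minimal Go sketch of the idea, not the actual code from the repo; the field names just mirror the JSON payload shown further down):

package main

import (
	"encoding/json"
	"log"
	"net/http"
	"time"
)

// Todo mirrors the JSON items returned by the benchmarked apps.
type Todo struct {
	Name      string    `json:"name"`
	Completed bool      `json:"completed"`
	Due       time.Time `json:"due"`
}

func main() {
	// The same static list is returned on every request, which is why
	// caching somewhere along the path could in principle skew results.
	todos := []Todo{
		{Name: "Write presentation", Due: time.Now().Add(24 * time.Hour)},
		{Name: "Host meetup", Due: time.Now().Add(48 * time.Hour)},
	}
	http.HandleFunc("/todos", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(todos)
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}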
Here are just the requests-per-second numbers; each row shows the 3 consecutive runs (full example - https://raw.githubusercontent.com/wkozaczuk/unikernels-v-containers/master/test_results/remote/docker/rust_docker_4_cpu.wrk):

OSv on QEMU
Golang
  1 CPU:  24313.06  23874.74  23300.26
  2 CPUs: 37089.26  35475.22  33581.87
  4 CPUs: 42747.11  43057.99  42346.27
Java
  1 CPU:  41049.41  43622.81  44777.60
  2 CPUs: 46245.95  45746.48  46224.42
  4 CPUs: 48128.33  45467.53  45776.45
Rust
  1 CPU:  43455.34  43927.73  41100.07
  2 CPUs: 49120.31  49298.28  48076.98
  4 CPUs: 51477.57  51587.92  49118.68

OSv on firecracker
Golang
  1 CPU:  16721.56  16422.33  16540.24
  2 CPUs: 28538.35  26676.68  28100.00
  4 CPUs: 36448.57  33808.45  34383.20
Java
  1 CPU:  20191.95  21384.60  21705.82
  2 CPUs: 40876.17  40625.69  43766.45
  4 CPUs: 46336.07  45933.35  45467.22
Rust
  1 CPU:  23604.27  23379.86  23477.19
  2 CPUs: 46973.84  46590.41  46128.15
  4 CPUs: 49491.98  50255.20  50183.11

Linux on firecracker
Golang
  1 CPU:  14498.02  14373.21  14213.61
  2 CPUs: 28201.27  28600.92  28558.33
  4 CPUs: 48983.83  47590.97  45758.82
Java
  1 CPU:  18217.58  17709.30  19829.01
  2 CPUs: 33188.75  33233.55  36951.05
  4 CPUs: 47718.13  46456.51  48408.99
Rust
  (could not build the same Rust app for Alpine Linux, which uses musl)

Docker
Golang
  1 CPU:  24568.70  24621.82  24451.52
  2 CPUs: 49366.54  48510.87  43809.97
  4 CPUs: 53613.09  53033.38  51422.59
Java
  1 CPU:  40078.52  43850.54  44588.22
  2 CPUs: 48792.39  51170.05  52033.04
  4 CPUs: 51409.24  52756.73  47126.19
Rust
  1 CPU:  40220.04  44601.38  44419.06
  2 CPUs: 53420.56  53490.33  53320.99
  4 CPUs: 53892.23  52814.93  54050.13
Full example (Rust on Docker, 4 CPUs - https://raw.githubusercontent.com/wkozaczuk/unikernels-v-containers/master/test_results/remote/docker/rust_docker_4_cpu.wrk):

[{"name":"Write presentation","completed":false,"due":"2019-03-23T15:30:40.579556117+00:00"},{"name":"Host meetup","completed":false,"due":"2019-03-23T15:30:40.579599959+00:00"},{"name":"Run tests","completed":false,"due":"2019-03-23T15:30:40.579600610+00:00"},{"name":"Stand in traffic","completed":false,"due":"2019-03-23T15:30:40.579601081+00:00"},{"name":"Learn Rust","completed":false,"due":"2019-03-23T15:30:40.579601548+00:00"}]

-----------------------------------
Running 30s test @ http://192.168.1.73:8080/todos
  10 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.86ms    1.20ms   30.81ms   62.92%
    Req/Sec     5.42k    175.14     5.67k    87.71%
  1622198 requests in 30.10s, 841.55MB read
Requests/sec:  53892.23
Transfer/sec:     27.96MB
-----------------------------------
Running 30s test @ http://192.168.1.73:8080/todos
  10 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.90ms    1.19ms    8.98ms   58.18%
    Req/Sec     5.31k    324.18     5.66k    90.10%
  1589778 requests in 30.10s, 824.73MB read
Requests/sec:  52814.93
Transfer/sec:     27.40MB
-----------------------------------
Running 30s test @ http://192.168.1.73:8080/todos
  10 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.85ms    1.14ms    8.39ms   54.70%
    Req/Sec     5.44k    204.22     7.38k    92.12%
  1626902 requests in 30.10s, 843.99MB read
Requests/sec:  54050.13
Transfer/sec:     28.04MB
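As a quick sanity check, the average response size and the implied wire rate can be derived from the wrk figures above (a throwaway Go calculation using the first run, assuming wrk's "MB" is the usual binary megabyte):

package main

import "fmt"

func main() {
	const (
		requests = 1622198 // requests in 30.10s (first run above)
		mbRead   = 841.55  // "MB read" as reported by wrk
		seconds  = 30.10
	)
	bytes := mbRead * (1 << 20)
	fmt.Printf("avg response: ~%.0f bytes\n", bytes/requests)    // ~544 bytes
	fmt.Printf("wire rate: ~%.0f Mbit/s\n", bytes*8/seconds/1e6) // ~235 Mbit/s
}

At roughly 235 Mbit/s of response traffic, this is only about a quarter of the ~939 Mbit/s that iperf measures below, so if there is a cap it would presumably come from per-request or per-packet overhead rather than raw bandwidth; that said, this is just a back-of-the-envelope estimate that ignores the request side and protocol overhead.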
I am also enclosing an example iperf run between the client and server machines to illustrate the raw network bandwidth (BTW, I tested against iperf running natively on the host and on OSv on QEMU and firecracker, and got pretty much identical results, ~940 Mbits/sec - see https://github.com/wkozaczuk/unikernels-v-containers/tree/master/test_results/remote).

Connecting to host 192.168.1.102, port 5201
[  5] local 192.168.1.98 port 65179 connected to 192.168.1.102 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   111 MBytes   930 Mbits/sec
[  5]   1.00-2.00   sec   111 MBytes   932 Mbits/sec
[  5]   2.00-3.00   sec   112 MBytes   938 Mbits/sec
[  5]   3.00-4.00   sec   112 MBytes   939 Mbits/sec
[  5]   4.00-5.00   sec   112 MBytes   940 Mbits/sec
[  5]   5.00-6.00   sec   111 MBytes   933 Mbits/sec
[  5]   6.00-7.00   sec   112 MBytes   940 Mbits/sec
[  5]   7.00-8.00   sec   112 MBytes   940 Mbits/sec
[  5]   8.00-9.00   sec   112 MBytes   941 Mbits/sec
[  5]   9.00-10.00  sec   112 MBytes   941 Mbits/sec
[  5]  10.00-11.00  sec   112 MBytes   939 Mbits/sec
[  5]  11.00-12.00  sec   112 MBytes   941 Mbits/sec
[  5]  12.00-13.00  sec   112 MBytes   941 Mbits/sec
[  5]  13.00-14.00  sec   112 MBytes   942 Mbits/sec
[  5]  14.00-15.00  sec   112 MBytes   941 Mbits/sec
[  5]  15.00-16.00  sec   111 MBytes   927 Mbits/sec
[  5]  16.00-17.00  sec   112 MBytes   941 Mbits/sec
[  5]  17.00-18.00  sec   112 MBytes   942 Mbits/sec
[  5]  18.00-19.00  sec   112 MBytes   941 Mbits/sec
[  5]  19.00-20.00  sec   112 MBytes   941 Mbits/sec
[  5]  20.00-21.00  sec   112 MBytes   936 Mbits/sec
[  5]  21.00-22.00  sec   112 MBytes   940 Mbits/sec
[  5]  22.00-23.00  sec   112 MBytes   941 Mbits/sec
[  5]  23.00-24.00  sec   112 MBytes   941 Mbits/sec
[  5]  24.00-25.00  sec   112 MBytes   941 Mbits/sec
[  5]  25.00-26.00  sec   112 MBytes   941 Mbits/sec
[  5]  26.00-27.00  sec   112 MBytes   940 Mbits/sec
[  5]  27.00-28.00  sec   112 MBytes   941 Mbits/sec
[  5]  28.00-29.00  sec   112 MBytes   940 Mbits/sec
[  5]  29.00-30.00  sec   112 MBytes   941 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-30.00  sec  3.28 GBytes   939 Mbits/sec   sender
[  5]   0.00-30.00  sec  3.28 GBytes   939 Mbits/sec   receiver

iperf Done.

Observations/Conclusions:

- OSv fares a little better on QEMU/KVM than on firecracker; the difference varies from ~5% to ~20% (Golang). Also please note the vast difference between the 1-CPU results on firecracker and QEMU (hyperthreading is handled differently). On QEMU there is only a small bump going from 1 to 2 to 4 CPUs (except for Golang); on firecracker there is an almost 90-100% bump from 1 to 2 CPUs. To that end I have opened a firecracker issue - https://github.com/firecracker-microvm/firecracker/issues/1034.
- When you compare OSv on firecracker vs Linux on firecracker (comparing against OSv on QEMU would, I guess, be unfair), you can see that:
  - the Golang app on OSv was ~15% faster than on Linux with 1 CPU, almost identical with 2 CPUs, and ~30% slower than on Linux with 4 CPUs (I did check that the Golang runtime properly detects the number of CPUs; see the snippet after this list)
  - the Java app on OSv was ~5% faster with 1 CPU, ~20% faster with 2 CPUs, and slightly slower with 4 CPUs
  - I could not run the Rust app on Linux because the guest was an Alpine distribution built with musl, and I did not have time to get Rust to build properly for that scenario
- When you compare OSv on QEMU/KVM vs Docker, you can see that:
  - all apps running with a single CPU fare almost the same, with OSv sometimes a little faster
  - the Java and Rust apps performed only a little better (2-10%) on Docker than on OSv
  - Golang scaled with the number of CPUs on both, but performed much worse (20-30%) on OSv with 2 and 4 CPUs
- There seems to be a bottleneck somewhere around 40-50K requests per second. Looking at one result, the raw network rate reported was around 26-28 MB per second. Given that HTTP requires sending both a request and a response, possibly that is the maximum the network - the combination of Ethernet switch, server, and client machines - can handle?
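For anyone who wants to reproduce the CPU-detection check mentioned in the list above, something along these lines (a trivial sketch) is enough inside the guest:

package main

import (
	"fmt"
	"runtime"
)

func main() {
	// GOMAXPROCS defaults to NumCPU since Go 1.5, so both should
	// report the number of vCPUs given to the guest.
	fmt.Println("NumCPU:    ", runtime.NumCPU())
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
}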
Questions:

- Are there any flaws in this test setup?
- Why does OSv not scale in some scenarios, especially when bumping from 2 to 4 CPUs? Networking bottleneck? Scheduler? Locks?
- Could we further optimize OSv running with a single CPU (skip the global cross-CPU page allocator, etc.)?

To get even more insight, I also compared how OSv on QEMU would fare against the same apps running in Docker, with wrk running on the host and firing requests locally. You can find the results under https://github.com/wkozaczuk/unikernels-v-containers/tree/master/test_results/host.

OSv on QEMU
Golang
  1 CPU:  25188.60  24664.43  23935.77
  2 CPUs: 37118.95  37108.96  35997.58
  4 CPUs: 49987.20  48710.74  44789.96
Java
  1 CPU:  43648.02  45457.98  41818.13
  2 CPUs: 76224.39  75734.63  70597.35
  4 CPUs: 80543.30  75187.46  72986.93
Rust
  1 CPU:  42392.75  39679.21  37871.49
  2 CPUs: 82484.67  83272.65  71671.13
  4 CPUs: 95910.23  86811.76  83213.93

Docker
Golang
  1 CPU:  24191.63  23574.89  23716.33
  2 CPUs: 34889.01  34487.01  34468.03
  4 CPUs: 48850.24  48690.09  48356.66
Java
  1 CPU:  32267.09  34670.41  34828.68
  2 CPUs: 47533.94  50734.05  50203.98
  4 CPUs: 69644.61  72704.40  70805.84
Rust
  1 CPU:  37061.52  36637.62  33154.57
  2 CPUs: 51743.94  51476.78  50934.27
  4 CPUs: 75125.41  74051.27  74434.78

- Does this test even make sense?
- As you can see, OSv outperforms Docker in this scenario to various degrees (5-20%). Can anybody explain why? Is it because in this case both wrk and the apps are on the same machine, and there are fewer context switches between kernel and user mode, in OSv's favor? Does it mean that we could benefit from a setup with a load balancer (for example haproxy or squid) running on the same host in user mode and forwarding to single-CPU OSv instances, vs a single OSv instance with multiple CPUs? (See the toy sketch below.)
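To make that last idea concrete, here is a toy Go sketch of such a front end, round-robining across single-CPU instances (the backend addresses are made up; a real setup would more likely use haproxy, as mentioned above):

package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"sync/atomic"
)

func main() {
	// Hypothetical addresses of single-CPU OSv instances.
	var backends []*url.URL
	for _, addr := range []string{"http://192.168.1.73:8080", "http://192.168.1.74:8080"} {
		u, err := url.Parse(addr)
		if err != nil {
			log.Fatal(err)
		}
		backends = append(backends, u)
	}
	var next uint64
	proxy := &httputil.ReverseProxy{
		// Director rewrites each incoming request to point at the
		// next backend, round-robin.
		Director: func(r *http.Request) {
			u := backends[atomic.AddUint64(&next, 1)%uint64(len(backends))]
			r.URL.Scheme = u.Scheme
			r.URL.Host = u.Host
		},
	}
	log.Fatal(http.ListenAndServe(":80", proxy))
}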
Looking forward to hearing what others think.

Waldek

--
You received this message because you are subscribed to the Google Groups "OSv Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/d/optout.
