Hi,

I am forwarding my exchange with the author of the paper about OSv on Xen beating Docker in a single-vCPU setup. Here is the link to the article: https://biblio.ugent.be/publication/8582433/file/8582438.pdf
Enjoy,
Waldek

---------- Forwarded message ---------
From: Waldek Kozaczuk <[email protected]>
Date: Sat, Mar 2, 2019 at 5:17 PM
Subject: Re: Unikernel performance review paper inquiry
To: Tom Goethals <[email protected]>

Tom,

Thanks for replying to my email. Do you mind if I forward our email exchange to the OSv development group [email protected]? I think that others would also be very interested in your article and findings. You might also get some good insight from OSv users and developers more experienced than me.

Please see my replies to some of your comments below.

Regards,
Waldek

On Tue, Feb 26, 2019 at 5:19 AM Tom Goethals <[email protected]> wrote:

> Hello Waldek,
>
> Thank you for the thorough read of my paper. It's interesting to see it from the perspective of an OSv contributor. I have to admit I was very new to unikernels (I had barely started my PhD) when I wrote the paper, so I may have missed some things. Considering the number of bullet points, I inserted my answers/comments below for each point.
>
> Generally, I did choose OSv as a focus because it seemed (by far) the most mature and stable platform to create and run unikernels with, so it's good to see new features and compatibility for languages being added. In the future, I would really like to do some research on mixed container-unikernel deployments with a single orchestration platform, but someone else may very well beat me to it :)
>
> During the tests I did not find any large problems or lack of features in OSv, so I have little to add there. In fact, once I got the hang of it, OSv unikernels were quite easy to build and run on a lot of hypervisors. Most problems were about getting Python3 to work, but that has apparently been fixed. Note that the tests were mostly centered on network performance and simple workloads, so more in-depth future work could still result in suggestions.
>
> Regards,
>
> Tom
> ------------------------------
> *From:* Waldek Kozaczuk <[email protected]>
> *Sent:* Tuesday, February 26, 2019 0:15
> *To:* Tom Goethals
> *Subject:* Unikernel performance review paper inquiry
>
> Hi,
>
> Congratulations on writing this interesting and thorough paper - https://biblio.ugent.be/publication/8582433/file/8582438.pdf !!! Indeed, it is probably the first paper to compare the performance of OSv and Docker containers.
>
> I hope this email finds the authors of this paper, as I would like to ask some questions about it as well as clarify/explain some observations about OSv you touched on.
>
> First I wanted to introduce myself. My name is Waldemar Kozaczuk and I am one of the OSv committers. Though I am not one of the original authors of OSv, I have been playing with it and contributing to it since 2015. My major contributions have been:
>
> - Implementing ROFS (Read-Only FS)
> - Adding support for golang and python 3
> - Enhancing OSv to make it run on AWS firecracker
>
> I must say I have been very pleased to learn how well OSv did in the single-vCPU performance tests, but also a little disappointed that OSv did not fare that well in the multi-vCPU tests ;-) On the other hand, I am not surprised by the results of the workload tests. I would love to be able to reproduce them myself at some point. I noticed you refer to this project https://github.com/togoetha/unikernels-v-containers where I have found the source code and build scripts for the OSv and Docker images. However, I could not locate any scripts or JMeter setups that would let me run those tests. Could you possibly point me to those?
> => I did not actually make any JMeter scripts. The JMeter GUI was used to find the breaking point for each service/language, but admittedly it was time-consuming work that could have been handled better.
> I can however tell you the basic settings: 40 threads, started simultaneously (no buildup), each sending 50000 requests ASAP. Results reflect the number of responses per second and latency. Some fiddling was done with the settings (fewer threads, more threads, varying buildup, ...), but the best results were actually gained by simply unleashing concurrency hell on the services and seeing how fast they could handle it. It's possible the real limits were slightly higher, but ironically the machine used to generate the load with JMeter couldn't go any higher.
>
> Let me structure the remaining part of this email as a list of bullet points referring to your article:
>
> - You mention in the introduction that unikernels are hard to debug and lack good debugging tools. I would agree with that, but also point out that OSv shines in this respect:
>   - it can be easily debugged with gdb - https://github.com/cloudius-systems/osv/wiki/Debugging-OSv#debugging-osv-with-gdb
>   - it provides a management and monitoring REST API - http://osv.io/api/swagger-ui/dist/index.html and an HTML5 terminal app - https://github.com/wkozaczuk/osv-html5-terminal
>   - it can be profiled - https://github.com/cloudius-systems/osv/wiki/Trace-analysis-using-trace.py
>   - => Debugging was mostly a general remark about the state of the art in unikernels. Indeed, I had an easier time troubleshooting and debugging OSv unikernels than other unikernel platforms. Being POSIX compatible (as opposed to "clean slate" alternatives such as IncludeOS) really helps there.
> - Which version of capstan did you use? Were you using the latest mikelangelo capstan - https://github.com/mikelangelo-project/capstan that supports packages (similar to docker compose)?
>   - => The original Capstan from Cloudius Systems was used (https://github.com/cloudius-systems/capstan). I see much has changed in the newer version, time to catch up.
> - You mention you had a problem running a vertx Java microservice on Java > 8. Could you please clarify what the exact problem was? We have a number of example apps (like https://github.com/cloudius-systems/osv-apps/tree/master/openjdk11-zulu-java-base) that demonstrate running a simple "hello world" even on the latest Java 11. I wonder if you hit an issue related to the non-isolated vs isolated mode of running Java. The isolated mode was the default before this commit https://github.com/cloudius-systems/osv/commit/99dd1c5b521a0ab4642e79a2e992c50ad719f7c6 (after the release of 0.51.0) and, unlike the non-isolated one, is not supported on > Java 8. I am also aware that our tiny run-java wrapper does not support new options added in Java 9 and beyond - like "--add-exports", which might be necessary when running a vertx app that uses netty.
>   - => Actually, I did get Java to work properly with the included JDKs up to Java 10 or so. As I remember, my problem was in not getting minimal JREs to run correctly on OSv, resulting from the fact that those were cross-compiled from the JDK on my machine in a rather hackish way in an attempt to get them to run on OSv. However, that was not a problem for the tests, it was just a curiosity. I'm guessing this could work if the entire Java JDK on the local machine is recompiled to suit OSv first, and minimal JREs are then built from it on the local machine to run on OSv. However, time was a bit short and I just dropped it.

In general OSv should be able to run any unmodified (without need to recompile) Linux JDK distribution. The best I have found are from Azul (Zulu). But Amazon recently started releasing their own OpenJDK distribution, Corretto (https://aws.amazon.com/corretto/), which I have not tried yet.

> - Indeed python 3 was not supported as of 0.51.0 but is supported now as of 0.52.0 - https://github.com/cloudius-systems/osv/releases/tag/v0.52.0
>   - => Nice!
>     A lot of improvement in resource use too, and ffmpeg looks interesting for handling camera streams.

BTW I worked with some people from the EU Mikelangelo project to get ffmpeg on OSv to do some video transcoding. Also, hopefully within the next week or two I will try to cut a new 0.53.0 release of OSv. The most exciting feature will be support for AWS firecracker (https://firecracker-microvm.github.io/), which allows OSv to boot in 7ms. Stay tuned!

> - As you have noticed, the go wrapper does not affect performance. It was merely added to work around TLS-related issues with golang apps built as shared libraries; please see the description of this commit - https://github.com/cloudius-systems/osv/commit/438008362a8ef74666b4e44af4b3205b86a52d06 for details.
>   - => From the code, it was pretty clear the wrapper did not have any adverse effects, but I sort of had to confirm it for the paper. I did find the post you supplied, but didn't realize it was for TLS issues. Thanks.
> - You have not mentioned it specifically, but I am guessing all OSv images you built used the ZFS filesystem. Please note that even in 0.51.0 we had support for a simple Read-Only Filesystem (ROFS). I think ROFS is an even better fit for microservice apps on OSv. You can build OSv images with ROFS using the latest mikelangelo capstan - https://github.com/mikelangelo-project/capstan/blob/master/Documentation/OsvFilesystem.md
>   - => Yes, all images were using ZFS. Considering how small the services are, I would guess they're kept entirely in memory so I'm not sure how much of an improvement ROFS may be (not very familiar with that sort of thing), but it's definitely interesting for real-life stateless services.
> - I must say I was a little astonished you were able to successfully test golang apps built as --buildmode=pie.
>   As you can see in https://github.com/cloudius-systems/osv/issues/352, OSv currently does not support TLS (Thread Local Storage) in local-exec mode (you probably saw this warning printed by OSv - "WARNING: XYZ.so is a PIE using TLS..."). But apparently Golang apps built as pie do not use TLS, or we are just lucky. I was myself surprised I could run pie Golang apps like httpserver without any issues.
>   - => Yes, that warning popped up on my screen plenty of times. But since it worked, it didn't seem very important.
> - I was surprised to hear about OSv scaling poorly in the multi-CPU tests. I have not really tested OSv much on Xen except for running it on AWS EC2 instances, so I do not know what the reasons for that might be. I will ask on our mailing list. On the other hand, I must say that during casual tests on my MacBook Pro with 4 hyper-threaded i7 cores running Ubuntu 18.10, I was able to see OSv scale pretty well with QEMU/KVM (I believe a type 2 hypervisor):
>   - for example, I was able to see a 50-60% performance increase when going from 1 to 2 vCPUs
>   - also, with 2 vCPUs I was able to see 10-15% better performance on OSv compared to the same app running directly on the Linux host, which shocked me a little I must admit
>   - for my tests I was using this app - https://github.com/cloudius-systems/osv-apps/tree/master/golang-httpserver - and the ab tool (Apache Bench) to simulate load
>   - lastly, I saw even better performance from a microservice written in Rust (https://github.com/cloudius-systems/osv-apps/tree/master/rust-httpserver)
>   - => Actually, I ran some of the unikernels on VirtualBox first as a proof of concept. While VirtualBox gave comparatively disastrous numbers (I guess it's not really the best hypervisor), the effect of multithreading was the same there. At first, I thought my coding was to blame, but the problem seems to persist across programming languages (with varying impact).
>     The only real lead I managed to discover was that most of the requests under Go are in fact handled way faster in a multi-threaded application, but a tiny percentage gets held up for up to a second, which seems to block general throughput a bit. The tests used a rather large connection pool (up to 40), so my best guess at the time was that the sheer number of concurrent requests coming in caused a few too many locks on something in the scheduler, but I don't really have experience with that level of programming. Since I didn't do any tests with 2 vCPUs, maybe there's an issue with larger vCPU pools? It does seem to be very language dependent, and I'm still not clear on Go's behavior in that case. Of course, OSv unikernels have excellent single core performance, so if the scaling problem is fixed or was somehow due to the way I tested, I'm pretty sure they would demolish containers or standard Linux in any comparison.
> - I saw you mentioned that occasionally OSv would stop responding during the multi-CPU tests. Could you please elaborate on this? I myself occasionally see a "lingering connection closed" issue where a couple of requests never come back when OSv gets hit with a long batch (>100K) of requests by ab. However, in my case if I simply restart the test with ab, OSv will continue responding. Is this similar to what you saw?
>   - => This sometimes happened when running Go unikernels without the wrapper, and in all languages when they were multi-thread enabled. I don't seem to have saved any screenshots or output, but as I remember all requests got locked up (and eventually gave a timeout) and I had to restart the entire test anyway because the results got invalidated by the gap. I will try to get hold of the original test setup again to see if OSv still responds if new requests are started. Note that it was pretty rare, since it sometimes only happened after up to 10 million requests.
> - I would also like to point out that most container deployments, at least in public clouds (even the AWS ECS offering), use virtual machines like EC2 instances, NOT bare metal computers. So the difference in performance between containers and OSv, which can run directly as a guest OS on an EC2 VM, might be even more profound. Am I wrong? But that might not be fair to containers ;-)
>   - => Indeed, this was also mentioned a few times by reviewers and attendees at the conference. However, I was trying to get as close to bare metal as possible for both unikernels and containers to make the comparison more "fair". I guess in real life it's even easier for unikernels to win...
> - Have you considered testing Node.js microservices or ones written in Rust? We also have a working example of running GraalVM Java native images on OSv.
>   - => This may happen in the future. The focus was put on Go, Java and Python because Java and Python are commonly used in my research team, and we're trying to switch some things to Go as well.
>
> Lastly, besides improving the performance of multithreaded apps, what else would you want to see enhanced or improved in OSv? I am interested in your thoughts.
>
> Looking forward to a reply, and sorry for my long email. Thanks in advance for your reply.
>
> My regards,
> Waldek
>
> PS. Please follow us on Google groups (https://groups.google.com/forum/#!forum/osv-dev) and on Twitter at #OSv_unikernel
