I wonder if anyone has had a chance to read this paper. I would like to hear what others think about the reasons the OSv thread scheduler does not scale well with the number of vCPUs.
On Thursday, March 7, 2019 at 7:56:42 AM UTC-5, Waldek Kozaczuk wrote:

> Hi,
>
> I am forwarding my exchange with the author of the paper in which OSv on
> Xen beats Docker in a single-vCPU setup. The article is available here:
> https://biblio.ugent.be/publication/8582433/file/8582438.pdf
>
> Enjoy,
> Waldek
>
> ---------- Forwarded message ---------
> From: Waldek Kozaczuk <[email protected]>
> Date: Sat, Mar 2, 2019 at 5:17 PM
> Subject: Re: Unikernel performance review paper inquiry
> To: Tom Goethals <[email protected]>
>
> Tom,
>
> Thanks for replying to my email.
>
> Do you mind if I forward our email exchange to the OSv development group
> at [email protected]? I think others would also be very
> interested in your article and findings, and you might get some good
> insight from OSv users and developers more experienced than me.
>
> Please see my comments on some of your answers below.
>
> Regards,
> Waldek
>
> On Tue, Feb 26, 2019 at 5:19 AM Tom Goethals <[email protected]> wrote:
>
>> Hello Waldek,
>>
>> Thank you for the thorough read of my paper. It's interesting to see it
>> from the perspective of an OSv contributor. I have to admit I was very
>> new to unikernels (I had barely started my PhD) when I wrote the paper,
>> so I may have missed some things. Considering the number of bullet
>> points, I inserted my answers/comments below for each point.
>>
>> Generally, I chose OSv as a focus because it seemed (by far) the most
>> mature and stable platform to create and run unikernels with, so it's
>> good to see new features and language compatibility being added. In the
>> future, I would really like to do some research on mixed
>> container-unikernel deployments with a single orchestration platform,
>> but someone else may very well beat me to it :)
>>
>> During the tests I did not find any large problems or missing features
>> in OSv, so I have little to add there.
>> In fact, once I got the hang of it, OSv unikernels were quite easy to
>> build and run on a lot of hypervisors. Most problems were about getting
>> Python 3 to work, but that has apparently been fixed. Note that the
>> tests were mostly centered on network performance and simple workloads,
>> so more in-depth future work could still result in suggestions.
>>
>> Regards,
>>
>> Tom
>> ------------------------------
>> *From:* Waldek Kozaczuk <[email protected]>
>> *Sent:* Tuesday, 26 February 2019, 0:15
>> *To:* Tom Goethals
>> *Subject:* Unikernel performance review paper inquiry
>>
>> Hi,
>>
>> Congratulations on writing this interesting and thorough paper:
>> https://biblio.ugent.be/publication/8582433/file/8582438.pdf !!! It is
>> probably the first paper to compare the performance of OSv and Docker
>> containers.
>>
>> I hope this email finds the authors of the paper, as I would like to
>> ask some questions about it as well as clarify some observations about
>> OSv you touched on.
>>
>> First I wanted to introduce myself. My name is Waldemar Kozaczuk and I
>> am one of the OSv committers. Though I am not one of the original
>> authors of OSv, I have been playing with it and contributing to it
>> since 2015. My major contributions have been:
>>
>> - implementing ROFS (Read-Only FS)
>> - adding support for Golang and Python 3
>> - enhancing OSv to run on AWS Firecracker
>>
>> I must say I was very pleased to learn how well OSv did in the
>> single-vCPU performance tests, but also a little disappointed that it
>> did not fare that well in the multi-vCPU tests ;-) I was not surprised
>> by the results of the workload tests, on the other hand. I would love
>> to be able to reproduce them myself at some point. I noticed you refer
>> to the project https://github.com/togoetha/unikernels-v-containers,
>> where I found the source code and build scripts for the OSv and Docker
>> images.
>> On the other hand, I could not locate any scripts or JMeter setups that
>> would let me run those tests. Could you possibly point me to those?
>>
>> => I did not actually make any JMeter scripts. The JMeter GUI was used
>> to find the breaking point for each service/language, but admittedly it
>> was time-consuming work that could have been handled better. I can
>> however tell you the basic settings: 40 threads, started simultaneously
>> (no ramp-up), each sending 50,000 requests as fast as possible. Results
>> reflect the number of responses per second and the latency. Some
>> fiddling was done with the settings (fewer threads, more threads,
>> varying ramp-up, ...), but the best results were actually gained by
>> simply unleashing concurrency hell on the services and seeing how fast
>> they could handle it. It's somewhat possible the real limits were
>> slightly higher, but ironically the machine used to generate the load
>> with JMeter couldn't go any higher.
>>
>> Let me structure the remaining part of this email as a list of bullet
>> points referring to your article:
>>
>> - You mention in the introduction that unikernels are hard to debug and
>>   lack good debugging tools. I would agree, but also point out that OSv
>>   shines in this respect:
>>   - it can easily be debugged with gdb -
>>     https://github.com/cloudius-systems/osv/wiki/Debugging-OSv#debugging-osv-with-gdb
>>   - it provides a management and monitoring REST API -
>>     http://osv.io/api/swagger-ui/dist/index.html - and an HTML5
>>     terminal app - https://github.com/wkozaczuk/osv-html5-terminal
>>   - it can be profiled -
>>     https://github.com/cloudius-systems/osv/wiki/Trace-analysis-using-trace.py
>>   => Debugging was mostly a general remark about the state of the art
>>   in unikernels. Indeed, I had an easier time troubleshooting and
>>   debugging OSv unikernels than other unikernel platforms. Being POSIX
>>   compatible (as opposed to "clean slate" alternatives such as
>>   IncludeOS) really helps there.
>> - Which version of Capstan did you use? Were you using the latest
>>   MIKELANGELO Capstan - https://github.com/mikelangelo-project/capstan
>>   - which supports packages (similar to Docker Compose)?
>>   => The original Capstan from Cloudius Systems was used
>>   (https://github.com/cloudius-systems/capstan). I see much has changed
>>   in the newer version; time to catch up.
>> - You mention you had a problem running the Vert.x Java microservice on
>>   Java > 8. Could you please clarify what the exact problem was? We
>>   have a number of example apps (like
>>   https://github.com/cloudius-systems/osv-apps/tree/master/openjdk11-zulu-java-base)
>>   that demonstrate running a simple "hello world" even on the latest
>>   Java 11. I wonder if you hit an issue related to the non-isolated vs
>>   isolated mode of running Java. The isolated one was the default
>>   before this commit -
>>   https://github.com/cloudius-systems/osv/commit/99dd1c5b521a0ab4642e79a2e992c50ad719f7c6
>>   (after the release of 0.51.0) - and, unlike the non-isolated one, is
>>   not supported on Java > 8. I am also aware that our tiny run-java
>>   wrapper does not support new options added in Java 9 and beyond -
>>   like "--add-exports" - which might be necessary when running a Vert.x
>>   app that uses Netty.
>>   => Actually, I did get Java to work properly with the included JDKs
>>   up to Java 10 or so. As I remember, my problem was in not getting
>>   minimal JREs to run correctly on OSv, resulting from the fact that
>>   those were cross-compiled from the JDK on my machine in a rather
>>   hackish way in an attempt to get them to run on OSv. However, that
>>   was not a problem for the tests; it was just a curiosity. I'm
>>   guessing this could work if the entire Java JDK on the local machine
>>   is recompiled to suit OSv first, and minimal JREs are then built from
>>   it on the local machine to run on OSv. However, time was a bit short
>>   and I just dropped it.
>
> In general, OSv should be able to run any unmodified (no need to
> recompile) Linux JDK distribution. The best I have found are from Azul
> (Zulu), but Amazon recently started releasing their own OpenJDK
> distribution, Corretto (https://aws.amazon.com/corretto/), which I have
> not tried yet.
>
>> - Indeed, Python 3 was not supported as of 0.51.0, but it is supported
>>   now as of 0.52.0 -
>>   https://github.com/cloudius-systems/osv/releases/tag/v0.52.0
>>   => Nice! A lot of improvement in resource use too, and ffmpeg looks
>>   interesting for handling camera streams.
>
> BTW, I worked with some people from the EU MIKELANGELO project to get
> ffmpeg on OSv to do some video transcoding.
> Also, hopefully within the next week or two I will be trying to cut a
> new 0.53.0 release of OSv. The most exciting feature will be support for
> AWS Firecracker (https://firecracker-microvm.github.io/), which allows
> OSv to boot in 7 ms. Stay tuned!
>
>> - As you noticed, the Go wrapper does not affect performance. It was
>>   merely added to provide a workaround for TLS-related issues with
>>   Golang apps built as shared libraries; please see the description of
>>   this commit -
>>   https://github.com/cloudius-systems/osv/commit/438008362a8ef74666b4e44af4b3205b86a52d06
>>   - for details.
>>   => From the code, it was pretty clear the wrapper did not have any
>>   adverse effects, but I sort of had to confirm it for the paper. I did
>>   find the post you supplied, but didn't realize it was for TLS issues.
>>   Thanks.
>> - You have not mentioned it specifically, but I am guessing all the OSv
>>   images you built were using the ZFS filesystem. Please note that even
>>   in 0.51.0 we had support for a simple Read-Only Filesystem (ROFS). I
>>   think ROFS is an even better fit for microservice apps on OSv.
>>   You can build OSv images with ROFS using the latest MIKELANGELO
>>   Capstan -
>>   https://github.com/mikelangelo-project/capstan/blob/master/Documentation/OsvFilesystem.md
>>   => Yes, all images were using ZFS. Considering how small the services
>>   are, I would guess they're kept entirely in memory, so I'm not sure
>>   how much of an improvement ROFS may be (not very familiar with that
>>   sort of thing), but it's definitely interesting for real-life
>>   stateless services.
>> - I must say I was a little astonished that you were able to
>>   successfully test Golang apps built with --buildmode=pie. As you can
>>   see in https://github.com/cloudius-systems/osv/issues/352, OSv
>>   currently does not support TLS (Thread Local Storage) in local-exec
>>   mode (you probably saw this warning printed by OSv: "WARNING: XYZ.so
>>   is a PIE using TLS..."). But apparently Golang apps built as PIE do
>>   not use TLS, or we are just lucky. I was myself surprised I could run
>>   PIE Golang apps like httpserver without any issues.
>>   => Yes, that warning popped up on my screen plenty of times. But
>>   since it worked, it didn't seem very important.
>> - I was surprised to hear about OSv scaling poorly in the multi-CPU
>>   tests. I have not really tested OSv much on Xen, except for running
>>   it on AWS EC2 instances, so I do not know what the reasons for that
>>   might be. I will ask on our mailing list.
>>   On the other hand, I must say that during casual tests on my MacBook
>>   Pro with 4 hyper-threaded i7 cores running Ubuntu 18.10, I was able
>>   to see OSv scale pretty well with QEMU/KVM (I believe a type 2
>>   hypervisor):
>>   - for example, I saw a 50-60% performance increase when going from 1
>>     to 2 vCPUs
>>   - also, with 2 vCPUs I saw 10-15% better performance on OSv compared
>>     to the same app running directly on the Linux host, which I must
>>     admit shocked me a little
>>   - for my tests I was using this app -
>>     https://github.com/cloudius-systems/osv-apps/tree/master/golang-httpserver
>>     - and the ab tool (Apache Bench) to simulate load
>>   - lastly, I saw even better performance from a microservice written
>>     in Rust
>>     (https://github.com/cloudius-systems/osv-apps/tree/master/rust-httpserver)
>>   => Actually, I ran some of the unikernels on VirtualBox first as a
>>   proof of concept. While VirtualBox gave comparatively disastrous
>>   numbers (I guess it's not really the best hypervisor), the effect of
>>   multithreading was the same there. At first, I thought my coding was
>>   to blame, but the problem seems to persist across programming
>>   languages (with varying impact). The only real lead I managed to
>>   discover was that most of the requests under Go are in fact handled
>>   way faster in a multi-threaded application, but a tiny percentage
>>   gets held up for up to a second, which seems to block general
>>   throughput a bit. The tests used a rather large connection pool (up
>>   to 40), so my best guess at the time was that the sheer number of
>>   concurrent requests coming in caused a few too many locks on
>>   something in the scheduler, but I don't really have experience with
>>   that level of programming. Since I didn't do any tests with 2 vCPUs,
>>   maybe there's an issue with larger vCPU pools? It does seem to be
>>   very language dependent, and I'm still not clear on Go's behavior in
>>   that case.
>>   Of course, OSv unikernels have excellent single-core performance, so
>>   if the scaling problem is fixed, or was somehow due to the way I
>>   tested, I'm pretty sure they would demolish containers or standard
>>   Linux in any comparison.
>> - I saw you mentioned that occasionally OSv would stop responding
>>   during the multi-CPU tests. Could you please elaborate on this? I
>>   myself occasionally see a "lingering connection closed" issue where a
>>   couple of requests never come back when OSv gets fired with a long
>>   batch (>100K) of requests by ab. However, in my case, if I simply
>>   restart the test with ab, OSv will continue responding. Is this
>>   similar to what you saw?
>>   => This sometimes happened when running Go unikernels without the
>>   wrapper, and in all languages when they were multi-thread enabled. I
>>   don't seem to have saved any screenshots or output, but as I
>>   remember, all requests got locked up (and eventually gave a timeout)
>>   and I had to restart the entire test anyway because the results got
>>   invalidated by the gap. I will try to get hold of the original test
>>   setup again to see if OSv still responds when new requests are
>>   started. Note that it was pretty rare, since it sometimes only
>>   happened after up to 10 million requests.
>> - I would also like to point out that most container deployments, at
>>   least in public clouds (even the AWS ECS offering), use virtual
>>   machines such as EC2 instances, NOT bare-metal computers. So the
>>   difference in performance between containers and OSv, which can run
>>   directly as a guest OS on an EC2 VM, might be even more profound. Am
>>   I wrong? But that might not be fair to containers ;-)
>>   => Indeed, this was also mentioned a few times by reviewers and
>>   attendees at the conference. However, I was trying to get as close to
>>   bare metal as possible for both unikernels and containers to make the
>>   comparison more "fair". I guess in real life it's even easier for
>>   unikernels to win...
>> - Have you considered testing Node.js microservices or ones written in
>>   Rust? We also have a working example of running GraalVM Java native
>>   images on OSv.
>>   => This may happen in the future. The focus was put on Go, Java and
>>   Python because Java and Python are commonly used in my research team,
>>   and we're trying to switch some things to Go as well.
>>
>> Lastly, besides improving the performance of multithreaded apps, what
>> else would you like to see enhanced or improved in OSv? I am interested
>> in your thoughts.
>>
>> Looking forward to your reply, and sorry for the long email. Thanks in
>> advance.
>>
>> My regards,
>> Waldek
>>
>> PS. Please follow us on Google Groups
>> (https://groups.google.com/forum/#!forum/osv-dev) and on Twitter at
>> #OSv_unikernel
