Hey! Interesting paper, Waldek! I'm trying to push OSv to my friends here; let's hope this sparks some contributions :)
Kind Regards,
Geraldo Netto
Sapere Aude => Non dvcor, dvco
http://exdev.sf.net/

On Thu, 7 Mar 2019 at 13:56, Waldek Kozaczuk <[email protected]> wrote:
>
> Hi,
>
> I am forwarding my exchange with the author of the paper about OSv on Xen beating Docker in a single-vCPU setup. Adding the link to the article here:
> https://biblio.ugent.be/publication/8582433/file/8582438.pdf
>
> Enjoy,
> Waldek
> ---------- Forwarded message ---------
> From: Waldek Kozaczuk <[email protected]>
> Date: Sat, Mar 2, 2019 at 5:17 PM
> Subject: Re: Unikernel performance review paper inquiry
> To: Tom Goethals <[email protected]>
>
> Tom,
>
> Thanks for replying to my email.
>
> Do you mind if I forward our email exchange to the OSv development group [email protected]? I think others would also be very interested in your article and findings. You might also get some good insight from OSv users and developers more experienced than me.
>
> Please see my responses to some of your comments below.
>
> Regards,
> Waldek
>
> On Tue, Feb 26, 2019 at 5:19 AM Tom Goethals <[email protected]> wrote:
>>
>> Hello Waldek,
>>
>> Thank you for the thorough read of my paper. It's interesting to see it from the perspective of an OSv contributor. I have to admit I was very new to unikernels (I had barely started my PhD) when I wrote the paper, so I may have missed some things. Considering the number of bullet points, I inserted my answers/comments below for each point.
>>
>> Generally, I chose OSv as a focus because it seemed (by far) the most mature and stable platform to create and run unikernels with, so it's good to see new features and language compatibility being added.
>> In the future, I would really like to do some research on mixed container-unikernel deployments with a single orchestration platform, but someone else may very well beat me to it :)
>>
>> During the tests I did not find any large problems or missing features in OSv, so I have little to add there. In fact, once I got the hang of it, OSv unikernels were quite easy to build and run on a lot of hypervisors. Most problems were about getting Python 3 to work, but that has apparently been fixed. Note that the tests were mostly centered on network performance and simple workloads, so more in-depth future work could still result in suggestions.
>>
>> Regards,
>>
>> Tom
>>
>> ________________________________
>> From: Waldek Kozaczuk <[email protected]>
>> Sent: Tuesday, 26 February 2019 0:15
>> To: Tom Goethals
>> Subject: Unikernel performance review paper inquiry
>>
>> Hi,
>>
>> Congratulations on writing this interesting and thorough paper - https://biblio.ugent.be/publication/8582433/file/8582438.pdf !!! It is probably the first paper comparing the performance of OSv and Docker containers.
>>
>> I hope this email finds the authors of this paper, as I would like to ask some questions about it, as well as clarify and explain some observations about OSv you touched on.
>>
>> First I wanted to introduce myself. My name is Waldemar Kozaczuk and I am one of the OSv committers. Though I am not one of the original authors of OSv, I have been playing with it and contributing to it since 2015. My major contributions have been:
>>
>> - Implementing ROFS (Read-Only FS)
>> - Adding support for Golang and Python 3
>> - Enhancing OSv to run on AWS Firecracker
>>
>> I must say I have been very pleased to learn how well OSv did in the single-vCPU performance tests, but also a little disappointed that OSv did not fare that well in the multi-vCPU tests ;-) On the other hand, I was not surprised by the results of the workload tests.
>> I would love to be able to reproduce it myself at some point. I noticed you refer to the project https://github.com/togoetha/unikernels-v-containers where I found the source code and build scripts for the OSv and Docker images. On the other hand, I could not locate any scripts or JMeter setups that would let me run those tests. Could you possibly point me to those?
>>
>> => I did not actually make any JMeter scripts. The JMeter GUI was used to find the breaking point for each service/language, but admittedly it was time-consuming work that could have been handled better. I can, however, tell you the basic settings: 40 threads, started simultaneously (no buildup), each sending 50000 requests ASAP. Results reflect the number of responses per second and latency. Some fiddling was done with the settings (fewer threads, more threads, varying buildup, ...), but the best results were actually gained by simply unleashing concurrency hell on the services and seeing how fast they could handle it. It's somewhat possible the real limits were slightly higher, but ironically the machine used to generate the load with JMeter couldn't go any higher.
>>
>> Let me structure the remaining part of this email as a list of bullet points referring to your article:
>>
>> You mention in the introduction that unikernels are hard to debug and lack good debugging tools. I would agree, but also point out that OSv shines in this respect:
>>
>> - it can easily be debugged with gdb - https://github.com/cloudius-systems/osv/wiki/Debugging-OSv#debugging-osv-with-gdb
>> - it provides a management and monitoring REST API - http://osv.io/api/swagger-ui/dist/index.html - and an HTML5 terminal app - https://github.com/wkozaczuk/osv-html5-terminal
>> - it can be profiled - https://github.com/cloudius-systems/osv/wiki/Trace-analysis-using-trace.py
>>
>> => Debugging was mostly a general remark about the state of the art in unikernels.
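The JMeter settings described above (40 threads, each sending 50000 requests, no buildup) amount to 2 million requests at full concurrency. A rough command-line equivalent can be sketched with Apache Bench (ab), which also comes up later in this thread; the target URL below is a placeholder, not part of the original test setup:

```shell
# 40 JMeter threads x 50000 requests each, started simultaneously.
THREADS=40
REQS_PER_THREAD=50000
TOTAL=$((THREADS * REQS_PER_THREAD))
echo "total requests: ${TOTAL}"
# Rough ab equivalent; -c maps JMeter threads to concurrent connections,
# -n is the total request count, -k enables HTTP keep-alive.
# The URL is a placeholder for the service under test.
echo "ab -k -c ${THREADS} -n ${TOTAL} http://10.0.0.1:8000/"
```

Note that ab's fixed connection pool only approximates JMeter's thread model, so breaking points found this way may differ slightly.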
>> Indeed, I had an easier time troubleshooting and debugging OSv unikernels than other unikernel platforms. Being POSIX compatible (as opposed to "clean slate" alternatives such as IncludeOS) really helps there.
>>
>> Which version of capstan did you use? Were you using the latest MIKELANGELO capstan - https://github.com/mikelangelo-project/capstan - that supports packages (similar to Docker Compose)?
>>
>> => The original Capstan from Cloudius Systems was used (https://github.com/cloudius-systems/capstan). I see much has changed in the newer version; time to catch up.
>>
>> You mention you had a problem running the Vert.x Java microservice on Java > 8. Could you please clarify what the exact problem was? We have a number of example apps (like https://github.com/cloudius-systems/osv-apps/tree/master/openjdk11-zulu-java-base) that demonstrate running a simple "hello world" even on the latest Java 11. I wonder if you hit an issue related to the non-isolated vs isolated mode of running Java. The isolated mode was the default before this commit - https://github.com/cloudius-systems/osv/commit/99dd1c5b521a0ab4642e79a2e992c50ad719f7c6 (after the 0.51.0 release) - and, unlike the non-isolated one, is not supported on Java > 8. I am also aware that our tiny run-java wrapper does not support new options added in Java 9 and beyond - like "--add-exports" - which might be necessary when running a Vert.x app, which uses Netty.
>>
>> => Actually, I did get Java to work properly with the included JDKs up to Java 10 or so. As I remember, my problem was in not getting minimal JREs to run correctly on OSv, resulting from the fact that those were cross-compiled from the JDK on my machine in a rather hackish way in an attempt to get them to run on OSv. However, that was not a problem for the tests; it was just a curiosity.
>> I'm guessing this could work if the entire JDK on the local machine is first recompiled to suit OSv, and minimal JREs to run on OSv are then built from it on the local machine. However, time was a bit short and I just dropped it.
>
> In general, OSv should be able to run any unmodified (no need to recompile) Linux JDK distribution. The best ones I have found are from Azul (Zulu). But Amazon recently started releasing their own OpenJDK distribution, Corretto (https://aws.amazon.com/corretto/), which I have not tried yet.
>>
>> Indeed, Python 3 was not supported as of 0.51.0, but it is supported now as of 0.52.0 - https://github.com/cloudius-systems/osv/releases/tag/v0.52.0
>>
>> => Nice! A lot of improvement in resource use too, and ffmpeg looks interesting for handling camera streams.
>
> BTW, I worked with some people from the EU MIKELANGELO project to get ffmpeg on OSv to do some video transcoding.
> Also, hopefully within the next week or two I will be cutting a new 0.53.0 release of OSv. The most exciting feature will be support for AWS Firecracker (https://firecracker-microvm.github.io/), which allows OSv to boot in 7ms. Stay tuned!
>>
>> As you have noticed, the Go wrapper does not affect performance. It was merely added to provide a workaround for TLS-related issues with Golang apps built as a shared library; please see the description of this commit - https://github.com/cloudius-systems/osv/commit/438008362a8ef74666b4e44af4b3205b86a52d06 - for details.
>>
>> => From the code, it was pretty clear the wrapper did not have any adverse effects, but I sort of had to confirm it for the paper. I did find the post you supplied, but didn't realize it was about TLS issues. Thanks.
>>
>> You did not mention it specifically, but I am guessing all the OSv images you built used the ZFS filesystem. Please note that even in 0.51.0 we had support for a simple Read-Only Filesystem (ROFS).
>> I think ROFS is an even better fit for microservice apps on OSv. You can build OSv images with ROFS using the latest MIKELANGELO capstan - https://github.com/mikelangelo-project/capstan/blob/master/Documentation/OsvFilesystem.md
>>
>> => Yes, all images were using ZFS. Considering how small the services are, I would guess they're kept entirely in memory, so I'm not sure how much of an improvement ROFS may be (not very familiar with that sort of thing), but it's definitely interesting for real-life stateless services.
>>
>> I must say I was a little astonished you were able to successfully test Golang apps built with --buildmode=pie. As you can see in https://github.com/cloudius-systems/osv/issues/352, OSv currently does not support TLS (Thread Local Storage) in local-exec mode (you probably saw the warning printed by OSv - "WARNING: XYZ.so is a PIE using TLS..."). But apparently Golang apps built as PIE do not use TLS, or we are just lucky. I was myself surprised I could run PIE Golang apps like httpserver without any issues.
>>
>> => Yes, that warning popped up on my screen plenty of times. But since everything worked, it didn't seem very important.
>>
>> I was surprised to hear about OSv scaling poorly in the multi-vCPU tests. I have not really tested OSv much on Xen, except for running it on AWS EC2 instances, so I do not know what the reasons might be. I will ask on our mailing list.
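For readers who want to try a similar multi-vCPU comparison locally, here is a minimal sketch. It assumes an OSv source checkout with a built image; the run.py flags (-c for vCPU count, -m for memory size) are assumptions based on OSv's helper scripts and should be verified against ./scripts/run.py --help:

```shell
# Hedged sketch: print the commands for a 1-vCPU vs 2-vCPU throughput
# comparison. Flag names are assumptions; verify against your checkout.
for VCPUS in 1 2; do
  echo "./scripts/run.py -c ${VCPUS} -m 2G"
  echo "ab -k -c 40 -n 100000 http://<guest-ip>:8000/"
done
```

Comparing the requests/second reported by ab for the two runs gives a quick read on multi-vCPU scaling.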
>> On the other hand, I must say that during casual tests on my MacBook Pro, with a 4-core hyper-threaded i7 and Ubuntu 18.10, I was able to see OSv scale pretty well with QEMU/KVM (I believe a type 2 hypervisor):
>>
>> - for example, I was able to see a 50-60% performance increase when going from 1 to 2 vCPUs
>> - also, with 2 vCPUs I was able to see 10-15% better performance on OSv compared to the same app running directly on the Linux host, which I must admit shocked me a little
>> - for my tests I was using this app - https://github.com/cloudius-systems/osv-apps/tree/master/golang-httpserver - and the ab tool (Apache Bench) to simulate load
>> - lastly, I saw even better performance from a microservice written in Rust (https://github.com/cloudius-systems/osv-apps/tree/master/rust-httpserver)
>>
>> => Actually, I ran some of the unikernels on VirtualBox first as a proof of concept. While VirtualBox gave comparatively disastrous numbers (I guess it's not really the best hypervisor), the effect of multithreading was the same there. At first, I thought my coding was to blame, but the problem seems to persist across programming languages (with varying impact). The only real lead I managed to discover was that most of the requests under Go are in fact handled way faster in a multi-threaded application, but a tiny percentage gets held up for up to a second, which seems to block general throughput a bit. The tests used a rather large connection pool (up to 40), so my best guess at the time was that the sheer number of concurrent requests coming in caused a few too many locks on something in the scheduler, but I don't really have experience with that level of programming. Since I didn't do any tests with 2 vCPUs, maybe there's an issue with larger vCPU pools? It does seem to be very language dependent, and I'm still not clear on Go's behavior in that case.
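To put the figures above in perspective: a 50-60% throughput gain when going from 1 to 2 vCPUs corresponds to roughly 75-80% parallel efficiency. A quick back-of-envelope check, using only the numbers quoted above:

```shell
# Parallel efficiency for 1 -> 2 vCPUs: (100% + gain) / 2 vCPUs.
for GAIN in 50 60; do
  echo "gain ${GAIN}% -> efficiency $(( (100 + GAIN) / 2 ))%"
done
```

So even the good KVM results quoted here leave some multi-core headroom, though far less than the Xen results discussed in the paper.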
>> Of course, OSv unikernels have excellent single-core performance, so if the scaling problem is fixed, or was somehow due to the way I tested, I'm pretty sure they would demolish containers or standard Linux in any comparison.
>>
>> I saw you mention that occasionally OSv would stop responding during multi-CPU tests. Could you please elaborate on this? I myself occasionally see a "lingering connection closed" issue, where a couple of requests never come back when OSv is fired a long batch (>100K) of requests by ab. However, in my case, if I simply restart the test with ab, OSv will continue responding. Is this similar to what you saw?
>>
>> => This sometimes happened when running Go unikernels without the wrapper, and in all languages when multi-threading was enabled. I don't seem to have saved any screenshots or output, but as I remember, all requests got locked up (and eventually gave a timeout) and I had to restart the entire test anyway, because the results were invalidated by the gap. I will try to get hold of the original test setup again to see if OSv still responds when new requests are started. Note that it was pretty rare, since it sometimes only happened after up to 10 million requests.
>>
>> I would also like to point out that most container deployments, at least in public clouds (even the AWS ECS offering), use virtual machines like EC2 instances, NOT bare-metal computers. So the difference in performance between containers and OSv, which can run directly as a guest OS on an EC2 VM, might be even more profound. Am I wrong? But that might not be fair to containers ;-)
>>
>> => Indeed, this was also mentioned a few times by reviewers and attendees at the conference. However, I was trying to get as close to bare metal as possible for both unikernels and containers, to make the comparison more "fair". I guess in real life it's even easier for unikernels to win...
>>
>> Have you considered testing Node.js microservices, or ones written in Rust? We also have a working example of running GraalVM Java native images on OSv.
>>
>> => This may happen in the future. The focus was put on Go, Java and Python because Java and Python are commonly used in my research team, and we're trying to switch some things to Go as well.
>>
>> Lastly, besides improving the performance of multithreaded apps, what else would you like to see enhanced or improved in OSv? Interested in your thoughts.
>>
>> Looking forward to your reply, and sorry for the long email. Thanks in advance.
>>
>> My regards,
>> Waldek
>>
>> PS. Please follow us on Google Groups (https://groups.google.com/forum/#!forum/osv-dev) and on Twitter at #OSv_unikernel
>
> --
> You received this message because you are subscribed to the Google Groups "OSv Development" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
