I wonder if anyone has had a chance to read this paper. I would like to hear 
what others think about the reasons the OSv thread scheduler does not scale 
well with the number of vCPUs.

On Thursday, March 7, 2019 at 7:56:42 AM UTC-5, Waldek Kozaczuk wrote:
>
> Hi,
>
> I am forwarding here my exchange with the author of the paper about OSv on 
> Xen beating Docker in a single-vCPU setup. Here is the link to the 
> article:
> https://biblio.ugent.be/publication/8582433/file/8582438.pdf
>
> Enjoy,
> Waldek
> ---------- Forwarded message ---------
> From: Waldek Kozaczuk <[email protected]>
> Date: Sat, Mar 2, 2019 at 5:17 PM
> Subject: Re: Unikernel performance review paper inquiry
> To: Tom Goethals <[email protected]>
>
>
> Tom,
>
> Thanks for replying to my email. 
>
> Do you mind if I forward our email exchange to the OSv development group 
> [email protected]? I think that others would also be very 
> interested in your article and findings. You might also get some good 
> insight from OSv users and developers more experienced than me.
>
> Please see my comments to some of your comments below.
>
> Regards,
> Waldek
>
> On Tue, Feb 26, 2019 at 5:19 AM Tom Goethals <[email protected]> 
> wrote:
>
>> Hello Waldek,
>>
>>
>> Thank you for the thorough read of my paper. It's interesting to see it 
>> from the perspective of an OSv contributor. I have to admit I was very new 
>> to unikernels (I had barely started my PhD) when I wrote the paper, so I 
>> may have missed some things. Considering the number of bullet points, I 
>> inserted my answers/comments below for each point.
>>
>>
>> Generally, I did choose OSv as a focus because it seemed (by far) the 
>> most mature and stable platform to create and run unikernels with, so it's 
>> good to see new features and compatibility for languages being added. In 
>> the future, I would really like to do some research on mixed 
>> container-unikernel deployments with a single orchestration platform, but 
>> someone else may very well beat me to it :) 
>>
>>
>> During the tests I did not find any large problems or lack of features in 
>> OSv, so I have little to add there. In fact, once I got the hang of it, OSv 
>> unikernels were quite easy to build and run on a lot of hypervisors. Most 
>> problems were about getting Python3 to work, but that has apparently been 
>> fixed. Note that the tests were mostly centered on network performance and 
>> simple workloads, so more in-depth future work could still result in 
>> suggestions.
>>
>>
>> Regards,
>>
>> Tom
>> ------------------------------
>> *From:* Waldek Kozaczuk <[email protected]>
>> *Sent:* Tuesday, February 26, 2019 0:15
>> *To:* Tom Goethals
>> *Subject:* Unikernel performance review paper inquiry 
>>  
>> Hi,
>>
>> Congratulations on writing this interesting and thorough paper - 
>> https://biblio.ugent.be/publication/8582433/file/8582438.pdf !!! Indeed 
>> it is probably the first paper trying to compare the performance of OSv 
>> and Docker containers.
>>
>> I hope this email finds the authors of this paper as I would like to ask 
>> some questions regarding it as well as clarify/explain some observations 
>> about OSv you touched on.
>>
>> First I wanted to introduce myself. My name is Waldemar Kozaczuk and I am 
>> one of the OSv committers. Though I am not one of the original authors of 
>> OSv, I have been playing with it and contributing to it since 2015. My 
>> major contributions have been:
>>
>>    - Implementing ROFS (Read-Only FS)
>>    - Adding support for Golang and Python 3
>>    - Enhancing OSv to make it run on AWS Firecracker
>>
>> I must say I have been very pleased to learn how well OSv did in the 
>> single-vCPU performance tests, but also a little disappointed that OSv 
>> did not fare that well in the multi-vCPU tests ;-) On the other hand, I am 
>> not surprised by the results of the workload tests. I would love to be 
>> able to reproduce them myself at some point. I noticed you refer to this 
>> project, https://github.com/togoetha/unikernels-v-containers, where I 
>> found the source code and build scripts for the OSv and Docker images. 
>> However, I could not locate any scripts or JMeter setups that would let 
>> me run those tests. Could you possibly point me to those?
>> => I did not actually make any JMeter scripts. The JMeter GUI was used to 
>> find the breaking point for each service/language, but admittedly it was 
>> time-consuming work that could have been handled better. I can however tell 
>> you the basic settings: 40 threads, started simultaneously (no buildup), 
>> each sending 50000 requests ASAP. Results reflect the number of responses 
>> per second and latency. Some fiddling was done with the settings (fewer 
>> threads, more threads, varying buildup, ...), but the best results were 
>> actually gained by simply unleashing concurrency hell on the services and 
>> seeing how fast they could handle it. It's somewhat possible the real 
>> limits were slightly higher, but ironically the machine used to generate 
>> the load with JMeter couldn't go any higher.
>>
>> Let me structure the remaining part of this email as a list of bullet 
>> points referring to your article: 
>>
>>    - You mention in the introduction that unikernels are hard to debug 
>>    and lack good debugging tools. I would agree with that, but also point 
>>    out that OSv shines in this respect: 
>>       - it can be easily debugged with gdb - 
>>       https://github.com/cloudius-systems/osv/wiki/Debugging-OSv#debugging-osv-with-gdb
>>       - it provides a management and monitoring REST API - 
>>       http://osv.io/api/swagger-ui/dist/index.html - and an HTML5 terminal 
>>       app - https://github.com/wkozaczuk/osv-html5-terminal
>>       - it can be profiled - 
>>       https://github.com/cloudius-systems/osv/wiki/Trace-analysis-using-trace.py
>>       - => Debugging was mostly a general remark about the state of the 
>>       art in unikernels. Indeed, I had an easier time troubleshooting and 
>>       debugging OSv unikernels than other unikernel platforms. Being POSIX 
>>       compatible (as opposed to "clean slate" alternatives such as 
>>       IncludeOS) really helps there.
>>    - Which version of Capstan did you use? Were you using the latest 
>>    mikelangelo capstan - https://github.com/mikelangelo-project/capstan - 
>>    which supports packages (similar to Docker Compose)?
>>       - => The original Capstan from Cloudius Systems was used 
>>       (https://github.com/cloudius-systems/capstan). I see much has 
>>       changed in the newer version, time to catch up.
>>    - You mention you had a problem running the vertx Java microservice on 
>>    Java > 8. Could you please clarify what the exact problem was? We have 
>>    a number of example apps (like 
>>    https://github.com/cloudius-systems/osv-apps/tree/master/openjdk11-zulu-java-base) 
>>    that demonstrate running a simple "hello world" even on the latest 
>>    Java 11. I wonder if you hit an issue related to the non-isolated vs 
>>    isolated mode of running Java. The isolated mode was the default before 
>>    this commit 
>>    https://github.com/cloudius-systems/osv/commit/99dd1c5b521a0ab4642e79a2e992c50ad719f7c6 
>>    (after the release of 0.51.0) and, unlike the non-isolated one, is not 
>>    supported on Java > 8. I am also aware that our tiny run-java wrapper 
>>    does not support new options added in Java 9 and beyond, like 
>>    "--add-exports", which might be necessary when running a vertx app 
>>    that uses netty.
>>       - => Actually, I did get Java to work properly with the included 
>>       JDKs up to Java 10 or so. As I remember, my problem was in not 
>>       getting minimal JREs to run correctly on OSv, resulting from the fact 
>>       that those were cross-compiled from the JDK on my machine in a rather 
>>       hackish way in an attempt to get them to run on OSv. However, that 
>>       was not a problem for the tests, it was just a curiosity. I'm 
>>       guessing this could work if the entire Java JDK on the local machine 
>>       is first recompiled to suit OSv, and minimal JREs are then built from 
>>       it on the local machine to run on OSv. However, time was a bit short 
>>       and I just dropped it.
>>    
> In general OSv should be able to run any unmodified Linux JDK distribution 
> (no recompilation needed). The best I have found are from Azul (Zulu). But 
> Amazon recently started releasing their own OpenJDK distribution, Corretto 
> (https://aws.amazon.com/corretto/), which I have not tried yet.
>
>>
>>    - Indeed, Python 3 was not supported as of 0.51.0 but is supported 
>>    now as of 0.52.0 - 
>>    https://github.com/cloudius-systems/osv/releases/tag/v0.52.0
>>       - => Nice! A lot of improvement in resource use too, and ffmpeg looks 
>>       interesting for handling camera streams.
>>    
> BTW I worked with some people from the EU Mikelangelo project to get ffmpeg 
> on OSv to do some video transcoding. 
> Also, hopefully within the next week or two I will be trying to cut a new 
> 0.53.0 release of OSv. The most exciting feature will be support for AWS 
> Firecracker (https://firecracker-microvm.github.io/), which allows OSv to 
> boot in 7ms. Stay tuned!
>
>>
>>    - As you have noticed, the Go wrapper does not affect performance. 
>>    It was merely added as a workaround for TLS-related issues with 
>>    Golang apps built as shared libraries; please see the description of 
>>    this commit for details: 
>>    https://github.com/cloudius-systems/osv/commit/438008362a8ef74666b4e44af4b3205b86a52d06
>>       - => From the code, it was pretty clear the wrapper did not have any 
>>       adverse effects, but I sort of had to confirm it for the paper. I 
>>       did find the post you supplied, but didn't realize it was for TLS 
>>       issues. Thanks.
>>    - You have not mentioned it specifically, but I am guessing all OSv 
>>    images you built used the ZFS filesystem. Please note that even with 
>>    0.51.0 we had support for a simple Read-Only Filesystem (ROFS). I think 
>>    ROFS is an even better fit for microservice apps on OSv. You can build 
>>    OSv images with ROFS using the latest mikelangelo capstan - 
>>    https://github.com/mikelangelo-project/capstan/blob/master/Documentation/OsvFilesystem.md
>>       - => Yes, all images were using ZFS. Considering how small the 
>>       services are, I would guess they're kept entirely in memory so I'm 
>>       not sure how much of an improvement ROFS may be (not very familiar 
>>       with that sort of thing), but it's definitely interesting for 
>>       real-life stateless services.
>>    - I must say I was a little astonished you were able to successfully 
>>    test Golang apps built with --buildmode=pie. As you can see in 
>>    https://github.com/cloudius-systems/osv/issues/352, OSv currently does 
>>    not support TLS (Thread Local Storage) in local-exec mode (you probably 
>>    saw this warning printed by OSv - "WARNING: XYZ.so is a PIE using 
>>    TLS..."). But apparently Golang apps built as PIE do not use TLS, or we 
>>    are just lucky. I was surprised myself that I could run PIE Golang apps 
>>    like httpserver without any issues.
>>       - => Yes, that warning popped up on my screen plenty of times. But 
>>       since it worked, it didn't seem very important. 
>>    - I was surprised to hear about OSv scaling poorly in the multi-CPU 
>>    tests. I have not really tested OSv much on Xen, except for running it 
>>    on AWS EC2 instances, so I do not know what the reasons for that might 
>>    be. I will ask on our mailing list. On the other hand, I must say that 
>>    during casual tests on my MacBook Pro with 4 hyper-threading i7 cores 
>>    and Ubuntu 18.10 on it, I was able to see OSv scale pretty well with 
>>    QEMU/KVM (I believe a type 2 hypervisor): 
>>       - for example, I was able to see a 50-60% performance increase when 
>>       going from 1 to 2 vCPUs 
>>       - also, with 2 vCPUs I was able to see 10-15% better performance on 
>>       OSv compared to the same app running directly on the Linux host, 
>>       which shocked me a little, I must admit
>>       - for my tests I was using this app, 
>>       https://github.com/cloudius-systems/osv-apps/tree/master/golang-httpserver, 
>>       and the ab tool (Apache Bench) to simulate load
>>       - lastly, I saw even better performance from a microservice written 
>>       in Rust 
>>       (https://github.com/cloudius-systems/osv-apps/tree/master/rust-httpserver)
>>       - => Actually, I ran some of the unikernels on VirtualBox first as 
>>       proof of concept. While VirtualBox gave comparatively disastrous 
>>       numbers (I guess it's not really the best hypervisor), the effect of 
>>       multithreading was the same there. At first, I thought my coding was 
>>       to blame, but the problem seems to persist across programming 
>>       languages (with varying impact). The only real lead I managed to 
>>       discover was that most of the requests under Go are in fact handled 
>>       way faster in a multi-threaded application, but a tiny percentage 
>>       gets held up for up to a second, which seems to block general 
>>       throughput a bit. The tests used a rather large connection pool (up 
>>       to 40), so my best guess at the time was that the sheer number of 
>>       concurrent requests coming in caused a few too many locks on 
>>       something in the scheduler, but I don't really have experience with 
>>       that level of programming. Since I didn't do any tests with 2 vCPUs, 
>>       maybe there's an issue with larger vCPU pools? It does seem to be 
>>       very language dependent, and I'm still not clear on Go's behavior in 
>>       that case. Of course, OSv unikernels have excellent single-core 
>>       performance, so if the scaling problem is fixed or was somehow due 
>>       to the way I tested, I'm pretty sure they would demolish containers 
>>       or standard Linux in any comparison.
>>    - I saw you mentioned that occasionally OSv would stop responding 
>>    during the multi-CPU tests. Could you please elaborate on this? I myself 
>>    occasionally see a "lingering connection closed" issue, where a couple 
>>    of requests never come back when OSv gets hit with a long batch (>100K) 
>>    of requests by ab. However, in my case, if I simply restart the test 
>>    with ab, OSv will continue responding. Is this similar to what you saw?
>>       - => This sometimes happened when running Go unikernels without the 
>>       wrapper, and in all languages when they were multi-thread enabled. I 
>>       don't seem to have saved any screenshots or output, but as I remember 
>>       all requests got locked up (and eventually gave a timeout) and I had 
>>       to restart the entire test anyway because the results got 
>>       invalidated by the gap. I will try to get hold of the original test 
>>       setup again to see if OSv still responds if new requests are 
>>       started. Note that it was pretty rare, since it sometimes only 
>>       happened after up to 10 million requests.
>>    - I would also like to point out that most container deployments, at 
>>    least in public clouds (even the AWS ECS offering), use virtual machines 
>>    like EC2 instances, NOT bare-metal computers. So the difference in 
>>    performance between containers and OSv, which can run directly as a 
>>    guest OS on an EC2 VM, might be even more profound. Am I wrong? But that 
>>    might not be fair to containers ;-)
>>       - => Indeed, this was also mentioned a few times by reviewers and 
>>       attendees at the conference. However, I was trying to get as close 
>>       to bare metal as possible for both unikernels and containers to make 
>>       the comparison more "fair". I guess in real life it's even easier 
>>       for unikernels to win...
>>    - Have you considered testing Node.js microservices or ones written in 
>>    Rust? We also have a working example of running GraalVM Java native 
>>    images on OSv.
>>       - => This may happen in the future. The focus was put on Go, Java and 
>>       Python because Java and Python are commonly used in my research team, 
>>       and we're trying to switch some things to Go as well.
>>       
>> Lastly, besides improving the performance of multithreaded apps, what else 
>> would you want to be enhanced or improved in OSv? I am interested in your 
>> thoughts. 
>>
>> Looking forward to your reply, and sorry for the long email. Thanks in 
>> advance.
>>
>> My regards,
>> Waldek
>>
>> PS. Please follow us on Google groups (
>> https://groups.google.com/forum/#!forum/osv-dev) and on Twitter at 
>> #OSv_unikernel
>>
>>
>>
