[solo5] Stepping down from maintaining Solo5 and MirageOS
Dear friends and colleagues,

effective today, I am stepping down from both maintaining Solo5 and from the core team of MirageOS.

- What does that mean?

I will no longer monitor pull requests or issues filed on repositories in the Solo5/ or mirage/ organisations on GitHub, with the exception of the issue of appointing a new maintainer for Solo5. I am giving up general commit privileges for repositories in mirage/.

Regarding maintainership of Solo5: given that MirageOS is the primary downstream of Solo5, I would prefer that members of the MirageOS community step up to continue maintenance of Solo5. Failing that, perhaps there is someone in the wider community who is interested. I will file an issue to discuss this on GitHub shortly.

- Why?

I burnt out on Solo5, MirageOS and related projects some time in the spring of this year. I then took a six-month sabbatical, with the somewhat arbitrarily timed intent of returning to work on Solo5 and MirageOS in October this year. Over the course of the past month, I have found myself unable to so much as merge existing PRs in Solo5 and dependent projects for the upcoming MirageOS 4 release. Regardless of whether this means I have yet to fully recover from my burnout, or that as part of recovery I must move on to new projects and different challenges, it would be irresponsible of me to cling to my "ownership" of Solo5 and/or maintainership of dependent MirageOS projects any longer, and block the continued work of others.

- Why are you telling us this?

This was a hard call, as was the call to write about it publicly. I believe in being open and transparent with the community, and being so helps me get a sense of closure on my decision.

- I have other questions!

Please feel free to send them directly to me.
With thanks to @yomimono for inspiring the general structure of this e-mail, and seeing that she wrote something similar publicly and survived :-) Looking forward to a time when I can get excited about work again, whatever that may be, Best, -mato
[solo5] Solo5 v0.6.8 released
Hi,

Solo5 v0.6.8 is now available. From the release notes:

## 0.6.8 (2020-12-17)

Bug fixes:

* xen: Fix corrupted console output by correctly waiting for xenconsoled to consume all data. (#490)
* hvt: Free kvm_cpuid after use. (#485)

New features:

* xen: Add multiboot support. (#482)

Acknowledgements:

* Thanks to Marek Marczykowski-Górecki (@marmarek) for implementing multiboot support for the Xen target.

And, since I forgot to send an announcement for v0.6.7 to the list, here are the release notes for that release:

## 0.6.7 (2020-10-08)

Bug fixes:

* spt: Fix a bug where libseccomp's seccomp_load() would cause the tender to be killed by SIGSYS due to calling free(), which would under certain conditions call sbrk(), which is not in our seccomp filter. Work around this behaviour by exporting the generated BPF filter and loading it into the kernel manually. (#479)
* Various improvements to the Xen bindings for use by MirageOS. (#480, #476)

Enjoy,

-mato
[solo5] Solo5 v0.6.6 released
Hi,

Solo5 v0.6.6 is now available. From the release notes:

New features:

* This release adds minimal bindings for the Xen hypervisor. Hosts running Xen 4.10 or later on x86_64 are supported, and the bindings build Solo5 unikernels as PVHv2 domUs only. These bindings are not a full Solo5 target; they exist for the purpose of providing low-level bootstrap code to MirageOS and do not provide any network or block I/O functionality.

Bug fixes:

* genode: Do not yield for expired deadlines. (#466)

This release is part of the ongoing work to replace the MirageOS Xen platform stack with a new, legacy-free codebase. See [1] for details, along with instructions on how to test the work in progress.

Enjoy,

-mato

[1] https://github.com/mirage/mirage/issues/1159
[solo5] Solo5 v0.6.5 released
Hi,

Solo5 v0.6.5 is now available. From the release notes:

New features:

* Stop host kernels from attempting to execute Solo5 binaries. This improves both the user experience on some hosts (e.g. "No such file or directory" vs. "Segmentation fault" on Linux) and overall security posture by forcing the host kernel ELF loader to bail out earlier rather than actually jumping to the unikernel code. (#442)
* hvt: Full support for W^X and correct guest-side page protections on OpenBSD 6.7+ systems with EPT. (#447)
* hvt: capsicum(4) sandbox for the hvt tender on FreeBSD 12+. (#366)

Bug fixes:

* hvt: Fix hang in `HVT_HYPERCALL_POLL`. On Linux hosts, if `solo5_yield()` was called with a deadline that has already passed and the unikernel was not using any network devices, then the underlying hypercall would hang forever. Not known to affect any existing code in production. (#460)

Other notable changes:

* muen: Muen ABI updates, now uses ABI version 2 on the Solo5 side. Muen commit f10bd6b or later is required. (#454, #448)
* genode: Support for Genode is limited by toolchain issues and Genode bindings are no longer built by default. (#446, see also ocaml/opam-repository#16368)
* Improvements to the build system on BSD/clang hosts. System headers (sys/endian.h, osreldate.h) that were mistakenly being installed into the Solo5-provided include paths have been removed. For OCaml/MirageOS users, ocaml-freestanding 0.6.0 or later is now required. (#453, #455, #457, #461, see also mirage/ocaml-freestanding#77)
* Improvements to built-in self tests. (#451)
* Fix build failures with GCC >= 10. (#459)

Known issues:

* Full W^X support / correct guest-side page protections are currently only available on the "spt" target on Linux, and the "hvt" target on OpenBSD 6.7 or later. (#303)
* On OpenBSD, "hvt" operation with multiple network devices results in packet loss. This appears to be a bug in kqueue(2), but we have no confirmation from upstream. (#374)
* virtio-net is not functional on at least QEMU 5.0 and possibly earlier versions. QEMU versions up to and including 3.1.0 are known to work. (#463)

Acknowledgements:

* Thanks to Adam Steen (@adamsteen) for pushing for OpenBSD kernel support for manipulating guest EPT mappings, bringing full W^X to hvt on OpenBSD 6.7 or later.
* Thanks to Adrian-Ken Rueegsegger (@kensan) for the Muen updates.
* Thanks to Anurag Soni (@anuragsoni) for diagnosing and fixing the build on systems with GCC >= 10.
* Thanks to Hannes Mehnert (@hannesm) for diagnosing #460, for help with testing BSD/clang build system changes and for generally helping out.
* Thanks to Stefan Grundmann (@sg2342) for the capsicum(4) hvt tender sandbox on FreeBSD.

For MirageOS users, this release should be available in OPAM within 24 hours, barring any major showstoppers found by CI.

Enjoy,

-mato
[solo5] Solo5 v0.6.4 released
Hi all,

Solo5 0.6.4 is now available. This release updates the Genode and Muen bindings, and fixes the following notable issues:

* Disk images produced by solo5-virtio-mkimage were not bootable due to changes in recent SYSLINUX versions (thanks to Ricardo Koller, @ricarkol, for finding the fix).
* Build failure on FreeBSD 12.1+ (clang 8.0.1) due to issues with TLS in the toolchain.

New features:

* "configure.sh" has a new "--only-tools" option, which builds no tenders or bindings. In conjunction with "make install-tools", this can be used to install "solo5-elftool" only. This is intended for downstream systems which need to make use of "solo5-elftool" to query manifest information.
* A "scripts/opam-release.sh" tool has been added to generate OPAM repository metadata as part of the Solo5 release process.

Apart from that, this release contains documentation updates, and the "hvt" tender on the aarch64 architecture is now considered production-quality.

Regards,

Martin
[solo5] Solo5 0.6.3 released
Hi all,

I have released Solo5 0.6.3. This release fixes a build problem with Linux distributions, such as OpenSUSE, which install libseccomp headers into subdirectories of /usr/include. Note that this introduces pkg-config as a dependency for building Solo5 on Linux.

No functional changes.

Martin
[solo5] Solo5 0.6.2 released
This release fixes further OPAM/MirageOS installation problems found by CI, and the following build problems:

* spt: Support ppc64le on RedHat (thanks to Stefan Berger, @stefanberger).
* Fix check/warning for libseccomp >= 2.3.3 (thanks to Mechiel Lukkien, @mjl-).

No functional changes.

Martin
[solo5] Solo5 0.6.1 released
Why make one, when you can do two releases in a day!

This release fixes OPAM/MirageOS installation problems found by CI. No functional changes.

Additionally, the following entry was missed from the changes for 0.6.0:

* Solo5 tenders, tools and bindings now have an embedded version number, using the well-known convention "v0.6.0-4-gc9786d87". Bindings will log the version number during start-up. As a consequence of this, Solo5 now needs to be built from either a Git tree or a tarball produced by "make distrib". Suitable tarballs are uploaded to GitHub during the release process.

Cheers,

Martin
[solo5] Solo5 v0.6.0 released
Dear all,

I've just released Solo5 version 0.6.0. Highlights from the (long!) changelog:

This is a major feature release which introduces the concept of an "application manifest", enabling support for multiple network and block devices.

This release removes the compile-time specialization of the "hvt" tender. While this was a nice experiment, it is not practical for real-world deployment scenarios, where it is expected that the party supplying the tender (i.e. the operator/user) will be different from the party supplying the unikernel (i.e. the developer).

Due to these and other changes, both the public Solo5 APIs (as defined and documented in solo5.h) and the internal tender/bindings ABI have changed. Likewise, the build process for Solo5-based unikernels has changed, and downstream projects will need to be updated. Please refer to the full CHANGES.md and the Solo5 documentation for details.

MirageOS/Solo5 releases will be coming over the next few days.

Enjoy,

Martin
Re: [solo5] OpenBSD and spt
Hi Adam,

On Friday, 26.04.2019 at 06:00, Adam Steen wrote:
> Good Afternoon all
>
> Is there any appetite for a cross platform support in solo5 SPT?

It's certainly something that's possible, but the state of the spt code in general is a bit too raw to consider abstracting it to target different host kernels just yet. Also, I don't think there is as much of a pressing need for spt on OpenBSD as on Linux -- I'd expect most people running OpenBSD on amd64 to be doing so on bare metal, i.e. with access to the CPU virtualization hardware, where you can just run hvt.

> The system call restriction would be very easy, and I expect the other code
> to compile with very little changes.

What would you use for implementing the syscall restrictions? The granularity of pledge(2) is different to that of seccomp/BPF...

-mato
Re: [solo5] OpenBSD Support
On Wednesday, 20.03.2019 at 10:29, Martin Lucina wrote:
> Hi Adam,
>
> On Wednesday, 20.03.2019 at 05:54, Adam Steen wrote:
> > Hi All
> >
> > With the soon to be released OpenBSD 6.5, having Opam 2.0.3 and OCaml
> > 4.07.1 as packages, I thought i would open discussion about which release
> > of OpenBSD should be supported by Solo5.
>
> That decision is mainly up to you as the de facto maintainer of the OpenBSD
> support.
>
> > I was thinking it should be the latest stable release, ie as of the release
> > of 6.5, Not Current, it just seems too much of a moving target.
>
> Personally I'd keep it simple (and reduce the amount of work involved), and
> stick to supporting 6.5 *only*.

Actually, we should continue to support 6.4, as that now has CI (added this week, finally!), and then 6.5 only after it's released. It's impossible for me to track -current for CI. IOW, we should strive to support the current releases (N, N - 1).

-mato
[solo5] Heads up: Build system refactoring and hvt changes
Hi all,

I've just merged a significant change on master in #345 which affects both users and developers going forward. I'm sending the notes from the commit log out here in lieu of updating documentation, as I'll be offline for the next two weeks starting today.

As a precursor to this change, the Solo5 CI has been updated with end-to-end tests using MirageOS in #340, and OpenBSD 6.4 has been added to the CI (currently failing tests, which may be due to a misconfiguration on the CI node; no time to investigate right now).

Those of you who have outstanding PRs, I'd appreciate it if you rebase them on master while I'm away; this will also ensure they get tested properly using the new CI additions.

Thanks,

Martin

Changelog from #345 follows:

commit 2f1cae37d8b9f97826ea08934558a4360bdb13da
Author: Martin Lucina
Date: Thu Feb 21 16:26:49 2019 +0100

Refactor build system, remove hvt compile-time specialization

This is a large change, and the majority of it is effectively a full re-write of the build system. The actual removal of hvt compile-time specialization is fairly straightforward.

User-visible changes:

- 'configure.sh' now needs to be run manually before running 'make'; this is consistent with POLA.
- Conversely, 'make clean' no longer cleans Makeconf. Use 'distclean' or 'clobber' for that.
- 'configure.sh' will now print the targets that can (will) be built on this system. The strategy is still "build everything we can", however I have disabled Genode on all systems except Linux due to toolchain issues.
- You can now build a subset of targets from the top-level 'make', by specifying 'CONFIG_XXX=' (disable) or 'CONFIG_XXX=1' (enable) either on the command line, or by editing the generated Makeconf.
- Makefiles use silent rules by default. To get the old verbose ones back, use 'make V=1'.
- The 'solo5-hvt' tender is no longer "specialized" to the unikernel. We build two tenders: 'solo5-hvt' with all non-debug modules configured, and 'solo5-hvt-debug' with additional debug modules (gdb, dumpcore where available).
- 'solo5-hvt-configure' is kept around for now for backward compatibility with OPAM/MirageOS, but is essentially a NOP.

Developer-visible changes:

- The build system now has proper support for auto-generation of dependencies. This means you can safely edit source files, run make and be sure you will get a complete incremental build.
- Makefiles have been refactored to use common best practices: no repetition, consistent variable names and clear interfaces between configure.sh/Makeconf/Makefiles, all the while keeping them simple enough to understand for me on a Monday morning before coffee. I.e. limit use of macros, eval, etc.
- hvt tender modules are no longer defined by compile-time flags; instead, a dynamic array is placed into a special ELF section (.modules). This means that a hvt tender binary can be combined from an arbitrary set of hvt_module_XXX object files, which is the right way to do things going forward and also simplifies the build system (by not needing to build multiple targets from the same set of sources).

Shortcomings / TODOs:

- Dependency files (*.d) are stored in-tree. I spent several days trying to figure out how to get them to work out of tree, but in combination with the non-recursive use of subdirectories in 'bindings' I could not figure out the required Makefile magic.
- HVT_DROP_PRIVILEGES=0 is non-functional with the new modules arrangement, but needs a re-design anyway.

Other changes included as part of this PR:

- Revert privilege dropping on FreeBSD (see discussion in #282).
- The build system changes effectively implement option 1 in #292, i.e. on x86_64 -mno-red-zone is only used for bindings, not for application code.
- tests/tests.bats has been refactored for DRY as it was getting totally unmaintainable.
Re: [solo5] Multiple consoles/Debug API, result type
Hi Daniel,

On Sunday, 24.03.2019 at 16:26, m...@daniel-mendler.de wrote:
> I have another feature request, which came up while working with solo5.

[...]

I'm leaving on a logistically somewhat complicated vacation in four days' time and will be offline until the 15th of April. As there are things in progress (such as the build system refactoring / de-specialization of hvt in [1]) that I'd like to get merged before I leave, I'd like to ask that we defer these discussions until after I get back. I don't have the mental bandwidth right now to both hold a coherent high-level design discussion and complete my immediate priorities :-(

To be clear: I appreciate your suggestions and I'm not against adding features, or re-thinking the Solo5 APIs, but it needs to be done calmly and thoughtfully -- as the maxim goes, "no is temporary but yes is forever".

In the meantime, if you've not yet listened to my Solo5 talk from this year's FOSDEM, I'd appreciate it if you'd take the time to do that. You can find the talk at [2]. This will hopefully make the overall project goals clearer.

Thanks,

-mato

[1] https://github.com/Solo5/solo5/issues/326
[2] https://fosdem.org/2019/schedule/event/solo5_unikernels/
Re: [solo5] Memory management
On Thursday, 21.03.2019 at 15:54, m...@daniel-mendler.de wrote:
> Hi Martin,
>
> > I am not sure about the "dynamic part". I would either set the stack
> > from outside (as discussed above), or allow some kind of memconfig
> > call, which can be executed only once after application startup.
> > Alternatively you could provide munmap and munmap_done. Dynamic people
> > would just never call munmap_done...
>
> I thought a bit more about this idea of adding a munmap call and I don't
> really like it.
> Maybe the goal should be extended to allow multiple different memory areas,
> randomly distributed in the address space to leverage ASLR.

ASLR is a separate topic in itself; see here for a rough plan of what needs to be done: https://github.com/Solo5/solo5/issues/304

> One of those memory areas is the stack then. This could work either by
> configuring the memory areas statically from outside or dynamically inside
> the application. The application should start with a fixed small stack size.
> To allocate heap areas or a larger stack, solo5 should simply offer
> solo5_mem_alloc, which allows to allocate page sized heap blocks until the
> available heap memory is exhausted.

No, this has already been discussed before. Dynamic memory allocation is not on the cards. Most recently: https://github.com/Solo5/solo5/issues/335#issuecomment-472499246 and earlier: https://github.com/Solo5/solo5/issues/223.

Providing just a "munmap()" and nothing else might be a simpler way to get guard pages, or it might not. Anyway, one thing at a time. As I mentioned in my other email, let's ignore multiple stacks for now and just concentrate on how a separate stack region could be done.

Martin
Re: [solo5] Porting a runtime to Solo5
Hi Daniel,

On Thursday, 21.03.2019 at 10:29, m...@daniel-mendler.de wrote:
> I have one unrelated question - the Solo5 API exposes the heap as one
> continuous block of memory, where the stack is at the top. Is there some way
> to ensure that the growing stack does not overwrite the heap, by using some
> red zone pages?

Not at the moment.

> I can manually limit stack growth by limiting recursion etc,
> but I would prefer to have physically enforced guarantees. Would it make
> sense to change the API in such a way that stack and heap are kept fully
> separate? For example at compile time a fixed stack size is specified and
> the stack is kept at a different location from the heap. The Solo5 API
> should then pass heap_start, heap_size, stack_start, stack_size to the
> application.

This is a good idea and I've already thought about it several times. In general, it should be possible to implement in a way that is portable, with a fallback to using a single memory region under the hood for targets where it's either not possible, not worth the implementation complexity or just not done yet. On such targets you'd not get a red zone, but your application would still run.

However, there are two things to consider:

1) How is the default stack size determined / who gets to choose it and when? At the moment things are nice and simple, e.g. for 'hvt' you just run with "solo5-hvt --mem=XYZ ..." which gives you an easy way to say how much *maximum* resource you want to commit. *Requiring* users to specify a separate stack size up front at run time seems like exposing an unnecessary detail, not to mention that a lot of people will not know what a reasonable value is.

2) I can imagine that the IncludeOS people on this list will immediately pipe up with "Ha! Now we want multiple stacks, *dynamically allocated at run time*, each in its own region", which opens up its own can of worms entirely.

An option I thought of which addresses the second point is extending the Solo5 API to provide a single virtual-memory manipulation call, specifically the equivalent of munmap(). The call would be bounds-checked against the heap region, so attempting to touch anything else will just fail. This would make guard page setup the responsibility of the libOS (ocaml-freestanding in the Mirage case) and allow for "carving up" your memory region as you see fit.

The main issue I have with that is that for 'spt' (the seccomp-based unikernel-as-user-process target on master) it opens up (some) access to the host munmap() system call. Now, there have been local privilege escalation bugs on Linux involving "only" munmap() and clone(). Granted, we don't allow clone(), but who's to say there won't be one involving "only" munmap() and poll() :-/

So, it's unclear what the way forward is here... all ideas welcome.

Martin
Re: [solo5] OpenBSD Support
Hi Adam,

On Wednesday, 20.03.2019 at 05:54, Adam Steen wrote:
> Hi All
>
> With the soon to be released OpenBSD 6.5, having Opam 2.0.3 and OCaml 4.07.1
> as packages, I thought i would open discussion about which release of OpenBSD
> should be supported by Solo5.

That decision is mainly up to you as the de facto maintainer of the OpenBSD support.

> I was thinking it should be the latest stable release, ie as of the release
> of 6.5, Not Current, it just seems too much of a moving target.

Personally I'd keep it simple (and reduce the amount of work involved), and stick to supporting 6.5 *only*.

Cheers,

-mato
Re: [solo5] Plans to support WebAssembly as a target
Hi Jonathan,

(meta: You're not subscribed to the list, so your message got moderated.)

On Sunday, 17.02.2019 at 10:07, Jonathan Beri wrote:
> I enjoyed the recent presentation given at FOSDEM 2019
> <https://fosdem.org/2019/schedule/event/solo5_unikernels/> by Martin
> Lucina & Ricardo Koller. At the end, Martin mentioned an idea to support
> WebAssembly as a target. How developed is the idea and has any progress
> been made in that direction?

At this stage it's just an idea. I ran out of time towards the end of the talk, so, to elaborate a bit on this:

- This is about using WebAssembly as a way to get arch-independence for Solo5-based unikernels, i.e. run as a component on a server with something like wasmjit, not in a browser.

- I've not looked into WebAssembly in detail at all. From my possibly naive understanding, it should be relatively easy to use something like an Emscripten-based cross toolchain to produce Solo5 WebAssembly "binaries" (whatever they are).

- The Solo5 design mandates that you get no more executable pages after loading the unikernel binary. This precludes the use of a JIT. While others have asked [1] to relax this, I'm very wary of doing so, as it significantly reduces "security posture" and generally goes against the design goal of keeping the system "static". Having said that, it looks like the "hack" described in that issue will have to go in in the short term. Possible solutions to this:

  1. Either keep the "no more executable pages" rule, which implies an AOT compilation step at/around deployment time (from wasm to native code).

  2. Or, figure out a minimal, as-safe-as-possible interface at the Solo5 layer which allows a JIT to function while *enforcing* W^X on the various targets. I'm not sure this is possible, as it would imply e.g. in the 'spt' case allowing at least mprotect() in the seccomp filter, which is a fairly wide attack surface.

  If you know more about how JITs work, I'd be interested in some resources on how the transition from "writable" to "executable" memory is managed internally by the JIT.

- Then there's the question of what the resulting architecture (tender/bindings) would look like, and what the interfaces to the "outside world" are in a wasm scenario, and so on. However, figuring out the previous point is probably the biggest hurdle.

[1] https://github.com/Solo5/solo5/issues/321

Hope this helps,

Martin
Re: [solo5] Solo5 talk at FOSDEM this Sunday
Hi all,

On Wednesday, 30.01.2019 at 20:03, Martin Lucina wrote:
> If you're at FOSDEM, hope to see you there. For those not attending, I'll
> follow up with links to the recording after the event.

A recording of the talk is now available at the event page: https://fosdem.org/2019/schedule/event/solo5_unikernels/

Direct download:

https://video.fosdem.org/2019/AW1.121/solo5_unikernels.webm
https://video.fosdem.org/2019/AW1.121/solo5_unikernels.mp4

Martin
[solo5] Solo5 talk at FOSDEM this Sunday
Hi all,

Ricardo Koller and myself will be presenting a talk on Solo5 at FOSDEM, this coming Sunday (Feb 3rd). From the abstract:

Solo5 is a microkernel-friendly, sandboxed, re-targetable execution environment for unikernels, with a taste for minimalism. We will start with an overview of core Solo5 concepts and present the interfaces it offers to the unikernel/library operating system/application developer. Using existing library operating systems, such as MirageOS, we will demonstrate the developer experience for various Solo5 targets, going on to show how rigorously applying minimalist principles to interface design is used to our advantage, blurring traditional lines between unikernels, processes, kernels and hypervisors. We will conclude with some lessons learned during development of Solo5 thus far, and present ideas and challenges for future development.

Link to the event: https://fosdem.org/2019/schedule/event/solo5_unikernels/

If you're at FOSDEM, hope to see you there. For those not attending, I'll follow up with links to the recording after the event.

Martin
Re: [solo5] Re: Setting IP address of the guest
On Wednesday, 16.01.2019 at 16:55, Dávid Kovács wrote:
> I looked around a bit and found that the virtio feature bitmap does not
> have anything similar for IP to what you use there with MAC addresses. Am I
> right there? This whole thing might just be a stupid question so feel free
> to tell me if so. :)

Sorry, I don't understand your question. virtio-net is an L2 device, so there's no IP involved.

-mato
Re: [solo5] Setting IP address of the guest
Hi,

On Tuesday, 15.01.2019 at 17:01, Dávid Kovács wrote:
> Hi!
>
> I would need a way to set the IPv4 address of a unikernel when starting it,
> is there a way to do this? If it is not implemented then how would I go
> about implementing it? Just try and follow whatever you guys did with mac
> addresses?

That depends entirely on the unikernel/libOS you are using in the guest. At the Solo5 layer we are only concerned with providing an L2 Ethernet-style interface to the guest.

For example, with a MirageOS unikernel such as the device-usage/network example from [1], the network configuration is passed on the Solo5 tender command line. So, to set a static IPv4 address of 10.10.10.10/24 and a gateway of 10.10.10.1 you could use something like the following:

$ ./solo5-hvt --net=tap100 ./network.hvt -- --ipv4=10.10.10.10/24 --ipv4-gateway=10.10.10.1

Hope this helps,

Martin

[1] https://github.com/mirage/mirage-skeleton
[solo5] Solo5 0.4.1 released
Hi all,

I'm happy to announce the release of Solo5 0.4.1.

This release introduces experimental support for the Genode Operating System Framework as a target for Solo5-based unikernels. Instructions for using this target will be forthcoming from the Genode folks once they are ready.

The one other user-visible change is the addition of support for dropping "root" privileges in the "hvt" tender on FreeBSD and OpenBSD. On OpenBSD the tender is further deprivileged using pledge(2). The Linux behaviour has not changed: we still recommend that you do not run the tender as "root", but do not enforce this in any way. Whether or not this will change is yet to be decided; please see https://github.com/Solo5/solo5/issues/282 for discussion.

Other changes:

* Migrate OPAM integration to OPAM 2.

Acknowledgements: Thanks to the following new contributors to this release:

* Emery Hemingway
* Stefan Grundmann

Enjoy,

-mato
[solo5] "Unikernels as Processes" paper
Hi all,

I'm happy to announce that our ACM SoCC 2018 paper entitled "Unikernels as Processes" is now publicly available at https://dl.acm.org/citation.cfm?id=3267845.

The paper, by Dan and Ricardo of IBM Research, Nikhil of BITS Pilani and myself, presents the central tenet that the host attack surface/TCB of a Linux seccomp-sandboxed unikernel is comparable to, or, depending on your evaluation metric, better than that of a hardware virtualization sandbox.

Dan & Ricardo's implementation of a seccomp tender for Solo5 is available today as part of the "nabla containers" project at https://github.com/nabla-containers/solo5. I plan to work together with Dan & Ricardo on upstreaming this code to Solo5 next month.

Regards,

-mato
[solo5] Solo5 0.4.0 released
Hi all,

I'm happy to announce the release of Solo5 0.4.0.

## 0.4.0 (2018-09-14)

This release is a major restructuring and renaming of Solo5 components, primarily to reflect that the "ukvm monitor" is no longer specific to the KVM hypervisor and to allow for future development of further targets and tenders enabling different sandboxing technologies.

Major changes:

* `kernel/X`: Moved to `bindings/X`, now referred to as the "Solo5 _bindings_ for X". Build products are now named `bindings/X/solo5_X.o`.
* `kernel/solo5.h`: Moved to `include/solo5/solo5.h`.
* _ukvm_: Target has been renamed to _hvt_. Monitor code is now referred to as the hvt _tender_ and has been moved to `tenders/hvt/`.
* `ukvm-configure`: Now named `solo5-hvt-configure`.
* `ukvm-bin`: Now named `solo5-hvt`.
* `ukvm/ukvm_guest.h`: Renamed to `include/solo5/hvt_abi.h`.
* Generated VM names used on FreeBSD and OpenBSD have been changed from `ukvm%d` to `solo5-%d`, with `%d` being the PID of the `solo5-hvt` tender.
* Core file names produced by the _hvt_ dumpcore module have been changed from `core.ukvm.%d` to `core.solo5-hvt.%d`.
* `solo5-run-virtio` and `solo5-mkimage`: Renamed to `solo5-virtio-run` and `solo5-virtio-mkimage` respectively.
* OPAM packages used by MirageOS have been renamed from `solo5-kernel-X` to `solo5-bindings-X`, accounting for the change from `ukvm` to `hvt`. Full details of the impact of this change on existing Mirage/Solo5 installations will be provided separately as part of a MirageOS release.

For further details please refer to the discussion and commits merged as part of #274.

Other changes:

* Update OpenBSD requirements to 6.4 and minor OpenBSD build fixes (#270, #273).

Best,

-mato
Re: [solo5] Is dumpcore enabled
On Wednesday, 15.08.2018 at 05:00, Adam Steen wrote:
> Hi Martin
>
> I have completed the pull request [1], and was looking for further discussion
> from anyone who is interested.

Thanks, I'll look into it, but it might take a week or so; I'm currently concentrating on the renaming/refactor for #172. It might be easier to do this after that is done, as all the internal API names will change.

Cheers,

-mato
Re: [solo5] Is dumpcore enabled
Hi Adam, apologies for the delayed reply, I've been on vacation.

On Sunday, 22.07.2018 at 22:36, Adam Steen wrote:
> Hi All
>
> After a quick discussion with Hannes and attempting to implement coredump on
> OpenBSD [1], I found there was no easy way to determine if use_coredump was
> true, i.e. that coredump was enabled.
>
> I had to remove a tight pledge, privilege drop and chroot just to enable
> coredump, and I wasn't sure it was worth it. Hannes suggested turning the
> security changes on or off depending on whether coredump was enabled, but we
> both couldn't find an easy way to do this.
>
> So I am posting here to get the discussion started about what we can do to
> make this easier.
>
> The only thing I could come up with, with the current code, was to disable
> the security if the dumpcore module was compiled in (available). See [2] and
> [3].
>
> I wanted to raise this discussion not around my coredump code but about
> determining which module(s) are enabled in the ukvm main/system code.

This is a good question. With the current model of determining modules to enable at compile time, I would use the same approach for determining whether or not to "drop privileges" (not sure what best to call this functionality, suggestions please?). Specifically:

1. Add a compile-time #define, e.g. UKVM_DROP_PRIVILEGES. In a suitable header, say ukvm/ukvm.h since that is included by all modules, define this to 1 if not already defined, i.e. default to dropping privileges.

2. ukvm-configure can then manually add -DUKVM_DROP_PRIVILEGES=0 to CFLAGS if dumpcore has been requested.

3. If UKVM_DROP_PRIVILEGES=1, you get the current behaviour in your code.

4. If UKVM_DROP_PRIVILEGES=0, privilege dropping is disabled *AND* ukvm prints a stern warning to this effect at startup, including something about not being recommended for production, etc.

Separately from this, and with a view to adding some amount of privilege dropping by default on other systems besides OpenBSD, I think that:

1. The privilege dropping code should be moved into its own top-level function, e.g. ukvm_hv_drop_privileges(), which goes into ukvm_hv_*.c.

2. This function is clearly called from ukvm/ukvm_main.c, just before entering the VCPU loop(?). This would also be where the #if printing the warning if disabled (see (4) above) goes.

As for what privileges should exactly be dropped on other OSes by default, I need to think about that a bit more and will follow up during the week. However, this should be enough to get you started.

Would this approach work for you? Any other opinions?

Cheers,

-mato
Re: [solo5] Solo5, MirageOS and Write and Exec Memory
Hi Adam,

On Wednesday, 20.06.2018 at 22:07, Adam Steen wrote:
> Hi All
>
> As some of you know I have been working to get MirageOS/Solo5 working on
> OpenBSD.
>
> As of Friday of last week, I thought i had at least achieved this, but
> after running an end to end test with the released version of the
> Mirage-Skeleton Hello World Tutorial, I now find this causes an
> "mprotect W^X violation" on OpenBSD.

That's, err, odd. Do you get a "Warning: phdr[N] requests WRITE and EXEC permissions" when running ukvm-bin?

> I know Solo5 does not issue any mprotect requests with WRITE and EXEC
> permissions, but something in MirageOS does. I have been testing building
> the Hello World example for a while now without any problems of this
> nature, i am not sure where to start looking for what changed.
>
> I can permit W^X memory on OpenBSD with a change to the Solo5 configure
> script and a file system setting, but this has now missed the boat for
> this release and i would prefer not to do this.

That would be just papering over the problem. As for fixes, while it'll be a while yet before the post-0.3.0 renaming and restructuring gets done, I can easily cut a point release before then for trivial fixes (no ABI changes).

> Any tips on where i should look?

When you write "causes an mprotect W^X violation", how does that actually manifest itself at run-time? Does ukvm-bin get killed? Or some other way?

I've looked at the build products (on Linux) for both the Solo5 standalone tests and the mirage-skeleton tutorial/hello unikernel with "readelf -l", and I don't see any phdrs asking for W and X protections at the same time, so my guess would be something in the OpenBSD build is acting up (again).

-mato
Re: [solo5] Heads up: Release and code freeze plans
On Friday, 08.06.2018 at 17:21, Martin Lucina wrote: > An update on this: > > After discussing with some of the core Mirage folks and others at robur, > we've come to the conclusion that it is better to release what we have now > rather than wait after the renaming is complete. > > So, the rough plan is to: > > 1. Decide on a "flag day" and rewrite Git history on master to sort out > the issue of my mis-attributed commits, > 2. Cut a "minimalist" Solo5 0.3.0 release of what we have now on master, > and subsequently the Mirage/Solo5 integration bits (as pointed to by the > OPAM repository in Solo5/opam-solo5). > 3. After that, continue with the renaming and restructuring of the > codebase as outlined in #172. > > I'll follow up in more detail on Monday. And another update. Point 1 above is now moot as I have decided to just push an empty commit with "errata", referring to the mis-attributed commits in question. I will proceed with point 2 over the course of this week. Thanks, -mato
Re: [solo5] Thoughts on supporting multiple NICs on solo5/ukvm
Hi Nikhil,

On Tuesday, 12.06.2018 at 13:47, nikhil ap wrote:
> > I'm not sure what you mean. What selection of modules gets compiled
> > in/enabled for a tender would be up to the operator of that tender to
> > determine as a policy decision. The tender would then, based on
> > interpreting the binary's manifest, determine whether or not it
> > "can/will/is allowed to" launch the (separately supplied) unikernel.
>
> Ok. I had thought we would compile in all the modules for the tender.
> You are suggesting that if the operator only requires the net module,
> he will configure the tender by running tender-configure net, which will
> only compile in the net module, and will feed the unikernel image with
> the manifest to this tender.

Well, what I think we should do is provide a default configuration, which operators can trim down / extend as they see fit. With a move away from compile-time coupling this would be done by a toplevel "configure.sh" which would replace the per-tender script ("ukvm-configure") that we have now. Again, the specifics of how this will actually work are yet to be determined.

> Also, I was thinking once you are done with the re-naming, we could have a
> call to discuss and conclude on a design? What can be done for an initial
> phase? What tooling can we provide, taking into account the unikernels we
> support, etc. Others could join as well, and at the end we should be able
> to document the design. Thoughts?

Speaking from experience, a video call is the worst possible format to discuss designs. We should either do this asynchronously here / on GitHub, or organise a workshop in person.

> Another thing is, since this is mostly a configuration based change, I can
> still come up with a proposal for multi-NIC, assuming we've loaded the
> manifest and determined how many NICs we need. I can do a write-up on the
> changes that are required for the tender-binding-application. Is this fine?

All in good time. Let's get the release and renaming out of the way first, then we can discuss what happens next.

-mato
[solo5] Correcting my mis-attributed commits
All, as I mentioned in my email regarding release and code freeze plans, I had mistakenly committed several commits earlier this year using an incorrect former employer's email address. Rather than go through the pain associated with rewriting public Git history, given that this is essentially an "administrative change" with no other purpose than to correct the public record, I have instead pushed an empty commit today to the Solo5 master branch correcting the error and referring to all the individual commits in question. For the record, full text of the commit follows. --- cut here --- commit 0bea597637ec2beada2ad1e4df4d94875f0e257e Author: Martin Lucina Date: Mon Jun 11 15:10:44 2018 +0200 Correct mis-attributed commits The following 12 commits, performed between Thu Jan 25 16:26:05 2018 +0100 and Wed Mar 28 18:46:31 2018 +0200 have an incorrect Author and/or Committer of "Martin Lucina ". This commit is an errata to reflect that the correct Author and/or Committer for these commits is "Martin Lucina ". 173869df7aa2827f273386b01f6e1627d76d1de9 6994d79105286bdad4973f165e3765b799d1eafc 3918be0b64fa25f3cd9af21cec5f7f2c2f843fb1 956db83dbbd973224b9f8b38ec04ee7baa55d863 8643dee74acd07ef558a0fa60dfc862500a6d4fd 82697d6d8bbddf27aaa904c72970173b2a4911a5 f2c2b821b8949a0cbb5d65185645c5c887240792 88778700d9cb431d4408f60ae2d9e2a4575973c7 cd266928e4de28828d517ca9860f80d966c60cee 1f0c4e925489f094c807fa34f1a86151c10ab7cc e944f29421b96102a94e7a07a6a7ab025f1c9d7b 7c8674e9c5881db19adef10dea41f1aae4e12da3 --- cut here --- Martin
Re: [solo5] Thoughts on supporting multiple NICs on solo5/ukvm
Hi Nikhil,

On Monday, 11.06.2018 at 16:10, nikhil ap wrote:
> I'm not familiar with MirageOS. I need to look at how it works.
> Although, this implies that we need to change the tooling for every
> unikernel we support.

That's correct.

> For ex, MirageOS, IncludeOS, Rumprun etc. Also, whenever we need to add
> configuration parameters/resources to the manifest, we would have to
> change this everywhere to reflect it. Is it possible to have some kind of
> stand-alone tool which the other unikernels can trigger at configuration
> time to be able to generate the binary with the manifest?

Yes, obviously the goal would be to have easy to use tooling available to assist the various guest unikernel build systems in generating the manifest. Integrating it into the final unikernel binary should just be a case of passing the resulting object to 'ld'.

> > If we split the manifest and associated artefact, then this has several
> > disadvantages:
> >
> > 1) The user/operator now has to keep track of two files. Or we have to ship
> > as a .tar/other container, in which case we may as well just embed the
> > manifest in the ELF binary.
> >
> > 2) The coupling is much looser, so the potential for messing things up is
> > accordingly higher.
> >
> > If we combine the two in a single artefact, with the associated build-time
> > runes on the libOS side being generated by the build system (i.e. the
> > developer does not create a manifest "manually") then these problems go
> > away.
>
> I guess an advantage of using a separate configuration file would be that
> we can compile only the module that is required.

I'm not sure what you mean. What selection of modules gets compiled in/enabled for a tender would be up to the operator of that tender to determine as a policy decision. The tender would then, based on interpreting the binary's manifest, determine whether or not it "can/will/is allowed to" launch the (separately supplied) unikernel.

> > By the way, you mentioned JSON. JSON is absolutely unsuitable for this
> > purpose; given that the (untrusted) manifest needs to be interpreted by
> > e.g. a (trusted) tender, the format needs to be designed in a way that
>
> ... a way that? Would like to know how this ends :)

...that allows for secure deserialization of untrusted data? See here for an (old) comparison of some formats:

https://capnproto.org/news/2014-06-17-capnproto-flatbuffers-sbe.html

Personally I'm very much in favour of minimalism, so there's also the possibility of just inventing a custom binary format tailored to only the information we need to store in the manifest. I don't yet know which approach is best in the long term. Note that we should be able to version the manifest as a whole, so we could start with a simple custom format and move to something more involved later if there is a need.

> I thought of JSON because IncludeOS uses JSON to configure the
> resources and specify them to Qemu.
>
> An example:
> {
>   "image": "test_router.img",
>   "mem" : 320,
>   "net" : [{"device" : "virtio"},
>            {"device" : "virtio"}],
> }
>
> So in order for us to be able to support IncludeOS we would have to have
> their build system (cmake) parse the JSON file and emit a tender binary
> with the manifest.

No, the manifest is part of the change to a run-time coupling with a tender. I.e. the IncludeOS build system only has to produce a unikernel with a manifest, not a tender binary. Those would be supplied separately, by downloading and building the "new" Solo5.

-mato
Re: [solo5] Thoughts on supporting multiple NICs on solo5/ukvm
Hi Nikhil, On Monday, 11.06.2018 at 10:15, nikhil ap wrote: > > The issue is not so much working out what the guest-side API (i.e. that in > > solo5.h) should be, but keeping the conceptual integrity of how Solo5 > > (kernel/bindings), ukvm (monitor/tender) and application (unikernel) fit > > together. This will require a digression into the project's history, so > > please bear with me. > > > When you say binding, do you mean the current hypercall interface? Based on the terminology discussed in #172 (ukvm renaming), the "bindings" are the implementation of the Solo5 public API for a particular target. "Target" in this case refers to a host/hypervisor/tender combination. In terms of code organisation, bindings are what we have today in kernel/. This should all be much clearer after the renaming and reorganisation is done, hence the need for that to happen sooner rather than later. > > My preferred approach (discussed with Dan & Ricardo several times, but not > > in public) would be to replace the current model of build-time coupling > > with run-time coupling. The main points of a "back of the napkin" > > implementation would be something like this: > > > > 3.1. The application binary, supplied by the "user", would contain a > > "manifest" (e.g. inside a special ELF note) declaring which resources > > (NICs, block storage, future IPC, etc.) it requires to operate. > > > > How should the user add this manifest? Do we need to use the toolchain to > create such a binary? > For example, do we use objcopy to add the elf note? May be using an example > with Mirage or IncludeOS might > help me in understanding better. In the case of MirageOS, the "mirage" front-end tool already has all the information it needs to produce such a manifest. 
So, for MirageOS the relevant build-time commands (using objcopy, or generating a .c/.S which produces the relevant special section, to be determined) would be generated at "mirage configure" time, and then executed at "mirage build" time. I'm not familiar with IncludeOS' build system, but I presume a similar mechanism can be implemented there. > Also, what is the advantage of using the manifest inside the binary instead > of using a configuration file such as > json to specify the resources? We could read the configuration file, which > would include all the information you mentioned > in a manifest and could generate a tender which is crafted for the > application. Wouldn't this satisfy the 2 points above? If we split the manifest and associated artefact, then this has several disadvantages: 1) The user/operator now has to keep track of two files. Or we have to ship as a .tar/other container, in which case we may as well just embed the manifest in the ELF binary. 2) The coupling is much looser, so the potential for messing things up is accordingly higher. If we combine the two in a single artefact, with the associated build-time runes on the libOS side being generated by the build system (i.e. the developer does not create a manifest "manually") then these problems go away. By the way, you mentioned JSON. JSON is absolutely unsuitable for this purpose; given that the (untrusted) manifest needs to be interpreted by e.g. a (trusted) tender, the format needs to be designed in a way that > > 3.3. It follows that the tender can now be supplied separately by the > > "operator". At start-up, it would load the manifest and use it to determine > > which modules to enable [4], based on this list it would also require the > > "operator" to configure all host resources for all enabled modules. > > > > Does this mean all the modules need to be compiled and included in the > tender > even though the application might not load it? 
> I'm assuming by loading you mean by adding it to the module table which we
> have now in the tender.

Well, I've not thought through the implementation in full yet. "Loading" might mean dynamic linking, or something else. Or, in an initial implementation, it might mean what you write (all modules are compiled in, unused ones get disabled). If we get the design right then the implementation can evolve.

> I would like to understand the scope of your design before proceeding.
> Maybe I can build a minimal prototype which conforms to the
> tender/unikernel requirements we have and *then* look at supporting
> multiple NICs. (Although I do have a gross hack for supporting multiple
> NICs already, I don't think it's going to be useful considering what is
> required.)

Sure, I can try and write down more of the design, but I need to get the releasing and renaming out of the way first, otherwise we're going to keep getting confused about the terminology.

-mato
[solo5] Heads up: Release and code freeze plans
Hi all,

as of today all the outstanding PRs on the Solo5 repository have been merged, so we are in a good position to set concrete plans for the next release (tentatively 0.3.0).

The main item I have on my list, to be done either before or immediately after a release, is #172 ("renaming ukvm to a more generic name") and the associated code and terminology reorganisation discussed there. I have been putting off working on this partly because it will touch everything in the codebase, so it would have made rebasing the existing in-flight PRs hard for contributors. There are two options for how this can be done:

1) Release, freeze, rename: We cut a release now, with the code in a state where existing features are tested and stable (well, OpenBSD is still new and thus "experimental", but that's fine). After that there will be a freeze period (say, ~2 weeks) during which I will work on #172.

2) Freeze, rename, release: We do the renaming first, then cut a release.

The disadvantage of option 1) is mainly for downstreams (MirageOS), since the renaming will change the user-visible terminology for targets (ukvm -> vt), and the goal of #172 is to get everyone (users, developers, downstreams) on the same page terminology- and code-wise. However, even though it's "just" a rename and refactoring, after discussing with Dan and Ricardo this week it does seem prudent to release known-good code. I'll discuss this next week with the MirageOS core team and make a decision then.

There is one other annoying "administrative" issue. Over the course of the past few months I have mistakenly committed a bunch of commits to the repository with a former employer's email address. I'd like to fix this for legal/attribution reasons, however the only way to accomplish this is to rewrite the history on the public Solo5 "master" branch. This will mean that everyone's existing clones and/or outstanding PRs will need to be manually fixed. The freeze looks like a good (only?) opportunity to get this done with the least amount of disruption. I'll send out a separate email about this after investigating in more detail exactly what needs to be done and how it will break other repositories.

-mato
Re: [solo5] Thoughts on supporting multiple NICs on solo5/ukvm
Hi Nikhil, On Tuesday, 22.05.2018 at 13:18, nikhil ap wrote: > Currently solo5/ukvm supports only one NIC. I wanted to know your thoughts > on supporting multiple NICs. This is really important and essential > requirement when trying to run on IncludeOS unikernels which is heavily > focused on networking and NFV. > > After talking to Ricardo, I understand that the difficulty is in coming up > with a new API. I wanted to start off the discussion. The issue is not so much working out what the guest-side API (i.e. that in solo5.h) should be, but keeping the conceptual integrity of how Solo5 (kernel/bindings), ukvm (monitor/tender) and application (unikernel) fit together. This will require a digression into the project's history, so please bear with me. One of the original goals for ukvm, outlined in Dan and Ricardo's original 2016 paper on the subject [1] was to "specialize" the monitor to match the running unikernel. Among other things, the benefits of this approach are: 1.1. We ensure that the trusted tender code is minimal and does not contain or enable any "unused" interfaces. Such interfaces are a well-known source of security issues (see e.g. the now-famous QEMU floppy-drive bug). 1.2. At the same time, we ensure that operation of the tender is _coupled_ to the unikernel it is running. In other words, in order for an operator/orchestrator to run a ukvm-based unikernel today, *all* the host resources (devices) the unikernel requires to operate are configured at start-up. You cannot [2] run a ukvm-bin configured with a "UKVM_MODULE_NET" without also specifying with --net which host resource the guest device should be bound to. Both of these are intentional and constitute a large part of what makes Solo5/ukvm unique compared to many similar systems that exist today. The way they are accomplished today is by _coupling_ the tender to the unikernel at _build time_. 
We (Dan, Ricardo and I) have known for quite some time that the build-time coupling of the tender and unikernel -- while it seemed like a good idea at the time -- is not practical for use in real-world scenarios, for several reasons: 2.1. "Users" (producers of "untrusted" applications that run under a Solo5 tender) should not be providing their own "trusted" tender binary. This goes against any security model that e.g. a public/private cloud deploying such applications would want to use in practice [2]. 2.2. Build-time coupling is totally inflexible for multi-platform host OS support, as it requires the "user" to build all possible combinations of tender binaries for all targets they might want to deploy their application on. For example, a "user" building their unikernels on Linux but deploying on FreeBSD has no way [3] to produce a "correct" tender binary. 2.3. Build-time coupling cannot cleanly support N instances of a resource (be that NICs, or block devices) while at the same time staying consistent with point (1.2) above. The question is, how can we resolve the above points going forward, keeping in mind (1.1) and (1.2)? My preferred approach (discussed with Dan & Ricardo several times, but not in public) would be to replace the current model of build-time coupling with run-time coupling. The main points of a "back of the napkin" implementation would be something like this: 3.1. The application binary, supplied by the "user", would contain a "manifest" (e.g. inside a special ELF note) declaring which resources (NICs, block storage, future IPC, etc.) it requires to operate. 3.2. The "manifest" would include information regarding the exact ABI version that the application requires, so we could actually provide a stable ABI contract between the tender and application, which we do not do today. 3.3. It follows that the tender can now be supplied separately by the "operator". 
At start-up, it would load the manifest and use it to determine which modules to enable [4]; based on this list it would also require the "operator" to configure all host resources for all enabled modules.

Getting to this point requires a significant amount of development work, and time, however I think it's absolutely necessary for Solo5 to be widely usable in practice. If you want to get support for multiple NICs before then, I'm interested in interim compromise proposals (prose first, not code!) that consider the changes needed to all components involved (tender, bindings, application) and can work with the current source-time coupling while not being too gross of a hack. I've thought of several options, but couldn't find one that passed muster. Rather than presenting them right now, I'd prefer to hear ideas from others on what approaches might work.

Thanks for reading, and I hope this explains why it's not "just" an issue of adding an index to the guest-side APIs. Please ask questions.

-mato

[1] https://www.usenix.org/system/files/conference/hotcloud16/hotcloud16_williams.pdf
[2] There are people playing various tricks today to make this work, it's not
[solo5] OpenBSD status (was Re: IRC?)
(Cc:ing the list as others may have ideas or be able to help you with diagnosing OCaml build/toolchain issues) > [09:46] ← │ adamsteen (~asteen@203.63.192.100) has left #mirage Oh, I didn't realise you were around at all. Sorry! I've left comments on the pkgconf issue in #226; however the more pressing problem is the trap I'm seeing on 6.3. If you can find time to look into that then we can hopefully merge when I get back (14th), or ping @djwillia / @ricarkol if things look good, they can also click the green button. My intuition is that it's more likely to be an issue with the OCaml parts of the build system and/or toolchain than ukvm itself since the C tests/ run fine... ... which is confirmed by the fact that I just tried running a tutorial/hello/hello.ukvm built on Linux using the ukvm-bin built on OpenBSD, and it works fine[*]: $ doas ./ukvm-bin ./hello.ukvm.linux | ___| __| _ \ | _ \ __ \ \__ \ ( | | ( | ) | /\___/ _|\___// Solo5: Memory map: 512 MB addressable: Solo5: unused @ (0x0 - 0xf) Solo5: text @ (0x10 - 0x1e8fff) Solo5: rodata @ (0x1e9000 - 0x220fff) Solo5: data @ (0x221000 - 0x2d0fff) Solo5: heap >= 0x2d1000 < stack < 0x2000 2018-05-03 22:25:55 -00:00: INF [application] hello ^Cukvm-bin: Exiting on signal 2 [*]: Well, it appears to hang (or sleep forever) after the first message, but that may be due to timekeeping in general being screwed when running OpenBSD under nested KVM (which I noticed recently), could be confirmed by running on bare metal. 
For reference, comparing the output of the OpenBSD-built unikernel:

$ doas ./ukvm-bin ./hello.ukvm
| ___| __| _ \ | _ \ __ \ \__ \ ( | | ( | ) | /\___/ _|\___//
Solo5: Memory map: 512 MB addressable:
Solo5: unused @ (0x0 - 0xf)
Solo5: text @ (0x10 - 0x21dfff)
Solo5: rodata @ (0x21e000 - 0x254fff)
Solo5: data @ (0x255000 - 0x4)
Solo5: heap >= 0x5 < stack < 0x2000
Solo5: trap: type=#UD ec=0x0 rip=0x1008 rsp=0x1df8 rflags=0x10002
Solo5: ABORT: cpu_x86_64.c:171: Fatal trap

So, the OpenBSD-built binary is obviously wrong (bogus end of data segment; RIP looks like execution went off into never-never land). I would first check whether you can reproduce this yourself on OpenBSD 6.3, compare with OpenBSD-current, and then try to track down the cause.

Aside: We should probably add some more sanity checks to the ukvm ELF loader, to refuse to load binaries with obviously bogus PHDRs.

Cheers,

-mato
Re: [solo5] ARM64 CI now up and running
On Monday, 30.04.2018 at 22:21, Martin Lucina wrote: > Due to there not being any nested KVM support for aarch64, this is a bit > more involved and instead runs in a suitably privileged Docker container[2] > using surf-build v2.0.0.beta.4, plus some custom driver scripts which I > have yet to commit somewhere. I have now committed the driver script (sans the wrapper with credentials) to the Solo5/solo5-ci repository, and in the process have updated the README to reflect the new setup better. > The builder will show up on (new or rebased) PRs as > "aarch64-Debian9-gcc630". Actually, that's not true, it will also pick up pushes to existing open PRs, where the aarch64 builder will fail if they are not rebased on master post f4a4755f76ae1ddb8f70c4209dffa3c0438faaab (yesterday) due to changes required in build.sh. -mato
[solo5] ARM64 CI now up and running
Hi all, After spending just over a week in Node.js hell, I'm happy to report that ARM64 CI with end to end tests (i.e. not just the old compile-only setup) is now back up and running on a Raspberry PI 3 B+ supplied by Hannes, and installed with a mainline kernel (v4.16.x) and plain Debian thanks to derpeter's HOWTO[1]. Due to there not being any nested KVM support for aarch64, this is a bit more involved and instead runs in a suitably privileged Docker container[2] using surf-build v2.0.0.beta.4, plus some custom driver scripts which I have yet to commit somewhere. The builder will show up on (new or rebased) PRs as "aarch64-Debian9-gcc630". Now, finally time to get back to reviewing and merging Adam's OpenBSD work. Cheers, -mato [1] https://mirage.io/wiki/arm64 [2] https://github.com/Solo5/solo5-ci/blob/master/any-Debian9-gcc630/Dockerfile
Re: [solo5] Improving solo5's network performance
Hi Nikhil, On Tuesday, 24.04.2018 at 20:42, nikhil ap wrote: > [...] > > What exactly do you mean by "polling mode"? Just using poll() on > > non-blocking sockets similarly to what the current ukvm implementation of > > solo5_yield() does? > > > > Polling mode: IO thread keeps polling for packets on tapfd and shmstream > and if there is nothing, sleeps for a millisecond. Hence the CPU > consumption is high even when there is no data. Right, so that's basically busy-waiting. > Event mode: IO thread waits for events using epoll. Only wakes up when > there is actually data to be read from tapfd and shmstream. When there is > no data CPU usage is 0% but it has more VMEXITs than polling mode because > the solo5 kernel need to notify when the packets are queued on shmstream > using the hypercall. Hence queuing multiple packets onto shmstream and > notifying it with solo5_net_flush() will reduce the number of hypercalls. Ok, makes sense. > > This would require a portability layer to run on the BSDs. If I'm reading > > the figures you sent correctly, it looks like polling performs better > > anyway? > > > Polling mode is just to figure out what is the best we can do but I'm not > sure if it is suitable for production. The goal is to achieve a very good > performance which is comparable to qemu and which is not CPU intensive. Indeed, busy-waiting is not something we'd want to use in production. In any case, as you write, the goal is to achieve performance which is comparable to QEMU, not necessarily faster at any cost. > > The same thing could presumably be accomplished with a writev()-style API, > > or is this doing something different? > > With solo5_net_queue() API, we can queue packets in shmstream without > notifying ukvm-bin to read from it and signal ukvm-bin with > solo5_net_flush(). I haven't seen your APIs, but based on this description I'm assuming a vector write (i.e. write X packets in one call) would work. 
If we do end up modifying the Solo5 APIs we need to do so carefully, to accommodate other possible targets (seccomp, SGX), and also take into account any impact on possible failure semantics.

> > Do you have any statistics comparing (host) CPU load for the various tests?
>
> CPU load is mentioned in my initial email which is on the host, or do you
> require something else?

I think that's sufficient for now; please include those figures when you send an updated table with the TX-side performance.

> [...]
> > So, I suggest that you continue to iterate on this work, ideally in public,
> > and submit a PR after the next Solo5 release.
>
> Any ETA on the next solo5 release?

The usual answer is "when it's ready" :-) Having said that: I'm going to be on vacation May 4th through 13th, and I don't expect to have things ready before then. The main thing which is dragging on at the moment is getting CI set up for ARM64 and OpenBSD, since I don't want to make a release without having this in place for all supported targets. So, realistically, mid-to-late May, fingers crossed.

Cheers,

-mato
Re: [solo5] Improving solo5's network performance
Hi Nikhil, On Friday, 20.04.2018 at 20:26, nikhil ap wrote: > Hi Guys, > > Summarising the discussions we've had so far and the work I've been doing: Thank you for this summary and for doing these experiments. The numbers are very impressive. For those who were not on the private email thread about this, Nikhil has been working on a minimal PoC implementation of the Solo5/ukvm network APIs with a shared memory transport based on the "shmstream" protocol used by the Muen Separation Kernel. >- Implemented network interface using shmstream (shared memory) in order >to reduce the number of hypercalls and thus reduce the number of VMEntries >and VMExits. >- Separate IO thread is implemented in ukvm-bin to read/write packets >from the shared memory. >- IO thread currently supports polling mode and event mode. What exactly do you mean by "polling mode"? Just using poll() on non-blocking sockets similarly to what the current ukvm implementation of solo5_yield() does? >- Event-driven model is implemented using eventfds and io-thread waits >for events using epoll. This would require a portability layer to run on the BSDs. If I'm reading the figures you sent correctly, it looks like polling performs better anyway? >- The applications could run both the modes without any changes to their >APIs. >- Currently shmstream mode can be set with --shm option. > - Ex: ./ukvm-bin --net=tap100 --shm=poll test_ping_serve.ukvm In the grand scheme of things, I'd like to eventually replace the current implementation with a shmstream-based one entirely, if it proves to be portable (host OS and arch-wise) and secure enough. So, it would not be user-selectable. However, there are also other things that need to happen before that (see below). 
> - However, in case of event mode, for better performance, the
>   application can choose to notify ukvm-bin after queuing *all* the packets
>   in the shared memory instead of after a single packet transmit, by using
>   new solo5 public APIs: solo5_net_queue() and solo5_net_flush();

The same thing could presumably be accomplished with a writev()-style API, or is this doing something different? What is the performance gain in practice? In the figures you sent, is that with or without this optimization?

> - Solo5 performance was tested with IncludeOS (IncludeOS had to be
>   modified to address the recent API changes in solo5) and with UDP traffic.
> - Summarising the results below

Do you have any statistics comparing (host) CPU load for the various tests?

Next steps: I'd like to push out a formal release of Solo5 that integrates the work done over the past year which does not have a proper formal release (FreeBSD vmm support, ARM64 support) and some of the work which is "ready to merge" ASAP (OpenBSD vmm support, possibly your work on guest-side core dumps). This would also include the renaming of the various Solo5 components as discussed in #172.

Once that is done we will then have time to look at integrating your shmstream-based network implementation, and, as I wrote above, ideally to replace the current implementation entirely. In keeping with the minimalist design of Solo5 and ukvm, rather than providing several different network implementations, or variants of one (polling vs. event-driven, etc.), I'd prefer that we work towards choosing one single variant to use and support [as a default], otherwise we run the risk of going down the "QEMU-style" route of a proliferation of different optional components, which is both hard to understand for users and hard to comprehensively test and secure.

So, I suggest that you continue to iterate on this work, ideally in public, and submit a PR after the next Solo5 release.

Cheers, and thanks again,

-mato