[solo5] Stepping down from maintaining Solo5 and MirageOS
Dear friends and colleagues,

effective today, I am stepping down from both maintaining Solo5 and from the core team of MirageOS.

- What does that mean?

I will no longer monitor pull requests or issues filed on repositories in the Solo5/ or mirage/ organisations on GitHub, with the exception of the issue of appointing a new maintainer for Solo5. I am giving up general commit privileges for repositories in mirage/.

Regarding maintainership of Solo5: given that MirageOS is the primary downstream of Solo5, I would prefer that members of the MirageOS community step up to continue maintenance of Solo5. Failing that, perhaps there is someone in the wider community who is interested. I will file an issue to discuss this on GitHub shortly.

- Why?

I burnt out on Solo5, MirageOS and related projects some time in the spring of this year. I then took a six-month sabbatical, with the somewhat arbitrarily timed intent of returning to work on Solo5 and MirageOS in October this year. Over the course of the past month, I have found myself unable to so much as merge existing PRs in Solo5 and dependent projects for the upcoming MirageOS 4 release. Regardless of whether this means I have yet to fully recover from my burnout, or that as part of recovery I must move on to new projects and different challenges, it would be irresponsible of me to cling to my "ownership" of Solo5 and/or maintainership of dependent MirageOS projects any longer, and block the continued work of others.

- Why are you telling us this?

This was a hard call, as was the call to write about it publicly. I believe in being open and transparent with the community, and being so helps me get a sense of closure on my decision.

- I have other questions!

Please feel free to send them directly to me.
With thanks to @yomimono for inspiring the general structure of this e-mail, and seeing that she wrote something similar publicly and survived :-) Looking forward to a time when I can get excited about work again, whatever that may be, Best, -mato
[solo5] Solo5 v0.6.8 released
Hi,

Solo5 v0.6.8 is now available. From the release notes:

## 0.6.8 (2020-12-17)

Bug fixes:

* xen: Fix corrupted console output by correctly waiting for xenconsoled to consume all data. (#490)
* hvt: Free kvm_cpuid after use. (#485)

New features:

* xen: Add multiboot support. (#482)

Acknowledgements:

* Thanks to Marek Marczykowski-Górecki (@marmarek) for implementing multiboot support for the Xen target.

And, since I forgot to send an announcement for v0.6.7 to the list, here are the release notes for that release:

## 0.6.7 (2020-10-08)

Bug fixes:

* spt: Fix a bug where libseccomp's seccomp_load() would cause the tender to be killed by SIGSYS due to calling free(), which would under certain conditions call sbrk(), which is not in our seccomp filter. Work around this behaviour by exporting the generated BPF filter and loading it into the kernel manually. (#479)
* Various improvements to the Xen bindings for use by MirageOS. (#480, #476)

Enjoy,

-mato
[solo5] Solo5 v0.6.6 released
Hi,

Solo5 v0.6.6 is now available. From the release notes:

New features:

* This release adds minimal bindings for the Xen hypervisor. Hosts running Xen 4.10 or later on x86_64 are supported, and the bindings build Solo5 unikernels as PVHv2 domUs only. These bindings are not a full Solo5 target; they exist for the purpose of providing low-level bootstrap code to MirageOS and do not provide any network or block I/O functionality.

Bug fixes:

* genode: Do not yield for expired deadlines. (#466)

This release is part of the ongoing work to replace the MirageOS Xen platform stack with a new, legacy-free codebase. See [1] for details, along with instructions on how to test the work in progress.

Enjoy,

-mato

[1] https://github.com/mirage/mirage/issues/1159
[solo5] Solo5 v0.6.5 released
Hi,

Solo5 v0.6.5 is now available. From the release notes:

New features:

* Stop host kernels from attempting to execute Solo5 binaries. This improves both the user experience on some hosts (e.g. "No such file or directory" vs. "Segmentation fault" on Linux) and overall security posture by forcing the host kernel ELF loader to bail out earlier rather than actually jumping to the unikernel code. (#442)
* hvt: Full support for W^X and correct guest-side page protections on OpenBSD 6.7+ systems with EPT. (#447)
* hvt: capsicum(4) sandbox for the hvt tender on FreeBSD 12+. (#366)

Bug fixes:

* hvt: Fix hang in `HVT_HYPERCALL_POLL`. On Linux hosts, if `solo5_yield()` was called with a deadline that has already passed and the unikernel was not using any network devices, then the underlying hypercall would hang forever. Not known to affect any existing code in production. (#460)

Other notable changes:

* muen: Muen ABI updates, now uses ABI version 2 on the Solo5 side. Muen commit f10bd6b or later is required. (#454, #448)
* genode: Support for Genode is limited by toolchain issues and Genode bindings are no longer built by default. (#446, see also ocaml/opam-repository#16368)
* Improvements to the build system on BSD/clang hosts. System headers (sys/endian.h, osreldate.h) that were mistakenly being installed into the Solo5-provided include paths have been removed. For OCaml/MirageOS users, ocaml-freestanding 0.6.0 or later is now required. (#453, #455, #457, #461, see also mirage/ocaml-freestanding#77)
* Improvements to built-in self tests. (#451)
* Fix build failures with GCC >= 10. (#459)

Known issues:

* Full W^X support / correct guest-side page protections are currently only available on the "spt" target on Linux, and the "hvt" target on OpenBSD 6.7 or later. (#303)
* On OpenBSD, "hvt" operation with multiple network devices results in packet loss. This appears to be a bug in kqueue(2), but we have no confirmation from upstream. (#374)
* virtio-net is not functional on at least QEMU 5.0 and possibly earlier versions. QEMU versions up to and including 3.1.0 are known to work. (#463)

Acknowledgements:

* Thanks to Adam Steen (@adamsteen) for pushing for OpenBSD kernel support for manipulating guest EPT mappings, bringing full W^X to hvt on OpenBSD 6.7 or later.
* Thanks to Adrian-Ken Rueegsegger (@kensan) for the Muen updates.
* Thanks to Anurag Soni (@anuragsoni) for diagnosing and fixing the build on systems with GCC >= 10.
* Thanks to Hannes Mehnert (@hannesm) for diagnosing #460, for help with testing BSD/clang build system changes and for generally helping out.
* Thanks to Stefan Grundmann (@sg2342) for the capsicum(4) hvt tender sandbox on FreeBSD.

For MirageOS users, this release should be available in OPAM within 24 hours, barring any major showstoppers found by CI.

Enjoy,

-mato
[solo5] Solo5 v0.6.4 released
Hi all,

Solo5 0.6.4 is now available. This release updates the Genode and Muen bindings, and fixes the following notable issues:

* Disk images produced by solo5-virtio-mkimage were not bootable due to changes in recent SYSLINUX versions (thanks to Ricardo Koller, @ricarkol, for finding the fix).
* Build failure on FreeBSD 12.1+ (clang 8.0.1) due to issues with TLS in the toolchain.

New features:

* "configure.sh" has a new "--only-tools" option, which builds no tenders or bindings. In conjunction with "make install-tools", this can be used to install "solo5-elftool" only. This is intended for downstream systems which need to make use of "solo5-elftool" to query manifest information.
* A "scripts/opam-release.sh" tool has been added to generate OPAM repository metadata as part of the Solo5 release process.

Apart from that, this release contains documentation updates, and the "hvt" tender on the aarch64 architecture is now considered production-quality.

Regards,

Martin
[solo5] Solo5 0.6.3 released
Hi all,

I have released Solo5 0.6.3. This release fixes a build problem with Linux distributions, such as OpenSUSE, which install libseccomp headers into subdirectories of /usr/include. Note that this introduces pkg-config as a dependency for building Solo5 on Linux.

No functional changes.

Martin
[solo5] Solo5 0.6.2 released
This release fixes further OPAM/MirageOS installation problems found by CI, and the following build problems:

* spt: Support ppc64le on RedHat (thanks to Stefan Berger, @stefanberger).
* Fix check/warning for libseccomp >= 2.3.3 (thanks to Mechiel Lukkien, @mjl-).

No functional changes.

Martin
[solo5] Solo5 0.6.1 released
Why make one, when you can do two releases in a day!

This release fixes OPAM/MirageOS installation problems found by CI. No functional changes.

Additionally, the following entry was missed from the changes for 0.6.0:

* Solo5 tenders, tools and bindings now have an embedded version number, using the well-known convention "v0.6.0-4-gc9786d87". Bindings will log the version number during start-up. As a consequence of this, Solo5 now needs to be built from either a Git tree or a tarball produced by "make distrib". Suitable tarballs are uploaded to GitHub during the release process.

Cheers,

Martin
[solo5] Solo5 v0.6.0 released
Dear all,

I've just released Solo5 version 0.6.0. Highlights from the (long!) changelog:

This is a major feature release which introduces the concept of an "application manifest", enabling support for multiple network and block devices.

This release removes the compile-time specialization of the "hvt" tender. While this was a nice experiment, it is not practical for real-world deployment scenarios, where it is expected that the party supplying the tender (i.e. the operator/user) will be different from the party supplying the unikernel (i.e. the developer).

Due to these and other changes, both the public Solo5 APIs (as defined and documented in solo5.h) and the internal tender/bindings ABI have changed. Likewise, the build process for Solo5-based unikernels has changed, and downstream projects will need to be updated. Please refer to the full CHANGES.md and the Solo5 documentation for details.

MirageOS/Solo5 releases will be coming over the next few days.

Enjoy,

Martin
Re: [solo5] OpenBSD and spt
Hi Adam,

On Friday, 26.04.2019 at 06:00, Adam Steen wrote:
> Good Afternoon all
>
> Is there any appetite for a cross platform support in solo5 SPT?

It's certainly something that's possible, but the state of the spt code in general is a bit too raw to consider abstracting it to target different host kernels just yet. Also, I don't think there is as much of a pressing need for spt on OpenBSD as on Linux -- I'd expect most people running OpenBSD on amd64 to be doing so on bare metal, i.e. with access to the CPU virtualization hardware, where you can just run hvt.

> The system call restriction would be very easy, and I expect the other code
> to compile with very little changes.

What would you use for implementing the syscall restrictions? The granularity of pledge(2) is different to that of seccomp/BPF...

-mato
Re: [solo5] OpenBSD Support
On Wednesday, 20.03.2019 at 10:29, Martin Lucina wrote:
> Hi Adam,
>
> On Wednesday, 20.03.2019 at 05:54, Adam Steen wrote:
> > Hi All
> >
> > With the soon to be released OpenBSD 6.5, having Opam 2.0.3 and OCaml
> > 4.07.1 as packages, I thought i would open discussion about which release
> > of OpenBSD should be supported by Solo5.
>
> That decision is mainly up to you as the de facto maintainer of the OpenBSD
> support.
>
> > I was thinking it should be the latest stable release, ie as of the release
> > of 6.5, Not Current, it just seems too much of a moving target.
>
> Personally I'd keep it simple (and reduce the amount of work involved), and
> stick to supporting 6.5 *only*.

Actually, we should continue to support 6.4, as that now has CI (added this week, finally!), and then 6.5 only after it's released. It's impossible for me to track -current for CI. IOW, we should strive to support the current releases (N, N - 1).

-mato
[solo5] Heads up: Build system refactoring and hvt changes
Hi all,

I've just merged a significant change on master in #345 which affects both users and developers going forward. I'm sending the notes from the commit log out here in lieu of updating documentation, as I'll be offline for the next two weeks starting today.

As a precursor to this change, the Solo5 CI has been updated with end-to-end tests using MirageOS in #340, and OpenBSD 6.4 has been added to the CI (currently failing tests, which may be due to a misconfiguration on the CI node; no time to investigate right now).

Those of you who have outstanding PRs, I'd appreciate it if you rebase them on master while I'm away; this will also ensure they get tested properly using the new CI additions.

Thanks,

Martin

Changelog from #345 follows:

commit 2f1cae37d8b9f97826ea08934558a4360bdb13da
Author: Martin Lucina
Date: Thu Feb 21 16:26:49 2019 +0100

Refactor build system, remove hvt compile-time specialization

This is a large change, and the majority of it is effectively a full re-write of the build system. The actual removal of hvt compile-time specialization is fairly straightforward.

User-visible changes:

- 'configure.sh' now needs to be run manually before running 'make'; this is consistent with POLA.
- Conversely, 'make clean' no longer cleans Makeconf. Use 'distclean' or 'clobber' for that.
- 'configure.sh' will now print the targets that can (will) be built on this system. The strategy is still "build everything we can", however I have disabled Genode on all systems except Linux due to toolchain issues.
- You can now build a subset of targets from the top-level 'make', by specifying 'CONFIG_XXX=' (disable) or 'CONFIG_XXX=1' (enable) either on the command line, or by editing the generated Makeconf.
- Makefiles use silent rules by default. To get the old verbose ones back, use 'make V=1'.
- The 'solo5-hvt' tender is no longer "specialized" to the unikernel. We build two tenders: 'solo5-hvt' with all non-debug modules configured, and 'solo5-hvt-debug' with additional debug modules (gdb, dumpcore where available).
- 'solo5-hvt-configure' is kept around for now for backward compatibility with OPAM/MirageOS, but is essentially a NOP.

Developer-visible changes:

- The build system now has proper support for auto-generation of dependencies. This means you can safely edit source files, run make and be sure you will get a complete incremental build.
- Makefiles have been refactored to use common best practices: no repetition, consistent variable names and clear interfaces between configure.sh/Makeconf/Makefiles, all the while keeping them simple enough to understand for me on a Monday morning before coffee. I.e. limit use of macros, eval, etc.
- hvt tender modules are no longer defined by compile-time flags; instead, a dynamic array is placed into a special ELF section (.modules). This means that a hvt tender binary can be combined from an arbitrary set of hvt_module_XXX object files, which is the right way to do things going forward and also simplifies the build system (by not needing to build multiple targets from the same set of sources).

Shortcomings / TODOs:

- Dependency files (*.d) are stored in-tree. I spent several days trying to figure out how to get them to work out of tree, but in combination with the non-recursive use of subdirectories in 'bindings' I could not figure out the required Makefile magic.
- HVT_DROP_PRIVILEGES=0 is non-functional with the new modules arrangement, but needs a re-design anyway.

Other changes included as part of this PR:

- Revert privilege dropping on FreeBSD (see discussion in #282).
- The build system changes effectively implement option 1 in #292, i.e. on x86_64 -mno-red-zone is only used for bindings, not for application code.
- tests/tests.bats has been refactored for DRY as it was getting totally unmaintainable.
Re: [solo5] Multiple consoles/Debug API, result type
Hi Daniel,

On Sunday, 24.03.2019 at 16:26, m...@daniel-mendler.de wrote:
> I have another feature request, which came up while working with solo5.

[...]

I'm leaving on a logistically somewhat complicated vacation in four days' time and will be offline until the 15th of April. As there are things in progress (such as the build system refactoring / de-specialization of hvt in [1]) that I'd like to get merged before I leave, I'd like to ask that we defer these discussions until after I get back. I don't have the mental bandwidth right now to both hold a coherent high-level design discussion and complete my immediate priorities :-(

To be clear: I appreciate your suggestions and I'm not against adding features, or re-thinking the Solo5 APIs, but it needs to be done calmly and thoughtfully -- as the maxim goes, "no is temporary but yes is forever".

In the meantime, if you've not yet listened to my Solo5 talk from this year's FOSDEM, I'd appreciate it if you'd take the time to do that. You can find the talk at [2]. This will hopefully make the overall project goals clearer.

Thanks,

-mato

[1] https://github.com/Solo5/solo5/issues/326
[2] https://fosdem.org/2019/schedule/event/solo5_unikernels/
Re: [solo5] Memory management
On Thursday, 21.03.2019 at 15:54, m...@daniel-mendler.de wrote:
> Hi Martin,
>
> > I am not sure about the "dynamic part". I would either set the stack
> > from outside (as discussed above), or allow some kind of memconfig
> > call, which can be executed only once after application startup.
> > Alternatively you could provide munmap and munmap_done. Dynamic people
> > would just never call munmap_done...
>
> I thought a bit more about this idea of adding a munmap call and I don't
> really like it.
> Maybe the goal should be extended to allow multiple different memory areas,
> randomly distributed in the address space to leverage ASLR.

ASLR is a separate topic in itself; see here for a rough plan of what needs to be done: https://github.com/Solo5/solo5/issues/304

> One of those memory areas is the stack then. This could work either by
> configuring the memory areas statically from outside or dynamically inside
> the application. The application should start with a fixed small stack size.
> To allocate heap areas or a larger stack, solo5 should simply offer
> solo5_mem_alloc, which allows to allocate page sized heap blocks until the
> available heap memory is exhausted.

No, this has already been discussed before. Dynamic memory allocation is not on the cards. Most recently: https://github.com/Solo5/solo5/issues/335#issuecomment-472499246 and earlier: https://github.com/Solo5/solo5/issues/223.

Providing just a "munmap()" and nothing else might be a simpler way to get guard pages, or it might not. Anyway, one thing at a time. As I mentioned in my other email, let's ignore multiple stacks for now and just concentrate on how a separate stack region could be done.

Martin
Re: [solo5] Porting a runtime to Solo5
Hi Daniel,

On Thursday, 21.03.2019 at 10:29, m...@daniel-mendler.de wrote:
> I have one unrelated question - the Solo5 API exposes the heap as one
> continuous block of memory, where the stack is at the top. Is there some way
> to ensure that the growing stack does not overwrite the heap, by using some
> red zone pages?

Not at the moment.

> I can manually limit stack growth by limiting recursion etc,
> but I would prefer to have physically enforced guarantees. Would it make
> sense to change the API in such a way that stack and heap are kept fully
> separate? For example at compile time a fixed stack size is specified and
> the stack is kept at a different location from the heap. The Solo5 API
> should then pass heap_start, heap_size, stack_start, stack_size to the
> application.

This is a good idea and I've already thought about it several times. In general, it should be possible to implement in a way that is portable, with a fallback to using a single memory region under the hood for targets where it's either not possible, not worth the implementation complexity or just not done yet. On such targets you'd not get a red zone, but your application would still run.

However, there are two things to consider:

1) How is the default stack size determined / who gets to choose it and when? At the moment things are nice and simple, e.g. for 'hvt' you just run with "solo5-hvt --mem=XYZ ..." which gives you an easy way to say how much *maximum* resource you want to commit. *Requiring* users to specify a separate stack size up front at run time seems like exposing an unnecessary detail, not to mention that a lot of people will not know what a reasonable value is.

2) I can imagine that the IncludeOS people on this list will immediately pipe up with "Ha! Now we want multiple stacks, *dynamically allocated at run time*, each in its own region", which opens up its own can of worms entirely.

An option I thought of which addresses the second point is extending the Solo5 API to provide a single virtual-memory manipulation call, specifically the equivalent of munmap(). The call would be bounds-checked against the heap region, so attempting to touch anything else will just fail. This would make guard page setup the responsibility of the libOS (ocaml-freestanding in the Mirage case) and allow for "carving up" your memory region as you see fit.

The main issue I have with that is that for 'spt' (the seccomp-based unikernel-as-user-process target on master) it opens up (some) access to the host munmap() system call. Now, there have been local privilege escalation bugs on Linux involving "only" munmap() and clone(). Granted, we don't allow clone(), but who's to say there won't be one involving "only" munmap() and poll() :-/

So, it's unclear what the way forward is here... all ideas welcome.

Martin
Re: [solo5] OpenBSD Support
Hi Adam,

On Wednesday, 20.03.2019 at 05:54, Adam Steen wrote:
> Hi All
>
> With the soon to be released OpenBSD 6.5, having Opam 2.0.3 and OCaml 4.07.1
> as packages, I thought i would open discussion about which release of OpenBSD
> should be supported by Solo5.

That decision is mainly up to you as the de facto maintainer of the OpenBSD support.

> I was thinking it should be the latest stable release, ie as of the release
> of 6.5, Not Current, it just seems too much of a moving target.

Personally I'd keep it simple (and reduce the amount of work involved), and stick to supporting 6.5 *only*.

Cheers,

-mato
Re: [solo5] Plans to support WebAssembly as a target
Hi Jonathan,

(meta: You're not subscribed to the list, so your message got moderated.)

On Sunday, 17.02.2019 at 10:07, Jonathan Beri wrote:
> I enjoyed the recent presentation given at FOSDEM 2019
> <https://fosdem.org/2019/schedule/event/solo5_unikernels/> by Martin
> Lucina & Ricardo Koller. At the end, Martin mentioned an idea to support
> WebAssembly as a target. How developed is the idea and has any progress
> been made in that direction?

At this stage it's just an idea. I ran out of time towards the end of the talk, so, to elaborate a bit on this:

- This is about using WebAssembly as a way to get arch-independence for Solo5-based unikernels, i.e. run as a component on a server with something like wasmjit, not in a browser.

- I've not looked into WebAssembly in detail at all. From my possibly naive understanding, it should be relatively easy to use something like an Emscripten-based cross toolchain to produce Solo5 WebAssembly "binaries" (whatever they are).

- The Solo5 design mandates that you get no more executable pages after loading the unikernel binary. This precludes the use of a JIT. While others have asked [1] to relax this, I'm very wary of doing so, as it significantly reduces "security posture" and generally goes against the design goal of keeping the system "static". Having said that, it looks like the "hack" described in that issue will have to go in in the short term. Possible solutions to this:

  1. Either keep the "no more executable pages" rule, which implies an AOT compilation step at/around deployment time (from wasm to native code).

  2. Or, figure out a minimal, as-safe-as-possible interface at the Solo5 layer which allows a JIT to function while *enforcing* W^X on the various targets. I'm not sure this is possible, as it would imply e.g. in the 'spt' case allowing at least mprotect() in the seccomp filter, which is a fairly wide attack surface.

  If you know more about how JITs work, I'd be interested in some resources on how the transition from "writable" to "executable" memory is managed internally by the JIT.

- Then there's the question of what the resulting architecture (tender/bindings) would look like, and what the interfaces to the "outside world" are in a wasm scenario, and so on. However, figuring out the previous point is probably the biggest hurdle.

[1] https://github.com/Solo5/solo5/issues/321

Hope this helps,

Martin
Re: [solo5] Solo5 talk at FOSDEM this Sunday
Hi all,

On Wednesday, 30.01.2019 at 20:03, Martin Lucina wrote:
> If you're at FOSDEM, hope to see you there. For those not attending, I'll
> follow up with links to the recording after the event.

A recording of the talk is now available at the event page: https://fosdem.org/2019/schedule/event/solo5_unikernels/

Direct download:

https://video.fosdem.org/2019/AW1.121/solo5_unikernels.webm
https://video.fosdem.org/2019/AW1.121/solo5_unikernels.mp4

Martin
[solo5] Solo5 talk at FOSDEM this Sunday
Hi all,

Ricardo Koller and myself will be presenting a talk on Solo5 at FOSDEM, this coming Sunday (Feb 3rd). From the abstract:

Solo5 is a microkernel-friendly, sandboxed, re-targetable execution environment for unikernels, with a taste for minimalism. We will start with an overview of core Solo5 concepts and present the interfaces it offers to the unikernel/library operating system/application developer. Using existing library operating systems, such as MirageOS, we will demonstrate the developer experience for various Solo5 targets, going on to show how rigorously applying minimalist principles to interface design is used to our advantage, blurring traditional lines between unikernels, processes, kernels and hypervisors. We will conclude with some lessons learned during development of Solo5 thus far, and present ideas and challenges for future development.

Link to the event: https://fosdem.org/2019/schedule/event/solo5_unikernels/

If you're at FOSDEM, hope to see you there. For those not attending, I'll follow up with links to the recording after the event.

Martin
Re: [solo5] Re: Setting IP address of the guest
On Wednesday, 16.01.2019 at 16:55, Dávid Kovács wrote:
> I looked around a bit and found that the virtio feature bitmap does not
> have anything similar for IP to what you use there with MAC addresses. Am I
> right there? This whole thing might just be a stupid question so feel free
> to tell me if so. :)

Sorry, I don't understand your question. virtio-net is an L2 device, so there's no IP involved.

-mato
Re: [solo5] Setting IP address of the guest
Hi,

On Tuesday, 15.01.2019 at 17:01, Dávid Kovács wrote:
> Hi!
>
> I would need a way to set the IPv4 address of a unikernel when starting it,
> is there a way to do this? If it is not implemented then how would I go
> about implementing it? Just try and follow whatever you guys did with mac
> addresses?

That depends entirely on the unikernel/libOS you are using in the guest. At the Solo5 layer we are only concerned with providing an L2 Ethernet-style interface to the guest.

For example, with a MirageOS unikernel such as the device-usage/network example from [1], the network configuration is passed on the Solo5 tender command line. So, to set a static IPv4 address of 10.10.10.10/24 and a gateway of 10.10.10.1 you could use something like the following:

$ ./solo5-hvt --net=tap100 ./network.hvt -- --ipv4=10.10.10.10/24 --ipv4-gateway=10.10.10.1

Hope this helps,

Martin

[1] https://github.com/mirage/mirage-skeleton
[solo5] Solo5 0.4.1 released
Hi all,

I'm happy to announce the release of Solo5 0.4.1.

This release introduces experimental support for the Genode Operating System Framework as a target for Solo5-based unikernels. Instructions for using this target will be forthcoming from the Genode folks once they are ready.

The one other user-visible change is the addition of support for dropping "root" privileges in the "hvt" tender on FreeBSD and OpenBSD. On OpenBSD the tender is further deprivileged using pledge(2). The Linux behaviour has not changed: we still recommend that you do not run the tender as "root", but do not enforce this in any way. Whether or not this will change is yet to be decided; please see https://github.com/Solo5/solo5/issues/282 for discussion.

Other changes:

* Migrate OPAM integration to OPAM 2.

Acknowledgements: Thanks to the following new contributors to this release:

* Emery Hemingway
* Stefan Grundmann

Enjoy,

-mato
[solo5] "Unikernels as Processes" paper
Hi all,

I'm happy to announce that our ACM SoCC 2018 paper entitled "Unikernels as Processes" is now publicly available at https://dl.acm.org/citation.cfm?id=3267845.

The paper, by Dan and Ricardo of IBM Research, Nikhil of BITS Pilani and myself, presents the central tenet that the host attack surface/TCB of a Linux seccomp-sandboxed unikernel is comparable to, or, depending on your evaluation metric, better than that of a hardware virtualization sandbox.

Dan & Ricardo's implementation of a seccomp tender for Solo5 is available today as part of the "nabla containers" project at https://github.com/nabla-containers/solo5. I plan to work together with Dan & Ricardo on upstreaming this code to Solo5 next month.

Regards,

-mato
[solo5] Solo5 0.4.0 released
Hi all,

I'm happy to announce the release of Solo5 0.4.0.

## 0.4.0 (2018-09-14)

This release is a major restructuring and renaming of Solo5 components, primarily to reflect that the "ukvm monitor" is no longer specific to the KVM hypervisor and to allow for future development of further targets and tenders enabling different sandboxing technologies.

Major changes:

* `kernel/X`: Moved to `bindings/X`, now referred to as the "Solo5 _bindings_ for X". Build products are now named `bindings/X/solo5_X.o`.
* `kernel/solo5.h`: Moved to `include/solo5/solo5.h`.
* _ukvm_: Target has been renamed to _hvt_. Monitor code is now referred to as the hvt _tender_ and has been moved to `tenders/hvt/`.
* `ukvm-configure`: Now named `solo5-hvt-configure`.
* `ukvm-bin`: Now named `solo5-hvt`.
* `ukvm/ukvm_guest.h`: Renamed to `include/solo5/hvt_abi.h`.
* Generated VM names used on FreeBSD and OpenBSD have been changed from `ukvm%d` to `solo5-%d`, with `%d` being the PID of the `solo5-hvt` tender.
* Core file names produced by the _hvt_ dumpcore module have been changed from `core.ukvm.%d` to `core.solo5-hvt.%d`.
* `solo5-run-virtio` and `solo5-mkimage`: Renamed to `solo5-virtio-run` and `solo5-virtio-mkimage` respectively.
* OPAM packages used by MirageOS have been renamed from `solo5-kernel-X` to `solo5-bindings-X`, accounting for the change from `ukvm` to `hvt`. Full details of the impact of this change on existing Mirage/Solo5 installations will be provided separately as part of a MirageOS release.

For further details please refer to the discussion and commits merged as part of #274.

Other changes:

* Update OpenBSD requirements to 6.4 and minor OpenBSD build fixes (#270, #273).

Best,

-mato
Re: [solo5] Is dumpcore enabled
On Wednesday, 15.08.2018 at 05:00, Adam Steen wrote:
> Hi Martin
>
> I have completed the pull request [1], and was looking for further discussion
> from anyone who is interested.

Thanks, I'll look into it, but it might take a week or so; I'm currently concentrating on the renaming/refactor for #172. It might be easier to do this after that is done, as all the internal API names will change.

Cheers,

-mato
Re: [solo5] Is dumpcore enabled
Hi Adam, apologies for the delayed reply, I've been on vacation.

On Sunday, 22.07.2018 at 22:36, Adam Steen wrote:
> Hi All
>
> After a quick discussion with Hannes and attempting to implement coredump on
> OpenBSD [1], I found there was no easy way to determine if use_coredump was
> true, i.e. that coredump was enabled.
>
> I had to remove a tight pledge, privilege drop and chroot just to enable
> coredump, and I wasn't sure it was worth it. Hannes suggested turning the
> security changes on or off depending on whether coredump was enabled, but we
> both couldn't find an easy way to do this.
>
> So I am posting here to get the discussion started about what we can do to
> make this easier.
>
> The only thing I could come up with, with the current code, was to disable
> the security if the dumpcore module was compiled in (available). See [2] and
> [3].
>
> I wanted to raise this discussion not around my coredump code but about
> determining which module(s) are enabled in the ukvm main/system code.

This is a good question. With the current model of determining modules to enable at compile time, I would use the same approach for determining whether or not to "drop privileges" (not sure what best to call this functionality, suggestions please?). Specifically:

1. Add a compile-time #define, e.g. UKVM_DROP_PRIVILEGES. In a suitable header, say ukvm/ukvm.h since that is included by all modules, define this to 1 if not already defined, i.e. default to dropping privileges.

2. ukvm-configure can then manually add -DUKVM_DROP_PRIVILEGES=0 to CFLAGS if dumpcore has been requested.

3. If UKVM_DROP_PRIVILEGES=1, you get the current behaviour in your code.

4. If UKVM_DROP_PRIVILEGES=0, privilege dropping is disabled *AND* ukvm prints a stern warning to this effect at startup, including something about not being recommended for production, etc.

Separately from this, and with a view to adding some amount of privilege dropping by default on other systems besides OpenBSD, I think that:

1. The privilege dropping code should be moved into its own top-level function, e.g. ukvm_hv_drop_privileges(), which goes into ukvm_hv_*.c.

2. This function is clearly called from ukvm/ukvm_main.c, just before entering the VCPU loop(?). This would also be where the #if printing the warning if disabled (see (4) above) goes.

As for what privileges should exactly be dropped on other OSes by default, I need to think about that a bit more and will follow up during the week. However, this should be enough to get you started.

Would this approach work for you? Any other opinions?

Cheers,

-mato
Re: [solo5] Solo5, MirageOS and Write and Exec Memory
Hi Adam,

On Wednesday, 20.06.2018 at 22:07, Adam Steen wrote:
> Hi All
>
> As some of you know I have been working to get MirageOS/Solo5 working on
> OpenBSD.
>
> As of Friday of last week, I thought i had at least achieved this, but
> after running an end to end test with the released version of the
> Mirage-Skeleton Hello World Tutorial, I now find this causes an
> "mprotect W^X violation" on OpenBSD.

That's, err, odd. Do you get a "Warning: phdr[N] requests WRITE and EXEC permissions" when running ukvm-bin?

> I know Solo5 does not issue any mprotect requests with WRITE and EXEC
> permissions, but something in MirageOS does. I have been testing building
> the Hello World example for a while now without any problems of this
> nature, i am not sure where to start looking for what changed.
>
> I can permit W^X memory on OpenBSD with a change to the Solo5 configure
> script and a file system setting, but this has now missed the boat for
> this release and i would prefer not to do this.

That would be just papering over the problem. As for fixes, while it'll be a while yet before the post-0.3.0 renaming and restructuring gets done, I can easily cut a point release before then for trivial fixes (no ABI changes).

> Any tips on where i should look?

When you write "causes an mprotect W^X violation", how does that actually manifest itself at run-time? Does ukvm-bin get killed? Or some other way?

I've looked at the build products (on Linux) for both the Solo5 standalone tests and the mirage-skeleton tutorial/hello unikernel with "readelf -l", and I don't see any phdrs asking for W and X protections at the same time, so my guess would be something in the OpenBSD build is acting up (again).

-mato
Re: [solo5] Heads up: Release and code freeze plans
On Friday, 08.06.2018 at 17:21, Martin Lucina wrote: > An update on this: > > After discussing with some of the core Mirage folks and others at robur, > we've come to the conclusion that it is better to release what we have now > rather than wait after the renaming is complete. > > So, the rough plan is to: > > 1. Decide on a "flag day" and rewrite Git history on master to sort out > the issue of my mis-attributed commits, > 2. Cut a "minimalist" Solo5 0.3.0 release of what we have now on master, > and subsequently the Mirage/Solo5 integration bits (as pointed to by the > OPAM repository in Solo5/opam-solo5). > 3. After that, continue with the renaming and restructuring of the > codebase as outlined in #172. > > I'll follow up in more detail on Monday. And another update. Point 1 above is now moot as I have decided to just push an empty commit with "errata", referring to the mis-attributed commits in question. I will proceed with point 2 over the course of this week. Thanks, -mato
Re: [solo5] Thoughts on supporting multiple NICs on solo5/ukvm
Hi Nikhil,

On Tuesday, 12.06.2018 at 13:47, nikhil ap wrote:
> > I'm not sure what you mean. What selection of modules gets compiled
> > in/enabled for a tender would be up to the operator of that tender to
> > determine as a policy decision. The tender would then, based on
> > interpreting the binary's manifest, determine whether or not it
> > "can/will/is allowed to" launch the (separately supplied) unikernel.
>
> Ok. I had thought we would compile in all the modules for the tender.
> You are suggesting that if the operator only requires the net module,
> he will configure the tender by running tender-configure net, which will
> only compile in the net module, and will feed the unikernel image with
> the manifest to this tender.

Well, what I think we should do is provide a default configuration, which operators can trim down / extend as they see fit. With a move away from compile-time coupling this would be done by a toplevel "configure.sh" which would replace the per-tender script ("ukvm-configure") that we have now. Again, the specifics of how this will actually work are yet to be determined.

> Also, I was thinking once you are done with the re-naming, we could have a
> call to discuss and conclude on a design? What can be done for an initial
> phase? What tooling can we provide, taking into account the unikernels we
> support, etc. Others could join as well, and at the end we should be able
> to document the design. Thoughts?

Speaking from experience, a video call is the worst possible format to discuss designs. We should either do this asynchronously here / on GitHub, or organise a workshop in person.

> Another thing is, since this is mostly a configuration based change, I can
> still come up with a proposal for multi-NIC, assuming we've loaded the
> manifest and determined how many NICs we need. I can do a write-up on the
> changes that are required for the tender-binding-application. Is this fine?

All in good time. Let's get the release and renaming out of the way first, then we can discuss what happens next.

-mato
[solo5] Correcting my mis-attributed commits
All, as I mentioned in my email regarding release and code freeze plans, I had mistakenly committed several commits earlier this year using an incorrect former employer's email address. Rather than go through the pain associated with rewriting public Git history, given that this is essentially an "administrative change" with no other purpose than to correct the public record, I have instead pushed an empty commit today to the Solo5 master branch correcting the error and referring to all the individual commits in question. For the record, full text of the commit follows. --- cut here --- commit 0bea597637ec2beada2ad1e4df4d94875f0e257e Author: Martin Lucina Date: Mon Jun 11 15:10:44 2018 +0200 Correct mis-attributed commits The following 12 commits, performed between Thu Jan 25 16:26:05 2018 +0100 and Wed Mar 28 18:46:31 2018 +0200 have an incorrect Author and/or Committer of "Martin Lucina ". This commit is an errata to reflect that the correct Author and/or Committer for these commits is "Martin Lucina ". 173869df7aa2827f273386b01f6e1627d76d1de9 6994d79105286bdad4973f165e3765b799d1eafc 3918be0b64fa25f3cd9af21cec5f7f2c2f843fb1 956db83dbbd973224b9f8b38ec04ee7baa55d863 8643dee74acd07ef558a0fa60dfc862500a6d4fd 82697d6d8bbddf27aaa904c72970173b2a4911a5 f2c2b821b8949a0cbb5d65185645c5c887240792 88778700d9cb431d4408f60ae2d9e2a4575973c7 cd266928e4de28828d517ca9860f80d966c60cee 1f0c4e925489f094c807fa34f1a86151c10ab7cc e944f29421b96102a94e7a07a6a7ab025f1c9d7b 7c8674e9c5881db19adef10dea41f1aae4e12da3 --- cut here --- Martin
Re: [solo5] Thoughts on supporting multiple NICs on solo5/ukvm
Hi Nikhil,

On Monday, 11.06.2018 at 16:10, nikhil ap wrote:
> I'm not familiar with MirageOS. I need to look at how it works.
> Although, this implies that we need to change the tooling for every
> unikernel we support.

That's correct.

> For ex, MirageOS, IncludeOS, Rumprun etc. Also, whenever we need to add
> configuration parameters/resources to the manifest, we would have to
> change this everywhere to reflect it. Is it possible to have some kind of
> stand-alone tool which the other unikernels can trigger at configuration
> time to be able to generate the binary with the manifest?

Yes, obviously the goal would be to have easy to use tooling available to assist the various guest unikernel build systems in generating the manifest. Integrating it into the final unikernel binary should just be a case of passing the resulting object to 'ld'.

> > If we split the manifest and associated artefact, then this has several
> > disadvantages:
> >
> > 1) The user/operator now has to keep track of two files. Or we have to ship
> > as a .tar/other container, in which case we may as well just embed the
> > manifest in the ELF binary.
> >
> > 2) The coupling is much looser, so the potential for messing things up is
> > accordingly higher.
> >
> > If we combine the two in a single artefact, with the associated build-time
> > runes on the libOS side being generated by the build system (i.e. the
> > developer does not create a manifest "manually") then these problems go
> > away.
>
> I guess an advantage of using a separate configuration file would be that
> we can compile only the module that is required.

I'm not sure what you mean. What selection of modules gets compiled in/enabled for a tender would be up to the operator of that tender to determine as a policy decision. The tender would then, based on interpreting the binary's manifest, determine whether or not it "can/will/is allowed to" launch the (separately supplied) unikernel.

> > By the way, you mentioned JSON. JSON is absolutely unsuitable for this
> > purpose; given that the (untrusted) manifest needs to be interpreted by
> > e.g. a (trusted) tender, the format needs to be designed in a way that
>
> ... a way that? Would like to know how this ends :)

...that allows for secure deserialization of untrusted data? See here for an (old) comparison of some formats:

https://capnproto.org/news/2014-06-17-capnproto-flatbuffers-sbe.html

Personally I'm very much in favour of minimalism, so there's also the possibility of just inventing a custom binary format tailored to only the information we need to store in the manifest. I don't yet know which approach is best in the long term. Note that we should be able to version the manifest as a whole, so we could start with a simple custom format and move to something more involved later if there is a need.

> I thought of JSON because IncludeOS uses JSON to configure the
> resources and specify them to Qemu.
>
> An example:
> {
>   "image": "test_router.img",
>   "mem" : 320,
>   "net" : [{"device" : "virtio"},
>            {"device" : "virtio"}],
> }
>
> So in order for us to be able to support IncludeOS we would have to have
> their build system (cmake) parse the JSON file and emit a tender binary
> with the manifest.

No, the manifest is part of the change to a run-time coupling with a tender. I.e. the IncludeOS build system only has to produce a unikernel with a manifest, not a tender binary. Those would be supplied separately, by downloading and building the "new" Solo5.

-mato
Re: [solo5] Thoughts on supporting multiple NICs on solo5/ukvm
Hi Nikhil, On Monday, 11.06.2018 at 10:15, nikhil ap wrote: > > The issue is not so much working out what the guest-side API (i.e. that in > > solo5.h) should be, but keeping the conceptual integrity of how Solo5 > > (kernel/bindings), ukvm (monitor/tender) and application (unikernel) fit > > together. This will require a digression into the project's history, so > > please bear with me. > > > When you say binding, do you mean the current hypercall interface? Based on the terminology discussed in #172 (ukvm renaming), the "bindings" are the implementation of the Solo5 public API for a particular target. "Target" in this case refers to a host/hypervisor/tender combination. In terms of code organisation, bindings are what we have today in kernel/. This should all be much clearer after the renaming and reorganisation is done, hence the need for that to happen sooner rather than later. > > My preferred approach (discussed with Dan & Ricardo several times, but not > > in public) would be to replace the current model of build-time coupling > > with run-time coupling. The main points of a "back of the napkin" > > implementation would be something like this: > > > > 3.1. The application binary, supplied by the "user", would contain a > > "manifest" (e.g. inside a special ELF note) declaring which resources > > (NICs, block storage, future IPC, etc.) it requires to operate. > > > > How should the user add this manifest? Do we need to use the toolchain to > create such a binary? > For example, do we use objcopy to add the elf note? May be using an example > with Mirage or IncludeOS might > help me in understanding better. In the case of MirageOS, the "mirage" front-end tool already has all the information it needs to produce such a manifest. 
So, for MirageOS the relevant build-time commands (using objcopy, or generating a .c/.S which produces the relevant special section, to be determined) would be generated at "mirage configure" time, and then executed at "mirage build" time. I'm not familiar with IncludeOS' build system, but I presume a similar mechanism can be implemented there. > Also, what is the advantage of using the manifest inside the binary instead > of using a configuration file such as > json to specify the resources? We could read the configuration file, which > would include all the information you mentioned > in a manifest and could generate a tender which is crafted for the > application. Wouldn't this satisfy the 2 points above? If we split the manifest and associated artefact, then this has several disadvantages: 1) The user/operator now has to keep track of two files. Or we have to ship as a .tar/other container, in which case we may as well just embed the manifest in the ELF binary. 2) The coupling is much looser, so the potential for messing things up is accordingly higher. If we combine the two in a single artefact, with the associated build-time runes on the libOS side being generated by the build system (i.e. the developer does not create a manifest "manually") then these problems go away. By the way, you mentioned JSON. JSON is absolutely unsuitable for this purpose; given that the (untrusted) manifest needs to be interpreted by e.g. a (trusted) tender, the format needs to be designed in a way that > > 3.3. It follows that the tender can now be supplied separately by the > > "operator". At start-up, it would load the manifest and use it to determine > > which modules to enable [4], based on this list it would also require the > > "operator" to configure all host resources for all enabled modules. > > > > Does this mean all the modules need to be compiled and included in the > tender > even though the application might not load it? 
> I'm assuming by loading you mean by adding it to the module table which we
> have now in the tender.

Well, I've not thought through the implementation in full yet. "Loading" might mean dynamic linking, or something else. Or, in an initial implementation, it might mean what you write (all modules are compiled in, unused ones get disabled). If we get the design right then the implementation can evolve.

> I would like to understand the scope of your design before proceeding.
> Maybe I can build a minimal prototype which conforms to the
> tender/unikernel requirements we have and *then* look at supporting
> multiple NICs. (Although I do have a gross hack for supporting multiple
> NICs already, I don't think it's going to be useful considering what is
> required.)

Sure, I can try and write down more of the design, but I need to get the releasing and renaming out of the way first, otherwise we're going to keep getting confused about the terminology.

-mato
[solo5] Heads up: Release and code freeze plans
Hi all,

as of today all the outstanding PRs on the Solo5 repository have been merged, so we are in a good position to set concrete plans for the next release (tentatively 0.3.0).

The main item I have on my list, to be done either before or immediately after a release, is #172 ("renaming ukvm to a more generic name") and the associated code and terminology reorganisation discussed there. I have been putting off working on this partly because it will touch everything in the codebase, so it would have made rebasing the existing in-flight PRs hard for contributors. There are two options for how this can be done:

1) Release, freeze, rename: We cut a release now, with the code in a state where existing features are tested and stable (well, OpenBSD is still new and thus "experimental", but that's fine). After that there will be a freeze period (say, ~2 weeks) during which I will work on #172.

2) Freeze, rename, release: We do the renaming first, then cut a release.

The disadvantage of option 1) is mainly for downstreams (MirageOS), since the renaming will change the user-visible terminology for targets (ukvm -> vt), and the goal of #172 is to get everyone (users, developers, downstreams) on the same page terminology- and code-wise. However, even though it's "just" a rename and refactoring, after discussing with Dan and Ricardo this week it does seem prudent to release known-good code. I'll discuss this next week with the MirageOS core team and make a decision then.

There is one other annoying "administrative" issue. Over the course of the past few months I have mistakenly committed a bunch of commits to the repository with a former employer's email address. I'd like to fix this for legal/attribution reasons, however the only way to accomplish this is to rewrite the history on the public Solo5 "master" branch. This will mean that everyone's existing clones and/or outstanding PRs will need to be manually fixed. The freeze looks like a good (only?) opportunity to get this done with the least amount of disruption. I'll send out a separate email about this after investigating in more detail exactly what needs to be done and how it will break other repositories.

-mato
Re: [solo5] Thoughts on supporting multiple NICs on solo5/ukvm
Hi Nikhil, On Tuesday, 22.05.2018 at 13:18, nikhil ap wrote: > Currently solo5/ukvm supports only one NIC. I wanted to know your thoughts > on supporting multiple NICs. This is really important and essential > requirement when trying to run on IncludeOS unikernels which is heavily > focused on networking and NFV. > > After talking to Ricardo, I understand that the difficulty is in coming up > with a new API. I wanted to start off the discussion. The issue is not so much working out what the guest-side API (i.e. that in solo5.h) should be, but keeping the conceptual integrity of how Solo5 (kernel/bindings), ukvm (monitor/tender) and application (unikernel) fit together. This will require a digression into the project's history, so please bear with me. One of the original goals for ukvm, outlined in Dan and Ricardo's original 2016 paper on the subject [1] was to "specialize" the monitor to match the running unikernel. Among other things, the benefits of this approach are: 1.1. We ensure that the trusted tender code is minimal and does not contain or enable any "unused" interfaces. Such interfaces are a well-known source of security issues (see e.g. the now-famous QEMU floppy-drive bug). 1.2. At the same time, we ensure that operation of the tender is _coupled_ to the unikernel it is running. In other words, in order for an operator/orchestrator to run a ukvm-based unikernel today, *all* the host resources (devices) the unikernel requires to operate are configured at start-up. You cannot [2] run a ukvm-bin configured with a "UKVM_MODULE_NET" without also specifying with --net which host resource the guest device should be bound to. Both of these are intentional and constitute a large part of what makes Solo5/ukvm unique compared to many similar systems that exist today. The way they are accomplished today is by _coupling_ the tender to the unikernel at _build time_. 
We (Dan, Ricardo and I) have known for quite some time that the build-time coupling of the tender and unikernel -- while it seemed like a good idea at the time -- is not practical for use in real-world scenarios, for several reasons: 2.1. "Users" (producers of "untrusted" applications that run under a Solo5 tender) should not be providing their own "trusted" tender binary. This goes against any security model that e.g. a public/private cloud deploying such applications would want to use in practice [2]. 2.2. Build-time coupling is totally inflexible for multi-platform host OS support, as it requires the "user" to build all possible combinations of tender binaries for all targets they might want to deploy their application on. For example, a "user" building their unikernels on Linux but deploying on FreeBSD has no way [3] to produce a "correct" tender binary. 2.3. Build-time coupling cannot cleanly support N instances of a resource (be that NICs, or block devices) while at the same time staying consistent with point (1.2) above. The question is, how can we resolve the above points going forward, keeping in mind (1.1) and (1.2)? My preferred approach (discussed with Dan & Ricardo several times, but not in public) would be to replace the current model of build-time coupling with run-time coupling. The main points of a "back of the napkin" implementation would be something like this: 3.1. The application binary, supplied by the "user", would contain a "manifest" (e.g. inside a special ELF note) declaring which resources (NICs, block storage, future IPC, etc.) it requires to operate. 3.2. The "manifest" would include information regarding the exact ABI version that the application requires, so we could actually provide a stable ABI contract between the tender and application, which we do not do today. 3.3. It follows that the tender can now be supplied separately by the "operator". 
At start-up, it would load the manifest and use it to determine which modules to enable [4]; based on this list it would also require the "operator" to configure all host resources for all enabled modules.

Getting to this point requires a significant amount of development work, and time, however I think it's absolutely necessary for Solo5 to be widely usable in practice. If you want to get support for multiple NICs before then, I'm interested in interim compromise proposals (prose first, not code!) that consider the changes needed to all components involved (tender, bindings, application) and can work with the current source-time coupling while not being too gross of a hack. I've thought of several options, but couldn't find one that passed muster. Rather than presenting them right now, I'd prefer to hear ideas from others on what approaches might work.

Thanks for reading, and I hope this explains why it's not "just" an issue of adding an index to the guest-side APIs. Please ask questions.

-mato

[1] https://www.usenix.org/system/files/conference/hotcloud16/hotcloud16_williams.pdf
[2] There are people playing various tricks today to make this work, it's not
[solo5] OpenBSD status (was Re: IRC?)
(Cc:ing the list as others may have ideas or be able to help you with diagnosing OCaml build/toolchain issues) > [09:46] ← │ adamsteen (~asteen@203.63.192.100) has left #mirage Oh, I didn't realise you were around at all. Sorry! I've left comments on the pkgconf issue in #226; however the more pressing problem is the trap I'm seeing on 6.3. If you can find time to look into that then we can hopefully merge when I get back (14th), or ping @djwillia / @ricarkol if things look good, they can also click the green button. My intuition is that it's more likely to be an issue with the OCaml parts of the build system and/or toolchain than ukvm itself since the C tests/ run fine... ... which is confirmed by the fact that I just tried running a tutorial/hello/hello.ukvm built on Linux using the ukvm-bin built on OpenBSD, and it works fine[*]: $ doas ./ukvm-bin ./hello.ukvm.linux | ___| __| _ \ | _ \ __ \ \__ \ ( | | ( | ) | /\___/ _|\___// Solo5: Memory map: 512 MB addressable: Solo5: unused @ (0x0 - 0xf) Solo5: text @ (0x10 - 0x1e8fff) Solo5: rodata @ (0x1e9000 - 0x220fff) Solo5: data @ (0x221000 - 0x2d0fff) Solo5: heap >= 0x2d1000 < stack < 0x2000 2018-05-03 22:25:55 -00:00: INF [application] hello ^Cukvm-bin: Exiting on signal 2 [*]: Well, it appears to hang (or sleep forever) after the first message, but that may be due to timekeeping in general being screwed when running OpenBSD under nested KVM (which I noticed recently), could be confirmed by running on bare metal. 
For reference, comparing the output of the OpenBSD-built unikernel:

$ doas ./ukvm-bin ./hello.ukvm
| ___| __| _ \ | _ \ __ \ \__ \ ( | | ( | ) | /\___/ _|\___//
Solo5: Memory map: 512 MB addressable:
Solo5: unused @ (0x0 - 0xf)
Solo5: text @ (0x10 - 0x21dfff)
Solo5: rodata @ (0x21e000 - 0x254fff)
Solo5: data @ (0x255000 - 0x4)
Solo5: heap >= 0x5 < stack < 0x2000
Solo5: trap: type=#UD ec=0x0 rip=0x1008 rsp=0x1df8 rflags=0x10002
Solo5: ABORT: cpu_x86_64.c:171: Fatal trap

So, the OpenBSD-built binary is obviously wrong (bogus end of data segment; RIP looks like execution went off into never-never land). I would first check whether you can reproduce this yourself on OpenBSD 6.3, compare with OpenBSD-current, and then try to track down the cause.

Aside: We should probably add some more sanity checks to the ukvm ELF loader, to refuse to load binaries with obviously bogus PHDRs.

Cheers,

-mato
Re: [solo5] ARM64 CI now up and running
On Monday, 30.04.2018 at 22:21, Martin Lucina wrote: > Due to there not being any nested KVM support for aarch64, this is a bit > more involved and instead runs in a suitably privileged Docker container[2] > using surf-build v2.0.0.beta.4, plus some custom driver scripts which I > have yet to commit somewhere. I have now committed the driver script (sans the wrapper with credentials) to the Solo5/solo5-ci repository, and in the process have updated the README to reflect the new setup better. > The builder will show up on (new or rebased) PRs as > "aarch64-Debian9-gcc630". Actually, that's not true, it will also pick up pushes to existing open PRs, where the aarch64 builder will fail if they are not rebased on master post f4a4755f76ae1ddb8f70c4209dffa3c0438faaab (yesterday) due to changes required in build.sh. -mato
[solo5] ARM64 CI now up and running
Hi all, After spending just over a week in Node.js hell, I'm happy to report that ARM64 CI with end to end tests (i.e. not just the old compile-only setup) is now back up and running on a Raspberry PI 3 B+ supplied by Hannes, and installed with a mainline kernel (v4.16.x) and plain Debian thanks to derpeter's HOWTO[1]. Due to there not being any nested KVM support for aarch64, this is a bit more involved and instead runs in a suitably privileged Docker container[2] using surf-build v2.0.0.beta.4, plus some custom driver scripts which I have yet to commit somewhere. The builder will show up on (new or rebased) PRs as "aarch64-Debian9-gcc630". Now, finally time to get back to reviewing and merging Adam's OpenBSD work. Cheers, -mato [1] https://mirage.io/wiki/arm64 [2] https://github.com/Solo5/solo5-ci/blob/master/any-Debian9-gcc630/Dockerfile
Re: [solo5] Improving solo5's network performance
Hi Nikhil, On Tuesday, 24.04.2018 at 20:42, nikhil ap wrote: > [...] > > What exactly do you mean by "polling mode"? Just using poll() on > > non-blocking sockets similarly to what the current ukvm implementation of > > solo5_yield() does? > > > > Polling mode: IO thread keeps polling for packets on tapfd and shmstream > and if there is nothing, sleeps for a millisecond. Hence the CPU > consumption is high even when there is no data. Right, so that's basically busy-waiting. > Event mode: IO thread waits for events using epoll. Only wakes up when > there is actually data to be read from tapfd and shmstream. When there is > no data CPU usage is 0% but it has more VMEXITs than polling mode because > the solo5 kernel need to notify when the packets are queued on shmstream > using the hypercall. Hence queuing multiple packets onto shmstream and > notifying it with solo5_net_flush() will reduce the number of hypercalls. Ok, makes sense. > > This would require a portability layer to run on the BSDs. If I'm reading > > the figures you sent correctly, it looks like polling performs better > > anyway? > > > Polling mode is just to figure out what is the best we can do but I'm not > sure if it is suitable for production. The goal is to achieve a very good > performance which is comparable to qemu and which is not CPU intensive. Indeed, busy-waiting is not something we'd want to use in production. In any case, as you write, the goal is to achieve performance which is comparable to QEMU, not necessarily faster at any cost. > > The same thing could presumably be accomplished with a writev()-style API, > > or is this doing something different? > > With solo5_net_queue() API, we can queue packets in shmstream without > notifying ukvm-bin to read from it and signal ukvm-bin with > solo5_net_flush(). I haven't seen your APIs, but based on this description I'm assuming a vector write (i.e. write X packets in one call) would work. 
If we do end up modifying the Solo5 APIs we need to do so carefully, to accommodate other possible targets (seccomp, SGX), and also take into account any impact on possible failure semantics.

> > Do you have any statistics comparing (host) CPU load for the various tests?
>
> CPU load is mentioned in my initial email which is on the host, or do you
> require something else?

I think that's sufficient for now; please include those figures when you send an updated table with the TX-side performance.

> [...]
> > So, I suggest that you continue to iterate on this work, ideally in public,
> > and submit a PR after the next Solo5 release.
>
> Any ETA on the next solo5 release?

The usual answer is "when it's ready" :-) Having said that: I'm going to be on vacation May 4th through 13th, and I don't expect to have things ready before then. The main thing which is dragging on at the moment is getting CI set up for ARM64 and OpenBSD, since I don't want to make a release without having this in place for all supported targets. So, realistically, mid-to-late May, fingers crossed.

Cheers,

-mato
Re: [solo5] Improving solo5's network performance
Hi Nikhil, On Friday, 20.04.2018 at 20:26, nikhil ap wrote: > Hi Guys, > > Summarising the discussions we've had so far and the work I've been doing: Thank you for this summary and for doing these experiments. The numbers are very impressive. For those who were not on the private email thread about this, Nikhil has been working on a minimal PoC implementation of the Solo5/ukvm network APIs with a shared memory transport based on the "shmstream" protocol used by the Muen Separation Kernel. >- Implemented network interface using shmstream (shared memory) in order >to reduce the number of hypercalls and thus reduce the number of VMEntries >and VMExits. >- Separate IO thread is implemented in ukvm-bin to read/write packets >from the shared memory. >- IO thread currently supports polling mode and event mode. What exactly do you mean by "polling mode"? Just using poll() on non-blocking sockets similarly to what the current ukvm implementation of solo5_yield() does? >- Event-driven model is implemented using eventfds and io-thread waits >for events using epoll. This would require a portability layer to run on the BSDs. If I'm reading the figures you sent correctly, it looks like polling performs better anyway? >- The applications could run both the modes without any changes to their >APIs. >- Currently shmstream mode can be set with --shm option. > - Ex: ./ukvm-bin --net=tap100 --shm=poll test_ping_serve.ukvm In the grand scheme of things, I'd like to eventually replace the current implementation with a shmstream-based one entirely, if it proves to be portable (host OS and arch-wise) and secure enough. So, it would not be user-selectable. However, there are also other things that need to happen before that (see below). 
> - However, in case of event mode, for better performance, the
>   application can choose to notify ukvm-bin after queuing *all* the packets
>   in the shared memory instead of after a single packet transmit, by using
>   new solo5 public APIs: solo5_net_queue() and solo5_net_flush();

The same thing could presumably be accomplished with a writev()-style API, or is this doing something different? What is the performance gain in practice? In the figures you sent, is that with or without this optimization?

> - Solo5 performance was tested with IncludeOS (IncludeOS had to be
>   modified to address the recent API changes in solo5) and with UDP traffic.
> - Summarising the results below

Do you have any statistics comparing (host) CPU load for the various tests?

Next steps: I'd like to push out a formal release of Solo5 that integrates the work done over the past year which does not have a proper formal release (FreeBSD vmm support, ARM64 support) and some of the work which is "ready to merge" ASAP (OpenBSD vmm support, possibly your work on guest-side core dumps). This would also include the renaming of the various Solo5 components as discussed in #172.

Once that is done we will then have time to look at integrating your shmstream-based network implementation, and, as I wrote above, ideally to replace the current implementation entirely. In keeping with the minimalist design of Solo5 and ukvm, rather than providing several different network implementations, or variants of one (polling vs. event-driven, etc.), I'd prefer that we work towards choosing one single variant to use and support [as a default], otherwise we run the risk of going down the "QEMU-style" route of a proliferation of different optional components, which is both hard to understand for users and hard to comprehensively test and secure.

So, I suggest that you continue to iterate on this work, ideally in public, and submit a PR after the next Solo5 release.

Cheers, and thanks again,

-mato