Antti,
> Hmm, I can't imagine how a thread scheduler would fail so that you can
> detect it and reboot the service.
I'd absolutely love to chat about it, but the topic is likely a little
tangential for this list. For now, I'll leave it at sweat, blood, and
magic.
> But a rump kernel won't be useful for anything where you want to guarantee
> bound latency, no? Or does a tool exist which can figure things out?
> Sounds pretty difficult, even with annotations.
>
Correct. As it happens, most of the parts of the system that require
strict bounds do not require loads of functionality, and we have facilities
for strictly isolating the wild-west POSIX side from these predictable
parts. I would like to investigate using only the device drivers from RK,
and hooking them with a surrounding predictable core, and seeing what
happens.
> If by "3" you mean you'd like to run sysproxy (as described in chapter
> 3.12.4 of the blue book) on top of your IPC, sure.
Yes.
> The current POSIX implementation is unfortunately incorrectly abstracted on
> top of sockets. Though, if your endpoints are not running on top of POSIX,
> it doesn't really matter, since you need to write your own implementation
> anyway. The sysproxy protocol is expected to be stable. Though, I want to
> add one more message type to signal "configuration complete", but never
> seem to get around to it. There are a bunch of parties interested in
> getting alternative sysproxy transports going. Hopefully there can be
> maximal sharing and minimal reinventing of the wheel (which is not to say
> that nobody needs to do any work ;)
>
I honestly haven't looked into the code enough to know how much we'll have
to adapt sysproxy. Our endpoints look identical to function invocation
(but with a round-trip overhead of ~600 cycles on x86-32), and I'd like to
eventually modify sysproxy (with the requisite changes to APIs above and
below) to support our zero-copy message passing.
> I like "4".
Me too. I see breaking the RK functionality into separate, isolated
components as a logical conclusion.
> First of all, the NetBSD libc used in Rumprun does not include the standard
> implementation of errno -- see use of ".if ${RUMPRUN}" in the libc
> Makefile. As far as I can recall, there are two reasons: 1) errno was
> needed before a bunch of other stuff was figured out 2)
> https://github.com/rumpkernel/rumprun/blob/61e5b8d98bd8a3f665f030e4935d604ba90d11ff/lib/libbmk_core/sched.c#L729-L739
>
> (ok, technically we could just use a compile-time definition for
> kernonly-mode to solve "2", but since what was done for "1" continues to
> work, there has been no reason to change it)
>
Thanks for the context! We did not know about that option in the libc
Makefile.
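
For anyone following along, here is a minimal sketch of the general shape
of an errno that does not depend on __thread: errno becomes a macro over a
function that returns the running thread's slot. This is not the Rumprun or
NetBSD code. The names my_thread, my_errno_location, my_errno, and
my_switch_to are invented for illustration, and the "context switch" only
swaps a pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-thread control block owned by the scheduler. */
struct my_thread {
    int err;                       /* this thread's errno slot */
};

/* Hypothetical: the scheduler repoints this on every context switch. */
static struct my_thread boot_thread;
static struct my_thread *my_current = &boot_thread;

/* Returns the address of the running thread's errno slot. */
int *my_errno_location(void)
{
    return &my_current->err;
}

/* Callers read and write my_errno like a variable; no __thread involved. */
#define my_errno (*my_errno_location())

/* Stand-in for a context switch; a real one also swaps stack and registers. */
void my_switch_to(struct my_thread *next)
{
    assert(next != NULL);
    my_current = next;
}
```

The cost is one function call per access, and the scheduler has to keep
the current-thread pointer accurate across every switch.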
> But, if you're using libc, you probably want some real applications to run
> on top, and you can't expect none of those to use __thread, so eliminating
> __thread from system components seems really like a non-starter.
>
I work in the research domain, so I'm fine with hacking applications to use
our own support for TLS. That said, we've moved on to two parallel tracks:
(1) manually changing the offending code that uses __thread so that we can
make progress, and (2) trying to hack sufficient, but not complete, support
for %gs into Composite. (1) exists because we'd rather not wait until (2)
is done before making progress on booting the system.
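
To illustrate what the manual rewrite looks like, here is a minimal sketch
under stated assumptions: each thread owns a small slot array, and a
former __thread variable becomes a pair of accessors over a fixed slot
index. Every name here (ctls_area, ctls_get, ctls_set, ctls_set_current,
cur_conn_get, cur_conn_set) is hypothetical and is not the Composite API;
the current-area pointer is assumed to be updated by the scheduler on each
switch.

```c
#include <assert.h>
#include <stddef.h>

#define CTLS_SLOTS 8               /* hypothetical fixed slot count */

/* Hypothetical per-thread storage area, one per thread. */
struct ctls_area {
    void *slot[CTLS_SLOTS];
};

/* Hypothetical: repointed by the scheduler on every context switch. */
static struct ctls_area *ctls_current;

void ctls_set_current(struct ctls_area *area)
{
    assert(area != NULL);
    ctls_current = area;
}

/* Read the running thread's value for slot idx. */
void *ctls_get(unsigned idx)
{
    assert(ctls_current != NULL && idx < CTLS_SLOTS);
    return ctls_current->slot[idx];
}

/* Write the running thread's value for slot idx. */
void ctls_set(unsigned idx, void *val)
{
    assert(ctls_current != NULL && idx < CTLS_SLOTS);
    ctls_current->slot[idx] = val;
}

/*
 * Before: static __thread struct conn *cur_conn;
 * After:  accessors over a slot index chosen at compile time.
 */
enum { SLOT_CUR_CONN = 0 };
#define cur_conn_get()  ctls_get(SLOT_CUR_CONN)
#define cur_conn_set(p) ctls_set(SLOT_CUR_CONN, (p))
```

This needs no segment register at all, which is why it lets booting
proceed, but every use site of the variable has to be touched by hand.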
> So let me just rephrase some things once more. The Rumprun unikernel is made
> up of multiple layers, only one of which is the rump kernel. The logical
> stack of layers looks something like this. Let's call this stack "1":
>
> POSIX application (including libs)
> libc + rumprun base
> rump kernel
> bmk (including rumpuser)
>
> Or, alternatively, if you're shooting for minimal or simply don't care
> about existing POSIX software, you can use "kernonly" Rumprun and write
> your application from scratch against the rump kernel syscall interfaces,
> stack "2":
>
> homegrown application
> rump kernel
> bmk (including rumpuser)
>
> And, though I think you already said you evaluated the case without bmk
> and your own implementation of rumpuser, for the sake of completeness: you
> can swap out bmk from the above and still have a working system. However,
> if you want to replace bmk in "1", you need to note that bmk is used to
> implement some functionality used by libc due to the rump kernel not
> providing said functionality (e.g. page allocator). So, if you're
> interested in your own bmk implementation for stack 1, you need to be
> mindful. For stack 2 you only have to worry about the rumpuser interface
> ... because, once again, the rump kernel was designed and implemented as an
> integratable framework with no standalone value, while Rumprun was designed
> and implemented as a "full stack" solution.
>
Thank you for the context. I think that our understanding (thankfully)
matches your explanation. It took a few months in the summer to understand
down to the code level how all of this fits together. And there's a lot
left to understand...
Some context from our side. Our current implementation plan is as follows:
1. Create a layer below bmk which removes the lowest-level functions, and
implements them using Composite primitives (context switch, initial memory
image allocation, interrupts). We're testing this now, and it has been in
place for the past few months.
2. Get the rumprun unikernel booting with trivial POSIX test programs and
no real devices. We're currently at the point where libc pthread data is
being initialized.
3. Get PCI working in the system by implementing the rumpuser interface for
PCI.
4. Get a networking device working in that environment.
5. Get the entire rumprun unikernel working in a single Composite component
with nginx (we've already tested nginx in bmk without Composite).
6. Integrate the scheduling infrastructure in the unikernel into the
surrounding Composite scheduling system to enable additional, isolated
subsystems. This will include replacing the simple bmk scheduling
policy. Additionally, integrate support for the Composite system-wide
memory management (replacing the bmk buddy allocator here). This is mostly
Composite work.
7. Multiple unikernels.
8. Attempt to port sysproxy to our IPC transport. nginx as a test case.
Multiple nginx as an additional test case.
9. Look into directly communicating with the rump kernel (avoiding libc as
in your stack "2" above), and see if we can support a. zero-copy
communication, and b. direct communication with the lower layers of the
rump kernel (i.e. talk almost directly to drivers).
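
To make step 1 concrete, here is a minimal sketch of the shape such a
layer could take: a table of the lowest-level operations (context switch,
initial memory image, interrupt masking) that the upper layer calls
through, with the backend filled in separately. All names (plat_ops,
plat_register, the plat_* wrappers, and the mock_* backend) are
hypothetical; the mock stands in for the real primitives and is not bmk or
Composite code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical set of lowest-level operations the upper layer needs. */
struct plat_ops {
    void  (*ctx_switch)(void *prev, void *next); /* thread context switch  */
    void *(*mem_image)(size_t *len);             /* initial memory image   */
    void  (*intr_set)(int enabled);              /* mask/unmask interrupts */
};

static const struct plat_ops *plat;

/* Install a backend; every operation must be provided. */
void plat_register(const struct plat_ops *ops)
{
    assert(ops != NULL);
    assert(ops->ctx_switch && ops->mem_image && ops->intr_set);
    plat = ops;
}

/* Entry points the upper layer calls instead of touching hardware. */
void  plat_ctx_switch(void *prev, void *next) { plat->ctx_switch(prev, next); }
void *plat_mem_image(size_t *len)             { return plat->mem_image(len); }
void  plat_intr_set(int enabled)              { plat->intr_set(enabled); }

/* Mock backend for illustration only; it just records what was asked. */
static char mock_mem[4096];
static int  mock_intr_enabled;
static int  mock_switch_count;

static void mock_ctx_switch(void *prev, void *next)
{
    (void)prev;
    (void)next;
    mock_switch_count++;
}

static void *mock_mem_image(size_t *len)
{
    *len = sizeof(mock_mem);
    return mock_mem;
}

static void mock_intr_set(int enabled) { mock_intr_enabled = enabled; }

const struct plat_ops mock_ops = {
    .ctx_switch = mock_ctx_switch,
    .mem_image  = mock_mem_image,
    .intr_set   = mock_intr_set,
};

/* Accessors so callers can inspect the mock's recorded state. */
int mock_switches(void)   { return mock_switch_count; }
int mock_intr_state(void) { return mock_intr_enabled; }
```

Keeping the table this narrow is the point: the later steps (replacing
the scheduling policy and the allocator) would swap what sits behind it
without changing the layers above.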
So we're going to be starting with your stack "1", and moving over time to
"2" (while hopefully still supporting "1"). There are a lot of more
research-y tasks along the way, and after all of that is done, but this
likely isn't the forum for discussing them.
Thanks again for all the context and explanations. They have been
informative, and have been an amazingly useful consistency check that our
understanding is mostly in line with reality.
Best,
Gabe