On Fri, Oct 05, 2018 at 01:50:10PM +0200, Sergio Lopez wrote:
> Hi,
>
> I have an idea in mind that I'd like to share to ask you if you think
> it's worth giving it a try.

I had a chance to think further on this. See comments inline.

>
> Right now, vmd already features an excellent privsep model to ensure
> the process servicing the VM requests to the outside world is running
> with the lowest possible privileges.
>
> I was wondering if we could take a step further, servicing each virtio
> device from a different process. This design would simplify the
> implementation and maintenance of those devices, improve the privsep
> model and increase the resilience of VMs (the crash of a process
> servicing a device won't bring down the whole VM, and a mechanism to
> recover from this scenario could be explored).

Our model is generally to not try to recover from crashes like that.
Indeed, you *want* to crash so the underlying bug can be found and fixed.

>
> Doing this in an efficient way requires:
>
> - The ability to receive virtqueue notifications directly on the
> process. I've already sent an RFC patch for this (see "Lightweight
> mechanism to kick virtio VQs"), as it'd be useful for the non-separated
> model too.

I'll comment on that diff in that email.

>
> - An in-kernel IRQ chip. This one is the most controversial, as it
> means emulating a device from a privileged domain, but I'm pretty sure
> a lapic implementation with enough functionality to serve *BSD/Linux
> Guests can be small and simple enough to be easily auditable.

This needs to be done, for a number of reasons (device emulation being
just one). pd@ and I are working on how to implement this using features
in recent CPUs, since much of the LAPIC emulation can now be handled by
the CPU itself. We're thinking skylake and later will be the line in the
sand for this. Otherwise the software emulation is more complex and more
prone to bugs. I've resisted the urge to put this stuff in the kernel for
exactly that reason, but with later model CPUs we may be in better shape.
We may also decide to focus solely on x2APIC.
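
To make the x2APIC point concrete, here is a rough sketch (not code from
vmm(4) or vmd) of the two facts an x2APIC-only path would lean on: the CPU
reports x2APIC support in CPUID leaf 1, ECX bit 21, and in x2APIC mode the
LAPIC registers are reached through MSRs starting at 0x800 rather than
through MMIO. The helper names are hypothetical, and the caller is assumed
to have already run CPUID and to pass in the resulting ECX value.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 1, ECX bit 21: x2APIC supported. */
#define CPUID_1_ECX_X2APIC	(1u << 21)

/* First MSR of the x2APIC register range (0x800-0x8ff). */
#define X2APIC_MSR_BASE		0x800u

/*
 * Hypothetical helper: takes the ECX value returned by CPUID leaf 1
 * and reports whether the x2APIC feature bit is set.
 */
static inline bool
cpu_has_x2apic(uint32_t cpuid_1_ecx)
{
	return (cpuid_1_ecx & CPUID_1_ECX_X2APIC) != 0;
}

/*
 * Hypothetical helper: translate a legacy xAPIC MMIO register offset
 * (16-byte aligned, e.g. 0x80 for the TPR) into the matching x2APIC
 * MSR number. Each 16-byte MMIO slot maps to one MSR.
 */
static inline uint32_t
x2apic_msr_from_mmio_offset(uint32_t mmio_off)
{
	return X2APIC_MSR_BASE + (mmio_off >> 4);
}
```

With this mapping the TPR (MMIO offset 0x80) becomes MSR 0x808 and the ICR
(offset 0x300) becomes MSR 0x830, so an x2APIC-only implementation has to
handle MSR accesses only and can skip the MMIO page entirely.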
If you're interested in helping in this area, I'll keep you in the loop.

>
> - A way to map the VM memory into a third process context. Can
> uvm_share for this? Here's also the opportunity to explore options to
> avoid mapping the whole VM memory, though that'll possibly require
> functionality non covered by the virtio specs.

I don't think this is easily achievable without some uvm surgery.

>
> Do you think it's worth exploring this model? What are feelings
> regarding the in-kernel IRQ chip?
>
> Sergio (slp).
>

All things considered, I'm less sold on the idea of splitting out devices
into their own processes. I don't see any compelling reason. But we do
need an IOAPIC and LAPIC implementation at some point, as you point out.

-ml
