On Thu, 2018-10-11 at 09:02 -0700, Mike Larkin wrote:
> On Wed, Oct 10, 2018 at 05:04:34PM +0200, Sergio Lopez wrote:
> > On Mon, 2018-10-08 at 09:58 -0700, Mike Larkin wrote:
> > > On Fri, Oct 05, 2018 at 01:50:10PM +0200, Sergio Lopez wrote:
> > > > Hi,
> > > >
> > > > I have an idea in mind that I'd like to share to ask you if you think
> > > > it's worth giving it a try.
> > > >
> > > > Right now, vmd already features an excellent privsep model to ensure
> > > > the process servicing the VM requests to the outside world is running
> > > > with the lowest possible privileges.
> > > >
> > > > I was wondering if we could take a step further, servicing each virtio
> > > > device from a different process. This design would simplify the
> > > > implementation and maintenance of those devices, improve the privsep
> > > > model and increase the resilience of VMs (the crash of a process
> > > > servicing a device won't bring down the whole VM, and a mechanism to
> > > > recover from this scenario could be explored).
> > > >
> > > > Doing this in an efficient way requires:
> > > >
> > > > - The ability to receive virtqueue notifications directly on the
> > > > process. I've already sent an RFC patch for this (see "Lightweight
> > > > mechanism to kick virtio VQs"), as it'd be useful for the non-separated
> > > > model too.
> > > >
> > > > - An in-kernel IRQ chip. This one is the most controversial, as it
> > > > means emulating a device from a privileged domain, but I'm pretty sure
> > > > a lapic implementation with enough functionality to serve *BSD/Linux
> > > > Guests can be small and simple enough to be easily auditable.
> > > >
> > > > - A way to map the VM memory into a third process context. Can
> > > > uvm_share for this? Here's also the opportunity to explore options to
> > > > avoid mapping the whole VM memory, though that'll possibly require
> > > > functionality non covered by the virtio specs.
> > > >
> > > > Do you think it's worth exploring this model? What are feelings
> > > > regarding the in-kernel IRQ chip?
> > > >
> > > > Sergio (slp).
> > > >
> > > >
> > > Lots of things to read through in this and the attached diff. I'll try to
> > > catch up and reply as soon as I can.
> >
> > Ack. Let me know if you need me to split it in different patches, or if
> > you want to take a look at its use on vmd (the actual patch for vmd
> > still needs polishing, but works).
> >
> > Sergio (slp).
> >
>
> I'm not sure if splitting this into a separate process or using a taskq is
> the right way to go ; dlg@ had done the latter before, but we dropped that
> for some reason.

Well, it's not a question of one thing or the other. It'd be possible
to have some devices serviced by code embedded in vmd, as it is today,
and others serviced by separate processes.

A nice use case for having a separate process servicing a device would
be implementing a virtio_blk disk with a complex backend, like ceph or
glusterfs. Not only would you avoid linking vmd against
librbd/libgfapi, but you would also be able to attach gdb to inspect it
without halting the whole VM. Perhaps it would even allow restarting
the process (hot-plug/unplug?).

On the other hand, the underlying features (kickfd/in-kernel lapic) are
useful on their own, even with all devices being served by vmd,
providing lower latency, fewer vmexits, and parallel execution of I/O
requests.

The only drawback is enlarging the kernel a bit, but I guess this could
be left out of the GENERIC* configs by default.

IMHO this is a win-win scenario.

Sergio (slp).
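
P.S. To illustrate the separate-process model, here is a minimal sketch
in Python, chosen only for brevity. It is not the kickfd interface from
the RFC patch: plain pipes stand in for the kick and completion
channels, and the names device_process and run_demo are hypothetical.
It only shows that a device backend can sleep in its own process until
a queue is kicked, and that the VM side never shares its address space
with it.

```python
import os
import select
import struct


def device_process(kick_r, done_w):
    """Hypothetical device backend running in its own process."""
    while True:
        # Sleep until the VM side kicks a queue or closes the pipe.
        select.select([kick_r], [], [])
        data = os.read(kick_r, 4)
        if not data:
            # EOF: the VM side went away, so stop servicing.
            break
        (queue_idx,) = struct.unpack("I", data)
        # A real backend would process the virtqueue here; this
        # sketch only acknowledges the kick.
        os.write(done_w, struct.pack("I", queue_idx))


def run_demo(kicks):
    """Fork a device process, kick it once per entry, collect acks."""
    kick_r, kick_w = os.pipe()   # VM side -> device: queue kicks
    done_r, done_w = os.pipe()   # device -> VM side: completions
    pid = os.fork()
    if pid == 0:
        # Child: keep only the ends the device needs.
        os.close(kick_w)
        os.close(done_r)
        try:
            device_process(kick_r, done_w)
        finally:
            os._exit(0)
    # Parent: plays the role of the process running the VM.
    os.close(kick_r)
    os.close(done_w)
    completed = []
    for queue_idx in kicks:
        os.write(kick_w, struct.pack("I", queue_idx))
        (acked,) = struct.unpack("I", os.read(done_r, 4))
        completed.append(acked)
    os.close(kick_w)             # EOF tells the device to exit
    os.waitpid(pid, 0)
    os.close(done_r)
    return completed


if __name__ == "__main__":
    print(run_demo([0, 1, 0]))
```

In this sketch a crash of the child would surface in the parent as a
short read on the completion pipe rather than taking the parent down,
which is the property a recovery mechanism could build on.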