On Thu, 2018-10-11 at 09:02 -0700, Mike Larkin wrote:
> On Wed, Oct 10, 2018 at 05:04:34PM +0200, Sergio Lopez wrote:
> > On Mon, 2018-10-08 at 09:58 -0700, Mike Larkin wrote:
> > > On Fri, Oct 05, 2018 at 01:50:10PM +0200, Sergio Lopez wrote:
> > > > Hi,
> > > > 
> > > > I have an idea in mind that I'd like to share, to ask whether you think
> > > > it's worth giving it a try.
> > > > 
> > > > Right now, vmd already features an excellent privsep model to ensure
> > > > the process servicing the VM requests to the outside world is running
> > > > with the lowest possible privileges.
> > > > 
> > > > I was wondering if we could take this a step further, servicing each virtio
> > > > device from a different process. This design would simplify the
> > > > implementation and maintenance of those devices, improve the privsep
> > > > model and increase the resilience of VMs (the crash of a process
> > > > servicing a device won't bring down the whole VM, and a mechanism to
> > > > recover from this scenario could be explored).
> > > > 
> > > > Doing this in an efficient way requires:
> > > > 
> > > >  - The ability to receive virtqueue notifications directly on the
> > > > process. I've already sent an RFC patch for this (see "Lightweight
> > > > mechanism to kick virtio VQs"), as it'd be useful for the non-separated
> > > > model too.
> > > > 
> > > >  - An in-kernel IRQ chip. This one is the most controversial, as it
> > > > means emulating a device from a privileged domain, but I'm pretty sure
> > > > a lapic implementation with enough functionality to serve *BSD/Linux
> > > > guests can be small and simple enough to be easily auditable.
> > > > 
> > > >  - A way to map the VM memory into a third process context. Can
> > > > uvm_share be used for this? This is also an opportunity to explore options
> > > > to avoid mapping the whole VM memory, though that'll possibly require
> > > > functionality not covered by the virtio specs.
> > > > 
> > > > Do you think it's worth exploring this model? What are your feelings
> > > > regarding the in-kernel IRQ chip?
> > > > 
> > > > Sergio (slp).
> > > > 
> > > 
> > > Lots of things to read through in this and the attached diff. I'll try to
> > > catch up and reply as soon as I can.
> > 
> > Ack. Let me know if you need me to split it into different patches, or if
> > you want to take a look at its use on vmd (the actual patch for vmd
> > still needs polishing, but works).
> > 
> > Sergio (slp).
> > 
> 
> I'm not sure if splitting this into a separate process or using a taskq is
> the right way to go; dlg@ had done the latter before, but we dropped that
> for some reason.

Well, it's not a question of one thing or the other. It'd be possible
to have some devices serviced by code embedded in vmd, as they are today,
and others serviced by separate processes.

A nice use case for having a separate process servicing a device could
be implementing a virtio_blk disk with a complex backend, like ceph or
glusterfs. Not only would you avoid linking vmd against
librbd/libgfapi, you would also be able to attach gdb to the device
process and inspect it without halting the whole VM. Perhaps the
process could even be restarted (hot-plug/unplug?).
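
To make that concrete, here's a rough sketch (not actual vmd code) of
how the parent could fork one process per device and hand it the guest
memory fd and the proposed vq kick fd over a socketpair. memfd, kickfd
and run_device() are placeholders for the mechanisms discussed in this
thread, not existing interfaces:

/*
 * Sketch only: fork a per-device process and pass it the guest memory
 * fd and the vq kick fd with SCM_RIGHTS.
 */
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

#include <err.h>
#include <string.h>
#include <unistd.h>

static void
send_device_fds(int sock, int memfd, int kickfd)
{
	union {
		struct cmsghdr hdr;
		unsigned char buf[CMSG_SPACE(sizeof(int) * 2)];
	} cmsgbuf;
	int fds[2] = { memfd, kickfd };
	struct msghdr msg;
	struct cmsghdr *cmsg;
	struct iovec iov;
	char c = 0;

	memset(&msg, 0, sizeof(msg));
	memset(&cmsgbuf, 0, sizeof(cmsgbuf));
	iov.iov_base = &c;
	iov.iov_len = sizeof(c);
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = cmsgbuf.buf;
	msg.msg_controllen = sizeof(cmsgbuf.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_len = CMSG_LEN(sizeof(fds));
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	memcpy(CMSG_DATA(cmsg), fds, sizeof(fds));

	if (sendmsg(sock, &msg, 0) == -1)
		err(1, "sendmsg");
}

/* Placeholder device main loop; a real one would service the virtqueue. */
static int
run_device(int ctl_sock)
{
	/* recvmsg() the fds, map the guest memory, service the queue */
	return (0);
}

/* Fork a device process; returns its pid and the parent's control socket. */
static pid_t
spawn_device(int memfd, int kickfd, int *ctl_sock)
{
	int sv[2];
	pid_t pid;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		err(1, "socketpair");

	switch (pid = fork()) {
	case -1:
		err(1, "fork");
	case 0:
		/* child: drop privileges (pledge/unveil) and run the device */
		close(sv[0]);
		_exit(run_device(sv[1]));
	default:
		close(sv[1]);
		send_device_fds(sv[0], memfd, kickfd);
		*ctl_sock = sv[0];
		return (pid);
	}
}

The child would then pledge/unveil itself down to whatever the backend
actually needs, which is where the privsep improvement would come from.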

On the other hand, the underlying features (kickfd/in-kernel lapic) are
useful on their own, even with all devices being served by vmd,
providing lower latency, fewer vmexits, and parallel execution of I/O
requests. The only drawback is enlarging the kernel a bit, but I guess
this could be left out of the GENERIC* configs by default.
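
For illustration, a device's main loop could be as simple as blocking
on the kick fd with kevent(2). This assumes the proposed kick fd
becomes readable whenever the guest writes the queue notify register;
that interface and process_virtqueue() are assumptions for the sketch,
not something that exists today:

#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>

#include <err.h>
#include <unistd.h>

static void
process_virtqueue(void)
{
	/* pop available descriptors, do the I/O, push used entries */
}

static void
device_loop(int kickfd)
{
	struct kevent ev;
	char buf[8];
	int kq, n;

	if ((kq = kqueue()) == -1)
		err(1, "kqueue");
	EV_SET(&ev, kickfd, EVFILT_READ, EV_ADD, 0, 0, NULL);
	if (kevent(kq, &ev, 1, NULL, 0, NULL) == -1)
		err(1, "kevent register");

	for (;;) {
		if ((n = kevent(kq, NULL, 0, &ev, 1, NULL)) == -1)
			err(1, "kevent wait");
		if (n == 0)
			continue;
		(void)read(kickfd, buf, sizeof(buf));	/* drain the notification */
		process_virtqueue();
		/* completion would raise the IRQ via the in-kernel lapic */
	}
}

Each notify would then cost only the lightweight exit needed to signal
the kick fd, and several devices could run loops like this in parallel.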

IMHO this is a win-win scenario.

Sergio (slp).
