On Thu, Oct 11, 2018 at 11:55:51PM +0200, Sergio Lopez wrote:
> On Thu, 2018-10-11 at 09:02 -0700, Mike Larkin wrote:
> > On Wed, Oct 10, 2018 at 05:04:34PM +0200, Sergio Lopez wrote:
> > > On Mon, 2018-10-08 at 09:58 -0700, Mike Larkin wrote:
> > > > On Fri, Oct 05, 2018 at 01:50:10PM +0200, Sergio Lopez wrote:
> > > > > Hi,
> > > > > 
> > > > > I have an idea in mind that I'd like to share, to ask whether you think
> > > > > it's worth giving a try.
> > > > > 
> > > > > Right now, vmd already features an excellent privsep model to ensure
> > > > > the process servicing the VM requests to the outside world is running
> > > > > with the lowest possible privileges.
> > > > > 
> > > > > I was wondering if we could take a step further, servicing each virtio
> > > > > device from a different process. This design would simplify the
> > > > > implementation and maintenance of those devices, improve the privsep
> > > > > model and increase the resilience of VMs (the crash of a process
> > > > > servicing a device won't bring down the whole VM, and a mechanism to
> > > > > recover from this scenario could be explored).
> > > > > 
> > > > > Doing this in an efficient way requires:
> > > > > 
> > > > >  - The ability to receive virtqueue notifications directly on the
> > > > > process. I've already sent an RFC patch for this (see "Lightweight
> > > > > mechanism to kick virtio VQs"), as it'd be useful for the
> > > > > non-separated model too.
> > > > > 
> > > > >  - An in-kernel IRQ chip. This one is the most controversial, as it
> > > > > means emulating a device from a privileged domain, but I'm pretty sure
> > > > > a lapic implementation with enough functionality to serve *BSD/Linux
> > > > > guests can be small and simple enough to be easily auditable.
> > > > > 
> > > > >  - A way to map the VM memory into a third process context. Can
> > > > > uvm_share be used for this? This is also an opportunity to explore
> > > > > options that avoid mapping the whole VM memory, though that will
> > > > > possibly require functionality not covered by the virtio specs.
> > > > > 
> > > > > Do you think it's worth exploring this model? What are your feelings
> > > > > regarding the in-kernel IRQ chip?
> > > > > 
> > > > > Sergio (slp).
> > > > > 
> > > > 
> > > > Lots of things to read through in this and the attached diff. I'll try to
> > > > catch up and reply as soon as I can.
> > > 
> > > Ack. Let me know if you need me to split it into separate patches, or if
> > > you want to take a look at its use in vmd (the actual patch for vmd
> > > still needs polishing, but works).
> > > 
> > > Sergio (slp).
> > > 
> > 
> > I'm not sure if splitting this into a separate process or using a taskq is
> > the right way to go; dlg@ had done the latter before, but we dropped that
> > for some reason.
> 

Comments below.

> Well, it's not a question of one thing or the other. It'd be possible
> to have some devices serviced by code embedded in vmd, as it is today,
> and others serviced by separate processes.

This seems overly complicated for little benefit (as I pointed out in
the other email).

> 
> A nice use case for having a separate process servicing a device could
> be implementing a virtio_blk disk with a complex backend, like ceph or
> glusterfs. Not only would you avoid linking vmd against

I believe we have more urgent things to focus on WRT vmd than glusterfs
backends. But if you want to cook up a diff, I'll certainly take a look.

> librbd/libgfapi, but you would also be able to attach gdb to inspect it
> without halting the whole VM, and perhaps even restart the process
> (hot-plug/unplug?).
> 
> On the other hand, the underlying features (kickfd/in-kernel lapic) are
> useful on their own, even with all devices being served by vmd,
> providing lower latency, fewer vmexits, and parallel execution of I/O
> requests. The only drawback is enlarging the kernel a bit, but I guess
> this could be left out of the GENERIC* configs by default.
> 
> IMHO this is a win-win scenario.
> 
> Sergio (slp).

Like I said in the other email, a more pressing matter is the LAPIC/IOAPIC
work, so if you're interested in contributing, I'd say start there and we
can discuss the virtio stuff later.

-ml
