On Tue, Feb 03, 2026 at 12:53:12PM -0500, Benjamin Marzinski wrote:
> On Tue, Feb 03, 2026 at 10:09:39AM -0500, Stefan Hajnoczi wrote:
> > On Tue, Jan 27, 2026 at 04:06:03PM -0500, Benjamin Marzinski wrote:
> > > On Tue, Jan 27, 2026 at 01:47:43PM -0500, Stefan Hajnoczi wrote:
> > > > Hi Benjamin and Paolo,
> > > > I would like to discuss changes to DM-Multipath and qemu-pr-helper to
> > > > handle SCSI Persistent Reservations in QEMU without privileged code.
> > > > 
> > > > SCSI Persistent Reservations support in QEMU is built on the
> > > > qemu-pr-helper daemon that performs PERSISTENT RESERVATION IN and
> > > > PERSISTENT RESERVATION OUT commands on behalf of the guest. The
> > > > qemu-pr-helper process provides privilege separation for ioctl(SG_IO)'s
> > > > CAP_SYS_RAWIO and libmpathpersist's root privileges since the main QEMU
> > > > process should not have those privileges.
> > > > 
> > > > There are issues with the current approach:
> > > > - Privileged code is a security attack surface.
> > > > - A bunch of code is required for privilege separation and for
> > > >   management tools to set up qemu-pr-helper with access to multipathd.
> > > > - The interface is SCSI-specific and does not support NVMe.
> > > > 
> > > > Several of us have pondered a different approach that I will summarize
> > > > here. The <linux/pr.h> ioctl interface provides an alternative to
> > > > ioctl(SG_IO) without the CAP_SYS_RAWIO requirement. It supports both
> > > > SCSI and NVMe. Since privileges are not required, there would be no need
> > > > for the qemu-pr-helper daemon anymore.
> > > > 
> > > > The blocker is that <linux/pr.h> is not usable in multipath
> > > > environments. The Linux DM-Multipath driver has an incomplete ioctl
> > > > implementation that falls short of what libmpathpersist and multipathd
> > > > do in userspace. Kernel changes are necessary to fix this.
> > > > 
> > > > My suggestion is to implement <linux/pr.h> via upcalls from DM-Multipath
> > > > to multipathd. That way applications like QEMU can consistently use
> > > > <linux/pr.h> across block device types and no longer have to go through
> > > > the privileged libmpathpersist interface.
> > > 
> > > This would take intercepting the pr commands to multipath devices right
> > > at the start of dm_call_pr(). In order to make some persistent
> > > reservation commands seem atomic, libmpathpersist needs to suspend the
> > > multipath device in certain situations. So device-mapper cannot call
> > > dm_get_live_table(), since this will block suspends. This should be o.k.
> > > > Libmpathpersist is designed to handle the possibility that the multipath
> > > device gets reloaded with different paths while it is running. And since
> > > the multipath target is an immutable singleton target, there is no
> > > possibility of it turning into another target type because of a table
> > > reload during suspend.
> > > 
> > > Also, just to clarify, the kernel code can't interface directly with
> > > multipathd. Most of the code for handling persistent reservations is in
> > > libmpathpersist, which just needs multipathd to do things like make sure
> > > > that paths that are added in the future get registered properly. There
> > > would likely need to be some new program (that is just a thin wrapper
> > > around libmpathpersist) which can be called with call_usermodehelper().
> 
> Adding Martin Wilck, since he will also be looking at these changes.
>  
> > Hi everyone,
> > I'm starting to work on the DM-Multipath changes. Some more details on
> > how I am approaching this:
> > 
> > - multipath-tools will create multipath device-mapper targets with a new
> >   ctr argument (pr_netlink) when this feature is enabled. When the
> >   feature is disabled, everything remains backwards compatible. With the
> >   pr_netlink ctr argument, the multipath target sends a netlink
> >   multicast group notification instead of handling PR operations (e.g.
> >   IOC_PR_* ioctls) in the kernel.
> > 
> > - There will be a new program in multipath-tools called mpathpersistd
> >   that listens on the netlink multicast group for notifications. The
> >   notification tells it which multipath device has a pending PR
> >   operation. It fetches the PR operation parameters by sending a netlink
> >   message, performs the persistent reservation operation via
> >   libmpathpersist, and then sends a response to the kernel via another
> >   netlink message.
> > 
> > - The multipath device-mapper target completes the PR operation upon
> >   receiving the netlink response.
> > 
> > I ended up choosing netlink because call_usermodehelper() seems less
> > appropriate for an operation triggered by untrusted userspace processes.
> > 
> > Your input is welcome. Let me know if a different approach would be
> > better.
> 
> Is the netlink interface going to be a generic persistent reservation
> upcall interface, or is this just for dm multipath? I'm not sure if
> there would ever be another user, and I don't have enough experience
> with the netlink code to know how ugly it might be to route
> communications from different kernel drivers to different userspace
> daemons through the same generic netlink family. But if there's not
> much extra complexity in building a generic interface, it seems like
> it would be preferable to a multipath specific one.

It can be generic. The messages will contain the block device
major:minor as well as information to describe <linux/pr.h> requests.
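
To make that concrete, here is a rough sketch of what a request payload
could carry. Nothing here is a settled interface: the struct name, the
operation codes, and the field layout are all hypothetical, chosen only to
mirror the fields found in the <linux/pr.h> ioctl argument structs (keys,
reservation type, flags). A real generic netlink family would more likely
use netlink attributes than a fixed struct.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical operation codes, one per <linux/pr.h> ioctl. */
enum pr_upcall_op {
    PR_UPCALL_REGISTER = 1,   /* IOC_PR_REGISTER */
    PR_UPCALL_RESERVE,        /* IOC_PR_RESERVE */
    PR_UPCALL_RELEASE,        /* IOC_PR_RELEASE */
    PR_UPCALL_PREEMPT,        /* IOC_PR_PREEMPT */
    PR_UPCALL_PREEMPT_ABORT,  /* IOC_PR_PREEMPT_ABORT */
    PR_UPCALL_CLEAR,          /* IOC_PR_CLEAR */
};

/*
 * Hypothetical request: identifies the block device by major:minor so
 * that any driver (not only dm-multipath) can use the same message, and
 * carries the union of the fields used by the <linux/pr.h> requests.
 */
struct pr_upcall_req {
    uint32_t major;
    uint32_t minor;
    uint32_t op;       /* enum pr_upcall_op */
    uint32_t type;     /* reservation type; unused by register/clear */
    uint64_t old_key;
    uint64_t new_key;
    uint32_t flags;
    uint32_t pad;      /* must be zero */
};

/* The layout is meant to have no implicit padding. */
static_assert(sizeof(struct pr_upcall_req) == 40, "unexpected padding");

/* Build a register request for the given device. */
static struct pr_upcall_req pr_upcall_register(uint32_t major, uint32_t minor,
                                               uint64_t old_key,
                                               uint64_t new_key,
                                               uint32_t flags)
{
    struct pr_upcall_req req;

    memset(&req, 0, sizeof(req));
    req.major = major;
    req.minor = minor;
    req.op = PR_UPCALL_REGISTER;
    req.old_key = old_key;
    req.new_key = new_key;
    req.flags = flags;
    return req;
}

/* Basic sanity check a receiver could apply; returns 1 if plausible. */
static int pr_upcall_req_valid(const struct pr_upcall_req *req)
{
    if (req->op < (uint32_t)PR_UPCALL_REGISTER ||
        req->op > (uint32_t)PR_UPCALL_CLEAR)
        return 0;
    return req->pad == 0;
}
```

Keying the message on major:minor rather than on a multipath map name is
what would let other drivers share the interface later.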

Stefan
