On Tue, Jan 27, 2026 at 04:06:03PM -0500, Benjamin Marzinski wrote:
> On Tue, Jan 27, 2026 at 01:47:43PM -0500, Stefan Hajnoczi wrote:
> > Hi Benjamin and Paolo,
> > I would like to discuss changes to DM-Multipath and qemu-pr-helper to
> > handle SCSI Persistent Reservations in QEMU without privileged code.
> > 
> > SCSI Persistent Reservations support in QEMU is built on the
> > qemu-pr-helper daemon that performs PERSISTENT RESERVATION IN and
> > PERSISTENT RESERVATION OUT commands on behalf of the guest. The
> > qemu-pr-helper process provides privilege separation for ioctl(SG_IO)'s
> > CAP_SYS_RAWIO and libmpathpersist's root privileges since the main QEMU
> > process should not have those privileges.
> > 
> > There are issues with the current approach:
> > - Privileged code is a security attack surface.
> > - A bunch of code is required for privilege separation and for management
> >   tools to set up qemu-pr-helper with access to multipathd.
> > - The interface is SCSI-specific and does not support NVMe.
> > 
> > Several of us have pondered a different approach that I will summarize
> > here. The <linux/pr.h> ioctl interface provides an alternative to
> > ioctl(SG_IO) without the CAP_SYS_RAWIO requirement. It supports both
> > SCSI and NVMe. Since privileges are not required, there would be no need
> > for the qemu-pr-helper daemon anymore.
> > 
> > The blocker is that <linux/pr.h> is not usable in multipath
> > environments. The Linux DM-Multipath driver has an incomplete ioctl
> > implementation that falls short of what libmpathpersist and multipathd
> > do in userspace. Kernel changes are necessary to fix this.
> > 
> > My suggestion is to implement <linux/pr.h> via upcalls from DM-Multipath
> > to multipathd. That way applications like QEMU can consistently use
> > <linux/pr.h> across block device types and no longer have to go through
> > the privileged libmpathpersist interface.
> 
> This would take intercepting the pr commands to multipath devices right
> at the start of dm_call_pr(). In order to make some persistent
> reservation commands seem atomic, libmpathpersist needs to suspend the
> multipath device in certain situations. So device-mapper cannot call
> dm_get_live_table(), since this will block suspends. This should be o.k.
> Libmpathpersist is designed to handle the possibility that the multipath
> device gets reloaded with different paths while it is running. And since
> the multipath target is an immutable singleton target, there is no
> possibility of it turning into another target type because of a table
> reload during suspend.
> 
> Also, just to clarify, the kernel code can't interface directly with
> multipathd. Most of the code for handling persistent reservations is in
> libmpathpersist, which just needs multipathd to do things like make sure
> that paths that are added in the future get registered properly. There
> would likely need to be some new program (that is just a thin wrapper
> around libmpathpersist) which can be called with call_usermodehelper().

Hi everyone,
I'm starting to work on the DM-Multipath changes. Some more details on
how I am approaching this:

- multipath-tools will create multipath device-mapper targets with a new
  ctr argument (pr_netlink) when this feature is enabled. When the
  feature is disabled, everything remains backwards compatible. With the
  pr_netlink ctr argument, the multipath target sends a netlink
  multicast group notification instead of handling PR operations (e.g.
  IOC_PR_* ioctls) in the kernel.

- There will be a new program in multipath-tools called mpathpersistd
  that listens on the netlink multicast group for notifications. The
  notification tells it which multipath device has a pending PR
  operation. It fetches the PR operation parameters by sending a netlink
  message, performs the persistent reservation operation via
  libmpathpersist, and then sends a response to the kernel via another
  netlink message.

- The multipath device-mapper target completes the PR operation upon
  receiving the netlink response.
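
To make the three steps above concrete, here is a rough Python model of
the notify / fetch / respond flow. It is only an in-memory simulation
under stated assumptions: no real netlink sockets are used, and the
names (PrRequest, FakeKernel, make_daemon, fake_libmpathpersist) as well
as the "one pending operation per device" rule are made up for
illustration. None of this is existing kernel or multipath-tools API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class PrRequest:
    """A pending PR operation (hypothetical layout)."""
    device: str                   # multipath device name, e.g. "mpatha"
    op: str                       # e.g. "register" or "reserve"
    key: int                      # reservation key
    result: Optional[int] = None  # filled in when the response arrives


class FakeKernel:
    """Stands in for a multipath target created with pr_netlink."""

    def __init__(self) -> None:
        # Assumption: at most one pending PR operation per device.
        self.pending: Dict[str, PrRequest] = {}
        self.listeners: List[Callable[[str], None]] = []

    def subscribe(self, callback: Callable[[str], None]) -> None:
        """Model joining the netlink multicast group."""
        self.listeners.append(callback)

    def submit(self, req: PrRequest) -> Optional[int]:
        """Model the ioctl path: queue the request and notify listeners.

        A real kernel would sleep here until the response arrives; in
        this model the listener runs synchronously.
        """
        self.pending[req.device] = req
        for callback in self.listeners:
            callback(req.device)  # notification carries only the device
        return req.result

    def fetch(self, device: str) -> PrRequest:
        """Model the message that fetches the operation parameters."""
        return self.pending[device]

    def respond(self, device: str, status: int) -> None:
        """Model the response message that completes the operation."""
        req = self.pending.pop(device)
        req.result = status


def make_daemon(kernel: FakeKernel,
                do_pr: Callable[[PrRequest], int]) -> Callable[[str], None]:
    """Build the notification handler of a mpathpersistd-like daemon."""
    def on_notification(device: str) -> None:
        req = kernel.fetch(device)      # step 1: fetch parameters
        status = do_pr(req)             # step 2: perform the PR operation
        kernel.respond(device, status)  # step 3: send the response
    return on_notification


log = []


def fake_libmpathpersist(req: PrRequest) -> int:
    """Placeholder for the real work done via libmpathpersist."""
    log.append((req.device, req.op, req.key))
    return 0  # 0 means success in this model


kernel = FakeKernel()
kernel.subscribe(make_daemon(kernel, fake_libmpathpersist))
status = kernel.submit(PrRequest("mpatha", "register", 0xABC))
```

The notification deliberately carries only the device name, so the
operation parameters are never broadcast to every listener in the
multicast group.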

I ended up choosing netlink because call_usermodehelper() seems less
appropriate for an operation triggered by untrusted userspace processes.

Your input is welcome. Let me know if a different approach would be
better.

Thanks,
Stefan
