On Tue, Feb 03, 2026 at 10:09:39AM -0500, Stefan Hajnoczi wrote:
> On Tue, Jan 27, 2026 at 04:06:03PM -0500, Benjamin Marzinski wrote:
> > On Tue, Jan 27, 2026 at 01:47:43PM -0500, Stefan Hajnoczi wrote:
> > > Hi Benjamin and Paolo,
> > > I would like to discuss changes to DM-Multipath and qemu-pr-helper to
> > > handle SCSI Persistent Reservations in QEMU without privileged code.
> > > 
> > > SCSI Persistent Reservations support in QEMU is built on the
> > > qemu-pr-helper daemon that performs PERSISTENT RESERVATION IN and
> > > PERSISTENT RESERVATION OUT commands on behalf of the guest. The
> > > qemu-pr-helper process provides privilege separation for ioctl(SG_IO)'s
> > > CAP_SYS_RAWIO and libmpathpersist's root privileges since the main QEMU
> > > process should not have those privileges.
> > > 
> > > There are issues with the current approach:
> > > - Privileged code is a security attack surface.
> > > - A bunch of code is required for privilege separation and for management
> > >   tools to set up qemu-pr-helper with access to multipathd.
> > > - The interface is SCSI-specific and does not support NVMe.
> > > 
> > > Several of us have pondered a different approach that I will summarize
> > > here. The <linux/pr.h> ioctl interface provides an alternative to
> > > ioctl(SG_IO) without the CAP_SYS_RAWIO requirement. It supports both
> > > SCSI and NVMe. Since privileges are not required, there would be no need
> > > for the qemu-pr-helper daemon anymore.
> > > 
> > > The blocker is that <linux/pr.h> is not usable in multipath
> > > environments. The Linux DM-Multipath driver has an incomplete ioctl
> > > implementation that falls short of what libmpathpersist and multipathd
> > > do in userspace. Kernel changes are necessary to fix this.
> > > 
> > > My suggestion is to implement <linux/pr.h> via upcalls from DM-Multipath
> > > to multipathd. That way applications like QEMU can consistently use
> > > <linux/pr.h> across block device types and no longer have to go through
> > > the privileged libmpathpersist interface.
> > 
> > This would take intercepting the pr commands to multipath devices right
> > at the start of dm_call_pr(). In order to make some persistent
> > reservation commands seem atomic, libmpathpersist needs to suspend the
> > multipath device in certain situations. So device-mapper cannot call
> > dm_get_live_table(), since this will block suspends. This should be o.k.
> > Libmpathpersist is designed to handle the possibility that the multipath
> > device gets reloaded with different paths while it is running. And since
> > the multipath target is an immutable singleton target, there is no
> > possibility of it turning into another target type because of a table
> > reload during suspend.
> > 
> > Also, just to clarify, the kernel code can't interface directly with
> > multipathd. Most of the code for handling persistent reservations is in
> > libmpathpersist, which just needs multipathd to do things like make sure
> > that paths that are added in the future get registered properly. There
> > would likely need to be some new program (that is just a thin wrapper
> > around libmpathpersist) which can be called with call_usermodehelper().
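
For context, here is a minimal sketch of the unprivileged <linux/pr.h> path
discussed in the quoted text, as it would look against a single-path block
device. IOC_PR_REGISTER, IOC_PR_RESERVE, struct pr_registration, and
struct pr_reservation come from <linux/pr.h>; the helper names pr_register()
and pr_reserve_write_exclusive() are made up for illustration. It assumes fd
refers to a whole block device opened for writing, and that the ioctl returns
0 on success, -1 with errno set on failure, or a positive SCSI/NVMe status.

```c
#include <errno.h>
#include <linux/pr.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Hypothetical helper: register new_key for this initiator on the device
 * behind fd. An old_key of 0 means there is no existing registration.
 * Returns 0 on success, -errno on failure, or a positive device status. */
int pr_register(int fd, uint64_t old_key, uint64_t new_key)
{
    struct pr_registration reg;
    int ret;

    memset(&reg, 0, sizeof(reg));
    reg.old_key = old_key;
    reg.new_key = new_key;

    ret = ioctl(fd, IOC_PR_REGISTER, &reg);
    return ret < 0 ? -errno : ret;
}

/* Hypothetical helper: take a Write Exclusive reservation using a key
 * that was registered beforehand. Same return convention as above. */
int pr_reserve_write_exclusive(int fd, uint64_t key)
{
    struct pr_reservation rsv;
    int ret;

    memset(&rsv, 0, sizeof(rsv));
    rsv.key = key;
    rsv.type = PR_WRITE_EXCLUSIVE;

    ret = ioctl(fd, IOC_PR_RESERVE, &rsv);
    return ret < 0 ? -errno : ret;
}
```

A caller would open the device, call pr_register(fd, 0, key), and then
pr_reserve_write_exclusive(fd, key). On a multipath device today, the same
calls hit the incomplete DM-Multipath ioctl implementation mentioned in the
quoted text.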

Adding Martin Wilck, since he will also be looking at these changes.
 
> Hi everyone,
> I'm starting to work on the DM-Multipath changes. Some more details on
> how I am approaching this:
> 
> - multipath-tools will create multipath device-mapper targets with a new
>   ctr argument (pr_netlink) when this feature is enabled. When the
>   feature is disabled, everything remains backwards compatible. With the
>   pr_netlink ctr argument, the multipath target sends a netlink
>   multicast group notification instead of handling PR operations (e.g.
>   IOC_PR_* ioctls) in the kernel.
> 
> - There will be a new program in multipath-tools called mpathpersistd
>   that listens on the netlink multicast group for notifications. The
>   notification tells it which multipath device has a pending PR
>   operation. It fetches the PR operation parameters by sending a netlink
>   message, performs the persistent reservation operation via
>   libmpathpersist, and then sends a response to the kernel via another
>   netlink message.
> 
> - The multipath device-mapper target completes the PR operation upon
>   receiving the netlink response.
> 
> I ended up choosing netlink because call_usermodehelper() seems less
> appropriate for an operation triggered by untrusted userspace processes.
> 
> Your input is welcome. Let me know if a different approach would be
> better.
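
To make sure the flow above is understood the same way by everyone, here is a
rough model of the notify / fetch / perform / respond sequence. It is only a
sketch: every name in it (pr_request, pr_transport, handle_pr_notification,
the fake_* stubs) is hypothetical, the transport is a set of plain callbacks
rather than real netlink sockets, and the "perform" step stands in for a call
into libmpathpersist. It also assumes the daemon responds even when fetching
or performing fails, so that the waiting ioctl caller can be completed.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical names throughout: nothing here is an existing kernel or
 * multipath-tools interface. */
enum pr_op { PR_OP_REGISTER, PR_OP_RESERVE, PR_OP_RELEASE };

struct pr_request {
    uint32_t dm_minor;   /* which multipath device the operation targets */
    enum pr_op op;
    uint64_t old_key;
    uint64_t new_key;
};

/* Transport hooks. A real daemon would back these with netlink messages
 * and libmpathpersist; here they are plain callbacks. */
struct pr_transport {
    int (*fetch)(void *ctx, uint32_t dm_minor, struct pr_request *out);
    int (*perform)(void *ctx, const struct pr_request *req);
    int (*respond)(void *ctx, uint32_t dm_minor, int status);
    void *ctx;
};

/* Handle one notification: fetch the parameters, perform, then respond. */
int handle_pr_notification(const struct pr_transport *t, uint32_t dm_minor)
{
    struct pr_request req;
    int status = t->fetch(t->ctx, dm_minor, &req);

    if (status == 0)
        status = t->perform(t->ctx, &req);
    /* Respond even on failure so the pending operation gets completed. */
    return t->respond(t->ctx, dm_minor, status);
}

/* In-memory stand-in for the kernel side, used only to exercise the flow. */
struct fake_kernel {
    struct pr_request pending;
    int completed;       /* set once a response has been received */
    int status;          /* status delivered with the response */
};

static int fake_fetch(void *ctx, uint32_t dm_minor, struct pr_request *out)
{
    struct fake_kernel *k = ctx;

    if (k->pending.dm_minor != dm_minor)
        return -ENOENT;  /* no pending operation for that device */
    *out = k->pending;
    return 0;
}

static int fake_perform(void *ctx, const struct pr_request *req)
{
    (void)ctx;
    (void)req;
    return 0;            /* pretend the reservation operation succeeded */
}

static int fake_respond(void *ctx, uint32_t dm_minor, int status)
{
    struct fake_kernel *k = ctx;

    (void)dm_minor;
    k->completed = 1;
    k->status = status;
    return 0;
}

/* Queue a REGISTER on pending_minor, deliver a notification for
 * notified_minor, and return the status the fake kernel received. */
int demo_status(uint32_t pending_minor, uint32_t notified_minor)
{
    struct fake_kernel k = {
        .pending = { pending_minor, PR_OP_REGISTER, 0, 0x1234 },
    };
    struct pr_transport t = { fake_fetch, fake_perform, fake_respond, &k };

    handle_pr_notification(&t, notified_minor);
    return k.completed ? k.status : 1;
}
```

In this model demo_status(3, 3) completes the operation with status 0, while
demo_status(3, 7) completes it with -ENOENT because nothing is pending on the
notified device.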

Is the netlink interface going to be a generic persistent reservation
upcall interface, or is this just for dm multipath? I'm not sure if
there would ever be another user, and I don't have enough experience
with the netlink code to know how ugly it might be to route
communications from different kernel drivers to different userspace
daemons through the same generic netlink family. But if there's not
much extra complexity in building a generic interface, it seems like
it would be preferable to a multipath-specific one.

-Ben

> Thanks,
> Stefan


