On Tue, Feb 03, 2026 at 12:53:12PM -0500, Benjamin Marzinski wrote:
> On Tue, Feb 03, 2026 at 10:09:39AM -0500, Stefan Hajnoczi wrote:
> > On Tue, Jan 27, 2026 at 04:06:03PM -0500, Benjamin Marzinski wrote:
> > > On Tue, Jan 27, 2026 at 01:47:43PM -0500, Stefan Hajnoczi wrote:
> > > > Hi Benjamin and Paolo,
> > > > I would like to discuss changes to DM-Multipath and qemu-pr-helper to
> > > > handle SCSI Persistent Reservations in QEMU without privileged code.
> > > >
> > > > SCSI Persistent Reservations support in QEMU is built on the
> > > > qemu-pr-helper daemon that performs PERSISTENT RESERVATION IN and
> > > > PERSISTENT RESERVATION OUT commands on behalf of the guest. The
> > > > qemu-pr-helper process provides privilege separation for ioctl(SG_IO)'s
> > > > CAP_SYS_RAWIO and libmpathpersist's root privileges since the main QEMU
> > > > process should not have those privileges.
> > > >
> > > > There are issues with the current approach:
> > > > - Privileged code is a security attack surface.
> > > > - A bunch of code is required for privilege separation and for
> > > >   management tools to set up qemu-pr-helper with access to multipathd.
> > > > - The interface is SCSI-specific and does not support NVMe.
> > > >
> > > > Several of us have pondered a different approach that I will summarize
> > > > here. The <linux/pr.h> ioctl interface provides an alternative to
> > > > ioctl(SG_IO) without the CAP_SYS_RAWIO requirement. It supports both
> > > > SCSI and NVMe. Since privileges are not required, there would be no need
> > > > for the qemu-pr-helper daemon anymore.
> > > >
> > > > The blocker is that <linux/pr.h> is not usable in multipath
> > > > environments. The Linux DM-Multipath driver has an incomplete ioctl
> > > > implementation that falls short of what libmpathpersist and multipathd
> > > > do in userspace. Kernel changes are necessary to fix this.
> > > >
> > > > My suggestion is to implement <linux/pr.h> via upcalls from DM-Multipath
> > > > to multipathd. That way applications like QEMU can consistently use
> > > > <linux/pr.h> across block device types and no longer have to go through
> > > > the privileged libmpathpersist interface.
> > >
> > > This would take intercepting the pr commands to multipath devices right
> > > at the start of dm_call_pr(). In order to make some persistent
> > > reservation commands seem atomic, libmpathpersist needs to suspend the
> > > multipath device in certain situations. So device-mapper cannot call
> > > dm_get_live_table(), since this will block suspends. This should be o.k.
> > > Libmpathpersist is designed to handle the possibility that the multipath
> > > device gets reloaded with different paths while it is running. And since
> > > the multipath target is an immutable singleton target, there is no
> > > possibility of it turning into another target type because of a table
> > > reload during suspend.
> > >
> > > Also, just to clarify, the kernel code can't interface directly with
> > > multipathd. Most of the code for handling persistent reservations is in
> > > libmpathpersist, which just needs multipathd to do things like make sure
> > > that paths that are added in the future get registered properly. There
> > > would likely need to be some new program (that is just a thin wrapper
> > > around libmpathpersist) which can be called with call_usermodehelper().
> > Adding Martin Wilck, since he will also be looking at these changes.
> >
> > Hi everyone,
> > I'm starting to work on the DM-Multipath changes. Some more details on
> > how I am approaching this:
> >
> > - multipath-tools will create multipath device-mapper targets with a new
> >   ctr argument (pr_netlink) when this feature is enabled. When the
> >   feature is disabled, everything remains backwards compatible.
> >   With the pr_netlink ctr argument, the multipath target sends a netlink
> >   multicast group notification instead of handling PR operations (e.g.
> >   IOC_PR_* ioctls) in the kernel.
> >
> > - There will be a new program in multipath-tools called mpathpersistd
> >   that listens on the netlink multicast group for notifications. The
> >   notification tells it which multipath device has a pending PR
> >   operation. It fetches the PR operation parameters by sending a netlink
> >   message, performs the persistent reservation operation via
> >   libmpathpersist, and then sends a response to the kernel via another
> >   netlink message.
> >
> > - The multipath device-mapper target completes the PR operation upon
> >   receiving the netlink response.
> >
> > I ended up choosing netlink because call_usermodehelper() seems less
> > appropriate for an operation triggered by untrusted userspace processes.
> >
> > Your input is welcome. Let me know if a different approach would be
> > better.
>
> Is the netlink interface going to be a generic persistent reservation
> upcall interface, or is this just for dm multipath? I'm not sure if
> there would ever be another user, and I don't have enough experience
> with the netlink code to know how ugly it might be to route
> communications from different kernel drivers to different userspace
> daemons through the same generic netlink family. But if there's not
> much extra complexity in building a generic interface, it seems like
> it would be preferable to a multipath-specific one.

It can be generic. The messages will contain the block device major:minor
as well as information to describe <linux/pr.h> requests.

Stefan
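
To make the message contents above concrete, here is a rough sketch of what
a device-independent PR upcall payload could look like. It is only an
illustration under assumptions: struct pr_upcall_msg, enum pr_upcall_op and
pr_upcall_fill_register() are hypothetical names that do not exist in the
kernel or in multipath-tools, and the real netlink attribute layout has not
been decided in this thread. Only the argument structs (struct
pr_registration, struct pr_reservation, struct pr_preempt, struct pr_clear)
come from <linux/pr.h>.

```c
#include <assert.h>     /* static_assert */
#include <stdint.h>
#include <string.h>
#include <linux/pr.h>   /* struct pr_registration, pr_reservation, ... */

/* Hypothetical operation codes; not part of any existing kernel ABI. */
enum pr_upcall_op {
    PR_UPCALL_REGISTER = 1,
    PR_UPCALL_RESERVE,
    PR_UPCALL_RELEASE,
    PR_UPCALL_PREEMPT,
    PR_UPCALL_CLEAR,
};

/*
 * Hypothetical payload: the block device is identified by major:minor, so
 * nothing in the message is specific to dm-multipath. The union reuses the
 * argument structs that the IOC_PR_* ioctls already take.
 */
struct pr_upcall_msg {
    uint32_t major;
    uint32_t minor;
    uint32_t op;        /* enum pr_upcall_op */
    uint32_t pad;       /* keeps the union 8-byte aligned */
    union {
        struct pr_registration reg;     /* register */
        struct pr_reservation rsv;      /* reserve and release */
        struct pr_preempt preempt;      /* preempt */
        struct pr_clear clear;          /* clear */
    } u;
};

static_assert(sizeof(struct pr_upcall_msg) % 8 == 0,
              "payload size should be a multiple of 8 bytes");

/* Fill in a register request for the device major:minor. */
static void pr_upcall_fill_register(struct pr_upcall_msg *msg,
                                    uint32_t major, uint32_t minor,
                                    uint64_t old_key, uint64_t new_key,
                                    uint32_t flags)
{
    memset(msg, 0, sizeof(*msg));
    msg->major = major;
    msg->minor = minor;
    msg->op = PR_UPCALL_REGISTER;
    msg->u.reg.old_key = old_key;
    msg->u.reg.new_key = new_key;
    msg->u.reg.flags = flags;
}
```

A userspace daemon receiving such a payload would look up the device from
major:minor and dispatch on op. A real generic netlink family would more
likely carry these fields as individual attributes than as one fixed struct.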
