On Tue, Jan 27, 2026 at 04:06:03PM -0500, Benjamin Marzinski wrote:
> On Tue, Jan 27, 2026 at 01:47:43PM -0500, Stefan Hajnoczi wrote:
> > Hi Benjamin and Paolo,
> > I would like to discuss changes to DM-Multipath and qemu-pr-helper to
> > handle SCSI Persistent Reservations in QEMU without privileged code.
> >
> > SCSI Persistent Reservations support in QEMU is built on the
> > qemu-pr-helper daemon that performs PERSISTENT RESERVE IN and
> > PERSISTENT RESERVE OUT commands on behalf of the guest. The
> > qemu-pr-helper process provides privilege separation for ioctl(SG_IO)'s
> > CAP_SYS_RAWIO and libmpathpersist's root privileges, since the main QEMU
> > process should not have those privileges.
> >
> > There are issues with the current approach:
> > - Privileged code is a security attack surface.
> > - A bunch of code is required for privilege separation and for management
> > tools to set up qemu-pr-helper with access to multipathd.
> > - The interface is SCSI-specific and does not support NVMe.
> >
> > Several of us have pondered a different approach that I will summarize
> > here. The <linux/pr.h> ioctl interface provides an alternative to
> > ioctl(SG_IO) without the CAP_SYS_RAWIO requirement. It supports both
> > SCSI and NVMe. Since privileges are not required, there would be no need
> > for the qemu-pr-helper daemon anymore.
> >
> > The blocker is that <linux/pr.h> is not usable in multipath
> > environments. The Linux DM-Multipath driver has an incomplete ioctl
> > implementation that falls short of what libmpathpersist and multipathd
> > do in userspace. Kernel changes are necessary to fix this.
> >
> > My suggestion is to implement <linux/pr.h> via upcalls from DM-Multipath
> > to multipathd. That way applications like QEMU can consistently use
> > <linux/pr.h> across block device types and no longer have to go through
> > the privileged libmpathpersist interface.
>
> This would require intercepting the PR commands to multipath devices
> right at the start of dm_call_pr(). In order to make some persistent
> reservation commands seem atomic, libmpathpersist needs to suspend the
> multipath device in certain situations. So device-mapper cannot call
> dm_get_live_table(), since holding the live table would block those
> suspends. This should be o.k. Libmpathpersist is designed to handle the
> possibility that the multipath device gets reloaded with different paths
> while it is running. And since the multipath target is an immutable
> singleton target, there is no possibility of it turning into another
> target type because of a table reload during suspend.
>
> Also, just to clarify, the kernel code can't interface directly with
> multipathd. Most of the code for handling persistent reservations is in
> libmpathpersist, which just needs multipathd to do things like make sure
> that paths that are added in the future get registered properly. There
> would likely need to be some new program (that is just a thin wrapper
> around libmpathpersist) which can be called with call_usermodehelper().

Hi everyone,

I'm starting to work on the DM-Multipath changes. Some more details on
how I am approaching this:

- multipath-tools will create multipath device-mapper targets with a new
  ctr argument (pr_netlink) when this feature is enabled. When the
  feature is disabled, everything remains backwards compatible. With the
  pr_netlink ctr argument, the multipath target sends a netlink multicast
  group notification instead of handling PR operations (e.g. IOC_PR_*
  ioctls) in the kernel.

- There will be a new program in multipath-tools called mpathpersistd
  that listens on the netlink multicast group for notifications. The
  notification tells it which multipath device has a pending PR
  operation. It fetches the PR operation parameters by sending a netlink
  message, performs the persistent reservation operation via
  libmpathpersist, and then sends a response to the kernel via another
  netlink message.

- The multipath device-mapper target completes the PR operation upon
  receiving the netlink response.

I ended up choosing netlink because call_usermodehelper() seems less
appropriate for an operation triggered by untrusted userspace processes.

Your input is welcome. Let me know if a different approach would be
better.

Thanks,
Stefan
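The three-step netlink exchange described above (notification, parameter fetch, response) needs some agreed attribute layout. A minimal sketch of encoding and parsing such payloads follows, under loudly hypothetical assumptions: the MPATH_PR_A_* attribute IDs, the request ID used to match a response to its pending operation, and every helper name are invented for illustration and are not an existing kernel or multipath-tools ABI. Only struct nlattr, NLA_HDRLEN, NLA_ALIGN and NLA_TYPE_MASK come from <linux/netlink.h>. A real implementation would more likely use generic netlink (nlmsghdr plus genlmsghdr, e.g. via libnl) instead of hand-rolled encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/netlink.h>

/* HYPOTHETICAL attribute IDs; not an existing kernel or multipath-tools ABI. */
enum mpath_pr_attr {
    MPATH_PR_A_UNSPEC,
    MPATH_PR_A_REQ_ID,   /* u32: pending operation, echoed in the response */
    MPATH_PR_A_OP,       /* u32: which PR operation (register, reserve, ...) */
    MPATH_PR_A_KEY,      /* u64: reservation key */
    MPATH_PR_A_SA_KEY,   /* u64: new key or key to preempt */
    MPATH_PR_A_TYPE,     /* u32: reservation type */
    MPATH_PR_A_STATUS,   /* s32: 0 or -errno, sent back to the kernel */
    MPATH_PR_A_MAX = MPATH_PR_A_STATUS,
};

/* A parsed attribute: payload pointer and length (data is NULL if absent). */
struct attr_ref {
    const unsigned char *data;
    uint16_t len;
};

/* Append one attribute at *off; returns 0, or -1 if buf is too small. */
static int nla_append(unsigned char *buf, size_t cap, size_t *off,
                      uint16_t type, const void *data, uint16_t len)
{
    size_t total = NLA_ALIGN(NLA_HDRLEN + len);
    struct nlattr nla = { .nla_len = (uint16_t)(NLA_HDRLEN + len),
                          .nla_type = type };

    assert(buf != NULL && off != NULL);
    if (total > cap || *off > cap - total)
        return -1;
    memset(buf + *off, 0, total);              /* zero the alignment padding */
    memcpy(buf + *off, &nla, sizeof(nla));     /* buf may be unaligned */
    memcpy(buf + *off + NLA_HDRLEN, data, len);
    *off += total;
    return 0;
}

/* Index attributes by type into tb[]; returns 0, or -1 if malformed. */
static int nla_parse_buf(const unsigned char *buf, size_t len,
                         struct attr_ref tb[MPATH_PR_A_MAX + 1])
{
    size_t off = 0;

    memset(tb, 0, sizeof(*tb) * (MPATH_PR_A_MAX + 1));
    while (off < len) {
        struct nlattr nla;
        uint16_t type;

        if (len - off < NLA_HDRLEN)
            return -1;
        memcpy(&nla, buf + off, sizeof(nla));
        if (nla.nla_len < NLA_HDRLEN || nla.nla_len > len - off)
            return -1;
        type = nla.nla_type & NLA_TYPE_MASK;   /* strip nested/byte-order flags */
        if (type <= MPATH_PR_A_MAX) {          /* unknown types are ignored */
            tb[type].data = buf + off + NLA_HDRLEN;
            tb[type].len = nla.nla_len - NLA_HDRLEN;
        }
        off += NLA_ALIGN(nla.nla_len);
    }
    return 0;
}

/* Copy a fixed-size payload out; returns -1 if absent or the wrong size. */
static int attr_get(const struct attr_ref *a, void *out, size_t size)
{
    if (a->data == NULL || a->len != size)
        return -1;
    memcpy(out, a->data, size);
    return 0;
}

/* Step 3 (hypothetical layout): the daemon's reply to the kernel. */
static int build_pr_response(unsigned char *buf, size_t cap, size_t *off,
                             uint32_t req_id, int32_t status)
{
    if (nla_append(buf, cap, off, MPATH_PR_A_REQ_ID, &req_id, sizeof(req_id)) < 0)
        return -1;
    return nla_append(buf, cap, off, MPATH_PR_A_STATUS, &status, sizeof(status));
}
```

One plausible benefit of keeping the multicast notification down to the device identity, and fetching the parameters with a separate request, is that reservation keys are delivered only to the daemon that asks for them rather than to every listener on the group.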
