Hi Stefan,

On Tue, 2026-02-03 at 13:04 -0500, Stefan Hajnoczi wrote:
> On Tue, Feb 03, 2026 at 12:53:12PM -0500, Benjamin Marzinski wrote:
> > On Tue, Feb 03, 2026 at 10:09:39AM -0500, Stefan Hajnoczi wrote:
> > > On Tue, Jan 27, 2026 at 04:06:03PM -0500, Benjamin Marzinski wrote:
> > > > On Tue, Jan 27, 2026 at 01:47:43PM -0500, Stefan Hajnoczi wrote:
> > > > > Hi Benjamin and Paolo,
> > > > > I would like to discuss changes to DM-Multipath and
> > > > > qemu-pr-helper to handle SCSI Persistent Reservations in QEMU
> > > > > without privileged code.
> > > > >
> > > > > SCSI Persistent Reservations support in QEMU is built on the
> > > > > qemu-pr-helper daemon that performs PERSISTENT RESERVATION IN
> > > > > and PERSISTENT RESERVATION OUT commands on behalf of the
> > > > > guest. The qemu-pr-helper process provides privilege
> > > > > separation for ioctl(SG_IO)'s CAP_SYS_RAWIO and
> > > > > libmpathpersist's root privileges since the main QEMU process
> > > > > should not have those privileges.
> > > > >
> > > > > There are issues with the current approach:
> > > > > - Privileged code is a security attack surface.
> > > > > - A bunch of code is required for privilege separation and for
> > > > >   management tools to set up qemu-pr-helper with access to
> > > > >   multipathd.
> > > > > - The interface is SCSI-specific and does not support NVMe.
> > > > >
> > > > > Several of us have pondered a different approach that I will
> > > > > summarize here. The <linux/pr.h> ioctl interface provides an
> > > > > alternative to ioctl(SG_IO) without the CAP_SYS_RAWIO
> > > > > requirement. It supports both SCSI and NVMe. Since privileges
> > > > > are not required, there would be no need for the
> > > > > qemu-pr-helper daemon anymore.
> > > > >
> > > > > The blocker is that <linux/pr.h> is not usable in multipath
> > > > > environments. The Linux DM-Multipath driver has an incomplete
> > > > > ioctl implementation that falls short of what libmpathpersist
> > > > > and multipathd do in userspace. Kernel changes are necessary
> > > > > to fix this.
> > > > >
> > > > > My suggestion is to implement <linux/pr.h> via upcalls from
> > > > > DM-Multipath to multipathd. That way applications like QEMU
> > > > > can consistently use <linux/pr.h> across block device types
> > > > > and no longer have to go through the privileged
> > > > > libmpathpersist interface.
> > > >
> > > > This would take intercepting the pr commands to multipath
> > > > devices right at the start of dm_call_pr(). In order to make
> > > > some persistent reservation commands seem atomic,
> > > > libmpathpersist needs to suspend the multipath device in
> > > > certain situations. So device-mapper cannot call
> > > > dm_get_live_table(), since this will block suspends. This
> > > > should be o.k. Libmpathpersist is designed to handle the
> > > > possibility that the multipath device gets reloaded with
> > > > different paths while it is running. And since the multipath
> > > > target is an immutable singleton target, there is no
> > > > possibility of it turning into another target type because of
> > > > a table reload during suspend.
> > > >
> > > > Also, just to clarify, the kernel code can't interface directly
> > > > with multipathd. Most of the code for handling persistent
> > > > reservations is in libmpathpersist, which just needs multipathd
> > > > to do things like make sure that paths that are added in the
> > > > future get registered properly. There would likely need to be
> > > > some new program (that is just a thin wrapper around
> > > > libmpathpersist) which can be called with
> > > > call_usermodehelper().
> >
> > Adding Martin Wilck, since he will also be looking at these
> > changes.
> >
> > > Hi everyone,
> > > I'm starting to work on the DM-Multipath changes. Some more
> > > details on how I am approaching this:
> > >
> > > - multipath-tools will create multipath device-mapper targets
> > >   with a new ctr argument (pr_netlink) when this feature is
> > >   enabled. When the feature is disabled, everything remains
> > >   backwards compatible. With the pr_netlink ctr argument, the
> > >   multipath target sends a netlink multicast group notification
> > >   instead of handling PR operations (e.g. IOC_PR_* ioctls) in
> > >   the kernel.
> > >
> > > - There will be a new program in multipath-tools called
> > >   mpathpersistd that listens on the netlink multicast group for
> > >   notifications. The notification tells it which multipath
> > >   device has a pending PR operation. It fetches the PR operation
> > >   parameters by sending a netlink message, performs the
> > >   persistent reservation operation via libmpathpersist, and then
> > >   sends a response to the kernel via another netlink message.
> > >
> > > - The multipath device-mapper target completes the PR operation
> > >   upon receiving the netlink response.
> > >
> > > I ended up choosing netlink because call_usermodehelper() seems
> > > less appropriate for an operation triggered by untrusted
> > > userspace processes.
> > >
> > > Your input is welcome. Let me know if a different approach would
> > > be better.
> >
> > Is the netlink interface going to be a generic persistent
> > reservation upcall interface, or is this just for dm multipath?
> > I'm not sure if there would ever be another user, and I don't have
> > enough experience with the netlink code to know how ugly it might
> > be to route communications from different kernel drivers to
> > different userspace daemons through the same generic netlink
> > family. But if there's not much extra complexity in building a
> > generic interface, it seems like it would be preferable to a
> > multipath specific one.
>
> It can be generic. The messages will contain the block device
> major:minor as well as information to describe <linux/pr.h> requests.
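
For illustration only, here is a rough sketch of what a generic request
carrying the device major:minor plus a <linux/pr.h>-style operation
could look like. Everything in it (struct pr_upcall_req, the
PR_UPCALL_* codes, the helper functions) is hypothetical and made up
for this mail; it is not an existing kernel or multipath-tools
interface. Only the IOC_PR_* names in the comments come from
<linux/pr.h>.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical operation codes, one per <linux/pr.h> ioctl. */
enum pr_upcall_op {
    PR_UPCALL_REGISTER = 1,   /* would mirror IOC_PR_REGISTER */
    PR_UPCALL_RESERVE,        /* would mirror IOC_PR_RESERVE */
    PR_UPCALL_RELEASE,        /* would mirror IOC_PR_RELEASE */
    PR_UPCALL_PREEMPT,        /* would mirror IOC_PR_PREEMPT */
    PR_UPCALL_PREEMPT_ABORT,  /* would mirror IOC_PR_PREEMPT_ABORT */
    PR_UPCALL_CLEAR           /* would mirror IOC_PR_CLEAR */
};

/*
 * Hypothetical request payload (assumption, not a real ABI).
 * Fixed-width fields with explicit padding, so the layout does not
 * depend on the compiler: 4 * 4 + 2 * 8 + 2 * 4 = 40 bytes.
 */
struct pr_upcall_req {
    uint32_t dev_major;  /* block device major number */
    uint32_t dev_minor;  /* block device minor number */
    uint32_t op;         /* enum pr_upcall_op */
    uint32_t type;       /* reservation type, unused for register/clear */
    uint64_t old_key;    /* current reservation key */
    uint64_t new_key;    /* key to register, if any */
    uint32_t flags;      /* flags as passed to the ioctl */
    uint32_t pad;        /* always zero */
};

/* Build a register request for the device identified by major:minor. */
static struct pr_upcall_req pr_upcall_register(uint32_t major, uint32_t minor,
                                               uint64_t old_key,
                                               uint64_t new_key,
                                               uint32_t flags)
{
    struct pr_upcall_req req;

    memset(&req, 0, sizeof(req));  /* zero padding and unused fields */
    req.dev_major = major;
    req.dev_minor = minor;
    req.op = PR_UPCALL_REGISTER;
    req.old_key = old_key;
    req.new_key = new_key;
    req.flags = flags;
    return req;
}

/* A daemon could use major:minor to decide whether a request is its own. */
static int pr_upcall_is_for(const struct pr_upcall_req *req,
                            uint32_t major, uint32_t minor)
{
    assert(req != NULL);
    return req->dev_major == major && req->dev_minor == minor;
}
```

With something along these lines, routing between drivers and daemons
would come down to looking at the major:minor pair, which is the part
Ben was asking about.
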

So the ioctls will pass through qemu into the kernel, where the
dm-mpath driver intercepts them and uses an upcall to have them handled
by mpathpersistd (for the actual command) and multipathd (for the path
registrations).

I don't fully understand the advantage of this concept, in terms of
security and complexity, compared to intercepting them in qemu and
using a socket to talk to mpathpersistd directly. If we did this, we
could even support both generic and SCSI PR commands.

Regards
Martin
