Hi Stefan,

On Tue, 2026-02-03 at 13:04 -0500, Stefan Hajnoczi wrote:
> On Tue, Feb 03, 2026 at 12:53:12PM -0500, Benjamin Marzinski wrote:
> > On Tue, Feb 03, 2026 at 10:09:39AM -0500, Stefan Hajnoczi wrote:
> > > On Tue, Jan 27, 2026 at 04:06:03PM -0500, Benjamin Marzinski
> > > wrote:
> > > > On Tue, Jan 27, 2026 at 01:47:43PM -0500, Stefan Hajnoczi
> > > > wrote:
> > > > > Hi Benjamin and Paolo,
> > > > > I would like to discuss changes to DM-Multipath and qemu-pr-
> > > > > helper to
> > > > > handle SCSI Persistent Reservations in QEMU without
> > > > > privileged code.
> > > > > 
> > > > > SCSI Persistent Reservations support in QEMU is built on the
> > > > > qemu-pr-helper daemon that performs PERSISTENT RESERVATION IN
> > > > > and
> > > > > PERSISTENT RESERVATION OUT commands on behalf of the guest.
> > > > > The
> > > > > qemu-pr-helper process provides privilege separation for
> > > > > ioctl(SG_IO)'s
> > > > > CAP_SYS_RAWIO and libmpathpersist's root privileges since the
> > > > > main QEMU
> > > > > process should not have those privileges.
> > > > > 
> > > > > There are issues with the current approach:
> > > > > - Privileged code is a security attack surface.
> > > > > - A bunch of code is required for privilege separation and
> > > > > for management
> > > > >   tools to set up qemu-pr-helper with access to multipathd.
> > > > > - The interface is SCSI-specific and does not support NVMe.
> > > > > 
> > > > > Several of us have pondered a different approach that I will
> > > > > summarize
> > > > > here. The <linux/pr.h> ioctl interface provides an
> > > > > alternative to
> > > > > ioctl(SG_IO) without the CAP_SYS_RAWIO requirement. It
> > > > > supports both
> > > > > SCSI and NVMe. Since privileges are not required, there would
> > > > > be no need
> > > > > for the qemu-pr-helper daemon anymore.
> > > > > 
> > > > > The blocker is that <linux/pr.h> is not usable in multipath
> > > > > environments. The Linux DM-Multipath driver has an incomplete
> > > > > ioctl
> > > > > implementation that falls short of what libmpathpersist and
> > > > > multipathd
> > > > > do in userspace. Kernel changes are necessary to fix this.
> > > > > 
> > > > > My suggestion is to implement <linux/pr.h> via upcalls from
> > > > > DM-Multipath
> > > > > to multipathd. That way applications like QEMU can
> > > > > consistently use
> > > > > <linux/pr.h> across block device types and no longer have to
> > > > > go through
> > > > > the privileged libmpathpersist interface.
> > > > 
> > > > This would take intercepting the pr commands to multipath
> > > > devices right
> > > > at the start of dm_call_pr(). In order to make some persistent
> > > > reservation commands seem atomic, libmpathpersist needs to
> > > > suspend the
> > > > multipath device in certain situations. So device-mapper cannot
> > > > call
> > > > dm_get_live_table(), since this will block suspends. This
> > > > should be o.k.
> > > > Libmpathpersist is designed to handle the possibility that the
> > > > multipath
> > > > device gets reloaded with different paths while it is running.
> > > > And since
> > > > the multipath target is an immutable singleton target, there is
> > > > no
> > > > possibility of it turning into another target type because of a
> > > > table
> > > > reload during suspend.
> > > > 
> > > > Also, just to clarify, the kernel code can't interface directly
> > > > with
> > > > multipathd. Most of the code for handling persistent
> > > > reservations is in
> > > > libmpathpersist, which just needs multipathd to do things like
> > > > make sure
> > > > that paths that are added in the future get registered
> > > > properly. There
> > > > would likely need to be some new program (that is just a thin
> > > > wrapper
> > > > around libmpathpersist) which can be called with
> > > > call_usermodehelper().
> > 
> > Adding Martin Wilck, since he will also be looking at these
> > changes.
> >  
> > > Hi everyone,
> > > I'm starting to work on the DM-Multipath changes. Some more
> > > details on
> > > how I am approaching this:
> > > 
> > > - multipath-tools will create multipath device-mapper targets
> > > with a new
> > >   ctr argument (pr_netlink) when this feature is enabled. When
> > > the
> > >   feature is disabled, everything remains backwards compatible.
> > > With the
> > >   pr_netlink ctr argument, the multipath target sends a netlink
> > >   multicast group notification instead of handling PR operations
> > > (e.g.
> > >   IOC_PR_* ioctls) in the kernel.
> > > 
> > > - There will be a new program in multipath-tools called
> > > mpathpersistd
> > >   that listens on the netlink multicast group for notifications.
> > > The
> > >   notification tells it which multipath device has a pending PR
> > >   operation. It fetches the PR operation parameters by sending a
> > > netlink
> > >   message, performs the persistent reservation operation via
> > >   libmpathpersist, and then sends a response to the kernel via
> > > another
> > >   netlink message.
> > > 
> > > - The multipath device-mapper target completes the PR operation
> > > upon
> > >   receiving the netlink response.
> > > 
> > > I ended up choosing netlink because call_usermodehelper() seems
> > > less
> > > appropriate for an operation triggered by untrusted userspace
> > > processes.
> > > 
> > > Your input is welcome. Let me know if a different approach would
> > > be
> > > better.
> > 
> > Is the netlink interface going to be a generic persistent
> > reservation
> > upcall interface, or is this just for dm multipath? I'm not sure if
> > there would ever be another user, and I don't have enough
> > experience
> > with the netlink code to know how ugly it might be to route
> > communications from different kernel drivers to different userspace
> > daemons through the same generic netlink family. But if there's not
> > much extra complexity in building a generic interface, it seems
> > like
> > it would be preferable to a multipath specific one.
> 
> It can be generic. The messages will contain the block device
> major:minor as well as information to describe <linux/pr.h> requests.

So the ioctls will pass through qemu into the kernel, to be intercepted
by the dm-mpath driver, which will use an upcall to have them handled
by mpathpersistd (for the actual command) and multipathd (for the path
registrations).
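
For reference, here is a minimal sketch of the kind of <linux/pr.h>
call that would travel this path. It assumes the caller has already
opened the block device (a dm-mpath device in the proposed design). The
helper names are mine, for illustration only; they are not taken from
qemu or multipath-tools.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/pr.h>

/* Hypothetical helper: build a zeroed registration request. */
static struct pr_registration make_registration(uint64_t old_key,
						uint64_t new_key)
{
	struct pr_registration reg;

	memset(&reg, 0, sizeof(reg));
	reg.old_key = old_key;	/* 0: no key registered before */
	reg.new_key = new_key;
	return reg;
}

/*
 * Hypothetical helper: register new_key on the block device behind fd.
 * Returns 0 on success and -errno if the ioctl fails; any other value
 * is passed through unchanged.
 */
static int pr_register(int fd, uint64_t new_key)
{
	struct pr_registration reg = make_registration(0, new_key);
	int ret = ioctl(fd, IOC_PR_REGISTER, &reg);

	return ret < 0 ? -errno : ret;
}

/* Hypothetical helper: take a Write Exclusive reservation with key. */
static int pr_reserve_write_exclusive(int fd, uint64_t key)
{
	struct pr_reservation rsv;
	int ret;

	memset(&rsv, 0, sizeof(rsv));
	rsv.key = key;
	rsv.type = PR_WRITE_EXCLUSIVE;
	ret = ioctl(fd, IOC_PR_RESERVE, &rsv);

	return ret < 0 ? -errno : ret;
}
```

With the proposed pr_netlink argument, such a call would block in the
dm-mpath target until the userspace daemon sends its response.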

I don't fully understand the advantage of this concept, in terms of
security and complexity, compared to intercepting them in qemu and using
a socket to talk to mpathpersistd directly. If we did this, we could
even support both generic and SCSI PR commands.

Regards
Martin
