On Wed, Feb 04, 2026 at 02:19:48PM +0100, Martin Wilck wrote:
> Hi Stefan,
> 
> On Tue, 2026-02-03 at 13:04 -0500, Stefan Hajnoczi wrote:
> > On Tue, Feb 03, 2026 at 12:53:12PM -0500, Benjamin Marzinski wrote:
> > > On Tue, Feb 03, 2026 at 10:09:39AM -0500, Stefan Hajnoczi wrote:
> > > > On Tue, Jan 27, 2026 at 04:06:03PM -0500, Benjamin Marzinski wrote:
> > > > > On Tue, Jan 27, 2026 at 01:47:43PM -0500, Stefan Hajnoczi wrote:
> > > > > > Hi Benjamin and Paolo,
> > > > > > I would like to discuss changes to DM-Multipath and
> > > > > > qemu-pr-helper to handle SCSI Persistent Reservations in QEMU
> > > > > > without privileged code.
> > > > > >
> > > > > > SCSI Persistent Reservation support in QEMU is built on the
> > > > > > qemu-pr-helper daemon, which performs PERSISTENT RESERVATION
> > > > > > IN and PERSISTENT RESERVATION OUT commands on behalf of the
> > > > > > guest. The qemu-pr-helper process provides privilege
> > > > > > separation for ioctl(SG_IO)'s CAP_SYS_RAWIO requirement and
> > > > > > libmpathpersist's root privileges, since the main QEMU process
> > > > > > should not have those privileges.
> > > > > >
> > > > > > There are issues with the current approach:
> > > > > > - Privileged code is a security attack surface.
> > > > > > - A significant amount of code is required for privilege
> > > > > >   separation and for management tools to set up
> > > > > >   qemu-pr-helper with access to multipathd.
> > > > > > - The interface is SCSI-specific and does not support NVMe.
> > > > > >
> > > > > > Several of us have pondered a different approach, which I will
> > > > > > summarize here. The <linux/pr.h> ioctl interface provides an
> > > > > > alternative to ioctl(SG_IO) without the CAP_SYS_RAWIO
> > > > > > requirement. It supports both SCSI and NVMe. Since privileges
> > > > > > are not required, there would be no need for the
> > > > > > qemu-pr-helper daemon anymore.
> > > > > >
> > > > > > The blocker is that <linux/pr.h> is not usable in multipath
> > > > > > environments. The Linux DM-Multipath driver has an incomplete
> > > > > > ioctl implementation that falls short of what libmpathpersist
> > > > > > and multipathd do in userspace. Kernel changes are necessary
> > > > > > to fix this.
> > > > > >
> > > > > > My suggestion is to implement <linux/pr.h> via upcalls from
> > > > > > DM-Multipath to multipathd. That way applications like QEMU
> > > > > > can consistently use <linux/pr.h> across block device types
> > > > > > and no longer have to go through the privileged
> > > > > > libmpathpersist interface.
> > > > > 
> > > > > This would require intercepting PR commands to multipath devices
> > > > > right at the start of dm_call_pr(). In order to make some
> > > > > persistent reservation commands seem atomic, libmpathpersist
> > > > > needs to suspend the multipath device in certain situations. So
> > > > > device-mapper cannot call dm_get_live_table(), since this would
> > > > > block suspends. This should be OK: libmpathpersist is designed to
> > > > > handle the possibility that the multipath device gets reloaded
> > > > > with different paths while it is running. And since the
> > > > > multipath target is an immutable singleton target, there is no
> > > > > possibility of it turning into another target type because of a
> > > > > table reload during suspend.
> > > > >
> > > > > Also, just to clarify, the kernel code can't interface directly
> > > > > with multipathd. Most of the code for handling persistent
> > > > > reservations is in libmpathpersist, which just needs multipathd
> > > > > to do things like make sure that paths added in the future get
> > > > > registered properly. There would likely need to be some new
> > > > > program (just a thin wrapper around libmpathpersist) that can be
> > > > > called with call_usermodehelper().
> > > 
> > > Adding Martin Wilck, since he will also be looking at these
> > > changes.
> > >  
> > > > Hi everyone,
> > > > I'm starting to work on the DM-Multipath changes. Some more
> > > > details on how I am approaching this:
> > > >
> > > > - multipath-tools will create multipath device-mapper targets with
> > > >   a new ctr argument (pr_netlink) when this feature is enabled.
> > > >   When the feature is disabled, everything remains backwards
> > > >   compatible. With the pr_netlink ctr argument, the multipath
> > > >   target sends a netlink multicast group notification instead of
> > > >   handling PR operations (e.g. IOC_PR_* ioctls) in the kernel.
> > > >
> > > > - There will be a new program in multipath-tools called
> > > >   mpathpersistd that listens on the netlink multicast group for
> > > >   notifications. The notification tells it which multipath device
> > > >   has a pending PR operation. It fetches the PR operation
> > > >   parameters by sending a netlink message, performs the persistent
> > > >   reservation operation via libmpathpersist, and then sends a
> > > >   response to the kernel via another netlink message.
> > > >
> > > > - The multipath device-mapper target completes the PR operation
> > > >   upon receiving the netlink response.
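The notify / fetch / respond / complete sequence in the three steps above can be modeled as a small, single-threaded simulation. Everything in it is hypothetical: the message layouts, the op codes, and the function names are illustrations, not an existing kernel ABI or multipath-tools code. stub_handler() stands in for the libmpathpersist call that mpathpersistd would make, and in the proposed design the ioctl caller would sleep in the kernel until the netlink response arrives, rather than being polled as here.

```c
#include <errno.h>
#include <stdint.h>

/* HYPOTHETICAL message layouts, not an existing kernel ABI. They carry the
 * target device (major:minor) plus the fields of a <linux/pr.h> request. */
enum pr_upcall_op { PR_UPCALL_REGISTER = 1, PR_UPCALL_RESERVE, PR_UPCALL_RELEASE };

struct pr_upcall_req {
    uint32_t id;            /* assigned by the kernel, echoed in the reply */
    uint32_t major, minor;  /* block device the ioctl was issued on */
    uint32_t op;            /* enum pr_upcall_op */
    uint64_t key, new_key;
    uint32_t type, flags;
};

struct pr_upcall_resp {
    uint32_t id;
    int32_t status;         /* 0 or -errno, handed back to the ioctl caller */
};

#define MAX_PENDING 8

/* Kernel-side stand-in: PR requests parked while userspace works. */
static struct {
    struct pr_upcall_req req;
    int in_use;
} pending[MAX_PENDING];

/* 1. The ioctl parks its request; the returned id is what the multicast
 *    notification would carry. Returns -EBUSY if every slot is taken. */
static int kernel_submit(const struct pr_upcall_req *req)
{
    for (int i = 0; i < MAX_PENDING; i++) {
        if (!pending[i].in_use) {
            pending[i].req = *req;
            pending[i].req.id = (uint32_t)i;
            pending[i].in_use = 1;
            return i;
        }
    }
    return -EBUSY;
}

/* 2. The daemon fetches the parameters for a notified id. */
static int kernel_fetch(uint32_t id, struct pr_upcall_req *out)
{
    if (id >= MAX_PENDING || !pending[id].in_use)
        return -ENOENT;
    *out = pending[id].req;
    return 0;
}

/* 3. Daemon side: `handler` stands in for the libmpathpersist call. */
static struct pr_upcall_resp daemon_handle(uint32_t id,
        int32_t (*handler)(const struct pr_upcall_req *))
{
    struct pr_upcall_req req;
    struct pr_upcall_resp resp = { .id = id, .status = -ENOENT };

    if (kernel_fetch(id, &req) == 0)
        resp.status = handler(&req);
    return resp;
}

/* 4. The reply completes the parked ioctl with the daemon's status. */
static int kernel_complete(const struct pr_upcall_resp *resp, int32_t *status)
{
    if (resp->id >= MAX_PENDING || !pending[resp->id].in_use)
        return -ENOENT;
    *status = resp->status;
    pending[resp->id].in_use = 0;   /* slot is free for the next request */
    return 0;
}

/* Stub handler: accepts the ops it knows, rejects everything else. */
static int32_t stub_handler(const struct pr_upcall_req *req)
{
    switch (req->op) {
    case PR_UPCALL_REGISTER:
    case PR_UPCALL_RESERVE:
    case PR_UPCALL_RELEASE:
        return 0;
    default:
        return -EOPNOTSUPP;
    }
}
```

The request id is the design choice worth noting: an id like this would let a daemon handle PR operations on several devices at once and still match each response to the ioctl that is waiting for it.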
> > > > 
> > > > I ended up choosing netlink because call_usermodehelper() seems
> > > > less appropriate for an operation triggered by untrusted userspace
> > > > processes.
> > > >
> > > > Your input is welcome. Let me know if a different approach would
> > > > be better.
> > > 
> > > Is the netlink interface going to be a generic persistent
> > > reservation upcall interface, or is this just for dm multipath? I'm
> > > not sure if there would ever be another user, and I don't have
> > > enough experience with the netlink code to know how ugly it might be
> > > to route communications from different kernel drivers to different
> > > userspace daemons through the same generic netlink family. But if
> > > there's not much extra complexity in building a generic interface,
> > > it seems like it would be preferable to a multipath-specific one.
> > 
> > It can be generic. The messages will contain the block device
> > major:minor as well as information to describe <linux/pr.h> requests.
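Because each message would carry major:minor, a generic interface could let each daemon discard requests for devices it does not manage. A minimal sketch, assuming a daemon decides by reading the device-mapper UUID from sysfs (multipath-tools names its maps with an "mpath-" UUID prefix); split_dev(), dm_uuid_path(), is_mpath_uuid() and upcall_is_for_mpath() are hypothetical helper names.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/sysmacros.h>

/* Split a dev_t (e.g. stat.st_rdev of /dev/dm-0) into the pair a
 * message would carry. */
static void split_dev(dev_t dev, unsigned int *maj, unsigned int *min)
{
    *maj = major(dev);
    *min = minor(dev);
}

/* Build the sysfs path holding the device-mapper UUID for major:minor.
 * Returns 0 on success, -1 if the buffer is too small. */
static int dm_uuid_path(unsigned int maj, unsigned int min,
                        char *buf, size_t len)
{
    int n = snprintf(buf, len, "/sys/dev/block/%u:%u/dm/uuid", maj, min);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* multipath-tools gives its maps UUIDs of the form "mpath-<wwid>". */
static int is_mpath_uuid(const char *uuid)
{
    return strncmp(uuid, "mpath-", 6) == 0;
}

/* Decide whether a PR upcall for major:minor belongs to multipath.
 * Returns 1 if so, 0 if not (or not a dm device), -1 on path overflow. */
static int upcall_is_for_mpath(unsigned int maj, unsigned int min)
{
    char path[64], uuid[160];
    FILE *f;

    if (dm_uuid_path(maj, min, path, sizeof(path)) < 0)
        return -1;
    f = fopen(path, "r");
    if (!f)
        return 0;                   /* no dm/uuid: not a dm device */
    if (!fgets(uuid, sizeof(uuid), f))
        uuid[0] = '\0';
    fclose(f);
    return is_mpath_uuid(uuid);
}
```

With this kind of filtering, several daemons could subscribe to the same multicast group, and each would act only on the devices it owns.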
> 
> So the ioctls will pass through qemu into the kernel, to be intercepted
> by the dm-mpath driver, which will use an upcall to have them handled
> by mpathpersistd (for the actual command) and multipathd (for the path
> registrations).
> 
> I don't fully understand the advantage, security- and complexity-wise,
> of this concept, compared to intercepting them in qemu and using a
> socket to talk to mpathpersistd directly. If we did this, we could even
> support both generic and SCSI PR commands.

Hi Martin,
The simplification and security benefits are on the application side,
not on the DM-Multipath side, so I can see what you're getting at. From
the DM-Multipath perspective, things get a little more complex.

From an application perspective, a single API that works across block
device types (SCSI, NVMe, DM-Multipath) and requires no privileges or
sockets (they are a pain in container environments) is the most
convenient. The <linux/pr.h> ioctl API offers exactly this.

Unfortunately, DM-Multipath currently does not fully support
<linux/pr.h>. It sends PR operations down each path, but that is only a
subset of libmpathpersist's logic and multipathd is not kept in sync.

My impression is that the libmpathpersist and multipathd logic cannot
easily be moved into the kernel. This is where the upcall idea comes from.
Let's notify multipath-tools from DM-Multipath so it can do its work in
userspace.

Getting back to the application vs DM-Multipath advantages: I think it's
worth simplifying things for applications because there are many
applications and only one DM-Multipath.

Thanks,
Stefan
