On 1/28/26 15:18, Stefan Hajnoczi wrote:
On Tue, Jan 27, 2026 at 08:45:39PM +0100, Paolo Bonzini wrote:
On Tue, Jan 27, 2026, 19:47 Stefan Hajnoczi <[email protected]> wrote:
Several of us have pondered a different approach that I will summarize
here. The <linux/pr.h> ioctl interface provides an alternative to
ioctl(SG_IO) without the CAP_SYS_RAWIO requirement. It supports both
SCSI and NVMe. Since privileges are not required, there would be no need
for the qemu-pr-helper daemon anymore.
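
For readers unfamiliar with the interface, here is a minimal sketch of what calling the <linux/pr.h> ioctls directly might look like. The helper names pr_register() and pr_reserve() are made up for illustration and are not QEMU or kernel functions; only the structs, constants, and ioctl numbers come from <linux/pr.h>. It assumes fd is an open block device that supports persistent reservations.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/pr.h>

/*
 * Hypothetical helper: register new_key on the device, replacing old_key
 * (use old_key == 0 for a first-time registration).
 * Returns 0 on success or -errno if the ioctl fails; any positive value
 * returned by the ioctl is passed through unchanged.
 */
int pr_register(int fd, uint64_t old_key, uint64_t new_key)
{
    struct pr_registration reg;

    memset(&reg, 0, sizeof(reg));
    reg.old_key = old_key;
    reg.new_key = new_key;

    int ret = ioctl(fd, IOC_PR_REGISTER, &reg);
    return ret < 0 ? -errno : ret;
}

/*
 * Hypothetical helper: take a reservation of the given type using a key
 * that was registered earlier. Same return convention as pr_register().
 */
int pr_reserve(int fd, uint64_t key, enum pr_type type)
{
    struct pr_reservation rsv;

    /* Precondition: type is one of the values defined in <linux/pr.h>. */
    assert(type >= PR_WRITE_EXCLUSIVE && type <= PR_EXCLUSIVE_ACCESS_ALL_REGS);

    memset(&rsv, 0, sizeof(rsv));
    rsv.key = key;
    rsv.type = type;

    int ret = ioctl(fd, IOC_PR_RESERVE, &rsv);
    return ret < 0 ? -errno : ret;
}
```

A caller would open the block device, call pr_register(fd, 0, key) and then pr_reserve(fd, key, PR_WRITE_EXCLUSIVE). No SG_IO passthrough is involved, which is why CAP_SYS_RAWIO does not come into play.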
Yes, no problem with that. It's easy to extend QEMU with a new pr-manager
subclass that converts SCSI commands to PR ioctls.
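
As a rough illustration of that conversion, the sketch below maps the fields of a PERSISTENT RESERVE OUT command onto their <linux/pr.h> counterparts. It is not the proposed pr-manager code: the function names and the PRO_* enum are invented here, and the service action and type codes are taken from SPC as an assumption to check against the standard. REGISTER AND MOVE is treated as having no ioctl equivalent.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/pr.h>

/* PERSISTENT RESERVE OUT service actions (CDB byte 1, bits 4..0). */
enum {
    PRO_REGISTER            = 0x00,
    PRO_RESERVE             = 0x01,
    PRO_RELEASE             = 0x02,
    PRO_CLEAR               = 0x03,
    PRO_PREEMPT             = 0x04,
    PRO_PREEMPT_AND_ABORT   = 0x05,
    PRO_REGISTER_AND_IGNORE = 0x06,
};

/* Load a big-endian 64-bit key from a SCSI parameter list. */
uint64_t be64_load(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Map a SCSI reservation type code to enum pr_type, or -1 if unknown. */
int scsi_type_to_pr_type(uint8_t scsi_type)
{
    switch (scsi_type) {
    case 0x01: return PR_WRITE_EXCLUSIVE;
    case 0x03: return PR_EXCLUSIVE_ACCESS;
    case 0x05: return PR_WRITE_EXCLUSIVE_REG_ONLY;
    case 0x06: return PR_EXCLUSIVE_ACCESS_REG_ONLY;
    case 0x07: return PR_WRITE_EXCLUSIVE_ALL_REGS;
    case 0x08: return PR_EXCLUSIVE_ACCESS_ALL_REGS;
    default:   return -1;
    }
}

/* Map CDB byte 1 to the matching ioctl request, or 0 if there is none. */
unsigned long pr_out_action_to_ioctl(uint8_t cdb_byte1)
{
    switch (cdb_byte1 & 0x1f) {
    case PRO_REGISTER:
    case PRO_REGISTER_AND_IGNORE: return IOC_PR_REGISTER;
    case PRO_RESERVE:             return IOC_PR_RESERVE;
    case PRO_RELEASE:             return IOC_PR_RELEASE;
    case PRO_CLEAR:               return IOC_PR_CLEAR;
    case PRO_PREEMPT:             return IOC_PR_PREEMPT;
    case PRO_PREEMPT_AND_ABORT:   return IOC_PR_PREEMPT_ABORT;
    default:                      return 0; /* e.g. REGISTER AND MOVE */
    }
}

/*
 * Fill *reg from a REGISTER or REGISTER AND IGNORE EXISTING KEY command.
 * param points at the PR OUT parameter list: bytes 0-7 hold the
 * reservation key, bytes 8-15 the service action reservation key.
 * Returns 0 on success, -1 if the CDB is not a register action.
 */
int pr_out_to_registration(const uint8_t *cdb, const uint8_t *param,
                           struct pr_registration *reg)
{
    assert(cdb && param && reg);

    uint8_t action = cdb[1] & 0x1f;
    if (action != PRO_REGISTER && action != PRO_REGISTER_AND_IGNORE) {
        return -1;
    }

    memset(reg, 0, sizeof(*reg));
    reg->old_key = be64_load(param);
    reg->new_key = be64_load(param + 8);
    if (action == PRO_REGISTER_AND_IGNORE) {
        reg->flags |= PR_FL_IGNORE_KEY;
    }
    return 0;
}
```

The other service actions would follow the same pattern with struct pr_reservation, struct pr_preempt, and struct pr_clear.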
Yes. It will be possible to go further than that in the future:
Alberto has been working on QEMU block layer API support for persistent
reservations. When that becomes available, SCSI command parsing can
happen entirely within hw/scsi/scsi-disk.c for scsi-block and scsi-disk.
file-posix.c will then implement the new BlockDriver PR APIs via
<linux/pr.h> ioctls and other block drivers can implement them in
protocol-specific ways (e.g. iSCSI).
My suggestion is to implement <linux/pr.h> via upcalls from DM-Multipath
to multipathd. That way applications like QEMU can consistently use
<linux/pr.h> across block device types and no longer have to go through
the privileged libmpathpersist interface.
What do you have in mind for the upcall protocol? Does it need to be done
with multipathd or can it be a separate daemon for privilege separation? I
am not sure if there is any channel between dm-mpath and multipathd that
can be extended (I think it only uses uevent?); maybe it would make sense
to reuse qemu-pr-helper's protocol even.
I don't have a strong opinion on the protocol. My thought was to do a
traditional upcall with call_usermodehelper() with an execve argv/envp
protocol. That way there is no need to register a file descriptor. The
downside is that this approach is less efficient and more likely to fail
when the host is under memory pressure, but PR operations are not that
frequent.
gnaa. call_usermodehelper() is _evil_. It might be executed with any
arbitrary fs context, and you better hope the executable is present
there ...
Maybe look at the handshake daemon. That's solving a very similar issue
which we had for TLS.
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
[email protected] +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich