from:"Laszlo Ersek"

Re: [RFC PATCH 4/5] hw/i386/q35: Wire virtual SMI# lines to ICH9 chipset

2024-03-08 Thread Laszlo Ersek

On 3/8/24 09:08, Philippe Mathieu-Daudé wrote:
> On 7/3/24 20:43, Thomas Huth wrote:
>> On 28/02/2024 17.43, Zhao Liu wrote:
>>> Hi Philippe,
>>>
>>>> +/*
>>>> + * Real ICH9 contains a single SMI output line and doesn't broadcast CPUs.
>>>> + * Virtualized ICH9 allows broadcasting upon negotiation with guest, see
>>>> + * commit 5ce45c7a2b.
>>>> + */
>>>> +enum {
>>>> +    ICH9_VIRT_SMI_BROADCAST,
>>>> +    ICH9_VIRT_SMI_CURRENT,
>>>> +#define ICH9_VIRT_SMI_COUNT 2
>>>> +};
>>>> +
>>>
>>> Just a quick look here. Shouldn't ICH9_VIRT_SMI_COUNT be defined outside of
>>> enum {}?
>>
>> Or even better, do it without a #define:
>>
>> enum {
>>  ICH9_VIRT_SMI_BROADCAST,
>>  ICH9_VIRT_SMI_CURRENT,
>>  ICH9_VIRT_SMI_COUNT
> 
> This form isn't recommended as it confuses static analyzers,
> considering ICH9_VIRT_SMI_COUNT as part of the enum.

Side comment: I didn't know about this (so thanks for the info), but
that's really a shame for those static analyzers. It's an ancient and
valid pattern. :/
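
For illustration, the pattern in question looks roughly like the sketch
below. All names are made up for this example (they are not from the
patch); only the shape matters: the last enumerator is a sentinel whose
value equals the number of real entries, so it can size arrays and bound
loops without a separate #define.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; only the shape of the pattern matters here. */
enum demo_smi_line {
    DEMO_SMI_BROADCAST,
    DEMO_SMI_CURRENT,
    DEMO_SMI_COUNT      /* sentinel: equals the number of real entries */
};

/* One slot per real enumerator, sized by the sentinel. */
static const char *const demo_smi_names[DEMO_SMI_COUNT] = {
    [DEMO_SMI_BROADCAST] = "broadcast",
    [DEMO_SMI_CURRENT]   = "current",
};

/* Return the name of a valid line, or NULL for the sentinel and
 * any other out-of-range value. */
static const char *demo_smi_name(int line)
{
    if (line < 0 || line >= DEMO_SMI_COUNT) {
        return NULL;
    }
    return demo_smi_names[line];
}
```

Presumably the analyzers object because the sentinel is itself a value
of the enum type, so a switch over that type is expected to handle it
even though it never names a real line.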

> 
>> };
>>
>>   Thomas
>>
>

Re: [RFC PATCH 0/5] hw/i386/q35: Decouple virtual SMI# lines and wire them to ICH9 chipset

2024-02-27 Thread Laszlo Ersek

Hi Phil,

On 2/26/24 17:49, Philippe Mathieu-Daudé wrote:
> Hi,
>
> This is an experimental series to reduce calls to the
> cpu_interrupt() API from generic HW/. I'm trying to use
> the ICH9 chipset from a non-x86 machine. Without this
> experiment, we can not because cpu_interrupt() is target
> specific. Here the interrupt is decoupled using the QDev
> GPIO API. Even if the SMI# line is left unconnected, the
> device is still usable by a guest.
>
> Based-on: <20240226111416.39217-1-phi...@linaro.org>
>
> Philippe Mathieu-Daudé (5):
>   target/i386/cpu: Expose SMI# IRQ line via QDev
>   hw/i386/piix: Set CPU SMI# interrupt using QDev GPIO API
>   hw/ahci/ich9_tco: Set CPU SMI# interrupt using QDev GPIO API
>   hw/i386/q35: Wire virtual SMI# lines to ICH9 chipset
>   hw/isa: Build ich9_lpc.c once
>
>  include/hw/acpi/ich9.h|  1 +
>  include/hw/acpi/ich9_tco.h|  4 ++--
>  include/hw/i386/pc.h  |  2 --
>  include/hw/isa/ich9_lpc.h | 12 
>  include/hw/southbridge/ich9.h |  1 +
>  target/i386/cpu-internal.h|  1 +
>  hw/acpi/ich9.c|  3 ++-
>  hw/acpi/ich9_tco.c| 13 ++---
>  hw/i386/pc.c  |  9 -
>  hw/i386/pc_piix.c |  4 ++--
>  hw/i386/pc_q35.c  | 26 ++
>  hw/isa/ich9_lpc.c | 15 ---
>  hw/southbridge/ich9.c |  1 +
>  target/i386/cpu-sysemu.c  | 11 +++
>  target/i386/cpu.c |  2 ++
>  hw/isa/meson.build|  3 +--
>  16 files changed, 76 insertions(+), 32 deletions(-)
>

This series is over my head for a review, so the best I could offer
would be to test it.

However, even testing it seems like a challenge. First, I've found that,
when building QEMU at dccbaf0cc0f1, my usual libvirt guests don't start
-- I needed to search the web for the error message, and then apply the
revert series

  [PATCH 0/2] Revert "hw/i386/pc: Confine system flash handling to pc_sysfw"
  https://patchew.org/QEMU/20240226215909.30884-1-shen...@gmail.com/

With that, I managed to establish a "baseline" (test some OVMF SMM
stuff, such as UEFI variable services, ACPI S3 suspend/resume, VCPU
hotplug/hot-unplug).

Then I wanted to apply this series (on top of those reverts on top of
dccbaf0cc0f1). It doesn't apply.

Then I noticed you mentioned the dependency on:

  [PATCH v2 00/15] hw/southbridge: Extract ICH9 QOM container model
  https://patchew.org/QEMU/20240226111416.39217-1-phi...@linaro.org/

That only seems to make things more complicated:

- patchew says "Failed in applying to current master"

- in the blurb, you mention "Rebased on top of Bernhard patches";
however, the above reverts appear to undo some of those patches
precisely, so I'm unsure how stable that foundation should be
considered.

I'd prefer waiting until all these patches stabilized a bit, and the
foundation all went upstream, and then I'd have to apply (a new version
of) this particular series only, on the then-master branch, for testing.

Laszlo

Re: [PATCH V2 1/1] loongarch: Change the UEFI loading mode to loongarch

2024-02-25 Thread Laszlo Ersek

On 2/22/24 16:49, Andrea Bolognani wrote:
> On Thu, Feb 22, 2024 at 04:10:20PM +0100, Philippe Mathieu-Daudé wrote:
>> On 19/2/24 11:34, Xianglai Li wrote:
>>> The UEFI loading mode in loongarch is very different
>>> from that in other architectures: loongarch's UEFI code
>>> is in rom, while other architectures' UEFI code is in flash.
>>>
>>> loongarch UEFI can be loaded as follows:
>>> -machine virt,pflash=pflash0-format
>>> -bios ./QEMU_EFI.fd
>>>
>>> Other architectures load UEFI using the following methods:
>>> -machine virt,pflash0=pflash0-format,pflash1=pflash1-format
>>>
>>> loongarch's UEFI loading method makes qemu and libvirt incompatible
>>> when using NVRAM, and the cost of loongarch's current loading method
>>> far outweighs the benefits, so we decided to use the same UEFI loading
>>> scheme as other architectures.
>>
>> This is unfortunate, since LoongArch was a fresh new target added,
>> we had the possibility to make this right. Are you saying libvirt
>> didn't accept to add support for the correct HW behavior which is
>> to simply load a ROM instead of a PNOR flash device? Could you
>> point me to the libvirt discussion please? libvirt is very good at
>> supporting a broad range of legacy options, so I'm surprised 'Doing
>> The Right Thing' is too costly.
>>
>> What is really the problem here, is it your use of the -bios
>> CLI option?
> 
> Hi Philippe,
> 
> the thread is here:
> 
>   https://lists.libvirt.org/archives/list/de...@lists.libvirt.org/thread/7PV3IXWNX3UXQN2BNV5UA5ASVXNVOQIF/
> 
> Unfortunately hyperkitty makes it impossible to link to a subthread
> directly, so you're going to have to scroll around. The relevant part
> of the discussion happens entirely as reply to the cover letter.
> 
> You were actually CC'd to that subthread right after my first reply,
> so you should be able to find the relevant messages locally as well,
> which is probably going to be more convenient.
> 
> In short, the discussion is similar to the one we had a while ago
> about RISC-V, and my argument in favor of this change is largely the
> same: barring exceptional circumstances, the overall (maintenance,
> cognitive) cost of straying from the established norm, now spanning
> three existing architectures, likely outweighs the benefits.
> 

I'm surprised that the UEFI payload (?) on *physical* loongarch machines
is supposed (?) to launch from ROM. That means "no firmware updates",
which is quite unusual nowadays. Recent versions of the UEFI spec have
introduced a bunch of interfaces just for standardizing firmware
updates, meaning both add-on card firmware, and platform/system firmware.

(Unfortunately, I have nothing "constructive" to add; apologies.)

Laszlo

Re: [PATCH v7 2/3] hw/isa/lpc_ich9: add broadcast SMI feature

2024-02-20 Thread Laszlo Ersek

On 2/20/24 08:58, Philippe Mathieu-Daudé wrote:
> Hi Laszlo, Igor, Gerd,
> 
> (old patch, now commit 5ce45c7a2b)
> 
> On 26/1/17 02:44, Laszlo Ersek wrote:
>> The generic edk2 SMM infrastructure prefers
>> EFI_SMM_CONTROL2_PROTOCOL.Trigger() to inject an SMI on each
>> processor. If
>> Trigger() only brings the current processor into SMM, then edk2
>> handles it
>> in the following ways:
>>
>> (1) If Trigger() is executed by the BSP (which is guaranteed before
>>  ExitBootServices(), but is not necessarily true at runtime), then:
>>
>>  (a) If edk2 has been configured for "traditional" SMM
>> synchronization,
>>  then the BSP sends directed SMIs to the APs with APIC delivery,
>>  bringing them into SMM individually. Then the BSP runs the SMI
>>  handler / dispatcher.
>>
>>  (b) If edk2 has been configured for "relaxed" SMM synchronization,
>>  then the APs that are not already in SMM are not brought in, and
>>  the BSP runs the SMI handler / dispatcher.
>>
>> (2) If Trigger() is executed by an AP (which is possible after
>>  ExitBootServices(), and can be forced e.g. by "taskset -c 1
>>  efibootmgr"), then the AP in question brings in the BSP with a
>>  directed SMI, and the BSP runs the SMI handler / dispatcher.
>>
>> The smaller problem with (1a) and (2) is that the BSP and AP
>> synchronization is slow. For example, the "taskset -c 1 efibootmgr"
>> command from (2) can take more than 3 seconds to complete, because
>> efibootmgr accesses non-volatile UEFI variables intensively.
>>
>> The larger problem is that QEMU's current behavior diverges from the
>> behavior usually seen on physical hardware, and that keeps exposing
>> obscure corner cases, race conditions and other instabilities in edk2,
>> which generally expects / prefers a software SMI to affect all CPUs at
>> once.
>>
>> Therefore introduce the "broadcast SMI" feature that causes QEMU to
>> inject
>> the SMI on all VCPUs.
> 
> I'm trying to remove cpu_interrupt() API from hw/ and found this odd
> case.
> 
> IIUC, the code you added is closer to what real HW is doing:
> 
>   CPU_FOREACH(cs) { cpu_interrupt(cs, CPU_INTERRUPT_SMI); }
> 
> and previous implementation was bogus:
> 
>   cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI);
> 
> but to avoid breaking older VMs ready to deal with bogus impl,
> you have to add a virtual (non-HW) ICH9_LPC_SMI_F_BROADCAST bit
> so new VMs can detect (negotiating) it and use normal expected
> HW behavior.
> 
> If so, and since this change was almost 7 years ago, can we
> expect that most of today's VMs use ICH9_LPC_SMI_F_BROADCAST_BIT,
> and would it be possible to deprecate it, so it becomes the only
> possibility, allowing us to remove this bogus call?
> 
>   cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI);

For OVMF guests: yes, said deprecation should be safe.

Note however that the "current_cpu" case (the original case) had been in
place minimally for SeaBIOS. I don't know how exactly the deprecation /
removal in QEMU would work, but if you build SeaBIOS with
CONFIG_USE_SMM, it might still depend on the "current_cpu" branch.
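
To make the two branches concrete, here is a toy model of the delivery
decision. It is a sketch under stated assumptions, not QEMU code: the
struct, the function names and the fixed CPU count are all hypothetical,
and "broadcast_negotiated" merely stands in for the guest having
acknowledged the broadcast feature bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NR_CPUS 4

/* Toy machine state; none of these names exist in QEMU. */
typedef struct {
    bool smi_pending[DEMO_NR_CPUS];
    bool broadcast_negotiated;   /* guest acked the broadcast feature */
} DemoMachine;

/* Raise a software SMI triggered by CPU 'current'.
 * Returns the number of CPUs on which the SMI became pending. */
static size_t demo_raise_smi(DemoMachine *m, size_t current)
{
    size_t hit = 0;

    if (m->broadcast_negotiated) {
        /* Negotiated behavior: every CPU gets the SMI. */
        for (size_t i = 0; i < DEMO_NR_CPUS; i++) {
            m->smi_pending[i] = true;
            hit++;
        }
    } else {
        /* Original behavior: only the triggering CPU gets it. */
        m->smi_pending[current] = true;
        hit = 1;
    }
    return hit;
}

/* Convenience wrapper: fan-out for a fresh machine. */
static size_t demo_smi_fanout(bool negotiated, size_t current)
{
    DemoMachine m = { .broadcast_negotiated = negotiated };

    return demo_raise_smi(&m, current);
}
```

In this model, a firmware that never negotiates the feature keeps seeing
a fan-out of one, which is the behavior a SeaBIOS build with
CONFIG_USE_SMM may still rely on.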

FWIW, "roms/config.seabios-128k" and "roms/config.seabios-microvm" both
contain CONFIG_USE_SMM=n, so the deprecation likely wouldn't matter for
those SeaBIOS binaries (bundled with QEMU). But it could matter for
SeaBIOS binaries from other sources; plus "roms/config.seabios-256k"
does *not* contain a setting like that (and the SeaBIOS default is "y",
when building for QEMU).

For another data point: as far as I remember, we had disabled
CONFIG_USE_SMM in RHEL; there had been stability issues.

... I can't describe *all* the uses that SeaBIOS has for SMM. *One* use
is from commit 55215cd425d3 ("Implement call32 mechanism using SMIs.",
2014-10-15) -- "this allows SeaBIOS to transition to 32bit mode even
when called in vm86 mode". Because that commit modifies "stacks.c", I
think it must be related to the "cooperative multi-tasking system"
described here: <https://www.seabios.org/Execution_and_code_flow#Threads>.

I'm really rusty on this [*], but here's one potential symptom I can
theorize about: assuming you silently make broadcast SMI the default in
QEMU, and SeaBIOS raises an SMI (expecting it to only affect the BSP),
the SMI could become pending on all the APs (which would all be in RESET
state at that point [**]). And when Linux booted those APs with
INIT-SIPI-SIPI sequences, the pending SMIs could be delivered
immediately, and the APs would launch immedi

Re: [PATCH v2] target/i386/host-cpu: Use iommu phys_bits with VFIO assigned devices on Intel h/w

2024-01-24 Thread Laszlo Ersek

On 1/24/24 13:58, Philippe Mathieu-Daudé wrote:
> On 24/1/24 12:53, Cédric Le Goater wrote:
>> On 1/18/24 20:20, Vivek Kasireddy wrote:
>>> Recent updates in OVMF and Seabios have resulted in MMIO regions
>>> being placed at the upper end of the physical address space. As a
>>> result, when a Host device is assigned to the Guest via VFIO, the
>>> following mapping failures occur when VFIO tries to map the MMIO
>>> regions of the device:
>>> VFIO_MAP_DMA failed: Invalid argument
>>> vfio_dma_map(0x557b2f2736d0, 0x3800, 0x100,
>>> 0x7f98ac40) = -22 (Invalid argument)
>>>
>>> The above failures are mainly seen on some Intel platforms where
>>> the physical address width is larger than the Host's IOMMU
>>> address width. In these cases, VFIO fails to map the MMIO regions
>>> because the IOVAs would be larger than the IOMMU aperture regions.
>>>
>>> Therefore, one way to solve this problem would be to ensure that
>>> cpu->phys_bits = 
>>> This can be done by parsing the IOMMU caps value from sysfs and
>>> extracting the address width and using it to override the
>>> phys_bits value as shown in this patch.
>>>
>>> Previous attempt at solving this issue in OVMF:
>>> https://edk2.groups.io/g/devel/topic/102359124
>>>
>>> Cc: Gerd Hoffmann 
>>> Cc: Philippe Mathieu-Daudé 
>>> Cc: Alex Williamson 
>>> Cc: Cédric Le Goater 
>>> Cc: Laszlo Ersek 
>>> Cc: Dongwon Kim 
>>> Acked-by: Gerd Hoffmann 
>>> Tested-by: Yanghang Liu 
>>> Signed-off-by: Vivek Kasireddy 
>>>
>>> ---
>>> v2:
>>> - Replace the term passthrough with assigned (Laszlo)
>>> - Update the commit message to note that both OVMF and Seabios
>>>    guests are affected (Cédric)
>>> - Update the subject to indicate what is done in the patch
>>> ---
>>>   target/i386/host-cpu.c | 61 +-
>>>   1 file changed, 60 insertions(+), 1 deletion(-)
> 
> 
>>> +static int intel_iommu_check(void *opaque, QemuOpts *opts, Error **errp)
>>> +{
>>> +    g_autofree char *dev_path = NULL, *iommu_path = NULL, *caps = NULL;
>>> +    const char *driver = qemu_opt_get(opts, "driver");
>>> +    const char *device = qemu_opt_get(opts, "host");
>>> +    uint32_t *iommu_phys_bits = opaque;
>>> +    struct stat st;
>>> +    uint64_t iommu_caps;
>>> +
>>> +    /*
>>> +     * Check if the user requested VFIO device assignment. We don't have
>>> +     * to limit phys_bits if there are no valid assigned devices.
>>> +     */
>>> +    if (g_strcmp0(driver, "vfio-pci") || !device) {
>>> +        return 0;
>>> +    }
>>> +
>>> +    dev_path = g_strdup_printf("/sys/bus/pci/devices/%s", device);
>>> +    if (stat(dev_path, &st) < 0) {
>>> +        return 0;
>>> +    }
>>> +
>>> +    iommu_path = g_strdup_printf("%s/iommu/intel-iommu/cap", dev_path);
>>> +    if (stat(iommu_path, &st) < 0) {
>>> +        return 0;
>>> +    }
>>> +
>>> +    if (g_file_get_contents(iommu_path, &caps, NULL, NULL)) {
>>> +        if (sscanf(caps, "%lx", &iommu_caps) != 1) {
>>
>> nit. This should use a PRIx64 define.
>>
>>> +            return 0;
>>> +        }
>>> +        *iommu_phys_bits = ((iommu_caps >> 16) & 0x3f) + 1;
>>
>> Please use 0x3fULL
> 
> or:
> 
>    *iommu_phys_bits = 1 + extract32(iommu_caps, 16, 6);

Huh, interesting; I've never seen this recommended before, even though
it comes from a very old commit -- 84988cf910a6 ("bitops.h: Add
functions to extract and deposit bitfields", 2012-07-07).

I thought only edk2 had BitFieldRead32() :)
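
For reference, a minimal stand-alone sketch of what such a helper does.
The "demo_" names are mine, not QEMU's; the function is only meant to
mirror the start/length contract of a 32-bit bitfield extract, applied
to the same field the patch reads (6 bits starting at bit 16, plus one).

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a 32-bit bitfield extract helper: return 'length'
 * bits of 'value', starting at bit 'start'. */
static uint32_t demo_extract32(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (value >> start) & (~0U >> (32 - length));
}

/* Address width as the patch derives it from the capability value. */
static uint32_t demo_iommu_phys_bits(uint64_t iommu_caps)
{
    /* Truncating to 32 bits is harmless here: the field ends at bit 21. */
    return 1 + demo_extract32((uint32_t)iommu_caps, 16, 6);
}
```

With a capability value whose field holds 0x26, this yields 39 bits. The
open-coded mask-and-shift in the patch computes the same thing; the
helper just names the start bit and the width explicitly.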

Laszlo

Re: [PATCH v2] target/i386/host-cpu: Use iommu phys_bits with VFIO assigned devices on Intel h/w

2024-01-22 Thread Laszlo Ersek

On 1/18/24 20:20, Vivek Kasireddy wrote:
> Recent updates in OVMF and Seabios have resulted in MMIO regions
> being placed at the upper end of the physical address space. As a
> result, when a Host device is assigned to the Guest via VFIO, the
> following mapping failures occur when VFIO tries to map the MMIO
> regions of the device:
> VFIO_MAP_DMA failed: Invalid argument
> vfio_dma_map(0x557b2f2736d0, 0x3800, 0x100, 0x7f98ac40) = -22 
> (Invalid argument)
> 
> The above failures are mainly seen on some Intel platforms where
> the physical address width is larger than the Host's IOMMU
> address width. In these cases, VFIO fails to map the MMIO regions
> because the IOVAs would be larger than the IOMMU aperture regions.
> 
> Therefore, one way to solve this problem would be to ensure that
> cpu->phys_bits = 
> This can be done by parsing the IOMMU caps value from sysfs and
> extracting the address width and using it to override the
> phys_bits value as shown in this patch.
> 
> Previous attempt at solving this issue in OVMF:
> https://edk2.groups.io/g/devel/topic/102359124
> 
> Cc: Gerd Hoffmann 
> Cc: Philippe Mathieu-Daudé 
> Cc: Alex Williamson 
> Cc: Cédric Le Goater 
> Cc: Laszlo Ersek 
> Cc: Dongwon Kim 
> Acked-by: Gerd Hoffmann 
> Tested-by: Yanghang Liu 
> Signed-off-by: Vivek Kasireddy 
> 
> ---
> v2:
> - Replace the term passthrough with assigned (Laszlo)

v1 of the patch was posted in last November; I've now re-read my
(superficial) comments from back then.

Acked-by: Laszlo Ersek

Re: [PATCH 3/3] tests/acpi: Update virt/SSDT.memhp

2024-01-19 Thread Laszlo Ersek

On 1/19/24 15:29, Peter Maydell wrote:
> On Mon, 15 Jan 2024 at 04:35, Bin Meng  wrote:
>>
>> The Arm dtb changes caused an address change:
>>
>>  DefinitionBlock ("", "SSDT", 1, "BOCHS ", "NVDIMM", 0x0001)
>>  {
>>  [ ... ]
>> -Name (MEMA, 0x43C8)
>> +Name (MEMA, 0x43D8)
>>  }
>>
>> Signed-off-by: Bin Meng 
>>
>> ---
> 
> You should follow up (with Laszlo?) to make sure we understand
> why reducing the size of the generated dtb has caused this
> change in the ACPI tables. In particular, if we made the
> dtb *smaller* why has the allocated address here got *larger*?

As a very roughly stated trait (i.e., I'm not claiming this is an exact,
hard rule), the UEFI memory allocator hands out chunks top-down. An
earlier allocation (such as the DTB's) shrinking is consistent with
further allocations being serviced at higher addresses.
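
A toy model of that effect, under loud assumptions: a single free
region, a plain top-down bump allocator, power-of-two alignment, and a
made-up top address. The real UEFI allocator is considerably more
involved; this only shows the direction of the change.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical single-region pool; allocations grow downward from 'top'. */
typedef struct {
    uint64_t top;
} DemoPool;

/* Allocate 'size' bytes at the highest suitably aligned address below
 * the current top. 'align' must be a power of two. Returns 0 if the
 * request does not fit. */
static uint64_t demo_pool_alloc(DemoPool *p, uint64_t size, uint64_t align)
{
    uint64_t addr;

    if (size > p->top) {
        return 0;
    }
    addr = (p->top - size) & ~(align - 1);
    p->top = addr;
    return addr;
}

/* Address of a one-page allocation made right after a first
 * allocation of 'first_size' bytes (think: the stashed DTB copy). */
static uint64_t demo_second_alloc(uint64_t first_size)
{
    DemoPool p = { .top = 0x48000000ULL };   /* made-up top of RAM */

    demo_pool_alloc(&p, first_size, 0x1000);
    return demo_pool_alloc(&p, 0x1000, 0x1000);
}
```

Here a 1 MiB first allocation puts the following page at 0x47EFF000,
while a 64 KiB first allocation puts it at 0x47FEF000: the earlier
allocation shrank, and the later one moved up.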

> 
> This particular bit of the ACPI tables does seem to be
> annoyingly unstable, though -- for instance commit 55abfc1ffbe54c0
> we had to change this figure when we updated to a newer EDK2
> version, and similarly commit 5f88dd43d0 for the same reason.
> I wonder if we can or should make our data-check be more
> loose about the address reported here, given what Laszlo
> says about how we're basically looking at the address of some
> memory the guest allocated. (cc'd the bios-tables-test
> maintainers for their opinion.)

Right, the allocation address is generally unpredictable. (That's why
the ACPI linker/loader "language" had to be extended with an extra
command, for the sake of the vmgenid device -- so that the firmware
could send the allocation GPA back to QEMU in an "architected" way.)

> 
> I'm also a little concerned that if the ACPI generated
> tables care about the dtb size then we're now going to
> have a situation where any patch we make to the virt board
> that changes the generated dtb at all will result in the
> ACPI tables changing. That would be annoying.

This is generally inevitable, it's just how the ACPI linker/loader
works. The guest allocator can only work with the memory map it gets
from QEMU. The same effect is triggered BTW if you don't change the DTB
but change (on the QEMU command line) the guest RAM size. The ACPI
tables will be allocated at different addresses than before, and so the
pointer fields in other tables, to those tables, will also change.

> 
> Finally, if we do need to update the reference data in
> tests/data/acpi, there is a multi-stage procedure for
> this, documented in the comment at the top of
> tests/qtest/bios-tables-test.c -- basically you need
> first to have a patch that says "ignore discrepancies in
> these files", then the patch that makes the actual change to
> QEMU (in this case your patch 2 in this series), then the
> patch which updates the reference data and removes the files
> from the ignore-this list. (It is because this is a bit of a
> pain that I definitely don't want "any small change to the dtb"
> to turn into "ACPI tables change"...)

Laszlo

Re: [PATCH 3/3] tests/acpi: Update virt/SSDT.memhp

2024-01-15 Thread Laszlo Ersek

On 1/15/24 15:46, Bin Meng wrote:
> On Mon, Jan 15, 2024 at 7:40 PM Alex Bennée  wrote:
>>
>> Bin Meng  writes:
>>
>>> The Arm dtb changes caused an address change:
>>>
>>>  DefinitionBlock ("", "SSDT", 1, "BOCHS ", "NVDIMM", 0x0001)
>>>  {
>>>  [ ... ]
>>> -Name (MEMA, 0x43C8)
>>> +Name (MEMA, 0x43D8)
>>>  }
>>
>> I'm confused by why this changes. Isn't this declaring the size of a
>> NVDIMM region of the memory map? Why does a DTB change affect an ACPI
>> based boot?
>>
> 
> I have no idea either. I suspect that's because the AllocateAlignedPages
> call to allocate a 1 MiB aligned address in the BiosTableTest.c is
> affected by the shrunk DTB now.
> 
> + Laszlo who might know the root cause.

Just speculating:

from "docs/specs/acpi_nvdimm.rst":

Memory:
   QEMU uses BIOS Linker/loader feature to ask BIOS to allocate a memory
   page and dynamically patch its address into an int32 object named "MEMA"
   in ACPI.

Therefore any QEMU-side change that affects memory allocations in the guest may 
affect the ACPI contents (captured later).
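
As a rough sketch of the "patch its address into an int32 object" step:
the firmware ends up storing the allocation address, little-endian, at
some offset inside the table blob. The function names, the blob layout
and the offset below are hypothetical; this is not the linker/loader
command format, only the byte-level effect.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store 'addr' as a little-endian 32-bit integer at 'offset' in 'blob'.
 * Returns 0 on success, -1 if the field would not fit in the blob. */
static int demo_patch_addr32(uint8_t *blob, size_t blob_len,
                             size_t offset, uint32_t addr)
{
    if (offset > blob_len || blob_len - offset < 4) {
        return -1;
    }
    for (int i = 0; i < 4; i++) {
        blob[offset + i] = (uint8_t)(addr >> (8 * i));
    }
    return 0;
}

/* Small probe for experimenting: patch 'addr' at offset 2 of a scratch
 * blob and return byte 'index' (0..3) of the patched field. */
static uint8_t demo_patched_byte(uint32_t addr, size_t index)
{
    uint8_t blob[8] = { 0 };

    if (index >= 4 || demo_patch_addr32(blob, sizeof(blob), 2, addr) != 0) {
        return 0xff;
    }
    return blob[2 + index];
}
```

Whatever address the guest allocator hands out is what lands in those
four bytes, which is why the captured table differs whenever the
allocation moves.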

I don't know what the DTB change at hand was, but if (for example) the DTB has 
grown significantly, that could lead to this. The guest firmware stashes a 
dynamically allocated copy of the DTB, early on in the PEI phase. Some growth 
there may change the initial memory map of the DXE phase, which could affect 
the ACPI linker/loader's allocation operations.

If you can attach the DTB before-after, and the *verbose* firmware log 
before-after, we might find out finer details.

Laszlo

Re: [PATCH 05/16] hw/uefi: add var-service-core.c

2023-12-08 Thread Laszlo Ersek

On 11/22/23 17:30, Gerd Hoffmann wrote:
>   Hi,
> 
>> - in general, we should filter out surrogate code points, for any use.
>> any UCS2 string from the guest that contains a surrogate code point
>> should be considered invalid, and the request should be rejected based
>> just on that.
> 
> Something like this?

yes please (except I'd recommend s/outlaw/reject/ in the comment)

Thanks
laszlo

> 
> edk2 seems to be inconsistent with strings, sometimes they are expected
> to include a terminating '\0' char (most of the time), sometimes not
> (in variable policies for example).
> 
> gboolean uefi_str_is_valid(const uint16_t *str, size_t len,
>                            gboolean must_be_null_terminated)
> {
>     size_t pos = 0;
>
>     for (;;) {
>         if (pos == len) {
>             if (must_be_null_terminated) {
>                 return false;
>             } else {
>                 return true;
>             }
>         }
>         switch (str[pos]) {
>         case 0:
>             /* end of string */
>             return true;
>         case 0xd800 ... 0xdfff:
>             /* outlaw surrogates */
>             return false;
>         default:
>             /* char is good, check next */
>             break;
>         }
>         pos++;
>     }
> }
> 
> take care,
>   Gerd
>

Re: [PATCH 05/16] hw/uefi: add var-service-core.c

2023-11-22 Thread Laszlo Ersek

On 11/15/23 16:12, Gerd Hoffmann wrote:
> This is the core code for guest <-> host communication.  This accepts
> request messages from the guest, dispatches them to the service called,
> and sends back the response message.
> 
> Signed-off-by: Gerd Hoffmann 
> ---
>  hw/uefi/var-service-core.c | 350 +
>  1 file changed, 350 insertions(+)
>  create mode 100644 hw/uefi/var-service-core.c

If other reviewers don't object, I'd like to request a respin (of the
series) at this point, and I'll stop reviewing this version here:

- This patch is too large for me. It does migration, UCS2 string
utilities, tracing and device code all in one. I think it should be
split in at least three patches (the changes can go in the same "core" C
source file, but in smaller building blocks).

- in general, we should filter out surrogate code points, for any use.
any UCS2 string from the guest that contains a surrogate code point
should be considered invalid, and the request should be rejected based
just on that.

- after splitting, the resultant parts of this patch should be moved
near the end of the series. It references a bunch of helper functions
(which is fine), but for mentally resolving those, in particular for
understanding the life cycles of the various "uefi_vars_state" fields
*here*, I first need to see the internals of the helper functions. If I
jump back and forth between the patches, that's unwieldy for
interrupting the review one day & resuming it another day. The patches
should be included in "topological order" (dependency order) in the
series. The series already follows this idea roughly, AFAICT, but the
placement of this particular patch seems to stick out.

- life cycle comments on the "uefi_vars_state" fields would be
appreciated in the header file, too.

- a tidbit: "uefi_vars_policies_clear" should be spelled
"uefi_var_policies_clear" (i.e., "var" in singular), for consistency
with the field "var_policies" (it's not called "vars_policies").
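
To illustrate the surrogate filtering requested above, a minimal sketch
in portable C (it avoids the GNU case-range extension). The function
name and signature are hypothetical, and 'len' is assumed to count
16-bit code units rather than bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Check a guest-provided UCS-2 string of at most 'len' code units.
 * Any surrogate code point (0xD800..0xDFFF) before the terminator makes
 * the string invalid. If no terminator is found within 'len' units, the
 * result depends on 'must_be_terminated'. */
static bool demo_ucs2_is_valid(const uint16_t *str, size_t len,
                               bool must_be_terminated)
{
    for (size_t pos = 0; pos < len; pos++) {
        uint16_t c = str[pos];

        if (c == 0) {
            return true;            /* end of string */
        }
        if (c >= 0xd800 && c <= 0xdfff) {
            return false;           /* reject surrogates */
        }
    }
    return !must_be_terminated;
}
```

A request handler would call something like this on every string field
of a request before doing anything else with it, and fail the request
on a false result.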

Thanks,
Laszlo

> 
> diff --git a/hw/uefi/var-service-core.c b/hw/uefi/var-service-core.c
> new file mode 100644
> index ..b37f5c403d2f
> --- /dev/null
> +++ b/hw/uefi/var-service-core.c
> @@ -0,0 +1,350 @@
> +/*
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + *
> + * uefi vars device
> + */
> +#include "qemu/osdep.h"
> +#include "sysemu/dma.h"
> +#include "migration/vmstate.h"
> +
> +#include "hw/uefi/var-service.h"
> +#include "hw/uefi/var-service-api.h"
> +#include "hw/uefi/var-service-edk2.h"
> +
> +#include "trace/trace-hw_uefi.h"
> +
> +static int uefi_vars_pre_load(void *opaque)
> +{
> +    uefi_vars_state *uv = opaque;
> +
> +    uefi_vars_clear_all(uv);
> +    uefi_vars_policies_clear(uv);
> +    g_free(uv->buffer);
> +    return 0;
> +}
> +
> +static int uefi_vars_post_load(void *opaque, int version_id)
> +{
> +    uefi_vars_state *uv = opaque;
> +
> +    uefi_vars_update_storage(uv);
> +    uv->buffer = g_malloc(uv->buf_size);
> +    return 0;
> +}
> +
> +const VMStateDescription vmstate_uefi_vars = {
> +    .name = "uefi-vars",
> +    .pre_load = uefi_vars_pre_load,
> +    .post_load = uefi_vars_post_load,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_UINT16(sts, uefi_vars_state),
> +        VMSTATE_UINT32(buf_size, uefi_vars_state),
> +        VMSTATE_UINT32(buf_addr_lo, uefi_vars_state),
> +        VMSTATE_UINT32(buf_addr_hi, uefi_vars_state),
> +        VMSTATE_BOOL(end_of_dxe, uefi_vars_state),
> +        VMSTATE_BOOL(ready_to_boot, uefi_vars_state),
> +        VMSTATE_BOOL(exit_boot_service, uefi_vars_state),
> +        VMSTATE_BOOL(policy_locked, uefi_vars_state),
> +        VMSTATE_UINT64(used_storage, uefi_vars_state),
> +        VMSTATE_QTAILQ_V(variables, uefi_vars_state, 0,
> +                         vmstate_uefi_variable, uefi_variable, next),
> +        VMSTATE_QTAILQ_V(var_policies, uefi_vars_state, 0,
> +                         vmstate_uefi_var_policy, uefi_var_policy, next),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
> +size_t uefi_strlen(const uint16_t *str, size_t len)
> +{
> +    size_t pos = 0;
> +
> +    for (;;) {
> +        if (pos == len) {
> +            return pos;
> +        }
> +        if (str[pos] == 0) {
> +            return pos;
> +        }
> +        pos++;
> +    }
> +}
> +
> +gboolean uefi_str_equal(const uint16_t *a, size_t alen,
> +                        const uint16_t *b, size_t blen)
> +{
> +    size_t pos = 0;
> +
> +    alen = alen / 2;
> +    blen = blen / 2;
> +    for (;;) {
> +        if (pos == alen && pos == blen) {
> +            return true;
> +        }
> +        if (pos == alen && b[pos] == 0) {
> +            return true;
> +        }
> +        if (pos == blen && a[pos] == 0) {
> +            return true;
> +        }
> +        if (pos == alen || pos == blen) {
> +            return false;
> +        }
> +        if (a[pos] == 0 && b[pos] == 0) {
> +            return true;
> +        }
> +        if (a[pos] != b[pos]) {
> +

Re: [PATCH 00/16] hw/uefi: add uefi variable service

2023-11-21 Thread Laszlo Ersek

On 11/20/23 17:50, Gerd Hoffmann wrote:
> On Mon, Nov 20, 2023 at 12:53:45PM +0100, Alexander Graf wrote:
>> Hey Gerd!
>>
>> On 15.11.23 16:12, Gerd Hoffmann wrote:
>>> This patch adds a virtual device to qemu which the uefi firmware can use
>>> to store variables.  This moves the UEFI variable management from
>>> privileged guest code (managing vars in pflash) to the host.  Main
>>> advantage is that the need to have privilege separation in the guest
>>> goes away.
>>>
>>> On x86 privileged guest code runs in SMM.  It's supported by kvm, but
>>> not liked much by various stakeholders in cloud space due to the
>>> complexity SMM emulation brings.
>>>
>>> On arm privileged guest code runs in el3 (aka secure world).  This is
>>> not supported by kvm, which is unlikely to change anytime soon given
>>> that even el2 support (nested virt) is being worked on for years and is
>>> not yet in mainline.
>>>
>>> The design idea is to reuse the request serialization protocol edk2 uses
>>> for communication between SMM and non-SMM code, so large chunks of the
>>> edk2 variable driver stack can be used unmodified.  Only the driver
>>> which traps into SMM mode must be replaced by a driver which talks to
>>> qemu instead.
>>
>>
>> I'm not sure I like the split :). If we cut things off at the SMM
>> communication layer, we still have a lot of code inside the Runtime Services
>> (RTS) code that is edk2 specific which means we're tying ourselves tightly
>> to the edk2 data format.
> 
> Which data format?
> 
> Request serialization format?  Yes.  I can't see what is wrong with
> that.

... I can't even see what's wrong with *that*. Realistically /
practically, I think only edk2 has been considered as guest UEFI
firmware for QEMU/KVM virtual machines, as far as upstream projects go.
It's certainly edk2 that's bundled with QEMU.

My understanding is that firmware is just considered a part of the
virtualization platform, so teaching edk2 specifics to QEMU doesn't seem
wrong. (As long as we have the personpower to maintain interoperability.)

> We need one anyway, and I don't see why inventing a new one is
> any better than the one we already have in edk2.
> 
> Variable storage format?  Nope, that is not the case.  The variable
> driver supports a cache, which I think is a read-only mapping of the
> variable store, so using that might imply we have to use edk2 storage
> format.  Didn't check in detail though.  The cache is optional, can be
> disabled at compile time using PcdEnableVariableRuntimeCache=FALSE, and
> I intentionally do not use the cache feature, exactly to avoid unwanted
> constraints to the host side implementation.
> 
>> It also means we can not easily expose UEFI
>> variables that QEMU owns,
> 
> Qemu owning variables should be no problem.  Adding monitor commands to
> read/write UEFI variables should be possible too.

This patch set is actually an improvement towards QEMU-owned variables.
Right now, all variables are internal to the guest; QEMU only has a
pflash-level view.

> 
>> which can come in very handy when implementing MOR
>> - another feature that depends on SMM today.
> 
> Have a pointer for me?  Google finds me
> https://learn.microsoft.com/en-us/windows-hardware/drivers/bringup/device-guard-requirements,
> which describes the variable behavior (which I think should be no
> problem to implement), but doesn't say a word about what exactly gets
> locked when enabled ...

See:

  TCG PC Client Platform
  Reset Attack Mitigation Specification

My copy is

  Family “2.0”
  Version 1.10 Revision 17
  January 21, 2019
  Published

You should find it somewhere in the download area of
.

E.g



In the past we've had a bunch of discussions / patches around this. Some
examples:

- [edk2] multiple levels of support for MOR / MORLock

http://mid.mail-archive.com/039cf353-80fb-9f20-6ad2-f52517ab4de7@redhat.com

- https://bugzilla.tianocore.org/show_bug.cgi?id=727

(see edk2 commit range listed there, too)

- commit 704b71d7e11f115a3b5b03471d6420a7a70f1585

- commit d20ae95a13e851d56c6618108b18c93526505ca2

- https://bugzilla.redhat.com/show_bug.cgi?id=1854212

- https://bugzilla.redhat.com/show_bug.cgi?id=1498159


> 
>> In EC2, we are simply serializing all variable RTS calls to the hypervisor,
> 
> The edk2 code effectively does the same (with 
> PcdEnableVariableRuntimeCache=FALSE).
> 
>> similar to the Xen implementation
>> (https://www.youtube.com/watch?v=jiR8khaECEk).
> 
> Is the Xen implementation upstream?  Can't see a xen variable driver in
> OvmfPkg.  The video is from 2019.  What is the state of this?

Not sure about the current state, but when that presentation came out,
we discussed it briefly internally. I don't have time to review that old
discussion now for the sake of potentially publishing it inside this
discussion, but for interested Red Hatters, I

Re: [PATCH 04/16] hw/uefi: add var-service-guid.c

2023-11-21 Thread Laszlo Ersek

On 11/15/23 16:12, Gerd Hoffmann wrote:
> Add variables for a bunch of GUIDs we will need.
> 
> Signed-off-by: Gerd Hoffmann 
> ---
>  hw/uefi/var-service-guid.c | 61 ++
>  1 file changed, 61 insertions(+)
>  create mode 100644 hw/uefi/var-service-guid.c
> 
> diff --git a/hw/uefi/var-service-guid.c b/hw/uefi/var-service-guid.c
> new file mode 100644
> index ..afdc15c4e7e6
> --- /dev/null
> +++ b/hw/uefi/var-service-guid.c
> @@ -0,0 +1,61 @@
> +/*
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + *
> + * uefi vars device - GUIDs
> + */
> +
> +#include "qemu/osdep.h"
> +#include "sysemu/dma.h"
> +
> +#include "hw/uefi/var-service.h"
> +
> +/* variable namespaces */
> +
> +QemuUUID EfiGlobalVariable = {
> +.data = UUID_LE(0x8be4df61, 0x93ca, 0x11d2, 0xaa, 0x0d,
> +0x00, 0xe0, 0x98, 0x03, 0x2b, 0x8c)
> +};

(1) should have asked under patch#3:

can we constify these?
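
To illustrate what I mean by constifying, here is a minimal sketch. It does not use the real QemuUUID type; "DemoUUID", "DemoEfiGlobalVariable" and "demo_uuid_equal" are hypothetical stand-ins so that the block compiles by itself. The initializer bytes are the EfiGlobalVariable GUID from above, written out in little-endian layout.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for QemuUUID: 16 raw bytes. */
typedef struct DemoUUID {
    uint8_t data[16];
} DemoUUID;

/*
 * "const" lets the toolchain place the object in read-only storage and
 * turns accidental writes into build errors. A header would carry the
 * matching declaration: extern const DemoUUID DemoEfiGlobalVariable;
 */
const DemoUUID DemoEfiGlobalVariable = {
    .data = { 0x61, 0xdf, 0xe4, 0x8b, 0xca, 0x93, 0xd2, 0x11,
              0xaa, 0x0d, 0x00, 0xe0, 0x98, 0x03, 0x2b, 0x8c }
};

/* Readers take pointers-to-const, so they accept const objects as-is. */
int demo_uuid_equal(const DemoUUID *a, const DemoUUID *b)
{
    assert(a != NULL && b != NULL);
    return memcmp(a->data, b->data, sizeof(a->data)) == 0;
}
```

Functions that take the GUID by value keep working unchanged, because copying from a const object is always permitted.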

> +
> +QemuUUID EfiImageSecurityDatabase = {
> +.data = UUID_LE(0xd719b2cb, 0x3d3a, 0x4596, 0xa3, 0xbc,
> +0xda, 0xd0, 0x0e, 0x67, 0x65, 0x6f)
> +};
> +
> +QemuUUID EfiCustomModeEnable = {
> +.data = UUID_LE(0xc076ec0c, 0x7028, 0x4399, 0xa0, 0x72,
> +0x71, 0xee, 0x5c, 0x44, 0x8b, 0x9f)
> +};
> +
> +QemuUUID EfiSecureBootEnableDisable = {
> +.data = UUID_LE(0xf0a30bc7, 0xaf08, 0x4556, 0x99, 0xc4,
> +0x0, 0x10, 0x9, 0xc9, 0x3a, 0x44)
> +};
> +
> +/* protocols */
> +
> +QemuUUID EfiSmmVariableProtocolGuid = {
> +.data = UUID_LE(0xed32d533, 0x99e6, 0x4209, 0x9c, 0xc0,
> +0x2d, 0x72, 0xcd, 0xd9, 0x98, 0xa7)
> +};
> +
> +QemuUUID VarCheckPolicyLibMmiHandlerGuid = {
> +.data = UUID_LE(0xda1b0d11, 0xd1a7, 0x46c4, 0x9d, 0xc9,
> +0xf3, 0x71, 0x48, 0x75, 0xc6, 0xeb)
> +};

This isn't really a protocol, but a GUID for "mm_header.guid".

gEfiSmmVariableProtocolGuid is a bit messy in edk2 because (IIUC) it is
used for *three* purposes:

(a) it's an actual SMM protocol GUID (MmVariableServiceInitialize() --
MdeModulePkg/Universal/Variable/RuntimeDxe/VariableSmm.c),

(b) it's used as "mm_header.guid" (see the MmiHandlerRegister call in
the same location), i.e. basically an SMI "upcall" identifier

(c) it's used as a NULL interface DXE protocol GUID for notification
purposes (VariableNotifySmmReady() --
MdeModulePkg/Universal/Variable/RuntimeDxe/VariableTraditionalMm.c).

So calling gEfiSmmVariableProtocolGuid a "protocol" in this header file
is relatively justified. But calling "VarCheckPolicyLibMmiHandlerGuid" a
protocol doesn't seem justified, because (I think) it only qualifies for
usage (b).

> +
> +/* events */

More precisely, this would be "event groups"; but peeking ahead to the
next patch, I think all five of these GUIDs (protocols + events) are
just "mm_header.guid" values.

So here's what I propose:

(2) replace

  /* protocols */

with

  /* mm_header.guid values that the guest DXE/BDS phases use for
   * sending requests to management mode
   */

and

(3) replace

  /* events */

with

  /* mm_header.guid values that the guest DXE/BDS phases use for
   * reporting event groups being signaled to management mode
   */

> +
> +QemuUUID EfiEndOfDxeEventGroupGuid = {
> +.data = UUID_LE(0x02CE967A, 0xDD7E, 0x4FFC, 0x9E, 0xE7,
> +0x81, 0x0C, 0xF0, 0x47, 0x08, 0x80)
> +};

(4) I suggest consistently using either the lowercase hex characters
[a-f] or the uppercase ones [A-F] across all GUID constants in this header.


> +
> +QemuUUID EfiEventReadyToBootGuid = {
> +.data = UUID_LE(0x7CE88FB3, 0x4BD7, 0x4679, 0x87, 0xA8,
> +0xA8, 0xD8, 0xDE, 0xE5, 0x0D, 0x2B)
> +};
> +
> +QemuUUID EfiEventExitBootServicesGuid = {
> +.data = UUID_LE(0x27ABF055, 0xB1B8, 0x4C26, 0x80, 0x48,
> +0x74, 0x8F, 0x37, 0xBA, 0xA2, 0xDF)
> +};

I've made an effort to verify the constants themselves; they look good.
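
For anyone who wants to repeat the check, the following sketch shows the byte layout I assumed: the first three GUID fields are stored little-endian, and the trailing eight bytes are stored in textual order. The helper name "demo_guid_encode" is made up for this example; it is not part of QEMU or edk2. Note that the case of the hex digits has no effect on the encoded bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Encode the GUID fields in the EFI in-memory layout: d1, d2 and d3 are
 * stored little-endian, the eight trailing bytes are copied unchanged.
 */
void demo_guid_encode(uint32_t d1, uint16_t d2, uint16_t d3,
                      const uint8_t d4[8], uint8_t out[16])
{
    assert(d4 != NULL && out != NULL);

    out[0] = d1 & 0xff;
    out[1] = (d1 >> 8) & 0xff;
    out[2] = (d1 >> 16) & 0xff;
    out[3] = (d1 >> 24) & 0xff;
    out[4] = d2 & 0xff;
    out[5] = (d2 >> 8) & 0xff;
    out[6] = d3 & 0xff;
    out[7] = (d3 >> 8) & 0xff;
    memcpy(out + 8, d4, 8);
}
```

Feeding it 8be4df61-93ca-11d2-aa0d-00e098032b8c should yield the byte sequence 61 df e4 8b ca 93 d2 11 aa 0d 00 e0 98 03 2b 8c.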

Thanks
Laszlo

Re: [PATCH 03/16] hw/uefi: add include/hw/uefi/var-service.h

2023-11-17 Thread Laszlo Ersek

> +void uefi_vars_json_save(uefi_vars_state *uv);
> +void uefi_vars_json_load(uefi_vars_state *uv, Error **errp);
> +
> +/* vars-service-vars.c */
> +extern const VMStateDescription vmstate_uefi_variable;
> +uefi_variable *uefi_vars_find_variable(uefi_vars_state *uv, QemuUUID guid,
> +   const uint16_t *name,
> +   uint64_t name_size);
> +void uefi_vars_set_variable(uefi_vars_state *uv, QemuUUID guid,
> +const uint16_t *name, uint64_t name_size,
> +uint32_t attributes,
> +void *data, uint64_t data_size);
> +void uefi_vars_clear_volatile(uefi_vars_state *uv);
> +void uefi_vars_clear_all(uefi_vars_state *uv);
> +void uefi_vars_update_storage(uefi_vars_state *uv);
> +uint32_t uefi_vars_mm_vars_proto(uefi_vars_state *uv);
> +
> +/* vars-service-auth.c */
> +void uefi_vars_auth_init(uefi_vars_state *uv);
> +
> +/* vars-service-policy.c */
> +extern const VMStateDescription vmstate_uefi_var_policy;
> +efi_status uefi_vars_policy_check(uefi_vars_state *uv,
> +  uefi_variable *var,
> +  gboolean is_newvar);
> +void uefi_vars_policies_clear(uefi_vars_state *uv);
> +uefi_var_policy *uefi_vars_add_policy(uefi_vars_state *uv,
> +  variable_policy_entry *pe);
> +uint32_t uefi_vars_mm_check_policy_proto(uefi_vars_state *uv);
> +
> +#endif /* QEMU_UEFI_VAR_SERVICE_H */

I guess I'll have to see these in use to think anything of them.

(I prefer a more "functional" structuring for a series, where the thing
sort of builds & works from patch#1 onwards, it's only the actual
functionality that is introduced layer by layer. But, that's not an
objection; this patch certainly works as the collection of APIs the rest
is going to implement and call later.)

Again we'll have to keep an eye on the integer types.

with some comments inserted:

Reviewed-by: Laszlo Ersek 


Laszlo

Re: [PATCH 02/16] hw/uefi: add include/hw/uefi/var-service-edk2.h

2023-11-16 Thread Laszlo Ersek

On 11/15/23 16:12, Gerd Hoffmann wrote:
> A bunch of #defines and structs copied over from edk2,
> mostly needed to decode and encode the messages in the
> communication buffer.
> 
> Signed-off-by: Gerd Hoffmann 
> ---
>  include/hw/uefi/var-service-edk2.h | 184 +
>  1 file changed, 184 insertions(+)
>  create mode 100644 include/hw/uefi/var-service-edk2.h
> 
> diff --git a/include/hw/uefi/var-service-edk2.h 
> b/include/hw/uefi/var-service-edk2.h
> new file mode 100644
> index ..354b74d1d71c
> --- /dev/null
> +++ b/include/hw/uefi/var-service-edk2.h
> @@ -0,0 +1,184 @@
> +/*
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + *
> + * uefi-vars device - structs and defines from edk2
> + *
> + * Note: The edk2 UINTN type has been mapped to uint64_t,
> + *   so the structs are compatible with 64bit edk2 builds.

(1) What is the failure mode if the guest runs a 32-bit DXE phase and
tries to access this device?
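
To make the concern concrete, here is a small sketch of the layout difference. The struct and function names are hypothetical, not taken from the patch. A header with two UINTN-sized fields occupies 16 bytes when UINTN is 64 bits wide, but only 8 bytes in a 32-bit build, so every field after the first lands at a different offset. Whether the device would reject such a buffer or misparse it is exactly the open question; the sketch only shows one possible (rejecting) behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header with UINTN mapped to uint64_t (64-bit guest). */
typedef struct DemoHeader64 {
    uint64_t function;
    uint64_t status;
} DemoHeader64;

/* The same header as a 32-bit guest would lay it out (UINTN is 4 bytes). */
typedef struct DemoHeader32 {
    uint32_t function;
    uint32_t status;
} DemoHeader32;

/*
 * Read the function code using the 64-bit layout. Returns UINT64_MAX when
 * the buffer is too short to hold a 64-bit header at all.
 */
uint64_t demo_read_function64(const void *buf, size_t len)
{
    DemoHeader64 hdr;

    assert(buf != NULL);
    if (len < sizeof(hdr)) {
        return UINT64_MAX;
    }
    memcpy(&hdr, buf, sizeof(hdr));
    return hdr.function;
}
```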

> + */
> +#ifndef QEMU_UEFI_VAR_SERVICE_EDK2_H
> +#define QEMU_UEFI_VAR_SERVICE_EDK2_H
> +
> +#include "qemu/uuid.h"
> +
> +#define MAX_BIT   0x8000ULL
> +#define ENCODE_ERROR(StatusCode)  (MAX_BIT | (StatusCode))
> +#define EFI_SUCCESS   0

(2) Probably better to make this 0ULL, so that its type is consistent
with that of the error codes.

(3) BTW, these error codes are not edk2-specific; they are from the UEFI
spec (Appendix D, "Status Codes"). I'm mentioning it because it might
clarify the commit message.

> +#define EFI_INVALID_PARAMETER ENCODE_ERROR(2)
> +#define EFI_UNSUPPORTED   ENCODE_ERROR(3)
> +#define EFI_BAD_BUFFER_SIZE   ENCODE_ERROR(4)
> +#define EFI_BUFFER_TOO_SMALL  ENCODE_ERROR(5)

(4) any particular reason for skipping NOT_READY (6) and DEVICE_ERROR (7)?

(If this file only defines status codes that the code actually uses,
that's best!)
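
A sketch of the type consistency I have in mind for (2) follows. All DEMO_* names are made up for this example, and the sketch assumes that the error marker is the most significant bit of a 64-bit status value, which is how I read the status code appendix for 64-bit platforms.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t demo_efi_status;

/* Assumption: bit 63 of a 64-bit status value marks an error. */
#define DEMO_MAX_BIT             (1ULL << 63)
#define DEMO_ENCODE_ERROR(code)  (DEMO_MAX_BIT | (code))

/* 0ULL gives the success code the same type as the error codes. */
#define DEMO_EFI_SUCCESS         0ULL
#define DEMO_EFI_NOT_FOUND       DEMO_ENCODE_ERROR(14)

/* With a plain 0, the left-hand side would be sizeof(int) instead. */
static_assert(sizeof(DEMO_EFI_SUCCESS) == sizeof(DEMO_EFI_NOT_FOUND),
              "success and error codes should have the same type");

/* A status is an error if and only if the high bit is set. */
int demo_efi_is_error(demo_efi_status status)
{
    return (status & DEMO_MAX_BIT) != 0;
}
```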

> +#define EFI_WRITE_PROTECTED   ENCODE_ERROR(8)
> +#define EFI_OUT_OF_RESOURCES  ENCODE_ERROR(9)
> +#define EFI_NOT_FOUND ENCODE_ERROR(14)
> +#define EFI_ACCESS_DENIED ENCODE_ERROR(15)
> +#define EFI_ALREADY_STARTED   ENCODE_ERROR(20)
> +
> +#define EFI_VARIABLE_NON_VOLATILE   0x01
> +#define EFI_VARIABLE_BOOTSERVICE_ACCESS 0x02
> +#define EFI_VARIABLE_RUNTIME_ACCESS 0x04
> +#define EFI_VARIABLE_HARDWARE_ERROR_RECORD  0x08
> +#define EFI_VARIABLE_AUTHENTICATED_WRITE_ACCESS 0x10  // 
> deprecated
> +#define EFI_VARIABLE_TIME_BASED_AUTHENTICATED_WRITE_ACCESS  0x20
> +#define EFI_VARIABLE_APPEND_WRITE   0x40
> +
> +/* SecureBootEnable */

(5) L"SecureBootEnable"

(6) mentioning the variable namespace GUID might be worthwhile as well
(gEfiSecureBootEnableDisableGuid -- F0A30BC7-AF08-4556-99C4-001009C93A44)

(7) this variable is an edk2 extension indeed; might be worth mentioning

> +#define SECURE_BOOT_ENABLE 1
> +#define SECURE_BOOT_DISABLE0
> +
> +/* SecureBoot */

(8) L"SecureBoot"

(9) this one is standard, so the namespace GUID is not important to
mention -- but maybe mention that it is standard

> +#define SECURE_BOOT_MODE_ENABLE1
> +#define SECURE_BOOT_MODE_DISABLE   0
> +
> +/* CustomMode */

the usual comments:

(10) L"CustomMode"

(11) GUID: gEfiCustomModeEnableGuid -- C076EC0C-7028-4399-A072-71EE5C448B9F

(12) edk2 extension

> +#define CUSTOM_SECURE_BOOT_MODE1
> +#define STANDARD_SECURE_BOOT_MODE  0
> +
> +/* SetupMode */

(13) L"SetupMode"

(14) standard

> +#define SETUP_MODE 1
> +#define USER_MODE  0
> +
> +typedef uint64_t efi_status;
> +typedef struct mm_header mm_header;
> +
> +/* EFI_MM_COMMUNICATE_HEADER */
> +struct mm_header {
> +QemuUUID  guid;
> +uint64_t  length;
> +};

(15) QEMU_PACKED
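
As a sketch of what the packed attribute buys us here: the names below (DemoUUID, DEMO_PACKED, demo_mm_header) are stand-ins, and DEMO_PACKED uses the GCC/Clang attribute directly rather than QEMU's own macro. With natural alignment this particular struct should have no padding anyway, but packing makes the wire layout explicit, and the compile-time checks document it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for QemuUUID: 16 raw bytes. */
typedef struct DemoUUID {
    uint8_t data[16];
} DemoUUID;

/* GCC/Clang attribute; used here in place of QEMU's QEMU_PACKED macro. */
#define DEMO_PACKED __attribute__((packed))

struct demo_mm_header {
    DemoUUID guid;
    uint64_t length;
} DEMO_PACKED;

/* The guest and the device must agree on these two numbers. */
static_assert(offsetof(struct demo_mm_header, length) == 16,
              "length must follow the GUID directly");
static_assert(sizeof(struct demo_mm_header) == 24,
              "header must not contain padding");

/* Offset of the payload that follows the header in the buffer. */
size_t demo_mm_payload_offset(void)
{
    return sizeof(struct demo_mm_header);
}
```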

> +
> +/* --- EfiSmmVariableProtocol  */

(16) this is a bit too cryptic like this; what we mean here is that
mm_header.guid is gEfiSmmVariableProtocolGuid
(ED32D533-99E6-4209-9CC0-2D72CDD998A7) for the following functions

(17) do you want to define SMM_VARIABLE_COMMUNICATE_HEADER as well?
(because the function codes make sense inside that header) -- ah wait,
those are below; OK

(18) slight inconsistency: the function macros are named SMM_*, but the
types are named mm_*.

However... that inconsistency seems to be there in edk2 as well; we
don't have (for example) "MM_VARIABLE_FUNCTION_GET_VARIABLE"

> +
> +#define SMM_VARIABLE_FUNCTION_GET_VARIABLE1
> +#define SMM_VARIABLE_FUNCTION_GET_NEXT_VARIABLE_NAME  2
> +#define SMM_VARIABLE_FUNCTION_SET_VARIABLE3
> +#define SMM_VARIABLE_FUNCTION_QUERY_VARIABLE_INFO 4
> +#define SMM_VARIABLE_FUNCTION_READY_TO_BOOT   5
> +#define SMM_VARIABLE_FUNCTION_EXIT_BOOT_SERVICE   6
> +#define SMM_VARIABLE_FUNCTION_LOCK_VARIABLE   8
> +#define SMM_VARIABLE_FUNCTION_GET_PAYLOAD_SIZE   11
> +
> +typedef struct

Re: [PATCH 01/16] hw/uefi: add include/hw/uefi/var-service-api.h

2023-11-16 Thread Laszlo Ersek

On 11/15/23 16:12, Gerd Hoffmann wrote:
> This file defines the register interface of the uefi-vars device.
> It's only a handful of registers: magic value, command and status
> registers, location and size of the communication buffer.
> 
> Signed-off-by: Gerd Hoffmann 
> ---
>  include/hw/uefi/var-service-api.h | 40 +++
>  1 file changed, 40 insertions(+)
>  create mode 100644 include/hw/uefi/var-service-api.h
> 
> diff --git a/include/hw/uefi/var-service-api.h 
> b/include/hw/uefi/var-service-api.h
> new file mode 100644
> index ..37fdab32741f
> --- /dev/null
> +++ b/include/hw/uefi/var-service-api.h
> @@ -0,0 +1,40 @@
> +/*
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + *
> + * uefi-vars device - API of the virtual device for guest/host communication.
> + */
> +#ifndef QEMU_UEFI_VAR_SERVICE_API_H
> +#define QEMU_UEFI_VAR_SERVICE_API_H
> +
> +
> +/* isa: io range */
> +#define UEFI_VARS_IO_BASE   0x520
> +
> +/* sysbus: fdt node path */
> +#define UEFI_VARS_FDT_NODE   "qemu-uefi-vars"
> +#define UEFI_VARS_FDT_COMPAT "qemu,uefi-vars"
> +
> +/* registers */
> +#define UEFI_VARS_REG_MAGIC  0x00  /* 16 bit */
> +#define UEFI_VARS_REG_CMD_STS0x02  /* 16 bit */
> +#define UEFI_VARS_REG_BUFFER_SIZE0x04  /* 32 bit */
> +#define UEFI_VARS_REG_BUFFER_ADDR_LO 0x08  /* 32 bit */
> +#define UEFI_VARS_REG_BUFFER_ADDR_HI 0x0c  /* 32 bit */
> +#define UEFI_VARS_REGS_SIZE  0x10
> +
> +/* magic value */
> +#define UEFI_VARS_MAGIC_VALUE   0xef1
> +
> +/* command values */
> +#define UEFI_VARS_CMD_RESET  0x01
> +#define UEFI_VARS_CMD_MM 0x02
> +
> +/* status values */
> +#define UEFI_VARS_STS_SUCCESS0x00
> +#define UEFI_VARS_STS_BUSY   0x01
> +#define UEFI_VARS_STS_ERR_UNKNOWN0x10
> +#define UEFI_VARS_STS_ERR_NOT_SUPPORTED  0x11
> +#define UEFI_VARS_STS_ERR_BAD_BUFFER_SIZE0x12
> +
> +
> +#endif /* QEMU_UEFI_VAR_SERVICE_API_H */

Reviewed-by: Laszlo Ersek
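
For readers following along, here is a rough sketch of how the 64-bit communication buffer address could be split across the two 32-bit address registers. The register offsets are copied from the header above; the DemoRegs model and the helper functions are hypothetical, and the header does not say in which order a guest is expected to write the registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register offsets copied from the quoted header. */
#define UEFI_VARS_REG_BUFFER_SIZE     0x04
#define UEFI_VARS_REG_BUFFER_ADDR_LO  0x08
#define UEFI_VARS_REG_BUFFER_ADDR_HI  0x0c
#define UEFI_VARS_REGS_SIZE           0x10

/* Hypothetical model: the register window as four 32-bit slots. */
typedef struct DemoRegs {
    uint32_t reg[UEFI_VARS_REGS_SIZE / 4];
} DemoRegs;

/* Program buffer size and address; the address is split into halves. */
void demo_set_buffer(DemoRegs *r, uint64_t addr, uint32_t size)
{
    assert(r != NULL);
    r->reg[UEFI_VARS_REG_BUFFER_SIZE / 4] = size;
    r->reg[UEFI_VARS_REG_BUFFER_ADDR_LO / 4] = (uint32_t)addr;
    r->reg[UEFI_VARS_REG_BUFFER_ADDR_HI / 4] = (uint32_t)(addr >> 32);
}

/* Reassemble the 64-bit address, as the device side would. */
uint64_t demo_get_buffer_addr(const DemoRegs *r)
{
    assert(r != NULL);
    return ((uint64_t)r->reg[UEFI_VARS_REG_BUFFER_ADDR_HI / 4] << 32) |
           r->reg[UEFI_VARS_REG_BUFFER_ADDR_LO / 4];
}
```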

Re: [PATCH v1] target/i386/host-cpu: Use IOMMU addr width for passthrough devices on Intel platforms

2023-11-14 Thread Laszlo Ersek

On 11/14/23 07:38, Kasireddy, Vivek wrote:
> Hi Laszlo,
> 
>>
>> On 11/13/23 08:32, Vivek Kasireddy wrote:
>>> A recent OVMF update has resulted in MMIO regions being placed at
>>> the upper end of the physical address space. As a result, when a
>>> Host device is passthrough'd to the Guest via VFIO, the following
>>> mapping failures occur when VFIO tries to map the MMIO regions of
>>> the device:
>>> VFIO_MAP_DMA failed: Invalid argument
>>> vfio_dma_map(0x557b2f2736d0, 0x3800, 0x100,
>> 0x7f98ac40) = -22 (Invalid argument)
>>>
>>> The above failures are mainly seen on some Intel platforms where
>>> the physical address width is larger than the Host's IOMMU
>>> address width. In these cases, VFIO fails to map the MMIO regions
>>> because the IOVAs would be larger than the IOMMU aperture regions.
>>>
>>> Therefore, one way to solve this problem would be to ensure that
>>> cpu->phys_bits = 
>>> This can be done by parsing the IOMMU caps value from sysfs and
>>> extracting the address width and using it to override the
>>> phys_bits value as shown in this patch.
>>>
>>> Previous attempt at solving this issue in OVMF:
>>> https://edk2.groups.io/g/devel/topic/102359124
>>>
>>> Cc: Gerd Hoffmann 
>>> Cc: Philippe Mathieu-Daudé 
>>> Cc: Alex Williamson 
>>> Cc: Laszlo Ersek 
>>> Cc: Dongwon Kim 
>>> Signed-off-by: Vivek Kasireddy 
>>> ---
>>>  target/i386/host-cpu.c | 61
>> +-
>>>  1 file changed, 60 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/target/i386/host-cpu.c b/target/i386/host-cpu.c
>>> index 92ecb7254b..8326ec95bc 100644
>>> --- a/target/i386/host-cpu.c
>>> +++ b/target/i386/host-cpu.c
>>> @@ -12,6 +12,8 @@
>>>  #include "host-cpu.h"
>>>  #include "qapi/error.h"
>>>  #include "qemu/error-report.h"
>>> +#include "qemu/config-file.h"
>>> +#include "qemu/option.h"
>>>  #include "sysemu/sysemu.h"
>>>
>>>  /* Note: Only safe for use on x86(-64) hosts */
>>> @@ -51,11 +53,58 @@ static void host_cpu_enable_cpu_pm(X86CPU
>> *cpu)
>>>  env->features[FEAT_1_ECX] |= CPUID_EXT_MONITOR;
>>>  }
>>>
>>> +static int intel_iommu_check(void *opaque, QemuOpts *opts, Error
>> **errp)
>>> +{
>>> +g_autofree char *dev_path = NULL, *iommu_path = NULL, *caps = NULL;
>>> +const char *driver = qemu_opt_get(opts, "driver");
>>> +const char *device = qemu_opt_get(opts, "host");
>>> +uint32_t *iommu_phys_bits = opaque;
>>> +struct stat st;
>>> +uint64_t iommu_caps;
>>> +
>>> +/*
>>> + * Check if the user is passthroughing any devices via VFIO. We don't
>>> + * have to limit phys_bits if there are no valid passthrough devices.
>>> + */
>>> +if (g_strcmp0(driver, "vfio-pci") || !device) {
>>> +return 0;
>>> +}
>>> +
>>> +dev_path = g_strdup_printf("/sys/bus/pci/devices/%s", device);
>>> +if (stat(dev_path, &st) < 0) {
>>> +return 0;
>>> +}
>>> +
>>> +iommu_path = g_strdup_printf("%s/iommu/intel-iommu/cap",
>> dev_path);
>>> +if (stat(iommu_path, &st) < 0) {
>>> +return 0;
>>> +}
>>> +
>>> +if (g_file_get_contents(iommu_path, &caps, NULL, NULL)) {
>>> +if (sscanf(caps, "%lx", &iommu_caps) != 1) {
>>> +return 0;
>>> +}
>>> +*iommu_phys_bits = ((iommu_caps >> 16) & 0x3f) + 1;
>>> +}
>>> +
>>> +return 0;
>>> +}
>>> +
>>> +static uint32_t host_iommu_phys_bits(void)
>>> +{
>>> +uint32_t iommu_phys_bits = 0;
>>> +
>>> +qemu_opts_foreach(qemu_find_opts("device"),
>>> +  intel_iommu_check, &iommu_phys_bits, NULL);
>>> +return iommu_phys_bits;
>>> +}
>>> +
>>>  static uint32_t host_cpu_adjust_phys_bits(X86CPU *cpu)
>>>  {
>>>  uint32_t host_phys_bits = host_cpu_phys_bits();
>>> +uint32_t iommu_phys_bits = host_iommu_phys_bits();
>>>  uint32_t phys_bits = cpu->phys_bits;
>>> -static bool warned;
>>> +stat

Re: [PATCH v1] target/i386/host-cpu: Use IOMMU addr width for passthrough devices on Intel platforms

2023-11-13 Thread Laszlo Ersek

On 11/13/23 08:32, Vivek Kasireddy wrote:
> A recent OVMF update has resulted in MMIO regions being placed at
> the upper end of the physical address space. As a result, when a
> Host device is passthrough'd to the Guest via VFIO, the following
> mapping failures occur when VFIO tries to map the MMIO regions of
> the device:
> VFIO_MAP_DMA failed: Invalid argument
> vfio_dma_map(0x557b2f2736d0, 0x3800, 0x100, 0x7f98ac40) = -22 
> (Invalid argument)
> 
> The above failures are mainly seen on some Intel platforms where
> the physical address width is larger than the Host's IOMMU
> address width. In these cases, VFIO fails to map the MMIO regions
> because the IOVAs would be larger than the IOMMU aperture regions.
> 
> Therefore, one way to solve this problem would be to ensure that
> cpu->phys_bits = 
> This can be done by parsing the IOMMU caps value from sysfs and
> extracting the address width and using it to override the
> phys_bits value as shown in this patch.
> 
> Previous attempt at solving this issue in OVMF:
> https://edk2.groups.io/g/devel/topic/102359124
> 
> Cc: Gerd Hoffmann 
> Cc: Philippe Mathieu-Daudé 
> Cc: Alex Williamson 
> Cc: Laszlo Ersek 
> Cc: Dongwon Kim 
> Signed-off-by: Vivek Kasireddy 
> ---
>  target/i386/host-cpu.c | 61 +-
>  1 file changed, 60 insertions(+), 1 deletion(-)
> 
> diff --git a/target/i386/host-cpu.c b/target/i386/host-cpu.c
> index 92ecb7254b..8326ec95bc 100644
> --- a/target/i386/host-cpu.c
> +++ b/target/i386/host-cpu.c
> @@ -12,6 +12,8 @@
>  #include "host-cpu.h"
>  #include "qapi/error.h"
>  #include "qemu/error-report.h"
> +#include "qemu/config-file.h"
> +#include "qemu/option.h"
>  #include "sysemu/sysemu.h"
>  
>  /* Note: Only safe for use on x86(-64) hosts */
> @@ -51,11 +53,58 @@ static void host_cpu_enable_cpu_pm(X86CPU *cpu)
>  env->features[FEAT_1_ECX] |= CPUID_EXT_MONITOR;
>  }
>  
> +static int intel_iommu_check(void *opaque, QemuOpts *opts, Error **errp)
> +{
> +g_autofree char *dev_path = NULL, *iommu_path = NULL, *caps = NULL;
> +const char *driver = qemu_opt_get(opts, "driver");
> +const char *device = qemu_opt_get(opts, "host");
> +uint32_t *iommu_phys_bits = opaque;
> +struct stat st;
> +uint64_t iommu_caps;
> +
> +/*
> + * Check if the user is passthroughing any devices via VFIO. We don't
> + * have to limit phys_bits if there are no valid passthrough devices.
> + */
> +if (g_strcmp0(driver, "vfio-pci") || !device) {
> +return 0;
> +}
> +
> +dev_path = g_strdup_printf("/sys/bus/pci/devices/%s", device);
> +if (stat(dev_path, &st) < 0) {
> +return 0;
> +}
> +
> +iommu_path = g_strdup_printf("%s/iommu/intel-iommu/cap", dev_path);
> +if (stat(iommu_path, &st) < 0) {
> +return 0;
> +}
> +
> +if (g_file_get_contents(iommu_path, &caps, NULL, NULL)) {
> +if (sscanf(caps, "%lx", &iommu_caps) != 1) {
> +return 0;
> +}
> +*iommu_phys_bits = ((iommu_caps >> 16) & 0x3f) + 1;
> +}
> +
> +return 0;
> +}
> +
> +static uint32_t host_iommu_phys_bits(void)
> +{
> +uint32_t iommu_phys_bits = 0;
> +
> +qemu_opts_foreach(qemu_find_opts("device"),
> +  intel_iommu_check, &iommu_phys_bits, NULL);
> +return iommu_phys_bits;
> +}
> +
>  static uint32_t host_cpu_adjust_phys_bits(X86CPU *cpu)
>  {
>  uint32_t host_phys_bits = host_cpu_phys_bits();
> +uint32_t iommu_phys_bits = host_iommu_phys_bits();
>  uint32_t phys_bits = cpu->phys_bits;
> -static bool warned;
> +static bool warned, warned2;
>  
>  /*
>   * Print a warning if the user set it to a value that's not the
> @@ -78,6 +127,16 @@ static uint32_t host_cpu_adjust_phys_bits(X86CPU *cpu)
>  }
>  }
>  
> +if (iommu_phys_bits && phys_bits > iommu_phys_bits) {
> +phys_bits = iommu_phys_bits;
> +if (!warned2) {
> +warn_report("Using physical bits (%u)"
> +" to prevent VFIO mapping failures",
> +iommu_phys_bits);
> +warned2 = true;
> +}
> +}
> +
>  return phys_bits;
>  }
>  

I only have very superficial comments here (sorry about that -- I find
it too bad that this QEMU source file seems to have no designated
reviewer or maintainer in QEMU, so I don't want to ignore it).

- Terminology: I think we like to call these devices "assigned", and not
"passed through". Also, in noun form, "device assignment" and not
"device passthrough". Sorry about being pedantic.

- As I (may have) mentioned in my OVMF comments, I'm unsure if narrowing
the VCPU "phys address bits" property due to host IOMMU limitations is a
good design. To me it feels like hacking one piece of information into
another (unrelated) piece of information. It vaguely makes me think
we're going to regret this later. But I don't have any specific, current
counter-argument, admittedly.
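
For reference, the width extraction in the patch can be sketched in isolation as below. The function name "demo_iommu_addr_width" is made up for this example. The expression is the one from the patch: a 6-bit field at bits 21:16 of the capability value, which (as far as I understand) stores the address width minus one. The sketch parses with strtoull() so that it does not depend on "%lx" matching uint64_t on every host.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a hexadecimal capability string (as read from sysfs) and return
 * the address width in bits, or 0 if the string is not a hex number.
 */
uint32_t demo_iommu_addr_width(const char *cap_hex)
{
    char *end;
    unsigned long long cap;

    assert(cap_hex != NULL);
    errno = 0;
    cap = strtoull(cap_hex, &end, 16);
    if (errno != 0 || end == cap_hex) {
        return 0;
    }
    /* Same expression as in the patch: field at bits 21:16, plus one. */
    return (uint32_t)((cap >> 16) & 0x3f) + 1;
}
```

With the made-up input "260000" the field holds 0x26, so the function reports 39 bits; a return value of 0 means "no usable width", which matches how the patch treats a zero iommu_phys_bits.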

Laszlo

Re: [PATCH v2 0/5] virtio-gpu: add blob migration support

2023-11-07 Thread Laszlo Ersek

On 11/6/23 11:44, Marc-André Lureau wrote:
> Hi
> 
> On Tue, Sep 19, 2023 at 7:09 PM Peter Xu  wrote:
>>
>> On Tue, Sep 19, 2023 at 04:51:21PM +0400, Marc-André Lureau wrote:
>>> Hi
>>>
>>> On Thu, Sep 7, 2023 at 5:15 PM  wrote:

 From: Marc-André Lureau 

 Hi,

 This is a follow-up of the previous patch "[PATCH] virtio-gpu: block 
 migration
 of VMs with blob=true". Now that migration support is implemented, we can 
 decide
 to drop the migration blocker patch, or apply and revert it, so that
 backporting of a quick fix is made easier.

 Fixes:
 https://bugzilla.redhat.com/show_bug.cgi?id=2236353

 Marc-André Lureau (5):
   virtio-gpu: block migration of VMs with blob=true
   virtio-gpu: factor out restore mapping
   virtio-gpu: move scanout restoration to post_load
   virtio-gpu: add virtio-gpu/blob vmstate subsection
   Revert "virtio-gpu: block migration of VMs with blob=true"

  hw/display/virtio-gpu.c | 174 +---
  1 file changed, 146 insertions(+), 28 deletions(-)
>>
>> For migration:
>>
>> Acked-by: Peter Xu 
>>
> 
> Anyone else to check this series? Laszlo perhaps?

Sorry, I don't have the patches, plus I'm already loaded with other
reviews elsewhere :/

> Or should I just send it as part of the next gpu-stuff PR?

You appear to have an ACK from Peter; I'd say run with it. Gerd is on CC
so I'm comfortable saying this.

Laszlo

Re: [PATCH 3/3] hw/arm/virt: allow creation of a second NonSecure UART

2023-10-24 Thread Laszlo Ersek

On 10/23/23 18:15, Peter Maydell wrote:
> For some use-cases, it is helpful to have more than one UART
> available to the guest. If the second UART slot is not already
> used for a TrustZone Secure-World-only UART, create it as a
> NonSecure UART only when the user provides a serial backend
> (e.g. via a second -serial command line option).
> 
> This avoids problems where existing guest software only expects
> a single UART, and gets confused by the second UART in the DTB.
> The major example of this is older EDK2 firmware, which will
> send the GRUB bootloader output to UART1 and the guest
> serial output to UART0. Users who want to use both UARTs
> with a guest setup including EDK2 are advised to update
> to a newer EDK2.
> 
> TODO: give specifics of which EDK2 version has this fix,
> once the patches which fix EDK2 are upstream.

The patches should hopefully land in edk2-stable202311 (i.e., in the
November release).

The new ArmVirtQemu behavior is as follows:

- just one UART: same as before

- two UARTs: the UEFI console is on the "chosen" UART, and the edk2
DEBUG log is on the "first non-chosen" UART (i.e., on the "other" UART,
in practice).
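
The selection rule above can be sketched as follows. The function "demo_debug_uart" is hypothetical and is not the actual ArmVirtQemu code; in particular, it assumes that "same as before" for a single UART means the DEBUG log shares that UART with the console.

```c
#include <assert.h>

/*
 * Return the index of the UART that carries the DEBUG log, given the
 * number of UARTs and the index of the "chosen" (console) UART.
 */
int demo_debug_uart(int uart_count, int chosen)
{
    int i;

    assert(uart_count >= 1 && chosen >= 0 && chosen < uart_count);

    /* Assumption: with one UART, console and DEBUG log share it. */
    if (uart_count == 1) {
        return chosen;
    }

    /* Otherwise, take the first UART that is not the chosen one. */
    for (i = 0; i < uart_count; i++) {
        if (i != chosen) {
            return i;
        }
    }
    return chosen; /* not reached when uart_count > 1 */
}
```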

series
Tested-by: Laszlo Ersek 

Thanks
Laszlo


> 
> Inspired-by: Axel Heider 
> Signed-off-by: Peter Maydell 
> ---
> This patch was originally based on the one from Axel Heider
> that aimed to do the same thing:
> https://lore.kernel.org/qemu-devel/166990501232.22022.1658256124453401108...@git.sr.ht/
> but by the time I had added the ACPI support and dealt with
> the EDK2 compatibility awkwardness, I found I had pretty
> much rewritten it. So this combination of author and tags
> seemed to me the most appropriate, but I'm happy to adjust
> if people (esp. Axel!) would prefer otherwise.
> 
> It is in theory possible to slightly work around the
> incorrect behaviour of old EDK2 binaries by listing the
> two UARTs in the opposite order in the DTB. However since
> old EDK2 ends up using the two UARTs in different orders
> depending on which phase of boot it is in (and in particular
> with EDK2 debug builds debug messages go to a mix of both
> UARTs) this doesn't seem worthwhile. I think most users
> who are interested in the second UART are likely to be
> using a bare-metal or direct Linux boot anyway.
> ---
>  docs/system/arm/virt.rst |  6 +-
>  include/hw/arm/virt.h|  1 +
>  hw/arm/virt-acpi-build.c | 12 
>  hw/arm/virt.c| 38 +++---
>  4 files changed, 49 insertions(+), 8 deletions(-)
> 
> diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst
> index e1697ac8f48..028d2416d5b 100644
> --- a/docs/system/arm/virt.rst
> +++ b/docs/system/arm/virt.rst
> @@ -26,7 +26,7 @@ The virt board supports:
>  
>  - PCI/PCIe devices
>  - Flash memory
> -- One PL011 UART
> +- Either one or two PL011 UARTs for the NonSecure World
>  - An RTC
>  - The fw_cfg device that allows a guest to obtain data from QEMU
>  - A PL061 GPIO controller
> @@ -48,6 +48,10 @@ The virt board supports:
>- A secure flash memory
>- 16MB of secure RAM
>  
> +The second NonSecure UART only exists if a backend is configured
> +explicitly (e.g. with a second -serial command line option) and
> +TrustZone emulation is not enabled.
> +
>  Supported guest CPU types:
>  
>  - ``cortex-a7`` (32-bit)
> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
> index 0de58328b2f..da15eb342bd 100644
> --- a/include/hw/arm/virt.h
> +++ b/include/hw/arm/virt.h
> @@ -150,6 +150,7 @@ struct VirtMachineState {
>  bool ras;
>  bool mte;
>  bool dtb_randomness;
> +bool second_ns_uart_present;
>  OnOffAuto acpi;
>  VirtGICType gic_version;
>  VirtIOMMUType iommu;
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index 54f26640982..b812f33c929 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -77,11 +77,11 @@ static void acpi_dsdt_add_cpus(Aml *scope, 
> VirtMachineState *vms)
>  }
>  
>  static void acpi_dsdt_add_uart(Aml *scope, const MemMapEntry *uart_memmap,
> -   uint32_t uart_irq)
> +   uint32_t uart_irq, int uartidx)
>  {
> -Aml *dev = aml_device("COM0");
> +Aml *dev = aml_device("COM%d", uartidx);
>  aml_append(dev, aml_name_decl("_HID", aml_string("ARMH0011")));
> -aml_append(dev, aml_name_decl("_UID", aml_int(0)));
> +aml_append(dev, aml_name_decl("_UID", aml_int(uartidx)));
>  
>  Aml *crs = aml_resource_template();
>  aml_append(crs, aml_memory32_fixed(uart_memmap->

Re: [PATCH v3 0/7] vhost-user: call VHOST_USER_SET_VRING_ENABLE synchronously

2023-10-18 Thread Laszlo Ersek

Hi Michael,

still waiting for you to pick this up, please.

In
<http://mid.mail-archive.com/20231004122927-mutt-send-email-mst@kernel.org>,
you wrote:

> OK. I'll need to do another PR soonish since a bunch of patchsets
> which I wanted in this PR had issues and I had to drop them.
> v3 will be there.

(Alt. link:
<https://lists.gnu.org/archive/html/qemu-devel/2023-10/msg01164.html>.)

That was on 04 Oct 2023 -- exactly two weeks ago :(

Stefan, can you perhaps apply this v3 series directly from the list?

Thanks,
Laszlo

On 10/2/23 22:32, Laszlo Ersek wrote:
> v2:
> 
> - http://mid.mail-archive.com/20230830134055.106812-1-lersek@redhat.com
> - 
> https://patchwork.ozlabs.org/project/qemu-devel/cover/20230830134055.106812-1-ler...@redhat.com/
> 
> v3 picks up tags from Phil, Eugenio and Albert, and updates the commit
> message on patch#7 according to Eugenio's comments.
> 
> Retested.
> 
> Laszlo Ersek (7):
>   vhost-user: strip superfluous whitespace
>   vhost-user: tighten "reply_supported" scope in "set_vring_addr"
>   vhost-user: factor out "vhost_user_write_sync"
>   vhost-user: flatten "enforce_reply" into "vhost_user_write_sync"
>   vhost-user: hoist "write_sync", "get_features", "get_u64"
>   vhost-user: allow "vhost_set_vring" to wait for a reply
>   vhost-user: call VHOST_USER_SET_VRING_ENABLE synchronously
> 
>  hw/virtio/vhost-user.c | 216 ++--
>  1 file changed, 108 insertions(+), 108 deletions(-)
> 
> 
> base-commit: 36e9aab3c569d4c9ad780473596e18479838d1aa

Re: [PATCH v5 3/3] hw/vfio: add ramfb migration support

2023-10-09 Thread Laszlo Ersek

On 10/9/23 08:32, marcandre.lur...@redhat.com wrote:
> From: Marc-André Lureau 
> 
> Add a "VFIODisplay" subsection whenever "x-ramfb-migrate" is turned on.
> 
> Turn it off by default on machines <= 8.1 for compatibility reasons.
> 
> Signed-off-by: Marc-André Lureau 
> Reviewed-by: Laszlo Ersek 
> ---
>  hw/vfio/pci.h |  3 +++
>  hw/core/machine.c |  1 +
>  hw/vfio/display.c | 21 +
>  hw/vfio/pci.c | 44 
>  stubs/ramfb.c |  2 ++
>  5 files changed, 71 insertions(+)
> 
> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
> index 2d836093a8..fd06695542 100644
> --- a/hw/vfio/pci.h
> +++ b/hw/vfio/pci.h
> @@ -173,6 +173,7 @@ struct VFIOPCIDevice {
>  bool no_kvm_ioeventfd;
>  bool no_vfio_ioeventfd;
>  bool enable_ramfb;
> +OnOffAuto ramfb_migrate;
>  bool defer_kvm_irq_routing;
>  bool clear_parent_atomics_on_exit;
>  VFIODisplay *dpy;
> @@ -226,4 +227,6 @@ void vfio_display_reset(VFIOPCIDevice *vdev);
>  int vfio_display_probe(VFIOPCIDevice *vdev, Error **errp);
>  void vfio_display_finalize(VFIOPCIDevice *vdev);
>  
> +extern const VMStateDescription vfio_display_vmstate;
> +
>  #endif /* HW_VFIO_VFIO_PCI_H */
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index 6305f2d7a4..05aef2cf9f 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -35,6 +35,7 @@
>  GlobalProperty hw_compat_8_1[] = {
>  { TYPE_PCI_BRIDGE, "x-pci-express-writeable-slt-bug", "true" },
>  { "ramfb", "x-migrate", "off" },
> +{ "vfio-pci-nohotplug", "x-ramfb-migrate", "off" }
>  };
>  const size_t hw_compat_8_1_len = G_N_ELEMENTS(hw_compat_8_1);
>  
> diff --git a/hw/vfio/display.c b/hw/vfio/display.c
> index bec864f482..2739ba56ec 100644
> --- a/hw/vfio/display.c
> +++ b/hw/vfio/display.c
> @@ -542,3 +542,24 @@ void vfio_display_finalize(VFIOPCIDevice *vdev)
>  vfio_display_edid_exit(vdev->dpy);
>  g_free(vdev->dpy);
>  }
> +
> +static bool migrate_needed(void *opaque)
> +{
> +VFIODisplay *dpy = opaque;
> +bool ramfb_exists = dpy->ramfb != NULL;
> +
> +/* see vfio_display_migration_needed() */
> +assert(ramfb_exists);
> +return ramfb_exists;
> +}
> +
> +const VMStateDescription vfio_display_vmstate = {
> +.name = "VFIODisplay",
> +.version_id = 1,
> +.minimum_version_id = 1,
> +.needed = migrate_needed,
> +.fields = (VMStateField[]) {
> +VMSTATE_STRUCT_POINTER(ramfb, VFIODisplay, ramfb_vmstate, 
> RAMFBState),
> +VMSTATE_END_OF_LIST(),
> +}
> +};
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 3b2ca3c24c..e44ed21180 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -2608,6 +2608,32 @@ static bool vfio_msix_present(void *opaque, int 
> version_id)
>  return msix_present(pdev);
>  }
>  
> +static bool vfio_display_migration_needed(void *opaque)
> +{
> +VFIOPCIDevice *vdev = opaque;
> +
> +/*
> + * We need to migrate the VFIODisplay object if ramfb *migration* was
> + * explicitly requested (in which case we enforced both ramfb=on and
> + * display=on), or ramfb migration was left at the default "auto"
> + * setting, and *ramfb* was explicitly requested (in which case we
> + * enforced display=on).
> + */
> +return vdev->ramfb_migrate == ON_OFF_AUTO_ON ||
> +(vdev->ramfb_migrate == ON_OFF_AUTO_AUTO && vdev->enable_ramfb);
> +}
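
The condition in vfio_display_migration_needed() can be restated as a standalone sketch, which makes the four interesting input combinations easy to enumerate. DemoOnOffAuto is a hypothetical stand-in for QEMU's OnOffAuto type, and the function takes the two fields as plain parameters instead of a VFIOPCIDevice.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for QEMU's OnOffAuto enum. */
typedef enum DemoOnOffAuto {
    DEMO_AUTO,
    DEMO_ON,
    DEMO_OFF,
} DemoOnOffAuto;

/* In this sketch, a zero-initialized property means "auto". */
static_assert(DEMO_AUTO == 0, "auto is the zero-initialized default");

/*
 * Migrate the display state if migration was requested explicitly, or
 * if it was left at "auto" and ramfb itself was requested.
 */
bool demo_display_migration_needed(DemoOnOffAuto ramfb_migrate,
                                   bool enable_ramfb)
{
    return ramfb_migrate == DEMO_ON ||
           (ramfb_migrate == DEMO_AUTO && enable_ramfb);
}
```

Note that "off" wins regardless of the ramfb setting, which is what the compat property for machine types up to 8.1 relies on.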
> +
> +const VMStateDescription vmstate_vfio_display = {
> +.name = "VFIOPCIDevice/VFIODisplay",
> +.version_id = 1,
> +.minimum_version_id = 1,
> +.needed = vfio_display_migration_needed,
> +.fields = (VMStateField[]){
> +VMSTATE_STRUCT_POINTER(dpy, VFIOPCIDevice, vfio_display_vmstate, 
> VFIODisplay),
> +VMSTATE_END_OF_LIST()
> +}
> +};
> +
>  const VMStateDescription vmstate_vfio_pci_config = {
>  .name = "VFIOPCIDevice",
>  .version_id = 1,
> @@ -2616,6 +2642,10 @@ const VMStateDescription vmstate_vfio_pci_config = {
>  VMSTATE_PCI_DEVICE(pdev, VFIOPCIDevice),
>  VMSTATE_MSIX_TEST(pdev, VFIOPCIDevice, vfio_msix_present),
>  VMSTATE_END_OF_LIST()
> +},
> +.subsections = (const VMStateDescription*[]) {
> +&vmstate_vfio_display,
> +NULL
>  }
>  };
>  
> @@ -3271,6 +3301,19 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>  }
>  }
>  
> +if (vdev->ramfb_migrate == ON_OFF_AUTO_ON && !vd

Re: [PATCH v4 3/3] hw/vfio: add ramfb migration support

2023-10-05 Thread Laszlo Ersek

On 10/5/23 18:34, Cédric Le Goater wrote:
> On 10/5/23 13:30, marcandre.lur...@redhat.com wrote:
>> From: Marc-André Lureau 
>>
>> Add a "VFIODisplay" subsection whenever "x-ramfb-migrate" is turned on.
>>
>> Turn it off by default on machines <= 8.1 for compatibility reasons.
>>
>> Signed-off-by: Marc-André Lureau 
>> ---
>>   hw/vfio/pci.h |  3 +++
>>   hw/core/machine.c |  1 +
>>   hw/vfio/display.c | 20 
>>   hw/vfio/pci.c | 44 
>>   stubs/ramfb.c |  2 ++
>>   5 files changed, 70 insertions(+)
>>
>> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
>> index 2d836093a8..fd06695542 100644
>> --- a/hw/vfio/pci.h
>> +++ b/hw/vfio/pci.h
>> @@ -173,6 +173,7 @@ struct VFIOPCIDevice {
>>   bool no_kvm_ioeventfd;
>>   bool no_vfio_ioeventfd;
>>   bool enable_ramfb;
>> +    OnOffAuto ramfb_migrate;
>>   bool defer_kvm_irq_routing;
>>   bool clear_parent_atomics_on_exit;
>>   VFIODisplay *dpy;
>> @@ -226,4 +227,6 @@ void vfio_display_reset(VFIOPCIDevice *vdev);
>>   int vfio_display_probe(VFIOPCIDevice *vdev, Error **errp);
>>   void vfio_display_finalize(VFIOPCIDevice *vdev);
>>   +extern const VMStateDescription vfio_display_vmstate;
>> +
>>   #endif /* HW_VFIO_VFIO_PCI_H */
>> diff --git a/hw/core/machine.c b/hw/core/machine.c
>> index e4361e3d48..f2c59a293c 100644
>> --- a/hw/core/machine.c
>> +++ b/hw/core/machine.c
>> @@ -33,6 +33,7 @@
>>     GlobalProperty hw_compat_8_1[] = {
>>   { "ramfb", "x-migrate", "off" },
>> +    { "vfio-pci-nohotplug", "x-ramfb-migrate", "off" }
>>   };
>>   const size_t hw_compat_8_1_len = G_N_ELEMENTS(hw_compat_8_1);
>>   diff --git a/hw/vfio/display.c b/hw/vfio/display.c
>> index bec864f482..0bdb807642 100644
>> --- a/hw/vfio/display.c
>> +++ b/hw/vfio/display.c
>> @@ -542,3 +542,23 @@ void vfio_display_finalize(VFIOPCIDevice *vdev)
>>   vfio_display_edid_exit(vdev->dpy);
>>   g_free(vdev->dpy);
>>   }
>> +
>> +static bool migrate_needed(void *opaque)
>> +{
>> +    VFIODisplay *dpy = opaque;
>> +    bool ramfb_exists = dpy->ramfb != NULL;
>> +
>> +    /* see vfio_display_migration_needed() */
>> +    assert(ramfb_exists);
>> +    return ramfb_exists;
>> +}
>> +
>> +const VMStateDescription vfio_display_vmstate = {
>> +    .name = "VFIODisplay",
>> +    .version_id = 1,
>> +    .minimum_version_id = 1,
>> +    .needed = migrate_needed,
>> +    .fields = (VMStateField[]) {
>> +    VMSTATE_STRUCT_POINTER(ramfb, VFIODisplay, ramfb_vmstate,
>> RAMFBState),
>> +    }
>> +};
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 3b2ca3c24c..d2ede2f1a2 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -2608,6 +2608,32 @@ static bool vfio_msix_present(void *opaque, int
>> version_id)
>>   return msix_present(pdev);
>>   }
>>   +static bool vfio_display_migration_needed(void *opaque)
>> +{
>> +    VFIOPCIDevice *vdev = opaque;
>> +
>> +    /*
>> + * We need to migrate the VFIODisplay object if ramfb *migration*
>> was
>> + * explicitly requested (in which case we enforced both ramfb=on and
>> + * display=on), or ramfb migration was left at the default "auto"
>> + * setting, and *ramfb* was explicitly requested (in which case we
>> + * enforced display=on).
>> + */
>> +    return vdev->ramfb_migrate == ON_OFF_AUTO_ON ||
>> +    (vdev->ramfb_migrate == ON_OFF_AUTO_AUTO &&
>> vdev->enable_ramfb);
>> +}
>> +
>> +const VMStateDescription vmstate_vfio_display = {
>> +    .name = "VFIOPCIDevice/VFIODisplay",
>> +    .version_id = 1,
>> +    .minimum_version_id = 1,
>> +    .needed = vfio_display_migration_needed,
>> +    .fields = (VMStateField[]){
>> +    VMSTATE_STRUCT_POINTER(dpy, VFIOPCIDevice,
>> vfio_display_vmstate, VFIODisplay),
>> +    VMSTATE_END_OF_LIST()
>> +    }
>> +};
>> +
>>   const VMStateDescription vmstate_vfio_pci_config = {
>>   .name = "VFIOPCIDevice",
>>   .version_id = 1,
>> @@ -2616,6 +2642,10 @@ const VMStateDescription
>> vmstate_vfio_pci_config = {
>>   VMSTATE_PCI_DEVICE(pdev, VFIOPCIDevice),
>>   VMSTATE_MSIX_TEST(pdev, VFIOPCIDevice, vfio_msix_present),
>>   VMSTATE_END_OF_LIST()
>> +    },
>> +    .subsections = (const VMStateDescription*[]) {
>> +    &vmstate_vfio_display,
>> +    NULL
>>   }
>>   };
>>   @@ -3271,6 +3301,19 @@ static void vfio_realize(PCIDevice *pdev,
>> Error **errp)
>>   }
>>   }
>>   +    if (vdev->ramfb_migrate == ON_OFF_AUTO_ON &&
>> !vdev->enable_ramfb) {
>> +    error_setg(errp, "x-ramfb-migrate requires ramfb=on");
> 
> This is a case where QEMU would be run from the command line. Could we
> output a warning and force "ramfb_migrate" to OFF in that case, since
> the machine would still run?

Sounds like a valid idea to me:

- consistency between x-ramfb-migrate and ramfb would be maintained

- x-* properties are not meant as user interface, so QEMU doesn't
guarantee anything about them, AIUI
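For illustration only, the warn-and-fall-back behavior suggested above could
look roughly like the sketch below. It is not the code from the patch: the
enum and the function name are hypothetical stand-ins (QEMU's real type is
OnOffAuto, and real code would report through QEMU's own warning helpers
rather than fprintf).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for QEMU's OnOffAuto; not the real type. */
typedef enum { SKETCH_AUTO, SKETCH_ON, SKETCH_OFF } SketchOnOffAuto;

/*
 * Reconcile the requested migration setting with the ramfb setting.
 * Instead of failing when migration is "on" but ramfb is off, warn and
 * fall back to "off". Returns true if a warning was emitted.
 */
static bool sketch_reconcile_ramfb_migrate(SketchOnOffAuto *ramfb_migrate,
                                           bool enable_ramfb)
{
    assert(ramfb_migrate != NULL);

    if (*ramfb_migrate == SKETCH_ON && !enable_ramfb) {
        fprintf(stderr, "warning: x-ramfb-migrate=on needs ramfb=on; "
                        "forcing x-ramfb-migrate=off\n");
        *ramfb_migrate = SKETCH_OFF;
        return true;
    }
    return false;
}
```

With this shape, the "auto" and "off" settings pass through untouched, and
only the inconsistent "on without ramfb" combination is downgraded.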

Laszlo

> 
> Thanks,
> 
> C.
> 
> 
>> +

Re: [PATCH v4 3/3] hw/vfio: add ramfb migration support

2023-10-05 Thread Laszlo Ersek

}
> +if (vbasedev->enable_migration == ON_OFF_AUTO_OFF) {
> +if (vdev->ramfb_migrate == ON_OFF_AUTO_AUTO) {
> +vdev->ramfb_migrate = ON_OFF_AUTO_OFF;
> +} else if (vdev->ramfb_migrate == ON_OFF_AUTO_ON) {
> +error_setg(errp, "x-ramfb-migrate requires enable-migration");
> +goto out_deregister;
> +}
> +}
> +
>  if (!pdev->failover_pair_id) {
>  if (!vfio_migration_realize(vbasedev, errp)) {
>  goto out_deregister;
> @@ -3484,6 +3527,7 @@ static const TypeInfo vfio_pci_dev_info = {
>  
>  static Property vfio_pci_dev_nohotplug_properties[] = {
>  DEFINE_PROP_BOOL("ramfb", VFIOPCIDevice, enable_ramfb, false),
> +DEFINE_PROP_ON_OFF_AUTO("x-ramfb-migrate", VFIOPCIDevice, ramfb_migrate, 
> ON_OFF_AUTO_AUTO),
>  DEFINE_PROP_END_OF_LIST(),
>  };
>  
> diff --git a/stubs/ramfb.c b/stubs/ramfb.c
> index 48143f3354..cf64733b10 100644
> --- a/stubs/ramfb.c
> +++ b/stubs/ramfb.c
> @@ -2,6 +2,8 @@
>  #include "qapi/error.h"
>  #include "hw/display/ramfb.h"
>  
> +const VMStateDescription ramfb_vmstate = {};
> +
>  void ramfb_display_update(QemuConsole *con, RAMFBState *s)
>  {
>  }

Reviewed-by: Laszlo Ersek

Re: [PATCH v4 1/3] ramfb: add migration support

2023-10-05 Thread Laszlo Ersek

On 10/5/23 13:30, marcandre.lur...@redhat.com wrote:
> From: Marc-André Lureau 
> 
> Implementing RAMFB migration is quite straightforward. One caveat is to
> treat the whole RAMFBCfg as a blob, since that's what is exposed to the
> guest directly. This avoids having to fiddle with endianness issues if we
> were to migrate fields individually as integers.
> 
> The devices using RAMFB will have to include ramfb_vmstate in their
> migration description.
> 
> Signed-off-by: Marc-André Lureau 
> ---
>  include/hw/display/ramfb.h |  4 
>  hw/display/ramfb.c | 19 +++
>  2 files changed, 23 insertions(+)
> 
> diff --git a/include/hw/display/ramfb.h b/include/hw/display/ramfb.h
> index b33a2c467b..a7e0019144 100644
> --- a/include/hw/display/ramfb.h
> +++ b/include/hw/display/ramfb.h
> @@ -1,11 +1,15 @@
>  #ifndef RAMFB_H
>  #define RAMFB_H
>  
> +#include "migration/vmstate.h"
> +
>  /* ramfb.c */
>  typedef struct RAMFBState RAMFBState;
>  void ramfb_display_update(QemuConsole *con, RAMFBState *s);
>  RAMFBState *ramfb_setup(Error **errp);
>  
> +extern const VMStateDescription ramfb_vmstate;
> +
>  /* ramfb-standalone.c */
>  #define TYPE_RAMFB_DEVICE "ramfb"
>  
> diff --git a/hw/display/ramfb.c b/hw/display/ramfb.c
> index c2b002d534..477ef7272a 100644
> --- a/hw/display/ramfb.c
> +++ b/hw/display/ramfb.c
> @@ -28,6 +28,8 @@ struct QEMU_PACKED RAMFBCfg {
>  uint32_t stride;
>  };
>  
> +typedef struct RAMFBCfg RAMFBCfg;
> +
>  struct RAMFBState {
>  DisplaySurface *ds;
>  uint32_t width, height;
> @@ -116,6 +118,23 @@ void ramfb_display_update(QemuConsole *con, RAMFBState 
> *s)
>  dpy_gfx_update_full(con);
>  }
>  
> +static int ramfb_post_load(void *opaque, int version_id)
> +{
> +ramfb_fw_cfg_write(opaque, 0, 0);
> +return 0;
> +}
> +
> +const VMStateDescription ramfb_vmstate = {
> +.name = "ramfb",
> +.version_id = 1,
> +.minimum_version_id = 1,
> +.post_load = ramfb_post_load,
> +.fields = (VMStateField[]) {
> +VMSTATE_BUFFER_UNSAFE(cfg, RAMFBState, 0, sizeof(RAMFBCfg)),
> +VMSTATE_END_OF_LIST()
> +}
> +};
> +
>  RAMFBState *ramfb_setup(Error **errp)
>  {
>  FWCfgState *fw_cfg = fw_cfg_find();

Identical to v3, which I reviewed, so:

Reviewed-by: Laszlo Ersek

Re: [PATCH v4 2/3] ramfb-standalone: add migration support

2023-10-05 Thread Laszlo Ersek

On 10/5/23 13:30, marcandre.lur...@redhat.com wrote:
> From: Marc-André Lureau 
> 
> Add a "ramfb-dev" section whenever "x-migrate" is turned on. Turn it off
> by default on machines <= 8.1 for compatibility reasons.
> 
> Signed-off-by: Marc-André Lureau 
> ---
>  hw/core/machine.c |  4 +++-
>  hw/display/ramfb-standalone.c | 27 +++
>  2 files changed, 30 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index 281ef0dccd..e4361e3d48 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -31,7 +31,9 @@
>  #include "hw/virtio/virtio-net.h"
>  #include "audio/audio.h"
>  
> -GlobalProperty hw_compat_8_1[] = {};
> +GlobalProperty hw_compat_8_1[] = {
> +{ "ramfb", "x-migrate", "off" },
> +};
>  const size_t hw_compat_8_1_len = G_N_ELEMENTS(hw_compat_8_1);
>  
>  GlobalProperty hw_compat_8_0[] = {
> diff --git a/hw/display/ramfb-standalone.c b/hw/display/ramfb-standalone.c
> index 8c0094397f..a96e7ebcd9 100644
> --- a/hw/display/ramfb-standalone.c
> +++ b/hw/display/ramfb-standalone.c
> @@ -1,4 +1,5 @@
>  #include "qemu/osdep.h"
> +#include "migration/vmstate.h"
>  #include "qapi/error.h"
>  #include "qemu/module.h"
>  #include "hw/loader.h"
> @@ -15,6 +16,7 @@ struct RAMFBStandaloneState {
>  SysBusDevice parent_obj;
>  QemuConsole *con;
>  RAMFBState *state;
> +bool migrate;
>  };
>  
>  static void display_update_wrapper(void *dev)
> @@ -40,14 +42,39 @@ static void ramfb_realizefn(DeviceState *dev, Error 
> **errp)
>  ramfb->state = ramfb_setup(errp);
>  }
>  
> +static bool migrate_needed(void *opaque)
> +{
> +RAMFBStandaloneState *ramfb = RAMFB(opaque);
> +
> +return ramfb->migrate;
> +}
> +
> +static const VMStateDescription ramfb_dev_vmstate = {
> +.name = "ramfb-dev",
> +.version_id = 1,
> +.minimum_version_id = 1,
> +.needed = migrate_needed,
> +.fields = (VMStateField[]) {
> +VMSTATE_STRUCT_POINTER(state, RAMFBStandaloneState, ramfb_vmstate, 
> RAMFBState),
> +VMSTATE_END_OF_LIST()
> +}
> +};
> +
> +static Property ramfb_properties[] = {
> +DEFINE_PROP_BOOL("x-migrate", RAMFBStandaloneState, migrate,  true),
> +DEFINE_PROP_END_OF_LIST(),
> +};
> +
>  static void ramfb_class_initfn(ObjectClass *klass, void *data)
>  {
>  DeviceClass *dc = DEVICE_CLASS(klass);
>  
>  set_bit(DEVICE_CATEGORY_DISPLAY, dc->categories);
> +dc->vmsd = &ramfb_dev_vmstate;
>  dc->realize = ramfb_realizefn;
>  dc->desc = "ram framebuffer standalone device";
>  dc->user_creatable = true;
> +device_class_set_props(dc, ramfb_properties);
>  }
>  
>  static const TypeInfo ramfb_info = {

Identical to v3, which I reviewed, so:

Reviewed-by: Laszlo Ersek

Re: [PATCH v4 0/3] WIP: ramfb: migration support

2023-10-05 Thread Laszlo Ersek

On 10/5/23 16:16, Laszlo Ersek wrote:
> On 10/5/23 14:01, Cédric Le Goater wrote:
>> On 10/5/23 13:30, marcandre.lur...@redhat.com wrote:
>>> From: Marc-André Lureau 
>>>
>>> Hi,
>>>
>>> Implement RAMFB migration, and add properties to enable it only on >= 8.2
>>> machines, + a few related cleanups.
>>>
>>> Cedric, did you get the chance to test the VFIO display/ramfb code?
>>
>> Nope. I was busy with VFIO stuff. I haven't even read Laszlo's
>> email yet. I will try this or next week.
>>
>> That said, could we avoid adding another migration property in
>> VFIOPCIDevice and use the available "enable-migration" ?
> 
> I'm not entirely sure, but I suspect we can't / shouldn't do that.
> "x-ramfb-migrate" is effectively a machine type compat prop, so if it
> doesn't *precisely* line up with enable-migration (i.e., if they aren't
> equivalent), then we shouldn't merge them. AFAICT, a 8.1 machine type
> may have "enable-migration" set, but it should still have
> "x-ramfb-migrate" clear.

or more precisely, not clear, but "auto".
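To make the three states concrete, here is a minimal sketch of how an
on/off/auto property could resolve into a "migrate or not" decision. It
follows the condition quoted in vfio_display_migration_needed() from patch
3/3; the type and function names are hypothetical and only mirror the idea.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of an on/off/auto property; not QEMU's OnOffAuto. */
typedef enum { TRI_AUTO, TRI_ON, TRI_OFF } TriState;

/*
 * Decide whether the display subsection is migrated. "on" always
 * migrates, "off" never does, and "auto" follows the ramfb setting.
 */
static bool tri_migration_needed(TriState ramfb_migrate, bool enable_ramfb)
{
    assert(ramfb_migrate == TRI_AUTO || ramfb_migrate == TRI_ON ||
           ramfb_migrate == TRI_OFF);

    return ramfb_migrate == TRI_ON ||
           (ramfb_migrate == TRI_AUTO && enable_ramfb);
}
```

A compat entry that pins the property to "off" makes this return false no
matter what "ramfb" is set to, which is why the property cannot simply be
folded into a differently scoped switch.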

Laszlo

Re: [PATCH v4 0/3] WIP: ramfb: migration support

2023-10-05 Thread Laszlo Ersek

On 10/5/23 14:01, Cédric Le Goater wrote:
> On 10/5/23 13:30, marcandre.lur...@redhat.com wrote:
>> From: Marc-André Lureau 
>>
>> Hi,
>>
>> Implement RAMFB migration, and add properties to enable it only on >= 8.2
>> machines, + a few related cleanups.
>>
>> Cedric, did you get the chance to test the VFIO display/ramfb code?
> 
> Nope. I was busy with VFIO stuff. I haven't even read Laszlo's
> email yet. I will try this or next week.
> 
> That said, could we avoid adding another migration property in
> VFIOPCIDevice and use the available "enable-migration" ?

I'm not entirely sure, but I suspect we can't / shouldn't do that.
"x-ramfb-migrate" is effectively a machine type compat prop, so if it
doesn't *precisely* line up with enable-migration (i.e., if they aren't
equivalent), then we shouldn't merge them. AFAICT, a 8.1 machine type
may have "enable-migration" set, but it should still have
"x-ramfb-migrate" clear.

Laszlo

Re: [PATCH v3 1/7] vhost-user: strip superfluous whitespace

2023-10-04 Thread Laszlo Ersek

On 10/4/23 14:54, Michael S. Tsirkin wrote:
> On Wed, Oct 04, 2023 at 12:08:52PM +0200, Laszlo Ersek wrote:
>> On 10/4/23 11:06, Michael S. Tsirkin wrote:
>>> On Mon, Oct 02, 2023 at 10:32:15PM +0200, Laszlo Ersek wrote:
>>>> Cc: "Michael S. Tsirkin"  (supporter:vhost)
>>>
>>> why the (supporter:vhost) part? not all scripts will cope
>>> well with text after the mail. If you really want to keep
>>> it around, I think you should add a hash tag # before that -
>>> more tools know to ignore that.
>>
>> It looked too tiresome to strip all these comments, plus I expected
>> that, if the get_maintainer.pl script output these lines, they were fit
>> for inclusion in "Cc:" tags in the commit message.
>>
>> If they're not, then the tool should indeed insert a # in-between, or
>> else provide the explanation for each name+email printed on separate
>> (preceding) lines, potentially prefixed with "#". That makes for easy
>> human reading and also for easy machine reading (filtering them out).
>>
>> Laszlo
> 
> /me shrugs
> 
> get_maintainer.pl doesn't output Cc tags either. Just pipe to
> sed 's/(.*//' ?

Yes, I'll seek to remember that.

Laszlo

> 
>>>
>>>
>>>> Cc: Eugenio Perez Martin 
>>>> Cc: German Maglione 
>>>> Cc: Liu Jiang 
>>>> Cc: Sergio Lopez Pascual 
>>>> Cc: Stefano Garzarella 
>>>> Signed-off-by: Laszlo Ersek 
>>>> Reviewed-by: Stefano Garzarella 
>>>> Reviewed-by: Philippe Mathieu-Daudé 
>>>> Tested-by: Albert Esteve 
>>>> Reviewed-by: Eugenio Pérez 
>>>> ---
>>>>
>>>> Notes:
>>>> v3:
>>>> 
>>>> - pick up R-b from Phil and Eugenio, T-b from Albert
>>>> 
>>>> v2:
>>>> 
>>>> - pick up Stefano's R-b
>>>>
>>>>  hw/virtio/vhost-user.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
>>>> index 8dcf049d422b..b4b677c1ce66 100644
>>>> --- a/hw/virtio/vhost-user.c
>>>> +++ b/hw/virtio/vhost-user.c
>>>> @@ -398,7 +398,7 @@ static int vhost_user_write(struct vhost_dev *dev, 
>>>> VhostUserMsg *msg,
>>>>   * operations such as configuring device memory mappings or issuing 
>>>> device
>>>>   * resets, which affect the whole device instead of individual VQs,
>>>>   * vhost-user messages should only be sent once.
>>>> - * 
>>>> + *
>>>>   * Devices with multiple vhost_devs are given an associated 
>>>> dev->vq_index
>>>>   * so per_device requests are only sent if vq_index is 0.
>>>>   */
>>>>
>>>
>

Re: [PULL 30/63] vhost-user: call VHOST_USER_SET_VRING_ENABLE synchronously

2023-10-04 Thread Laszlo Ersek

On 10/4/23 14:53, Michael S. Tsirkin wrote:
> On Wed, Oct 04, 2023 at 12:11:44PM +0200, Laszlo Ersek wrote:
>> On 10/4/23 10:44, Michael S. Tsirkin wrote:
>>> From: Laszlo Ersek 
>>>
>>> (1) The virtio-1.2 specification
>>> <http://docs.oasis-open.org/virtio/virtio/v1.2/virtio-v1.2.html> writes:
>>>
>>>> 3 General Initialization And Device Operation
>>>> 3.1   Device Initialization
>>>> 3.1.1 Driver Requirements: Device Initialization
>>>>
>>>> [...]
>>>>
>>>> 7. Perform device-specific setup, including discovery of virtqueues for
>>>>the device, optional per-bus setup, reading and possibly writing the
>>>>device’s virtio configuration space, and population of virtqueues.
>>>>
>>>> 8. Set the DRIVER_OK status bit. At this point the device is “live”.
>>>
>>> and
>>>
>>>> 4 Virtio Transport Options
>>>> 4.1   Virtio Over PCI Bus
>>>> 4.1.4 Virtio Structure PCI Capabilities
>>>> 4.1.4.3   Common configuration structure layout
>>>> 4.1.4.3.2 Driver Requirements: Common configuration structure layout
>>>>
>>>> [...]
>>>>
>>>> The driver MUST configure the other virtqueue fields before enabling the
>>>> virtqueue with queue_enable.
>>>>
>>>> [...]
>>>
>>> (The same statements are present in virtio-1.0 identically, at
>>> <http://docs.oasis-open.org/virtio/virtio/v1.0/virtio-v1.0.html>.)
>>>
>>> These together mean that the following sub-sequence of steps is valid for
>>> a virtio-1.0 guest driver:
>>>
>>> (1.1) set "queue_enable" for the needed queues as the final part of device
>>> initialization step (7),
>>>
>>> (1.2) set DRIVER_OK in step (8),
>>>
>>> (1.3) immediately start sending virtio requests to the device.
>>>
>>> (2) When vhost-user is enabled, and the VHOST_USER_F_PROTOCOL_FEATURES
>>> special virtio feature is negotiated, then virtio rings start in disabled
>>> state, according to
>>> <https://qemu-project.gitlab.io/qemu/interop/vhost-user.html#ring-states>.
>>> In this case, explicit VHOST_USER_SET_VRING_ENABLE messages are needed for
>>> enabling vrings.
>>>
>>> Therefore setting "queue_enable" from the guest (1.1) is a *control plane*
>>> operation, which travels from the guest through QEMU to the vhost-user
>>> backend, using a unix domain socket.
>>>
>>> Whereas sending a virtio request (1.3) is a *data plane* operation, which
>>> evades QEMU -- it travels from guest to the vhost-user backend via
>>> eventfd.
>>>
>>> This means that steps (1.1) and (1.3) travel through different channels,
>>> and their relative order can be reversed, as perceived by the vhost-user
>>> backend.
>>>
>>> That's exactly what happens when OVMF's virtiofs driver (VirtioFsDxe) runs
>>> against the Rust-language virtiofsd version 1.7.2. (Which uses version
>>> 0.10.1 of the vhost-user-backend crate, and version 0.8.1 of the vhost
>>> crate.)
>>>
>>> Namely, when VirtioFsDxe binds a virtiofs device, it goes through the
>>> device initialization steps (i.e., control plane operations), and
>>> immediately sends a FUSE_INIT request too (i.e., performs a data plane
>>> operation). In the Rust-language virtiofsd, this creates a race between
>>> two components that run *concurrently*, i.e., in different threads or
>>> processes:
>>>
>>> - Control plane, handling vhost-user protocol messages:
>>>
>>>   The "VhostUserSlaveReqHandlerMut::set_vring_enable" method
>>>   [crates/vhost-user-backend/src/handler.rs] handles
>>>   VHOST_USER_SET_VRING_ENABLE messages, and updates each vring's "enabled"
>>>   flag according to the message processed.
>>>
>>> - Data plane, handling virtio / FUSE requests:
>>>
>>>   The "VringEpollHandler::handle_event" method
>>>   [crates/vhost-user-backend/src/event_loop.rs] handles the incoming
>>>   virtio / FUSE request, consuming the virtio kick at the same time. If
>>>   the vring's "enabled" flag is set, the virtio / FUSE request is
>>>   processed genuinely. If the vring's "enabled" flag is clear, then the
>>>   virtio / FUSE request is discarded.
>>>
>>>
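The ordering problem described in the quoted commit message can be modeled
with a small, single-threaded toy. This is only a sketch under simplifying
assumptions: the event names and the ToyVring structure are hypothetical,
and the real backend runs the two handlers concurrently rather than from
one array of events.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy event types; hypothetical names, not vhost-user identifiers. */
typedef enum { EV_SET_VRING_ENABLE, EV_KICK } ToyEvent;

typedef struct {
    bool enabled;   /* the ring starts disabled, as in point (2) */
    int processed;  /* requests handled while the ring was enabled */
    int dropped;    /* requests discarded while the ring was disabled */
} ToyVring;

/* Apply events in the order the backend happens to observe them. */
static ToyVring toy_backend_run(const ToyEvent *events, size_t count)
{
    ToyVring ring = { .enabled = false, .processed = 0, .dropped = 0 };

    assert(events != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        if (events[i] == EV_SET_VRING_ENABLE) {
            ring.enabled = true;
        } else if (ring.enabled) {
            ring.processed++;
        } else {
            ring.dropped++;
        }
    }
    return ring;
}
```

Feeding the events as { enable, kick } processes the request, while
{ kick, enable } drops it. The guest issued them in the first order in both
cases; only the order perceived by the backend differs.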

Re: [PATCH 7/7] vhost-user: call VHOST_USER_SET_VRING_ENABLE synchronously

2023-10-04 Thread Laszlo Ersek

On 10/3/23 17:55, Stefan Hajnoczi wrote:
> On Tue, 3 Oct 2023 at 10:41, Michael S. Tsirkin  wrote:
>>
>> On Sun, Aug 27, 2023 at 08:29:37PM +0200, Laszlo Ersek wrote:
>>> (1) The virtio-1.0 specification
>>> <http://docs.oasis-open.org/virtio/virtio/v1.0/virtio-v1.0.html> writes:
>>>
>>>> 3 General Initialization And Device Operation
>>>> 3.1   Device Initialization
>>>> 3.1.1 Driver Requirements: Device Initialization
>>>>
>>>> [...]
>>>>
>>>> 7. Perform device-specific setup, including discovery of virtqueues for
>>>>the device, optional per-bus setup, reading and possibly writing the
>>>>device’s virtio configuration space, and population of virtqueues.
>>>>
>>>> 8. Set the DRIVER_OK status bit. At this point the device is “live”.
>>>
>>> and
>>>
>>>> 4 Virtio Transport Options
>>>> 4.1   Virtio Over PCI Bus
>>>> 4.1.4 Virtio Structure PCI Capabilities
>>>> 4.1.4.3   Common configuration structure layout
>>>> 4.1.4.3.2 Driver Requirements: Common configuration structure layout
>>>>
>>>> [...]
>>>>
>>>> The driver MUST configure the other virtqueue fields before enabling the
>>>> virtqueue with queue_enable.
>>>>
>>>> [...]
>>>
>>> These together mean that the following sub-sequence of steps is valid for
>>> a virtio-1.0 guest driver:
>>>
>>> (1.1) set "queue_enable" for the needed queues as the final part of device
>>> initialization step (7),
>>>
>>> (1.2) set DRIVER_OK in step (8),
>>>
>>> (1.3) immediately start sending virtio requests to the device.
>>>
>>> (2) When vhost-user is enabled, and the VHOST_USER_F_PROTOCOL_FEATURES
>>> special virtio feature is negotiated, then virtio rings start in disabled
>>> state, according to
>>> <https://qemu-project.gitlab.io/qemu/interop/vhost-user.html#ring-states>.
>>> In this case, explicit VHOST_USER_SET_VRING_ENABLE messages are needed for
>>> enabling vrings.
>>>
>>> Therefore setting "queue_enable" from the guest (1.1) is a *control plane*
>>> operation, which travels from the guest through QEMU to the vhost-user
>>> backend, using a unix domain socket.
>>>
>>> Whereas sending a virtio request (1.3) is a *data plane* operation, which
>>> evades QEMU -- it travels from guest to the vhost-user backend via
>>> eventfd.
>>>
>>> This means that steps (1.1) and (1.3) travel through different channels,
>>> and their relative order can be reversed, as perceived by the vhost-user
>>> backend.
>>>
>>> That's exactly what happens when OVMF's virtiofs driver (VirtioFsDxe) runs
>>> against the Rust-language virtiofsd version 1.7.2. (Which uses version
>>> 0.10.1 of the vhost-user-backend crate, and version 0.8.1 of the vhost
>>> crate.)
>>>
>>> Namely, when VirtioFsDxe binds a virtiofs device, it goes through the
>>> device initialization steps (i.e., control plane operations), and
>>> immediately sends a FUSE_INIT request too (i.e., performs a data plane
>>> operation). In the Rust-language virtiofsd, this creates a race between
>>> two components that run *concurrently*, i.e., in different threads or
>>> processes:
>>>
>>> - Control plane, handling vhost-user protocol messages:
>>>
>>>   The "VhostUserSlaveReqHandlerMut::set_vring_enable" method
>>>   [crates/vhost-user-backend/src/handler.rs] handles
>>>   VHOST_USER_SET_VRING_ENABLE messages, and updates each vring's "enabled"
>>>   flag according to the message processed.
>>>
>>> - Data plane, handling virtio / FUSE requests:
>>>
>>>   The "VringEpollHandler::handle_event" method
>>>   [crates/vhost-user-backend/src/event_loop.rs] handles the incoming
>>>   virtio / FUSE request, consuming the virtio kick at the same time. If
>>>   the vring's "enabled" flag is set, the virtio / FUSE request is
>>>   processed genuinely. If the vring's "enabled" flag is clear, then the
>>>   virtio / FUSE request is discarded.
>>>
>>> Note that OVMF enables the queue *first*, and sends FUSE_INIT *second*.
>>> However, if the data plane processor in virtiofsd wins the race, then it
>>> sees the FUSE_INIT *before* the control plane processor to

Re: [PULL 30/63] vhost-user: call VHOST_USER_SET_VRING_ENABLE synchronously

2023-10-04 Thread Laszlo Ersek

On 10/4/23 10:44, Michael S. Tsirkin wrote:
> From: Laszlo Ersek 
> 
> (1) The virtio-1.2 specification
> <http://docs.oasis-open.org/virtio/virtio/v1.2/virtio-v1.2.html> writes:
> 
>> 3 General Initialization And Device Operation
>> 3.1   Device Initialization
>> 3.1.1 Driver Requirements: Device Initialization
>>
>> [...]
>>
>> 7. Perform device-specific setup, including discovery of virtqueues for
>>the device, optional per-bus setup, reading and possibly writing the
>>device’s virtio configuration space, and population of virtqueues.
>>
>> 8. Set the DRIVER_OK status bit. At this point the device is “live”.
> 
> and
> 
>> 4 Virtio Transport Options
>> 4.1   Virtio Over PCI Bus
>> 4.1.4 Virtio Structure PCI Capabilities
>> 4.1.4.3   Common configuration structure layout
>> 4.1.4.3.2 Driver Requirements: Common configuration structure layout
>>
>> [...]
>>
>> The driver MUST configure the other virtqueue fields before enabling the
>> virtqueue with queue_enable.
>>
>> [...]
> 
> (The same statements are present in virtio-1.0 identically, at
> <http://docs.oasis-open.org/virtio/virtio/v1.0/virtio-v1.0.html>.)
> 
> These together mean that the following sub-sequence of steps is valid for
> a virtio-1.0 guest driver:
> 
> (1.1) set "queue_enable" for the needed queues as the final part of device
> initialization step (7),
> 
> (1.2) set DRIVER_OK in step (8),
> 
> (1.3) immediately start sending virtio requests to the device.
> 
> (2) When vhost-user is enabled, and the VHOST_USER_F_PROTOCOL_FEATURES
> special virtio feature is negotiated, then virtio rings start in disabled
> state, according to
> <https://qemu-project.gitlab.io/qemu/interop/vhost-user.html#ring-states>.
> In this case, explicit VHOST_USER_SET_VRING_ENABLE messages are needed for
> enabling vrings.
> 
> Therefore setting "queue_enable" from the guest (1.1) is a *control plane*
> operation, which travels from the guest through QEMU to the vhost-user
> backend, using a unix domain socket.
> 
> Whereas sending a virtio request (1.3) is a *data plane* operation, which
> evades QEMU -- it travels from guest to the vhost-user backend via
> eventfd.
> 
> This means that steps (1.1) and (1.3) travel through different channels,
> and their relative order can be reversed, as perceived by the vhost-user
> backend.
> 
> That's exactly what happens when OVMF's virtiofs driver (VirtioFsDxe) runs
> against the Rust-language virtiofsd version 1.7.2. (Which uses version
> 0.10.1 of the vhost-user-backend crate, and version 0.8.1 of the vhost
> crate.)
> 
> Namely, when VirtioFsDxe binds a virtiofs device, it goes through the
> device initialization steps (i.e., control plane operations), and
> immediately sends a FUSE_INIT request too (i.e., performs a data plane
> operation). In the Rust-language virtiofsd, this creates a race between
> two components that run *concurrently*, i.e., in different threads or
> processes:
> 
> - Control plane, handling vhost-user protocol messages:
> 
>   The "VhostUserSlaveReqHandlerMut::set_vring_enable" method
>   [crates/vhost-user-backend/src/handler.rs] handles
>   VHOST_USER_SET_VRING_ENABLE messages, and updates each vring's "enabled"
>   flag according to the message processed.
> 
> - Data plane, handling virtio / FUSE requests:
> 
>   The "VringEpollHandler::handle_event" method
>   [crates/vhost-user-backend/src/event_loop.rs] handles the incoming
>   virtio / FUSE request, consuming the virtio kick at the same time. If
>   the vring's "enabled" flag is set, the virtio / FUSE request is
>   processed genuinely. If the vring's "enabled" flag is clear, then the
>   virtio / FUSE request is discarded.
> 
> Note that OVMF enables the queue *first*, and sends FUSE_INIT *second*.
> However, if the data plane processor in virtiofsd wins the race, then it
> sees the FUSE_INIT *before* the control plane processor took notice of
> VHOST_USER_SET_VRING_ENABLE and green-lit the queue for the data plane
> processor. Therefore the latter drops FUSE_INIT on the floor, and goes
> back to waiting for further virtio / FUSE requests with epoll_wait.
> Meanwhile OVMF is stuck waiting for the FUSE_INIT response -- a deadlock.
> 
> The deadlock is not deterministic. OVMF hangs infrequently during first
> boot. However, OVMF hangs almost certainly during reboots from the UEFI
> shell.
> 
> The race can be "reliably masked" by inserting a very small delay -- a
> single debug message -- at the top of "VringEpollHandler::handle_
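
To make the ordering problem concrete, here is a minimal single-threaded
sketch in C (the real virtiofsd is Rust, and none of this is its code).
The names "Vring", "handle_set_vring_enable", "handle_kick" and "replay"
are made up for illustration. The only behavior modeled is the one
described in the quoted text: a kick that arrives while the ring's
"enabled" flag is still clear is consumed, and the request is discarded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one vring as seen by the backend. */
typedef struct {
    bool enabled;    /* set by the control plane */
    int processed;   /* requests handled genuinely */
    int dropped;     /* requests discarded because the ring was disabled */
} Vring;

/* Control plane: models the effect of VHOST_USER_SET_VRING_ENABLE. */
static void handle_set_vring_enable(Vring *vq, bool enable)
{
    assert(vq != NULL);
    vq->enabled = enable;
}

/* Data plane: models consuming one kick that carries one request. */
static void handle_kick(Vring *vq)
{
    assert(vq != NULL);
    if (vq->enabled) {
        vq->processed++;
    } else {
        vq->dropped++;   /* the guest never gets a response */
    }
}

/*
 * Replay the two events in the order the backend perceives them.
 * Returns true if the request (think FUSE_INIT) was processed.
 */
static bool replay(bool enable_seen_first)
{
    Vring vq = { .enabled = false, .processed = 0, .dropped = 0 };

    if (enable_seen_first) {
        handle_set_vring_enable(&vq, true);
        handle_kick(&vq);
    } else {
        handle_kick(&vq);
        handle_set_vring_enable(&vq, true);
    }
    return vq.processed == 1;
}
```

The guest issues the two operations in the same order in both cases;
only the order in which the backend observes them differs.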

Re: [PATCH v3 1/7] vhost-user: strip superfluous whitespace

2023-10-04 Thread Laszlo Ersek

On 10/4/23 11:06, Michael S. Tsirkin wrote:
> On Mon, Oct 02, 2023 at 10:32:15PM +0200, Laszlo Ersek wrote:
>> Cc: "Michael S. Tsirkin"  (supporter:vhost)
> 
> why the (supporter:vhost) part? not all scripts will cope
> well with text after the mail. If you really want to keep
> it around, I think you should add a hash tag # before that -
> more tools know to ignore that.

It looked too tiresome to strip all these comments, plus I expected
that, if the get_maintainer.pl script output these lines, they were fit
for inclusion in "Cc:" tags in the commit message.

If they're not, then the tool should indeed insert a # in-between, or
else provide the explanation for each name+email printed on separate
(preceding) lines, potentially prefixed with "#". That makes for easy
human reading and also for easy machine reading (filtering them out).

Laszlo

> 
> 
>> Cc: Eugenio Perez Martin 
>> Cc: German Maglione 
>> Cc: Liu Jiang 
>> Cc: Sergio Lopez Pascual 
>> Cc: Stefano Garzarella 
>> Signed-off-by: Laszlo Ersek 
>> Reviewed-by: Stefano Garzarella 
>> Reviewed-by: Philippe Mathieu-Daudé 
>> Tested-by: Albert Esteve 
>> Reviewed-by: Eugenio Pérez 
>> ---
>>
>> Notes:
>> v3:
>> 
>> - pick up R-b from Phil and Eugenio, T-b from Albert
>> 
>> v2:
>> 
>> - pick up Stefano's R-b
>>
>>  hw/virtio/vhost-user.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
>> index 8dcf049d422b..b4b677c1ce66 100644
>> --- a/hw/virtio/vhost-user.c
>> +++ b/hw/virtio/vhost-user.c
>> @@ -398,7 +398,7 @@ static int vhost_user_write(struct vhost_dev *dev, 
>> VhostUserMsg *msg,
>>   * operations such as configuring device memory mappings or issuing 
>> device
>>   * resets, which affect the whole device instead of individual VQs,
>>   * vhost-user messages should only be sent once.
>> - * 
>> + *
>>   * Devices with multiple vhost_devs are given an associated 
>> dev->vq_index
>>   * so per_device requests are only sent if vq_index is 0.
>>   */
>>
>

Re: [PATCH 7/7] vhost-user: call VHOST_USER_SET_VRING_ENABLE synchronously

2023-10-03 Thread Laszlo Ersek

On 10/3/23 16:25, Michael S. Tsirkin wrote:
> On Tue, Oct 03, 2023 at 03:23:24PM +0200, Laszlo Ersek wrote:
>> On 10/3/23 15:08, Stefan Hajnoczi wrote:
>>> On Tue, 3 Oct 2023 at 08:27, Michael S. Tsirkin  wrote:
>>>>
>>>> On Mon, Oct 02, 2023 at 05:13:26PM -0400, Stefan Hajnoczi wrote:
>>>>> One more question:
>>>>>
>>>>> Why is the disabled state not needed by regular (non-vhost) virtio-net 
>>>>> devices?
>>>>
>>>> Tap does the same - it purges queued packets:
>>>>
>>>> int tap_disable(NetClientState *nc)
>>>> {
>>>> TAPState *s = DO_UPCAST(TAPState, nc, nc);
>>>> int ret;
>>>>
>>>> if (s->enabled == 0) {
>>>> return 0;
>>>> } else {
>>>> ret = tap_fd_disable(s->fd);
>>>> if (ret == 0) {
>>>> qemu_purge_queued_packets(nc);
>>>> s->enabled = false;
>>>> tap_update_fd_handler(s);
>>>> }
>>>> return ret;
>>>> }
>>>> }
>>>
>>> tap_disable() is not equivalent to the vhost-user "started but
>>> disabled" ring state. tap_disable() is a synchronous one-time action,
>>> while "started but disabled" is a continuous state.
>>>
>>> The "started but disabled" ring state isn't needed to achieve this.
>>> The back-end can just drop tx buffers upon receiving
>>> VHOST_USER_SET_VRING_ENABLE .num=0.
>>>
>>> The history of the spec is curious. VHOST_USER_SET_VRING_ENABLE was
>>> introduced before the "started but disabled" state was defined,
>>> and it explicitly mentions tap attach/detach:
>>>
>>> commit 7263a0ad784b719ebed736a1119cc2e08110
>>> Author: Changchun Ouyang 
>>> Date:   Wed Sep 23 12:20:01 2015 +0800
>>>
>>> vhost-user: add a new message to disable/enable a specific virt queue.
>>>
>>> Add a new message, VHOST_USER_SET_VRING_ENABLE, to enable or disable
>>> a specific virt queue, which is similar to attach/detach queue for
>>> tap device.
>>>
>>> and then later:
>>>
>>> commit c61f09ed855b5009f816242ce281fd01586d4646
>>> Author: Michael S. Tsirkin 
>>> Date:   Mon Nov 23 12:48:52 2015 +0200
>>>
>>> vhost-user: clarify start and enable
>>>
>>>>
>>>> what about non tap backends? I suspect they just aren't
>>>> used widely with multiqueue so no one noticed.
>>>
>>> I still don't understand why "started but disabled" is needed instead
>>> of just two ring states: enabled and disabled.
>>>
>>> It seems like the cleanest path going forward is to keep the "ignore
>>> rx, discard tx" semantics for virtio-net devices but to clarify in the
>>> spec that other device types do not process the ring:
>>>
>>> "
>>> * started but disabled: the back-end must not process the ring. For legacy
>>>   reasons there is an exception for the networking device, where the
>>>   back-end must process and discard any TX packets and not process
>>>   other rings.
>>> "
>>>
>>> What do you think?
>>
>> ... from a vhost-user backend perspective, won't this create a need for
>> all "ring processor" (~ virtio event loop) implementations to support
>> both methods? IIUC, the "virtio pop" is usually independent of the
>> particular device to which the requests are ultimately delivered. So the
>> event loop would have to grow a new parameter regarding "what to do in
>> the started-but-disabled state", the network device would have to pass
>> in one value (-> pop & drop), and all other devices would have to pass
>> in the other value (stop popping).
>>
>> ... I figure in rust-vmm/vhost it would affect the "handle_event"
>> function in "crates/vhost-user-backend/src/event_loop.rs".
>>
>> Do I understand right? (Not disagreeing, just pondering the impact on
>> backends.)
>>
>> Laszlo
> 
> Already the case I guess - RX ring is not processed, TX is. Right?
> 

Ah I see your point, this distinction must already exist in event loops.

But... as far as I can tell, it's not there in rust-vmm/vhost.

Laszlo

Re: [PATCH 7/7] vhost-user: call VHOST_USER_SET_VRING_ENABLE synchronously

2023-10-03 Thread Laszlo Ersek

On 10/3/23 15:08, Stefan Hajnoczi wrote:
> On Tue, 3 Oct 2023 at 08:27, Michael S. Tsirkin  wrote:
>>
>> On Mon, Oct 02, 2023 at 05:13:26PM -0400, Stefan Hajnoczi wrote:
>>> One more question:
>>>
>>> Why is the disabled state not needed by regular (non-vhost) virtio-net 
>>> devices?
>>
>> Tap does the same - it purges queued packets:
>>
>> int tap_disable(NetClientState *nc)
>> {
>> TAPState *s = DO_UPCAST(TAPState, nc, nc);
>> int ret;
>>
>> if (s->enabled == 0) {
>> return 0;
>> } else {
>> ret = tap_fd_disable(s->fd);
>> if (ret == 0) {
>> qemu_purge_queued_packets(nc);
>> s->enabled = false;
>> tap_update_fd_handler(s);
>> }
>> return ret;
>> }
>> }
> 
> tap_disable() is not equivalent to the vhost-user "started but
> disabled" ring state. tap_disable() is a synchronous one-time action,
> while "started but disabled" is a continuous state.
> 
> The "started but disabled" ring state isn't needed to achieve this.
> The back-end can just drop tx buffers upon receiving
> VHOST_USER_SET_VRING_ENABLE .num=0.
> 
> The history of the spec is curious. VHOST_USER_SET_VRING_ENABLE was
> introduced before the "started but disabled" state was defined,
> and it explicitly mentions tap attach/detach:
> 
> commit 7263a0ad784b719ebed736a1119cc2e08110
> Author: Changchun Ouyang 
> Date:   Wed Sep 23 12:20:01 2015 +0800
> 
> vhost-user: add a new message to disable/enable a specific virt queue.
> 
> Add a new message, VHOST_USER_SET_VRING_ENABLE, to enable or disable
> a specific virt queue, which is similar to attach/detach queue for
> tap device.
> 
> and then later:
> 
> commit c61f09ed855b5009f816242ce281fd01586d4646
> Author: Michael S. Tsirkin 
> Date:   Mon Nov 23 12:48:52 2015 +0200
> 
> vhost-user: clarify start and enable
> 
>>
>> what about non tap backends? I suspect they just aren't
>> used widely with multiqueue so no one noticed.
> 
> I still don't understand why "started but disabled" is needed instead
> of just two ring states: enabled and disabled.
> 
> It seems like the cleanest path going forward is to keep the "ignore
> rx, discard tx" semantics for virtio-net devices but to clarify in the
> spec that other device types do not process the ring:
> 
> "
> * started but disabled: the back-end must not process the ring. For legacy
>   reasons there is an exception for the networking device, where the
>   back-end must process and discard any TX packets and not process
>   other rings.
> "
> 
> What do you think?

... from a vhost-user backend perspective, won't this create a need for
all "ring processor" (~ virtio event loop) implementations to support
both methods? IIUC, the "virtio pop" is usually independent of the
particular device to which the requests are ultimately delivered. So the
event loop would have to grow a new parameter regarding "what to do in
the started-but-disabled state", the network device would have to pass
in one value (-> pop & drop), and all other devices would have to pass
in the other value (stop popping).

... I figure in rust-vmm/vhost it would affect the "handle_event"
function in "crates/vhost-user-backend/src/event_loop.rs".

Do I understand right? (Not disagreeing, just pondering the impact on
backends.)
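
As a rough illustration of that extra parameter (plain C, all names
hypothetical, not taken from rust-vmm/vhost or from QEMU): the pop loop
stays device-independent, and each device only chooses what happens in
the started-but-disabled state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: what the loop does with a started-but-disabled ring. */
typedef enum {
    DISABLED_STOP_POPPING,   /* leave requests in the ring (non-net) */
    DISABLED_POP_AND_DROP,   /* consume and discard (virtio-net TX) */
} DisabledPolicy;

typedef struct {
    bool enabled;
    size_t pending;     /* requests made available by the guest */
    size_t delivered;   /* requests handed to the device model */
    size_t discarded;   /* requests popped and thrown away */
} Ring;

/* Device-independent pop loop; only the policy differs per device. */
static void process_ring(Ring *r, DisabledPolicy policy)
{
    assert(r != NULL);

    if (!r->enabled && policy == DISABLED_STOP_POPPING) {
        return;                  /* requests stay queued for later */
    }
    while (r->pending > 0) {
        r->pending--;
        if (r->enabled) {
            r->delivered++;
        } else {
            r->discarded++;      /* only reached with POP_AND_DROP */
        }
    }
}
```

With DISABLED_STOP_POPPING, a request queued before the ring is enabled
is still there once the enable message is processed, which is the
behavior a non-networking device would need.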

Laszlo

Re: [PATCH v3 5/5] hw/vfio: add ramfb migration support

2023-10-03 Thread Laszlo Ersek

On 10/3/23 12:47, Marc-André Lureau wrote:
> Hi
> 
> On Tue, Oct 3, 2023 at 2:17 PM Cédric Le Goater  wrote:
>>
>> On 10/3/23 10:56, marcandre.lur...@redhat.com wrote:
>>> From: Marc-André Lureau 
>>>
>>> Add a "VFIODisplay" subsection whenever "x-ramfb-migrate" is turned on.
>>>
>>> Turn it off by default on machines <= 8.1 for compatibility reasons.
>>
>>
>> This change breaks linking on various platforms with :
>>
>> /usr/bin/ld: 
>> libqemu-xtensa-softmmu.fa.p/hw_vfio_display.c.o:(.data.rel+0x50): undefined 
>> reference to `ramfb_vmstate'
>>
>> Some stubs updates are missing it seems..
>>
> 
> diff --git a/stubs/ramfb.c b/stubs/ramfb.c
> index 48143f3354..cf64733b10 100644
> --- a/stubs/ramfb.c
> +++ b/stubs/ramfb.c
> @@ -2,6 +2,8 @@
>  #include "qapi/error.h"
>  #include "hw/display/ramfb.h"
> 
> +const VMStateDescription ramfb_vmstate = {};
> +
> 
> 
> And I think we should also change the "needed" condition to:
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 4689f2e5c1..b327844764 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -2613,7 +2613,7 @@ static bool vfio_display_needed(void *opaque)
>  VFIOPCIDevice *vdev = opaque;
> 
>  /* the only thing that justifies the VFIODisplay sub-section atm */
> -return vdev->ramfb_migrate != ON_OFF_AUTO_OFF;
> +return vdev->enable_ramfb && vdev->ramfb_migrate != ON_OFF_AUTO_OFF;
>  }
> 

Exactly, this was one of my comments in review.

(But, see there -- I think the other formulation is easier to read and
understand.)

Laszlo

> 
> 
>> Thanks,
>>
>> C.
>>
>>>
>>> Signed-off-by: Marc-André Lureau 
>>> ---
>>>   hw/vfio/pci.h |  3 +++
>>>   hw/core/machine.c |  1 +
>>>   hw/vfio/display.c | 23 +++
>>>   hw/vfio/pci.c | 32 
>>>   4 files changed, 59 insertions(+)
>>>
>>> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
>>> index 2d836093a8..fd06695542 100644
>>> --- a/hw/vfio/pci.h
>>> +++ b/hw/vfio/pci.h
>>> @@ -173,6 +173,7 @@ struct VFIOPCIDevice {
>>>   bool no_kvm_ioeventfd;
>>>   bool no_vfio_ioeventfd;
>>>   bool enable_ramfb;
>>> +OnOffAuto ramfb_migrate;
>>>   bool defer_kvm_irq_routing;
>>>   bool clear_parent_atomics_on_exit;
>>>   VFIODisplay *dpy;
>>> @@ -226,4 +227,6 @@ void vfio_display_reset(VFIOPCIDevice *vdev);
>>>   int vfio_display_probe(VFIOPCIDevice *vdev, Error **errp);
>>>   void vfio_display_finalize(VFIOPCIDevice *vdev);
>>>
>>> +extern const VMStateDescription vfio_display_vmstate;
>>> +
>>>   #endif /* HW_VFIO_VFIO_PCI_H */
>>> diff --git a/hw/core/machine.c b/hw/core/machine.c
>>> index 47a07d1d9b..f2f8940a85 100644
>>> --- a/hw/core/machine.c
>>> +++ b/hw/core/machine.c
>>> @@ -32,6 +32,7 @@
>>>
>>>   GlobalProperty hw_compat_8_1[] = {
>>>   { "ramfb", "x-migrate", "off" },
>>> +{ "vfio-pci-nohotplug", "x-ramfb-migrate", "off" }
>>>   };
>>>   const size_t hw_compat_8_1_len = G_N_ELEMENTS(hw_compat_8_1);
>>>
>>> diff --git a/hw/vfio/display.c b/hw/vfio/display.c
>>> index bec864f482..de5bf71dd1 100644
>>> --- a/hw/vfio/display.c
>>> +++ b/hw/vfio/display.c
>>> @@ -542,3 +542,26 @@ void vfio_display_finalize(VFIOPCIDevice *vdev)
>>>   vfio_display_edid_exit(vdev->dpy);
>>>   g_free(vdev->dpy);
>>>   }
>>> +
>>> +static bool migrate_needed(void *opaque)
>>> +{
>>> +/*
>>> + * If we are here, it's because vfio_display_needed(), which is only 
>>> true
>>> + * when dpy->ramfb_migrate atm.
>>> + *
>>> + * If the migration condition is changed, we should check here if
>>> + * ramfb_migrate is true. (this will need a way to lookup the 
>>> associated
>>> + * VFIOPCIDevice somehow, or fields to be moved, ..)
>>> + */
>>> +return true;
>>> +}
>>> +
>>> +const VMStateDescription vfio_display_vmstate = {
>>> +.name = "VFIODisplay",
>>> +.version_id = 1,
>>> +.minimum_version_id = 1,
>>> +.needed = migrate_needed,
>>> +.fields = (VMStateField[]) {
>>> +VMSTATE_STRUCT_POINTER(ramfb, VFIODisplay, ramfb_vmstate, 
>>> RAMFBState),
>>> +}
>>> +};
>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>> index 3b2ca3c24c..4689f2e5c1 100644
>>> --- a/hw/vfio/pci.c
>>> +++ b/hw/vfio/pci.c
>>> @@ -2608,6 +2608,25 @@ static bool vfio_msix_present(void *opaque, int 
>>> version_id)
>>>   return msix_present(pdev);
>>>   }
>>>
>>> +static bool vfio_display_needed(void *opaque)
>>> +{
>>> +VFIOPCIDevice *vdev = opaque;
>>> +
>>> +/* the only thing that justifies the VFIODisplay sub-section atm */
>>> +return vdev->ramfb_migrate != ON_OFF_AUTO_OFF;
>>> +}
>>> +
>>> +const VMStateDescription vmstate_vfio_display = {
>>> +.name = "VFIOPCIDevice/VFIODisplay",
>>> +.version_id = 1,
>>> +.minimum_version_id = 1,
>>> +.needed = vfio_display_needed,
>>> +.fields = (VMStateField[]){
>>> +VMSTATE_STRUCT_POINTER(dpy, VFIOPCIDevice, vfio_display_vmstate, 
>>> VFIODisplay),
>>> +VMSTATE_END_OF_LIST()
>>> +}
>>> +};
>>> +
>>>

Re: [PATCH v3 5/5] hw/vfio: add ramfb migration support

2023-10-03 Thread Laszlo Ersek

On 10/3/23 10:56, marcandre.lur...@redhat.com wrote:
> From: Marc-André Lureau 
>
> Add a "VFIODisplay" subsection whenever "x-ramfb-migrate" is turned on.
>
> Turn it off by default on machines <= 8.1 for compatibility reasons.
>
> Signed-off-by: Marc-André Lureau 
> ---
>  hw/vfio/pci.h |  3 +++
>  hw/core/machine.c |  1 +
>  hw/vfio/display.c | 23 +++
>  hw/vfio/pci.c | 32 
>  4 files changed, 59 insertions(+)

Quoting the hunks somewhat out of order (so that I can better understand
the dependencies):

>
> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
> index 2d836093a8..fd06695542 100644
> --- a/hw/vfio/pci.h
> +++ b/hw/vfio/pci.h
> @@ -173,6 +173,7 @@ struct VFIOPCIDevice {
>  bool no_kvm_ioeventfd;
>  bool no_vfio_ioeventfd;
>  bool enable_ramfb;
> +OnOffAuto ramfb_migrate;
>  bool defer_kvm_irq_routing;
>  bool clear_parent_atomics_on_exit;
>  VFIODisplay *dpy;

So this is where the complications start. The structure that contains
the "RAMFBState *ramfb" field is VFIODisplay (i.e., the "dpy" field
here), not VFIOPCIDevice. In order to stay consistent with the previous
patch in the series, we'd have to add the flag to VFIODisplay.

But it seems that VFIODisplay cannot have its own properties. Because:

- properties are associated in a TypeInfo.class_init function, with
  device_class_set_props()

- there is no TypeInfo for VFIODisplay, only for VFIOPCIDevice.

To make things even more complicated, both TYPE_VFIO_PCI and
TYPE_VFIO_PCI_NOHOTPLUG use VFIOPCIDevice directly. The latter only
differs in the exposure of the property "ramfb". Commit b290659fc3dd
("hw/vfio/display: add ramfb support", 2018-10-15) introduced the
property with TYPE_VFIO_PCI_NOHOTPLUG, *but* it squeezed the boolean
field backing the property to VFIOPCIDevice, rather than creating a
separate structure type *containing* VFIOPCIDevice *plus* the new field
"enable_ramfb".

I'm not sure if that was optimal (it may have been "unavoidable" for all
I know), but either way, now we have no choice but to follow suit, and
add the property's "backing field" to VFIOPCIDevice *again*.

OK. I kinda convinced myself this is proper. Let's move on.
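
For reference, the road not taken (a separate structure type containing
the base device plus the new field) would look roughly like the sketch
below. This is plain C with made-up names ("BaseDevice",
"NoHotplugDevice"); it does not use QOM and is not how QEMU declares
these types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for VFIOPCIDevice; the field is illustrative only. */
typedef struct {
    int vendor_id;
} BaseDevice;

/* Hypothetical derived type: the base object plus the one extra knob. */
typedef struct {
    BaseDevice parent;    /* embedded by value */
    bool enable_ramfb;    /* would back the "ramfb" property */
} NoHotplugDevice;

/* Recover the containing object from a pointer to its embedded base. */
static NoHotplugDevice *nohotplug_from_base(BaseDevice *base)
{
    assert(base != NULL);
    return (NoHotplugDevice *)((char *)base -
                               offsetof(NoHotplugDevice, parent));
}

/* Code that only holds the base pointer must downcast to see the knob. */
static bool base_wants_ramfb(BaseDevice *base)
{
    return nohotplug_from_base(base)->enable_ramfb;
}
```

The downcast is only valid when the base object really is embedded in a
NoHotplugDevice, so every shared code path would have to know which of
the two types it was handed. Keeping the field in the common structure
avoids that question.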

> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 3b2ca3c24c..4689f2e5c1 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -3484,6 +3515,7 @@ static const TypeInfo vfio_pci_dev_info = {
>
>  static Property vfio_pci_dev_nohotplug_properties[] = {
>  DEFINE_PROP_BOOL("ramfb", VFIOPCIDevice, enable_ramfb, false),
> +DEFINE_PROP_ON_OFF_AUTO("x-ramfb-migrate", VFIOPCIDevice, ramfb_migrate, 
> ON_OFF_AUTO_AUTO),
>  DEFINE_PROP_END_OF_LIST(),
>  };
>

This is where we expose the new knob as a property, side by side with
the previously mentioned "ramfb" one. OK.

> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index 47a07d1d9b..f2f8940a85 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -32,6 +32,7 @@
>
>  GlobalProperty hw_compat_8_1[] = {
>  { "ramfb", "x-migrate", "off" },
> +{ "vfio-pci-nohotplug", "x-ramfb-migrate", "off" }
>  };
>  const size_t hw_compat_8_1_len = G_N_ELEMENTS(hw_compat_8_1);
>

Setting the property to "off" for old machine types, OK.

> @@ -3275,6 +3298,14 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>  if (!vfio_migration_realize(vbasedev, errp)) {
>  goto out_deregister;
>  }
> +if (vbasedev->enable_migration == ON_OFF_AUTO_OFF) {
> +if (vdev->ramfb_migrate == ON_OFF_AUTO_AUTO) {
> +vdev->ramfb_migrate = ON_OFF_AUTO_OFF;
> +} else if (vdev->ramfb_migrate == ON_OFF_AUTO_ON) {
> +error_setg(errp, "x-ramfb-migrate requires migration");
> +goto out_deregister;
> +}
> +}
>  }
>
>  vfio_register_err_notifier(vdev);

This looks good to me from two aspects, and not so good from two other
aspects.

- good: seems to ensure the knob consistency that Alex highlighted

- good: we enforce the "ramfb=on requires display=on" predicate too in
  the same function

- (1) not so good: with regard to error handling, I think the placement
  of this new logic could be improved. If we fail here, we're still
  past a successful vfio_migration_realize() call, and I'm not sure if
  jumping to the "out_deregister" label can undo *that*. Right now,
  nothing after the vfio_migration_realize() call can fail inside
  vfio_realize(), so the error path may not be ready for rolling back
  vfio_migration_realize().

  Looking at the larger context:

if (vdev->display_xres || vdev->display_yres) {
if (vdev->dpy == NULL) {
error_setg(errp, "xres and yres properties require display=on");
goto out_deregister;
}
if (vdev->dpy->edid_regs == NULL) {
error_setg(errp, "xres and yres properties need edid support");
goto out_deregister;
}
}

if
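
The knob reconciliation from the vfio_realize() hunk quoted earlier can
be modeled as a small pure function. The sketch below is an assumption
for illustration, not code from the patch: "Knob" is a local stand-in
for OnOffAuto, and "resolve_ramfb_migrate" is a made-up name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-in for QEMU's OnOffAuto; the values are illustrative. */
typedef enum { KNOB_AUTO, KNOB_ON, KNOB_OFF } Knob;

/*
 * Reconcile the dependent knob with the device-wide migration setting.
 * Returns false, leaving *ramfb_migrate untouched, for the
 * self-contradictory combination.
 */
static bool resolve_ramfb_migrate(Knob enable_migration, Knob *ramfb_migrate)
{
    assert(ramfb_migrate != NULL);

    if (enable_migration != KNOB_OFF) {
        return true;                  /* nothing to reconcile */
    }
    if (*ramfb_migrate == KNOB_ON) {
        return false;                 /* "x-ramfb-migrate requires migration" */
    }
    if (*ramfb_migrate == KNOB_AUTO) {
        *ramfb_migrate = KNOB_OFF;    /* auto follows the device */
    }
    return true;
}
```

A check of this shape touches no resources, so in principle it could
run before anything that would later need rolling back on the error
path.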

Re: [PATCH v3 4/5] ramfb-standalone: add migration support

2023-10-03 Thread Laszlo Ersek

On 10/3/23 10:56, marcandre.lur...@redhat.com wrote:
> From: Marc-André Lureau 
> 
> Add a "ramfb-dev" section whenever "x-migrate" is turned on. Turn it off
> by default on machines <= 8.1 for compatibility reasons.
> 
> Signed-off-by: Marc-André Lureau 
> ---
>  hw/core/machine.c |  4 +++-
>  hw/display/ramfb-standalone.c | 27 +++
>  2 files changed, 30 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index df40f10dfa..47a07d1d9b 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -30,7 +30,9 @@
>  #include "hw/virtio/virtio-pci.h"
>  #include "hw/virtio/virtio-net.h"
>  
> -GlobalProperty hw_compat_8_1[] = {};
> +GlobalProperty hw_compat_8_1[] = {
> +{ "ramfb", "x-migrate", "off" },
> +};
>  const size_t hw_compat_8_1_len = G_N_ELEMENTS(hw_compat_8_1);
>  
>  GlobalProperty hw_compat_8_0[] = {
> diff --git a/hw/display/ramfb-standalone.c b/hw/display/ramfb-standalone.c
> index 8c0094397f..a96e7ebcd9 100644
> --- a/hw/display/ramfb-standalone.c
> +++ b/hw/display/ramfb-standalone.c
> @@ -1,4 +1,5 @@
>  #include "qemu/osdep.h"
> +#include "migration/vmstate.h"
>  #include "qapi/error.h"
>  #include "qemu/module.h"
>  #include "hw/loader.h"
> @@ -15,6 +16,7 @@ struct RAMFBStandaloneState {
>  SysBusDevice parent_obj;
>  QemuConsole *con;
>  RAMFBState *state;
> +bool migrate;
>  };
>  
>  static void display_update_wrapper(void *dev)
> @@ -40,14 +42,39 @@ static void ramfb_realizefn(DeviceState *dev, Error 
> **errp)
>  ramfb->state = ramfb_setup(errp);
>  }
>  
> +static bool migrate_needed(void *opaque)
> +{
> +RAMFBStandaloneState *ramfb = RAMFB(opaque);
> +
> +return ramfb->migrate;
> +}
> +
> +static const VMStateDescription ramfb_dev_vmstate = {
> +.name = "ramfb-dev",
> +.version_id = 1,
> +.minimum_version_id = 1,
> +.needed = migrate_needed,
> +.fields = (VMStateField[]) {
> +VMSTATE_STRUCT_POINTER(state, RAMFBStandaloneState, ramfb_vmstate, 
> RAMFBState),
> +VMSTATE_END_OF_LIST()
> +}
> +};
> +
> +static Property ramfb_properties[] = {
> +DEFINE_PROP_BOOL("x-migrate", RAMFBStandaloneState, migrate,  true),
> +DEFINE_PROP_END_OF_LIST(),
> +};
> +
>  static void ramfb_class_initfn(ObjectClass *klass, void *data)
>  {
>  DeviceClass *dc = DEVICE_CLASS(klass);
>  
>  set_bit(DEVICE_CATEGORY_DISPLAY, dc->categories);
> +dc->vmsd = &ramfb_dev_vmstate;
>  dc->realize = ramfb_realizefn;
>  dc->desc = "ram framebuffer standalone device";
>  dc->user_creatable = true;
> +device_class_set_props(dc, ramfb_properties);
>  }
>  
>  static const TypeInfo ramfb_info = {

This patch (and patch #3) is *exactly* what I had in mind, when I was
racking my brain about RHBZ 1859424 a week (or a few weeks?) ago.
Specifically the VMSTATE_STRUCT_POINTER + VMSTATE_BUFFER_UNSAFE chain.

That's not to say I could have *completed* the patch myself, of course.
:) (Among other things, I didn't know that it was
VMSTATE_BUFFER_UNSAFE that we'd need -- but I certainly thought of
VMSTATE_STRUCT_POINTER for the higher-level devices.)

It's interesting that we don't have to do this in a "subsection" here --
this device used to have no VMStateDescription at all!

FWIW (patches #3 and #4):

Reviewed-by: Laszlo Ersek 

Laszlo

Re: [PATCH v2 4/5] ramfb: make migration conditional

2023-10-02 Thread Laszlo Ersek

On 10/2/23 22:38, Alex Williamson wrote:
> On Mon, 2 Oct 2023 21:41:55 +0200
> Laszlo Ersek  wrote:
> 
>> On 10/2/23 21:26, Alex Williamson wrote:
>>> On Mon, 2 Oct 2023 20:24:11 +0200
>>> Laszlo Ersek  wrote:
>>>   
>>>> On 10/2/23 16:41, Alex Williamson wrote:  
>>>>> On Mon, 2 Oct 2023 15:38:10 +0200
>>>>> Cédric Le Goater  wrote:
>>>>> 
>>>>>> On 10/2/23 13:11, marcandre.lur...@redhat.com wrote:
>>>>>>> From: Marc-André Lureau 
>>>>>>>
>>>>>>> RAMFB migration was unsupported until now, let's make it conditional.
>>>>>>> The following patch will prevent machines <= 8.1 from migrating it.
>>>>>>>
>>>>>>> Signed-off-by: Marc-André Lureau   
>>>>>> Maybe localize the new 'ramfb_migrate' attribute close to 'enable_ramfb'
>>>>>> in VFIOPCIDevice. Anyhow,
>>>>>
>>>>> Shouldn't this actually be tied to whether the device is migratable
>>>>> (which for GVT-g - the only ramfb user afaik - it's not)?  What does it
>>>>> mean to have a ramfb-migrate=true property on a device that doesn't
>>>>> support migration, or false on a device that does support migration?  I
>>>>> don't understand why this is a user controllable property.  Thanks,
>>>>
>>>> The comments in <https://bugzilla.redhat.com/show_bug.cgi?id=1859424>
>>>> (which are unfortunately not public :/ ) suggest that ramfb migration
>>>> was simply forgotten when vGPU migration was implemented. So, "now
>>>> that vGPU migration is done", this should be added.
>>>>
>>>> Comment 8 suggests that the following domain XML snippet
>>>>
>>>> >>> model='vfio-pci' display='on' ramfb='on'> 
>>>> 
>>>>   
>>>>   
>>>>   >>> function='0x0'/>   
>>>>
>>>> is migratable, but the ramfb device malfunctions on the destination
>>>> host.
>>>>
>>>> There's also a huge QEMU cmdline in comment#0 of the bug; I've not
>>>> tried to read that.
>>>>
>>>> AIUI BTW the property is not for the user to control, it's just a
>>>> compat knob for versioned machine types. AIUI those are usually
>>>> implemented with such (user-visible / -tweakable) device properties.  
>>>
>>> If it's not for user control it's unfortunate that we expose it to the
>>> user at all, but should it at least use the "x-" prefix to indicate that
>>> it's not intended to be an API?  
>>
>> I *think* it was your commit db32d0f43839 ("vfio/pci: Add option to
>> disable GeForce quirks", 2018-02-06) that had introduced me to the "x-"
>> prefixed properties!
>>
>> For some reason though, machine type compat knobs are never named like
>> that, AFAIR.
> 
> Maybe I'm misunderstanding your comment, but it appears quite common to
> use "x-" prefix things in the compat tables...

You didn't misunderstand; I was wrong. I judged this off the compat prop
backports to RHEL that I remembered. Your examples from the tree are
good evidence.

> 
> GlobalProperty hw_compat_8_0[] = {
> { "migration", "multifd-flush-after-each-section", "on"},
> { TYPE_PCI_DEVICE, "x-pcie-ari-nextfn-1", "on" },
> { TYPE_VIRTIO_NET, "host_uso", "off"},
> { TYPE_VIRTIO_NET, "guest_uso4", "off"},
> { TYPE_VIRTIO_NET, "guest_uso6", "off"},
> };
> const size_t hw_compat_8_0_len = G_N_ELEMENTS(hw_compat_8_0);
> 
> GlobalProperty hw_compat_7_2[] = {
> { "e1000e", "migrate-timadj", "off" },
> { "virtio-mem", "x-early-migration", "false" },
> { "migration", "x-preempt-pre-7-2", "true" },
> { TYPE_PCI_DEVICE, "x-pcie-err-unc-mask", "off" },
> };
> const size_t hw_compat_7_2_len = G_N_ELEMENTS(hw_compat_7_2);
> [etc]
> 
>>> It's still odd to think that we can
>>> have scenarios of a non-migratable vfio device registering a migratable
>>> ramfb, and vice versa, but I suppose in the end it doesn't matter.  
>>
>> I do think it matters! For one, if migration is not possible with
>> vfio-pci-nohotplug, then how can QE (or anyone else) *test* the patch
>> (i.e. that it makes a difference)? In that case, the ramfb_setup() call
>> from vfio-pci-nohotplug should just open-code "false" for the
>> "migratable" parameter.
> 
> Some vfio devices support migration, most don't.  I was thinking
> ramfb_setup might be called with something like:
> 
>   (vdev->ramfb_migrate && vdev->enable_migration)
> 
> so that at least the ramfb migration state matches the device, but I
> think ultimately it only saves a little bit of overhead in registering
> the vmstate, either one not supporting migration should block migration.
> 
> Hmm, since enable_migration is auto/on/off, it seems like device
> realize should fail if set to 'on' and ramfb_migrate is false.  I think
> that's the only way the device options don't become self contradictory.
> Thanks,

... easy-looking migration patchset becomes quite complex; isn't that
the story with almost all QEMU work? :)

Thanks!
Laszlo

[PATCH v3 3/7] vhost-user: factor out "vhost_user_write_sync"

2023-10-02 Thread Laszlo Ersek

The tails of the "vhost_user_set_vring_addr" and "vhost_user_set_u64"
functions are now byte-for-byte identical. Factor the common tail out to a
new function called "vhost_user_write_sync".

This is purely refactoring -- no observable change.

Cc: "Michael S. Tsirkin"  (supporter:vhost)
Cc: Eugenio Perez Martin 
Cc: German Maglione 
Cc: Liu Jiang 
Cc: Sergio Lopez Pascual 
Cc: Stefano Garzarella 
Signed-off-by: Laszlo Ersek 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Stefano Garzarella 
Tested-by: Albert Esteve 
Reviewed-by: Eugenio Pérez 
---

Notes:
v3:

- pick up R-b from Eugenio, T-b from Albert

v2:

- pick up R-b's from Phil and Stefano

- rename "vhost_user_write_msg" to "vhost_user_write_sync" (in code and
  commit message) [Stefano]

 hw/virtio/vhost-user.c | 66 +---
 1 file changed, 28 insertions(+), 38 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 64eac317bfb2..1476b36f0a6e 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1320,10 +1320,35 @@ static int enforce_reply(struct vhost_dev *dev,
 return vhost_user_get_features(dev, );
 }
 
+/* Note: "msg->hdr.flags" may be modified. */
+static int vhost_user_write_sync(struct vhost_dev *dev, VhostUserMsg *msg,
+ bool wait_for_reply)
+{
+int ret;
+
+if (wait_for_reply) {
+bool reply_supported = virtio_has_feature(dev->protocol_features,
+  VHOST_USER_PROTOCOL_F_REPLY_ACK);
+if (reply_supported) {
+msg->hdr.flags |= VHOST_USER_NEED_REPLY_MASK;
+}
+}
+
+ret = vhost_user_write(dev, msg, NULL, 0);
+if (ret < 0) {
+return ret;
+}
+
+if (wait_for_reply) {
+return enforce_reply(dev, msg);
+}
+
+return 0;
+}
+
 static int vhost_user_set_vring_addr(struct vhost_dev *dev,
  struct vhost_vring_addr *addr)
 {
-int ret;
 VhostUserMsg msg = {
 .hdr.request = VHOST_USER_SET_VRING_ADDR,
 .hdr.flags = VHOST_USER_VERSION,
@@ -1337,24 +1362,7 @@ static int vhost_user_set_vring_addr(struct vhost_dev 
*dev,
  */
 bool wait_for_reply = addr->flags & (1 << VHOST_VRING_F_LOG);
 
-if (wait_for_reply) {
-bool reply_supported = virtio_has_feature(dev->protocol_features,
-  VHOST_USER_PROTOCOL_F_REPLY_ACK);
-if (reply_supported) {
-msg.hdr.flags |= VHOST_USER_NEED_REPLY_MASK;
-}
-}
-
-ret = vhost_user_write(dev, &msg, NULL, 0);
-if (ret < 0) {
-return ret;
-}
-
-if (wait_for_reply) {
-return enforce_reply(dev, &msg);
-}
-
-return 0;
+return vhost_user_write_sync(dev, &msg, wait_for_reply);
 }
 
 static int vhost_user_set_u64(struct vhost_dev *dev, int request, uint64_t u64,
@@ -1366,26 +1374,8 @@ static int vhost_user_set_u64(struct vhost_dev *dev, int 
request, uint64_t u64,
 .payload.u64 = u64,
 .hdr.size = sizeof(msg.payload.u64),
 };
-int ret;
 
-if (wait_for_reply) {
-bool reply_supported = virtio_has_feature(dev->protocol_features,
-  VHOST_USER_PROTOCOL_F_REPLY_ACK);
-if (reply_supported) {
-msg.hdr.flags |= VHOST_USER_NEED_REPLY_MASK;
-}
-}
-
-ret = vhost_user_write(dev, &msg, NULL, 0);
-if (ret < 0) {
-return ret;
-}
-
-if (wait_for_reply) {
-return enforce_reply(dev, &msg);
-}
-
-return 0;
+return vhost_user_write_sync(dev, &msg, wait_for_reply);
 }
 
 static int vhost_user_set_status(struct vhost_dev *dev, uint8_t status)

[PATCH v3 0/7] vhost-user: call VHOST_USER_SET_VRING_ENABLE synchronously

2023-10-02 Thread Laszlo Ersek

v2:

- http://mid.mail-archive.com/20230830134055.106812-1-lersek@redhat.com
- https://patchwork.ozlabs.org/project/qemu-devel/cover/20230830134055.106812-1-ler...@redhat.com/

v3 picks up tags from Phil, Eugenio and Albert, and updates the commit
message on patch#7 according to Eugenio's comments.

Retested.

Laszlo Ersek (7):
  vhost-user: strip superfluous whitespace
  vhost-user: tighten "reply_supported" scope in "set_vring_addr"
  vhost-user: factor out "vhost_user_write_sync"
  vhost-user: flatten "enforce_reply" into "vhost_user_write_sync"
  vhost-user: hoist "write_sync", "get_features", "get_u64"
  vhost-user: allow "vhost_set_vring" to wait for a reply
  vhost-user: call VHOST_USER_SET_VRING_ENABLE synchronously

 hw/virtio/vhost-user.c | 216 ++--
 1 file changed, 108 insertions(+), 108 deletions(-)


base-commit: 36e9aab3c569d4c9ad780473596e18479838d1aa

[PATCH v3 6/7] vhost-user: allow "vhost_set_vring" to wait for a reply

2023-10-02 Thread Laszlo Ersek

The "vhost_set_vring" function already centralizes the common parts of
"vhost_user_set_vring_num", "vhost_user_set_vring_base" and
"vhost_user_set_vring_enable". We'll want to allow some of those callers
to wait for a reply.

Therefore, rebase "vhost_set_vring" from just "vhost_user_write" to
"vhost_user_write_sync", exposing the "wait_for_reply" parameter.

This is purely refactoring -- there is no observable change. That's
because:

- all three callers pass in "false" for "wait_for_reply", which disables
  all logic in "vhost_user_write_sync" except the call to
  "vhost_user_write";

- the fds=NULL and fd_num=0 arguments of the original "vhost_user_write"
  call inside "vhost_set_vring" are hard-coded within
  "vhost_user_write_sync".

Cc: "Michael S. Tsirkin"  (supporter:vhost)
Cc: Eugenio Perez Martin 
Cc: German Maglione 
Cc: Liu Jiang 
Cc: Sergio Lopez Pascual 
Cc: Stefano Garzarella 
Signed-off-by: Laszlo Ersek 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Stefano Garzarella 
Tested-by: Albert Esteve 
Reviewed-by: Eugenio Pérez 
---

Notes:
v3:

- pick up R-b from Eugenio, T-b from Albert

v2:

- pick up R-b's from Phil and Stefano

- rename "vhost_user_write_msg" to "vhost_user_write_sync" (in code and
  commit message) [Stefano]

 hw/virtio/vhost-user.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index c79b6f77cdca..18e15a9bb359 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1170,7 +1170,8 @@ static int vhost_user_write_sync(struct vhost_dev *dev, VhostUserMsg *msg,
 
 static int vhost_set_vring(struct vhost_dev *dev,
unsigned long int request,
-   struct vhost_vring_state *ring)
+   struct vhost_vring_state *ring,
+   bool wait_for_reply)
 {
 VhostUserMsg msg = {
 .hdr.request = request,
@@ -1179,13 +1180,13 @@ static int vhost_set_vring(struct vhost_dev *dev,
 .hdr.size = sizeof(msg.payload.state),
 };
 
-return vhost_user_write(dev, &msg, NULL, 0);
+return vhost_user_write_sync(dev, &msg, wait_for_reply);
 }
 
 static int vhost_user_set_vring_num(struct vhost_dev *dev,
 struct vhost_vring_state *ring)
 {
-return vhost_set_vring(dev, VHOST_USER_SET_VRING_NUM, ring);
+return vhost_set_vring(dev, VHOST_USER_SET_VRING_NUM, ring, false);
 }
 
 static void vhost_user_host_notifier_free(VhostUserHostNotifier *n)
@@ -1216,7 +1217,7 @@ static void vhost_user_host_notifier_remove(VhostUserHostNotifier *n,
 static int vhost_user_set_vring_base(struct vhost_dev *dev,
  struct vhost_vring_state *ring)
 {
-return vhost_set_vring(dev, VHOST_USER_SET_VRING_BASE, ring);
+return vhost_set_vring(dev, VHOST_USER_SET_VRING_BASE, ring, false);
 }
 
 static int vhost_user_set_vring_enable(struct vhost_dev *dev, int enable)
@@ -1234,7 +1235,7 @@ static int vhost_user_set_vring_enable(struct vhost_dev *dev, int enable)
 .num   = enable,
 };
 
-ret = vhost_set_vring(dev, VHOST_USER_SET_VRING_ENABLE, &state);
+ret = vhost_set_vring(dev, VHOST_USER_SET_VRING_ENABLE, &state, false);
 if (ret < 0) {
 /*
  * Restoring the previous state is likely infeasible, as well as

[PATCH v3 7/7] vhost-user: call VHOST_USER_SET_VRING_ENABLE synchronously

2023-10-02 Thread Laszlo Ersek

ompletes*. That way OVMF's VCPU cannot advance to the FUSE_INIT
submission before virtiofsd's control plane processor takes notice of the
queue being enabled.

Wait for VHOST_USER_SET_VRING_ENABLE completion by:

- setting the NEED_REPLY flag on VHOST_USER_SET_VRING_ENABLE, and waiting
  for the reply, if the VHOST_USER_PROTOCOL_F_REPLY_ACK vhost-user feature
  has been negotiated, or

- performing a separate VHOST_USER_GET_FEATURES *exchange*, which requires
  a backend response regardless of VHOST_USER_PROTOCOL_F_REPLY_ACK.
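
A minimal sketch of the two waiting strategies listed above, written as a self-contained C model rather than as QEMU code. The names "mock_backend", "mock_msg", "mock_write_sync" and "MOCK_NEED_REPLY" are hypothetical, and the counters stand in for real unix domain socket traffic; only the decision logic follows the description in this commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit; stands in for the NEED_REPLY header flag. */
#define MOCK_NEED_REPLY 0x8u

/* Hypothetical backend model: counts traffic instead of using a socket. */
struct mock_backend {
    bool reply_ack_negotiated; /* models VHOST_USER_PROTOCOL_F_REPLY_ACK */
    unsigned writes;           /* messages sent to the backend */
    unsigned acks_read;        /* explicit replies consumed */
    unsigned feature_queries;  /* GET_FEATURES-style round trips */
};

struct mock_msg {
    uint32_t flags;
};

/* Models a fire-and-forget write; always succeeds in this sketch. */
static int mock_write(struct mock_backend *be, const struct mock_msg *msg)
{
    (void)msg;
    be->writes++;
    return 0;
}

/*
 * Models the synchronous write: with wait_for_reply=false it degenerates
 * to a plain write; with wait_for_reply=true it returns only after the
 * backend has answered something.
 */
static int mock_write_sync(struct mock_backend *be, struct mock_msg *msg,
                           bool wait_for_reply)
{
    int ret;

    if (wait_for_reply && be->reply_ack_negotiated) {
        msg->flags |= MOCK_NEED_REPLY;
    }

    ret = mock_write(be, msg);
    if (ret < 0) {
        return ret;
    }

    if (!wait_for_reply) {
        return 0;
    }

    if (msg->flags & MOCK_NEED_REPLY) {
        /* Strategy 1: the backend acknowledges this very message. */
        be->acks_read++;
        return 0;
    }

    /*
     * Strategy 2: no acknowledgement support, so perform a separate
     * exchange that every backend must answer.
     */
    struct mock_msg query = { 0 };

    ret = mock_write(be, &query);
    if (ret < 0) {
        return ret;
    }
    be->feature_queries++;
    return 0;
}
```

In this model the fallback costs one extra message, but either path guarantees that the backend has processed the original message before the caller continues.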

Cc: "Michael S. Tsirkin"  (supporter:vhost)
Cc: Eugenio Perez Martin 
Cc: German Maglione 
Cc: Liu Jiang 
Cc: Sergio Lopez Pascual 
Cc: Stefano Garzarella 
Signed-off-by: Laszlo Ersek 
Reviewed-by: Stefano Garzarella 
Tested-by: Albert Esteve 
[ler...@redhat.com: work Eugenio's explanation into the commit message,
 about QEMU containing step (1.1) until step (1.2)]
Reviewed-by: Eugenio Pérez 
---

Notes:
v3:

- pick up R-b from Eugenio, T-b from Albert

- clarify commit message (also give permanent credit for the
  clarification; I feel the change is important enough) [Eugenio]

v2:

- pick up R-b from Stefano

- update virtio spec reference from 1.0 to 1.2 (also keep the 1.0 ref)
  in the commit message; re-check the quotes / section headers [Stefano]

- summarize commit message in code comment [Stefano]

 hw/virtio/vhost-user.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 18e15a9bb359..41842eb023b5 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1235,7 +1235,21 @@ static int vhost_user_set_vring_enable(struct vhost_dev *dev, int enable)
 .num   = enable,
 };
 
-ret = vhost_set_vring(dev, VHOST_USER_SET_VRING_ENABLE, &state, false);
+/*
+ * SET_VRING_ENABLE travels from guest to QEMU to vhost-user backend /
+ * control plane thread via unix domain socket. Virtio requests travel
+ * from guest to vhost-user backend / data plane thread via eventfd.
+ * Even if the guest enables the ring first, and pushes its first virtio
+ * request second (conforming to the virtio spec), the data plane thread
+ * in the backend may see the virtio request before the control plane
+ * thread sees the queue enablement. This causes (in fact, requires) the
+ * data plane thread to discard the virtio request (it arrived on a
+ * seemingly disabled queue). To prevent this out-of-order delivery,
+ * don't let the guest proceed to pushing the virtio request until the
+ * backend control plane acknowledges enabling the queue -- IOW, pass
+ * wait_for_reply=true below.
+ */
+ret = vhost_set_vring(dev, VHOST_USER_SET_VRING_ENABLE, &state, true);
 if (ret < 0) {
 /*
  * Restoring the previous state is likely infeasible, as well as

[PATCH v3 1/7] vhost-user: strip superfluous whitespace

2023-10-02 Thread Laszlo Ersek

Cc: "Michael S. Tsirkin"  (supporter:vhost)
Cc: Eugenio Perez Martin 
Cc: German Maglione 
Cc: Liu Jiang 
Cc: Sergio Lopez Pascual 
Cc: Stefano Garzarella 
Signed-off-by: Laszlo Ersek 
Reviewed-by: Stefano Garzarella 
Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Albert Esteve 
Reviewed-by: Eugenio Pérez 
---

Notes:
v3:

- pick up R-b from Phil and Eugenio, T-b from Albert

v2:

- pick up Stefano's R-b

 hw/virtio/vhost-user.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 8dcf049d422b..b4b677c1ce66 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -398,7 +398,7 @@ static int vhost_user_write(struct vhost_dev *dev, VhostUserMsg *msg,
  * operations such as configuring device memory mappings or issuing device
  * resets, which affect the whole device instead of individual VQs,
  * vhost-user messages should only be sent once.
- * 
+ *
  * Devices with multiple vhost_devs are given an associated dev->vq_index
  * so per_device requests are only sent if vq_index is 0.
  */

[PATCH v3 2/7] vhost-user: tighten "reply_supported" scope in "set_vring_addr"

2023-10-02 Thread Laszlo Ersek

In the vhost_user_set_vring_addr() function, we calculate
"reply_supported" unconditionally, even though we'll only need it if
"wait_for_reply" is also true.

Restrict the scope of "reply_supported" to the minimum.

This is purely refactoring -- no observable change.

Cc: "Michael S. Tsirkin"  (supporter:vhost)
Cc: Eugenio Perez Martin 
Cc: German Maglione 
Cc: Liu Jiang 
Cc: Sergio Lopez Pascual 
Cc: Stefano Garzarella 
Signed-off-by: Laszlo Ersek 
Reviewed-by: Stefano Garzarella 
Tested-by: Albert Esteve 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Eugenio Pérez 
---

Notes:
v3:

- pick up R-b from Phil and Eugenio, T-b from Albert

v2:

- pick up Stefano's R-b

 hw/virtio/vhost-user.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index b4b677c1ce66..64eac317bfb2 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1331,17 +1331,18 @@ static int vhost_user_set_vring_addr(struct vhost_dev *dev,
 .hdr.size = sizeof(msg.payload.addr),
 };
 
-bool reply_supported = virtio_has_feature(dev->protocol_features,
-  VHOST_USER_PROTOCOL_F_REPLY_ACK);
-
 /*
  * wait for a reply if logging is enabled to make sure
  * backend is actually logging changes
  */
 bool wait_for_reply = addr->flags & (1 << VHOST_VRING_F_LOG);
 
-if (reply_supported && wait_for_reply) {
-msg.hdr.flags |= VHOST_USER_NEED_REPLY_MASK;
+if (wait_for_reply) {
+bool reply_supported = virtio_has_feature(dev->protocol_features,
+  VHOST_USER_PROTOCOL_F_REPLY_ACK);
+if (reply_supported) {
+msg.hdr.flags |= VHOST_USER_NEED_REPLY_MASK;
+}
 }
 
 ret = vhost_user_write(dev, &msg, NULL, 0);

[PATCH v3 4/7] vhost-user: flatten "enforce_reply" into "vhost_user_write_sync"

2023-10-02 Thread Laszlo Ersek

At this point, only "vhost_user_write_sync" calls "enforce_reply"; embed
the latter into the former.

This is purely refactoring -- no observable change.

Cc: "Michael S. Tsirkin"  (supporter:vhost)
Cc: Eugenio Perez Martin 
Cc: German Maglione 
Cc: Liu Jiang 
Cc: Sergio Lopez Pascual 
Cc: Stefano Garzarella 
Signed-off-by: Laszlo Ersek 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Stefano Garzarella 
Tested-by: Albert Esteve 
Reviewed-by: Eugenio Pérez 
---

Notes:
v3:

- pick up R-b from Eugenio, T-b from Albert

v2:

- pick up R-b's from Phil and Stefano

- rename "vhost_user_write_msg" to "vhost_user_write_sync" (in code
  context and commit message) [Stefano]

 hw/virtio/vhost-user.c | 32 
 1 file changed, 13 insertions(+), 19 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 1476b36f0a6e..4129ba72e408 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1302,24 +1302,6 @@ static int vhost_user_get_features(struct vhost_dev *dev, uint64_t *features)
 return 0;
 }
 
-static int enforce_reply(struct vhost_dev *dev,
- const VhostUserMsg *msg)
-{
-uint64_t dummy;
-
-if (msg->hdr.flags & VHOST_USER_NEED_REPLY_MASK) {
-return process_message_reply(dev, msg);
-}
-
-   /*
-* We need to wait for a reply but the backend does not
-* support replies for the command we just sent.
-* Send VHOST_USER_GET_FEATURES which makes all backends
-* send a reply.
-*/
-return vhost_user_get_features(dev, &dummy);
-}
-
 /* Note: "msg->hdr.flags" may be modified. */
 static int vhost_user_write_sync(struct vhost_dev *dev, VhostUserMsg *msg,
  bool wait_for_reply)
@@ -1340,7 +1322,19 @@ static int vhost_user_write_sync(struct vhost_dev *dev, VhostUserMsg *msg,
 }
 
 if (wait_for_reply) {
-return enforce_reply(dev, msg);
+uint64_t dummy;
+
+if (msg->hdr.flags & VHOST_USER_NEED_REPLY_MASK) {
+return process_message_reply(dev, msg);
+}
+
+   /*
+* We need to wait for a reply but the backend does not
+* support replies for the command we just sent.
+* Send VHOST_USER_GET_FEATURES which makes all backends
+* send a reply.
+*/
+return vhost_user_get_features(dev, &dummy);
 }
 
 return 0;

[PATCH v3 5/7] vhost-user: hoist "write_sync", "get_features", "get_u64"

2023-10-02 Thread Laszlo Ersek

In order to avoid a forward-declaration for "vhost_user_write_sync" in a
subsequent patch, hoist "vhost_user_write_sync" ->
"vhost_user_get_features" -> "vhost_user_get_u64" just above
"vhost_set_vring".

This is purely code movement -- no observable change.

Cc: "Michael S. Tsirkin"  (supporter:vhost)
Cc: Eugenio Perez Martin 
Cc: German Maglione 
Cc: Liu Jiang 
Cc: Sergio Lopez Pascual 
Cc: Stefano Garzarella 
Signed-off-by: Laszlo Ersek 
Reviewed-by: Stefano Garzarella 
Tested-by: Albert Esteve 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Eugenio Pérez 
---

Notes:
v3:

- pick up R-b from Phil and Eugenio, T-b from Albert

v2:

- pick up R-b from Stefano

- rename "vhost_user_write_msg" to "vhost_user_write_sync" (in code and
  commit message) [Stefano]

 hw/virtio/vhost-user.c | 170 ++--
 1 file changed, 85 insertions(+), 85 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 4129ba72e408..c79b6f77cdca 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1083,6 +1083,91 @@ static int vhost_user_set_vring_endian(struct vhost_dev *dev,
 return vhost_user_write(dev, &msg, NULL, 0);
 }
 
+static int vhost_user_get_u64(struct vhost_dev *dev, int request, uint64_t *u64)
+{
+int ret;
+VhostUserMsg msg = {
+.hdr.request = request,
+.hdr.flags = VHOST_USER_VERSION,
+};
+
+if (vhost_user_per_device_request(request) && dev->vq_index != 0) {
+return 0;
+}
+
+ret = vhost_user_write(dev, &msg, NULL, 0);
+if (ret < 0) {
+return ret;
+}
+
+ret = vhost_user_read(dev, &msg);
+if (ret < 0) {
+return ret;
+}
+
+if (msg.hdr.request != request) {
+error_report("Received unexpected msg type. Expected %d received %d",
+ request, msg.hdr.request);
+return -EPROTO;
+}
+
+if (msg.hdr.size != sizeof(msg.payload.u64)) {
+error_report("Received bad msg size.");
+return -EPROTO;
+}
+
+*u64 = msg.payload.u64;
+
+return 0;
+}
+
+static int vhost_user_get_features(struct vhost_dev *dev, uint64_t *features)
+{
+if (vhost_user_get_u64(dev, VHOST_USER_GET_FEATURES, features) < 0) {
+return -EPROTO;
+}
+
+return 0;
+}
+
+/* Note: "msg->hdr.flags" may be modified. */
+static int vhost_user_write_sync(struct vhost_dev *dev, VhostUserMsg *msg,
+ bool wait_for_reply)
+{
+int ret;
+
+if (wait_for_reply) {
+bool reply_supported = virtio_has_feature(dev->protocol_features,
+  VHOST_USER_PROTOCOL_F_REPLY_ACK);
+if (reply_supported) {
+msg->hdr.flags |= VHOST_USER_NEED_REPLY_MASK;
+}
+}
+
+ret = vhost_user_write(dev, msg, NULL, 0);
+if (ret < 0) {
+return ret;
+}
+
+if (wait_for_reply) {
+uint64_t dummy;
+
+if (msg->hdr.flags & VHOST_USER_NEED_REPLY_MASK) {
+return process_message_reply(dev, msg);
+}
+
+   /*
+* We need to wait for a reply but the backend does not
+* support replies for the command we just sent.
+* Send VHOST_USER_GET_FEATURES which makes all backends
+* send a reply.
+*/
+return vhost_user_get_features(dev, &dummy);
+}
+
+return 0;
+}
+
 static int vhost_set_vring(struct vhost_dev *dev,
unsigned long int request,
struct vhost_vring_state *ring)
@@ -1255,91 +1340,6 @@ static int vhost_user_set_vring_err(struct vhost_dev *dev,
 return vhost_set_vring_file(dev, VHOST_USER_SET_VRING_ERR, file);
 }
 
-static int vhost_user_get_u64(struct vhost_dev *dev, int request, uint64_t *u64)
-{
-int ret;
-VhostUserMsg msg = {
-.hdr.request = request,
-.hdr.flags = VHOST_USER_VERSION,
-};
-
-if (vhost_user_per_device_request(request) && dev->vq_index != 0) {
-return 0;
-}
-
-ret = vhost_user_write(dev, &msg, NULL, 0);
-if (ret < 0) {
-return ret;
-}
-
-ret = vhost_user_read(dev, &msg);
-if (ret < 0) {
-return ret;
-}
-
-if (msg.hdr.request != request) {
-error_report("Received unexpected msg type. Expected %d received %d",
- request, msg.hdr.request);
-return -EPROTO;
-}
-
-if (msg.hdr.size != sizeof(msg.payload.u64)) {
-error_report("Received bad msg size.");
-return -EPROTO;
-}
-
-*u64 = msg.payload.u64;
-
-return 0;
-}
-
-static int vhost_user_get_features(struct vhost_dev *dev, uint64_t *features)
-{
-if (vhost_user_get_u64(dev, VHOST_USER_GET_FEATURES, features) < 0) {
-return -EPROTO;
-}
-
-return 0;
-}

Re: [PATCH v2 1/7] vhost-user: strip superfluous whitespace

2023-10-02 Thread Laszlo Ersek

On 9/6/23 09:12, Albert Esteve wrote:
> 
> 
> On Thu, Aug 31, 2023 at 9:14 AM Laszlo Ersek <ler...@redhat.com> wrote:
> 
> On 8/30/23 15:40, Laszlo Ersek wrote:
> > Cc: "Michael S. Tsirkin" <m...@redhat.com> (supporter:vhost)
> > Cc: Eugenio Perez Martin <epere...@redhat.com>
> > Cc: German Maglione <gmagli...@redhat.com>
> > Cc: Liu Jiang <ge...@linux.alibaba.com>
> > Cc: Sergio Lopez Pascual <s...@redhat.com>
> > Cc: Stefano Garzarella <sgarz...@redhat.com>
> > Signed-off-by: Laszlo Ersek <ler...@redhat.com>
> > Reviewed-by: Stefano Garzarella <sgarz...@redhat.com>
> > ---
> >
> > Notes:
> >     v2:
> >     
> >     - pick up Stefano's R-b
> >
> >  hw/virtio/vhost-user.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> This has been
> 
> Reviewed-by: Philippe Mathieu-Daudé <phi...@linaro.org>
> 
> under the (identical) v1 posting:
> 
> 
> http://mid.mail-archive.com/cd0604a1-6826-ac6c-6c47-dcb6def64b28@linaro.org
> 
> Thanks, Phil! (and sorry that I posted v2 too quickly -- I forgot that
> sometimes reviewers split a review over multiple days.)
> 
> Laszlo
> 
> >
> > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > index 8dcf049d422b..b4b677c1ce66 100644
> > --- a/hw/virtio/vhost-user.c
> > +++ b/hw/virtio/vhost-user.c
> > @@ -398,7 +398,7 @@ static int vhost_user_write(struct vhost_dev
> *dev, VhostUserMsg *msg,
> >       * operations such as configuring device memory mappings or
> issuing device
> >       * resets, which affect the whole device instead of
> individual VQs,
> >       * vhost-user messages should only be sent once.
> > -     *
> > +     *
> >       * Devices with multiple vhost_devs are given an associated
> dev->vq_index
> >       * so per_device requests are only sent if vq_index is 0.
> >       */
> >
> 
> 
> Thanks for the series!
> I had a timeout problem with a virtio device I am developing, and I was
> not sure yet what was going on.
> Your description of the problem seemed to fit mine, in my case the
> driver sent a command through the data plane
> right after the feature negotiation that reached the backend too soon.
> Adding delays alleviated the issue, so it
> already hinted me to a race condition.
> 
> I tested using this patch series and according to my experiments, it
> really lowers the chances to get the deadlock.
> Sadly, I do still get the issue sometimes, though (not frequently)...
> However, I think probably the solution comes not
> from the Qemu side, but from the rust-vmm vhost-user-backend crate. I am
> looking for that solution on my side.
> 
> But that does not invalidate this patch, which I think is a necessary
> improvement, and in my tests it really
> helps a lot with the described issue. Therefore:
> 
> Tested-by: Albert Esteve <aest...@redhat.com>

Thanks again -- I'm picking this up for the whole series.

Laszlo

> 
> BR,
> Albert

Re: [PATCH v2 4/5] ramfb: make migration conditional

2023-10-02 Thread Laszlo Ersek

On 10/2/23 21:26, Alex Williamson wrote:
> On Mon, 2 Oct 2023 20:24:11 +0200
> Laszlo Ersek  wrote:
> 
>> On 10/2/23 16:41, Alex Williamson wrote:
>>> On Mon, 2 Oct 2023 15:38:10 +0200
>>> Cédric Le Goater  wrote:
>>>   
>>>> On 10/2/23 13:11, marcandre.lur...@redhat.com wrote:  
>>>>> From: Marc-André Lureau 
>>>>>
>>>>> RAMFB migration was unsupported until now, let's make it conditional.
>>>>> The following patch will prevent machines <= 8.1 to migrate it.
>>>>>
>>>>> Signed-off-by: Marc-André Lureau 
>>>> Maybe localize the new 'ramfb_migrate' attribute close to 'enable_ramfb'
>>>> in VFIOPCIDevice. Anyhow,  
>>>
>>> Shouldn't this actually be tied to whether the device is migratable
>>> (which for GVT-g - the only ramfb user afaik - it's not)?  What does it
>>> mean to have a ramfb-migrate=true property on a device that doesn't
>>> support migration, or false on a device that does support migration.  I
>>> don't understand why this is a user controllable property.  Thanks,  
>>
>> The comments in <https://bugzilla.redhat.com/show_bug.cgi?id=1859424>
>> (which are unfortunately not public :/ ) suggest that ramfb migration
>> was simply forgotten when vGPU migration was implemented. So, "now
>> that vGPU migration is done", this should be added.
>>
>> Comment 8 suggests that the following domain XML snippet
>>
>> > model='vfio-pci' display='on' ramfb='on'> 
>> 
>>   
>>   
>>   > function='0x0'/> 
>>
>> is migratable, but the ramfb device malfunctions on the destination
>> host.
>>
>> There's also a huge QEMU cmdline in comment#0 of the bug; I've not
>> tried to read that.
>>
>> AIUI BTW the property is not for the user to control, it's just a
>> compat knob for versioned machine types. AIUI those are usually
>> implemented with such (user-visible / -tweakable) device properties.
> 
> If it's not for user control it's unfortunate that we expose it to the
> user at all, but should it at least use the "x-" prefix to indicate that
> it's not intended to be an API?

I *think* it was your commit db32d0f43839 ("vfio/pci: Add option to
disable GeForce quirks", 2018-02-06) that had introduced me to the "x-"
prefixed properties!

For some reason though, machine type compat knobs are never named like
that, AFAIR.

> It's still odd to think that we can
> have scenarios of a non-migratable vfio device registering a migratable
> ramfb, and vice versa, but I suppose in the end it doesn't matter.

I do think it matters! For one, if migration is not possible with
vfio-pci-nohotplug, then how can QE (or anyone else) *test* the patch
(i.e. that it makes a difference)? In that case, the ramfb_setup() call
from vfio-pci-nohotplug should just open-code "false" for the
"migratable" parameter.

But, more importantly, I think either we're missing something about RHBZ
1859424, or that use case is just plain wrong. Gerd, any comments perhaps?

Migration certainly makes sense for ramfb-standalone though.

Laszlo
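
To illustrate the "compat knob" idea discussed in this reply, here is a minimal, self-contained C sketch of how a versioned machine type could override a device property default. This is an assumption-laden model, not QEMU's actual compat machinery: "compat_prop", "compat_8_1" and "bool_prop" are hypothetical names, and the table contents are only guessed from the property names mentioned in this thread ("migrate" on ramfb, "ramfb-migrate" on vfio-pci-nohotplug); the patch that adds the real entries is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical (driver, property, value) triple pinned by a machine type. */
struct compat_prop {
    const char *driver;
    const char *property;
    const char *value;
};

/* Assumed overrides for machine types <= 8.1: keep ramfb non-migratable. */
static const struct compat_prop compat_8_1[] = {
    { "ramfb", "migrate", "off" },
    { "vfio-pci-nohotplug", "ramfb-migrate", "off" },
};

/*
 * Return the effective boolean value of a property: the machine type's
 * override if the list contains one, otherwise the device's own default.
 */
static bool bool_prop(const struct compat_prop *list, size_t count,
                      const char *driver, const char *property,
                      bool device_default)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(list[i].driver, driver) == 0 &&
            strcmp(list[i].property, property) == 0) {
            return strcmp(list[i].value, "on") == 0;
        }
    }
    return device_default;
}
```

Under this model, a new machine type passes an empty list and gets the device default (migration on), while an old machine type keeps the previous behavior. The property is user-visible only as a side effect of being settable this way.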

Re: [PATCH v2 4/5] ramfb: make migration conditional

2023-10-02 Thread Laszlo Ersek

On 10/2/23 16:41, Alex Williamson wrote:
> On Mon, 2 Oct 2023 15:38:10 +0200
> Cédric Le Goater  wrote:
> 
>> On 10/2/23 13:11, marcandre.lur...@redhat.com wrote:
>>> From: Marc-André Lureau 
>>>
>>> RAMFB migration was unsupported until now, let's make it conditional.
>>> The following patch will prevent machines <= 8.1 to migrate it.
>>>
>>> Signed-off-by: Marc-André Lureau   
>> Maybe localize the new 'ramfb_migrate' attribute close to 'enable_ramfb'
>> in VFIOPCIDevice. Anyhow,
> 
> Shouldn't this actually be tied to whether the device is migratable
> (which for GVT-g - the only ramfb user afaik - it's not)?  What does it
> mean to have a ramfb-migrate=true property on a device that doesn't
> support migration, or false on a device that does support migration.  I
> don't understand why this is a user controllable property.  Thanks,

The comments in <https://bugzilla.redhat.com/show_bug.cgi?id=1859424>
(which are unfortunately not public :/ ) suggest that ramfb migration
was simply forgotten when vGPU migration was implemented. So, "now
that vGPU migration is done", this should be added.

Comment 8 suggests that the following domain XML snippet


  

  
  
  


is migratable, but the ramfb device malfunctions on the destination host.

There's also a huge QEMU cmdline in comment#0 of the bug; I've not tried to 
read that.

AIUI BTW the property is not for the user to control, it's just a compat knob 
for versioned machine types. AIUI those are usually implemented with such 
(user-visible / -tweakable) device properties.

Laszlo

> 
> Alex
> 
>>> ---
>>>   hw/vfio/pci.h | 1 +
>>>   include/hw/display/ramfb.h| 2 +-
>>>   hw/display/ramfb-standalone.c | 8 +++-
>>>   hw/display/ramfb.c| 6 --
>>>   hw/vfio/display.c | 4 ++--
>>>   hw/vfio/pci.c | 1 +
>>>   stubs/ramfb.c | 2 +-
>>>   7 files changed, 17 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
>>> index 2d836093a8..671cc78912 100644
>>> --- a/hw/vfio/pci.h
>>> +++ b/hw/vfio/pci.h
>>> @@ -156,6 +156,7 @@ struct VFIOPCIDevice {
>>>   OnOffAuto display;
>>>   uint32_t display_xres;
>>>   uint32_t display_yres;
>>> +bool ramfb_migrate;
>>>   int32_t bootindex;
>>>   uint32_t igd_gms;
>>>   OffAutoPCIBAR msix_relo;
>>> diff --git a/include/hw/display/ramfb.h b/include/hw/display/ramfb.h
>>> index b33a2c467b..40063b62bd 100644
>>> --- a/include/hw/display/ramfb.h
>>> +++ b/include/hw/display/ramfb.h
>>> @@ -4,7 +4,7 @@
>>>   /* ramfb.c */
>>>   typedef struct RAMFBState RAMFBState;
>>>   void ramfb_display_update(QemuConsole *con, RAMFBState *s);
>>> -RAMFBState *ramfb_setup(Error **errp);
>>> +RAMFBState *ramfb_setup(bool migrate, Error **errp);
>>>   
>>>   /* ramfb-standalone.c */
>>>   #define TYPE_RAMFB_DEVICE "ramfb"
>>> diff --git a/hw/display/ramfb-standalone.c b/hw/display/ramfb-standalone.c
>>> index 8c0094397f..6bbd69ccdf 100644
>>> --- a/hw/display/ramfb-standalone.c
>>> +++ b/hw/display/ramfb-standalone.c
>>> @@ -15,6 +15,7 @@ struct RAMFBStandaloneState {
>>>   SysBusDevice parent_obj;
>>>   QemuConsole *con;
>>>   RAMFBState *state;
>>> +bool migrate;
>>>   };
>>>   
>>>   static void display_update_wrapper(void *dev)
>>> @@ -37,9 +38,13 @@ static void ramfb_realizefn(DeviceState *dev, Error **errp)
>>>   RAMFBStandaloneState *ramfb = RAMFB(dev);
>>>   
>>>   ramfb->con = graphic_console_init(dev, 0, &wrapper_ops, dev);
>>> -ramfb->state = ramfb_setup(errp);
>>> +ramfb->state = ramfb_setup(ramfb->migrate, errp);
>>>   }
>>>   
>>> +static Property ramfb_properties[] = {
>>> +DEFINE_PROP_BOOL("migrate", RAMFBStandaloneState, migrate,  true),
>>> +DEFINE_PROP_END_OF_LIST(),
>>> +};
>>>   static void ramfb_class_initfn(ObjectClass *klass, void *data)
>>>   {
>>>   DeviceClass *dc = DEVICE_CLASS(klass);
>>> @@ -48,6 +53,7 @@ static void ramfb_class_initfn(ObjectClass *klass, void *data)
>>>   dc->realize = ramfb_realizefn;
>>>   dc->desc = "ram framebuffer standalone device";
>>>   dc->user_creatable = true;
>>> +device_class_set_props(dc, ramfb_properties);
>>>   }
>>>   
>>>   static const TypeInfo ramfb_info = {
>>> diff --git a/hw/display/ramfb.c b/hw/display/ramfb.c
>>> index 47d653..73e08d605f 100644
>>> --- a/hw/display/ramfb.c
>>> +++ b/hw/display/ramfb.c
>>> @@ -135,7 +135,7 @@ static const VMStateDescription vmstate_ramfb = {
>>>   }
>>>   };
>>>   
>>> -RAMFBState *ramfb_setup(Error **errp)
>>> +RAMFBState *ramfb_setup(bool migrate, Error **errp)
>>>   {
>>>   FWCfgState *fw_cfg = fw_cfg_find();
>>>   RAMFBState *s;
>>> @@ -147,7 +147,9 @@ RAMFBState *ramfb_setup(Error **errp)
>>>   
>>>   s = g_new0(RAMFBState, 1);
>>>   
>>> -vmstate_register(NULL, 0, &vmstate_ramfb, s);
>>> +if (migrate) {
>>> +vmstate_register(NULL, 0, &vmstate_ramfb, s);
>>> +}
>>>

Re: [PATCH] hw/display/ramfb: plug slight guest-triggerable leak on mode setting

2023-10-02 Thread Laszlo Ersek

On 10/1/23 00:14, Laszlo Ersek wrote:
> On 9/29/23 13:17, Marc-André Lureau wrote:
>> Hi
>>
>> On Wed, Sep 27, 2023 at 7:46 PM Laszlo Ersek  wrote:
>>>
>>> On 9/19/23 15:19, Laszlo Ersek wrote:
>>>> The fw_cfg DMA write callback in ramfb prepares a new display surface in
>>>> QEMU; this new surface is put to use ("swapped in") upon the next display
>>>> update. At that time, the old surface (if any) is released.
>>>>
>>>> If the guest triggers the fw_cfg DMA write callback at least twice between
>>>> two adjacent display updates, then the second callback (and further such
>>>> callbacks) will leak the previously prepared (but not yet swapped in)
>>>> display surface.
>>>>
>>>> The issue can be shown by:
>>>>
>>>> (1) starting QEMU with "-trace displaysurface_free", and
>>>>
>>>> (2) running the following program in the guest UEFI shell:
>>>>
>>>>> #include <Library/ShellCEntryLib.h>           // ShellAppMain()
>>>>> #include <Library/UefiBootServicesTableLib.h> // gBS
>>>>> #include <Protocol/GraphicsOutput.h>          // EFI_GRAPHICS_OUTPUT_PROTOCOL
>>>>>
>>>>> INTN
>>>>> EFIAPI
>>>>> ShellAppMain (
>>>>>   IN UINTN   Argc,
>>>>>   IN CHAR16  **Argv
>>>>>   )
>>>>> {
>>>>>   EFI_STATUSStatus;
>>>>>   VOID  *Interface;
>>>>>   EFI_GRAPHICS_OUTPUT_PROTOCOL  *Gop;
>>>>>   UINT32Mode;
>>>>>
>>>>>   Status = gBS->LocateProtocol (
>>>>>   &gEfiGraphicsOutputProtocolGuid,
>>>>>   NULL,
>>>>>   &Interface
>>>>>   );
>>>>>   if (EFI_ERROR (Status)) {
>>>>> return 1;
>>>>>   }
>>>>>
>>>>>   Gop = Interface;
>>>>>
>>>>>   Mode = 1;
>>>>>   for ( ; ;) {
>>>>> Status = Gop->SetMode (Gop, Mode);
>>>>> if (EFI_ERROR (Status)) {
>>>>>   break;
>>>>> }
>>>>>
>>>>> Mode = 1 - Mode;
>>>>>   }
>>>>>
>>>>>   return 1;
>>>>> }
>>>>
>>>> The symptom is then that:
>>>>
>>>> - only one trace message appears periodically,
>>>>
>>>> - the time between adjacent messages keeps increasing -- implying that
>>>>   some list structure (containing the leaked resources) keeps growing,
>>>>
>>>> - the "surface" pointer is ever different.
>>>>
>>>>> 18566@1695127471.449586:displaysurface_free surface=0x7f2fcc09a7c0
>>>>> 18566@1695127471.529559:displaysurface_free surface=0x7f2fcc9dac10
>>>>> 18566@1695127471.659812:displaysurface_free surface=0x7f2fcc441dd0
>>>>> 18566@1695127471.839669:displaysurface_free surface=0x7f2fcc0363d0
>>>>> 18566@1695127472.069674:displaysurface_free surface=0x7f2fcc413a80
>>>>> 18566@1695127472.349580:displaysurface_free surface=0x7f2fcc09cd00
>>>>> 18566@1695127472.679783:displaysurface_free surface=0x7f2fcc1395f0
>>>>> 18566@1695127473.059848:displaysurface_free surface=0x7f2fcc1cae50
>>>>> 18566@1695127473.489724:displaysurface_free surface=0x7f2fcc42fc50
>>>>> 18566@1695127473.969791:displaysurface_free surface=0x7f2fcc45dcc0
>>>>> 18566@1695127474.499708:displaysurface_free surface=0x7f2fcc70b9d0
>>>>> 18566@1695127475.079769:displaysurface_free surface=0x7f2fcc82acc0
>>>>> 18566@1695127475.709941:displaysurface_free surface=0x7f2fcc369c00
>>>>> 18566@1695127476.389619:displaysurface_free surface=0x7f2fcc32b910
>>>>> 18566@1695127477.119772:displaysurface_free surface=0x7f2fcc0d5a20
>>>>> 18566@1695127477.899517:displaysurface_free surface=0x7f2fcc086c40
>>>>> 18566@1695127478.729962:displaysurface_free surface=0x7f2fccc72020
>>>>> 18566@1695127479.609839:displaysurface_free surface=0x7f2fcc185160
>>>>> 18566@1695127480.539688:displaysurface_free surface=0x7f2fcc23a7e0
>>>>> 18566@1695127481.519759:displaysurface_free surface=0x7f2fcc3ec870
>>>>> 18566@1695127482.549930:displaysurface_free surface=0x7f2fcc634960
>>>>> 18566@1695127483.629661:displaysurface_free surface=0x7f2fcc26b140
>
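
The leak described in the quoted commit message can be modeled with a small, self-contained C sketch. This is not the ramfb code; "fb_state", "mode_set" and "display_update" are hypothetical names, and the "plug_leak" switch shows one plausible way to plug the leak (release the not-yet-swapped-in surface before replacing it), since the patch body itself is not part of this excerpt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct surface {
    int width;
    int height;
};

/* Hypothetical framebuffer state. */
struct fb_state {
    struct surface *active;  /* surface currently on display */
    struct surface *pending; /* prepared by a mode set, not yet swapped in */
    unsigned live;           /* surfaces allocated and not yet freed */
};

static struct surface *surface_new(struct fb_state *s, int w, int h)
{
    struct surface *p = malloc(sizeof(*p));

    if (p == NULL) {
        return NULL;
    }
    p->width = w;
    p->height = h;
    s->live++;
    return p;
}

static void surface_free(struct fb_state *s, struct surface *p)
{
    if (p != NULL) {
        free(p);
        s->live--;
    }
}

/* Models the guest-triggered callback that prepares a new surface. */
static void mode_set(struct fb_state *s, int w, int h, bool plug_leak)
{
    struct surface *p = surface_new(s, w, h);

    if (p == NULL) {
        return;
    }
    if (plug_leak) {
        /* Drop a surface that was prepared but never displayed. */
        surface_free(s, s->pending);
    }
    /* Without plug_leak, a previously pending surface is lost here. */
    s->pending = p;
}

/* Models the periodic display update that swaps the pending surface in. */
static void display_update(struct fb_state *s)
{
    if (s->pending != NULL) {
        surface_free(s, s->active);
        s->active = s->pending;
        s->pending = NULL;
    }
}
```

Calling mode_set() several times between two display_update() calls leaves "live" growing in the leaky variant, which matches the ever-changing "surface" pointers in the trace; with the pending surface released first, "live" stays at one.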

Re: [PATCH v2 1/5] hw: remove needless includes

2023-10-02 Thread Laszlo Ersek

On 10/2/23 13:11, marcandre.lur...@redhat.com wrote:
> From: Marc-André Lureau 
> 
> The include list is large, make it smaller.
> 
> Signed-off-by: Marc-André Lureau 
> ---
>  hw/core/machine.c | 10 --
>  1 file changed, 10 deletions(-)
> 
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index cb38b8cf4c..68cb556197 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -11,32 +11,22 @@
>   */
>  
>  #include "qemu/osdep.h"
> -#include "qemu/option.h"
>  #include "qemu/accel.h"
>  #include "sysemu/replay.h"
> -#include "qemu/units.h"
>  #include "hw/boards.h"
>  #include "hw/loader.h"
>  #include "qapi/error.h"
> -#include "qapi/qapi-visit-common.h"
>  #include "qapi/qapi-visit-machine.h"
> -#include "qapi/visitor.h"
>  #include "qom/object_interfaces.h"
> -#include "hw/sysbus.h"
>  #include "sysemu/cpus.h"
>  #include "sysemu/sysemu.h"
>  #include "sysemu/reset.h"
>  #include "sysemu/runstate.h"
> -#include "sysemu/numa.h"
>  #include "sysemu/xen.h"
> -#include "qemu/error-report.h"
>  #include "sysemu/qtest.h"
> -#include "hw/pci/pci.h"
>  #include "hw/mem/nvdimm.h"
>  #include "migration/global_state.h"
> -#include "migration/vmstate.h"
>  #include "exec/confidential-guest-support.h"
> -#include "hw/virtio/virtio.h"
>  #include "hw/virtio/virtio-pci.h"
>  #include "hw/virtio/virtio-net.h"
>  

Acked-by: Laszlo Ersek

Re: [PATCH v2 2/5] pc: remove needless includes

2023-10-02 Thread Laszlo Ersek

On 10/2/23 13:11, marcandre.lur...@redhat.com wrote:
> From: Marc-André Lureau 
> 
> The include list is gigantic, make it smaller.
> 
> Signed-off-by: Marc-André Lureau 
> ---
>  hw/i386/pc.c | 41 -
>  1 file changed, 41 deletions(-)
> 
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 5d399b6247..c376c5032d 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -24,79 +24,38 @@
>  
>  #include "qemu/osdep.h"
>  #include "qemu/units.h"
> -#include "hw/i386/x86.h"
>  #include "hw/i386/pc.h"
>  #include "hw/char/serial.h"
>  #include "hw/char/parallel.h"
> -#include "hw/i386/topology.h"
>  #include "hw/i386/fw_cfg.h"
>  #include "hw/i386/vmport.h"
>  #include "sysemu/cpus.h"
> -#include "hw/block/fdc.h"
>  #include "hw/ide/internal.h"
> -#include "hw/ide/isa.h"
> -#include "hw/pci/pci.h"
> -#include "hw/pci/pci_bus.h"
> -#include "hw/pci-bridge/pci_expander_bridge.h"
> -#include "hw/nvram/fw_cfg.h"
>  #include "hw/timer/hpet.h"
> -#include "hw/firmware/smbios.h"
>  #include "hw/loader.h"
> -#include "elf.h"
> -#include "migration/vmstate.h"
> -#include "multiboot.h"
>  #include "hw/rtc/mc146818rtc.h"
>  #include "hw/intc/i8259.h"
> -#include "hw/intc/ioapic.h"
>  #include "hw/timer/i8254.h"
>  #include "hw/input/i8042.h"
> -#include "hw/irq.h"
>  #include "hw/audio/pcspk.h"
> -#include "hw/pci/msi.h"
> -#include "hw/sysbus.h"
>  #include "sysemu/sysemu.h"
> -#include "sysemu/tcg.h"
> -#include "sysemu/numa.h"
> -#include "sysemu/kvm.h"
>  #include "sysemu/xen.h"
>  #include "sysemu/reset.h"
> -#include "sysemu/runstate.h"
>  #include "kvm/kvm_i386.h"
> -#include "hw/xen/xen.h"
> -#include "hw/xen/start_info.h"
> -#include "ui/qemu-spice.h"
> -#include "exec/memory.h"
> -#include "qemu/bitmap.h"
> -#include "qemu/config-file.h"
> -#include "qemu/error-report.h"
> -#include "qemu/option.h"
> -#include "qemu/cutils.h"
> -#include "hw/acpi/acpi.h"
>  #include "hw/acpi/cpu_hotplug.h"
>  #include "acpi-build.h"
> -#include "hw/mem/pc-dimm.h"
>  #include "hw/mem/nvdimm.h"
> -#include "hw/cxl/cxl.h"
>  #include "hw/cxl/cxl_host.h"
> -#include "qapi/error.h"
> -#include "qapi/qapi-visit-common.h"
> -#include "qapi/qapi-visit-machine.h"
> -#include "qapi/visitor.h"
> -#include "hw/core/cpu.h"
>  #include "hw/usb.h"
>  #include "hw/i386/intel_iommu.h"
>  #include "hw/net/ne2000-isa.h"
> -#include "standard-headers/asm-x86/bootparam.h"
>  #include "hw/virtio/virtio-iommu.h"
>  #include "hw/virtio/virtio-md-pci.h"
>  #include "hw/i386/kvm/xen_overlay.h"
>  #include "hw/i386/kvm/xen_evtchn.h"
>  #include "hw/i386/kvm/xen_gnttab.h"
>  #include "hw/i386/kvm/xen_xenstore.h"
> -#include "sysemu/replay.h"
> -#include "target/i386/cpu.h"
>  #include "e820_memory_layout.h"
> -#include "fw_cfg.h"
>  #include "trace.h"
>  #include CONFIG_DEVICES
>  
Acked-by: Laszlo Ersek

Re: [PATCH v2 5/5] hw: turn off ramfb migration for machines <= 8.1

2023-10-02 Thread Laszlo Ersek

On 10/2/23 13:11, marcandre.lur...@redhat.com wrote:
> From: Marc-André Lureau 
> 
> For compatibility reasons.
> 
> Signed-off-by: Marc-André Lureau 
> ---
>  hw/core/machine.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index 68cb556197..2fa7647422 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -30,7 +30,10 @@
>  #include "hw/virtio/virtio-pci.h"
>  #include "hw/virtio/virtio-net.h"
>  
> -GlobalProperty hw_compat_8_1[] = {};
> +GlobalProperty hw_compat_8_1[] = {
> +{ "ramfb", "migrate", "off" },
> +{ "vfio-pci-nohotplug", "ramfb-migrate", "off" }
> +};
>  const size_t hw_compat_8_1_len = G_N_ELEMENTS(hw_compat_8_1);
>  
>  GlobalProperty hw_compat_8_0[] = {

In the other discussion, you mentioned the concrete reason for this -- I
think if we don't do this, then the ramfb vmstate blocks backward
migration? Can you document the reason here explicitly (commit message,
I mean, doesn't have to be a code comment)?

Thanks!
Laszlo

Re: [PATCH v2 4/5] ramfb: make migration conditional

2023-10-02 Thread Laszlo Ersek

On 10/2/23 13:11, marcandre.lur...@redhat.com wrote:
> From: Marc-André Lureau 
> 
> RAMFB migration was unsupported until now; let's make it conditional.
> The following patch will prevent machines <= 8.1 from migrating it.
> 
> Signed-off-by: Marc-André Lureau 
> ---
>  hw/vfio/pci.h | 1 +
>  include/hw/display/ramfb.h| 2 +-
>  hw/display/ramfb-standalone.c | 8 +++-
>  hw/display/ramfb.c| 6 --
>  hw/vfio/display.c | 4 ++--
>  hw/vfio/pci.c | 1 +
>  stubs/ramfb.c | 2 +-
>  7 files changed, 17 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
> index 2d836093a8..671cc78912 100644
> --- a/hw/vfio/pci.h
> +++ b/hw/vfio/pci.h
> @@ -156,6 +156,7 @@ struct VFIOPCIDevice {
>  OnOffAuto display;
>  uint32_t display_xres;
>  uint32_t display_yres;
> +bool ramfb_migrate;
>  int32_t bootindex;
>  uint32_t igd_gms;
>  OffAutoPCIBAR msix_relo;
> diff --git a/include/hw/display/ramfb.h b/include/hw/display/ramfb.h
> index b33a2c467b..40063b62bd 100644
> --- a/include/hw/display/ramfb.h
> +++ b/include/hw/display/ramfb.h
> @@ -4,7 +4,7 @@
>  /* ramfb.c */
>  typedef struct RAMFBState RAMFBState;
>  void ramfb_display_update(QemuConsole *con, RAMFBState *s);
> -RAMFBState *ramfb_setup(Error **errp);
> +RAMFBState *ramfb_setup(bool migrate, Error **errp);
>  
>  /* ramfb-standalone.c */
>  #define TYPE_RAMFB_DEVICE "ramfb"
> diff --git a/hw/display/ramfb-standalone.c b/hw/display/ramfb-standalone.c
> index 8c0094397f..6bbd69ccdf 100644
> --- a/hw/display/ramfb-standalone.c
> +++ b/hw/display/ramfb-standalone.c
> @@ -15,6 +15,7 @@ struct RAMFBStandaloneState {
>  SysBusDevice parent_obj;
>  QemuConsole *con;
>  RAMFBState *state;
> +bool migrate;
>  };
>  
>  static void display_update_wrapper(void *dev)
> @@ -37,9 +38,13 @@ static void ramfb_realizefn(DeviceState *dev, Error **errp)
>  RAMFBStandaloneState *ramfb = RAMFB(dev);
>  
>  ramfb->con = graphic_console_init(dev, 0, &wrapper_ops, dev);
> -ramfb->state = ramfb_setup(errp);
> +ramfb->state = ramfb_setup(ramfb->migrate, errp);
>  }
>  
> +static Property ramfb_properties[] = {
> +DEFINE_PROP_BOOL("migrate", RAMFBStandaloneState, migrate,  true),
> +DEFINE_PROP_END_OF_LIST(),
> +};
>  static void ramfb_class_initfn(ObjectClass *klass, void *data)
>  {
>  DeviceClass *dc = DEVICE_CLASS(klass);
> @@ -48,6 +53,7 @@ static void ramfb_class_initfn(ObjectClass *klass, void 
> *data)
>  dc->realize = ramfb_realizefn;
>  dc->desc = "ram framebuffer standalone device";
>  dc->user_creatable = true;
> +device_class_set_props(dc, ramfb_properties);
>  }
>  
>  static const TypeInfo ramfb_info = {
> diff --git a/hw/display/ramfb.c b/hw/display/ramfb.c
> index 47d653..73e08d605f 100644
> --- a/hw/display/ramfb.c
> +++ b/hw/display/ramfb.c
> @@ -135,7 +135,7 @@ static const VMStateDescription vmstate_ramfb = {
>  }
>  };
>  
> -RAMFBState *ramfb_setup(Error **errp)
> +RAMFBState *ramfb_setup(bool migrate, Error **errp)
>  {
>  FWCfgState *fw_cfg = fw_cfg_find();
>  RAMFBState *s;
> @@ -147,7 +147,9 @@ RAMFBState *ramfb_setup(Error **errp)
>  
>  s = g_new0(RAMFBState, 1);
>  
> -vmstate_register(NULL, 0, &vmstate_ramfb, s);
> +if (migrate) {
> +vmstate_register(NULL, 0, &vmstate_ramfb, s);
> +}
>  rom_add_vga("vgabios-ramfb.bin");
>  fw_cfg_add_file_callback(fw_cfg, "etc/ramfb",
>   NULL, ramfb_fw_cfg_write, s,
> diff --git a/hw/vfio/display.c b/hw/vfio/display.c
> index bec864f482..3f6b251ccd 100644
> --- a/hw/vfio/display.c
> +++ b/hw/vfio/display.c
> @@ -356,7 +356,7 @@ static int vfio_display_dmabuf_init(VFIOPCIDevice *vdev, 
> Error **errp)
>  &vfio_display_dmabuf_ops,
>  vdev);
>  if (vdev->enable_ramfb) {
> -vdev->dpy->ramfb = ramfb_setup(errp);
> +vdev->dpy->ramfb = ramfb_setup(vdev->ramfb_migrate, errp);
>  }
>  vfio_display_edid_init(vdev);
>  return 0;
> @@ -483,7 +483,7 @@ static int vfio_display_region_init(VFIOPCIDevice *vdev, 
> Error **errp)
>  &vfio_display_region_ops,
>  vdev);
>  if (vdev->enable_ramfb) {
> -vdev->dpy->ramfb = ramfb_setup(errp);
> +vdev->dpy->ramfb = ramfb_setup(vdev->ramfb_migrate, errp);
>  }
>  return 0;
>  }
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 3b2ca3c24c..6575b8f32d 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -3484,6 +3484,7 @@ static const TypeInfo vfio_pci_dev_info = {
>  
>  static Property vfio_pci_dev_nohotplug_properties[] = {
>  DEFINE_PROP_BOOL("ramfb", VFIOPCIDevice, enable_ramfb, false),
> +DEFINE_PROP_BOOL("ramfb-migrate", VFIOPCIDevice, ramfb_migrate,  true),
>  DEFINE_PROP_END_OF_LIST(),
>  };
>  
> diff --git a/stubs/ramfb.c

Re: [PATCH v2 3/5] ramfb: implement migration support

2023-10-02 Thread Laszlo Ersek

On 10/2/23 13:11, marcandre.lur...@redhat.com wrote:
> From: Marc-André Lureau 
> 
> Implementing RAMFB migration is quite straightforward. One caveat is to
> treat the whole RAMFBCfg as a blob, since that's what is exposed to the
> guest directly. This avoids having to fiddle with endianness issues if we
> were to migrate fields individually as integers.
> 
> The following patches turn on the migration only for machines >= 8.2.
> 
> Fixes:
> https://bugzilla.redhat.com/show_bug.cgi?id=1859424
> 
> Signed-off-by: Marc-André Lureau 
> ---
>  hw/display/ramfb.c | 21 +
>  1 file changed, 21 insertions(+)
> 
> diff --git a/hw/display/ramfb.c b/hw/display/ramfb.c
> index 79b9754a58..47d653 100644
> --- a/hw/display/ramfb.c
> +++ b/hw/display/ramfb.c
> @@ -12,6 +12,7 @@
>   */
>  
>  #include "qemu/osdep.h"
> +#include "migration/vmstate.h"
>  #include "qapi/error.h"
>  #include "hw/loader.h"
>  #include "hw/display/ramfb.h"
> @@ -28,6 +29,8 @@ struct QEMU_PACKED RAMFBCfg {
>  uint32_t stride;
>  };
>  
> +typedef struct RAMFBCfg RAMFBCfg;
> +
>  struct RAMFBState {
>  DisplaySurface *ds;
>  uint32_t width, height;
> @@ -115,6 +118,23 @@ void ramfb_display_update(QemuConsole *con, RAMFBState 
> *s)
>  dpy_gfx_update_full(con);
>  }
>  
> +static int ramfb_post_load(void *opaque, int version_id)
> +{
> +ramfb_fw_cfg_write(opaque, 0, 0);
> +return 0;
> +}
> +
> +static const VMStateDescription vmstate_ramfb = {
> +.name = "ramfb",
> +.version_id = 1,
> +.minimum_version_id = 1,
> +.post_load = ramfb_post_load,
> +.fields = (VMStateField[]) {
> +VMSTATE_BUFFER_UNSAFE(cfg, RAMFBState, 0, sizeof(RAMFBCfg)),

I just couldn't figure out, from code review, why VMSTATE_BUFFER would
not work here. So I applied your patches, changed this as follows:

diff --git a/hw/display/ramfb.c b/hw/display/ramfb.c
index 077fd2fa2c31..04bf01059994 100644
--- a/hw/display/ramfb.c
+++ b/hw/display/ramfb.c
@@ -131,7 +131,7 @@ static const VMStateDescription vmstate_ramfb = {
 .minimum_version_id = 1,
 .post_load = ramfb_post_load,
 .fields = (VMStateField[]) {
-VMSTATE_BUFFER_UNSAFE(cfg, RAMFBState, 0, sizeof(RAMFBCfg)),
+VMSTATE_BUFFER(cfg, RAMFBState),
 VMSTATE_END_OF_LIST()
 }
 };

and tried to build it.

I got a wall of error messages about cryptic macro nesting.

I'm quite annoyed that nearly none of the VMSTATE_ macros are
documented; even git-blame tends to be unhelpful. Ultimately though,
there was one useful bit in the wall of error messages: the error was
related to "type_check_array".

Upon reviewing type_check_array, my impression is that VMSTATE_BUFFER is
suitable only for *array fields*. I randomly picked an existing example,
namely

static const VMStateDescription bulk_in_vmstate = {
.name = "CCID BulkIn state",
.version_id = 1,
.minimum_version_id = 1,
.fields = (VMStateField[]) {
VMSTATE_BUFFER(data, BulkIn),
VMSTATE_UINT32(len, BulkIn),
VMSTATE_UINT32(pos, BulkIn),
VMSTATE_END_OF_LIST()
}
};

from "hw/usb/dev-smartcard-reader.c", and sure enough, "data" is an
array field in BulkIn:

typedef struct BulkIn {
uint8_t  data[BULK_IN_BUF_SIZE];
uint32_t len;
uint32_t pos;
} BulkIn;

So that's the reason.

Again, annoying lack of documentation, but I agree that your
VMSTATE_BUFFER_UNSAFE application is judicious.
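
For the record, a minimal sketch of the kind of compile-time check that
type_check_array appears to boil down to. CHECK_ARRAY, OFFSET_OF_BUFFER,
DemoCfg and DemoState are made-up names for illustration (not the QEMU
macros or types), and the sketch assumes the GNU __typeof__ extension:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for an "array fields only" check. Subtracting two
 * pointers is only valid when they point to compatible types, so this
 * compiles only if field_type is exactly elem_type[n]. sizeof keeps the
 * subtraction unevaluated; the whole expression is always 0.
 */
#define CHECK_ARRAY(elem_type, field_type, n) \
    (0 * sizeof((elem_type (*)[n])0 - (field_type *)0))

/* Offset of a byte-array field; any non-array field fails at build time. */
#define OFFSET_OF_BUFFER(state, field)                          \
    (offsetof(state, field) +                                   \
     CHECK_ARRAY(uint8_t, __typeof__(((state *)0)->field),      \
                 sizeof(((state *)0)->field)))

typedef struct DemoCfg {
    uint64_t addr;
    uint32_t width;
} DemoCfg;

typedef struct DemoState {
    uint32_t len;
    uint8_t data[16];   /* array field: accepted by the check */
    DemoCfg cfg;        /* struct field: rejected by the check */
} DemoState;

size_t demo_data_offset(void)
{
    size_t off = OFFSET_OF_BUFFER(DemoState, data);

    assert(off == offsetof(DemoState, data));
    /* OFFSET_OF_BUFFER(DemoState, cfg) would not compile. */
    return off;
}
```

In this model, a structure-typed field like "cfg" trips the check, which
matches what I saw with VMSTATE_BUFFER; an "unsafe" variant that takes an
explicit size sidesteps the check.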

Thanks!
Laszlo



> +VMSTATE_END_OF_LIST()
> +}
> +};
> +
>  RAMFBState *ramfb_setup(Error **errp)
>  {
>  FWCfgState *fw_cfg = fw_cfg_find();
> @@ -127,6 +147,7 @@ RAMFBState *ramfb_setup(Error **errp)
>  
>  s = g_new0(RAMFBState, 1);
>  
> +vmstate_register(NULL, 0, &vmstate_ramfb, s);
>  rom_add_vga("vgabios-ramfb.bin");
>  fw_cfg_add_file_callback(fw_cfg, "etc/ramfb",
>   NULL, ramfb_fw_cfg_write, s,

Re: [PATCH v2 3/5] ramfb: implement migration support

2023-10-02 Thread Laszlo Ersek

On 10/2/23 14:01, Marc-André Lureau wrote:
> Hi
> 
> On Mon, Oct 2, 2023 at 3:12 PM  wrote:
>>
>> From: Marc-André Lureau 
>>
>> Implementing RAMFB migration is quite straightforward. One caveat is to
>> treat the whole RAMFBCfg as a blob, since that's what is exposed to the
>> guest directly. This avoids having to fiddle with endianness issues if we
>> were to migrate fields individually as integers.
>>
>> The following patches turn on the migration only for machines >= 8.2.
>>
>> Fixes:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1859424
>>
>> Signed-off-by: Marc-André Lureau 
>> ---
>>  hw/display/ramfb.c | 21 +
>>  1 file changed, 21 insertions(+)
>>
>> diff --git a/hw/display/ramfb.c b/hw/display/ramfb.c
>> index 79b9754a58..47d653 100644
>> --- a/hw/display/ramfb.c
>> +++ b/hw/display/ramfb.c
>> @@ -12,6 +12,7 @@
>>   */
>>
>>  #include "qemu/osdep.h"
>> +#include "migration/vmstate.h"
>>  #include "qapi/error.h"
>>  #include "hw/loader.h"
>>  #include "hw/display/ramfb.h"
>> @@ -28,6 +29,8 @@ struct QEMU_PACKED RAMFBCfg {
>>  uint32_t stride;
>>  };
>>
>> +typedef struct RAMFBCfg RAMFBCfg;
>> +
>>  struct RAMFBState {
>>  DisplaySurface *ds;
>>  uint32_t width, height;
>> @@ -115,6 +118,23 @@ void ramfb_display_update(QemuConsole *con, RAMFBState 
>> *s)
>>  dpy_gfx_update_full(con);
>>  }
>>
>> +static int ramfb_post_load(void *opaque, int version_id)
>> +{
>> +ramfb_fw_cfg_write(opaque, 0, 0);
>> +return 0;
>> +}
>> +
>> +static const VMStateDescription vmstate_ramfb = {
>> +.name = "ramfb",
>> +.version_id = 1,
>> +.minimum_version_id = 1,
>> +.post_load = ramfb_post_load,
>> +.fields = (VMStateField[]) {
>> +VMSTATE_BUFFER_UNSAFE(cfg, RAMFBState, 0, sizeof(RAMFBCfg)),
>> +VMSTATE_END_OF_LIST()
>> +}
>> +};
>> +
>>  RAMFBState *ramfb_setup(Error **errp)
>>  {
>>  FWCfgState *fw_cfg = fw_cfg_find();
>> @@ -127,6 +147,7 @@ RAMFBState *ramfb_setup(Error **errp)
>>
>>  s = g_new0(RAMFBState, 1);
>>
>> +vmstate_register(NULL, 0, &vmstate_ramfb, s);
> 
> wip:
> I am going to make it attached to the actual device.

I'm really curious about that -- I think it's going to be better, and
it'll teach me stuff about migration!

Laszlo

Re: [PATCH 7/7] vhost-user: call VHOST_USER_SET_VRING_ENABLE synchronously

2023-10-02 Thread Laszlo Ersek

On 10/2/23 08:57, Michael S. Tsirkin wrote:
> On Mon, Oct 02, 2023 at 03:56:03AM +0200, Laszlo Ersek wrote:
>> On 10/1/23 21:25, Michael S. Tsirkin wrote:
>>> Not this actually - v2 of this.
>>
>> Thank you, but:
>>
>> - Stefan's question should be answered still IMO (although if you pick
>> up this series, then that could be interpreted as "QEMU bug, not spec bug")
>>
>> - I was supposed to update the commit message on 7/7 in v3; I didn't
>> want to do it before Stefan's question was answered
>>
>> Thanks!
>> Laszlo
> 
> OK I just answered. I am fine with the patch but I think
> virtiofsd should be fixed too.

Thanks. I'll prepare a v3 with an updated commit message on 7/7 (plus
picking up any new feedback tags).

Cheers
Laszlo

Re: [PATCH] hw/display/ramfb: plug slight guest-triggerable leak on mode setting

2023-10-01 Thread Laszlo Ersek

On 10/1/23 18:07, Marc-André Lureau wrote:
> Hi Laszlo
> 
> On Sun, Oct 1, 2023 at 4:20 AM Laszlo Ersek  wrote:
>>
>> On 10/1/23 00:14, Laszlo Ersek wrote:
>>> On 9/29/23 13:17, Marc-André Lureau wrote:
> [..]
>>>> fwiw, my migration support patch is still unreviewed:
>>>> https://patchew.org/QEMU/20230920082651.3349712-1-marcandre.lur...@redhat.com/
>>>>
>>>
>>> I don't have a copy of that patch (not subscribed, sorry...), but how
>>> simply you did it surprises me. I did expect to simulate an fw_cfg write
>>> in post_load, but I thought we'd approach the device (for the sake of
>>> including it in the migration stream) from the higher level device's
>>> vmstate descriptors (dc->vmsd) that set up / depend on ramfb in the
>>> first place. I didn't know that raw vmstate_register() was still accepted.
>>>
>>> If it is, then, for that patch (with Gerd's comment addressed):
>>>
>>> Acked-by: Laszlo Ersek 
>>
>> I think I may have found one issue with that patch.
>>
>> The fields that we save into the migration stream are integer members of
>> the RAMFBCfg structure that lives directly in the fw_cfg file. The ramfb
>> device specifies those fields for the guest as big endian. This means
>> that when ramfb_fw_cfg_write() runs, triggered by the guest, then on big
>> endian hosts, be32_to_cpu() and be64_to_cpu() will be no-ops, as the
>> integer representation inside the fw_cfg file will match the host CPU at
>> once. And on little endian hosts, these functions will byte swap. In
>> both cases, ramfb_create_display_surface() will have to be called with
>> identical host-side integers. This means that *before* the be32_to_cpu()
>> and be64_to_cpu() calls, the host side *values* read out from the fw_cfg
>> file members *differ* between big endian and little endian hosts.
>>
>> And the problem is that we write precisely those values into the
>> migration stream, via "vmstate_ramfb_cfg". The migration stream
>> represents integers in big endian regardless of host endianness, but the
>> question is the *values* that we encode in BE for the stream. And the
>> values (from fw_cfg) will differ between little endian and big endian hosts.
>>
>> Thus, I think we should just use
>>
>>   VMSTATE_BUFFER(cfg, RAMFBState)
>>
>> in "vmstate_ramfb", and remove "vmstate_ramfb_cfg". For migration
>> purposes, we should treat the fw_cfg file as an opaque blob.
> 
> I think I see your point. Using VMSTATE_BUFFER like that doesn't work
> though.

Why not?

(I mean -- does it compile but misbehave, or it doesn't even compile (an
invalid use of the VMSTATE_BUFFER macro)?)

> It's also more error-prone if fields are added in the struct,
> imho.

The structure is effectively the guest-visible register block for the
device. We probably can't add any fields, and even if we do, the new
fields are going to be part of the fw_cfg blob (writeable file), so they
should be covered by VMSTATE_BUFFER just the same.

> 
> However, we could simply have a post-load to convert the values to BE.

post_load itself is not enough; if we want to go this route, then we
need pre_save too. Without a pre_save, the host endianness influences
the data serialized to the migration stream, and there's no way to know
how to recover (the source host's endianness is unknown at load time).

pre_save could work though, if it performed the same BE to host
conversions (to a separate buffer I guess!) as the fw_cfg write callback
does.
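
To illustrate the problem, here is a small self-contained model (not QEMU
code; all function names are hypothetical, and host endianness is
simulated with a parameter rather than detected). It compares migrating a
guest-written big endian field as an integer value against migrating it
as an opaque blob:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read four raw bytes the way a host of the given endianness would. */
static uint32_t host_load32(const uint8_t b[4], int host_is_be)
{
    if (host_is_be) {
        return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
               ((uint32_t)b[2] << 8) | b[3];
    }
    return ((uint32_t)b[3] << 24) | ((uint32_t)b[2] << 16) |
           ((uint32_t)b[1] << 8) | b[0];
}

/* The stream encodes integer *values* as big endian, on every host. */
static void stream_put_be32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Variant A: treat the raw field as a host integer, then serialize it. */
static void migrate_as_integer(const uint8_t raw[4], int host_is_be,
                               uint8_t out[4])
{
    stream_put_be32(out, host_load32(raw, host_is_be));
}

/* Variant B: copy the bytes untouched (opaque blob). */
static void migrate_as_blob(const uint8_t raw[4], uint8_t out[4])
{
    memcpy(out, raw, 4);
}

/* Returns 1 if variant A produces host-dependent stream bytes. */
int integer_stream_depends_on_host(void)
{
    /* The guest wrote width = 1024 in big endian. */
    const uint8_t raw[4] = { 0x00, 0x00, 0x04, 0x00 };
    uint8_t from_be[4], from_le[4], blob[4];

    migrate_as_integer(raw, 1, from_be);
    migrate_as_integer(raw, 0, from_le);
    migrate_as_blob(raw, blob);

    /* The blob always carries exactly what the guest wrote. */
    assert(memcmp(blob, raw, 4) == 0);
    return memcmp(from_be, from_le, 4) != 0;
}
```

In this model the two "hosts" put different bytes on the wire for the same
guest write, and the destination has no way to tell which interpretation
the source used; the blob variant does not have that ambiguity.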

> I wonder if new macros couldn't be introduced too.
> 
>>>
>>> BTW: can you please assign
>>> <https://bugzilla.redhat.com/show_bug.cgi?id=1859424> to yourself and
>>> link your patch into it? The reason we ended up duplicating work here is
>>> that no one took RHBZ 1859424 before.
> 
> I thought I did that.
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1859424#c17

Ouch, sorry. That must have happened since I last looked, and I was
foolish enough not to CC myself on the BZ early on. My mistake!

> 
>>>
>>> ... Well, the ticket is RHEL-7478 in JIRA now, in fact. :/
> 
> :/
> 
> 

Laszlo

Re: [PATCH 7/7] vhost-user: call VHOST_USER_SET_VRING_ENABLE synchronously

2023-10-01 Thread Laszlo Ersek

On 10/1/23 21:25, Michael S. Tsirkin wrote:
> Not this actually - v2 of this.

Thank you, but:

- Stefan's question should be answered still IMO (although if you pick
up this series, then that could be interpreted as "QEMU bug, not spec bug")

- I was supposed to update the commit message on 7/7 in v3; I didn't
want to do it before Stefan's question was answered

Thanks!
Laszlo

Re: [PATCH 0/4] ui/console: multihead: fix crash, simplify logic

2023-10-01 Thread Laszlo Ersek

On 10/1/23 08:15, Mark Cave-Ayland wrote:
> On 30/09/2023 22:28, Laszlo Ersek wrote:
> 
>> On 9/29/23 09:57, Mark Cave-Ayland wrote:
>>> On 26/09/2023 09:00, Marc-André Lureau wrote:
>>>
>>>> Hi Laszlo
>>>>
>>>> On Mon, Sep 25, 2023 at 7:36 PM Laszlo Ersek  wrote:
>>>>> Has this been queued by someone? Both Gerd and Marc-André are "odd
>>>>> fixers", so I'm not sure who should be sending a PR with these patches
>>>>> (and I don't see a pending PULL at
>>>>> <https://lists.gnu.org/archive/html/qemu-devel/2023-09/threads.html>
>>>>> with these patch subjects included).
>>>>
>>>> I have the series in my "ui" branch. I was waiting for a few more
>>>> patches to be accumulated. But if someone else takes this first, I'll
>>>> drop them.
>>>
>>> Does this series fix the "../ui/console.c:818: dpy_get_ui_info:
>>> Assertion `dpy_ui_info_supported(con)' failed." assert() on startup when
>>> using gtk? It would be good to get this fixed in git master soon, as it
>>> has been broken for a couple of weeks now, and -display sdl has issues
>>> tracking the mouse correctly on my laptop here :(
>>
>> ... probably not; I've never seen that issue. Can you provide a
>> reproducer?
> 
> The environment is a standard Debian bookworm install building QEMU git
> master with QEMU gtk support. The only difference I can think of is that
> I do all my QEMU builds as a separate user, and then export the display
> to my current user desktop i.e.
> 
> As my current user:
>   $ xhost +
> 
> As my QEMU build user:
>   $ export DISPLAY=:1
>   $ ./build/qemu-system-sparc
>   qemu-system-sparc: ../ui/console.c:818: dpy_get_ui_info: Assertion
>  `dpy_ui_info_supported(con)' failed.
>   Aborted (core dumped)
> 
>> Also, it should be bisectable (over Marc-André's 52-part series I guess).
> 
> Indeed. I've just run git bisect and it returns the following:
> 
> a92e7bb4cad57cc5c8817fb18fb25650507b69f8 is the first bad commit
> commit a92e7bb4cad57cc5c8817fb18fb25650507b69f8
> Author: Marc-André Lureau 
> Date:   Tue Sep 12 10:13:01 2023 +0400
> 
>     ui: add precondition for dpy_get_ui_info()
> 
>     Ensure that it only get called when dpy_ui_info_supported(). The
>     function should always return a result. There should be a non-null
>     console or active_console.
> 
>     Modify the argument to be const as well.
> 
>     Signed-off-by: Marc-André Lureau 
>     Reviewed-by: Albert Esteve 
> 
>  include/ui/console.h | 2 +-
>  ui/console.c | 4 +++-
>  2 files changed, 4 insertions(+), 2 deletions(-)

This commit looks plain wrong to me; or rather I don't understand the
argument.

In the particular crash, we fail in gtk_display_init -> gtk_widget_show
-> ... -> gd_configure -> gd_set_ui_size -> dpy_get_ui_info, and when
the latter calls dpy_ui_info_supported(), we find that
"con->hw_ops->ui_info" is NULL. In this particular case, "con->hw_ops"
is "vga_ops", and indeed "vga_ops" does not provide an "ui_info" funcptr.

SDL is unaffected because with SDL, we never call dpy_get_ui_info().

There's something fishy in the GTK display code BTW, in my opinion. I
can't quite put my finger on it, but commit aeffd071ed81 ("ui: Deliver
refresh rate via QemuUIInfo", 2022-06-14) definitely plays a role.

Before commit aeffd071ed81, "ui/gtk.c" wouldn't call dpy_get_ui_info()
either! Instead, from gd_configure(), we'd call gd_set_ui_info(),
directly setting the size from the incoming GdkEventConfigure object.

In commit aeffd071ed81, solely for the sake of carrying over the refresh
rate, gd_set_ui_info() was renamed to gd_set_ui_size(). The width and
height coming from the GdkEventConfigure object would be propagated the
same way to dpy_set_ui_info(), but the *rest* of the QemuUIInfo object
would be initialized differently. Before, the other fields would be
zero, now they'd come from dpy_get_ui_info() -- most likely for the sake
of carrying over the new refresh_rate field.

This in itself wouldn't crash, but it set up the call chain that is now
affected by the (IMO too strict) assertion.

Why is a hw_ops-based ui_info needed for dpy_get_ui_info()?
dpy_get_ui_info() never tries to *call* that function; it just returns
&con->ui_info. So dpy_get_ui_info() *already* guarantees that it returns
non-NULL.
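
A stripped-down model of what I mean (the types and function names below
are simplified stand-ins I made up for illustration, not the actual
"ui/console.c" definitions):

```c
#include <assert.h>
#include <stddef.h>

typedef struct UIInfo {
    int width, height, refresh_rate;
} UIInfo;

typedef struct HwOps {
    /* Optional callback; a VGA-like device may leave it NULL. */
    void (*ui_info)(void *opaque, const UIInfo *info);
} HwOps;

typedef struct Console {
    const HwOps *hw_ops;
    UIInfo ui_info;     /* embedded, so its address is never NULL */
} Console;

static int ui_info_supported(const Console *con)
{
    return con->hw_ops->ui_info != NULL;
}

/* Strict getter: aborts if the device has no ui_info callback. */
const UIInfo *get_ui_info_strict(const Console *con)
{
    assert(ui_info_supported(con));
    return &con->ui_info;
}

/* Relaxed getter: never calls the callback, so it does not need one. */
const UIInfo *get_ui_info(const Console *con)
{
    return &con->ui_info;
}

int relaxed_getter_works_without_callback(void)
{
    static const HwOps vga_like_ops = { .ui_info = NULL };
    Console con = { .hw_ops = &vga_like_ops, .ui_info = { 640, 480, 0 } };
    const UIInfo *info = get_ui_info(&con);

    assert(!ui_info_supported(&con));
    return info != NULL && info->width == 640;
}
```

In this model, only the strict getter can fail, and it fails for a reason
unrelated to the value it returns.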

Laszlo

> 
> 
> ATB,
> 
> Mark.
>

Re: [PATCH] hw/display/ramfb: plug slight guest-triggerable leak on mode setting

2023-09-30 Thread Laszlo Ersek

On 10/1/23 00:14, Laszlo Ersek wrote:
> On 9/29/23 13:17, Marc-André Lureau wrote:
>> Hi
>>
>> On Wed, Sep 27, 2023 at 7:46 PM Laszlo Ersek  wrote:
>>>
>>> On 9/19/23 15:19, Laszlo Ersek wrote:
>>>> The fw_cfg DMA write callback in ramfb prepares a new display surface in
>>>> QEMU; this new surface is put to use ("swapped in") upon the next display
>>>> update. At that time, the old surface (if any) is released.
>>>>
>>>> If the guest triggers the fw_cfg DMA write callback at least twice between
>>>> two adjacent display updates, then the second callback (and further such
>>>> callbacks) will leak the previously prepared (but not yet swapped in)
>>>> display surface.
>>>>
>>>> The issue can be shown by:
>>>>
>>>> (1) starting QEMU with "-trace displaysurface_free", and
>>>>
>>>> (2) running the following program in the guest UEFI shell:
>>>>
>>>>> #include <Library/ShellCEntryLib.h>           // ShellAppMain()
>>>>> #include <Library/UefiBootServicesTableLib.h> // gBS
>>>>> #include <Protocol/GraphicsOutput.h>          // EFI_GRAPHICS_OUTPUT_PROTOCOL
>>>>>
>>>>> INTN
>>>>> EFIAPI
>>>>> ShellAppMain (
>>>>>   IN UINTN   Argc,
>>>>>   IN CHAR16  **Argv
>>>>>   )
>>>>> {
>>>>>   EFI_STATUS                    Status;
>>>>>   VOID                          *Interface;
>>>>>   EFI_GRAPHICS_OUTPUT_PROTOCOL  *Gop;
>>>>>   UINT32                        Mode;
>>>>>
>>>>>   Status = gBS->LocateProtocol (
>>>>>                   &gEfiGraphicsOutputProtocolGuid,
>>>>>                   NULL,
>>>>>                   &Interface
>>>>>                   );
>>>>>   if (EFI_ERROR (Status)) {
>>>>> return 1;
>>>>>   }
>>>>>
>>>>>   Gop = Interface;
>>>>>
>>>>>   Mode = 1;
>>>>>   for ( ; ;) {
>>>>> Status = Gop->SetMode (Gop, Mode);
>>>>> if (EFI_ERROR (Status)) {
>>>>>   break;
>>>>> }
>>>>>
>>>>> Mode = 1 - Mode;
>>>>>   }
>>>>>
>>>>>   return 1;
>>>>> }
>>>>
>>>> The symptom is then that:
>>>>
>>>> - only one trace message appears periodically,
>>>>
>>>> - the time between adjacent messages keeps increasing -- implying that
>>>>   some list structure (containing the leaked resources) keeps growing,
>>>>
>>>> - the "surface" pointer is ever different.
>>>>
>>>>> 18566@1695127471.449586:displaysurface_free surface=0x7f2fcc09a7c0
>>>>> 18566@1695127471.529559:displaysurface_free surface=0x7f2fcc9dac10
>>>>> 18566@1695127471.659812:displaysurface_free surface=0x7f2fcc441dd0
>>>>> 18566@1695127471.839669:displaysurface_free surface=0x7f2fcc0363d0
>>>>> 18566@1695127472.069674:displaysurface_free surface=0x7f2fcc413a80
>>>>> 18566@1695127472.349580:displaysurface_free surface=0x7f2fcc09cd00
>>>>> 18566@1695127472.679783:displaysurface_free surface=0x7f2fcc1395f0
>>>>> 18566@1695127473.059848:displaysurface_free surface=0x7f2fcc1cae50
>>>>> 18566@1695127473.489724:displaysurface_free surface=0x7f2fcc42fc50
>>>>> 18566@1695127473.969791:displaysurface_free surface=0x7f2fcc45dcc0
>>>>> 18566@1695127474.499708:displaysurface_free surface=0x7f2fcc70b9d0
>>>>> 18566@1695127475.079769:displaysurface_free surface=0x7f2fcc82acc0
>>>>> 18566@1695127475.709941:displaysurface_free surface=0x7f2fcc369c00
>>>>> 18566@1695127476.389619:displaysurface_free surface=0x7f2fcc32b910
>>>>> 18566@1695127477.119772:displaysurface_free surface=0x7f2fcc0d5a20
>>>>> 18566@1695127477.899517:displaysurface_free surface=0x7f2fcc086c40
>>>>> 18566@1695127478.729962:displaysurface_free surface=0x7f2fccc72020
>>>>> 18566@1695127479.609839:displaysurface_free surface=0x7f2fcc185160
>>>>> 18566@1695127480.539688:displaysurface_free surface=0x7f2fcc23a7e0
>>>>> 18566@1695127481.519759:displaysurface_free surface=0x7f2fcc3ec870
>>>>> 18566@1695127482.549930:displaysurface_free surface=0x7f2fcc634960
>>>>> 18566@1695127483.629661:displaysurface_free surface=0x7f2fcc26b140
>

Re: [PATCH] hw/display/ramfb: plug slight guest-triggerable leak on mode setting

2023-09-30 Thread Laszlo Ersek

On 9/29/23 13:17, Marc-André Lureau wrote:
> Hi
> 
> On Wed, Sep 27, 2023 at 7:46 PM Laszlo Ersek  wrote:
>>
>> On 9/19/23 15:19, Laszlo Ersek wrote:
>>> The fw_cfg DMA write callback in ramfb prepares a new display surface in
>>> QEMU; this new surface is put to use ("swapped in") upon the next display
>>> update. At that time, the old surface (if any) is released.
>>>
>>> If the guest triggers the fw_cfg DMA write callback at least twice between
>>> two adjacent display updates, then the second callback (and further such
>>> callbacks) will leak the previously prepared (but not yet swapped in)
>>> display surface.
>>>
>>> The issue can be shown by:
>>>
>>> (1) starting QEMU with "-trace displaysurface_free", and
>>>
>>> (2) running the following program in the guest UEFI shell:
>>>
>>>> #include <Library/ShellCEntryLib.h>           // ShellAppMain()
>>>> #include <Library/UefiBootServicesTableLib.h> // gBS
>>>> #include <Protocol/GraphicsOutput.h>          // EFI_GRAPHICS_OUTPUT_PROTOCOL
>>>>
>>>> INTN
>>>> EFIAPI
>>>> ShellAppMain (
>>>>   IN UINTN   Argc,
>>>>   IN CHAR16  **Argv
>>>>   )
>>>> {
>>>>   EFI_STATUS                    Status;
>>>>   VOID                          *Interface;
>>>>   EFI_GRAPHICS_OUTPUT_PROTOCOL  *Gop;
>>>>   UINT32                        Mode;
>>>>
>>>>   Status = gBS->LocateProtocol (
>>>>                   &gEfiGraphicsOutputProtocolGuid,
>>>>                   NULL,
>>>>                   &Interface
>>>>                   );
>>>>   if (EFI_ERROR (Status)) {
>>>> return 1;
>>>>   }
>>>>
>>>>   Gop = Interface;
>>>>
>>>>   Mode = 1;
>>>>   for ( ; ;) {
>>>> Status = Gop->SetMode (Gop, Mode);
>>>> if (EFI_ERROR (Status)) {
>>>>   break;
>>>> }
>>>>
>>>> Mode = 1 - Mode;
>>>>   }
>>>>
>>>>   return 1;
>>>> }
>>>
>>> The symptom is then that:
>>>
>>> - only one trace message appears periodically,
>>>
>>> - the time between adjacent messages keeps increasing -- implying that
>>>   some list structure (containing the leaked resources) keeps growing,
>>>
>>> - the "surface" pointer is ever different.
>>>
>>>> 18566@1695127471.449586:displaysurface_free surface=0x7f2fcc09a7c0
>>>> 18566@1695127471.529559:displaysurface_free surface=0x7f2fcc9dac10
>>>> 18566@1695127471.659812:displaysurface_free surface=0x7f2fcc441dd0
>>>> 18566@1695127471.839669:displaysurface_free surface=0x7f2fcc0363d0
>>>> 18566@1695127472.069674:displaysurface_free surface=0x7f2fcc413a80
>>>> 18566@1695127472.349580:displaysurface_free surface=0x7f2fcc09cd00
>>>> 18566@1695127472.679783:displaysurface_free surface=0x7f2fcc1395f0
>>>> 18566@1695127473.059848:displaysurface_free surface=0x7f2fcc1cae50
>>>> 18566@1695127473.489724:displaysurface_free surface=0x7f2fcc42fc50
>>>> 18566@1695127473.969791:displaysurface_free surface=0x7f2fcc45dcc0
>>>> 18566@1695127474.499708:displaysurface_free surface=0x7f2fcc70b9d0
>>>> 18566@1695127475.079769:displaysurface_free surface=0x7f2fcc82acc0
>>>> 18566@1695127475.709941:displaysurface_free surface=0x7f2fcc369c00
>>>> 18566@1695127476.389619:displaysurface_free surface=0x7f2fcc32b910
>>>> 18566@1695127477.119772:displaysurface_free surface=0x7f2fcc0d5a20
>>>> 18566@1695127477.899517:displaysurface_free surface=0x7f2fcc086c40
>>>> 18566@1695127478.729962:displaysurface_free surface=0x7f2fccc72020
>>>> 18566@1695127479.609839:displaysurface_free surface=0x7f2fcc185160
>>>> 18566@1695127480.539688:displaysurface_free surface=0x7f2fcc23a7e0
>>>> 18566@1695127481.519759:displaysurface_free surface=0x7f2fcc3ec870
>>>> 18566@1695127482.549930:displaysurface_free surface=0x7f2fcc634960
>>>> 18566@1695127483.629661:displaysurface_free surface=0x7f2fcc26b140
>>>> 18566@1695127484.759987:displaysurface_free surface=0x7f2fcc321700
>>>> 18566@1695127485.940289:displaysurface_free surface=0x7f2fccaad100
>>>
>>> We figured this wasn't a CVE-worthy problem, as only small amounts of
>>> memory were leaked (the framebuffer itself is mapped from guest RAM, QEMU
>>> only allocates administrative structures), plus

Re: [PATCH 0/4] ui/console: multihead: fix crash, simplify logic

2023-09-30 Thread Laszlo Ersek

On 9/29/23 09:57, Mark Cave-Ayland wrote:
> On 26/09/2023 09:00, Marc-André Lureau wrote:
> 
>> Hi Laszlo
>>
>> On Mon, Sep 25, 2023 at 7:36 PM Laszlo Ersek  wrote:
>>> Has this been queued by someone? Both Gerd and Marc-André are "odd
>>> fixers", so I'm not sure who should be sending a PR with these patches
>>> (and I don't see a pending PULL at
>>> <https://lists.gnu.org/archive/html/qemu-devel/2023-09/threads.html>
>>> with these patch subjects included).
>>
>> I have the series in my "ui" branch. I was waiting for a few more
>> patches to be accumulated. But if someone else takes this first, I'll
>> drop them.
> 
> Does this series fix the "../ui/console.c:818: dpy_get_ui_info:
> Assertion `dpy_ui_info_supported(con)' failed." assert() on startup when
> using gtk? It would be good to get this fixed in git master soon, as it
> has been broken for a couple of weeks now, and -display sdl has issues
> tracking the mouse correctly on my laptop here :(

... probably not; I've never seen that issue. Can you provide a reproducer?

Also, it should be bisectable (over Marc-André's 52-part series I guess).

Laszlo

> 
> 
> ATB,
> 
> Mark.
>

Re: [PATCH] hw/display/ramfb: plug slight guest-triggerable leak on mode setting

2023-09-27 Thread Laszlo Ersek

On 9/19/23 15:19, Laszlo Ersek wrote:
> The fw_cfg DMA write callback in ramfb prepares a new display surface in
> QEMU; this new surface is put to use ("swapped in") upon the next display
> update. At that time, the old surface (if any) is released.
> 
> If the guest triggers the fw_cfg DMA write callback at least twice between
> two adjacent display updates, then the second callback (and further such
> callbacks) will leak the previously prepared (but not yet swapped in)
> display surface.
> 
> The issue can be shown by:
> 
> (1) starting QEMU with "-trace displaysurface_free", and
> 
> (2) running the following program in the guest UEFI shell:
> 
>> #include <Library/ShellCEntryLib.h>           // ShellAppMain()
>> #include <Library/UefiBootServicesTableLib.h> // gBS
>> #include <Protocol/GraphicsOutput.h>          // EFI_GRAPHICS_OUTPUT_PROTOCOL
>>
>> INTN
>> EFIAPI
>> ShellAppMain (
>>   IN UINTN   Argc,
>>   IN CHAR16  **Argv
>>   )
>> {
>>   EFI_STATUS                    Status;
>>   VOID                          *Interface;
>>   EFI_GRAPHICS_OUTPUT_PROTOCOL  *Gop;
>>   UINT32                        Mode;
>>
>>   Status = gBS->LocateProtocol (
>>                   &gEfiGraphicsOutputProtocolGuid,
>>                   NULL,
>>                   &Interface
>>                   );
>>   if (EFI_ERROR (Status)) {
>>     return 1;
>>   }
>>
>>   Gop = Interface;
>>
>>   Mode = 1;
>>   for ( ; ;) {
>>     Status = Gop->SetMode (Gop, Mode);
>>     if (EFI_ERROR (Status)) {
>>       break;
>>     }
>>
>>     Mode = 1 - Mode;
>>   }
>>
>>   return 1;
>> }
> 
> The symptom is then that:
> 
> - only one trace message appears periodically,
> 
> - the time between adjacent messages keeps increasing -- implying that
>   some list structure (containing the leaked resources) keeps growing,
> 
> - the "surface" pointer is ever different.
> 
>> 18566@1695127471.449586:displaysurface_free surface=0x7f2fcc09a7c0
>> 18566@1695127471.529559:displaysurface_free surface=0x7f2fcc9dac10
>> 18566@1695127471.659812:displaysurface_free surface=0x7f2fcc441dd0
>> 18566@1695127471.839669:displaysurface_free surface=0x7f2fcc0363d0
>> 18566@1695127472.069674:displaysurface_free surface=0x7f2fcc413a80
>> 18566@1695127472.349580:displaysurface_free surface=0x7f2fcc09cd00
>> 18566@1695127472.679783:displaysurface_free surface=0x7f2fcc1395f0
>> 18566@1695127473.059848:displaysurface_free surface=0x7f2fcc1cae50
>> 18566@1695127473.489724:displaysurface_free surface=0x7f2fcc42fc50
>> 18566@1695127473.969791:displaysurface_free surface=0x7f2fcc45dcc0
>> 18566@1695127474.499708:displaysurface_free surface=0x7f2fcc70b9d0
>> 18566@1695127475.079769:displaysurface_free surface=0x7f2fcc82acc0
>> 18566@1695127475.709941:displaysurface_free surface=0x7f2fcc369c00
>> 18566@1695127476.389619:displaysurface_free surface=0x7f2fcc32b910
>> 18566@1695127477.119772:displaysurface_free surface=0x7f2fcc0d5a20
>> 18566@1695127477.899517:displaysurface_free surface=0x7f2fcc086c40
>> 18566@1695127478.729962:displaysurface_free surface=0x7f2fccc72020
>> 18566@1695127479.609839:displaysurface_free surface=0x7f2fcc185160
>> 18566@1695127480.539688:displaysurface_free surface=0x7f2fcc23a7e0
>> 18566@1695127481.519759:displaysurface_free surface=0x7f2fcc3ec870
>> 18566@1695127482.549930:displaysurface_free surface=0x7f2fcc634960
>> 18566@1695127483.629661:displaysurface_free surface=0x7f2fcc26b140
>> 18566@1695127484.759987:displaysurface_free surface=0x7f2fcc321700
>> 18566@1695127485.940289:displaysurface_free surface=0x7f2fccaad100
> 
> We figured this wasn't a CVE-worthy problem, as only small amounts of
> memory were leaked (the framebuffer itself is mapped from guest RAM, QEMU
> only allocates administrative structures), plus libvirt restricts QEMU
> memory footprint anyway, thus the guest can only DoS itself.
> 
> Plug the leak, by releasing the last prepared (not yet swapped in) display
> surface, if any, in the fw_cfg DMA write callback.
> 
> Regarding the "reproducer", with the fix in place, the log is flooded with
> trace messages (one per fw_cfg write), *and* the trace message alternates
> between just two "surface" pointer values (i.e., nothing is leaked, the
> allocator flip-flops between two objects in effect).
> 
> This issue appears to date back to the introduction of ramfb (995b30179bdc,
> "hw/display: add ramfb, a simple boot framebuffer living in guest ram",
> 2018-06-18).
> 
> Cc: Gerd Hoffmann  (maintainer:ramfb)
> Cc: qemu-sta...@nongnu.org
> Fixes: 995b30179bdc
> Signed-off-by: Laszlo Ersek 
> ---
>  hw/display/ramfb.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/display/ramfb.c b/hw/display/ramfb.c
> index 79b9754a5820..c2b002d53480 100644
> --- a/hw/display/ramfb.c
> +++ b/hw/display/ramfb.c
> @@ -97,6 +97,7 @@ static void ramfb_fw_cfg_write(void *dev, off_t offset, size_t len)
>  
>  s->width = width;
>  s->height = height;
> +qemu_free_displaysurface(s->ds);
>  s->ds = surface;
>  }
>  

Ping.

Laszlo

Re: [PATCH 0/4] ui/console: multihead: fix crash, simplify logic

2023-09-25 Thread Laszlo Ersek

On 9/15/23 13:50, Marc-André Lureau wrote:
> Hi Laszlo
> 
> On Wed, Sep 13, 2023 at 6:50 PM Laszlo Ersek  wrote:
>>
>> Fix a recent regression (crash) in the multihead check; clean up the
>> code some more.
>>
>> Cc: "Marc-André Lureau"  (odd fixer:Graphics)
>> Cc: Gerd Hoffmann  (odd fixer:Graphics)
>>
>> Thanks,
>> Laszlo
>>
>> Laszlo Ersek (4):
>>   ui/console: make qemu_console_is_multihead() static
>>   ui/console: only walk QemuGraphicConsoles in
>> qemu_console_is_multihead()
>>   ui/console: eliminate QOM properties from qemu_console_is_multihead()
>>   ui/console: sanitize search in qemu_graphic_console_is_multihead()
> 
> Series:
> Reviewed-by: Marc-André Lureau 

Thanks.

Has this been queued by someone? Both Gerd and Marc-André are "odd
fixers", so I'm not sure who should be sending a PR with these patches
(and I don't see a pending PULL at
<https://lists.gnu.org/archive/html/qemu-devel/2023-09/threads.html>
with these patch subjects included).

Thanks.
Laszlo

Re: [PATCH 7/7] vhost-user: call VHOST_USER_SET_VRING_ENABLE synchronously

2023-09-25 Thread Laszlo Ersek

Ping -- Michael, any comments please? This set (now at v2) has been
waiting on your answer since Aug 30th.

Laszlo

On 9/5/23 08:30, Laszlo Ersek wrote:
> Michael,
> 
> On 8/30/23 17:37, Stefan Hajnoczi wrote:
>> On Wed, 30 Aug 2023 at 09:30, Laszlo Ersek  wrote:
>>>
>>> On 8/30/23 14:10, Stefan Hajnoczi wrote:
>>>> On Sun, 27 Aug 2023 at 14:31, Laszlo Ersek  wrote:
>>>>>
>>>>> (1) The virtio-1.0 specification
>>>>> <http://docs.oasis-open.org/virtio/virtio/v1.0/virtio-v1.0.html> writes:
>>>>>
>>>>>> 3 General Initialization And Device Operation
>>>>>> 3.1   Device Initialization
>>>>>> 3.1.1 Driver Requirements: Device Initialization
>>>>>>
>>>>>> [...]
>>>>>>
>>>>>> 7. Perform device-specific setup, including discovery of virtqueues for
>>>>>>    the device, optional per-bus setup, reading and possibly writing the
>>>>>>    device’s virtio configuration space, and population of virtqueues.
>>>>>>
>>>>>> 8. Set the DRIVER_OK status bit. At this point the device is “live”.
>>>>>
>>>>> and
>>>>>
>>>>>> 4 Virtio Transport Options
>>>>>> 4.1   Virtio Over PCI Bus
>>>>>> 4.1.4 Virtio Structure PCI Capabilities
>>>>>> 4.1.4.3   Common configuration structure layout
>>>>>> 4.1.4.3.2 Driver Requirements: Common configuration structure layout
>>>>>>
>>>>>> [...]
>>>>>>
>>>>>> The driver MUST configure the other virtqueue fields before enabling the
>>>>>> virtqueue with queue_enable.
>>>>>>
>>>>>> [...]
>>>>>
>>>>> These together mean that the following sub-sequence of steps is valid for
>>>>> a virtio-1.0 guest driver:
>>>>>
>>>>> (1.1) set "queue_enable" for the needed queues as the final part of device
>>>>> initialization step (7),
>>>>>
>>>>> (1.2) set DRIVER_OK in step (8),
>>>>>
>>>>> (1.3) immediately start sending virtio requests to the device.
>>>>>
>>>>> (2) When vhost-user is enabled, and the VHOST_USER_F_PROTOCOL_FEATURES
>>>>> special virtio feature is negotiated, then virtio rings start in disabled
>>>>> state, according to
>>>>> <https://qemu-project.gitlab.io/qemu/interop/vhost-user.html#ring-states>.
>>>>> In this case, explicit VHOST_USER_SET_VRING_ENABLE messages are needed for
>>>>> enabling vrings.
>>>>>
>>>>> Therefore setting "queue_enable" from the guest (1.1) is a *control plane*
>>>>> operation, which travels from the guest through QEMU to the vhost-user
>>>>> backend, using a unix domain socket.
>>>>>
>>>>> Whereas sending a virtio request (1.3) is a *data plane* operation, which
>>>>> evades QEMU -- it travels from guest to the vhost-user backend via
>>>>> eventfd.
>>>>>
>>>>> This means that steps (1.1) and (1.3) travel through different channels,
>>>>> and their relative order can be reversed, as perceived by the vhost-user
>>>>> backend.
>>>>>
>>>>> That's exactly what happens when OVMF's virtiofs driver (VirtioFsDxe) runs
>>>>> against the Rust-language virtiofsd version 1.7.2. (Which uses version
>>>>> 0.10.1 of the vhost-user-backend crate, and version 0.8.1 of the vhost
>>>>> crate.)
>>>>>
>>>>> Namely, when VirtioFsDxe binds a virtiofs device, it goes through the
>>>>> device initialization steps (i.e., control plane operations), and
>>>>> immediately sends a FUSE_INIT request too (i.e., performs a data plane
>>>>> operation). In the Rust-language virtiofsd, this creates a race between
>>>>> two components that run *concurrently*, i.e., in different threads or
>>>>> processes:
>>>>>
>>>>> - Control plane, handling vhost-user protocol messages:
>>>>>
>>>>>   The "VhostUserSlaveReqHandlerMut::set_vring_enable" method
>>>>>   [crates/vhost-user-backend/src/handler.rs] handles
>>>>>   VHOST_USER_SET_VRING_ENABLE messages, and updates each vring's "enabled"
>>&

Re: Concerns regarding e17bebd049 ("dump: Set correct vaddr for ELF dump")

2023-09-21 Thread Laszlo Ersek

On 9/20/23 19:35, Stephen Brennan wrote:
> Hi Jon,
> 
> Jon Doron  writes:
>> Hi Stephen,
>> Like you have said the reason is as I wrote in the commit message, 
>> without "fixing" the vaddr GDB is messing up mapping and working with 
>> the generated core file.
> 
> For the record I totally love this workaround :)
> 
> It's clever and gets the job done and I would have done it in a
> heartbeat. It's just that it does end up making vmcores that have
> incorrect data, which is a pain for debuggers that are actually designed
> to look at kernel core dumps.
> 
>> This patch is almost 4 years old, perhaps some changes to GDB have been
>> introduced to resolve this, I have not checked since then.
> 
> Program Headers:
>   Type   Offset VirtAddr   PhysAddr
>  FileSizMemSiz  Flags  Align
>   NOTE   0x0168 0x 0x
>  0x1980 0x1980 0x0
>   LOAD   0x1ae8 0x 0x
>  0x8000 0x8000 0x0
>   LOAD   0x80001ae8 0x 0xfffc
>  0x0004 0x0004 0x0
> 
> (gdb) info files
> Local core dump file:
> `/home/stepbren/repos/test_code/elf/dumpfile', file type elf64-x86-64.
> 0x - 0x8000 is load1
> 0x - 0x0004 is load2
> 
> $ gdb --version 
> GNU gdb (GDB) Red Hat Enterprise Linux 10.2-10.0.2.el9
> Copyright (C) 2021 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later 
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.
> 
> 
> It doesn't *look like* anything has changed in this version of GDB. But
> I'm not really certain that GDB is expected to use the physical
> addresses in the load segments: it's not a kernel debugger.

The paging=off vmcores dumped by QEMU are primarily meant for the
"crash" utility, not gdb.
Crash builds upon gdb (it downloads a gdb tarball at build time, IIRC),
but either way, the vmcores are meant to be consumed by "crash", and
crash *is* a kernel debugger (both live, and post-mortem).

So, from my perspective: "whatever works with 'crash'". If you revert
Jon's commit and the vmcores continue working with "crash", I won't object.

I commented similarly under Jon's v1 patch -- as long as paging=off
dumps continue working with "crash", I'm fine:

http://mid.mail-archive.com/7961a154-f139-af73-613d-94b88bf95392@redhat.com

For reference, these are the v1 through v3 patch threads, from 2019:

http://mid.mail-archive.com/20181225125344.4482-1-arilou@gmail.com
http://mid.mail-archive.com/20190108130219.18550-1-arilou@gmail.com
http://mid.mail-archive.com/20190109082203.27142-1-arilou@gmail.com

Laszlo


> 
> I think hacking the p_vaddr field _is_ the way to get GDB to behave in
> the way you want: allow you to read physical memory addresses.
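
For illustration only, the p_vaddr adjustment under discussion can be
sketched on an in-memory program header table. This is not Stephen's
program and not QEMU's dump code; phdrs_phys_to_virt() is a hypothetical
helper, and the sketch assumes a Linux/glibc <elf.h> and that the table
has already been read in host byte order.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/*
 * Hypothetical helper: for every PT_LOAD entry, copy the physical address
 * into the virtual address field. Returns the number of entries patched.
 */
static size_t phdrs_phys_to_virt(Elf64_Phdr *phdrs, size_t phnum)
{
    size_t patched = 0;

    assert(phdrs != NULL || phnum == 0);
    for (size_t i = 0; i < phnum; i++) {
        if (phdrs[i].p_type != PT_LOAD) {
            continue;   /* PT_NOTE and other entries stay as they are */
        }
        phdrs[i].p_vaddr = phdrs[i].p_paddr;
        patched++;
    }
    return patched;
}
```

Reading the headers from the core file and writing them back is left out
on purpose; only the field copy is shown.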
> 
>> I'm no longer using this feature, and have not worked on or tested it
>> in a long while, so I have no obligations to this change, but perhaps
>> someone else might be using it...
> 
> I definitely think it's valuable for people to continue being able to
> use QEMU vmcores generated with paging=off in GDB, even if GDB isn't
> designed for it. It seems like a useful hack that appeals to the lowest
> common denominator: most people have GDB and not a purpose-built kernel
> debugger. But maybe we could point to a program like the below that will
> tweak the p_paddr field after the fact, in order to appeal to GDB's
> sensibilities?
> 
> Thanks,
> Stephen
> 
> ---
> #include <stdbool.h>
> #include <stdint.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> 
> #include <elf.h>
> 
> static void fail(const char *msg)
> {
>   fprintf(stderr, "%s\n", msg);
>   exit(EXIT_FAILURE);
> }
> 
> static void perror_fail(const char *pfx)
> {
>   perror(pfx);
>   exit(EXIT_FAILURE);
> }
> 
> static void usage(void)
> {
>   puts("usage: phys2virt COREFILE");
>   puts("Modifies the ELF COREFILE so that load segments have their virtual");
>   puts("address value copied from the physical address field.");
>   exit(EXIT_SUCCESS);
> }
> 
> static int endian(void)
> {
>   union {
>   uint32_t ival;
>   char cval[4];
>   } data;
>   data.ival = 1;
>   if (data.cval[0])
>   return ELFDATA2LSB;
>   else
>   return ELFDATA2MSB;
> }
> 
> int main(int argc, char **argv)
> {
>   char *filename;
>   FILE *f;
>   Elf64_Ehdr hdr;
>   Elf64_Phdr *phdrs;
>   off_t phoff;
>   int phnum, phentsize;
>   bool bswap;
> 
>   if (argc != 2 || strcmp(argv[1], "-h") == 0)
>   usage();
> 
>   filename =

[PATCH] hw/display/ramfb: plug slight guest-triggerable leak on mode setting

2023-09-19 Thread Laszlo Ersek

The fw_cfg DMA write callback in ramfb prepares a new display surface in
QEMU; this new surface is put to use ("swapped in") upon the next display
update. At that time, the old surface (if any) is released.

If the guest triggers the fw_cfg DMA write callback at least twice between
two adjacent display updates, then the second callback (and further such
callbacks) will leak the previously prepared (but not yet swapped in)
display surface.

The issue can be shown by:

(1) starting QEMU with "-trace displaysurface_free", and

(2) running the following program in the guest UEFI shell:

> #include <Library/ShellCEntryLib.h>           // ShellAppMain()
> #include <Library/UefiBootServicesTableLib.h> // gBS
> #include <Protocol/GraphicsOutput.h>          // EFI_GRAPHICS_OUTPUT_PROTOCOL
>
> INTN
> EFIAPI
> ShellAppMain (
>   IN UINTN   Argc,
>   IN CHAR16  **Argv
>   )
> {
>   EFI_STATUS                    Status;
>   VOID                          *Interface;
>   EFI_GRAPHICS_OUTPUT_PROTOCOL  *Gop;
>   UINT32                        Mode;
>
>   Status = gBS->LocateProtocol (
>                   &gEfiGraphicsOutputProtocolGuid,
>                   NULL,
>                   &Interface
>                   );
>   if (EFI_ERROR (Status)) {
>     return 1;
>   }
>
>   Gop = Interface;
>
>   Mode = 1;
>   for ( ; ;) {
>     Status = Gop->SetMode (Gop, Mode);
>     if (EFI_ERROR (Status)) {
>       break;
>     }
>
>     Mode = 1 - Mode;
>   }
>
>   return 1;
> }

The symptom is then that:

- only one trace message appears periodically,

- the time between adjacent messages keeps increasing -- implying that
  some list structure (containing the leaked resources) keeps growing,

- the "surface" pointer is ever different.

> 18566@1695127471.449586:displaysurface_free surface=0x7f2fcc09a7c0
> 18566@1695127471.529559:displaysurface_free surface=0x7f2fcc9dac10
> 18566@1695127471.659812:displaysurface_free surface=0x7f2fcc441dd0
> 18566@1695127471.839669:displaysurface_free surface=0x7f2fcc0363d0
> 18566@1695127472.069674:displaysurface_free surface=0x7f2fcc413a80
> 18566@1695127472.349580:displaysurface_free surface=0x7f2fcc09cd00
> 18566@1695127472.679783:displaysurface_free surface=0x7f2fcc1395f0
> 18566@1695127473.059848:displaysurface_free surface=0x7f2fcc1cae50
> 18566@1695127473.489724:displaysurface_free surface=0x7f2fcc42fc50
> 18566@1695127473.969791:displaysurface_free surface=0x7f2fcc45dcc0
> 18566@1695127474.499708:displaysurface_free surface=0x7f2fcc70b9d0
> 18566@1695127475.079769:displaysurface_free surface=0x7f2fcc82acc0
> 18566@1695127475.709941:displaysurface_free surface=0x7f2fcc369c00
> 18566@1695127476.389619:displaysurface_free surface=0x7f2fcc32b910
> 18566@1695127477.119772:displaysurface_free surface=0x7f2fcc0d5a20
> 18566@1695127477.899517:displaysurface_free surface=0x7f2fcc086c40
> 18566@1695127478.729962:displaysurface_free surface=0x7f2fccc72020
> 18566@1695127479.609839:displaysurface_free surface=0x7f2fcc185160
> 18566@1695127480.539688:displaysurface_free surface=0x7f2fcc23a7e0
> 18566@1695127481.519759:displaysurface_free surface=0x7f2fcc3ec870
> 18566@1695127482.549930:displaysurface_free surface=0x7f2fcc634960
> 18566@1695127483.629661:displaysurface_free surface=0x7f2fcc26b140
> 18566@1695127484.759987:displaysurface_free surface=0x7f2fcc321700
> 18566@1695127485.940289:displaysurface_free surface=0x7f2fccaad100

We figured this wasn't a CVE-worthy problem, as only small amounts of
memory were leaked (the framebuffer itself is mapped from guest RAM, QEMU
only allocates administrative structures), plus libvirt restricts QEMU
memory footprint anyway, thus the guest can only DoS itself.

Plug the leak, by releasing the last prepared (not yet swapped in) display
surface, if any, in the fw_cfg DMA write callback.

Regarding the "reproducer", with the fix in place, the log is flooded with
trace messages (one per fw_cfg write), *and* the trace message alternates
between just two "surface" pointer values (i.e., nothing is leaked, the
allocator flip-flops between two objects in effect).

This issue appears to date back to the introduction of ramfb (995b30179bdc,
"hw/display: add ramfb, a simple boot framebuffer living in guest ram",
2018-06-18).

Cc: Gerd Hoffmann  (maintainer:ramfb)
Cc: qemu-sta...@nongnu.org
Fixes: 995b30179bdc
Signed-off-by: Laszlo Ersek 
---
 hw/display/ramfb.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/display/ramfb.c b/hw/display/ramfb.c
index 79b9754a5820..c2b002d53480 100644
--- a/hw/display/ramfb.c
+++ b/hw/display/ramfb.c
@@ -97,6 +97,7 @@ static void ramfb_fw_cfg_write(void *dev, off_t offset, size_t len)
 
 s->width = width;
 s->height = height;
+qemu_free_displaysurface(s->ds);
 s->ds = surface;
 }
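
The ownership rule behind this one-line fix can be sketched in isolation.
The following is a minimal model, not QEMU code: Surface, FbState,
fb_prepare() and fb_update() are hypothetical stand-ins for the display
surface, the ramfb state, the fw_cfg DMA write callback and the display
update, and live_surfaces only counts outstanding allocations.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a display surface; not a QEMU type. */
typedef struct Surface {
    int width;
    int height;
} Surface;

/* Hypothetical stand-in for the ramfb state. */
typedef struct FbState {
    Surface *prepared;  /* set by the write callback, not yet swapped in */
    Surface *active;    /* surface currently being displayed */
} FbState;

static int live_surfaces;  /* surfaces allocated and not yet freed */

static Surface *surface_new(int width, int height)
{
    Surface *s = malloc(sizeof(*s));

    assert(s != NULL);
    s->width = width;
    s->height = height;
    live_surfaces++;
    return s;
}

static void surface_free(Surface *s)
{
    if (s != NULL) {
        live_surfaces--;
        free(s);
    }
}

/* Models the write callback: prepare a new surface for the next update. */
static void fb_prepare(FbState *fb, int width, int height)
{
    Surface *s = surface_new(width, height);

    /* The fix: release a prepared surface that was never swapped in. */
    surface_free(fb->prepared);
    fb->prepared = s;
}

/* Models the display update: swap in the prepared surface, if any. */
static void fb_update(FbState *fb)
{
    if (fb->prepared != NULL) {
        surface_free(fb->active);
        fb->active = fb->prepared;
        fb->prepared = NULL;
    }
}
```

With the surface_free(fb->prepared) call removed from fb_prepare(),
back-to-back calls without an intervening fb_update() would grow
live_surfaces by one per call, which is the leak described above.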

[PATCH 1/4] ui/console: make qemu_console_is_multihead() static

2023-09-13 Thread Laszlo Ersek

qemu_console_is_multihead() is only called from within "ui/console.c";
make it static.

Cc: "Marc-André Lureau"  (odd fixer:Graphics)
Cc: Gerd Hoffmann  (odd fixer:Graphics)
Signed-off-by: Laszlo Ersek 
---
 include/ui/console.h | 1 -
 ui/console.c | 2 +-
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/ui/console.h b/include/ui/console.h
index 1ccd432b4d64..d715f88b1be2 100644
--- a/include/ui/console.h
+++ b/include/ui/console.h
@@ -506,7 +506,6 @@ bool qemu_console_is_visible(QemuConsole *con);
 bool qemu_console_is_graphic(QemuConsole *con);
 bool qemu_console_is_fixedsize(QemuConsole *con);
 bool qemu_console_is_gl_blocked(QemuConsole *con);
-bool qemu_console_is_multihead(DeviceState *dev);
 char *qemu_console_get_label(QemuConsole *con);
 int qemu_console_get_index(QemuConsole *con);
 uint32_t qemu_console_get_head(QemuConsole *con);
diff --git a/ui/console.c b/ui/console.c
index e4d61794bb2c..adacc3473140 100644
--- a/ui/console.c
+++ b/ui/console.c
@@ -2365,7 +2365,7 @@ bool qemu_console_is_gl_blocked(QemuConsole *con)
 return con->gl_block;
 }
 
-bool qemu_console_is_multihead(DeviceState *dev)
+static bool qemu_console_is_multihead(DeviceState *dev)
 {
 QemuConsole *con;
 Object *obj;

[PATCH 4/4] ui/console: sanitize search in qemu_graphic_console_is_multihead()

2023-09-13 Thread Laszlo Ersek

qemu_graphic_console_is_multihead() declares the graphical console "c" a
"multihead" console if there are two different graphical consoles in the
system that (a) both reference "c->device", and (b) have different
"c->head" numbers. In effect, if at least two graphical consoles exist
that are different heads of the same device that underlies "c". In fact,
"c" may be one of these two graphical consoles, or "c" may differ from
both of those consoles (in case "c->device" has at least three heads).

The loop currently uses this awkward "two different consoles" approach
because the function used not to have access to "c", only to "c->device",
which didn't allow for fetching (and comparing) "c->head". But, we've
changed that in the last patch; we now pass all of "c" to
qemu_graphic_console_is_multihead().

Thus, look for the *first* (and possibly *only*) graphical console, if
any, that refers to the same "device" as "c", but by a different "head"
number.
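
As a rough standalone sketch of the simplified search (the Console record
and console_is_multihead() below are hypothetical simplifications, and a
plain array stands in for the console list; this is not the code in
ui/console.c):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified console record; not the QEMU structure. */
typedef struct Console {
    bool graphic;        /* true for graphical consoles */
    const void *device;  /* identity of the underlying device */
    uint32_t head;       /* head number on that device */
} Console;

/*
 * Return true if some graphical console refers to the same device as "c"
 * but by a different head number. The first such console ends the search.
 */
static bool console_is_multihead(const Console *all, size_t count,
                                 const Console *c)
{
    assert(c != NULL && c->graphic);
    for (size_t i = 0; i < count; i++) {
        const Console *candidate = &all[i];

        if (!candidate->graphic) {
            continue;
        }
        if (candidate->device != c->device) {
            continue;
        }
        if (candidate->head != c->head) {
            return true;
        }
    }
    return false;
}
```

Because "c" itself is available for comparison, no "first head seen"
bookkeeping is needed; "c" compares equal to itself and is skipped
naturally.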

Cc: "Marc-André Lureau"  (odd fixer:Graphics)
Cc: Gerd Hoffmann  (odd fixer:Graphics)
Signed-off-by: Laszlo Ersek 
---

Notes:
context:-U4

 ui/console.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/ui/console.c b/ui/console.c
index 6424820c8521..9ce3c1248c7c 100644
--- a/ui/console.c
+++ b/ui/console.c
@@ -2367,10 +2367,8 @@ bool qemu_console_is_gl_blocked(QemuConsole *con)
 
 static bool qemu_graphic_console_is_multihead(QemuGraphicConsole *c)
 {
 QemuConsole *con;
-uint32_t f = 0xffffffff;
-uint32_t h;
 
 QTAILQ_FOREACH(con, &consoles, next) {
 QemuGraphicConsole *candidate;
 
@@ -2382,12 +2380,9 @@ static bool qemu_graphic_console_is_multihead(QemuGraphicConsole *c)
 if (candidate->device != c->device) {
 continue;
 }
 
-h = candidate->head;
-if (f == 0xffffffff) {
-f = h;
-} else if (h != f) {
+if (candidate->head != c->head) {
 return true;
 }
 }
 return false;

[PATCH 3/4] ui/console: eliminate QOM properties from qemu_console_is_multihead()

2023-09-13 Thread Laszlo Ersek

According to Marc-André's and Gerd's descriptions, the "device" and
"head" members of QemuGraphicConsole are exposed as QOM properties for two
purposes:

(1) Introspection (e.g., "qom-get" monitor command).

(2) A VNC server can display a specific device + head. This lets us run a
multihead configuration by using multiple VNC servers (one for each
head).

Further, we can link input devices to device + head, so input events
are routed to different devices dependent on where they are coming
from. Which is most useful for tablet devices in a VNC multihead
setup, each head has its own tablet device then. This does require
manual guest-side configuration, for establishing the same tablet <->
head relationship.

However, neither goal seems to justify the complicated QOM property lookup
that's internal to qemu_console_is_multihead().

Rework qemu_console_is_multihead() with plain old C language field
accesses.

Cc: "Marc-André Lureau"  (odd fixer:Graphics)
Cc: Gerd Hoffmann  (odd fixer:Graphics)
Signed-off-by: Laszlo Ersek 
---
 ui/console.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/ui/console.c b/ui/console.c
index 2ee65207b430..6424820c8521 100644
--- a/ui/console.c
+++ b/ui/console.c
@@ -2365,25 +2365,25 @@ bool qemu_console_is_gl_blocked(QemuConsole *con)
 return con->gl_block;
 }
 
-static bool qemu_console_is_multihead(DeviceState *dev)
+static bool qemu_graphic_console_is_multihead(QemuGraphicConsole *c)
 {
 QemuConsole *con;
-Object *obj;
 uint32_t f = 0xffffffff;
 uint32_t h;
 
 QTAILQ_FOREACH(con, &consoles, next) {
+QemuGraphicConsole *candidate;
+
 if (!QEMU_IS_GRAPHIC_CONSOLE(con)) {
 continue;
 }
-obj = object_property_get_link(OBJECT(con),
-   "device", &error_abort);
-if (DEVICE(obj) != dev) {
+
+candidate = QEMU_GRAPHIC_CONSOLE(con);
+if (candidate->device != c->device) {
 continue;
 }
 
-h = object_property_get_uint(OBJECT(con),
- "head", &error_abort);
+h = candidate->head;
 if (f == 0xffffffff) {
 f = h;
 } else if (h != f) {
@@ -2402,7 +2402,7 @@ char *qemu_console_get_label(QemuConsole *con)
 bool multihead;
 
 dev = DEVICE(c->device);
-multihead = qemu_console_is_multihead(dev);
+multihead = qemu_graphic_console_is_multihead(c);
 if (multihead) {
 return g_strdup_printf("%s.%d", dev->id ?
dev->id :

[PATCH 2/4] ui/console: only walk QemuGraphicConsoles in qemu_console_is_multihead()

2023-09-13 Thread Laszlo Ersek

qemu_console_is_multihead() declares the console "c" a "multihead" console
if there are two different consoles in the system that (a) both reference
"c->device", and (b) have different "c->head" numbers. In effect, if at
least two consoles exist that are different heads of the same device that
underlies "c".

Commit 58d5870845c6 ("ui/console: move graphic fields to
QemuGraphicConsole", 2023-09-04) pushed the "device" and "head" members
from the QemuConsole base class down to the QemuGraphicConsole subclass,
adjusting the referring QOM properties accordingly as well. As a result,
the "device" property lookup in qemu_console_is_multihead() now crashes,
in case the candidate console being investigated for criterion (a) is not
a QemuGraphicConsole instance:

> Unexpected error in object_property_find_err() at qom/object.c:1314:
> qemu: Property 'qemu-fixed-text-console.device' not found
> Aborted (core dumped)

This is effectively an unchecked downcast. Make it checked: only consider
such console candidates that are themselves QemuGraphicConsole instances.
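
The idea of a checked downcast can be illustrated outside of QOM with a
plain type tag. Everything below is a hypothetical sketch: ConsoleKind,
BaseConsole, GraphicConsole, as_graphic() and count_heads() are made-up
names, and QOM's real run-time type checks work differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type tag; QOM uses its own run-time type information. */
typedef enum ConsoleKind {
    CONSOLE_TEXT,
    CONSOLE_GRAPHIC
} ConsoleKind;

/* Hypothetical base "class". */
typedef struct BaseConsole {
    ConsoleKind kind;
} BaseConsole;

/* Hypothetical subclass; the base object must be the first member. */
typedef struct GraphicConsole {
    BaseConsole base;
    const void *device;
    uint32_t head;
} GraphicConsole;

/* Checked downcast: NULL unless "con" really is a graphic console. */
static GraphicConsole *as_graphic(BaseConsole *con)
{
    if (con == NULL || con->kind != CONSOLE_GRAPHIC) {
        return NULL;
    }
    return (GraphicConsole *)con;
}

/* Count the graphic consoles that reference "device", skipping others. */
static size_t count_heads(BaseConsole **list, size_t count,
                          const void *device)
{
    size_t heads = 0;

    assert(list != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        GraphicConsole *g = as_graphic(list[i]);

        if (g == NULL) {
            continue;   /* subclass-only fields must not be touched */
        }
        if (g->device == device) {
            heads++;
        }
    }
    return heads;
}
```

Skipping the candidates for which the downcast fails is the analogue of
the QEMU_IS_GRAPHIC_CONSOLE() test added by this patch.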

Cc: "Marc-André Lureau"  (odd fixer:Graphics)
Cc: Gerd Hoffmann  (odd fixer:Graphics)
Fixes: 58d5870845c6
Signed-off-by: Laszlo Ersek 
---
 ui/console.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/ui/console.c b/ui/console.c
index adacc3473140..2ee65207b430 100644
--- a/ui/console.c
+++ b/ui/console.c
@@ -2373,6 +2373,9 @@ static bool qemu_console_is_multihead(DeviceState *dev)
 uint32_t h;
 
 QTAILQ_FOREACH(con, &consoles, next) {
+if (!QEMU_IS_GRAPHIC_CONSOLE(con)) {
+continue;
+}
 obj = object_property_get_link(OBJECT(con),
"device", &error_abort);
 if (DEVICE(obj) != dev) {

[PATCH 0/4] ui/console: multihead: fix crash, simplify logic

2023-09-13 Thread Laszlo Ersek

Fix a recent regression (crash) in the multihead check; clean up the
code some more.

Cc: "Marc-André Lureau"  (odd fixer:Graphics)
Cc: Gerd Hoffmann  (odd fixer:Graphics)

Thanks,
Laszlo

Laszlo Ersek (4):
  ui/console: make qemu_console_is_multihead() static
  ui/console: only walk QemuGraphicConsoles in
qemu_console_is_multihead()
  ui/console: eliminate QOM properties from qemu_console_is_multihead()
  ui/console: sanitize search in qemu_graphic_console_is_multihead()

 include/ui/console.h |  1 -
 ui/console.c | 24 +---
 2 files changed, 11 insertions(+), 14 deletions(-)

Re: [PATCH v2] hw/i386/pc: fix code comment on cumulative flash size

2023-09-13 Thread Laszlo Ersek

On 9/12/23 18:40, Philippe Mathieu-Daudé wrote:
> On 12/9/23 17:55, Laszlo Ersek wrote:
>> - The comment is incorrectly indented / formatted.
>>
>> - The comment states an 8MB limit, even though the code enforces a 16MB
>>    limit.
>>
>> Both of these warts come from commit 0657c657eb37 ("hw/i386/pc: add max
>> combined fw size as machine configuration option", 2020-12-09); clean
>> them up.
>>
>> Arguably, it's also better to be consistent with the binary units
>> (such as "MiB") that QEMU uses nowadays.
>>
>> Cc: "Michael S. Tsirkin"  (supporter:PC)
>> Cc: Marcel Apfelbaum  (supporter:PC)
>> Cc: Paolo Bonzini  (maintainer:X86 TCG CPUs)
>> Cc: Richard Henderson  (maintainer:X86 TCG CPUs)
>> Cc: Eduardo Habkost  (maintainer:X86 TCG CPUs)
>> Cc: qemu-triv...@nongnu.org
>> Fixes: 0657c657eb37
>> Signed-off-by: Laszlo Ersek 
>> ---
>>
>> Notes:
>>  v2:
>>   - use the binary units MiB, KiB, GiB comprehensively in the comment
> 
> I was going to suggest that ;)
> 
> Reviewed-by: Philippe Mathieu-Daudé 

And when I was writing the patch, I was 100% sure that you were going to
be my first reviewer. :)

Thanks!
Laszlo

> 
>>
>>   hw/i386/pc.c | 12 ++--
>>   1 file changed, 6 insertions(+), 6 deletions(-)
>

[PATCH v2] hw/i386/pc: fix code comment on cumulative flash size

2023-09-12 Thread Laszlo Ersek

- The comment is incorrectly indented / formatted.

- The comment states an 8MB limit, even though the code enforces a 16MB
  limit.

Both of these warts come from commit 0657c657eb37 ("hw/i386/pc: add max
combined fw size as machine configuration option", 2020-12-09); clean them
up.

Arguably, it's also better to be consistent with the binary units (such as
"MiB") that QEMU uses nowadays.

Cc: "Michael S. Tsirkin"  (supporter:PC)
Cc: Marcel Apfelbaum  (supporter:PC)
Cc: Paolo Bonzini  (maintainer:X86 TCG CPUs)
Cc: Richard Henderson  (maintainer:X86 TCG CPUs)
Cc: Eduardo Habkost  (maintainer:X86 TCG CPUs)
Cc: qemu-triv...@nongnu.org
Fixes: 0657c657eb37
Signed-off-by: Laszlo Ersek 
---

Notes:
v2:

- use the binary units MiB, KiB, GiB comprehensively in the comment

 hw/i386/pc.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 54838c0c411d..0b642e8af590 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1746,12 +1746,12 @@ static void pc_machine_set_max_fw_size(Object *obj, Visitor *v,
 }
 
 /*
-* We don't have a theoretically justifiable exact lower bound on the base
-* address of any flash mapping. In practice, the IO-APIC MMIO range is
-* [0xFEE00000..0xFEE01000] -- see IO_APIC_DEFAULT_ADDRESS --, leaving free
-* only 18MB-4KB below 4G. For now, restrict the cumulative mapping to 8MB in
-* size.
-*/
+ * We don't have a theoretically justifiable exact lower bound on the base
+ * address of any flash mapping. In practice, the IO-APIC MMIO range is
+ * [0xFEE00000..0xFEE01000] -- see IO_APIC_DEFAULT_ADDRESS --, leaving free
+ * only 18MiB-4KiB below 4GiB. For now, restrict the cumulative mapping to
+ * 16MiB in size.
+ */
 if (value > 16 * MiB) {
 error_setg(errp,
"User specified max allowed firmware size %" PRIu64 " is "

[PATCH] hw/i386/pc: fix code comment on cumulative flash size

2023-09-12 Thread Laszlo Ersek

- The comment is incorrectly indented / formatted.

- The comment states an 8MB limit, even though the code enforces a 16MB
  limit.

Both of these warts come from commit 0657c657eb37 ("hw/i386/pc: add max
combined fw size as machine configuration option", 2020-12-09); clean them
up.

Arguably, it's also better to be consistent with the "MiB" unit that QEMU
uses nowadays.
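
For reference, the "18MB-4KB below 4G" figure in the code comment touched
by this patch follows from the end of the MMIO range quoted there. A small
sketch of the arithmetic (the KiB/MiB/GiB macros and free_below_4g() are
defined locally for illustration and are not QEMU's definitions):

```c
#include <assert.h>
#include <stdint.h>

/* Local unit macros for this sketch only. */
#define KiB (UINT64_C(1) << 10)
#define MiB (UINT64_C(1) << 20)
#define GiB (UINT64_C(1) << 30)

/* Bytes between the exclusive end of a range and the 4 GiB boundary. */
static uint64_t free_below_4g(uint64_t range_end)
{
    assert(range_end <= 4 * GiB);
    return 4 * GiB - range_end;
}
```

With range_end = 0xFEE01000, the result is 0x11FF000 bytes, that is,
18 MiB minus 4 KiB, so a 16 MiB cumulative flash mapping still fits below
4 GiB.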

Cc: "Michael S. Tsirkin"  (supporter:PC)
Cc: Marcel Apfelbaum  (supporter:PC)
Cc: Paolo Bonzini  (maintainer:X86 TCG CPUs)
Cc: Richard Henderson  (maintainer:X86 TCG CPUs)
Cc: Eduardo Habkost  (maintainer:X86 TCG CPUs)
Cc: qemu-triv...@nongnu.org
Fixes: 0657c657eb37
Signed-off-by: Laszlo Ersek 
---
 hw/i386/pc.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 54838c0c411d..d06b8da49cae 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1746,12 +1746,12 @@ static void pc_machine_set_max_fw_size(Object *obj, Visitor *v,
 }
 
 /*
-* We don't have a theoretically justifiable exact lower bound on the base
-* address of any flash mapping. In practice, the IO-APIC MMIO range is
-* [0xFEE0..0xFEE01000] -- see IO_APIC_DEFAULT_ADDRESS --, leaving free
-* only 18MB-4KB below 4G. For now, restrict the cumulative mapping to 8MB 
in
-* size.
-*/
+ * We don't have a theoretically justifiable exact lower bound on the base
+ * address of any flash mapping. In practice, the IO-APIC MMIO range is
+ * [0xFEE00000..0xFEE01000] -- see IO_APIC_DEFAULT_ADDRESS --, leaving free
+ * only 18MB-4KB below 4G. For now, restrict the cumulative mapping to 16MiB
+ * in size.
+ */
 if (value > 16 * MiB) {
 error_setg(errp,
"User specified max allowed firmware size %" PRIu64 " is "

Re: riscv64 virt board crash upon startup

2023-09-12 Thread Laszlo Ersek

On 9/11/23 15:12, Laszlo Ersek wrote:
> On 9/11/23 10:53, Gerd Hoffmann wrote:
>> On Mon, Sep 11, 2023 at 12:12:43PM +0400, Marc-André Lureau wrote:
>>>> Gerd, here's the question for you: why are "device" and "head" QOM 
>>>> properties in the first place? What are they needed for?
>>>>
>>>
>>> You get QOM tree introspection (ex: (qemu) qom-get
>>> /backend/console[0]/device type). Other than that, I don't think it
>>> brings anything else.
>>
>> You can configure vnc server(s) to show a specific device + head, which
>> allows to run multihead configurations by using multiple vnc servers (one
>> for each head).
>>
>> You can link input devices to device + head, so input events can go to
>> different devices depending on where they are coming from.  Which is
>> most useful for tablet devices in a vnc multihead setup, each head has
>> its own tablet device then.  Requires manual guest-side configuration
>> to establish the same tablet <-> head relationship (tested that years
>> ago with X11, not sure if and how this can be done with wayland).
> 
> OK, so I'm going to drop patch#3.

Hmmm, wait, I originally asked about the "QOM trickery" for a different
reason.

There are two things:

- using (exposing) QOM properties for introspection,

- using those properties for internal access.

Patch#3 would eliminate property use *internally*, it would not
interfere with the use case explained by Gerd.

I originally asked about QOM because I wanted to know where to *stop
removing* QOM stuff. Like, once I replaced the QOM accessors with normal
C struct / field accesses in qemu_console_is_multihead(), why would I
stop there, and not just remove the "head" and "device" properties
altogether?

With Gerd's explanation, I understand we need to keep those properties
-- but that doesn't seem to imply we *must* use the properties even in
internal functions such as qemu_console_is_multihead(). There, we can
just go for direct field access; is that right? (IOW I'd still keep
patch#3, if I can!)

Thanks!
Laszlo

Re: riscv64 virt board crash upon startup

2023-09-11 Thread Laszlo Ersek

On 9/11/23 10:53, Gerd Hoffmann wrote:
> On Mon, Sep 11, 2023 at 12:12:43PM +0400, Marc-André Lureau wrote:
>>> Gerd, here's the question for you: why are "device" and "head" QOM 
>>> properties in the first place? What are they needed for?
>>>
>>
>> You get QOM tree introspection (ex: (qemu) qom-get
>> /backend/console[0]/device type). Other than that, I don't think it
>> brings anything else.
> 
> You can configure vnc server(s) to show a specific device + head, which
> allows to run multihead configurations by using multiple vnc servers (one
> for each head).
> 
> You can link input devices to device + head, so input events can go to
> different devices depending on where they are coming from.  Which is
> most useful for tablet devices in a vnc multihead setup, each head has
> its own tablet device then.  Requires manual guest-side configuration
> to establish the same tablet <-> head relationship (tested that years
> ago with X11, not sure if and how this can be done with wayland).

OK, so I'm going to drop patch#3.

Thanks!
Laszlo

Re: riscv64 virt board crash upon startup

2023-09-07 Thread Laszlo Ersek

On 9/8/23 01:47, Laszlo Ersek wrote:

> I don't know why qemu_console_is_multihead() used a lot of QOM
> trickery for this in the first place, but here's what I'd propose as
> fix -- simply try to locate a QemuGraphicConsole in "consoles" that
> references the same "device" that *this* QemuGraphicConsole
> references, but by a different "head" number.

So, the final version of the function would look like:

static bool qemu_graphic_console_is_multihead(QemuGraphicConsole *c)
{
    QemuConsole *con;

    QTAILQ_FOREACH(con, &consoles, next) {
        if (!QEMU_IS_GRAPHIC_CONSOLE(con)) {
            continue;
        }
        QemuGraphicConsole *candidate = QEMU_GRAPHIC_CONSOLE(con);
        if (candidate->device != c->device) {
            continue;
        }

        if (candidate->head != c->head) {
            return true;
        }
    }
    return false;
}

Laszlo
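
For experimenting with the same search outside the QEMU tree, here is a
minimal stand-alone model of it. The types and names (ModelConsole,
model_is_multihead) are hypothetical stand-ins, and a plain array replaces
the "consoles" tail queue; only the comparison logic follows the function
above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a console; not QEMU's real definition. */
typedef struct ModelConsole {
    bool is_graphic;     /* models the QEMU_IS_GRAPHIC_CONSOLE() check */
    const void *device;  /* models the "device" field */
    uint32_t head;       /* models the "head" field */
} ModelConsole;

/*
 * Return true if some graphic console in all[0..n) references the same
 * device as *c, but under a different head number.
 */
static bool model_is_multihead(const ModelConsole *all, size_t n,
                               const ModelConsole *c)
{
    for (size_t i = 0; i < n; i++) {
        if (!all[i].is_graphic) {
            continue;    /* text consoles carry no device / head */
        }
        if (all[i].device != c->device) {
            continue;
        }
        if (all[i].head != c->head) {
            return true;
        }
    }
    return false;
}
```

The point of the model is that non-graphic consoles are skipped before any
"device" access, which is the step the QOM-based lookup tripped over.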

Re: riscv64 virt board crash upon startup

2023-09-07 Thread Laszlo Ersek

Question for Gerd below:

On 9/7/23 14:29, Philippe Mathieu-Daudé wrote:
> On 7/9/23 13:25, Laszlo Ersek wrote:
>> This is with QEMU v8.1.0-391-gc152379422a2.
>>
>> I use the command line from (scroll to the bottom):
>>
>>    https://github.com/tianocore/edk2/commit/49f06b664018
>>
>> (with "-full-screen" removed).
>>
>> The crash is as follows:
>>
>>    Unexpected error in object_property_find_err() at
>> ../../src/upstream/qemu/qom/object.c:1314:
>>    qemu: Property 'qemu-fixed-text-console.device' not found
>>    Aborted (core dumped)
> 
> Cc'ing Marc-André for commit b208f745a8
> ("ui/console: introduce different console objects")

First bad commit:

58d5870845c61cea1e7df287b86c2607b2bf48a9 is the first bad commit
commit 58d5870845c61cea1e7df287b86c2607b2bf48a9
Author: Marc-André Lureau 
Date:   Wed Aug 30 13:38:03 2023 +0400

ui/console: move graphic fields to QemuGraphicConsole

Move fields specific to graphic console to the console subclass.

qemu_console_get_head() is adapated to accomodate QemuTextConsole, and
always returns 0.

Signed-off-by: Marc-André Lureau 
Reviewed-by: Daniel P. Berrangé 
Message-Id: <20230830093843.3531473-30-marcandre.lur...@redhat.com>

 ui/console.c | 110 ++-
 1 file changed, 64 insertions(+), 46 deletions(-)

Bisection log:

git bisect start
# status: waiting for both good and bad commits
# bad: [c152379422a204109f34ca2b43ecc538c7d738ae] Merge tag 'ui-pull-request' of https://gitlab.com/marcandre.lureau/qemu into staging
git bisect bad c152379422a204109f34ca2b43ecc538c7d738ae
# status: waiting for good commit(s), bad commit known
# good: [17780edd81d27fcfdb7a802efc870a99788bd2fc] Merge tag 'quick-fix-pull-request' of https://gitlab.com/bsdimp/qemu into staging
git bisect good 17780edd81d27fcfdb7a802efc870a99788bd2fc
# good: [912a9efd6bf4d808b238e17d26de2e4bb9bc4743] Merge tag 'pull-aspeed-20230901' of https://github.com/legoater/qemu into staging
git bisect good 912a9efd6bf4d808b238e17d26de2e4bb9bc4743
# bad: [6ce7b1fa8844db668f0a3c0b37b78b08d331a16a] ui/console: remove need for g_width/g_height
git bisect bad 6ce7b1fa8844db668f0a3c0b37b78b08d331a16a
# good: [6505fd8d2390e57c6a2e84f9c07b9e408ad7da76] ui/vc: move VCCharDev specific fields out of QemuConsole
git bisect good 6505fd8d2390e57c6a2e84f9c07b9e408ad7da76
# good: [7fa4b8041b870951642515e0954d274ec4d599b1] ui/console: update the head from unused QemuConsole
git bisect good 7fa4b8041b870951642515e0954d274ec4d599b1
# good: [b2bb9cc43dbb942a5333a6271629fd6094771bca] ui/vc: move text fields to QemuTextConsole
git bisect good b2bb9cc43dbb942a5333a6271629fd6094771bca
# bad: [98ee9dab81b2bc75d6ccf86681053ed80f9fc9af] ui/vc: fold text_console_do_init() in vc_chr_open()
git bisect bad 98ee9dab81b2bc75d6ccf86681053ed80f9fc9af
# bad: [58d5870845c61cea1e7df287b86c2607b2bf48a9] ui/console: move graphic fields to QemuGraphicConsole
git bisect bad 58d5870845c61cea1e7df287b86c2607b2bf48a9
# first bad commit: [58d5870845c61cea1e7df287b86c2607b2bf48a9] ui/console: move graphic fields to QemuGraphicConsole

The problem is that the commit in question didn't update 
qemu_console_is_multihead().

qemu_console_is_multihead() checks, effectively, if there is another console in 
the system that refers to *this* console's device, but under a different "head" 
number.

I don't know why qemu_console_is_multihead() used a lot of QOM trickery for 
this in the first place, but here's what I'd propose as fix -- simply try to 
locate a QemuGraphicConsole in "consoles" that references the same "device" 
that *this* QemuGraphicConsole references, but by a different "head" number.


* Patch #1 -- make "qemu_console_is_multihead" static:

diff --git a/include/ui/console.h b/include/ui/console.h
index 1ccd432b4d64..d715f88b1be2 100644
--- a/include/ui/console.h
+++ b/include/ui/console.h
@@ -506,7 +506,6 @@ bool qemu_console_is_visible(QemuConsole *con);
 bool qemu_console_is_graphic(QemuConsole *con);
 bool qemu_console_is_fixedsize(QemuConsole *con);
 bool qemu_console_is_gl_blocked(QemuConsole *con);
-bool qemu_console_is_multihead(DeviceState *dev);
 char *qemu_console_get_label(QemuConsole *con);
 int qemu_console_get_index(QemuConsole *con);
 uint32_t qemu_console_get_head(QemuConsole *con);
diff --git a/ui/console.c b/ui/console.c
index e4d61794bb2c..adacc3473140 100644
--- a/ui/console.c
+++ b/ui/console.c
@@ -2365,7 +2365,7 @@ bool qemu_console_is_gl_blocked(QemuConsole *con)
 return con->gl_block;
 }
 
-bool qemu_console_is_multihead(DeviceState *dev)
+static bool qemu_console_is_multihead(DeviceState *dev)
 {
 QemuConsole *con;
 Object *obj;


* Patch #2 -- only check QemuGraphicConsoles for referencing our "device" by a 
different "hea

Re: [PATCH v2 7/7] vhost-user: call VHOST_USER_SET_VRING_ENABLE synchronously

2023-09-07 Thread Laszlo Ersek

On 9/7/23 17:59, Eugenio Perez Martin wrote:
> On Wed, Aug 30, 2023 at 3:41 PM Laszlo Ersek  wrote:
>>
>> (1) The virtio-1.2 specification
>> <http://docs.oasis-open.org/virtio/virtio/v1.2/virtio-v1.2.html> writes:
>>
>>> 3 General Initialization And Device Operation
>>> 3.1   Device Initialization
>>> 3.1.1 Driver Requirements: Device Initialization
>>>
>>> [...]
>>>
>>> 7. Perform device-specific setup, including discovery of virtqueues for
>>>the device, optional per-bus setup, reading and possibly writing the
>>>device’s virtio configuration space, and population of virtqueues.
>>>
>>> 8. Set the DRIVER_OK status bit. At this point the device is “live”.
>>
>> and
>>
>>> 4 Virtio Transport Options
>>> 4.1   Virtio Over PCI Bus
>>> 4.1.4 Virtio Structure PCI Capabilities
>>> 4.1.4.3   Common configuration structure layout
>>> 4.1.4.3.2 Driver Requirements: Common configuration structure layout
>>>
>>> [...]
>>>
>>> The driver MUST configure the other virtqueue fields before enabling the
>>> virtqueue with queue_enable.
>>>
>>> [...]
>>
>> (The same statements are present in virtio-1.0 identically, at
>> <http://docs.oasis-open.org/virtio/virtio/v1.0/virtio-v1.0.html>.)
>>
>> These together mean that the following sub-sequence of steps is valid for
>> a virtio-1.0 guest driver:
>>
>> (1.1) set "queue_enable" for the needed queues as the final part of device
>> initialization step (7),
>>
>> (1.2) set DRIVER_OK in step (8),
>>
>> (1.3) immediately start sending virtio requests to the device.
>>
>> (2) When vhost-user is enabled, and the VHOST_USER_F_PROTOCOL_FEATURES
>> special virtio feature is negotiated, then virtio rings start in disabled
>> state, according to
>> <https://qemu-project.gitlab.io/qemu/interop/vhost-user.html#ring-states>.
>> In this case, explicit VHOST_USER_SET_VRING_ENABLE messages are needed for
>> enabling vrings.
>>
>> Therefore setting "queue_enable" from the guest (1.1) is a *control plane*
>> operation, which travels from the guest through QEMU to the vhost-user
>> backend, using a unix domain socket.
>>
> 
> The code looks good to me, but this part of the message is not precise
> if I understood it correctly.
> 
> Guest PCI "queue_enable" writes remain in the qemu virtio device model
> until the guest writes DRIVER_OK to the status. I'm referring to
> hw/virtio/virtio-pci.c:virtio_pci_common_write, case
> VIRTIO_PCI_COMMON_Q_ENABLE. From there, virtio_queue_enable just saves
> the info in VirtIOPCIProxy.
> 
> After the needed queues are enabled, the guest writes DRIVER_OK status
> bit. Then, the vhost backend is started and qemu sends the
> VHOST_USER_SET_VRING_ENABLE through the unix socket. And that is the
> source of the message that is racing with the dataplane.

OK, so this means that 1.1 is "buffered" in QEMU until 1.2, but the race
between 1.2 and 1.3 is just the same.

I can reword the commit message to take this into account.

Thanks!
Laszlo
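
A minimal sketch of the buffering Eugenio describes, as I understand it: the
guest's "queue_enable" writes are only recorded, and the per-queue enable
messages go out once DRIVER_OK is written. All names below (ModelProxy,
model_queue_enable, model_set_driver_ok) are hypothetical; this is a toy
model, not QEMU's VirtIOPCIProxy or its vhost code.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NUM_QUEUES 4   /* arbitrary, for illustration only */

/* Toy model of the device-side state kept between (1.1) and (1.2). */
typedef struct ModelProxy {
    bool queue_enabled[MODEL_NUM_QUEUES]; /* recorded queue_enable writes */
    bool driver_ok;
    size_t enable_msgs_sent;  /* models VHOST_USER_SET_VRING_ENABLE sends */
} ModelProxy;

/* (1.1) queue_enable write: recorded only, nothing reaches the backend. */
static void model_queue_enable(ModelProxy *p, size_t q)
{
    if (q < MODEL_NUM_QUEUES) {
        p->queue_enabled[q] = true;
    }
}

/* (1.2) DRIVER_OK write: one enable message per recorded queue goes out. */
static void model_set_driver_ok(ModelProxy *p)
{
    p->driver_ok = true;
    for (size_t q = 0; q < MODEL_NUM_QUEUES; q++) {
        if (p->queue_enabled[q]) {
            p->enable_msgs_sent++;
        }
    }
}
```

In this model no message is sent before model_set_driver_ok() runs, so the
window that matters is the one between (1.2) and (1.3).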

> 
> I didn't confirm it with virtiofs through tracing / debugging, so I
> may be missing something.
> 
> Even with the small nit, the code fixes the problem.
> 
> Acked-by: Eugenio Pérez 
> 
>> Whereas sending a virtio request (1.3) is a *data plane* operation, which
>> evades QEMU -- it travels from guest to the vhost-user backend via
>> eventfd.
>>
>> This means that steps (1.1) and (1.3) travel through different channels,
>> and their relative order can be reversed, as perceived by the vhost-user
>> backend.
>>
>> That's exactly what happens when OVMF's virtiofs driver (VirtioFsDxe) runs
>> against the Rust-language virtiofsd version 1.7.2. (Which uses version
>> 0.10.1 of the vhost-user-backend crate, and version 0.8.1 of the vhost
>> crate.)
>>
>> Namely, when VirtioFsDxe binds a virtiofs device, it goes through the
>> device initialization steps (i.e., control plane operations), and
>> immediately sends a FUSE_INIT request too (i.e., performs a data plane
>> operation). In the Rust-language virtiofsd, this creates a race between
>> two components that run *concurrently*, i.e., in different threads or
>> processes:
>>
>> - Control plane, handling vhost-user protocol messages:
>>
>>   The "VhostUserSlaveReqHandlerMut::set_vring_enable" method
>>   [crates/vhost-user-backend/src/handler.rs] handles
>>   VHOST_USER_SET_

riscv64 virt board crash upon startup

2023-09-07 Thread Laszlo Ersek

This is with QEMU v8.1.0-391-gc152379422a2.

I use the command line from (scroll to the bottom):

  https://github.com/tianocore/edk2/commit/49f06b664018

(with "-full-screen" removed).

The crash is as follows:

  Unexpected error in object_property_find_err() at 
../../src/upstream/qemu/qom/object.c:1314:
  qemu: Property 'qemu-fixed-text-console.device' not found
  Aborted (core dumped)

(Full backtrace attached.)

If I replace the "-device virtio-gpu-pci" option with "-nographic", then
there is no crash; QEMU launches fine and the guest starts booting fine.

I think this is a board-related problem; the riscv virt board code
likely does not set up the console properly.

Furthermore, I reckon this could be a regression. When Sunil's above patch
was committed to edk2 (2023-06-23), the graphical guest window must have
worked still.

Laszlo

#0  0x74ea154c in __pthread_kill_implementation () at /lib64/libc.so.6
#1  0x74e54d46 in raise () at /lib64/libc.so.6
#2  0x74e287f3 in abort () at /lib64/libc.so.6
#3  0x55fc4a75 in error_handle (errp=0x567897b8 , 
err=0x57aee860) at ../../src/upstream/qemu/util/error.c:41
#4  0x55fc4bf8 in error_setv (errp=0x567897b8 , 
src=0x56205068 "../../src/upstream/qemu/qom/object.c", line=1314, 
func=0x562058a0 <__func__.20> "object_property_find_err", 
err_class=ERROR_CLASS_GENERIC_ERROR, fmt=0x56205451 "Property '%s.%s' not 
found", ap=0x7fffce20, suffix=0x0) at 
../../src/upstream/qemu/util/error.c:82
err = 0x57aee860
saved_errno = 2
__PRETTY_FUNCTION__ = "error_setv"
#5  0x55fc4dcb in error_setg_internal (errp=0x567897b8 
, src=0x56205068 "../../src/upstream/qemu/qom/object.c", 
line=1314, func=0x562058a0 <__func__.20> "object_property_find_err", 
fmt=0x56205451 "Property '%s.%s' not found") at 
../../src/upstream/qemu/util/error.c:105
ap = {{gp_offset = 48, fp_offset = 48, overflow_arg_area = 
0x7fffcf08, reg_save_area = 0x7fffce40}}
#6  0x55dbd0ae in object_property_find_err (obj=0x569672a0, 
name=0x5608117d "device", errp=0x567897b8 ) at 
../../src/upstream/qemu/qom/object.c:1314
prop = 0x0
__func__ = "object_property_find_err"
#7  0x55dbd361 in object_property_get (obj=0x569672a0, 
name=0x5608117d "device", v=0x56ad05d0, errp=0x567897b8 
) at ../../src/upstream/qemu/qom/object.c:1389
err = 0x0
prop = 0x77ffd000 <_rtld_local>
__func__ = "object_property_get"
#8  0x55dc1a44 in object_property_get_qobject (obj=0x569672a0, 
name=0x5608117d "device", errp=0x567897b8 ) at 
../../src/upstream/qemu/qom/qom-qobject.c:40
ret = 0x0
v = 0x56ad05d0
#9  0x55dbd635 in object_property_get_str (obj=0x569672a0, 
name=0x5608117d "device", errp=0x567897b8 ) at 
../../src/upstream/qemu/qom/object.c:1437
ret = 0x7fffd080
qstring = 0x55dbdf5d 
retval = 0x57b253b0 "\305'\373\002PU"
__func__ = "object_property_get_str"
#10 0x55dbd7c0 in object_property_get_link (obj=0x569672a0, 
name=0x5608117d "device", errp=0x567897b8 ) at 
../../src/upstream/qemu/qom/object.c:1470
str = 0xf036ed7667bd9500 
target = 0x57b253b0
__func__ = "object_property_get_link"
#11 0x558892c1 in qemu_console_is_multihead (dev=0x57173090) at 
../../src/upstream/qemu/ui/console.c:2376
con = 0x569672a0
obj = 0x57173090
f = 0
h = 0
#12 0x558893a9 in qemu_console_get_label (con=0x56bf7c00) at 
../../src/upstream/qemu/ui/console.c:2402
dev = 0x57173090
multihead = false
c = 0x56bf7c00
#13 0x55ba5fdf in gd_vc_gfx_init (s=0x57a45450, vc=0x57a454f0, 
con=0x56bf7c00, idx=0, group=0x0, view_menu=0x57cea580) at 
../../src/upstream/qemu/ui/gtk.c:2130
zoom_to_fit = false
i = 21845
#14 0x55ba6828 in gd_create_menu_view (s=0x57a45450, 
opts=0x5675f560 ) at ../../src/upstream/qemu/ui/gtk.c:2296
group = 0x0
view_menu = 0x57cea580
separator = 0x57cee6f0
con = 0x56bf7c00
vc = 0
#15 0x55ba6a95 in gd_create_menus (s=0x57a45450, 
opts=0x5675f560 ) at ../../src/upstream/qemu/ui/gtk.c:2336
settings = 0x7fffd270
#16 0x55ba6ee4 in gtk_display_init (ds=0x5687c390, 
opts=0x5675f560 ) at ../../src/upstream/qemu/ui/gtk.c:2414
vc = 0x57a43ea0
s = 0x57a45450
window_display = 0x56a93120
theme = 0x56ae71b0
dir = 0x57b33290 ""
__PRETTY_FUNCTION__ = "gtk_display_init"
#17 0x55889f4a in qemu_display_init (ds=0x5687c390, 
opts=0x5675f560 ) at ../../src/upstream/qemu/ui/console.c:2686
__PRETTY_FUNCTION__ = "qemu_display_init"
#18 0x55b109f0 in

Re: [PATCH v2 1/7] vhost-user: strip superfluous whitespace

2023-09-06 Thread Laszlo Ersek

On 9/6/23 09:12, Albert Esteve wrote:
> 
> 
> On Thu, Aug 31, 2023 at 9:14 AM Laszlo Ersek <ler...@redhat.com> wrote:
> 
> On 8/30/23 15:40, Laszlo Ersek wrote:
> > Cc: "Michael S. Tsirkin" <m...@redhat.com> (supporter:vhost)
> > Cc: Eugenio Perez Martin <epere...@redhat.com>
> > Cc: German Maglione <gmagli...@redhat.com>
> > Cc: Liu Jiang <ge...@linux.alibaba.com>
> > Cc: Sergio Lopez Pascual <s...@redhat.com>
> > Cc: Stefano Garzarella <sgarz...@redhat.com>
> > Signed-off-by: Laszlo Ersek <ler...@redhat.com>
> > Reviewed-by: Stefano Garzarella <sgarz...@redhat.com>
> > ---
> >
> > Notes:
> >     v2:
> >     
> >     - pick up Stefano's R-b
> >
> >  hw/virtio/vhost-user.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> This has been
> 
> Reviewed-by: Philippe Mathieu-Daudé <phi...@linaro.org>
> 
> under the (identical) v1 posting:
> 
> 
> http://mid.mail-archive.com/cd0604a1-6826-ac6c-6c47-dcb6def64b28@linaro.org
> 
> Thanks, Phil! (and sorry that I posted v2 too quickly -- I forgot that
> sometimes reviewers split a review over multiple days.)
> 
> Laszlo
> 
> >
> > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > index 8dcf049d422b..b4b677c1ce66 100644
> > --- a/hw/virtio/vhost-user.c
> > +++ b/hw/virtio/vhost-user.c
> > @@ -398,7 +398,7 @@ static int vhost_user_write(struct vhost_dev
> *dev, VhostUserMsg *msg,
> >       * operations such as configuring device memory mappings or
> issuing device
> >       * resets, which affect the whole device instead of
> individual VQs,
> >       * vhost-user messages should only be sent once.
> > -     *
> > +     *
> >       * Devices with multiple vhost_devs are given an associated
> dev->vq_index
> >       * so per_device requests are only sent if vq_index is 0.
> >       */
> >
> 
> 
> Thanks for the series!
> I had a timeout problem with a virtio device I am developing, and I was
> not sure yet what was going on.
> Your description of the problem seemed to fit mine, in my case the
> driver sent a command through the data plane
> right after the feature negotiation that reached the backend too soon.
> Adding delays alleviated the issue, so it
> already hinted me to a race condition.
> 
> I tested using this patch series and according to my experiments, it
> really lowers the chances to get the deadlock.
> Sadly, I do still get the issue sometimes, though (not frequently)...
> However, I think probably the solution comes not
> from the Qemu side, but from the rust-vmm vhost-user-backend crate. I am
> looking for that solution on my side.
> 
> But that does not invalidate this patch, which I think is a necessary
> improvement, and in my tests it really
> helps a lot with the described issue. Therefore:
> 
> Tested-by: Albert Esteve <aest...@redhat.com>

Thank you for the testing and the feedback!

My problem with relegating the fix to rust-vmm/vhost -- i.e. with
calling the hang a bug inside rust-vmm/vhost -- is that rust-vmm/vhost
is *unfixable* as long as the vhost-user specification says what it
says. As soon as the guest is permitted to issue a data plane operation,
without forcing it to await completion of an earlier control plane
operation, the cat is out of the bag. Those operations travel through
independent channels, and the vhost-user spec currently requires the
backend to listen to both channels at the same time. There's no way to
restore the original order after both operations have been submitted
effectively in parallel from the guest side.

The guest cannot limit itself, the virtio operations it needs to do (on
the control plane) are described in the virtio spec, in "driver
requirements" sections, and most of those steps are inherently
treated/specified as synchronous. The guest already thinks those steps
are synchronous.

So it really is a spec problem. I see the big problem in the switch from
vhost-net to the generic vhost-user protocol. My understanding from the
virtio-networking series in the RH Developer Blog is that vhost-net was
entirely ioctl() based, and consequently totally synchronous. That was
lost when the protocol was rebased to unix domain sockets, without
upholding the explicit request-response pattern in all operations.

I'm worried we can't get this unstuck until Michael answers Stefan's
question, concerning the spec.

Laszlo
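
To make the reordering concrete, here is a toy model of a backend that sees
the two events in either order. It assumes the backend ignores a kick that
arrives while the ring is still disabled; that behavior, and every name
below, is an assumption made for illustration, not rust-vmm code.

```c
#include <stdbool.h>
#include <stddef.h>

/* The two events as the backend perceives them (hypothetical names). */
typedef enum { MODEL_EV_ENABLE, MODEL_EV_KICK } ModelEvent;

typedef struct ModelVring {
    bool enabled;        /* set by the control plane message */
    unsigned processed;  /* kicks handled while the ring was enabled */
    unsigned ignored;    /* kicks that arrived too early */
} ModelVring;

/* Deliver one event to the modeled backend. */
static void model_deliver(ModelVring *v, ModelEvent ev)
{
    switch (ev) {
    case MODEL_EV_ENABLE:
        v->enabled = true;
        break;
    case MODEL_EV_KICK:
        if (v->enabled) {
            v->processed++;
        } else {
            v->ignored++;   /* the request is never served: guest hangs */
        }
        break;
    }
}

/* Replay a sequence of events against a fresh, disabled ring. */
static ModelVring model_run(const ModelEvent *order, size_t n)
{
    ModelVring v = { false, 0, 0 };

    for (size_t i = 0; i < n; i++) {
        model_deliver(&v, order[i]);
    }
    return v;
}
```

With the guest's order (enable, then kick) the request is processed; with
the order swapped, the ring still ends up enabled but the request is lost,
and nothing in the backend can reconstruct the order the guest intended.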

Re: [PATCH 7/7] vhost-user: call VHOST_USER_SET_VRING_ENABLE synchronously

2023-09-05 Thread Laszlo Ersek

Michael,

On 8/30/23 17:37, Stefan Hajnoczi wrote:
> On Wed, 30 Aug 2023 at 09:30, Laszlo Ersek  wrote:
>>
>> On 8/30/23 14:10, Stefan Hajnoczi wrote:
>>> On Sun, 27 Aug 2023 at 14:31, Laszlo Ersek  wrote:
>>>>
>>>> (1) The virtio-1.0 specification
>>>> <http://docs.oasis-open.org/virtio/virtio/v1.0/virtio-v1.0.html> writes:
>>>>
>>>>> 3 General Initialization And Device Operation
>>>>> 3.1   Device Initialization
>>>>> 3.1.1 Driver Requirements: Device Initialization
>>>>>
>>>>> [...]
>>>>>
>>>>> 7. Perform device-specific setup, including discovery of virtqueues for
>>>>>the device, optional per-bus setup, reading and possibly writing the
>>>>>device’s virtio configuration space, and population of virtqueues.
>>>>>
>>>>> 8. Set the DRIVER_OK status bit. At this point the device is “live”.
>>>>
>>>> and
>>>>
>>>>> 4 Virtio Transport Options
>>>>> 4.1   Virtio Over PCI Bus
>>>>> 4.1.4 Virtio Structure PCI Capabilities
>>>>> 4.1.4.3   Common configuration structure layout
>>>>> 4.1.4.3.2 Driver Requirements: Common configuration structure layout
>>>>>
>>>>> [...]
>>>>>
>>>>> The driver MUST configure the other virtqueue fields before enabling the
>>>>> virtqueue with queue_enable.
>>>>>
>>>>> [...]
>>>>
>>>> These together mean that the following sub-sequence of steps is valid for
>>>> a virtio-1.0 guest driver:
>>>>
>>>> (1.1) set "queue_enable" for the needed queues as the final part of device
>>>> initialization step (7),
>>>>
>>>> (1.2) set DRIVER_OK in step (8),
>>>>
>>>> (1.3) immediately start sending virtio requests to the device.
>>>>
>>>> (2) When vhost-user is enabled, and the VHOST_USER_F_PROTOCOL_FEATURES
>>>> special virtio feature is negotiated, then virtio rings start in disabled
>>>> state, according to
>>>> <https://qemu-project.gitlab.io/qemu/interop/vhost-user.html#ring-states>.
>>>> In this case, explicit VHOST_USER_SET_VRING_ENABLE messages are needed for
>>>> enabling vrings.
>>>>
>>>> Therefore setting "queue_enable" from the guest (1.1) is a *control plane*
>>>> operation, which travels from the guest through QEMU to the vhost-user
>>>> backend, using a unix domain socket.
>>>>
>>>> Whereas sending a virtio request (1.3) is a *data plane* operation, which
>>>> evades QEMU -- it travels from guest to the vhost-user backend via
>>>> eventfd.
>>>>
>>>> This means that steps (1.1) and (1.3) travel through different channels,
>>>> and their relative order can be reversed, as perceived by the vhost-user
>>>> backend.
>>>>
>>>> That's exactly what happens when OVMF's virtiofs driver (VirtioFsDxe) runs
>>>> against the Rust-language virtiofsd version 1.7.2. (Which uses version
>>>> 0.10.1 of the vhost-user-backend crate, and version 0.8.1 of the vhost
>>>> crate.)
>>>>
>>>> Namely, when VirtioFsDxe binds a virtiofs device, it goes through the
>>>> device initialization steps (i.e., control plane operations), and
>>>> immediately sends a FUSE_INIT request too (i.e., performs a data plane
>>>> operation). In the Rust-language virtiofsd, this creates a race between
>>>> two components that run *concurrently*, i.e., in different threads or
>>>> processes:
>>>>
>>>> - Control plane, handling vhost-user protocol messages:
>>>>
>>>>   The "VhostUserSlaveReqHandlerMut::set_vring_enable" method
>>>>   [crates/vhost-user-backend/src/handler.rs] handles
>>>>   VHOST_USER_SET_VRING_ENABLE messages, and updates each vring's "enabled"
>>>>   flag according to the message processed.
>>>>
>>>> - Data plane, handling virtio / FUSE requests:
>>>>
>>>>   The "VringEpollHandler::handle_event" method
>>>>   [crates/vhost-user-backend/src/event_loop.rs] handles the incoming
>>>>   virtio / FUSE request, consuming the virtio kick at the same time. If
>>>>   the vring's "enabled" flag is set, the virtio / FUSE request is

Re: [PATCH v2 5/7] vhost-user: hoist "write_sync", "get_features", "get_u64"

2023-08-31 Thread Laszlo Ersek

On 8/30/23 15:40, Laszlo Ersek wrote:
> In order to avoid a forward-declaration for "vhost_user_write_sync" in a
> subsequent patch, hoist "vhost_user_write_sync" ->
> "vhost_user_get_features" -> "vhost_user_get_u64" just above
> "vhost_set_vring".
> 
> This is purely code movement -- no observable change.
> 
> Cc: "Michael S. Tsirkin"  (supporter:vhost)
> Cc: Eugenio Perez Martin 
> Cc: German Maglione 
> Cc: Liu Jiang 
> Cc: Sergio Lopez Pascual 
> Cc: Stefano Garzarella 
> Signed-off-by: Laszlo Ersek 
> Reviewed-by: Stefano Garzarella 
> ---
> 
> Notes:
> v2:
> 
> - pick up R-b from Stefano
> 
> - rename "vhost_user_write_msg" to "vhost_user_write_sync" (in code and
>   commit message) [Stefano]
> 
>  hw/virtio/vhost-user.c | 170 ++--
>  1 file changed, 85 insertions(+), 85 deletions(-)

Phil reviewed v1:

http://mid.mail-archive.com/98150923-39ef-7581-6144-8d0ad8d4dd52@linaro.org

and I would've kept his R-b (similar to Stefano's) across the
vhost_user_write_msg->vhost_user_write_sync rename in v2; so I'm copying
it here:

Reviewed-by: Philippe Mathieu-Daudé 

Hope that's OK.

Laszlo

> 
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 4129ba72e408..c79b6f77cdca 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -1083,6 +1083,91 @@ static int vhost_user_set_vring_endian(struct vhost_dev *dev,
>  return vhost_user_write(dev, &msg, NULL, 0);
>  }
>  
> +static int vhost_user_get_u64(struct vhost_dev *dev, int request, uint64_t 
> *u64)
> +{
> +int ret;
> +VhostUserMsg msg = {
> +.hdr.request = request,
> +.hdr.flags = VHOST_USER_VERSION,
> +};
> +
> +if (vhost_user_per_device_request(request) && dev->vq_index != 0) {
> +return 0;
> +}
> +
> +ret = vhost_user_write(dev, &msg, NULL, 0);
> +if (ret < 0) {
> +return ret;
> +}
> +
> +ret = vhost_user_read(dev, &msg);
> +if (ret < 0) {
> +return ret;
> +}
> +
> +if (msg.hdr.request != request) {
> +error_report("Received unexpected msg type. Expected %d received %d",
> + request, msg.hdr.request);
> +return -EPROTO;
> +}
> +
> +if (msg.hdr.size != sizeof(msg.payload.u64)) {
> +error_report("Received bad msg size.");
> +return -EPROTO;
> +}
> +
> +*u64 = msg.payload.u64;
> +
> +return 0;
> +}
> +
> +static int vhost_user_get_features(struct vhost_dev *dev, uint64_t *features)
> +{
> +if (vhost_user_get_u64(dev, VHOST_USER_GET_FEATURES, features) < 0) {
> +return -EPROTO;
> +}
> +
> +return 0;
> +}
> +
> +/* Note: "msg->hdr.flags" may be modified. */
> +static int vhost_user_write_sync(struct vhost_dev *dev, VhostUserMsg *msg,
> + bool wait_for_reply)
> +{
> +int ret;
> +
> +if (wait_for_reply) {
> +bool reply_supported = virtio_has_feature(dev->protocol_features,
> +  VHOST_USER_PROTOCOL_F_REPLY_ACK);
> +if (reply_supported) {
> +msg->hdr.flags |= VHOST_USER_NEED_REPLY_MASK;
> +}
> +}
> +
> +ret = vhost_user_write(dev, msg, NULL, 0);
> +if (ret < 0) {
> +return ret;
> +}
> +
> +if (wait_for_reply) {
> +uint64_t dummy;
> +
> +if (msg->hdr.flags & VHOST_USER_NEED_REPLY_MASK) {
> +return process_message_reply(dev, msg);
> +}
> +
> +   /*
> +* We need to wait for a reply but the backend does not
> +* support replies for the command we just sent.
> +* Send VHOST_USER_GET_FEATURES which makes all backends
> +* send a reply.
> +*/
> +return vhost_user_get_features(dev, &dummy);
> +}
> +
> +return 0;
> +}
> +
>  static int vhost_set_vring(struct vhost_dev *dev,
> unsigned long int request,
> struct vhost_vring_state *ring)
> @@ -1255,91 +1340,6 @@ static int vhost_user_set_vring_err(struct vhost_dev 
> *dev,
>  return vhost_set_vring_file(dev, VHOST_USER_SET_VRING_ERR, file);
>  }
>  
> -static int vhost_user_get_u64(struct vhost_dev *dev, int request, uint64_t 
> *u64)
> -{
> -int ret;
> -VhostUserMsg msg = {
> -.hdr.request = request,
> -.hdr.flags = VHOST_USER_VERSION,
>

Re: [PATCH v2 2/7] vhost-user: tighten "reply_supported" scope in "set_vring_addr"

2023-08-31 Thread Laszlo Ersek

On 8/30/23 15:40, Laszlo Ersek wrote:
> In the vhost_user_set_vring_addr() function, we calculate
> "reply_supported" unconditionally, even though we'll only need it if
> "wait_for_reply" is also true.
> 
> Restrict the scope of "reply_supported" to the minimum.
> 
> This is purely refactoring -- no observable change.
> 
> Cc: "Michael S. Tsirkin"  (supporter:vhost)
> Cc: Eugenio Perez Martin 
> Cc: German Maglione 
> Cc: Liu Jiang 
> Cc: Sergio Lopez Pascual 
> Cc: Stefano Garzarella 
> Signed-off-by: Laszlo Ersek 
> Reviewed-by: Stefano Garzarella 
> ---
> 
> Notes:
> v2:
> 
> - pick up Stefano's R-b
> 
>  hw/virtio/vhost-user.c | 11 ++-
>  1 file changed, 6 insertions(+), 5 deletions(-)

identical to v1, so:

Reviewed-by: Philippe Mathieu-Daudé 

from
<http://mid.mail-archive.com/6c12069e-da31-9758-4972-7121ab5ffdee@linaro.org>.

Laszlo

> 
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index b4b677c1ce66..64eac317bfb2 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -1331,17 +1331,18 @@ static int vhost_user_set_vring_addr(struct vhost_dev 
> *dev,
>  .hdr.size = sizeof(msg.payload.addr),
>  };
>  
> -bool reply_supported = virtio_has_feature(dev->protocol_features,
> -  VHOST_USER_PROTOCOL_F_REPLY_ACK);
> -
>  /*
>   * wait for a reply if logging is enabled to make sure
>   * backend is actually logging changes
>   */
>  bool wait_for_reply = addr->flags & (1 << VHOST_VRING_F_LOG);
>  
> -if (reply_supported && wait_for_reply) {
> -msg.hdr.flags |= VHOST_USER_NEED_REPLY_MASK;
> +if (wait_for_reply) {
> +bool reply_supported = virtio_has_feature(dev->protocol_features,
> +  VHOST_USER_PROTOCOL_F_REPLY_ACK);
> +if (reply_supported) {
> +msg.hdr.flags |= VHOST_USER_NEED_REPLY_MASK;
> +}
>  }
>  
>  ret = vhost_user_write(dev, &msg, NULL, 0);
>

Re: [PATCH v2 1/7] vhost-user: strip superfluous whitespace

2023-08-31 Thread Laszlo Ersek

On 8/30/23 15:40, Laszlo Ersek wrote:
> Cc: "Michael S. Tsirkin"  (supporter:vhost)
> Cc: Eugenio Perez Martin 
> Cc: German Maglione 
> Cc: Liu Jiang 
> Cc: Sergio Lopez Pascual 
> Cc: Stefano Garzarella 
> Signed-off-by: Laszlo Ersek 
> Reviewed-by: Stefano Garzarella 
> ---
> 
> Notes:
> v2:
> 
> - pick up Stefano's R-b
> 
>  hw/virtio/vhost-user.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

This has been

Reviewed-by: Philippe Mathieu-Daudé 

under the (identical) v1 posting:

http://mid.mail-archive.com/cd0604a1-6826-ac6c-6c47-dcb6def64b28@linaro.org

Thanks, Phil! (and sorry that I posted v2 too quickly -- I forgot that
sometimes reviewers split a review over multiple days.)

Laszlo

> 
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 8dcf049d422b..b4b677c1ce66 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -398,7 +398,7 @@ static int vhost_user_write(struct vhost_dev *dev, 
> VhostUserMsg *msg,
>   * operations such as configuring device memory mappings or issuing 
> device
>   * resets, which affect the whole device instead of individual VQs,
>   * vhost-user messages should only be sent once.
> - * 
> + *
>   * Devices with multiple vhost_devs are given an associated dev->vq_index
>   * so per_device requests are only sent if vq_index is 0.
>   */
>

[PATCH v2 3/7] vhost-user: factor out "vhost_user_write_sync"

2023-08-30 Thread Laszlo Ersek

The tails of the "vhost_user_set_vring_addr" and "vhost_user_set_u64"
functions are now byte-for-byte identical. Factor the common tail out to a
new function called "vhost_user_write_sync".

This is purely refactoring -- no observable change.

Cc: "Michael S. Tsirkin"  (supporter:vhost)
Cc: Eugenio Perez Martin 
Cc: German Maglione 
Cc: Liu Jiang 
Cc: Sergio Lopez Pascual 
Cc: Stefano Garzarella 
Signed-off-by: Laszlo Ersek 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Stefano Garzarella 
---

Notes:
v2:

- pick up R-b's from Phil and Stefano

- rename "vhost_user_write_msg" to "vhost_user_write_sync" (in code and
  commit message) [Stefano]

 hw/virtio/vhost-user.c | 66 +---
 1 file changed, 28 insertions(+), 38 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 64eac317bfb2..1476b36f0a6e 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1320,10 +1320,35 @@ static int enforce_reply(struct vhost_dev *dev,
 return vhost_user_get_features(dev, &dummy);
 }
 
+/* Note: "msg->hdr.flags" may be modified. */
+static int vhost_user_write_sync(struct vhost_dev *dev, VhostUserMsg *msg,
+ bool wait_for_reply)
+{
+int ret;
+
+if (wait_for_reply) {
+bool reply_supported = virtio_has_feature(dev->protocol_features,
+  VHOST_USER_PROTOCOL_F_REPLY_ACK);
+if (reply_supported) {
+msg->hdr.flags |= VHOST_USER_NEED_REPLY_MASK;
+}
+}
+
+ret = vhost_user_write(dev, msg, NULL, 0);
+if (ret < 0) {
+return ret;
+}
+
+if (wait_for_reply) {
+return enforce_reply(dev, msg);
+}
+
+return 0;
+}
+
 static int vhost_user_set_vring_addr(struct vhost_dev *dev,
  struct vhost_vring_addr *addr)
 {
-int ret;
 VhostUserMsg msg = {
 .hdr.request = VHOST_USER_SET_VRING_ADDR,
 .hdr.flags = VHOST_USER_VERSION,
@@ -1337,24 +1362,7 @@ static int vhost_user_set_vring_addr(struct vhost_dev 
*dev,
  */
 bool wait_for_reply = addr->flags & (1 << VHOST_VRING_F_LOG);
 
-if (wait_for_reply) {
-bool reply_supported = virtio_has_feature(dev->protocol_features,
-  VHOST_USER_PROTOCOL_F_REPLY_ACK);
-if (reply_supported) {
-msg.hdr.flags |= VHOST_USER_NEED_REPLY_MASK;
-}
-}
-
-ret = vhost_user_write(dev, &msg, NULL, 0);
-if (ret < 0) {
-return ret;
-}
-
-if (wait_for_reply) {
-return enforce_reply(dev, &msg);
-}
-
-return 0;
+return vhost_user_write_sync(dev, &msg, wait_for_reply);
 }
 
 static int vhost_user_set_u64(struct vhost_dev *dev, int request, uint64_t u64,
@@ -1366,26 +1374,8 @@ static int vhost_user_set_u64(struct vhost_dev *dev, int 
request, uint64_t u64,
 .payload.u64 = u64,
 .hdr.size = sizeof(msg.payload.u64),
 };
-int ret;
 
-if (wait_for_reply) {
-bool reply_supported = virtio_has_feature(dev->protocol_features,
-  VHOST_USER_PROTOCOL_F_REPLY_ACK);
-if (reply_supported) {
-msg.hdr.flags |= VHOST_USER_NEED_REPLY_MASK;
-}
-}
-
-ret = vhost_user_write(dev, &msg, NULL, 0);
-if (ret < 0) {
-return ret;
-}
-
-if (wait_for_reply) {
-return enforce_reply(dev, &msg);
-}
-
-return 0;
+return vhost_user_write_sync(dev, &msg, wait_for_reply);
 }
 
 static int vhost_user_set_status(struct vhost_dev *dev, uint8_t status)

[PATCH v2 5/7] vhost-user: hoist "write_sync", "get_features", "get_u64"

2023-08-30 Thread Laszlo Ersek

In order to avoid a forward-declaration for "vhost_user_write_sync" in a
subsequent patch, hoist "vhost_user_write_sync" ->
"vhost_user_get_features" -> "vhost_user_get_u64" just above
"vhost_set_vring".

This is purely code movement -- no observable change.

Cc: "Michael S. Tsirkin"  (supporter:vhost)
Cc: Eugenio Perez Martin 
Cc: German Maglione 
Cc: Liu Jiang 
Cc: Sergio Lopez Pascual 
Cc: Stefano Garzarella 
Signed-off-by: Laszlo Ersek 
Reviewed-by: Stefano Garzarella 
---

Notes:
v2:

- pick up R-b from Stefano

- rename "vhost_user_write_msg" to "vhost_user_write_sync" (in code and
  commit message) [Stefano]

 hw/virtio/vhost-user.c | 170 ++--
 1 file changed, 85 insertions(+), 85 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 4129ba72e408..c79b6f77cdca 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1083,6 +1083,91 @@ static int vhost_user_set_vring_endian(struct vhost_dev 
*dev,
 return vhost_user_write(dev, &msg, NULL, 0);
 }
 
+static int vhost_user_get_u64(struct vhost_dev *dev, int request, uint64_t 
*u64)
+{
+int ret;
+VhostUserMsg msg = {
+.hdr.request = request,
+.hdr.flags = VHOST_USER_VERSION,
+};
+
+if (vhost_user_per_device_request(request) && dev->vq_index != 0) {
+return 0;
+}
+
+ret = vhost_user_write(dev, &msg, NULL, 0);
+if (ret < 0) {
+return ret;
+}
+
+ret = vhost_user_read(dev, &msg);
+if (ret < 0) {
+return ret;
+}
+
+if (msg.hdr.request != request) {
+error_report("Received unexpected msg type. Expected %d received %d",
+ request, msg.hdr.request);
+return -EPROTO;
+}
+
+if (msg.hdr.size != sizeof(msg.payload.u64)) {
+error_report("Received bad msg size.");
+return -EPROTO;
+}
+
+*u64 = msg.payload.u64;
+
+return 0;
+}
+
+static int vhost_user_get_features(struct vhost_dev *dev, uint64_t *features)
+{
+if (vhost_user_get_u64(dev, VHOST_USER_GET_FEATURES, features) < 0) {
+return -EPROTO;
+}
+
+return 0;
+}
+
+/* Note: "msg->hdr.flags" may be modified. */
+static int vhost_user_write_sync(struct vhost_dev *dev, VhostUserMsg *msg,
+ bool wait_for_reply)
+{
+int ret;
+
+if (wait_for_reply) {
+bool reply_supported = virtio_has_feature(dev->protocol_features,
+  VHOST_USER_PROTOCOL_F_REPLY_ACK);
+if (reply_supported) {
+msg->hdr.flags |= VHOST_USER_NEED_REPLY_MASK;
+}
+}
+
+ret = vhost_user_write(dev, msg, NULL, 0);
+if (ret < 0) {
+return ret;
+}
+
+if (wait_for_reply) {
+uint64_t dummy;
+
+if (msg->hdr.flags & VHOST_USER_NEED_REPLY_MASK) {
+return process_message_reply(dev, msg);
+}
+
+   /*
+* We need to wait for a reply but the backend does not
+* support replies for the command we just sent.
+* Send VHOST_USER_GET_FEATURES which makes all backends
+* send a reply.
+*/
+return vhost_user_get_features(dev, &dummy);
+}
+
+return 0;
+}
+
 static int vhost_set_vring(struct vhost_dev *dev,
unsigned long int request,
struct vhost_vring_state *ring)
@@ -1255,91 +1340,6 @@ static int vhost_user_set_vring_err(struct vhost_dev 
*dev,
 return vhost_set_vring_file(dev, VHOST_USER_SET_VRING_ERR, file);
 }
 
-static int vhost_user_get_u64(struct vhost_dev *dev, int request, uint64_t 
*u64)
-{
-int ret;
-VhostUserMsg msg = {
-.hdr.request = request,
-.hdr.flags = VHOST_USER_VERSION,
-};
-
-if (vhost_user_per_device_request(request) && dev->vq_index != 0) {
-return 0;
-}
-
-ret = vhost_user_write(dev, &msg, NULL, 0);
-if (ret < 0) {
-return ret;
-}
-
-ret = vhost_user_read(dev, &msg);
-if (ret < 0) {
-return ret;
-}
-
-if (msg.hdr.request != request) {
-error_report("Received unexpected msg type. Expected %d received %d",
- request, msg.hdr.request);
-return -EPROTO;
-}
-
-if (msg.hdr.size != sizeof(msg.payload.u64)) {
-error_report("Received bad msg size.");
-return -EPROTO;
-}
-
-*u64 = msg.payload.u64;
-
-return 0;
-}
-
-static int vhost_user_get_features(struct vhost_dev *dev, uint64_t *features)
-{
-if (vhost_user_get_u64(dev, VHOST_USER_GET_FEATURES, features) < 0) {
-return -EPROTO;
-}
-
-return 0;
-}
-
-/* Note: "msg->hdr.flags" may be modified. */
-static int vhost_user_write_sync(struct v

[PATCH v2 7/7] vhost-user: call VHOST_USER_SET_VRING_ENABLE synchronously

2023-08-30 Thread Laszlo Ersek

USER_SET_VRING_ENABLE completion by:

- setting the NEED_REPLY flag on VHOST_USER_SET_VRING_ENABLE, and waiting
  for the reply, if the VHOST_USER_PROTOCOL_F_REPLY_ACK vhost-user feature
  has been negotiated, or

- performing a separate VHOST_USER_GET_FEATURES *exchange*, which requires
  a backend response regardless of VHOST_USER_PROTOCOL_F_REPLY_ACK.

Cc: "Michael S. Tsirkin"  (supporter:vhost)
Cc: Eugenio Perez Martin 
Cc: German Maglione 
Cc: Liu Jiang 
Cc: Sergio Lopez Pascual 
Cc: Stefano Garzarella 
Signed-off-by: Laszlo Ersek 
Reviewed-by: Stefano Garzarella 
---

Notes:
v2:

- pick up R-b from Stefano

- update virtio spec reference from 1.0 to 1.2 (also keep the 1.0 ref)
  in the commit message; re-check the quotes / section headers [Stefano]

- summarize commit message in code comment [Stefano]

 hw/virtio/vhost-user.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 18e15a9bb359..41842eb023b5 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1235,7 +1235,21 @@ static int vhost_user_set_vring_enable(struct vhost_dev 
*dev, int enable)
 .num   = enable,
 };
 
-ret = vhost_set_vring(dev, VHOST_USER_SET_VRING_ENABLE, &state, false);
+/*
+ * SET_VRING_ENABLE travels from guest to QEMU to vhost-user backend /
+ * control plane thread via unix domain socket. Virtio requests travel
+ * from guest to vhost-user backend / data plane thread via eventfd.
+ * Even if the guest enables the ring first, and pushes its first 
virtio
+ * request second (conforming to the virtio spec), the data plane 
thread
+ * in the backend may see the virtio request before the control plane
+ * thread sees the queue enablement. This causes (in fact, requires) 
the
+ * data plane thread to discard the virtio request (it arrived on a
+ * seemingly disabled queue). To prevent this out-of-order delivery,
+ * don't let the guest proceed to pushing the virtio request until the
+ * backend control plane acknowledges enabling the queue -- IOW, pass
+ * wait_for_reply=true below.
+ */
+ret = vhost_set_vring(dev, VHOST_USER_SET_VRING_ENABLE, &state, true);
 if (ret < 0) {
 /*
  * Restoring the previous state is likely infeasible, as well as
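
The ordering problem described in the code comment above can be illustrated with a small deterministic model. This is a sketch only: the "sketch_*" names are hypothetical, there are no real threads, sockets or eventfds, and the "control plane thread" is reduced to a function that the caller runs at a chosen moment. It assumes, as the commit message states, that a kick arriving on a not-yet-enabled ring is discarded.

```c
#include <stdbool.h>

/* Hypothetical backend model with separate control and data plane state. */
struct sketch_backend {
    bool ring_enabled;    /* what the data plane currently observes */
    bool enable_pending;  /* enable message sent, not yet processed */
    unsigned processed;   /* virtio requests handled genuinely */
    unsigned discarded;   /* virtio requests dropped on a disabled ring */
};

/* Control plane: process a pending enable message, if there is one. */
static void sketch_control_plane_run(struct sketch_backend *be)
{
    if (be->enable_pending) {
        be->ring_enabled = true;
        be->enable_pending = false;
    }
}

/*
 * Front-end: send the enable message. With sync=true the call returns
 * only after the backend has applied it (models wait_for_reply=true).
 */
static void sketch_send_enable(struct sketch_backend *be, bool sync)
{
    be->enable_pending = true;
    if (sync) {
        sketch_control_plane_run(be);
    }
}

/* Data plane: a guest kick arrives through the other channel. */
static void sketch_kick(struct sketch_backend *be)
{
    if (be->ring_enabled) {
        be->processed++;
    } else {
        be->discarded++;
    }
}
```

Calling sketch_send_enable(be, false), then sketch_kick(be), then sketch_control_plane_run(be) models the data plane winning the race: the request is discarded. With sketch_send_enable(be, true) the guest cannot reach the kick before the ring is enabled, so the same request is processed.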

[PATCH v2 6/7] vhost-user: allow "vhost_set_vring" to wait for a reply

2023-08-30 Thread Laszlo Ersek

The "vhost_set_vring" function already centralizes the common parts of
"vhost_user_set_vring_num", "vhost_user_set_vring_base" and
"vhost_user_set_vring_enable". We'll want to allow some of those callers
to wait for a reply.

Therefore, rebase "vhost_set_vring" from just "vhost_user_write" to
"vhost_user_write_sync", exposing the "wait_for_reply" parameter.

This is purely refactoring -- there is no observable change. That's
because:

- all three callers pass in "false" for "wait_for_reply", which disables
  all logic in "vhost_user_write_sync" except the call to
  "vhost_user_write";

- the fds=NULL and fd_num=0 arguments of the original "vhost_user_write"
  call inside "vhost_set_vring" are hard-coded within
  "vhost_user_write_sync".

Cc: "Michael S. Tsirkin"  (supporter:vhost)
Cc: Eugenio Perez Martin 
Cc: German Maglione 
Cc: Liu Jiang 
Cc: Sergio Lopez Pascual 
Cc: Stefano Garzarella 
Signed-off-by: Laszlo Ersek 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Stefano Garzarella 
---

Notes:
v2:

- pick up R-b's from Phil and Stefano

- rename "vhost_user_write_msg" to "vhost_user_write_sync" (in code and
  commit message) [Stefano]

 hw/virtio/vhost-user.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index c79b6f77cdca..18e15a9bb359 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1170,7 +1170,8 @@ static int vhost_user_write_sync(struct vhost_dev *dev, 
VhostUserMsg *msg,
 
 static int vhost_set_vring(struct vhost_dev *dev,
unsigned long int request,
-   struct vhost_vring_state *ring)
+   struct vhost_vring_state *ring,
+   bool wait_for_reply)
 {
 VhostUserMsg msg = {
 .hdr.request = request,
@@ -1179,13 +1180,13 @@ static int vhost_set_vring(struct vhost_dev *dev,
 .hdr.size = sizeof(msg.payload.state),
 };
 
-return vhost_user_write(dev, &msg, NULL, 0);
+return vhost_user_write_sync(dev, &msg, wait_for_reply);
 }
 
 static int vhost_user_set_vring_num(struct vhost_dev *dev,
 struct vhost_vring_state *ring)
 {
-return vhost_set_vring(dev, VHOST_USER_SET_VRING_NUM, ring);
+return vhost_set_vring(dev, VHOST_USER_SET_VRING_NUM, ring, false);
 }
 
 static void vhost_user_host_notifier_free(VhostUserHostNotifier *n)
@@ -1216,7 +1217,7 @@ static void 
vhost_user_host_notifier_remove(VhostUserHostNotifier *n,
 static int vhost_user_set_vring_base(struct vhost_dev *dev,
  struct vhost_vring_state *ring)
 {
-return vhost_set_vring(dev, VHOST_USER_SET_VRING_BASE, ring);
+return vhost_set_vring(dev, VHOST_USER_SET_VRING_BASE, ring, false);
 }
 
 static int vhost_user_set_vring_enable(struct vhost_dev *dev, int enable)
@@ -1234,7 +1235,7 @@ static int vhost_user_set_vring_enable(struct vhost_dev 
*dev, int enable)
 .num   = enable,
 };
 
-ret = vhost_set_vring(dev, VHOST_USER_SET_VRING_ENABLE, &state);
+ret = vhost_set_vring(dev, VHOST_USER_SET_VRING_ENABLE, &state, false);
 if (ret < 0) {
 /*
  * Restoring the previous state is likely infeasible, as well as

[PATCH v2 1/7] vhost-user: strip superfluous whitespace

2023-08-30 Thread Laszlo Ersek

Cc: "Michael S. Tsirkin"  (supporter:vhost)
Cc: Eugenio Perez Martin 
Cc: German Maglione 
Cc: Liu Jiang 
Cc: Sergio Lopez Pascual 
Cc: Stefano Garzarella 
Signed-off-by: Laszlo Ersek 
Reviewed-by: Stefano Garzarella 
---

Notes:
v2:

- pick up Stefano's R-b

 hw/virtio/vhost-user.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 8dcf049d422b..b4b677c1ce66 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -398,7 +398,7 @@ static int vhost_user_write(struct vhost_dev *dev, 
VhostUserMsg *msg,
  * operations such as configuring device memory mappings or issuing device
  * resets, which affect the whole device instead of individual VQs,
  * vhost-user messages should only be sent once.
- * 
+ *
  * Devices with multiple vhost_devs are given an associated dev->vq_index
  * so per_device requests are only sent if vq_index is 0.
  */

[PATCH v2 0/7] vhost-user: call VHOST_USER_SET_VRING_ENABLE synchronously

2023-08-30 Thread Laszlo Ersek

v1:

- http://mid.mail-archive.com/20230827182937.146450-1-lersek@redhat.com
- 
https://patchwork.ozlabs.org/project/qemu-devel/cover/20230827182937.146450-1-ler...@redhat.com/

v2 picks up tags from Phil and Stefano, and addresses feedback from
Stefano. Please see the Notes section on each patch, for the v2 changes.

I've not CC'd the stable list, as we've not figured out what prior
releases to target. Applying the series to 8.1 is easy; to 8.0 -- not so
much.

Retested.

Cc: "Michael S. Tsirkin"  (supporter:vhost)
Cc: Eugenio Perez Martin 
Cc: German Maglione 
Cc: Liu Jiang 
Cc: Sergio Lopez Pascual 
Cc: Stefano Garzarella 

Thanks,
Laszlo

Laszlo Ersek (7):
  vhost-user: strip superfluous whitespace
  vhost-user: tighten "reply_supported" scope in "set_vring_addr"
  vhost-user: factor out "vhost_user_write_sync"
  vhost-user: flatten "enforce_reply" into "vhost_user_write_sync"
  vhost-user: hoist "write_sync", "get_features", "get_u64"
  vhost-user: allow "vhost_set_vring" to wait for a reply
  vhost-user: call VHOST_USER_SET_VRING_ENABLE synchronously

 hw/virtio/vhost-user.c | 216 ++--
 1 file changed, 108 insertions(+), 108 deletions(-)


base-commit: 813bac3d8d70d85cb7835f7945eb9eed84c2d8d0

[PATCH v2 2/7] vhost-user: tighten "reply_supported" scope in "set_vring_addr"

2023-08-30 Thread Laszlo Ersek

In the vhost_user_set_vring_addr() function, we calculate
"reply_supported" unconditionally, even though we'll only need it if
"wait_for_reply" is also true.

Restrict the scope of "reply_supported" to the minimum.

This is purely refactoring -- no observable change.

Cc: "Michael S. Tsirkin"  (supporter:vhost)
Cc: Eugenio Perez Martin 
Cc: German Maglione 
Cc: Liu Jiang 
Cc: Sergio Lopez Pascual 
Cc: Stefano Garzarella 
Signed-off-by: Laszlo Ersek 
Reviewed-by: Stefano Garzarella 
---

Notes:
v2:

- pick up Stefano's R-b

 hw/virtio/vhost-user.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index b4b677c1ce66..64eac317bfb2 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1331,17 +1331,18 @@ static int vhost_user_set_vring_addr(struct vhost_dev 
*dev,
 .hdr.size = sizeof(msg.payload.addr),
 };
 
-bool reply_supported = virtio_has_feature(dev->protocol_features,
-  VHOST_USER_PROTOCOL_F_REPLY_ACK);
-
 /*
  * wait for a reply if logging is enabled to make sure
  * backend is actually logging changes
  */
 bool wait_for_reply = addr->flags & (1 << VHOST_VRING_F_LOG);
 
-if (reply_supported && wait_for_reply) {
-msg.hdr.flags |= VHOST_USER_NEED_REPLY_MASK;
+if (wait_for_reply) {
+bool reply_supported = virtio_has_feature(dev->protocol_features,
+  VHOST_USER_PROTOCOL_F_REPLY_ACK);
+if (reply_supported) {
+msg.hdr.flags |= VHOST_USER_NEED_REPLY_MASK;
+}
 }
 
 ret = vhost_user_write(dev, &msg, NULL, 0);

[PATCH v2 4/7] vhost-user: flatten "enforce_reply" into "vhost_user_write_sync"

2023-08-30 Thread Laszlo Ersek

At this point, only "vhost_user_write_sync" calls "enforce_reply"; embed
the latter into the former.

This is purely refactoring -- no observable change.

Cc: "Michael S. Tsirkin"  (supporter:vhost)
Cc: Eugenio Perez Martin 
Cc: German Maglione 
Cc: Liu Jiang 
Cc: Sergio Lopez Pascual 
Cc: Stefano Garzarella 
Signed-off-by: Laszlo Ersek 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Stefano Garzarella 
---

Notes:
v2:

- pick up R-b's from Phil and Stefano

- rename "vhost_user_write_msg" to "vhost_user_write_sync" (in code
  context and commit message) [Stefano]

 hw/virtio/vhost-user.c | 32 
 1 file changed, 13 insertions(+), 19 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 1476b36f0a6e..4129ba72e408 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1302,24 +1302,6 @@ static int vhost_user_get_features(struct vhost_dev 
*dev, uint64_t *features)
 return 0;
 }
 
-static int enforce_reply(struct vhost_dev *dev,
- const VhostUserMsg *msg)
-{
-uint64_t dummy;
-
-if (msg->hdr.flags & VHOST_USER_NEED_REPLY_MASK) {
-return process_message_reply(dev, msg);
-}
-
-   /*
-* We need to wait for a reply but the backend does not
-* support replies for the command we just sent.
-* Send VHOST_USER_GET_FEATURES which makes all backends
-* send a reply.
-*/
-return vhost_user_get_features(dev, &dummy);
-}
-
 /* Note: "msg->hdr.flags" may be modified. */
 static int vhost_user_write_sync(struct vhost_dev *dev, VhostUserMsg *msg,
  bool wait_for_reply)
@@ -1340,7 +1322,19 @@ static int vhost_user_write_sync(struct vhost_dev *dev, 
VhostUserMsg *msg,
 }
 
 if (wait_for_reply) {
-return enforce_reply(dev, msg);
+uint64_t dummy;
+
+if (msg->hdr.flags & VHOST_USER_NEED_REPLY_MASK) {
+return process_message_reply(dev, msg);
+}
+
+   /*
+* We need to wait for a reply but the backend does not
+* support replies for the command we just sent.
+* Send VHOST_USER_GET_FEATURES which makes all backends
+* send a reply.
+*/
+return vhost_user_get_features(dev, &dummy);
 }
 
 return 0;

Re: [PATCH 7/7] vhost-user: call VHOST_USER_SET_VRING_ENABLE synchronously

2023-08-30 Thread Laszlo Ersek

On 8/30/23 14:10, Stefan Hajnoczi wrote:
> On Sun, 27 Aug 2023 at 14:31, Laszlo Ersek  wrote:
>>
>> (1) The virtio-1.0 specification
>> <http://docs.oasis-open.org/virtio/virtio/v1.0/virtio-v1.0.html> writes:
>>
>>> 3 General Initialization And Device Operation
>>> 3.1   Device Initialization
>>> 3.1.1 Driver Requirements: Device Initialization
>>>
>>> [...]
>>>
>>> 7. Perform device-specific setup, including discovery of virtqueues for
>>>the device, optional per-bus setup, reading and possibly writing the
>>>device’s virtio configuration space, and population of virtqueues.
>>>
>>> 8. Set the DRIVER_OK status bit. At this point the device is “live”.
>>
>> and
>>
>>> 4 Virtio Transport Options
>>> 4.1   Virtio Over PCI Bus
>>> 4.1.4 Virtio Structure PCI Capabilities
>>> 4.1.4.3   Common configuration structure layout
>>> 4.1.4.3.2 Driver Requirements: Common configuration structure layout
>>>
>>> [...]
>>>
>>> The driver MUST configure the other virtqueue fields before enabling the
>>> virtqueue with queue_enable.
>>>
>>> [...]
>>
>> These together mean that the following sub-sequence of steps is valid for
>> a virtio-1.0 guest driver:
>>
>> (1.1) set "queue_enable" for the needed queues as the final part of device
>> initialization step (7),
>>
>> (1.2) set DRIVER_OK in step (8),
>>
>> (1.3) immediately start sending virtio requests to the device.
>>
>> (2) When vhost-user is enabled, and the VHOST_USER_F_PROTOCOL_FEATURES
>> special virtio feature is negotiated, then virtio rings start in disabled
>> state, according to
>> <https://qemu-project.gitlab.io/qemu/interop/vhost-user.html#ring-states>.
>> In this case, explicit VHOST_USER_SET_VRING_ENABLE messages are needed for
>> enabling vrings.
>>
>> Therefore setting "queue_enable" from the guest (1.1) is a *control plane*
>> operation, which travels from the guest through QEMU to the vhost-user
>> backend, using a unix domain socket.
>>
>> Whereas sending a virtio request (1.3) is a *data plane* operation, which
>> evades QEMU -- it travels from guest to the vhost-user backend via
>> eventfd.
>>
>> This means that steps (1.1) and (1.3) travel through different channels,
>> and their relative order can be reversed, as perceived by the vhost-user
>> backend.
>>
>> That's exactly what happens when OVMF's virtiofs driver (VirtioFsDxe) runs
>> against the Rust-language virtiofsd version 1.7.2. (Which uses version
>> 0.10.1 of the vhost-user-backend crate, and version 0.8.1 of the vhost
>> crate.)
>>
>> Namely, when VirtioFsDxe binds a virtiofs device, it goes through the
>> device initialization steps (i.e., control plane operations), and
>> immediately sends a FUSE_INIT request too (i.e., performs a data plane
>> operation). In the Rust-language virtiofsd, this creates a race between
>> two components that run *concurrently*, i.e., in different threads or
>> processes:
>>
>> - Control plane, handling vhost-user protocol messages:
>>
>>   The "VhostUserSlaveReqHandlerMut::set_vring_enable" method
>>   [crates/vhost-user-backend/src/handler.rs] handles
>>   VHOST_USER_SET_VRING_ENABLE messages, and updates each vring's "enabled"
>>   flag according to the message processed.
>>
>> - Data plane, handling virtio / FUSE requests:
>>
>>   The "VringEpollHandler::handle_event" method
>>   [crates/vhost-user-backend/src/event_loop.rs] handles the incoming
>>   virtio / FUSE request, consuming the virtio kick at the same time. If
>>   the vring's "enabled" flag is set, the virtio / FUSE request is
>>   processed genuinely. If the vring's "enabled" flag is clear, then the
>>   virtio / FUSE request is discarded.
> 
> Why is virtiofsd monitoring the virtqueue and discarding requests
> while it's disabled?

That's what the vhost-user spec requires:

https://qemu-project.gitlab.io/qemu/interop/vhost-user.html#ring-states

"""
started but disabled: the back-end must process the ring without causing
any side effects. For example, for a networking device, in the disabled
state the back-end must not supply any new RX packets, but must process
and discard any TX packets.
"""

This state is different from "stopped", where "the back-end must not
process the ring at all".
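
To make the distinction between the three ring states concrete, here is a minimal sketch of how a back-end could dispatch on them. The enum and function names are hypothetical (they come neither from the vhost-user spec nor from virtiofsd); only the behavior per state follows the spec text quoted above.

```c
/* Hypothetical names for the three ring states described in the spec. */
enum sketch_ring_state {
    SKETCH_RING_STOPPED,   /* back-end must not process the ring at all */
    SKETCH_RING_DISABLED,  /* started but disabled: no side effects */
    SKETCH_RING_ENABLED    /* started and enabled: normal operation */
};

/* What the back-end does with a request found on the ring. */
enum sketch_action {
    SKETCH_IGNORE,   /* leave the ring untouched */
    SKETCH_DISCARD,  /* consume the request, produce no side effects */
    SKETCH_PROCESS   /* handle the request genuinely */
};

static enum sketch_action sketch_on_request(enum sketch_ring_state state)
{
    switch (state) {
    case SKETCH_RING_STOPPED:
        return SKETCH_IGNORE;
    case SKETCH_RING_DISABLED:
        return SKETCH_DISCARD;
    case SKETCH_RING_ENABLED:
        return SKETCH_PROCESS;
    }
    return SKETCH_IGNORE; /* unreachable for valid states */
}
```

In these terms, the FUSE_INIT request is lost when it reaches a ring that is still in the "disabled" state: it is consumed and dropped rather than left in place for later.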

The spec also says,

"""
If VHOST_USER_F_

Re: [PATCH 3/7] vhost-user: factor out "vhost_user_write_msg"

2023-08-30 Thread Laszlo Ersek

On 8/30/23 11:14, Laszlo Ersek wrote:
> On 8/30/23 10:31, Stefano Garzarella wrote:
>> On Sun, Aug 27, 2023 at 08:29:33PM +0200, Laszlo Ersek wrote:
>>> The tails of the "vhost_user_set_vring_addr" and "vhost_user_set_u64"
>>> functions are now byte-for-byte identical. Factor the common tail out
>>> to a
>>> new function called "vhost_user_write_msg".
>>>
>>> This is purely refactoring -- no observable change.
>>>
>>> Cc: "Michael S. Tsirkin"  (supporter:vhost)
>>> Cc: Eugenio Perez Martin 
>>> Cc: German Maglione 
>>> Cc: Liu Jiang 
>>> Cc: Sergio Lopez Pascual 
>>> Cc: Stefano Garzarella 
>>> Signed-off-by: Laszlo Ersek 
>>> ---
>>> hw/virtio/vhost-user.c | 66 +---
>>> 1 file changed, 28 insertions(+), 38 deletions(-)
>>>
>>> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
>>> index 64eac317bfb2..36f99b66a644 100644
>>> --- a/hw/virtio/vhost-user.c
>>> +++ b/hw/virtio/vhost-user.c
>>> @@ -1320,10 +1320,35 @@ static int enforce_reply(struct vhost_dev *dev,
>>>     return vhost_user_get_features(dev, &dummy);
>>> }
>>>
>>> +/* Note: "msg->hdr.flags" may be modified. */
>>> +static int vhost_user_write_msg(struct vhost_dev *dev, VhostUserMsg
>>> *msg,
>>> +    bool wait_for_reply)
>>
>> The difference between vhost_user_write() and vhost_user_write_msg() is
>> not immediately obvious from the function name, so I would propose
>> something different, like vhost_user_write_sync() or
>> vhost_user_write_wait().
> 
> I'm mostly OK with either variant; I think I may have thought of _sync
> myself, but didn't like it because the wait would be *optional*,
> dependent on caller choice. And I didn't like
> vhost_user_write_maybe_wait() either; that one seemed awkward / too verbose.
> 
> Let's see what others prefer. :)

... I went with vhost_user_write_sync.

> 
>>
>> Anyway, I'm not good with names and don't have a strong opinion, so this
>> version is fine with me as well :-)
>>
>> Reviewed-by: Stefano Garzarella 
>>
> 
> Thanks!

Re: [PATCH 0/7] vhost-user: call VHOST_USER_SET_VRING_ENABLE synchronously

2023-08-30 Thread Laszlo Ersek

On 8/30/23 10:48, Stefano Garzarella wrote:
> On Sun, Aug 27, 2023 at 08:29:30PM +0200, Laszlo Ersek wrote:
>> The last patch in the series states and fixes the problem; prior patches
>> only refactor the code.
> 
> Thanks for the fix and great cleanup!
> 
> I fully reviewed the series and LGTM.
> 
> An additional step that we can take (not in this series) crossed my
> mind, though. In some places we repeat the following pattern:
> 
>     vhost_user_write(dev, &msg, NULL, 0);
>     ...
> 
>     if (reply_supported) {
>     return process_message_reply(dev, &msg);
>     }
> 
> So what about extending the vhost_user_write_msg() added in this series
> to also support these cases and remove some code?
> Or maybe integrate vhost_user_write_msg() in vhost_user_write().

Good idea, I'd just like someone else to do it -- and as you say, after
this series :)

This series is relatively packed with "thought" already (in the last
patch), plus a week ago I knew absolutely nothing about vhost /
vhost-user. (And, I read the whole blog series at
<https://www.redhat.com/en/virtio-networking-series> in 1-2 days, while
analyzing this issue, to understand the design of vhost.)

So I'd prefer keeping my first contribution in this area limited -- what
you are suggesting touches on some of the requests that require genuine
responses, and I didn't want to fiddle with those.

(I think your patch should be fine BTW!)

Laszlo

Re: [PATCH 7/7] vhost-user: call VHOST_USER_SET_VRING_ENABLE synchronously

2023-08-30 Thread Laszlo Ersek

On 8/30/23 10:39, Stefano Garzarella wrote:
> On Sun, Aug 27, 2023 at 08:29:37PM +0200, Laszlo Ersek wrote:
>> (1) The virtio-1.0 specification
>> <http://docs.oasis-open.org/virtio/virtio/v1.0/virtio-v1.0.html> writes:
> 
> What about referring the latest spec available now (1.2)?

I didn't want to do that because the OVMF guest driver was written
against 1.0 (and the spec and the device are backwards compatible).

But, I don't feel strongly about this; I'm OK updating the reference /
quote to 1.2.

> 
>>
>>> 3 General Initialization And Device Operation
>>> 3.1   Device Initialization
>>> 3.1.1 Driver Requirements: Device Initialization
>>>
>>> [...]
>>>
>>> 7. Perform device-specific setup, including discovery of virtqueues for
>>>    the device, optional per-bus setup, reading and possibly writing the
>>>    device’s virtio configuration space, and population of virtqueues.
>>>
>>> 8. Set the DRIVER_OK status bit. At this point the device is “live”.
>>
>> and
>>
>>> 4 Virtio Transport Options
>>> 4.1   Virtio Over PCI Bus
>>> 4.1.4 Virtio Structure PCI Capabilities
>>> 4.1.4.3   Common configuration structure layout
>>> 4.1.4.3.2 Driver Requirements: Common configuration structure layout
>>>
>>> [...]
>>>
>>> The driver MUST configure the other virtqueue fields before enabling the
>>> virtqueue with queue_enable.
>>>
>>> [...]
>>
>> These together mean that the following sub-sequence of steps is valid for
>> a virtio-1.0 guest driver:
>>
>> (1.1) set "queue_enable" for the needed queues as the final part of
>> device
>> initialization step (7),
>>
>> (1.2) set DRIVER_OK in step (8),
>>
>> (1.3) immediately start sending virtio requests to the device.
>>
>> (2) When vhost-user is enabled, and the VHOST_USER_F_PROTOCOL_FEATURES
>> special virtio feature is negotiated, then virtio rings start in disabled
>> state, according to
>> <https://qemu-project.gitlab.io/qemu/interop/vhost-user.html#ring-states>.
>> In this case, explicit VHOST_USER_SET_VRING_ENABLE messages are needed
>> for
>> enabling vrings.
>>
>> Therefore setting "queue_enable" from the guest (1.1) is a *control
>> plane*
>> operation, which travels from the guest through QEMU to the vhost-user
>> backend, using a unix domain socket.
>>
>> Whereas sending a virtio request (1.3) is a *data plane* operation, which
>> evades QEMU -- it travels from guest to the vhost-user backend via
>> eventfd.
>>
>> This means that steps (1.1) and (1.3) travel through different channels,
>> and their relative order can be reversed, as perceived by the vhost-user
>> backend.
>>
>> That's exactly what happens when OVMF's virtiofs driver (VirtioFsDxe)
>> runs
>> against the Rust-language virtiofsd version 1.7.2. (Which uses version
>> 0.10.1 of the vhost-user-backend crate, and version 0.8.1 of the vhost
>> crate.)
>>
>> Namely, when VirtioFsDxe binds a virtiofs device, it goes through the
>> device initialization steps (i.e., control plane operations), and
>> immediately sends a FUSE_INIT request too (i.e., performs a data plane
>> operation). In the Rust-language virtiofsd, this creates a race between
>> two components that run *concurrently*, i.e., in different threads or
>> processes:
>>
>> - Control plane, handling vhost-user protocol messages:
>>
>>  The "VhostUserSlaveReqHandlerMut::set_vring_enable" method
>>  [crates/vhost-user-backend/src/handler.rs] handles
>>  VHOST_USER_SET_VRING_ENABLE messages, and updates each vring's "enabled"
>>  flag according to the message processed.
>>
>> - Data plane, handling virtio / FUSE requests:
>>
>>  The "VringEpollHandler::handle_event" method
>>  [crates/vhost-user-backend/src/event_loop.rs] handles the incoming
>>  virtio / FUSE request, consuming the virtio kick at the same time. If
>>  the vring's "enabled" flag is set, the virtio / FUSE request is
>>  processed genuinely. If the vring's "enabled" flag is clear, then the
>>  virtio / FUSE request is discarded.
>>
>> Note that OVMF enables the queue *first*, and sends FUSE_INIT *second*.
>> However, if the data plane processor in virtiofsd wins the race, then it
>> sees the FUSE_INIT *before* the control plane processor took notice of
>> VHOST_USER_SET_VRING_ENABLE and green-lit the queue for the data plane
>> p
