date:20240128

[PATCH v2] virtio-gpu: Correct virgl_renderer_resource_get_info() error check

2024-01-28 Thread Dmitry Osipenko

virgl_renderer_resource_get_info() returns errno and not -1 on error.
Correct the return-value check.

Reviewed-by: Marc-André Lureau 
Signed-off-by: Dmitry Osipenko 
---

v2: - Fixed similar incorrect error-checking in vhost-user-gpu
- Added r-b from Marc

 contrib/vhost-user-gpu/virgl.c | 6 +++---
 hw/display/virtio-gpu-virgl.c  | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/contrib/vhost-user-gpu/virgl.c b/contrib/vhost-user-gpu/virgl.c
index d1ccdf7d0668..51da0e3667f9 100644
--- a/contrib/vhost-user-gpu/virgl.c
+++ b/contrib/vhost-user-gpu/virgl.c
@@ -327,7 +327,7 @@ virgl_get_resource_info_modifiers(uint32_t resource_id,
 #ifdef VIRGL_RENDERER_RESOURCE_INFO_EXT_VERSION
 struct virgl_renderer_resource_info_ext info_ext;
 ret = virgl_renderer_resource_get_info_ext(resource_id, _ext);
-if (ret < 0) {
+if (ret) {
 return ret;
 }
 
@@ -335,7 +335,7 @@ virgl_get_resource_info_modifiers(uint32_t resource_id,
 *modifiers = info_ext.modifiers;
 #else
 ret = virgl_renderer_resource_get_info(resource_id, info);
-if (ret < 0) {
+if (ret) {
 return ret;
 }
 
@@ -372,7 +372,7 @@ virgl_cmd_set_scanout(VuGpu *g,
 uint64_t modifiers = 0;
 ret = virgl_get_resource_info_modifiers(ss.resource_id, ,
 );
-if (ret == -1) {
+if (ret) {
 g_critical("%s: illegal resource specified %d\n",
__func__, ss.resource_id);
 cmd->error = VIRTIO_GPU_RESP_ERR_INVALID_RESOURCE_ID;
diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index 8bb7a2c21fe7..9f34d0e6619c 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -181,7 +181,7 @@ static void virgl_cmd_set_scanout(VirtIOGPU *g,
 memset(, 0, sizeof(info));
 ret = virgl_renderer_resource_get_info(ss.resource_id, );
 #endif
-if (ret == -1) {
+if (ret) {
 qemu_log_mask(LOG_GUEST_ERROR,
   "%s: illegal resource specified %d\n",
   __func__, ss.resource_id);
-- 
2.43.0

Re: [PATCH 0/5] migration/multifd: Prerequisite cleanups for ongoing work

2024-01-28 Thread Peter Xu

On Mon, Jan 29, 2024 at 01:41:01AM +, Liu, Yuan1 wrote:
> Because this change has an impact on the previous live migration 
> With IAA Patch, does the submission of the next version needs 
> to be submitted based on this change?

I'd say hold off a little while until we're more certain on the planned
interface changes, to avoid you rebase your code back and forth; unless
you're pretty confident that this will be the right approach.

I apologize on not having looked at any of the QAT/IAA compression / zero
detection series posted on the list; I do plan to read them very soon too
after Fabiano.  So I may not have a complete full picture here yet, please
bare with me.

If this series is trying to provide a base ground for all the efforts,
it'll be great if we can thoroughly discuss here and settle an approach
soon that will satisfy everyone.

Thanks,

-- 
Peter Xu

Re: [PATCH v1] virtio-gpu: Correct virgl_renderer_resource_get_info() error check

2024-01-28 Thread Marc-André Lureau

On Sun, Jan 28, 2024 at 2:10 AM Dmitry Osipenko
 wrote:
>
> virgl_renderer_resource_get_info() returns errno and not -1 on error.
> Correct the return-value check.
>
> Signed-off-by: Dmitry Osipenko 

Reviewed-by: Marc-André Lureau 

Can you also correct the code in vhost-user-gpu ?

> ---
>  hw/display/virtio-gpu-virgl.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
> index 8bb7a2c21fe7..9f34d0e6619c 100644
> --- a/hw/display/virtio-gpu-virgl.c
> +++ b/hw/display/virtio-gpu-virgl.c
> @@ -181,7 +181,7 @@ static void virgl_cmd_set_scanout(VirtIOGPU *g,
>  memset(, 0, sizeof(info));
>  ret = virgl_renderer_resource_get_info(ss.resource_id, );
>  #endif
> -if (ret == -1) {
> +if (ret) {
>  qemu_log_mask(LOG_GUEST_ERROR,
>"%s: illegal resource specified %d\n",
>__func__, ss.resource_id);
> --
> 2.43.0
>

Re: [PATCH 1/5] migration/multifd: Separate compression ops from non-compression

2024-01-28 Thread Peter Xu

On Fri, Jan 26, 2024 at 07:19:39PM -0300, Fabiano Rosas wrote:
> +static MultiFDMethods multifd_socket_ops = {
> +.send_setup = multifd_socket_send_setup,
> +.send_cleanup = multifd_socket_send_cleanup,
> +.send_prepare = multifd_socket_send_prepare,

Here it's named with "socket", however not all socket-based multifd
migrations will go into this route, e.g., when zstd compression enabled it
will not go via this route, even if zstd also uses sockets as transport.
>From that pov, this may be slightly confusing.  Maybe it suites more to be
called "socket_plain" / "socket_no_comp"?

One step back, I had a feeling that the current proposal tried to provide a
single ->ops to cover a model where we may need more than one layer of
abstraction.

Since it might be helpful to allow multifd send arbitrary data (e.g. for
VFIO?  Avihai might have an answer there..), I'll try to even consider that
into the picture.

Let's consider the ultimate goal of multifd, where the simplest model could
look like this in my mind (I'm only discussing sender side, but it'll be
similar on recv side):

   prepare()   send()
  Input   > IOVs > iochannels

[I used prepare/send, but please think them as generic terms, not 100%
 aligned with what we have with existing multifd_ops, or what you proposed
 later]

Here what are sure, IMHO, is:

  - We always can have some input data to dump; I didn't use "guest pages"
just to say we may allow arbitrary data.  For any multifd user that
would like to dump arbitrary data, they can already provide IOVs, so
here input can be either "MultiFDPages_t" or "IOVs".

  - We may always want to have IOVs to represent the buffers at some point,
no matter what the input it

  - We always flush the IOVs to iochannels; basically I want to say we can
always assume the last layer is connecting to QIOChannel APIs, while I
don't think there's outliers here so far, even if the send() may differ.

Then _maybe_ it's clearer that we can have two layers of OPs?

  - prepare(): it tells how the "input" will be converted into a scatter
gatter list of buffers.  All compression methods fall into this afaiu.
This has _nothing_ to do on how the buffers will be sent.  For
arbitrary-typed input, this can already be a no-op since the IOVs
provided can already be passed over to send().

  - send(): how to dump the IOVs to the iochannels.  AFAIU this is motly
only useful for fixed-ram migrations.

Would this be clearer, rather than keep using a single multifd_ops?

-- 
Peter Xu

Re: [PATCH] hw/scsi/lsi53c895a: add missing decrement of reentrancy counter

2024-01-28 Thread Michael Tokarev


28.01.2024 23:22, Sven Schnelle :

When the maximum count of SCRIPTS instructions is reached, the code
stops execution and returns, but fails to decrement the reentrancy
counter. This effectively renders the SCSI controller unusable
because on next entry the reentrancy counter is still above the limit.

This bug was seen on HP-UX 10.20 which seems to trigger SCRIPTS
loops.


Cc: qemu-stable@

/mjt

Re: [PATCH 2/2] hw/smbios: Fix port connector option validation

2024-01-28 Thread Michael Tokarev


28.01.2024 10:15, Akihiko Odaki:

qemu_smbios_type8_opts did not the list terminator and that resulted in
out-of-bound memory access. qemu_smbios_type8_opts also needs to have
an element for the type option.


With the same description fix as in 1/1,

Reviewed-by: Michael Tokarev 

/mjt

Re: [PATCH] hw/scsi/lsi53c895a: add missing decrement of reentrancy counter

2024-01-28 Thread Sven Schnelle

Thomas Huth  writes:

> On 28/01/2024 21.22, Sven Schnelle wrote:
>> When the maximum count of SCRIPTS instructions is reached, the code
>> stops execution and returns, but fails to decrement the reentrancy
>> counter. This effectively renders the SCSI controller unusable
>> because on next entry the reentrancy counter is still above the limit.
>> This bug was seen on HP-UX 10.20 which seems to trigger SCRIPTS
>> loops.
>
> Out of curiosity: What happened there before we introduced the
> reentrancy_level fix? Did it end up in an endless loop, or was it
> finishing at one point? In the latter case, we might need to adjust
> the "reentrancy_level > 8" to allow deeper nesting.

Without the reentrancy counter it was triggering the insn_processed
limit. The HP-UX scsi driver seems to spin on some memory value during
some SCSI writes (CDB with command 0x2a). So it is spinning in an
endless loop until the insn_processed counter will trigger the exit.
In HP-UX you will see a SCSI command timeout error in the kernel log
- at least i'm assuming that's related, but can't say for sure as
there's no kernel source available.

Re: [PATCH 1/2] hw/smbios: Fix OEM strings table option validation

2024-01-28 Thread Michael Tokarev


28.01.2024 10:15, Akihiko Odaki:

qemu_smbios_type11_opts did not the list terminator and that resulted in


..did not *have* the list terminator.., here and in 2/2.


out-of-bound memory access. qemu_smbios_type11_opts also needs to have
an element for the type option.

Fixes: 2d6dcbf93fb0 ("smbios: support setting OEM strings table")


Wow.  That's long ago..

This is a -stable material.
And since it's exactly the same 2 problems in 2 nearby places, it can
be combined into a single patch, but it definitely works this way too,
just a question of taste.

Reviewed-by: Michael Tokarev 


Signed-off-by: Akihiko Odaki 
---
  hw/smbios/smbios.c | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c
index 2a90601ac5d9..522ed1ed9fe3 100644
--- a/hw/smbios/smbios.c
+++ b/hw/smbios/smbios.c
@@ -369,6 +369,11 @@ static const QemuOptDesc qemu_smbios_type8_opts[] = {
  };
  
  static const QemuOptDesc qemu_smbios_type11_opts[] = {

+{
+.name = "type",
+.type = QEMU_OPT_NUMBER,
+.help = "SMBIOS element type",
+},
  {
  .name = "value",
  .type = QEMU_OPT_STRING,
@@ -379,6 +384,7 @@ static const QemuOptDesc qemu_smbios_type11_opts[] = {
  .type = QEMU_OPT_STRING,
  .help = "OEM string data from file",
  },
+{ /* end of list */ }
  };
  
  static const QemuOptDesc qemu_smbios_type17_opts[] = {

Re: [PATCH 0/8] tests/unit/test-smp-parse.c: Add more CPU topology test cases

2024-01-28 Thread Song, Xiaoling


I tested the unit test cases with "make check" and result looks good.

Tested-by: Xiaoling Song 

Regards,

Xiaoling

On 1/18/2024 10:48 PM, Zhao Liu wrote:

From: Zhao Liu

Hi list,

Currently, test-smp-parse lacks the following cases:
* The case to cover drawer and book parameters parsing in -smp.
* The case to cover the full topology (with total 7 levels) to ensure
   that the topology-related calculations are correct.
* The case to check smp_props.has_clusters of MachineClass.

Thus, add the above cases to improve test coverage.

In addition, people is trying to bump max_cpus to 4096 for PC machine
[1]. Without considering other changes, it's only a matter of time
before the maximum CPUs is raised. Therefore, aslo bump max_cpus to 4096
in -smp related test cases as a start.

[1]:https://lore.kernel.org/qemu-devel/20231208122611.32311-1-anisi...@redhat.com/

Regards,
Zhao

---
Zhao Liu (8):
   tests/unit/test-smp-parse.c: Use CPU number macros in invalid topology
 case
   tests/unit/test-smp-parse.c: Bump max_cpus to 4096
   tests/unit/test-smp-parse.c: Make test cases aware of the book/drawer
   tests/unit/test-smp-parse.c: Test "books" parameter in -smp
   tests/unit/test-smp-parse.c: Test "drawers" parameter in -smp
   tests/unit/test-smp-parse.c: Test "drawers" and "books" combination
 case
   tests/unit/test-smp-parse.c: Test the full 7-levels topology hierarchy
   tests/unit/test-smp-parse.c: Test smp_props.has_clusters

  tests/unit/test-smp-parse.c | 515 ++--
  1 file changed, 494 insertions(+), 21 deletions(-)

[PATCH] pci-host: designware: Limit value range of iATU viewport register

2024-01-28 Thread Guenter Roeck

The latest version of qemu (v8.2.0-869-g7a1dc45af5) crashes when booting
the mcimx7d-sabre emulation with Linux v5.11 and later.

qemu-system-arm: ../system/memory.c:2750: memory_region_set_alias_offset: 
Assertion `mr->alias' failed.

Problem is that the Designware PCIe emulation accepts the full value range
for the iATU Viewport Register. However, both hardware and emulation only
support four inbound and four outbound viewports.

The Linux kernel determines the number of supported viewports by writing
0xff into the viewport register and reading the value back. The expected
value when reading the register is the highest supported viewport index.
Match that code by masking the supported viewport value range when the
register is written. With this change, the Linux kernel reports

imx6q-pcie 3380.pcie: iATU: unroll F, 4 ob, 4 ib, align 0K, limit 4G

as expected and supported.

Fixes: d64e5eabc4c7 ("pci: Add support for Designware IP block")
Cc: Andrey Smirnov 
Cc: Nikita Ostrenkov 
Signed-off-by: Guenter Roeck 
---
 hw/pci-host/designware.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/pci-host/designware.c b/hw/pci-host/designware.c
index dd9e389c07..c25d50f1c6 100644
--- a/hw/pci-host/designware.c
+++ b/hw/pci-host/designware.c
@@ -340,6 +340,8 @@ static void designware_pcie_root_config_write(PCIDevice *d, 
uint32_t address,
 break;
 
 case DESIGNWARE_PCIE_ATU_VIEWPORT:
+val &= DESIGNWARE_PCIE_ATU_REGION_INBOUND |
+(DESIGNWARE_PCIE_NUM_VIEWPORTS - 1);
 root->atu_viewport = val;
 break;
 
-- 
2.39.2

Re: [PATCH v2] hw/arm: add PCIe to Freescale i.MX6

2024-01-28 Thread Guenter Roeck

On Sat, Jan 27, 2024 at 11:11:58AM -0800, Guenter Roeck wrote:
> Hi,
> 
> On Mon, Jan 08, 2024 at 02:03:25PM +, Nikita Ostrenkov wrote:
> > Signed-off-by: Nikita Ostrenkov 
> > ---
> 
> This patch, with the "sabrelite" emulation and the Linux upstream kernel
> (v6.8-rc1, using imx_v6_v7_defconfig), results in:
> 
> qemu-system-arm: ../system/memory.c:2750: memory_region_set_alias_offset: 
> Assertion `mr->alias' failed.
> 
> with the backtrace below. Any idea what might be wrong ?
> 
Never mind. I found the problem. I'll send a patch.

Guenter

Re: [RESEND v2 1/2] hw/arm/virt-acpi-build.c: Migrate SPCR creation to common location

2024-01-28 Thread Sunil V L

Hi Jee Heng,

On Sun, Jan 28, 2024 at 06:14:39PM -0800, Sia Jee Heng wrote:
> RISC-V should also generate the SPCR in a manner similar to ARM.
> Therefore, instead of replicating the code, relocate this function
> to the common AML build.
> 
> Signed-off-by: Sia Jee Heng 
> ---
>  hw/acpi/aml-build.c | 51 
>  hw/arm/virt-acpi-build.c| 68 +++--
>  include/hw/acpi/acpi-defs.h | 33 ++
>  include/hw/acpi/aml-build.h |  4 +++
>  4 files changed, 115 insertions(+), 41 deletions(-)
> 
> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> index af66bde0f5..f3904650e4 100644
> --- a/hw/acpi/aml-build.c
> +++ b/hw/acpi/aml-build.c
> @@ -1994,6 +1994,57 @@ static void build_processor_hierarchy_node(GArray 
> *tbl, uint32_t flags,
>  }
>  }
>  
> +void build_spcr(GArray *table_data, BIOSLinker *linker,
> +const AcpiSpcrData *f, const uint8_t rev,
> +const char *oem_id, const char *oem_table_id)
> +{
> +AcpiTable table = { .sig = "SPCR", .rev = rev, .oem_id = oem_id,
> +.oem_table_id = oem_table_id };
> +
> +acpi_table_begin(, table_data);
> +/* Interface type */
> +build_append_int_noprefix(table_data, f->interface_type, 1);
> +/* Reserved */
> +build_append_int_noprefix(table_data, 0, 3);
> +/* Base Address */
> +build_append_gas(table_data, f->base_addr.id, f->base_addr.width,
> + f->base_addr.offset, f->base_addr.size,
> + f->base_addr.addr);
> +/* Interrupt type */
> +build_append_int_noprefix(table_data, f->interrupt_type, 1);
> +/* IRQ */
> +build_append_int_noprefix(table_data, f->pc_interrupt, 1);
> +/* Global System Interrupt */
> +build_append_int_noprefix(table_data, f->interrupt, 4);
> +/* Baud Rate */
> +build_append_int_noprefix(table_data, f->baud_rate, 1);
> +/* Parity */
> +build_append_int_noprefix(table_data, f->parity, 1);
> +/* Stop Bits */
> +build_append_int_noprefix(table_data, f->stop_bits, 1);
> +/* Flow Control */
> +build_append_int_noprefix(table_data, f->flow_control, 1);
> +/* Terminal Type */
> +build_append_int_noprefix(table_data, f->terminal_type, 1);
> +/* PCI Device ID  */
> +build_append_int_noprefix(table_data, f->pci_device_id, 2);
> +/* PCI Vendor ID */
> +build_append_int_noprefix(table_data, f->pci_vendor_id, 2);
> +/* PCI Bus Number */
> +build_append_int_noprefix(table_data, f->pci_bus, 1);
> +/* PCI Device Number */
> +build_append_int_noprefix(table_data, f->pci_device, 1);
> +/* PCI Function Number */
> +build_append_int_noprefix(table_data, f->pci_function, 1);
> +/* PCI Flags */
> +build_append_int_noprefix(table_data, f->pci_flags, 4);
> +/* PCI Segment */
> +build_append_int_noprefix(table_data, f->pci_segment, 1);
> +/* Reserved */
> +build_append_int_noprefix(table_data, 0, 4);
> +
I think either there should be a comment that this supports only v2 of
SPCR spec or it should be able to create SPCR of any version. IMO, I
think it is better to add support till v4 (latest). Since consumers like
Linux probably doesn't support v4 yet, ARM/RISC-V can continue to create
v2 itself for the time being but the generic build_spcr() should be able
to create v4 also if the arch requires it.

Thanks,
Sunil
> +acpi_table_end(linker, );
> +}
>  /*
>   * ACPI spec, Revision 6.3
>   * 5.2.29 Processor Properties Topology Table (PPTT)
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index a22a2f43a5..195767c0f0 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -431,48 +431,34 @@ build_iort(GArray *table_data, BIOSLinker *linker, 
> VirtMachineState *vms)
>   * Rev: 1.07
>   */
>  static void
> -build_spcr(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
> +spcr_setup(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
>  {
> -AcpiTable table = { .sig = "SPCR", .rev = 2, .oem_id = vms->oem_id,
> -.oem_table_id = vms->oem_table_id };
> -
> -acpi_table_begin(, table_data);
> -
> -/* Interface Type */
> -build_append_int_noprefix(table_data, 3, 1); /* ARM PL011 UART */
> -build_append_int_noprefix(table_data, 0, 3); /* Reserved */
> -/* Base Address */
> -build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 32, 0, 3,
> - vms->memmap[VIRT_UART].base);
> -/* Interrupt Type */
> -build_append_int_noprefix(table_data,
> -(1 << 3) /* Bit[3] ARMH GIC interrupt */, 1);
> -build_append_int_noprefix(table_data, 0, 1); /* IRQ */
> -/* Global System Interrupt */
> -build_append_int_noprefix(table_data,
> -  vms->irqmap[VIRT_UART] + ARM_SPI_BASE, 4);
> -build_append_int_noprefix(table_data, 3 /* 9600 */, 1); /* Baud Rate */
> -

Re: [PATCH v2 4/4] tests/avocado: excercise scripts/replay-dump.py in replay tests

2024-01-28 Thread Pavel Dovgalyuk


Reviewed-by: Pavel Dovgalyuk 

On 25.01.2024 19:08, Nicholas Piggin wrote:

This runs replay-dump.py after recording a trace, and fails the test if
the script fails.

replay-dump.py is modified to exit with non-zero if an error is
encountered while parsing, to support this.

Signed-off-by: Nicholas Piggin 
---
  scripts/replay-dump.py |  6 --
  tests/avocado/replay_kernel.py | 16 
  tests/avocado/replay_linux.py  | 16 
  3 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/scripts/replay-dump.py b/scripts/replay-dump.py
index a1d7ae0364..bfea9c099b 100755
--- a/scripts/replay-dump.py
+++ b/scripts/replay-dump.py
@@ -21,6 +21,7 @@
  import argparse
  import struct
  import os
+import sys
  from collections import namedtuple
  from os import path
  
@@ -104,7 +105,7 @@ def call_decode(table, index, dumpfile):

  print("Could not decode index: %d" % (index))
  print("Entry is: %s" % (decoder))
  print("Decode Table is:\n%s" % (table))
-return False
+raise(Exception("unknown event"))
  else:
  return decoder.fn(decoder.eid, decoder.name, dumpfile)
  
@@ -125,7 +126,7 @@ def print_event(eid, name, string=None, event_count=None):

  def decode_unimp(eid, name, _unused_dumpfile):
  "Unimplemented decoder, will trigger exit"
  print("%s not handled - will now stop" % (name))
-return False
+raise(Exception("unhandled event"))
  
  def decode_plain(eid, name, _unused_dumpfile):

  "Plain events without additional data"
@@ -439,6 +440,7 @@ def decode_file(filename):
  dumpfile)
  except Exception as inst:
  print(f"error {inst}")
+sys.exit(1)
  
  finally:

  print(f"Reached {dumpfile.tell()} of {dumpsize} bytes")
diff --git a/tests/avocado/replay_kernel.py b/tests/avocado/replay_kernel.py
index 10d99403a4..9b3ee6726b 100644
--- a/tests/avocado/replay_kernel.py
+++ b/tests/avocado/replay_kernel.py
@@ -13,6 +13,7 @@
  import shutil
  import logging
  import time
+import subprocess
  
  from avocado import skip

  from avocado import skipUnless
@@ -22,6 +23,11 @@
  from avocado.utils import process
  from boot_linux_console import LinuxKernelTest
  
+from pathlib import Path

+
+self_dir = Path(__file__).parent
+src_dir = self_dir.parent.parent
+
  class ReplayKernelBase(LinuxKernelTest):
  """
  Boots a Linux kernel in record mode and checks that the console
@@ -63,6 +69,8 @@ def run_vm(self, kernel_path, kernel_command_line, 
console_pattern,
  vm.shutdown()
  logger.info('finished the recording with log size %s bytes'
  % os.path.getsize(replay_path))
+self.run_replay_dump(replay_path)
+logger.info('successfully tested replay-dump.py')
  else:
  vm.wait()
  logger.info('successfully finished the replay')
@@ -70,6 +78,14 @@ def run_vm(self, kernel_path, kernel_command_line, 
console_pattern,
  logger.info('elapsed time %.2f sec' % elapsed)
  return elapsed
  
+def run_replay_dump(self, replay_path):

+try:
+subprocess.check_call(["./scripts/replay-dump.py",
+   "-f", replay_path],
+  cwd=src_dir, stdout=subprocess.DEVNULL)
+except subprocess.CalledProcessError:
+self.fail('replay-dump.py failed')
+
  def run_rr(self, kernel_path, kernel_command_line, console_pattern,
 shift=7, args=None):
  replay_path = os.path.join(self.workdir, 'replay.bin')
diff --git a/tests/avocado/replay_linux.py b/tests/avocado/replay_linux.py
index f3a43dc98c..dd148ff639 100644
--- a/tests/avocado/replay_linux.py
+++ b/tests/avocado/replay_linux.py
@@ -11,6 +11,7 @@
  import os
  import logging
  import time
+import subprocess
  
  from avocado import skipUnless

  from avocado_qemu import BUILD_DIR
@@ -21,6 +22,11 @@
  from avocado.utils.path import find_command
  from avocado_qemu import LinuxTest
  
+from pathlib import Path

+
+self_dir = Path(__file__).parent
+src_dir = self_dir.parent.parent
+
  class ReplayLinux(LinuxTest):
  """
  Boots a Linux system, checking for a successful initialization
@@ -94,6 +100,8 @@ def launch_and_wait(self, record, args, shift):
  vm.shutdown()
  logger.info('finished the recording with log size %s bytes'
  % os.path.getsize(replay_path))
+self.run_replay_dump(replay_path)
+logger.info('successfully tested replay-dump.py')
  else:
  vm.event_wait('SHUTDOWN', self.timeout)
  vm.wait()
@@ -108,6 +116,14 @@ def run_rr(self, args=None, shift=7):
  logger = logging.getLogger('replay')
  logger.info('replay overhead {:.2%}'.format(t2 / t1 - 1))
  
+def run_replay_dump(self, replay_path):

+try:
+

Re: [PATCH v2 1/4] replay: allow runstate shutdown->running when replaying trace

2024-01-28 Thread Pavel Dovgalyuk


Reviewed-by: Pavel Dovgalyuk 

On 25.01.2024 19:08, Nicholas Piggin wrote:

When replaying a trace, it is possible to go from shutdown to
running with a reverse-debugging step. This can be useful if the
problem being debugged triggers a reset or shutdown.

Signed-off-by: Nicholas Piggin 
---
  include/sysemu/runstate.h |  1 +
  replay/replay.c   |  2 ++
  system/runstate.c | 19 +++
  3 files changed, 22 insertions(+)

diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
index 0117d243c4..fe25eed3c0 100644
--- a/include/sysemu/runstate.h
+++ b/include/sysemu/runstate.h
@@ -9,6 +9,7 @@ void runstate_set(RunState new_state);
  RunState runstate_get(void);
  bool runstate_is_running(void);
  bool runstate_needs_reset(void);
+void runstate_replay_enable(void);
  
  typedef void VMChangeStateHandler(void *opaque, bool running, RunState state);
  
diff --git a/replay/replay.c b/replay/replay.c

index 3fd241a4fc..2951eed3bd 100644
--- a/replay/replay.c
+++ b/replay/replay.c
@@ -383,6 +383,8 @@ static void replay_enable(const char *fname, int mode)
  /* go to the beginning */
  fseek(replay_file, HEADER_SIZE, SEEK_SET);
  replay_fetch_data_kind();
+
+runstate_replay_enable();
  }
  
  replay_init_events();

diff --git a/system/runstate.c b/system/runstate.c
index d6ab860eca..bd0fed8657 100644
--- a/system/runstate.c
+++ b/system/runstate.c
@@ -182,6 +182,12 @@ static const RunStateTransition runstate_transitions_def[] 
= {
  { RUN_STATE__MAX, RUN_STATE__MAX },
  };
  
+static const RunStateTransition replay_runstate_transitions_def[] = {

+{ RUN_STATE_SHUTDOWN, RUN_STATE_RUNNING},
+
+{ RUN_STATE__MAX, RUN_STATE__MAX },
+};
+
  static bool runstate_valid_transitions[RUN_STATE__MAX][RUN_STATE__MAX];
  
  bool runstate_check(RunState state)

@@ -189,6 +195,19 @@ bool runstate_check(RunState state)
  return current_run_state == state;
  }
  
+void runstate_replay_enable(void)

+{
+const RunStateTransition *p;
+
+assert(replay_mode == REPLAY_MODE_PLAY);
+
+for (p = _runstate_transitions_def[0]; p->from != RUN_STATE__MAX;
+ p++) {
+runstate_valid_transitions[p->from][p->to] = true;
+}
+
+}
+
  static void runstate_init(void)
  {
  const RunStateTransition *p;

Re: [PATCH v2 19/23] target/s390x: Prefer fast cpu_env() over slower CPU QOM cast macro

2024-01-28 Thread Thomas Huth


On 26/01/2024 23.04, Philippe Mathieu-Daudé wrote:

Mechanical patch produced running the command documented
in scripts/coccinelle/cpu_env.cocci_template header.

Signed-off-by: Philippe Mathieu-Daudé 
---
  target/s390x/cpu-dump.c|  3 +--
  target/s390x/gdbstub.c |  6 ++
  target/s390x/helper.c  |  3 +--
  target/s390x/kvm/kvm.c |  6 ++
  target/s390x/tcg/excp_helper.c | 11 +++
  target/s390x/tcg/translate.c   |  3 +--
  6 files changed, 10 insertions(+), 22 deletions(-)


Reviewed-by: Thomas Huth

Re: [PATCH v2 11/23] target/m68k: Prefer fast cpu_env() over slower CPU QOM cast macro

2024-01-28 Thread Thomas Huth


On 26/01/2024 23.03, Philippe Mathieu-Daudé wrote:

Mechanical patch produced running the command documented
in scripts/coccinelle/cpu_env.cocci_template header.

Signed-off-by: Philippe Mathieu-Daudé 
---
  target/m68k/cpu.c   | 30 ++
  target/m68k/gdbstub.c   |  6 ++
  target/m68k/helper.c|  3 +--
  target/m68k/m68k-semi.c |  6 ++
  target/m68k/op_helper.c | 11 +++
  target/m68k/translate.c |  3 +--
  6 files changed, 19 insertions(+), 40 deletions(-)


Reviewed-by: Thomas Huth

Re: [PATCH 04/17] migration/multifd: Set p->running = true in the right place

2024-01-28 Thread Peter Xu

On Sun, Jan 28, 2024 at 05:43:52PM +0200, Avihai Horon wrote:
> 
> On 25/01/2024 22:57, Fabiano Rosas wrote:
> > External email: Use caution opening links or attachments
> > 
> > 
> > Avihai Horon  writes:
> > 
> > > The commit in the fixes line moved multifd thread creation to a
> > > different location, but forgot to move the p->running = true assignment
> > > as well. Thus, p->running is set to true before multifd thread is
> > > actually created.
> > > 
> > > p->running is used in multifd_save_cleanup() to decide whether to join
> > > the multifd thread or not.
> > > 
> > > With TLS, an error in multifd_tls_channel_connect() can lead to a
> > > segmentation fault because p->running is true but p->thread is never
> > > initialized, so multifd_save_cleanup() tries to join an uninitialized
> > > thread.
> > > 
> > > Fix it by moving p->running = true assignment right after multifd thread
> > > creation. Also move qio_channel_set_delay() to there, as this is where
> > > it used to be originally.
> > > 
> > > Fixes: 29647140157a ("migration/tls: add support for multifd 
> > > tls-handshake")
> > > Signed-off-by: Avihai Horon 
> > Just for context, I haven't looked at this patch yet, but we were
> > planning to remove p->running altogether:
> > 
> > https://lore.kernel.org/r/20231110200241.20679-1-faro...@suse.de
> 
> Thanks for putting me in the picture.
> I see that there has been a discussion about the multifd creation/treadown
> flow.
> In light of this discussion, I can already see a few problems in my series
> that I didn't notice before (such as the TLS handshake thread leak).
> The thread you mentioned here and some of my patches point out some problems
> in multifd creation/treardown. I guess we can discuss it and see what's the
> best way to solve them.
> 
> Regarding this patch, your solution indeed solves the bug that this patch
> addresses, so maybe this could be dropped (or only noted in your patch).
> 
> Maybe I should also put you (and Peter) in context for this whole series --
> I am writing it as preparation for adding a separate migration channel for
> VFIO device migration, so VFIO devices could be migrated in parallel.
> So this series tries to lay down some foundations to facilitate it.

Avihai, is the throughput the only reason that VFIO would like to have a
separate channel?

I'm wondering if we can also use multifd threads to send vfio data at some
point.  Now multifd indeed is closely bound to ram pages but maybe it'll
change in the near future to take any load?

Multifd is for solving the throughput issue already. If vfio has the same
goal, IMHO it'll be good to keep them using the same thread model, instead
of managing different threads in different places.  With that, any user
setting (for example, multifd-n-threads) will naturally apply to all
components, rather than relying on yet-another vfio-migration-threads-num
parameter.

-- 
Peter Xu

Re: [PATCH] hw/scsi/lsi53c895a: add missing decrement of reentrancy counter

2024-01-28 Thread Thomas Huth


On 28/01/2024 21.22, Sven Schnelle wrote:

When the maximum count of SCRIPTS instructions is reached, the code
stops execution and returns, but fails to decrement the reentrancy
counter. This effectively renders the SCSI controller unusable
because on next entry the reentrancy counter is still above the limit.

This bug was seen on HP-UX 10.20 which seems to trigger SCRIPTS
loops.


Out of curiosity: What happened there before we introduced the 
reentrancy_level fix? Did it end up in an endless loop, or was it finishing 
at one point? In the latter case, we might need to adjust the 
"reentrancy_level > 8" to allow deeper nesting.



Fixes: b987718bbb ("hw/scsi/lsi53c895a: Fix reentrancy issues in the LSI controller 
(CVE-2023-0330)")
Signed-off-by: Sven Schnelle 
---
  hw/scsi/lsi53c895a.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/hw/scsi/lsi53c895a.c b/hw/scsi/lsi53c895a.c
index 34e3b89287..d607a5f9fb 100644
--- a/hw/scsi/lsi53c895a.c
+++ b/hw/scsi/lsi53c895a.c
@@ -1159,6 +1159,7 @@ again:
  lsi_script_scsi_interrupt(s, LSI_SIST0_UDC, 0);
  lsi_disconnect(s);
  trace_lsi_execute_script_stop();
+reentrancy_level--;
  return;
  }
  insn = read_dword(s, s->dsp);


Reviewed-by: Thomas Huth

Re: spapr watchdog vs watchdog_perform_action() / QMP watchdog-set-action

2024-01-28 Thread David Gibson

On Sat, Jan 27, 2024 at 01:08:02PM +, Peter Maydell wrote:
> On Fri, 26 Jan 2024 at 20:49, Markus Armbruster  wrote:
> >
> > Peter Maydell  writes:
> >
> > > Hi; one of the "bitesized tasks" we have listed is to convert
> > > watchdog timers which directly call qemu_system_reset_request() on
> > > watchdog timeout to call watchdog_perform_action() instead. This
> > > means they honour the QMP commands that let the user specifiy
> > > the behaviour on watchdog expiry:
> > > https://www.qemu.org/docs/master/interop/qemu-qmp-ref.html#qapidoc-141
> > > https://www.qemu.org/docs/master/interop/qemu-qmp-ref.html#qapidoc-129
> > > (choices include reset, power off the system, do nothing, etc).
> > >
> > > There are only a few remaining watchdogs that don't use the
> > > watchdog_perform_action() function. In most cases the change
> > > is obvious and easy: just make them do that instead of calling
> > > qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET).
> > >
> > > However, the hw/watchdog/spapr_watchdog.c case is trickier. As
> > > far as I can tell from the sources, this is a watchdog set up via
> > > a hypercall, and the guest makes a choice of "power off, restart,
> > > or dump and restart" for its on-expiry action.
> > >
> > > What should this watchdog's interaction with the watchdog-set-action
> > > QMP command be? If the user says "do X" and the guest says "do Y",
> > > which do we do? (With the current code, we always honour what
> > > the guest asks for and ignore what the user asks for.)
> >
> > Gut reaction: when the user says "do X", the guest should not get a say.
> > But one of the values of X could be "whatever the guest says".

That would also be my inclination.

> Mmm. Slightly awkwardly, we don't currently distinguish between
> "action is reset because the user never expressed a preference"
> and "action is reset because the user specifically asked for that",
> but I guess in theory we could make that distinction. (Conveniently
> there is no QMP action for "query current watchdog-action state",
> so we don't need to worry about reflecting that distinction in the
> QMP interface if we make it.)

I think that change is necessary in order to accomodate this sort of
watchdog with guest-progammable behaviour (which is part of the PAPR
spec, so we shouldn't just ignore it).

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [PATCH v4 15/45] Add BCM2838 PCIE host

2024-01-28 Thread Sergei Kambalin

I'm not super familiar with how QEMU models PCI controllers, but I'm not 
sure this handling of the root port config registers is right. On other 
controllers it looks like the root config reads and writes are handled 
by setting the PCIDeviceClass::config_read and PCIDevice::config_write 
methods.

It's not only for config space access but also to configuration area beyond 
PCIE config space.
I'm going to split into two access methods:
1) bcm2838_pcie_config_{write,read}() for access to addresses within the config 
space. Will be assigned to the dev->config_write.
2) bcm2838_pcie_mmio_{write,read}() for access to addresses beyond the config 
space. Will be implemented as 'normal' MemoryRegionOps structure.

Isn't the important thing here not what guest software thinks, but what 
the actual hardware in this bcm2838 SoC implements ?


The thing is that the code is not fully abides the PCIe spec. It's been written 
mostly basing on kernel code and boot logs analysis.
That's why I think it worth to add a detailed comment why it was written that 
way.

On 12/7/23 20:31, Sergey Kambalin wrote:

Signed-off-by: Sergey Kambalin
---
  hw/arm/bcm2838_pcie.c | 216 +-
  include/hw/arm/bcm2838_pcie.h |  22 
  2 files changed, 236 insertions(+), 2 deletions(-)

diff --git a/hw/arm/bcm2838_pcie.c b/hw/arm/bcm2838_pcie.c
index 3b4373c6a6..75146d6c2e 100644
--- a/hw/arm/bcm2838_pcie.c
+++ b/hw/arm/bcm2838_pcie.c
@@ -12,11 +12,222 @@
  #include "hw/irq.h"
  #include "hw/pci-host/gpex.h"
  #include "hw/qdev-properties.h"
-#include "migration/vmstate.h"
-#include "qemu/module.h"
  #include "hw/arm/bcm2838_pcie.h"
  #include "trace.h"
  
+/*

+ * RC host part
+ */
+
+static uint64_t bcm2838_pcie_host_read(void *opaque, hwaddr offset,
+   unsigned size) {
+hwaddr mmcfg_addr;
+uint64_t value = ~0;
+BCM2838PcieHostState *s = opaque;
+PCIExpressHost *pcie_hb = PCIE_HOST_BRIDGE(s);
+PCIDevice *root_pci_dev = PCI_DEVICE(>root_port);
+uint8_t *root_regs = s->root_port.regs;
+uint32_t *cfg_idx = (uint32_t *)(root_regs + BCM2838_PCIE_EXT_CFG_INDEX
+ - PCIE_CONFIG_SPACE_SIZE);
+
+if (offset < PCIE_CONFIG_SPACE_SIZE) {
+value = pci_host_config_read_common(root_pci_dev, offset,
+PCIE_CONFIG_SPACE_SIZE, size);
+} else if (offset - PCIE_CONFIG_SPACE_SIZE + size
+   <= sizeof(s->root_port.regs)) {
+switch (offset) {
+case BCM2838_PCIE_EXT_CFG_DATA
+... BCM2838_PCIE_EXT_CFG_DATA + PCIE_CONFIG_SPACE_SIZE - 1:
+mmcfg_addr = *cfg_idx
+| PCIE_MMCFG_CONFOFFSET(offset - BCM2838_PCIE_EXT_CFG_DATA);
+value = pcie_hb->mmio.ops->read(opaque, mmcfg_addr, size);
+break;
+default:
+memcpy(, root_regs + offset - PCIE_CONFIG_SPACE_SIZE, size);
+}
+} else {
+qemu_log_mask(
+LOG_GUEST_ERROR,
+"%s: out-of-range access, %u bytes @ offset 0x%04" PRIx64 "\n",
+__func__, size, offset);
+}
+
+trace_bcm2838_pcie_host_read(size, offset, value);
+return value;
+}
+
+static void bcm2838_pcie_host_write(void *opaque, hwaddr offset,
+uint64_t value, unsigned size) {
+hwaddr mmcfg_addr;
+BCM2838PcieHostState *s = opaque;
+PCIExpressHost *pcie_hb = PCIE_HOST_BRIDGE(s);
+PCIDevice *root_pci_dev = PCI_DEVICE(>root_port);
+uint8_t *root_regs = s->root_port.regs;
+uint32_t *cfg_idx = (uint32_t *)(root_regs + BCM2838_PCIE_EXT_CFG_INDEX
+ - PCIE_CONFIG_SPACE_SIZE);
+
+trace_bcm2838_pcie_host_write(size, offset, value);
+
+if (offset < PCIE_CONFIG_SPACE_SIZE) {
+pci_host_config_write_common(root_pci_dev, offset,
+ PCIE_CONFIG_SPACE_SIZE, value, size);
+} else if (offset - PCIE_CONFIG_SPACE_SIZE + size
+   <= sizeof(s->root_port.regs)) {
+switch (offset) {
+case BCM2838_PCIE_EXT_CFG_DATA
+... BCM2838_PCIE_EXT_CFG_DATA + PCIE_CONFIG_SPACE_SIZE - 1:
+mmcfg_addr = *cfg_idx
+| PCIE_MMCFG_CONFOFFSET(offset - BCM2838_PCIE_EXT_CFG_DATA);
+pcie_hb->mmio.ops->write(opaque, mmcfg_addr, value, size);
+break;
+default:
+memcpy(root_regs + offset - PCIE_CONFIG_SPACE_SIZE, , size);
+}
+} else {
+qemu_log_mask(
+LOG_GUEST_ERROR,
+"%s: out-of-range access, %u bytes @ offset 0x%04" PRIx64 "\n",
+__func__, size, offset);
+}
+}
+
+static const MemoryRegionOps bcm2838_pcie_host_ops = {
+.read = bcm2838_pcie_host_read,
+.write = bcm2838_pcie_host_write,
+.endianness = DEVICE_NATIVE_ENDIAN,
+.impl = {.max_access_size = sizeof(uint64_t)},
+};
+
+int

[PULL 06/14] ci: Add a migration compatibility test job

2024-01-28 Thread peterx

From: Fabiano Rosas 

The migration tests have support for being passed two QEMU binaries to
test migration compatibility.

Add a CI job that builds the lastest release of QEMU and another job
that uses that version plus an already present build of the current
version and run the migration tests with the two, both as source and
destination. I.e.:

 old QEMU (n-1) -> current QEMU (development tree)
 current QEMU (development tree) -> old QEMU (n-1)

The purpose of this CI job is to ensure the code we're about to merge
will not cause a migration compatibility problem when migrating the
next release (which will contain that code) to/from the previous
release.

The version of migration-test used will be the one matching the older
QEMU. That way we can avoid special-casing new tests that wouldn't be
compatible with the older QEMU.

Note: for user forks, the version tags need to be pushed to gitlab
otherwise it won't be able to checkout a different version.

Signed-off-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240118164951.30350-3-faro...@suse.de
Signed-off-by: Peter Xu 
---
 .gitlab-ci.d/buildtest.yml | 60 ++
 1 file changed, 60 insertions(+)

diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
index e1c7801598..f0b0edc634 100644
--- a/.gitlab-ci.d/buildtest.yml
+++ b/.gitlab-ci.d/buildtest.yml
@@ -167,6 +167,66 @@ build-system-centos:
   x86_64-softmmu rx-softmmu sh4-softmmu nios2-softmmu
 MAKE_CHECK_ARGS: check-build
 
+# Previous QEMU release. Used for cross-version migration tests.
+build-previous-qemu:
+  extends: .native_build_job_template
+  artifacts:
+when: on_success
+expire_in: 2 days
+paths:
+  - build-previous
+exclude:
+  - build-previous/**/*.p
+  - build-previous/**/*.a.p
+  - build-previous/**/*.fa.p
+  - build-previous/**/*.c.o
+  - build-previous/**/*.c.o.d
+  - build-previous/**/*.fa
+  needs:
+job: amd64-opensuse-leap-container
+  variables:
+IMAGE: opensuse-leap
+TARGETS: x86_64-softmmu aarch64-softmmu
+  before_script:
+- export QEMU_PREV_VERSION="$(sed 's/\([0-9.]*\)\.[0-9]*/v\1.0/' VERSION)"
+- git checkout $QEMU_PREV_VERSION
+  after_script:
+- mv build build-previous
+
+.migration-compat-common:
+  extends: .common_test_job_template
+  needs:
+- job: build-previous-qemu
+- job: build-system-opensuse
+  # The old QEMU could have bugs unrelated to migration that are
+  # already fixed in the current development branch, so this test
+  # might fail.
+  allow_failure: true
+  variables:
+IMAGE: opensuse-leap
+MAKE_CHECK_ARGS: check-build
+  script:
+# Use the migration-tests from the older QEMU tree. This avoids
+# testing an old QEMU against new features/tests that it is not
+# compatible with.
+- cd build-previous
+# old to new
+- QTEST_QEMU_BINARY_SRC=./qemu-system-${TARGET}
+  QTEST_QEMU_BINARY=../build/qemu-system-${TARGET} 
./tests/qtest/migration-test
+# new to old
+- QTEST_QEMU_BINARY_DST=./qemu-system-${TARGET}
+  QTEST_QEMU_BINARY=../build/qemu-system-${TARGET} 
./tests/qtest/migration-test
+
+migration-compat-aarch64:
+  extends: .migration-compat-common
+  variables:
+TARGET: aarch64
+
+migration-compat-x86_64:
+  extends: .migration-compat-common
+  variables:
+TARGET: x86_64
+
 check-system-centos:
   extends: .native_test_job_template
   needs:
-- 
2.43.0

[PULL 04/14] migration: Drop unnecessary check in ram's pending_exact()

2024-01-28 Thread peterx

From: Peter Xu 

When the migration frameworks fetches the exact pending sizes, it means
this check:

  remaining_size < s->threshold_size

Must have been done already, actually at migration_iteration_run():

if (must_precopy <= s->threshold_size) {
qemu_savevm_state_pending_exact(_precopy, _postcopy);

That should be after one round of ram_state_pending_estimate().  It makes
the 2nd check meaningless and can be dropped.

To say it in another way, when reaching ->state_pending_exact(), we
unconditionally sync dirty bits for precopy.

Then we can drop migrate_get_current() there too.

Reviewed-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240117075848.139045-3-pet...@redhat.com
Signed-off-by: Peter Xu 
---
 migration/ram.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index c0cdcccb75..d5b7cd5ac2 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3213,21 +3213,20 @@ static void ram_state_pending_estimate(void *opaque, 
uint64_t *must_precopy,
 static void ram_state_pending_exact(void *opaque, uint64_t *must_precopy,
 uint64_t *can_postcopy)
 {
-MigrationState *s = migrate_get_current();
 RAMState **temp = opaque;
 RAMState *rs = *temp;
+uint64_t remaining_size;
 
-uint64_t remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE;
-
-if (!migration_in_postcopy() && remaining_size < s->threshold_size) {
+if (!migration_in_postcopy()) {
 bql_lock();
 WITH_RCU_READ_LOCK_GUARD() {
 migration_bitmap_sync_precopy(rs, false);
 }
 bql_unlock();
-remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE;
 }
 
+remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE;
+
 if (migrate_postcopy_ram()) {
 /* We can do postcopy, and all the data is postcopiable */
 *can_postcopy += remaining_size;
-- 
2.43.0

[PULL 05/14] analyze-migration.py: Remove trick on parsing ramblocks

2024-01-28 Thread peterx

From: Peter Xu 

RAM_SAVE_FLAG_MEM_SIZE contains the total length of ramblock idstr to know
whether scanning of ramblocks is complete.  Drop the trick.

Reviewed-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240117075848.139045-4-pet...@redhat.com
Signed-off-by: Peter Xu 
---
 scripts/analyze-migration.py | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/scripts/analyze-migration.py b/scripts/analyze-migration.py
index a39dfb8766..8a254a5b6a 100755
--- a/scripts/analyze-migration.py
+++ b/scripts/analyze-migration.py
@@ -151,17 +151,12 @@ def read(self):
 addr &= ~(self.TARGET_PAGE_SIZE - 1)
 
 if flags & self.RAM_SAVE_FLAG_MEM_SIZE:
-while True:
+total_length = addr
+while total_length > 0:
 namelen = self.file.read8()
-# We assume that no RAM chunk is big enough to ever
-# hit the first byte of the address, so when we see
-# a zero here we know it has to be an address, not the
-# length of the next block.
-if namelen == 0:
-self.file.file.seek(-1, 1)
-break
 self.name = self.file.readstr(len = namelen)
 len = self.file.read64()
+total_length -= len
 self.sizeinfo[self.name] = '0x%016x' % len
 if self.write_memory:
 print(self.name)
-- 
2.43.0

[PULL 01/14] userfaultfd: use 1ULL to build ioctl masks

2024-01-28 Thread peterx

From: Paolo Bonzini 

There is no need to use the Linux-internal __u64 type, 1ULL is
guaranteed to be wide enough.

Signed-off-by: Paolo Bonzini 
Reviewed-by: Philippe Mathieu-Daudé 
Link: https://lore.kernel.org/r/20240117160313.175609-1-pbonz...@redhat.com
Signed-off-by: Peter Xu 
---
 migration/postcopy-ram.c  | 16 +++-
 subprojects/libvhost-user/libvhost-user.c |  2 +-
 tests/qtest/migration-test.c  |  4 ++--
 3 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 5408e028c6..893ec8fa89 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -102,11 +102,9 @@ void postcopy_thread_create(MigrationIncomingState *mis,
  * are target OS specific.
  */
 #if defined(__linux__)
-
 #include 
 #include 
 #include 
-#include  /* for __u64 */
 #endif
 
 #if defined(__linux__) && defined(__NR_userfaultfd) && defined(CONFIG_EVENTFD)
@@ -272,8 +270,8 @@ static bool request_ufd_features(int ufd, uint64_t features)
 return false;
 }
 
-ioctl_mask = (__u64)1 << _UFFDIO_REGISTER |
- (__u64)1 << _UFFDIO_UNREGISTER;
+ioctl_mask = 1ULL << _UFFDIO_REGISTER |
+ 1ULL << _UFFDIO_UNREGISTER;
 if ((api_struct.ioctls & ioctl_mask) != ioctl_mask) {
 error_report("Missing userfault features: %" PRIx64,
  (uint64_t)(~api_struct.ioctls & ioctl_mask));
@@ -462,9 +460,9 @@ bool postcopy_ram_supported_by_host(MigrationIncomingState 
*mis, Error **errp)
 goto out;
 }
 
-feature_mask = (__u64)1 << _UFFDIO_WAKE |
-   (__u64)1 << _UFFDIO_COPY |
-   (__u64)1 << _UFFDIO_ZEROPAGE;
+feature_mask = 1ULL << _UFFDIO_WAKE |
+   1ULL << _UFFDIO_COPY |
+   1ULL << _UFFDIO_ZEROPAGE;
 if ((reg_struct.ioctls & feature_mask) != feature_mask) {
 error_setg(errp, "Missing userfault map features: %" PRIx64,
(uint64_t)(~reg_struct.ioctls & feature_mask));
@@ -733,11 +731,11 @@ static int ram_block_enable_notify(RAMBlock *rb, void 
*opaque)
 error_report("%s userfault register: %s", __func__, strerror(errno));
 return -1;
 }
-if (!(reg_struct.ioctls & ((__u64)1 << _UFFDIO_COPY))) {
+if (!(reg_struct.ioctls & (1ULL << _UFFDIO_COPY))) {
 error_report("%s userfault: Region doesn't support COPY", __func__);
 return -1;
 }
-if (reg_struct.ioctls & ((__u64)1 << _UFFDIO_ZEROPAGE)) {
+if (reg_struct.ioctls & (1ULL << _UFFDIO_ZEROPAGE)) {
 qemu_ram_set_uf_zeroable(rb);
 }
 
diff --git a/subprojects/libvhost-user/libvhost-user.c 
b/subprojects/libvhost-user/libvhost-user.c
index 6684057370..a3b158c671 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -684,7 +684,7 @@ generate_faults(VuDev *dev) {
  dev->postcopy_ufd, strerror(errno));
 return false;
 }
-if (!(reg_struct.ioctls & ((__u64)1 << _UFFDIO_COPY))) {
+if (!(reg_struct.ioctls & (1ULL << _UFFDIO_COPY))) {
 vu_panic(dev, "%s Region (%d) doesn't support COPY",
  __func__, i);
 return false;
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index d3066e119f..7675519cfa 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -104,8 +104,8 @@ static bool ufd_version_check(void)
 }
 uffd_feature_thread_id = api_struct.features & UFFD_FEATURE_THREAD_ID;
 
-ioctl_mask = (__u64)1 << _UFFDIO_REGISTER |
- (__u64)1 << _UFFDIO_UNREGISTER;
+ioctl_mask = 1ULL << _UFFDIO_REGISTER |
+ 1ULL << _UFFDIO_UNREGISTER;
 if ((api_struct.ioctls & ioctl_mask) != ioctl_mask) {
 g_test_message("Skipping test: Missing userfault feature");
 return false;
-- 
2.43.0

[PULL 09/14] migration: Fix use-after-free of migration state object

2024-01-28 Thread peterx

From: Fabiano Rosas 

We're currently allowing the process_incoming_migration_bh bottom-half
to run without holding a reference to the 'current_migration' object,
which leads to a segmentation fault if the BH is still live after
migration_shutdown() has dropped the last reference to
current_migration.

In my system the bug manifests as migrate_multifd() returning true
when it shouldn't and multifd_load_shutdown() calling
multifd_recv_terminate_threads() which crashes due to an uninitialized
multifd_recv_state.

Fix the issue by holding a reference to the object when scheduling the
BH and dropping it before returning from the BH. The same is already
done for the cleanup_bh at migrate_fd_cleanup_schedule().

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1969
Signed-off-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240119233922.32588-2-faro...@suse.de
Signed-off-by: Peter Xu 
---
 migration/migration.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 219447dea1..cf17b68e57 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -648,6 +648,7 @@ static void process_incoming_migration_bh(void *opaque)
   MIGRATION_STATUS_COMPLETED);
 qemu_bh_delete(mis->bh);
 migration_incoming_state_destroy();
+object_unref(OBJECT(migrate_get_current()));
 }
 
 static void coroutine_fn
@@ -713,6 +714,7 @@ process_incoming_migration_co(void *opaque)
 }
 
 mis->bh = qemu_bh_new(process_incoming_migration_bh, mis);
+object_ref(OBJECT(migrate_get_current()));
 qemu_bh_schedule(mis->bh);
 return;
 fail:
-- 
2.43.0

[PULL 07/14] ci: Disable migration compatibility tests for aarch64

2024-01-28 Thread peterx

From: Fabiano Rosas 

Until 9.0 is out, we need to keep the aarch64 job disabled because the
tests always use the n-1 version of migration-test. That happens to be
broken for aarch64 in 8.2. Once 9.0 is out, it will become the n-1
version and it will bring the fixed tests.

We can revert this patch when 9.0 releases.

Signed-off-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240118164951.30350-4-faro...@suse.de
[peterx: use _SKIPPED rather than _OPTIONAL]
Signed-off-by: Peter Xu 
---
 .gitlab-ci.d/buildtest.yml | 4 
 1 file changed, 4 insertions(+)

diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
index f0b0edc634..79bbc8585b 100644
--- a/.gitlab-ci.d/buildtest.yml
+++ b/.gitlab-ci.d/buildtest.yml
@@ -217,10 +217,14 @@ build-previous-qemu:
 - QTEST_QEMU_BINARY_DST=./qemu-system-${TARGET}
   QTEST_QEMU_BINARY=../build/qemu-system-${TARGET} 
./tests/qtest/migration-test
 
+# This job is disabled until we release 9.0. The existing
+# migration-test in 8.2 is broken on aarch64. The fix was already
+# commited, but it will only take effect once 9.0 is out.
 migration-compat-aarch64:
   extends: .migration-compat-common
   variables:
 TARGET: aarch64
+QEMU_JOB_SKIPPED: 1
 
 migration-compat-x86_64:
   extends: .migration-compat-common
-- 
2.43.0

[PULL 14/14] Make 'uri' optional for migrate QAPI

2024-01-28 Thread peterx

From: Het Gala 

'uri' argument should be optional, as 'uri' and 'channels'
arguments are mutally exclusive in nature.

Fixes: 074dbce5fcce (migration: New migrate and migrate-incoming argument 
'channels')
Signed-off-by: Het Gala 
Link: https://lore.kernel.org/r/20240123064219.40514-1-het.g...@nutanix.com
Signed-off-by: Peter Xu 
---
 qapi/migration.json | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index 489b591c23..d3e2b864c5 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1757,7 +1757,7 @@
 #
 ##
 { 'command': 'migrate',
-  'data': {'uri': 'str',
+  'data': {'*uri': 'str',
'*channels': [ 'MigrationChannel' ],
'*blk': { 'type': 'bool', 'features': [ 'deprecated' ] },
'*inc': { 'type': 'bool', 'features': [ 'deprecated' ] },
-- 
2.43.0

[PULL 10/14] migration: Take reference to migration state around bg_migration_vm_start_bh

2024-01-28 Thread peterx

From: Fabiano Rosas 

We need to hold a reference to the current_migration object around
async calls to avoid it been freed while still in use.

Signed-off-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240119233922.32588-3-faro...@suse.de
Signed-off-by: Peter Xu 
---
 migration/migration.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index cf17b68e57..b1213b59ce 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3382,6 +3382,7 @@ static void bg_migration_vm_start_bh(void *opaque)
 
 vm_resume(s->vm_old_state);
 migration_downtime_end(s);
+object_unref(OBJECT(s));
 }
 
 /**
@@ -3486,6 +3487,7 @@ static void *bg_migration_thread(void *opaque)
  * writes to virtio VQs memory which is in write-protected region.
  */
 s->vm_start_bh = qemu_bh_new(bg_migration_vm_start_bh, s);
+object_ref(OBJECT(s));
 qemu_bh_schedule(s->vm_start_bh);
 
 bql_unlock();
-- 
2.43.0

[PULL 11/14] migration: Reference migration state around loadvm_postcopy_handle_run_bh

2024-01-28 Thread peterx

From: Fabiano Rosas 

We need to hold a reference to the current_migration object around
async calls to avoid it been freed while still in use. Even on this
load-side function, we might still use the MigrationState, e.g to
check for capabilities.

Signed-off-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240119233922.32588-4-faro...@suse.de
Signed-off-by: Peter Xu 
---
 migration/savevm.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/migration/savevm.c b/migration/savevm.c
index 6410705ebe..93387350c7 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2174,6 +2174,7 @@ static void loadvm_postcopy_handle_run_bh(void *opaque)
 qemu_bh_delete(mis->bh);
 
 trace_vmstate_downtime_checkpoint("dst-postcopy-bh-vm-started");
+object_unref(OBJECT(migration_get_current()));
 }
 
 /* After all discards we can start running and asking for pages */
@@ -2189,6 +2190,7 @@ static int 
loadvm_postcopy_handle_run(MigrationIncomingState *mis)
 
 postcopy_state_set(POSTCOPY_INCOMING_RUNNING);
 mis->bh = qemu_bh_new(loadvm_postcopy_handle_run_bh, mis);
+object_ref(OBJECT(migration_get_current()));
 qemu_bh_schedule(mis->bh);
 
 /* We need to finish reading the stream from the package
-- 
2.43.0

[PULL 08/14] migration/yank: Use channel features

2024-01-28 Thread peterx

From: Fabiano Rosas 

Stop using outside knowledge about the io channels when registering
yank functions. Query for features instead.

The yank method for all channels used with migration code currently is
to call the qio_channel_shutdown() function, so query for
QIO_CHANNEL_FEATURE_SHUTDOWN. We could add a separate feature in the
future for indicating whether a channel supports yanking, but that
seems overkill at the moment.

Signed-off-by: Fabiano Rosas 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Peter Xu 
Link: https://lore.kernel.org/r/20230911171320.24372-9-faro...@suse.de
Signed-off-by: Peter Xu 
---
 migration/yank_functions.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/migration/yank_functions.c b/migration/yank_functions.c
index d5a710a3f2..979e60c762 100644
--- a/migration/yank_functions.c
+++ b/migration/yank_functions.c
@@ -8,12 +8,9 @@
  */
 
 #include "qemu/osdep.h"
-#include "qapi/error.h"
 #include "io/channel.h"
 #include "yank_functions.h"
 #include "qemu/yank.h"
-#include "io/channel-socket.h"
-#include "io/channel-tls.h"
 #include "qemu-file.h"
 
 void migration_yank_iochannel(void *opaque)
@@ -26,8 +23,7 @@ void migration_yank_iochannel(void *opaque)
 /* Return whether yank is supported on this ioc */
 static bool migration_ioc_yank_supported(QIOChannel *ioc)
 {
-return object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_SOCKET) ||
-object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_TLS);
+return qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN);
 }
 
 void migration_ioc_register_yank(QIOChannel *ioc)
-- 
2.43.0

[PULL 12/14] migration: Add a wrapper to qemu_bh_schedule

2024-01-28 Thread peterx

From: Fabiano Rosas 

Wrap qemu_bh_schedule() to ensure we always hold a reference to the
current_migration object.

Signed-off-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240119233922.32588-5-faro...@suse.de
Signed-off-by: Peter Xu 
---
 migration/migration.c | 31 ++-
 1 file changed, 14 insertions(+), 17 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index b1213b59ce..0e7f101d64 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -199,6 +199,16 @@ void migration_object_init(void)
 dirty_bitmap_mig_init();
 }
 
+static void migration_bh_schedule(MigrationState *s, QEMUBH *bh)
+{
+/*
+ * Ref the state for bh, because it may be called when
+ * there're already no other refs
+ */
+object_ref(OBJECT(s));
+qemu_bh_schedule(bh);
+}
+
 void migration_cancel(const Error *error)
 {
 if (error) {
@@ -714,8 +724,7 @@ process_incoming_migration_co(void *opaque)
 }
 
 mis->bh = qemu_bh_new(process_incoming_migration_bh, mis);
-object_ref(OBJECT(migrate_get_current()));
-qemu_bh_schedule(mis->bh);
+migration_bh_schedule(migrate_get_current(), mis->bh);
 return;
 fail:
 migrate_set_state(>state, MIGRATION_STATUS_ACTIVE,
@@ -1332,16 +1341,6 @@ static void migrate_fd_cleanup(MigrationState *s)
 yank_unregister_instance(MIGRATION_YANK_INSTANCE);
 }
 
-static void migrate_fd_cleanup_schedule(MigrationState *s)
-{
-/*
- * Ref the state for bh, because it may be called when
- * there're already no other refs
- */
-object_ref(OBJECT(s));
-qemu_bh_schedule(s->cleanup_bh);
-}
-
 static void migrate_fd_cleanup_bh(void *opaque)
 {
 MigrationState *s = opaque;
@@ -3140,7 +3139,7 @@ static void migration_iteration_finish(MigrationState *s)
 error_report("%s: Unknown ending state %d", __func__, s->state);
 break;
 }
-migrate_fd_cleanup_schedule(s);
+migration_bh_schedule(s, s->cleanup_bh);
 bql_unlock();
 }
 
@@ -3171,7 +3170,7 @@ static void bg_migration_iteration_finish(MigrationState 
*s)
 break;
 }
 
-migrate_fd_cleanup_schedule(s);
+migration_bh_schedule(s, s->cleanup_bh);
 bql_unlock();
 }
 
@@ -3487,9 +3486,7 @@ static void *bg_migration_thread(void *opaque)
  * writes to virtio VQs memory which is in write-protected region.
  */
 s->vm_start_bh = qemu_bh_new(bg_migration_vm_start_bh, s);
-object_ref(OBJECT(s));
-qemu_bh_schedule(s->vm_start_bh);
-
+migration_bh_schedule(s, s->vm_start_bh);
 bql_unlock();
 
 while (migration_is_active(s)) {
-- 
2.43.0

[PULL 03/14] migration: Make threshold_size an uint64_t

2024-01-28 Thread peterx

From: Peter Xu 

It's always used to compare against another uint64_t.  Make it always clear
that it's never a negative.

Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240117075848.139045-2-pet...@redhat.com
Signed-off-by: Peter Xu 
---
 migration/migration.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/migration.h b/migration/migration.h
index 17972dac34..a589ae8650 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -296,7 +296,7 @@ struct MigrationState {
  * this threshold; it's calculated from the requested downtime and
  * measured bandwidth, or avail-switchover-bandwidth if specified.
  */
-int64_t threshold_size;
+uint64_t threshold_size;
 
 /* params from 'migrate-set-parameters' */
 MigrationParameters parameters;
-- 
2.43.0

[PULL 02/14] migration: Plug memory leak on HMP migrate error path

2024-01-28 Thread peterx

From: Markus Armbruster 

hmp_migrate() leaks @caps when qmp_migrate() fails.  Plug the leak
with g_autoptr().

Fixes: 967f2de5c9ec (migration: Implement MigrateChannelList to hmp migration 
flow.) v8.2.0-rc0
Fixes: CID 1533125
Signed-off-by: Markus Armbruster 
Link: https://lore.kernel.org/r/20240117140722.3979657-1-arm...@redhat.com
[peterx: fix CID number as reported by Peter Maydell]
Signed-off-by: Peter Xu 
---
 migration/migration-hmp-cmds.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index 740a219aa4..99b49df5dd 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -764,7 +764,7 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
 bool resume = qdict_get_try_bool(qdict, "resume", false);
 const char *uri = qdict_get_str(qdict, "uri");
 Error *err = NULL;
-MigrationChannelList *caps = NULL;
+g_autoptr(MigrationChannelList) caps = NULL;
 g_autoptr(MigrationChannel) channel = NULL;
 
 if (inc) {
@@ -789,8 +789,6 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
 return;
 }
 
-qapi_free_MigrationChannelList(caps);
-
 if (!detach) {
 HMPMigrationStatus *status;
 
-- 
2.43.0

[PULL 13/14] migration: Centralize BH creation and dispatch

2024-01-28 Thread peterx

From: Fabiano Rosas 

Now that the migration state reference counting is correct, further
wrap the bottom half dispatch process to avoid future issues.

Move BH creation and scheduling together and wrap the dispatch with an
intermediary function that will ensure we always keep the ref/unref
balanced.

Also move the responsibility of deleting the BH into the wrapper and
remove the now unnecessary pointers.

Signed-off-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240119233922.32588-6-faro...@suse.de
Signed-off-by: Peter Xu 
---
 migration/migration.h |  5 +---
 migration/migration.c | 65 +--
 migration/savevm.c|  7 +
 3 files changed, 40 insertions(+), 37 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index a589ae8650..f2c8b8f286 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -159,8 +159,6 @@ struct MigrationIncomingState {
 /* PostCopyFD's for external userfaultfds & handlers of shared memory */
 GArray   *postcopy_remote_fds;
 
-QEMUBH *bh;
-
 int state;
 
 /*
@@ -255,8 +253,6 @@ struct MigrationState {
 
 /*< public >*/
 QemuThread thread;
-QEMUBH *vm_start_bh;
-QEMUBH *cleanup_bh;
 /* Protected by qemu_file_lock */
 QEMUFile *to_dst_file;
 /* Postcopy specific transfer channel */
@@ -528,6 +524,7 @@ int foreach_not_ignored_block(RAMBlockIterFunc func, void 
*opaque);
 void migration_make_urgent_request(void);
 void migration_consume_urgent_request(void);
 bool migration_rate_limit(void);
+void migration_bh_schedule(QEMUBHFunc *cb, void *opaque);
 void migration_cancel(const Error *error);
 
 void migration_populate_vfio_info(MigrationInfo *info);
diff --git a/migration/migration.c b/migration/migration.c
index 0e7f101d64..d5f705ceef 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -199,8 +199,39 @@ void migration_object_init(void)
 dirty_bitmap_mig_init();
 }
 
-static void migration_bh_schedule(MigrationState *s, QEMUBH *bh)
+typedef struct {
+QEMUBH *bh;
+QEMUBHFunc *cb;
+void *opaque;
+} MigrationBH;
+
+static void migration_bh_dispatch_bh(void *opaque)
 {
+MigrationState *s = migrate_get_current();
+MigrationBH *migbh = opaque;
+
+/* cleanup this BH */
+qemu_bh_delete(migbh->bh);
+migbh->bh = NULL;
+
+/* dispatch the other one */
+migbh->cb(migbh->opaque);
+object_unref(OBJECT(s));
+
+g_free(migbh);
+}
+
+void migration_bh_schedule(QEMUBHFunc *cb, void *opaque)
+{
+MigrationState *s = migrate_get_current();
+MigrationBH *migbh = g_new0(MigrationBH, 1);
+QEMUBH *bh = qemu_bh_new(migration_bh_dispatch_bh, migbh);
+
+/* Store these to dispatch when the BH runs */
+migbh->bh = bh;
+migbh->cb = cb;
+migbh->opaque = opaque;
+
 /*
  * Ref the state for bh, because it may be called when
  * there're already no other refs
@@ -656,9 +687,7 @@ static void process_incoming_migration_bh(void *opaque)
  */
 migrate_set_state(>state, MIGRATION_STATUS_ACTIVE,
   MIGRATION_STATUS_COMPLETED);
-qemu_bh_delete(mis->bh);
 migration_incoming_state_destroy();
-object_unref(OBJECT(migrate_get_current()));
 }
 
 static void coroutine_fn
@@ -723,8 +752,7 @@ process_incoming_migration_co(void *opaque)
 goto fail;
 }
 
-mis->bh = qemu_bh_new(process_incoming_migration_bh, mis);
-migration_bh_schedule(migrate_get_current(), mis->bh);
+migration_bh_schedule(process_incoming_migration_bh, mis);
 return;
 fail:
 migrate_set_state(>state, MIGRATION_STATUS_ACTIVE,
@@ -1285,9 +1313,6 @@ void migrate_set_state(int *state, int old_state, int 
new_state)
 
 static void migrate_fd_cleanup(MigrationState *s)
 {
-qemu_bh_delete(s->cleanup_bh);
-s->cleanup_bh = NULL;
-
 g_free(s->hostname);
 s->hostname = NULL;
 json_writer_free(s->vmdesc);
@@ -1343,9 +1368,7 @@ static void migrate_fd_cleanup(MigrationState *s)
 
 static void migrate_fd_cleanup_bh(void *opaque)
 {
-MigrationState *s = opaque;
-migrate_fd_cleanup(s);
-object_unref(OBJECT(s));
+migrate_fd_cleanup(opaque);
 }
 
 void migrate_set_error(MigrationState *s, const Error *error)
@@ -1568,8 +1591,6 @@ int migrate_init(MigrationState *s, Error **errp)
  * parameters/capabilities that the user set, and
  * locks.
  */
-s->cleanup_bh = 0;
-s->vm_start_bh = 0;
 s->to_dst_file = NULL;
 s->state = MIGRATION_STATUS_NONE;
 s->rp_state.from_dst_file = NULL;
@@ -3139,7 +3160,8 @@ static void migration_iteration_finish(MigrationState *s)
 error_report("%s: Unknown ending state %d", __func__, s->state);
 break;
 }
-migration_bh_schedule(s, s->cleanup_bh);
+
+migration_bh_schedule(migrate_fd_cleanup_bh, s);
 bql_unlock();
 }
 
@@ -3170,7 +3192,7 @@ static void bg_migration_iteration_finish(MigrationState 
*s)
 break;
 }
 
-migration_bh_schedule(s,

[PULL 00/14] Migration 20240126 patches

2024-01-28 Thread peterx

From: Peter Xu 

The following changes since commit 7a1dc45af581d2b643cdbf33c01fd96271616fbd:

  Merge tag 'pull-target-arm-20240126' of 
https://git.linaro.org/people/pmaydell/qemu-arm into staging (2024-01-26 
18:16:35 +)

are available in the Git repository at:

  https://gitlab.com/peterx/qemu.git tags/migration-20240126-pull-request

for you to fetch changes up to 57fd4b4e10756448acd6c90ce041ba8dc9313efc:

  Make 'uri' optional for migrate QAPI (2024-01-29 11:02:12 +0800)


Migration Pull

[dropped fabiano's patch on modifying cpu model for arm migration tests for
 now]

- Fabiano's patchset to fix migration state references in BHs
- Fabiano's new 'n-1' migration test for CI
- Het's fix on making "uri" optional in QMP migrate cmd
- Markus's HMP leak fix reported by Coverity
- Paolo's cleanup on uffd to replace u64 usage
- Peter's small migration cleanup series all over the places



Fabiano Rosas (8):
  ci: Add a migration compatibility test job
  ci: Disable migration compatibility tests for aarch64
  migration/yank: Use channel features
  migration: Fix use-after-free of migration state object
  migration: Take reference to migration state around
bg_migration_vm_start_bh
  migration: Reference migration state around
loadvm_postcopy_handle_run_bh
  migration: Add a wrapper to qemu_bh_schedule
  migration: Centralize BH creation and dispatch

Het Gala (1):
  Make 'uri' optional for migrate QAPI

Markus Armbruster (1):
  migration: Plug memory leak on HMP migrate error path

Paolo Bonzini (1):
  userfaultfd: use 1ULL to build ioctl masks

Peter Xu (3):
  migration: Make threshold_size an uint64_t
  migration: Drop unnecessary check in ram's pending_exact()
  analyze-migration.py: Remove trick on parsing ramblocks

 qapi/migration.json   |  2 +-
 migration/migration.h |  7 +-
 migration/migration-hmp-cmds.c|  4 +-
 migration/migration.c | 82 +--
 migration/postcopy-ram.c  | 16 ++---
 migration/ram.c   |  9 ++-
 migration/savevm.c|  5 +-
 migration/yank_functions.c|  6 +-
 subprojects/libvhost-user/libvhost-user.c |  2 +-
 tests/qtest/migration-test.c  |  4 +-
 .gitlab-ci.d/buildtest.yml| 64 ++
 scripts/analyze-migration.py  | 11 +--
 12 files changed, 134 insertions(+), 78 deletions(-)

-- 
2.43.0

Re: [PULL 06/15] tests/qtest/migration: Don't use -cpu max for aarch64

2024-01-28 Thread Peter Xu

On Fri, Jan 26, 2024 at 11:54:32AM -0300, Fabiano Rosas wrote:
> Peter Maydell  writes:
> 
> > On Fri, 26 Jan 2024 at 14:36, Fabiano Rosas  wrote:
> >>
> >> pet...@redhat.com writes:
> >>
> >> > From: Fabiano Rosas 
> >> >
> >> > The 'max' cpu is not expected to be stable in terms of features across
> >> > QEMU versions, so it should not be expected to migrate.
> >> >
> >> > While the tests currently all pass with -cpu max, that is only because
> >> > we're not testing across QEMU versions, which is the more common
> >> > use-case for migration.
> >> >
> >> > We've recently introduced compatibility tests that use two different
> >> > QEMU versions and the tests are now failing for aarch64. The next
> >> > patch adds those tests to CI, so we cannot use the 'max' cpu
> >> > anymore. Replace it with the 'neoverse-n1', which has a fixed set of
> >> > features.
> >> >
> >> > Suggested-by: Peter Maydell 
> >> > Signed-off-by: Fabiano Rosas 
> >> > Link: https://lore.kernel.org/r/20240118164951.30350-2-faro...@suse.de
> >> > Signed-off-by: Peter Xu 
> >> > ---
> >> >  tests/qtest/migration-test.c | 2 +-
> >> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >> >
> >> > diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> >> > index 7675519cfa..15713f3666 100644
> >> > --- a/tests/qtest/migration-test.c
> >> > +++ b/tests/qtest/migration-test.c
> >> > @@ -820,7 +820,7 @@ static int test_migrate_start(QTestState **from, 
> >> > QTestState **to,
> >> >  memory_size = "150M";
> >> >  machine_alias = "virt";
> >> >  machine_opts = "gic-version=max";
> >> > -arch_opts = g_strdup_printf("-cpu max -kernel %s", bootpath);
> >> > +arch_opts = g_strdup_printf("-cpu neoverse-n1 -kernel %s", 
> >> > bootpath);
> >> >  start_address = ARM_TEST_MEM_START;
> >> >  end_address = ARM_TEST_MEM_END;
> >> >  } else {
> >>
> >> This breaks the tests on an arm host with KVM support. We could drop
> >> this patch from the PR, it doesn't affect anything else.
> >>
> >> Or squash in:
> >>
> >> -->8--
> >> From b8aa5d8a2b33dcc28e4cd4ce2c4f4eacc3a3b845 Mon Sep 17 00:00:00 2001
> >> From: Fabiano Rosas 
> >> Date: Fri, 26 Jan 2024 11:33:15 -0300
> >> Subject: [PATCH] fixup! tests/qtest/migration: Don't use -cpu max for 
> >> aarch64
> >>
> >> Signed-off-by: Fabiano Rosas 
> >> ---
> >>  tests/qtest/migration-test.c | 4 +++-
> >>  1 file changed, 3 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> >> index 15713f3666..2ba9cab684 100644
> >> --- a/tests/qtest/migration-test.c
> >> +++ b/tests/qtest/migration-test.c
> >> @@ -820,7 +820,9 @@ static int test_migrate_start(QTestState **from, 
> >> QTestState **to,
> >>  memory_size = "150M";
> >>  machine_alias = "virt";
> >>  machine_opts = "gic-version=max";
> >> -arch_opts = g_strdup_printf("-cpu neoverse-n1 -kernel %s", 
> >> bootpath);
> >> +arch_opts = g_strdup_printf("-cpu %s -kernel %s",
> >> +qtest_has_accel("kvm") ?
> >> +"host" : "neoverse-n1", bootpath);
> >>  start_address = ARM_TEST_MEM_START;
> >>  end_address = ARM_TEST_MEM_END;
> >>  } else {
> >
> > If you want to do that then a comment explaining why would be
> > helpful for future readers, I think.
> 
> Ok, let's drop this one then, I'll resend.

I'll drop this one for now then, thanks.

Just to double check: Fabiano, you meant that "-cpu host" won't hit the
same issue as what "-cpu max" would have for the new "n-1" CI test, right?

I can also wait to read your patch if that will contain the explanations.

-- 
Peter Xu

Re: [PATCH v4 02/66] RAMBlock: Add support of KVM private guest memfd

2024-01-28 Thread Xiaoyao Li


On 1/26/2024 9:57 PM, David Hildenbrand wrote:

  uint8_t memory_region_get_dirty_log_mask(MemoryRegion *mr)
  {
  uint8_t mask = mr->dirty_log_mask;
diff --git a/system/physmem.c b/system/physmem.c
index c1b22bac77c2..4735b0462ed9 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -1841,6 +1841,17 @@ static void ram_block_add(RAMBlock *new_block, 
Error **errp)

  }
  }
+    if (kvm_enabled() && (new_block->flags & RAM_GUEST_MEMFD) &&
+    new_block->guest_memfd < 0) {


How could we have a guest_memfd already at this point? Smells more like 
an assert(new_block->guest_memfd < 0);


you are right. I will change it to the assert()

+    /* TODO: to decide if KVM_GUEST_MEMFD_ALLOW_HUGEPAGE is 
supported */


I suggest dropping that completely. As long as it's not upstream, not 
even the name of that thing is stable.


OK

+    new_block->guest_memfd = 
kvm_create_guest_memfd(new_block->max_length,

+    0, errp);
+    if (new_block->guest_memfd < 0) {
+    qemu_mutex_unlock_ramlist();
+    return;
+    }
+    }
+



In general, LGTM. With the two nits above:

Reviewed-by: David Hildenbrand 



Thanks!

Re: [PATCH v4 33/66] i386/tdx: Make memory type private by default

2024-01-28 Thread Xiaoyao Li


On 1/26/2024 10:58 PM, David Hildenbrand wrote:

On 25.01.24 04:22, Xiaoyao Li wrote:

By default (due to the recent UPM change), restricted memory attribute is
shared.  Convert the memory region from shared to private at the memory
slot creation time.

add kvm region registering function to check the flag
and convert the region, and add memory listener to TDX guest code to set
the flag to the possible memory region.

Without this patch
- Secure-EPT violation on private area
- KVM_MEMORY_FAULT EXIT (kvm -> qemu)
- qemu converts the 4K page from shared to private
- Resume VCPU execution
- Secure-EPT violation again
- KVM resolves EPT Violation
This also prevents huge page because page conversion is done at 4K
granularity.  Although it's possible to merge 4K private mapping into
2M large page, it slows guest boot.

With this patch
- After memory slot creation, convert the region from private to shared
- Secure-EPT violation on private area.
- KVM resolves EPT Violation

Originated-from: Isaku Yamahata 
Signed-off-by: Xiaoyao Li 
---
  include/exec/memory.h |  1 +
  target/i386/kvm/tdx.c | 20 
  2 files changed, 21 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 7229fcc0415f..f25959f6d30f 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -850,6 +850,7 @@ struct IOMMUMemoryRegion {
  #define MEMORY_LISTENER_PRIORITY_MIN    0
  #define MEMORY_LISTENER_PRIORITY_ACCEL  10
  #define MEMORY_LISTENER_PRIORITY_DEV_BACKEND    10
+#define MEMORY_LISTENER_PRIORITY_ACCEL_HIGH 20
  /**
   * struct MemoryListener: callbacks structure for updates to the 
physical memory map

diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 7b250d80bc1d..f892551821ce 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -19,6 +19,7 @@
  #include "standard-headers/asm-x86/kvm_para.h"
  #include "sysemu/kvm.h"
  #include "sysemu/sysemu.h"
+#include "exec/address-spaces.h"
  #include "hw/i386/x86.h"
  #include "kvm_i386.h"
@@ -621,6 +622,19 @@ int tdx_pre_create_vcpu(CPUState *cpu, Error **errp)
  return 0;
  }
+static void tdx_guest_region_add(MemoryListener *listener,
+ MemoryRegionSection *section)
+{
+    memory_region_set_default_private(section->mr);
+}


That looks fishy. Why is TDX to decide what happens to other memory 
regions it doesn't own?


We should define that behavior when creating these memory region, and 
TDX could sanity check that they have been setup properly.


Let me ask differently: For which memory region where we have 
RAM_GUEST_MEMFD set would we *not* want to set private as default right 
from the start?


All memory regions have RAM_GUEST_MEMFD set will benefit from being 
private as default, for TDX guest.


I will update the implementation to set RAM_DEFAULT_PRIVATE flag when 
guest_memfd is created successfully, like


diff --git a/system/physmem.c b/system/physmem.c
index fc59470191ef..60676689c807 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -1850,6 +1850,8 @@ static void ram_block_add(RAMBlock *new_block, 
Error **errp)

 qemu_mutex_unlock_ramlist();
 return;
 }
+
+new_block->flags |= RAM_DEFAULT_PRIVATE;
 }

then this patch can be dropped, and the calling of 
memory_region_set_default_private(mr) of Patch 45 can be dropped too.


I think this is what you suggested, right?

[RESEND v2 2/2] hw/riscv/virt-acpi-build.c: Generate SPCR table

2024-01-28 Thread Sia Jee Heng

Generate Serial Port Console Redirection Table (SPCR) for RISC-V
virtual machine.

Signed-off-by: Sia Jee Heng 
Reviewed-by: Daniel Henrique Barboza 
---
 hw/riscv/virt-acpi-build.c | 39 ++
 1 file changed, 39 insertions(+)

diff --git a/hw/riscv/virt-acpi-build.c b/hw/riscv/virt-acpi-build.c
index 26c7e4482d..7fc5071c84 100644
--- a/hw/riscv/virt-acpi-build.c
+++ b/hw/riscv/virt-acpi-build.c
@@ -174,6 +174,42 @@ acpi_dsdt_add_uart(Aml *scope, const MemMapEntry 
*uart_memmap,
 aml_append(scope, dev);
 }
 
+/*
+ * Serial Port Console Redirection Table (SPCR)
+ * Rev: 1.07
+ */
+
+static void
+spcr_setup(GArray *table_data, BIOSLinker *linker, RISCVVirtState *s)
+{
+AcpiSpcrData serial = {
+.interface_type = 0,   /* 16550 compatible */
+.base_addr.id = AML_AS_SYSTEM_MEMORY,
+.base_addr.width = 32,
+.base_addr.offset = 0,
+.base_addr.size = 1,
+.base_addr.addr = s->memmap[VIRT_UART0].base,
+.interrupt_type = (1 << 4),/* Bit[4] RISC-V PLIC/APLIC */
+.pc_interrupt = 0,
+.interrupt = UART0_IRQ,
+.baud_rate = 7,/* 15200 */
+.parity = 0,
+.stop_bits = 1,
+.flow_control = 0,
+.terminal_type = 3,/* ANSI */
+.language = 0, /* Language */
+.pci_device_id = 0x,   /* not a PCI device*/
+.pci_vendor_id = 0x,   /* not a PCI device*/
+.pci_bus = 0,
+.pci_device = 0,
+.pci_function = 0,
+.pci_flags = 0,
+.pci_segment = 0,
+};
+
+build_spcr(table_data, linker, , 2, s->oem_id, s->oem_table_id);
+}
+
 /* RHCT Node[N] starts at offset 56 */
 #define RHCT_NODE_ARRAY_OFFSET 56
 
@@ -555,6 +591,9 @@ static void virt_acpi_build(RISCVVirtState *s, 
AcpiBuildTables *tables)
 acpi_add_table(table_offsets, tables_blob);
 build_rhct(tables_blob, tables->linker, s);
 
+acpi_add_table(table_offsets, tables_blob);
+spcr_setup(tables_blob, tables->linker, s);
+
 acpi_add_table(table_offsets, tables_blob);
 {
 AcpiMcfgInfo mcfg = {
-- 
2.34.1

[RESEND v2 1/2] hw/arm/virt-acpi-build.c: Migrate SPCR creation to common location

2024-01-28 Thread Sia Jee Heng

RISC-V should also generate the SPCR in a manner similar to ARM.
Therefore, instead of replicating the code, relocate this function
to the common AML build.

Signed-off-by: Sia Jee Heng 
---
 hw/acpi/aml-build.c | 51 
 hw/arm/virt-acpi-build.c| 68 +++--
 include/hw/acpi/acpi-defs.h | 33 ++
 include/hw/acpi/aml-build.h |  4 +++
 4 files changed, 115 insertions(+), 41 deletions(-)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index af66bde0f5..f3904650e4 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1994,6 +1994,57 @@ static void build_processor_hierarchy_node(GArray *tbl, 
uint32_t flags,
 }
 }
 
+void build_spcr(GArray *table_data, BIOSLinker *linker,
+const AcpiSpcrData *f, const uint8_t rev,
+const char *oem_id, const char *oem_table_id)
+{
+AcpiTable table = { .sig = "SPCR", .rev = rev, .oem_id = oem_id,
+.oem_table_id = oem_table_id };
+
+acpi_table_begin(, table_data);
+/* Interface type */
+build_append_int_noprefix(table_data, f->interface_type, 1);
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 3);
+/* Base Address */
+build_append_gas(table_data, f->base_addr.id, f->base_addr.width,
+ f->base_addr.offset, f->base_addr.size,
+ f->base_addr.addr);
+/* Interrupt type */
+build_append_int_noprefix(table_data, f->interrupt_type, 1);
+/* IRQ */
+build_append_int_noprefix(table_data, f->pc_interrupt, 1);
+/* Global System Interrupt */
+build_append_int_noprefix(table_data, f->interrupt, 4);
+/* Baud Rate */
+build_append_int_noprefix(table_data, f->baud_rate, 1);
+/* Parity */
+build_append_int_noprefix(table_data, f->parity, 1);
+/* Stop Bits */
+build_append_int_noprefix(table_data, f->stop_bits, 1);
+/* Flow Control */
+build_append_int_noprefix(table_data, f->flow_control, 1);
+/* Terminal Type */
+build_append_int_noprefix(table_data, f->terminal_type, 1);
+/* PCI Device ID  */
+build_append_int_noprefix(table_data, f->pci_device_id, 2);
+/* PCI Vendor ID */
+build_append_int_noprefix(table_data, f->pci_vendor_id, 2);
+/* PCI Bus Number */
+build_append_int_noprefix(table_data, f->pci_bus, 1);
+/* PCI Device Number */
+build_append_int_noprefix(table_data, f->pci_device, 1);
+/* PCI Function Number */
+build_append_int_noprefix(table_data, f->pci_function, 1);
+/* PCI Flags */
+build_append_int_noprefix(table_data, f->pci_flags, 4);
+/* PCI Segment */
+build_append_int_noprefix(table_data, f->pci_segment, 1);
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 4);
+
+acpi_table_end(linker, );
+}
 /*
  * ACPI spec, Revision 6.3
  * 5.2.29 Processor Properties Topology Table (PPTT)
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index a22a2f43a5..195767c0f0 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -431,48 +431,34 @@ build_iort(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
  * Rev: 1.07
  */
 static void
-build_spcr(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
+spcr_setup(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
 {
-AcpiTable table = { .sig = "SPCR", .rev = 2, .oem_id = vms->oem_id,
-.oem_table_id = vms->oem_table_id };
-
-acpi_table_begin(, table_data);
-
-/* Interface Type */
-build_append_int_noprefix(table_data, 3, 1); /* ARM PL011 UART */
-build_append_int_noprefix(table_data, 0, 3); /* Reserved */
-/* Base Address */
-build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 32, 0, 3,
- vms->memmap[VIRT_UART].base);
-/* Interrupt Type */
-build_append_int_noprefix(table_data,
-(1 << 3) /* Bit[3] ARMH GIC interrupt */, 1);
-build_append_int_noprefix(table_data, 0, 1); /* IRQ */
-/* Global System Interrupt */
-build_append_int_noprefix(table_data,
-  vms->irqmap[VIRT_UART] + ARM_SPI_BASE, 4);
-build_append_int_noprefix(table_data, 3 /* 9600 */, 1); /* Baud Rate */
-build_append_int_noprefix(table_data, 0 /* No Parity */, 1); /* Parity */
-/* Stop Bits */
-build_append_int_noprefix(table_data, 1 /* 1 Stop bit */, 1);
-/* Flow Control */
-build_append_int_noprefix(table_data,
-(1 << 1) /* RTS/CTS hardware flow control */, 1);
-/* Terminal Type */
-build_append_int_noprefix(table_data, 0 /* VT100 */, 1);
-build_append_int_noprefix(table_data, 0, 1); /* Language */
-/* PCI Device ID  */
-build_append_int_noprefix(table_data, 0x /* not a PCI device*/, 2);
-/* PCI Vendor ID */
-build_append_int_noprefix(table_data, 0x /* not a PCI device*/, 2);
-build_append_int_noprefix(table_data, 0, 1); /* PCI Bus Number */
-

[RESEND v2 0/2] RISC-V: ACPI: Enable SPCR

2024-01-28 Thread Sia Jee Heng

This series focuses on enabling the Serial Port Console Redirection (SPCR)
table for the RISC-V virt platform. Considering that ARM utilizes the same
function, the initial patch involves migrating the build_spcr function to
common code. This consolidation ensures that RISC-V avoids duplicating the
function.

The patch set is built upon Alistair's riscv-to-apply.next branch

Changes in v2:
- Renamed the build_spcr_rev2() function to spcr_setup().
- SPCR table version is passed from spcr_setup() to the common
  build_spcr() function.
- Added "Reviewed-by" from Daniel for patch 2.
- The term 'RFC' has been removed from this series, as the dependency code
  from [1] has been merged into Alistair's riscv-to-apply.next branch. The
  first series of this patch can be found at [2].

[1] 
https://lore.kernel.org/qemu-devel/20231218150247.466427-1-suni...@ventanamicro.com/
[2] 
https://lore.kernel.org/qemu-devel/20240105090608.5745-1-jeeheng@starfivetech.com/

Sia Jee Heng (2):
  hw/arm/virt-acpi-build.c: Migrate SPCR creation to common location
  hw/riscv/virt-acpi-build.c: Generate SPCR table

 hw/acpi/aml-build.c | 51 
 hw/arm/virt-acpi-build.c| 68 +++--
 hw/riscv/virt-acpi-build.c  | 39 +
 include/hw/acpi/acpi-defs.h | 33 ++
 include/hw/acpi/aml-build.h |  4 +++
 5 files changed, 154 insertions(+), 41 deletions(-)

-- 
2.34.1

RE: [PATCH 0/5] migration/multifd: Prerequisite cleanups for ongoing work

2024-01-28 Thread Liu, Yuan1

> -Original Message-
> From: Fabiano Rosas 
> Sent: Saturday, January 27, 2024 6:20 AM
> To: qemu-devel@nongnu.org
> Cc: Peter Xu ; Hao Xiang ;
> Liu, Yuan1 ; Bryan Zhang 
> Subject: [PATCH 0/5] migration/multifd: Prerequisite cleanups for ongoing
> work
> 
> Hi,
> 
> Here are two cleanups that are prerequiste for the fixed-ram work, but
> also affect the other series on the list at the moment, so I want to make
> sure it works for everyone:
> 
> 1) Separate multifd_ops from compression. The multifd_ops are
>currently coupled with the multifd_compression parameter.
> 
> We're adding new multifd_ops in the fixed-ram work and adding new
> compression ops in the compression work.
> 2) Add a new send hook. The multifd_send_thread code currently does
>some twists to support zero copy, which is a socket-only feature.
> 
> This might affect the zero page and DSA work which add code to
> multifd_send_thread.

Thank you for your reminder， I reviewed the patch set and there is 
a question.

Because this change has an impact on the previous live migration 
With IAA Patch, does the submission of the next version needs 
to be submitted based on this change?

> 
> CI run: https://gitlab.com/farosas/qemu/-/pipelines/1154332360
> 
> (I also tested zero copy locally. We cannot add a test for it because it
> needs root due to memory locking limits)
> 
> Fabiano Rosas (5):
>   migration/multifd: Separate compression ops from non-compression
>   migration/multifd: Move multifd_socket_ops to socket.c
>   migration/multifd: Add multifd_ops->send
>   migration/multifd: Simplify zero copy send
>   migration/multifd: Move zero copy flag into multifd_socket_setup
> 
>  migration/multifd-zlib.c |   9 ++-
>  migration/multifd-zstd.c |   9 ++-
>  migration/multifd.c  | 164 +--
>  migration/multifd.h  |   6 +-
>  migration/socket.c   |  90 -
>  5 files changed, 128 insertions(+), 150 deletions(-)
> 
> --
> 2.35.3

[PATCH] hw/scsi/lsi53c895a: add missing decrement of reentrancy counter

2024-01-28 Thread Sven Schnelle

When the maximum count of SCRIPTS instructions is reached, the code
stops execution and returns, but fails to decrement the reentrancy
counter. This effectively renders the SCSI controller unusable
because on next entry the reentrancy counter is still above the limit.

This bug was seen on HP-UX 10.20 which seems to trigger SCRIPTS
loops.

Fixes: b987718bbb ("hw/scsi/lsi53c895a: Fix reentrancy issues in the LSI 
controller (CVE-2023-0330)")
Signed-off-by: Sven Schnelle 
---
 hw/scsi/lsi53c895a.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/scsi/lsi53c895a.c b/hw/scsi/lsi53c895a.c
index 34e3b89287..d607a5f9fb 100644
--- a/hw/scsi/lsi53c895a.c
+++ b/hw/scsi/lsi53c895a.c
@@ -1159,6 +1159,7 @@ again:
 lsi_script_scsi_interrupt(s, LSI_SIST0_UDC, 0);
 lsi_disconnect(s);
 trace_lsi_execute_script_stop();
+reentrancy_level--;
 return;
 }
 insn = read_dword(s, s->dsp);
-- 
2.43.0

[Stable-8.2.1 64/71] block/blklogwrites: Fix a bug when logging "write zeroes" operations.

2024-01-28 Thread Michael Tokarev

From: Ari Sundholm 

There is a bug in the blklogwrites driver pertaining to logging "write
zeroes" operations, causing log corruption. This can be easily observed
by setting detect-zeroes to something other than "off" for the driver.

The issue is caused by a concurrency bug pertaining to the fact that
"write zeroes" operations have to be logged in two parts: first the log
entry metadata, then the zeroed-out region. While the log entry
metadata is being written by bdrv_co_pwritev(), another operation may
begin in the meanwhile and modify the state of the blklogwrites driver.
This is as intended by the coroutine-driven I/O model in QEMU, of
course.

Unfortunately, this specific scenario is mishandled. A short example:
1. Initially, in the current operation (#1), the current log sector
number in the driver state is only incremented by the number of sectors
taken by the log entry metadata, after which the log entry metadata is
written. The current operation yields.
2. Another operation (#2) may start while the log entry metadata is
being written. It uses the current log position as the start offset for
its log entry. This is in the sector right after the operation #1 log
entry metadata, which is bad!
3. After bdrv_co_pwritev() returns (#1), the current log sector
number is reread from the driver state in order to find out the start
offset for bdrv_co_pwrite_zeroes(). This is an obvious blunder, as the
offset will be the sector right after the (misplaced) operation #2 log
entry, which means that the zeroed-out region begins at the wrong
offset.
4. As a result of the above, the log is corrupt.

Fix this by only reading the driver metadata once, computing the
offsets and sizes in one go (including the optional zeroed-out region)
and setting the log sector number to the appropriate value for the next
operation in line.

Signed-off-by: Ari Sundholm 
Cc: qemu-sta...@nongnu.org
Message-ID: <20240109184646.1128475-1-meg...@gmx.com>
Reviewed-by: Kevin Wolf 
Signed-off-by: Kevin Wolf 
(cherry picked from commit a9c8ea95470c27a8a02062b67f9fa6940e828ab6)
Signed-off-by: Michael Tokarev 

diff --git a/block/blklogwrites.c b/block/blklogwrites.c
index 3678f6cf42..84e03f309f 100644
--- a/block/blklogwrites.c
+++ b/block/blklogwrites.c
@@ -328,22 +328,39 @@ static void coroutine_fn GRAPH_RDLOCK
 blk_log_writes_co_do_log(BlkLogWritesLogReq *lr)
 {
 BDRVBlkLogWritesState *s = lr->bs->opaque;
-uint64_t cur_log_offset = s->cur_log_sector << s->sectorbits;
 
-s->nr_entries++;
-s->cur_log_sector +=
-ROUND_UP(lr->qiov->size, s->sectorsize) >> s->sectorbits;
+/*
+ * Determine the offsets and sizes of different parts of the entry, and
+ * update the state of the driver.
+ *
+ * This needs to be done in one go, before any actual I/O is done, as the
+ * log entry may have to be written in two parts, and the state of the
+ * driver may be modified by other driver operations while waiting for the
+ * I/O to complete.
+ */
+const uint64_t entry_start_sector = s->cur_log_sector;
+const uint64_t entry_offset = entry_start_sector << s->sectorbits;
+const uint64_t qiov_aligned_size = ROUND_UP(lr->qiov->size, s->sectorsize);
+const uint64_t entry_aligned_size = qiov_aligned_size +
+ROUND_UP(lr->zero_size, s->sectorsize);
+const uint64_t entry_nr_sectors = entry_aligned_size >> s->sectorbits;
 
-lr->log_ret = bdrv_co_pwritev(s->log_file, cur_log_offset, lr->qiov->size,
+s->nr_entries++;
+s->cur_log_sector += entry_nr_sectors;
+
+/*
+ * Write the log entry. Note that if this is a "write zeroes" operation,
+ * only the entry header is written here, with the zeroing being done
+ * separately below.
+ */
+lr->log_ret = bdrv_co_pwritev(s->log_file, entry_offset, lr->qiov->size,
   lr->qiov, 0);
 
 /* Logging for the "write zeroes" operation */
 if (lr->log_ret == 0 && lr->zero_size) {
-cur_log_offset = s->cur_log_sector << s->sectorbits;
-s->cur_log_sector +=
-ROUND_UP(lr->zero_size, s->sectorsize) >> s->sectorbits;
+const uint64_t zeroes_offset = entry_offset + qiov_aligned_size;
 
-lr->log_ret = bdrv_co_pwrite_zeroes(s->log_file, cur_log_offset,
+lr->log_ret = bdrv_co_pwrite_zeroes(s->log_file, zeroes_offset,
 lr->zero_size, 0);
 }
 
-- 
2.39.2

[Stable-8.2.1 69/71] target/xtensa: fix OOB TLB entry access

2024-01-28 Thread Michael Tokarev

From: Max Filippov 

r[id]tlb[01], [iw][id]tlb opcodes use TLB way index passed in a register
by the guest. The host uses 3 bits of the index for ITLB indexing and 4
bits for DTLB, but there's only 7 entries in the ITLB array and 10 in
the DTLB array, so a malicious guest may trigger out-of-bound access to
these arrays.

Change split_tlb_entry_spec return type to bool to indicate whether TLB
way passed to it is valid. Change get_tlb_entry to return NULL in case
invalid TLB way is requested. Add assertion to xtensa_tlb_get_entry that
requested TLB way and entry indices are valid. Add checks to the
[rwi]tlb helpers that requested TLB way is valid and return 0 or do
nothing when it's not.

Cc: qemu-sta...@nongnu.org
Fixes: b67ea0cd7441 ("target-xtensa: implement memory protection options")
Signed-off-by: Max Filippov 
Reviewed-by: Peter Maydell 
Message-id: 20231215120307.545381-1-jcmvb...@gmail.com
Signed-off-by: Peter Maydell 
(cherry picked from commit 604927e357c2b292c70826e4ce42574ad126ef32)
Signed-off-by: Michael Tokarev 

diff --git a/target/xtensa/mmu_helper.c b/target/xtensa/mmu_helper.c
index 12552a3347..2fda4e887c 100644
--- a/target/xtensa/mmu_helper.c
+++ b/target/xtensa/mmu_helper.c
@@ -224,22 +224,31 @@ static void split_tlb_entry_spec_way(const CPUXtensaState 
*env, uint32_t v,
  * Split TLB address into TLB way, entry index and VPN (with index).
  * See ISA, 4.6.5.5 - 4.6.5.8 for the TLB addressing format
  */
-static void split_tlb_entry_spec(CPUXtensaState *env, uint32_t v, bool dtlb,
-uint32_t *vpn, uint32_t *wi, uint32_t *ei)
+static bool split_tlb_entry_spec(CPUXtensaState *env, uint32_t v, bool dtlb,
+ uint32_t *vpn, uint32_t *wi, uint32_t *ei)
 {
 if (xtensa_option_enabled(env->config, XTENSA_OPTION_MMU)) {
 *wi = v & (dtlb ? 0xf : 0x7);
-split_tlb_entry_spec_way(env, v, dtlb, vpn, *wi, ei);
+if (*wi < (dtlb ? env->config->dtlb.nways : env->config->itlb.nways)) {
+split_tlb_entry_spec_way(env, v, dtlb, vpn, *wi, ei);
+return true;
+} else {
+return false;
+}
 } else {
 *vpn = v & REGION_PAGE_MASK;
 *wi = 0;
 *ei = (v >> 29) & 0x7;
+return true;
 }
 }
 
 static xtensa_tlb_entry *xtensa_tlb_get_entry(CPUXtensaState *env, bool dtlb,
   unsigned wi, unsigned ei)
 {
+const xtensa_tlb *tlb = dtlb ? >config->dtlb : >config->itlb;
+
+assert(wi < tlb->nways && ei < tlb->way_size[wi]);
 return dtlb ?
 env->dtlb[wi] + ei :
 env->itlb[wi] + ei;
@@ -252,11 +261,14 @@ static xtensa_tlb_entry *get_tlb_entry(CPUXtensaState 
*env,
 uint32_t wi;
 uint32_t ei;
 
-split_tlb_entry_spec(env, v, dtlb, , , );
-if (pwi) {
-*pwi = wi;
+if (split_tlb_entry_spec(env, v, dtlb, , , )) {
+if (pwi) {
+*pwi = wi;
+}
+return xtensa_tlb_get_entry(env, dtlb, wi, ei);
+} else {
+return NULL;
 }
-return xtensa_tlb_get_entry(env, dtlb, wi, ei);
 }
 
 static void xtensa_tlb_set_entry_mmu(const CPUXtensaState *env,
@@ -482,7 +494,12 @@ uint32_t HELPER(rtlb0)(CPUXtensaState *env, uint32_t v, 
uint32_t dtlb)
 if (xtensa_option_enabled(env->config, XTENSA_OPTION_MMU)) {
 uint32_t wi;
 const xtensa_tlb_entry *entry = get_tlb_entry(env, v, dtlb, );
-return (entry->vaddr & get_vpn_mask(env, dtlb, wi)) | entry->asid;
+
+if (entry) {
+return (entry->vaddr & get_vpn_mask(env, dtlb, wi)) | entry->asid;
+} else {
+return 0;
+}
 } else {
 return v & REGION_PAGE_MASK;
 }
@@ -491,7 +508,12 @@ uint32_t HELPER(rtlb0)(CPUXtensaState *env, uint32_t v, 
uint32_t dtlb)
 uint32_t HELPER(rtlb1)(CPUXtensaState *env, uint32_t v, uint32_t dtlb)
 {
 const xtensa_tlb_entry *entry = get_tlb_entry(env, v, dtlb, NULL);
-return entry->paddr | entry->attr;
+
+if (entry) {
+return entry->paddr | entry->attr;
+} else {
+return 0;
+}
 }
 
 void HELPER(itlb)(CPUXtensaState *env, uint32_t v, uint32_t dtlb)
@@ -499,7 +521,7 @@ void HELPER(itlb)(CPUXtensaState *env, uint32_t v, uint32_t 
dtlb)
 if (xtensa_option_enabled(env->config, XTENSA_OPTION_MMU)) {
 uint32_t wi;
 xtensa_tlb_entry *entry = get_tlb_entry(env, v, dtlb, );
-if (entry->variable && entry->asid) {
+if (entry && entry->variable && entry->asid) {
 tlb_flush_page(env_cpu(env), entry->vaddr);
 entry->asid = 0;
 }
@@ -537,8 +559,9 @@ void HELPER(wtlb)(CPUXtensaState *env, uint32_t p, uint32_t 
v, uint32_t dtlb)
 uint32_t vpn;
 uint32_t wi;
 uint32_t ei;
-split_tlb_entry_spec(env, v, dtlb, , , );
-xtensa_tlb_set_entry(env, dtlb, wi, ei, vpn, p);
+if (split_tlb_entry_spec(env, v, dtlb, , , )) {
+xtensa_tlb_set_entry(env, dtlb, wi, ei, vpn, p);
+}
 }
 
 /*!
-- 
2.39.2

[Stable-8.1.5 35/36] target/xtensa: fix OOB TLB entry access

2024-01-28 Thread Michael Tokarev

From: Max Filippov 

r[id]tlb[01], [iw][id]tlb opcodes use TLB way index passed in a register
by the guest. The host uses 3 bits of the index for ITLB indexing and 4
bits for DTLB, but there's only 7 entries in the ITLB array and 10 in
the DTLB array, so a malicious guest may trigger out-of-bound access to
these arrays.

Change split_tlb_entry_spec return type to bool to indicate whether TLB
way passed to it is valid. Change get_tlb_entry to return NULL in case
invalid TLB way is requested. Add assertion to xtensa_tlb_get_entry that
requested TLB way and entry indices are valid. Add checks to the
[rwi]tlb helpers that requested TLB way is valid and return 0 or do
nothing when it's not.

Cc: qemu-sta...@nongnu.org
Fixes: b67ea0cd7441 ("target-xtensa: implement memory protection options")
Signed-off-by: Max Filippov 
Reviewed-by: Peter Maydell 
Message-id: 20231215120307.545381-1-jcmvb...@gmail.com
Signed-off-by: Peter Maydell 
(cherry picked from commit 604927e357c2b292c70826e4ce42574ad126ef32)
Signed-off-by: Michael Tokarev 

diff --git a/target/xtensa/mmu_helper.c b/target/xtensa/mmu_helper.c
index fa66e8e867..c4b4df4d74 100644
--- a/target/xtensa/mmu_helper.c
+++ b/target/xtensa/mmu_helper.c
@@ -226,22 +226,31 @@ static void split_tlb_entry_spec_way(const CPUXtensaState 
*env, uint32_t v,
  * Split TLB address into TLB way, entry index and VPN (with index).
  * See ISA, 4.6.5.5 - 4.6.5.8 for the TLB addressing format
  */
-static void split_tlb_entry_spec(CPUXtensaState *env, uint32_t v, bool dtlb,
-uint32_t *vpn, uint32_t *wi, uint32_t *ei)
+static bool split_tlb_entry_spec(CPUXtensaState *env, uint32_t v, bool dtlb,
+ uint32_t *vpn, uint32_t *wi, uint32_t *ei)
 {
 if (xtensa_option_enabled(env->config, XTENSA_OPTION_MMU)) {
 *wi = v & (dtlb ? 0xf : 0x7);
-split_tlb_entry_spec_way(env, v, dtlb, vpn, *wi, ei);
+if (*wi < (dtlb ? env->config->dtlb.nways : env->config->itlb.nways)) {
+split_tlb_entry_spec_way(env, v, dtlb, vpn, *wi, ei);
+return true;
+} else {
+return false;
+}
 } else {
 *vpn = v & REGION_PAGE_MASK;
 *wi = 0;
 *ei = (v >> 29) & 0x7;
+return true;
 }
 }
 
 static xtensa_tlb_entry *xtensa_tlb_get_entry(CPUXtensaState *env, bool dtlb,
   unsigned wi, unsigned ei)
 {
+const xtensa_tlb *tlb = dtlb ? >config->dtlb : >config->itlb;
+
+assert(wi < tlb->nways && ei < tlb->way_size[wi]);
 return dtlb ?
 env->dtlb[wi] + ei :
 env->itlb[wi] + ei;
@@ -254,11 +263,14 @@ static xtensa_tlb_entry *get_tlb_entry(CPUXtensaState 
*env,
 uint32_t wi;
 uint32_t ei;
 
-split_tlb_entry_spec(env, v, dtlb, , , );
-if (pwi) {
-*pwi = wi;
+if (split_tlb_entry_spec(env, v, dtlb, , , )) {
+if (pwi) {
+*pwi = wi;
+}
+return xtensa_tlb_get_entry(env, dtlb, wi, ei);
+} else {
+return NULL;
 }
-return xtensa_tlb_get_entry(env, dtlb, wi, ei);
 }
 
 static void xtensa_tlb_set_entry_mmu(const CPUXtensaState *env,
@@ -484,7 +496,12 @@ uint32_t HELPER(rtlb0)(CPUXtensaState *env, uint32_t v, 
uint32_t dtlb)
 if (xtensa_option_enabled(env->config, XTENSA_OPTION_MMU)) {
 uint32_t wi;
 const xtensa_tlb_entry *entry = get_tlb_entry(env, v, dtlb, );
-return (entry->vaddr & get_vpn_mask(env, dtlb, wi)) | entry->asid;
+
+if (entry) {
+return (entry->vaddr & get_vpn_mask(env, dtlb, wi)) | entry->asid;
+} else {
+return 0;
+}
 } else {
 return v & REGION_PAGE_MASK;
 }
@@ -493,7 +510,12 @@ uint32_t HELPER(rtlb0)(CPUXtensaState *env, uint32_t v, 
uint32_t dtlb)
 uint32_t HELPER(rtlb1)(CPUXtensaState *env, uint32_t v, uint32_t dtlb)
 {
 const xtensa_tlb_entry *entry = get_tlb_entry(env, v, dtlb, NULL);
-return entry->paddr | entry->attr;
+
+if (entry) {
+return entry->paddr | entry->attr;
+} else {
+return 0;
+}
 }
 
 void HELPER(itlb)(CPUXtensaState *env, uint32_t v, uint32_t dtlb)
@@ -501,7 +523,7 @@ void HELPER(itlb)(CPUXtensaState *env, uint32_t v, uint32_t 
dtlb)
 if (xtensa_option_enabled(env->config, XTENSA_OPTION_MMU)) {
 uint32_t wi;
 xtensa_tlb_entry *entry = get_tlb_entry(env, v, dtlb, );
-if (entry->variable && entry->asid) {
+if (entry && entry->variable && entry->asid) {
 tlb_flush_page(env_cpu(env), entry->vaddr);
 entry->asid = 0;
 }
@@ -539,8 +561,9 @@ void HELPER(wtlb)(CPUXtensaState *env, uint32_t p, uint32_t 
v, uint32_t dtlb)
 uint32_t vpn;
 uint32_t wi;
 uint32_t ei;
-split_tlb_entry_spec(env, v, dtlb, , , );
-xtensa_tlb_set_entry(env, dtlb, wi, ei, vpn, p);
+if (split_tlb_entry_spec(env, v, dtlb, , , )) {
+xtensa_tlb_set_entry(env, dtlb, wi, ei, vpn, p);
+}
 }
 
 /*!
-- 
2.39.2

[Stable-8.2.1 70/71] target/arm: Fix A64 scalar SQSHRN and SQRSHRN

2024-01-28 Thread Michael Tokarev

From: Peter Maydell 

In commit 1b7bc9b5c8bf374dd we changed handle_vec_simd_sqshrn() so
that instead of starting with a 0 value and depositing in each new
element from the narrowing operation, it instead started with the raw
result of the narrowing operation of the first element.

This is fine in the vector case, because the deposit operations for
the second and subsequent elements will always overwrite any higher
bits that might have been in the first element's result value in
tcg_rd.  However in the scalar case we only go through this loop
once.  The effect is that for a signed narrowing operation, if the
result is negative then we will now return a value where the bits
above the first element are incorrectly 1 (because the narrowfn
returns a sign-extended result, not one that is truncated to the
element size).

Fix this by using an extract operation to get exactly the correct
bits of the output of the narrowfn for element 1, instead of a
plain move.

Cc: qemu-sta...@nongnu.org
Fixes: 1b7bc9b5c8bf374dd3 ("target/arm: Avoid tcg_const_ptr in 
handle_vec_simd_sqshrn")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2089
Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
Message-id: 20240123153416.877308-1-peter.mayd...@linaro.org
(cherry picked from commit 6fffc8378562c7fea6290c430b4f653f830a4c1a)
Signed-off-by: Michael Tokarev 

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index a2e49c39f9..f2d05c589c 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -8221,7 +8221,7 @@ static void handle_vec_simd_sqshrn(DisasContext *s, bool 
is_scalar, bool is_q,
 narrowfn(tcg_rd_narrowed, tcg_env, tcg_rd);
 tcg_gen_extu_i32_i64(tcg_rd, tcg_rd_narrowed);
 if (i == 0) {
-tcg_gen_mov_i64(tcg_final, tcg_rd);
+tcg_gen_extract_i64(tcg_final, tcg_rd, 0, esize);
 } else {
 tcg_gen_deposit_i64(tcg_final, tcg_final, tcg_rd, esize * i, 
esize);
 }
-- 
2.39.2

[Stable-8.2.1 58/71] coroutine-ucontext: Save fake stack for pooled coroutine

2024-01-28 Thread Michael Tokarev

From: Akihiko Odaki 

Coroutine may be pooled even after COROUTINE_TERMINATE if
CONFIG_COROUTINE_POOL is enabled and fake stack should be saved in
such a case to keep AddressSanitizerUseAfterReturn working. Even worse,
I'm seeing stack corruption without fake stack being saved.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Marc-André Lureau 
Signed-off-by: Stefan Hajnoczi 
Message-ID: <20240117-asan-v2-1-26f9e1ea6...@daynix.com>
(cherry picked from commit d9945ccda08ef83b09ac7725b6ee2d1959f2c0c0)
Signed-off-by: Michael Tokarev 

diff --git a/util/coroutine-ucontext.c b/util/coroutine-ucontext.c
index 7b304c79d9..8ef603d081 100644
--- a/util/coroutine-ucontext.c
+++ b/util/coroutine-ucontext.c
@@ -119,13 +119,11 @@ void finish_switch_fiber(void *fake_stack_save)
 
 /* always_inline is required to avoid TSan runtime fatal errors. */
 static inline __attribute__((always_inline))
-void start_switch_fiber_asan(CoroutineAction action, void **fake_stack_save,
+void start_switch_fiber_asan(void **fake_stack_save,
  const void *bottom, size_t size)
 {
 #ifdef CONFIG_ASAN
-__sanitizer_start_switch_fiber(
-action == COROUTINE_TERMINATE ? NULL : fake_stack_save,
-bottom, size);
+__sanitizer_start_switch_fiber(fake_stack_save, bottom, size);
 #endif
 }
 
@@ -165,7 +163,7 @@ static void coroutine_trampoline(int i0, int i1)
 if (!sigsetjmp(self->env, 0)) {
 CoroutineUContext *leaderp = get_ptr_leader();
 
-start_switch_fiber_asan(COROUTINE_YIELD, _stack_save,
+start_switch_fiber_asan(_stack_save,
 leaderp->stack, leaderp->stack_size);
 start_switch_fiber_tsan(_stack_save, self, true); /* true=caller 
*/
 siglongjmp(*(sigjmp_buf *)co->entry_arg, 1);
@@ -226,8 +224,7 @@ Coroutine *qemu_coroutine_new(void)
 
 /* swapcontext() in, siglongjmp() back out */
 if (!sigsetjmp(old_env, 0)) {
-start_switch_fiber_asan(COROUTINE_YIELD, _stack_save, co->stack,
-co->stack_size);
+start_switch_fiber_asan(_stack_save, co->stack, co->stack_size);
 start_switch_fiber_tsan(_stack_save,
 co, false); /* false=not caller */
 
@@ -269,10 +266,28 @@ static inline void 
valgrind_stack_deregister(CoroutineUContext *co)
 #endif
 #endif
 
+#if defined(CONFIG_ASAN) && defined(CONFIG_COROUTINE_POOL)
+static void coroutine_fn terminate_asan(void *opaque)
+{
+CoroutineUContext *to = DO_UPCAST(CoroutineUContext, base, opaque);
+
+set_current(opaque);
+start_switch_fiber_asan(NULL, to->stack, to->stack_size);
+G_STATIC_ASSERT(!IS_ENABLED(CONFIG_TSAN));
+siglongjmp(to->env, COROUTINE_ENTER);
+}
+#endif
+
 void qemu_coroutine_delete(Coroutine *co_)
 {
 CoroutineUContext *co = DO_UPCAST(CoroutineUContext, base, co_);
 
+#if defined(CONFIG_ASAN) && defined(CONFIG_COROUTINE_POOL)
+co_->entry_arg = qemu_coroutine_self();
+co_->entry = terminate_asan;
+qemu_coroutine_switch(co_->entry_arg, co_, COROUTINE_ENTER);
+#endif
+
 #ifdef CONFIG_VALGRIND_H
 valgrind_stack_deregister(co);
 #endif
@@ -305,8 +320,10 @@ qemu_coroutine_switch(Coroutine *from_, Coroutine *to_,
 
 ret = sigsetjmp(from->env, 0);
 if (ret == 0) {
-start_switch_fiber_asan(action, _stack_save, to->stack,
-to->stack_size);
+start_switch_fiber_asan(IS_ENABLED(CONFIG_COROUTINE_POOL) ||
+action != COROUTINE_TERMINATE ?
+_stack_save : NULL,
+to->stack, to->stack_size);
 start_switch_fiber_tsan(_stack_save,
 to, false); /* false=not caller */
 siglongjmp(to->env, action);
-- 
2.39.2

[Stable-7.2.9 23/30] block/io: clear BDRV_BLOCK_RECURSE flag after recursing in bdrv_co_block_status

2024-01-28 Thread Michael Tokarev

From: Fiona Ebner 

Using fleecing backup like in [0] on a qcow2 image (with metadata
preallocation) can lead to the following assertion failure:

> bdrv_co_do_block_status: Assertion `!(ret & BDRV_BLOCK_ZERO)' failed.

In the reproducer [0], it happens because the BDRV_BLOCK_RECURSE flag
will be set by the qcow2 driver, so the caller will recursively check
the file child. Then the BDRV_BLOCK_ZERO set too. Later up the call
chain, in bdrv_co_do_block_status() for the snapshot-access driver,
the assertion failure will happen, because both flags are set.

To fix it, clear the recurse flag after the recursive check was done.

In detail:

> #0  qcow2_co_block_status

Returns 0x45 = BDRV_BLOCK_RECURSE | BDRV_BLOCK_DATA |
BDRV_BLOCK_OFFSET_VALID.

> #1  bdrv_co_do_block_status

Because of the data flag, bdrv_co_do_block_status() will now also set
BDRV_BLOCK_ALLOCATED. Because of the recurse flag,
bdrv_co_do_block_status() for the bdrv_file child will be called,
which returns 0x16 = BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_OFFSET_VALID |
BDRV_BLOCK_ZERO. Now the return value inherits the zero flag.

Returns 0x57 = BDRV_BLOCK_RECURSE | BDRV_BLOCK_DATA |
BDRV_BLOCK_OFFSET_VALID | BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_ZERO.

> #2  bdrv_co_common_block_status_above
> #3  bdrv_co_block_status_above
> #4  bdrv_co_block_status
> #5  cbw_co_snapshot_block_status
> #6  bdrv_co_snapshot_block_status
> #7  snapshot_access_co_block_status
> #8  bdrv_co_do_block_status

Return value is propagated all the way up to here, where the assertion
failure happens, because BDRV_BLOCK_RECURSE and BDRV_BLOCK_ZERO are
both set.

> #9  bdrv_co_common_block_status_above
> #10 bdrv_co_block_status_above
> #11 block_copy_block_status
> #12 block_copy_dirty_clusters
> #13 block_copy_common
> #14 block_copy_async_co_entry
> #15 coroutine_trampoline

[0]:

> #!/bin/bash
> rm /tmp/disk.qcow2
> ./qemu-img create /tmp/disk.qcow2 -o preallocation=metadata -f qcow2 1G
> ./qemu-img create /tmp/fleecing.qcow2 -f qcow2 1G
> ./qemu-img create /tmp/backup.qcow2 -f qcow2 1G
> ./qemu-system-x86_64 --qmp stdio \
> --blockdev 
> qcow2,node-name=node0,file.driver=file,file.filename=/tmp/disk.qcow2 \
> --blockdev 
> qcow2,node-name=node1,file.driver=file,file.filename=/tmp/fleecing.qcow2 \
> --blockdev 
> qcow2,node-name=node2,file.driver=file,file.filename=/tmp/backup.qcow2 \
> < {"execute": "qmp_capabilities"}
> {"execute": "blockdev-add", "arguments": { "driver": "copy-before-write", 
> "file": "node0", "target": "node1", "node-name": "node3" } }
> {"execute": "blockdev-add", "arguments": { "driver": "snapshot-access", 
> "file": "node3", "node-name": "snap0" } }
> {"execute": "blockdev-backup", "arguments": { "device": "snap0", "target": 
> "node1", "sync": "full", "job-id": "backup0" } }
> EOF

Signed-off-by: Fiona Ebner 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Message-id: 20240116154839.401030-1-f.eb...@proxmox.com
Signed-off-by: Stefan Hajnoczi 
(cherry picked from commit 8a9be7992426c8920d4178e7dca59306a18c7a3a)
Signed-off-by: Michael Tokarev 

diff --git a/block/io.c b/block/io.c
index bbaa0d1b2d..4589a58917 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2595,6 +2595,16 @@ static int coroutine_fn 
bdrv_co_block_status(BlockDriverState *bs,
 ret |= (ret2 & BDRV_BLOCK_ZERO);
 }
 }
+
+/*
+ * Now that the recursive search was done, clear the flag. Otherwise,
+ * with more complicated block graphs like snapshot-access ->
+ * copy-before-write -> qcow2, where the return value will be 
propagated
+ * further up to a parent bdrv_co_do_block_status() call, both the
+ * BDRV_BLOCK_RECURSE and BDRV_BLOCK_ZERO flags would be set, which is
+ * not allowed.
+ */
+ret &= ~BDRV_BLOCK_RECURSE;
 }
 
 out:
-- 
2.39.2

[Stable-8.1.5 30/36] block/blklogwrites: Fix a bug when logging "write zeroes" operations.

2024-01-28 Thread Michael Tokarev

From: Ari Sundholm 

There is a bug in the blklogwrites driver pertaining to logging "write
zeroes" operations, causing log corruption. This can be easily observed
by setting detect-zeroes to something other than "off" for the driver.

The issue is caused by a concurrency bug pertaining to the fact that
"write zeroes" operations have to be logged in two parts: first the log
entry metadata, then the zeroed-out region. While the log entry
metadata is being written by bdrv_co_pwritev(), another operation may
begin in the meanwhile and modify the state of the blklogwrites driver.
This is as intended by the coroutine-driven I/O model in QEMU, of
course.

Unfortunately, this specific scenario is mishandled. A short example:
1. Initially, in the current operation (#1), the current log sector
number in the driver state is only incremented by the number of sectors
taken by the log entry metadata, after which the log entry metadata is
written. The current operation yields.
2. Another operation (#2) may start while the log entry metadata is
being written. It uses the current log position as the start offset for
its log entry. This is in the sector right after the operation #1 log
entry metadata, which is bad!
3. After bdrv_co_pwritev() returns (#1), the current log sector
number is reread from the driver state in order to find out the start
offset for bdrv_co_pwrite_zeroes(). This is an obvious blunder, as the
offset will be the sector right after the (misplaced) operation #2 log
entry, which means that the zeroed-out region begins at the wrong
offset.
4. As a result of the above, the log is corrupt.

Fix this by only reading the driver metadata once, computing the
offsets and sizes in one go (including the optional zeroed-out region)
and setting the log sector number to the appropriate value for the next
operation in line.

Signed-off-by: Ari Sundholm 
Cc: qemu-sta...@nongnu.org
Message-ID: <20240109184646.1128475-1-meg...@gmx.com>
Reviewed-by: Kevin Wolf 
Signed-off-by: Kevin Wolf 
(cherry picked from commit a9c8ea95470c27a8a02062b67f9fa6940e828ab6)
Signed-off-by: Michael Tokarev 

diff --git a/block/blklogwrites.c b/block/blklogwrites.c
index 3ea7141cb5..5d715383c9 100644
--- a/block/blklogwrites.c
+++ b/block/blklogwrites.c
@@ -324,22 +324,39 @@ static void coroutine_fn GRAPH_RDLOCK
 blk_log_writes_co_do_log(BlkLogWritesLogReq *lr)
 {
 BDRVBlkLogWritesState *s = lr->bs->opaque;
-uint64_t cur_log_offset = s->cur_log_sector << s->sectorbits;
 
-s->nr_entries++;
-s->cur_log_sector +=
-ROUND_UP(lr->qiov->size, s->sectorsize) >> s->sectorbits;
+/*
+ * Determine the offsets and sizes of different parts of the entry, and
+ * update the state of the driver.
+ *
+ * This needs to be done in one go, before any actual I/O is done, as the
+ * log entry may have to be written in two parts, and the state of the
+ * driver may be modified by other driver operations while waiting for the
+ * I/O to complete.
+ */
+const uint64_t entry_start_sector = s->cur_log_sector;
+const uint64_t entry_offset = entry_start_sector << s->sectorbits;
+const uint64_t qiov_aligned_size = ROUND_UP(lr->qiov->size, s->sectorsize);
+const uint64_t entry_aligned_size = qiov_aligned_size +
+ROUND_UP(lr->zero_size, s->sectorsize);
+const uint64_t entry_nr_sectors = entry_aligned_size >> s->sectorbits;
 
-lr->log_ret = bdrv_co_pwritev(s->log_file, cur_log_offset, lr->qiov->size,
+s->nr_entries++;
+s->cur_log_sector += entry_nr_sectors;
+
+/*
+ * Write the log entry. Note that if this is a "write zeroes" operation,
+ * only the entry header is written here, with the zeroing being done
+ * separately below.
+ */
+lr->log_ret = bdrv_co_pwritev(s->log_file, entry_offset, lr->qiov->size,
   lr->qiov, 0);
 
 /* Logging for the "write zeroes" operation */
 if (lr->log_ret == 0 && lr->zero_size) {
-cur_log_offset = s->cur_log_sector << s->sectorbits;
-s->cur_log_sector +=
-ROUND_UP(lr->zero_size, s->sectorsize) >> s->sectorbits;
+const uint64_t zeroes_offset = entry_offset + qiov_aligned_size;
 
-lr->log_ret = bdrv_co_pwrite_zeroes(s->log_file, cur_log_offset,
+lr->log_ret = bdrv_co_pwrite_zeroes(s->log_file, zeroes_offset,
 lr->zero_size, 0);
 }
 
-- 
2.39.2

[Stable-8.2.1 65/71] iotests: add filter_qmp_generated_node_ids()

2024-01-28 Thread Michael Tokarev

From: Stefan Hajnoczi 

Add a filter function for QMP responses that contain QEMU's
automatically generated node ids. The ids change between runs and must
be masked in the reference output.

The next commit will use this new function.

Signed-off-by: Stefan Hajnoczi 
Message-ID: <20240118144823.1497953-2-stefa...@redhat.com>
Reviewed-by: Kevin Wolf 
Signed-off-by: Kevin Wolf 
(cherry picked from commit da62b507a20510d819bcfbe8f5e573409b954006)
Signed-off-by: Michael Tokarev 

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index e5c5798c71..ea48af4a7b 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -651,6 +651,13 @@ def _filter(_key, value):
 def filter_generated_node_ids(msg):
 return re.sub("#block[0-9]+", "NODE_NAME", msg)
 
+def filter_qmp_generated_node_ids(qmsg):
+def _filter(_key, value):
+if is_str(value):
+return filter_generated_node_ids(value)
+return value
+return filter_qmp(qmsg, _filter)
+
 def filter_img_info(output: str, filename: str,
 drop_child_info: bool = True) -> str:
 lines = []
-- 
2.39.2

[Stable-8.1.5 28/36] tcg/arm: Fix SIGILL in tcg_out_qemu_st_direct

2024-01-28 Thread Michael Tokarev

From: Joseph Burt 

When tcg_out_qemu_st_{index,direct} were merged, the direct case for
MO_64 was omitted, causing qemu_st_i64 to be encoded as 0x due
to underflow when adding h.base and h.index.

Fixes: 1df6d611bdc2 ("tcg/arm: Introduce HostAddress")
Signed-off-by: Joseph Burt 
Message-Id: <20240121211439.100829-1-caseo...@gmail.com>
Reviewed-by: Richard Henderson 
Signed-off-by: Richard Henderson 
(cherry picked from commit 9f6523e8e4689cafdbed7c10b7cf7c775b5a607b)
Signed-off-by: Michael Tokarev 

diff --git a/tcg/arm/tcg-target.c.inc b/tcg/arm/tcg-target.c.inc
index 83e286088f..1864b8054f 100644
--- a/tcg/arm/tcg-target.c.inc
+++ b/tcg/arm/tcg-target.c.inc
@@ -1667,6 +1667,9 @@ static void tcg_out_qemu_st_direct(TCGContext *s, MemOp 
opc, TCGReg datalo,
 } else {
 tcg_out_strd_r(s, h.cond, datalo, h.base, h.index);
 }
+} else if (h.index < 0) {
+tcg_out_st32_12(s, h.cond, datalo, h.base, 0);
+tcg_out_st32_12(s, h.cond, datahi, h.base, 4);
 } else if (h.index_scratch) {
 tcg_out_st32_rwb(s, h.cond, datalo, h.index, h.base);
 tcg_out_st32_12(s, h.cond, datahi, h.index, 4);
-- 
2.39.2

[Stable-8.2.1 56/71] accel/tcg: Revert mapping of PCREL translation block to multiple virtual addresses

2024-01-28 Thread Michael Tokarev

From: Paolo Bonzini 

This is causing regressions that have not been analyzed yet.  Revert the
change on stable branches.

Cc: qemu-sta...@nongnu.org
Cc: Michael Tokarev 
Related: https://gitlab.com/qemu-project/qemu/-/issues/2092
Signed-off-by: Paolo Bonzini 
Signed-off-by: Michael Tokarev 

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index c938eb96f8..6a4af14d32 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -183,7 +183,7 @@ static bool tb_lookup_cmp(const void *p, const void *d)
 const TranslationBlock *tb = p;
 const struct tb_desc *desc = d;
 
-if ((tb_cflags(tb) & CF_PCREL || tb->pc == desc->pc) &&
+if (tb->pc == desc->pc &&
 tb_page_addr0(tb) == desc->page_addr0 &&
 tb->cs_base == desc->cs_base &&
 tb->flags == desc->flags &&
@@ -233,7 +233,7 @@ static TranslationBlock *tb_htable_lookup(CPUState *cpu, 
vaddr pc,
 return NULL;
 }
 desc.page_addr0 = phys_pc;
-h = tb_hash_func(phys_pc, (cflags & CF_PCREL ? 0 : pc),
+h = tb_hash_func(phys_pc, pc,
  flags, cs_base, cflags);
 return qht_lookup_custom(_ctx.htable, , h, tb_lookup_cmp);
 }
diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c
index 3d2a896220..0d069a081e 100644
--- a/accel/tcg/tb-maint.c
+++ b/accel/tcg/tb-maint.c
@@ -47,7 +47,7 @@ static bool tb_cmp(const void *ap, const void *bp)
 const TranslationBlock *a = ap;
 const TranslationBlock *b = bp;
 
-return ((tb_cflags(a) & CF_PCREL || a->pc == b->pc) &&
+return (a->pc == b->pc &&
 a->cs_base == b->cs_base &&
 a->flags == b->flags &&
 (tb_cflags(a) & ~CF_INVALID) == (tb_cflags(b) & ~CF_INVALID) &&
@@ -916,7 +916,7 @@ static void do_tb_phys_invalidate(TranslationBlock *tb, 
bool rm_from_page_list)
 
 /* remove the TB from the hash list */
 phys_pc = tb_page_addr0(tb);
-h = tb_hash_func(phys_pc, (orig_cflags & CF_PCREL ? 0 : tb->pc),
+h = tb_hash_func(phys_pc, tb->pc,
  tb->flags, tb->cs_base, orig_cflags);
 if (!qht_remove(_ctx.htable, tb, h)) {
 return;
@@ -983,7 +983,7 @@ TranslationBlock *tb_link_page(TranslationBlock *tb)
 tb_record(tb);
 
 /* add in the hash table */
-h = tb_hash_func(tb_page_addr0(tb), (tb->cflags & CF_PCREL ? 0 : tb->pc),
+h = tb_hash_func(tb_page_addr0(tb), tb->pc,
  tb->flags, tb->cs_base, tb->cflags);
 qht_insert(_ctx.htable, tb, h, _tb);
 
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 79a88f5fb7..c1708afcb0 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -327,9 +327,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 
 gen_code_buf = tcg_ctx->code_gen_ptr;
 tb->tc.ptr = tcg_splitwx_to_rx(gen_code_buf);
-if (!(cflags & CF_PCREL)) {
-tb->pc = pc;
-}
+tb->pc = pc;
 tb->cs_base = cs_base;
 tb->flags = flags;
 tb->cflags = cflags;
-- 
2.39.2

[Stable-8.2.1 67/71] monitor: only run coroutine commands in qemu_aio_context

2024-01-28 Thread Michael Tokarev

From: Stefan Hajnoczi 

monitor_qmp_dispatcher_co() runs in the iohandler AioContext that is not
polled during nested event loops. The coroutine currently reschedules
itself in the main loop's qemu_aio_context AioContext, which is polled
during nested event loops. One known problem is that QMP device-add
calls drain_call_rcu(), which temporarily drops the BQL, leading to all
sorts of havoc like other vCPU threads re-entering device emulation code
while another vCPU thread is waiting in device emulation code with
aio_poll().

Paolo Bonzini suggested running non-coroutine QMP handlers in the
iohandler AioContext. This avoids trouble with nested event loops. His
original idea was to move coroutine rescheduling to
monitor_qmp_dispatch(), but I resorted to moving it to qmp_dispatch()
because we don't know if the QMP handler needs to run in coroutine
context in monitor_qmp_dispatch(). monitor_qmp_dispatch() would have
been nicer since it's associated with the monitor implementation and not
as general as qmp_dispatch(), which is also used by qemu-ga.

A number of qemu-iotests need updated .out files because the order of
QMP events vs QMP responses has changed.

Solves Issue #1933.

Cc: qemu-sta...@nongnu.org
Fixes: 7bed89958bfbf40df9ca681cefbdca63abdde39d ("device_core: use 
drain_call_rcu in in qmp_device_add")
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2215192
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2214985
Buglink: https://issues.redhat.com/browse/RHEL-17369
Signed-off-by: Stefan Hajnoczi 
Message-ID: <20240118144823.1497953-4-stefa...@redhat.com>
Reviewed-by: Kevin Wolf 
Tested-by: Fiona Ebner 
Signed-off-by: Kevin Wolf 
(cherry picked from commit effd60c878176bcaf97fa7ce2b12d04bb8ead6f7)
Signed-off-by: Michael Tokarev 

diff --git a/monitor/qmp.c b/monitor/qmp.c
index 6eee450fe4..a239945e8d 100644
--- a/monitor/qmp.c
+++ b/monitor/qmp.c
@@ -321,14 +321,6 @@ void coroutine_fn monitor_qmp_dispatcher_co(void *data)
 qemu_coroutine_yield();
 }
 
-/*
- * Move the coroutine from iohandler_ctx to qemu_aio_context for
- * executing the command handler so that it can make progress if it
- * involves an AIO_WAIT_WHILE().
- */
-aio_co_schedule(qemu_get_aio_context(), qmp_dispatcher_co);
-qemu_coroutine_yield();
-
 /* Process request */
 if (req_obj->req) {
 if (trace_event_get_state(TRACE_MONITOR_QMP_CMD_IN_BAND)) {
@@ -355,15 +347,6 @@ void coroutine_fn monitor_qmp_dispatcher_co(void *data)
 }
 
 qmp_request_free(req_obj);
-
-/*
- * Yield and reschedule so the main loop stays responsive.
- *
- * Move back to iohandler_ctx so that nested event loops for
- * qemu_aio_context don't start new monitor commands.
- */
-aio_co_schedule(iohandler_get_aio_context(), qmp_dispatcher_co);
-qemu_coroutine_yield();
 }
 qatomic_set(_dispatcher_co, NULL);
 }
diff --git a/qapi/qmp-dispatch.c b/qapi/qmp-dispatch.c
index 28b6bb..176b549473 100644
--- a/qapi/qmp-dispatch.c
+++ b/qapi/qmp-dispatch.c
@@ -206,9 +206,31 @@ QDict *coroutine_mixed_fn qmp_dispatch(const 
QmpCommandList *cmds, QObject *requ
 assert(!(oob && qemu_in_coroutine()));
 assert(monitor_cur() == NULL);
 if (!!(cmd->options & QCO_COROUTINE) == qemu_in_coroutine()) {
+if (qemu_in_coroutine()) {
+/*
+ * Move the coroutine from iohandler_ctx to qemu_aio_context for
+ * executing the command handler so that it can make progress if it
+ * involves an AIO_WAIT_WHILE().
+ */
+aio_co_schedule(qemu_get_aio_context(), qemu_coroutine_self());
+qemu_coroutine_yield();
+}
+
 monitor_set_cur(qemu_coroutine_self(), cur_mon);
 cmd->fn(args, , );
 monitor_set_cur(qemu_coroutine_self(), NULL);
+
+if (qemu_in_coroutine()) {
+/*
+ * Yield and reschedule so the main loop stays responsive.
+ *
+ * Move back to iohandler_ctx so that nested event loops for
+ * qemu_aio_context don't start new monitor commands.
+ */
+aio_co_schedule(iohandler_get_aio_context(),
+qemu_coroutine_self());
+qemu_coroutine_yield();
+}
 } else {
/*
 * Actual context doesn't match the one the command needs.
@@ -232,7 +254,7 @@ QDict *coroutine_mixed_fn qmp_dispatch(const QmpCommandList 
*cmds, QObject *requ
 .errp   = ,
 .co = qemu_coroutine_self(),
 };
-aio_bh_schedule_oneshot(qemu_get_aio_context(), do_qmp_dispatch_bh,
+aio_bh_schedule_oneshot(iohandler_get_aio_context(), 
do_qmp_dispatch_bh,
 );
 qemu_coroutine_yield();
 }
diff --git a/tests/qemu-iotests/060.out b/tests/qemu-iotests/060.out
index

[Stable-8.2.1 71/71] target/arm: Fix incorrect aa64_tidcp1 feature check

2024-01-28 Thread Michael Tokarev

From: Peter Maydell 

A typo in the implementation of isar_feature_aa64_tidcp1() means we
were checking the field in the wrong ID register, so we might have
provided the feature on CPUs that don't have it and not provided
it on CPUs that should have it. Correct this bug.

Cc: qemu-sta...@nongnu.org
Fixes: 9cd0c0dec97be9 "target/arm: Implement FEAT_TIDCP1"
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2120
Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
Message-id: 20240123160333.958841-1-peter.mayd...@linaro.org
(cherry picked from commit ee0a2e3c9d2991a11c13ffadb15e4d0add43c257)
Signed-off-by: Michael Tokarev 

diff --git a/target/arm/cpu-features.h b/target/arm/cpu-features.h
index 954d358268..165a497f7b 100644
--- a/target/arm/cpu-features.h
+++ b/target/arm/cpu-features.h
@@ -771,7 +771,7 @@ static inline bool isar_feature_aa64_hcx(const 
ARMISARegisters *id)
 
 static inline bool isar_feature_aa64_tidcp1(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64mmfr2, ID_AA64MMFR1, TIDCP1) != 0;
+return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, TIDCP1) != 0;
 }
 
 static inline bool isar_feature_aa64_hafs(const ARMISARegisters *id)
-- 
2.39.2

[Stable-8.2.1 62/71] tcg/arm: Fix SIGILL in tcg_out_qemu_st_direct

2024-01-28 Thread Michael Tokarev

From: Joseph Burt 

When tcg_out_qemu_st_{index,direct} were merged, the direct case for
MO_64 was omitted, causing qemu_st_i64 to be encoded as 0x due
to underflow when adding h.base and h.index.

Fixes: 1df6d611bdc2 ("tcg/arm: Introduce HostAddress")
Signed-off-by: Joseph Burt 
Message-Id: <20240121211439.100829-1-caseo...@gmail.com>
Reviewed-by: Richard Henderson 
Signed-off-by: Richard Henderson 
(cherry picked from commit 9f6523e8e4689cafdbed7c10b7cf7c775b5a607b)
Signed-off-by: Michael Tokarev 

diff --git a/tcg/arm/tcg-target.c.inc b/tcg/arm/tcg-target.c.inc
index fc78566494..a9aa8aa91c 100644
--- a/tcg/arm/tcg-target.c.inc
+++ b/tcg/arm/tcg-target.c.inc
@@ -1662,6 +1662,9 @@ static void tcg_out_qemu_st_direct(TCGContext *s, MemOp 
opc, TCGReg datalo,
 } else {
 tcg_out_strd_r(s, h.cond, datalo, h.base, h.index);
 }
+} else if (h.index < 0) {
+tcg_out_st32_12(s, h.cond, datalo, h.base, 0);
+tcg_out_st32_12(s, h.cond, datahi, h.base, 4);
 } else if (h.index_scratch) {
 tcg_out_st32_rwb(s, h.cond, datalo, h.index, h.base);
 tcg_out_st32_12(s, h.cond, datahi, h.index, 4);
-- 
2.39.2

[Stable-8.2.1 68/71] qtest: bump aspeed_smc-test timeout to 6 minutes

2024-01-28 Thread Michael Tokarev

From: Daniel P. Berrangé 

On a loaded system with --enable-debug, this test can take longer than
5 minutes. Raising the timeout to 6 minutes gives greater headroom for
such situations.

Signed-off-by: Daniel P. Berrangé 
[thuth: Increase the timeout to 6 minutes for very loaded systems]
Signed-off-by: Thomas Huth 
Message-Id: <20231215070357.10888-11-th...@redhat.com>
Signed-off-by: Alex Bennée 
(cherry picked from commit e8a12fe31f776c60fec993513cd1b1e66c2b8e29)
Signed-off-by: Michael Tokarev 
(Mjt: context fixup in tests/qtest/meson.build)

diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index 47dabf91d0..fc14ae95a3 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -1,5 +1,6 @@
 slow_qtests = {
   'ahci-test' : 60,
+  'aspeed_smc-test': 360,
   'bios-tables-test' : 120,
   'boot-serial-test' : 60,
   'migration-test' : 150,
-- 
2.39.2

[Stable-8.1.5 36/36] target/arm: Fix A64 scalar SQSHRN and SQRSHRN

2024-01-28 Thread Michael Tokarev

From: Peter Maydell 

In commit 1b7bc9b5c8bf374dd we changed handle_vec_simd_sqshrn() so
that instead of starting with a 0 value and depositing in each new
element from the narrowing operation, it instead started with the raw
result of the narrowing operation of the first element.

This is fine in the vector case, because the deposit operations for
the second and subsequent elements will always overwrite any higher
bits that might have been in the first element's result value in
tcg_rd.  However in the scalar case we only go through this loop
once.  The effect is that for a signed narrowing operation, if the
result is negative then we will now return a value where the bits
above the first element are incorrectly 1 (because the narrowfn
returns a sign-extended result, not one that is truncated to the
element size).

Fix this by using an extract operation to get exactly the correct
bits of the output of the narrowfn for element 1, instead of a
plain move.

Cc: qemu-sta...@nongnu.org
Fixes: 1b7bc9b5c8bf374dd3 ("target/arm: Avoid tcg_const_ptr in 
handle_vec_simd_sqshrn")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2089
Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
Message-id: 20240123153416.877308-1-peter.mayd...@linaro.org
(cherry picked from commit 6fffc8378562c7fea6290c430b4f653f830a4c1a)
Signed-off-by: Michael Tokarev 

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 7267f172d7..4e54cb7502 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -8116,7 +8116,7 @@ static void handle_vec_simd_sqshrn(DisasContext *s, bool 
is_scalar, bool is_q,
 narrowfn(tcg_rd_narrowed, cpu_env, tcg_rd);
 tcg_gen_extu_i32_i64(tcg_rd, tcg_rd_narrowed);
 if (i == 0) {
-tcg_gen_mov_i64(tcg_final, tcg_rd);
+tcg_gen_extract_i64(tcg_final, tcg_rd, 0, esize);
 } else {
 tcg_gen_deposit_i64(tcg_final, tcg_final, tcg_rd, esize * i, 
esize);
 }
-- 
2.39.2

[Stable-8.1.5 34/36] qtest: bump aspeed_smc-test timeout to 6 minutes

2024-01-28 Thread Michael Tokarev

From: Daniel P. Berrangé 

On a loaded system with --enable-debug, this test can take longer than
5 minutes. Raising the timeout to 6 minutes gives greater headroom for
such situations.

Signed-off-by: Daniel P. Berrangé 
[thuth: Increase the timeout to 6 minutes for very loaded systems]
Signed-off-by: Thomas Huth 
Message-Id: <20231215070357.10888-11-th...@redhat.com>
Signed-off-by: Alex Bennée 
(cherry picked from commit e8a12fe31f776c60fec993513cd1b1e66c2b8e29)
Signed-off-by: Michael Tokarev 
(Mjt: context fixup in tests/qtest/meson.build)

diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index b071d400b3..f9e77bbd26 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -1,5 +1,6 @@
 slow_qtests = {
   'ahci-test' : 60,
+  'aspeed_smc-test': 360,
   'bios-tables-test' : 120,
   'boot-serial-test' : 60,
   'migration-test' : 150,
-- 
2.39.2

[Stable-8.2.1 60/71] linux-user: Fixed cpu restore with pc 0 on SIGBUS

2024-01-28 Thread Michael Tokarev

From: Robbin Ehn 

Commit f4e1168198 (linux-user: Split out host_sig{segv,bus}_handler)
introduced a bug, when returning from host_sigbus_handler the PC is
never set. Thus cpu_loop_exit_restore is called with a zero PC and
we immediate get a SIGSEGV.

Signed-off-by: Robbin Ehn 
Fixes: f4e1168198 ("linux-user: Split out host_sig{segv,bus}_handler")
Reviewed-by: Palmer Dabbelt 
Message-Id: <33f27425878fb529b9e39ef22c303f6e0d90525f.ca...@rivosinc.com>
Signed-off-by: Richard Henderson 
(cherry picked from commit 6d913158b5023ac948b8fd649d77fc86e28072f6)
Signed-off-by: Michael Tokarev 

diff --git a/linux-user/signal.c b/linux-user/signal.c
index b35d1e512f..c9527adfa3 100644
--- a/linux-user/signal.c
+++ b/linux-user/signal.c
@@ -925,7 +925,7 @@ static void host_sigsegv_handler(CPUState *cpu, siginfo_t 
*info,
 cpu_loop_exit_sigsegv(cpu, guest_addr, access_type, maperr, pc);
 }
 
-static void host_sigbus_handler(CPUState *cpu, siginfo_t *info,
+static uintptr_t host_sigbus_handler(CPUState *cpu, siginfo_t *info,
 host_sigcontext *uc)
 {
 uintptr_t pc = host_signal_pc(uc);
@@ -947,6 +947,7 @@ static void host_sigbus_handler(CPUState *cpu, siginfo_t 
*info,
 sigprocmask(SIG_SETMASK, host_signal_mask(uc), NULL);
 cpu_loop_exit_sigbus(cpu, guest_addr, access_type, pc);
 }
+return pc;
 }
 
 static void host_signal_handler(int host_sig, siginfo_t *info, void *puc)
@@ -974,7 +975,7 @@ static void host_signal_handler(int host_sig, siginfo_t 
*info, void *puc)
 host_sigsegv_handler(cpu, info, uc);
 return;
 case SIGBUS:
-host_sigbus_handler(cpu, info, uc);
+pc = host_sigbus_handler(cpu, info, uc);
 sync_sig = true;
 break;
 case SIGILL:
-- 
2.39.2

[Stable-8.2.1 63/71] virtio-net: correctly copy vnet header when flushing TX

2024-01-28 Thread Michael Tokarev

From: Jason Wang 

When HASH_REPORT is negotiated, the guest_hdr_len might be larger than
the size of the mergeable rx buffer header. Using
virtio_net_hdr_mrg_rxbuf during the header swap might lead a stack
overflow in this case. Fixing this by using virtio_net_hdr_v1_hash
instead.

Reported-by: Xiao Lei 
Cc: Yuri Benditovich 
Cc: qemu-sta...@nongnu.org
Cc: Mauro Matteo Cascella 
Fixes: CVE-2023-6693
Fixes: e22f0603fb2f ("virtio-net: reference implementation of hash report")
Reviewed-by: Michael Tokarev 
Signed-off-by: Jason Wang 
(cherry picked from commit 2220e8189fb94068dbad333228659fbac819abb0)
Signed-off-by: Michael Tokarev 

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 80c56f0cfc..73024babd4 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -674,6 +674,11 @@ static void virtio_net_set_mrg_rx_bufs(VirtIONet *n, int 
mergeable_rx_bufs,
 
 n->mergeable_rx_bufs = mergeable_rx_bufs;
 
+/*
+ * Note: when extending the vnet header, please make sure to
+ * change the vnet header copying logic in virtio_net_flush_tx()
+ * as well.
+ */
 if (version_1) {
 n->guest_hdr_len = hash_report ?
 sizeof(struct virtio_net_hdr_v1_hash) :
@@ -2693,7 +2698,7 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
 ssize_t ret;
 unsigned int out_num;
 struct iovec sg[VIRTQUEUE_MAX_SIZE], sg2[VIRTQUEUE_MAX_SIZE + 1], 
*out_sg;
-struct virtio_net_hdr_mrg_rxbuf mhdr;
+struct virtio_net_hdr_v1_hash vhdr;
 
 elem = virtqueue_pop(q->tx_vq, sizeof(VirtQueueElement));
 if (!elem) {
@@ -2710,7 +2715,7 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
 }
 
 if (n->has_vnet_hdr) {
-if (iov_to_buf(out_sg, out_num, 0, , n->guest_hdr_len) <
+if (iov_to_buf(out_sg, out_num, 0, , n->guest_hdr_len) <
 n->guest_hdr_len) {
 virtio_error(vdev, "virtio-net header incorrect");
 virtqueue_detach_element(q->tx_vq, elem, 0);
@@ -2718,8 +2723,8 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
 return -EINVAL;
 }
 if (n->needs_vnet_hdr_swap) {
-virtio_net_hdr_swap(vdev, (void *) );
-sg2[0].iov_base = 
+virtio_net_hdr_swap(vdev, (void *) );
+sg2[0].iov_base = 
 sg2[0].iov_len = n->guest_hdr_len;
 out_num = iov_copy([1], ARRAY_SIZE(sg2) - 1,
out_sg, out_num,
-- 
2.39.2

[Stable-8.2.1 59/71] block/io: clear BDRV_BLOCK_RECURSE flag after recursing in bdrv_co_block_status

2024-01-28 Thread Michael Tokarev

From: Fiona Ebner 

Using fleecing backup like in [0] on a qcow2 image (with metadata
preallocation) can lead to the following assertion failure:

> bdrv_co_do_block_status: Assertion `!(ret & BDRV_BLOCK_ZERO)' failed.

In the reproducer [0], it happens because the BDRV_BLOCK_RECURSE flag
will be set by the qcow2 driver, so the caller will recursively check
the file child. Then the BDRV_BLOCK_ZERO set too. Later up the call
chain, in bdrv_co_do_block_status() for the snapshot-access driver,
the assertion failure will happen, because both flags are set.

To fix it, clear the recurse flag after the recursive check was done.

In detail:

> #0  qcow2_co_block_status

Returns 0x45 = BDRV_BLOCK_RECURSE | BDRV_BLOCK_DATA |
BDRV_BLOCK_OFFSET_VALID.

> #1  bdrv_co_do_block_status

Because of the data flag, bdrv_co_do_block_status() will now also set
BDRV_BLOCK_ALLOCATED. Because of the recurse flag,
bdrv_co_do_block_status() for the bdrv_file child will be called,
which returns 0x16 = BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_OFFSET_VALID |
BDRV_BLOCK_ZERO. Now the return value inherits the zero flag.

Returns 0x57 = BDRV_BLOCK_RECURSE | BDRV_BLOCK_DATA |
BDRV_BLOCK_OFFSET_VALID | BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_ZERO.

> #2  bdrv_co_common_block_status_above
> #3  bdrv_co_block_status_above
> #4  bdrv_co_block_status
> #5  cbw_co_snapshot_block_status
> #6  bdrv_co_snapshot_block_status
> #7  snapshot_access_co_block_status
> #8  bdrv_co_do_block_status

Return value is propagated all the way up to here, where the assertion
failure happens, because BDRV_BLOCK_RECURSE and BDRV_BLOCK_ZERO are
both set.

> #9  bdrv_co_common_block_status_above
> #10 bdrv_co_block_status_above
> #11 block_copy_block_status
> #12 block_copy_dirty_clusters
> #13 block_copy_common
> #14 block_copy_async_co_entry
> #15 coroutine_trampoline

[0]:

> #!/bin/bash
> rm /tmp/disk.qcow2
> ./qemu-img create /tmp/disk.qcow2 -o preallocation=metadata -f qcow2 1G
> ./qemu-img create /tmp/fleecing.qcow2 -f qcow2 1G
> ./qemu-img create /tmp/backup.qcow2 -f qcow2 1G
> ./qemu-system-x86_64 --qmp stdio \
> --blockdev 
> qcow2,node-name=node0,file.driver=file,file.filename=/tmp/disk.qcow2 \
> --blockdev 
> qcow2,node-name=node1,file.driver=file,file.filename=/tmp/fleecing.qcow2 \
> --blockdev 
> qcow2,node-name=node2,file.driver=file,file.filename=/tmp/backup.qcow2 \
> < {"execute": "qmp_capabilities"}
> {"execute": "blockdev-add", "arguments": { "driver": "copy-before-write", 
> "file": "node0", "target": "node1", "node-name": "node3" } }
> {"execute": "blockdev-add", "arguments": { "driver": "snapshot-access", 
> "file": "node3", "node-name": "snap0" } }
> {"execute": "blockdev-backup", "arguments": { "device": "snap0", "target": 
> "node1", "sync": "full", "job-id": "backup0" } }
> EOF

Signed-off-by: Fiona Ebner 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Message-id: 20240116154839.401030-1-f.eb...@proxmox.com
Signed-off-by: Stefan Hajnoczi 
(cherry picked from commit 8a9be7992426c8920d4178e7dca59306a18c7a3a)
Signed-off-by: Michael Tokarev 

diff --git a/block/io.c b/block/io.c
index 7e62fabbf5..d202987770 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2619,6 +2619,16 @@ bdrv_co_do_block_status(BlockDriverState *bs, bool 
want_zero,
 ret |= (ret2 & BDRV_BLOCK_ZERO);
 }
 }
+
+/*
+ * Now that the recursive search was done, clear the flag. Otherwise,
+ * with more complicated block graphs like snapshot-access ->
+ * copy-before-write -> qcow2, where the return value will be 
propagated
+ * further up to a parent bdrv_co_do_block_status() call, both the
+ * BDRV_BLOCK_RECURSE and BDRV_BLOCK_ZERO flags would be set, which is
+ * not allowed.
+ */
+ret &= ~BDRV_BLOCK_RECURSE;
 }
 
 out:
-- 
2.39.2

[Stable-8.2.1 00/71] Patch Round-up for stable 8.2.1, frozen on 2024-01-27

2024-01-28 Thread Michael Tokarev

The following patches are queued for QEMU stable v8.2.1:

  https://gitlab.com/qemu-project/qemu/-/commits/staging-8.2

Patch freeze is 2024-01-27 (frozen), and the release is planned for 2024-01-29:

  https://wiki.qemu.org/Planning/8.2

This series includes the edk2 changes (moving edk2 binaries to edk2 snapshot
fixing a bug).  I'm not entirely happy with that, but having a known bug seems
to be worse than having a snapshot, unreleased version of edk2.

Please respond here or CC qemu-sta...@nongnu.org on any additional patches
you think should (or shouldn't) be included in the release.

The changes which are staging for inclusion, with the original commit hash
from master branch, are given below the bottom line.

Thanks!

/mjt

--
01* d3007d348ada Kevin Wolf:
   block: Fix crash when loading snapshot on inactive node
02* 5a7f21efaf99 Kevin Wolf:
   vl: Improve error message for conflicting -incoming and -loadvm
03* bb6e2511eb48 Kevin Wolf:
   iotests: Basic tests for internal snapshots
04* d424db235434 Natanael Copa:
   target/riscv/kvm: do not use non-portable strerrorname_np()
05* 9d5b42beb697 Elen Avan:
   include/ui/rect.h: fix qemu_rect_init() mis-assignment
06* 007531586aa8 Paolo Bonzini:
   configure: use a native non-cross compiler for linux-user
07* 219615740425 Paolo Bonzini:
   target/i386: the sgx_epc_get_section stub is reachable
08* 25145a7d7735 Pavel Pisa:
   hw/net/can/sja1000: fix bug for single acceptance filter and standard 
   frame
09* 5cb0e7abe163 Xu Lu:
   target/riscv: Fix mcycle/minstret increment behavior
10* 4ad87cd4b225 Michael Tokarev:
   chardev/char.c: fix "abstract device type" error message
11* 09a36158c283 Michael Tokarev:
   audio/audio.c: remove trailing newline in error_setg
12* 0c7ffc977195 Bin Meng:
   hw/net: cadence_gem: Fix MDIO_OP_xxx values
13* 2c5107e1b455 Max Erenberg:
   edu: fix DMA range upper bound check
14* 213ae3ffda46 Cédric Le Goater:
   vfio/container: Replace basename with g_path_get_basename
15* 9353b6da430f Volker Rümelin:
   hw/vfio: fix iteration over global VFIODevice list
16* 82a65e3188ab Peter Maydell:
   hw/intc/arm_gicv3_cpuif: handle LPIs in in the list registers
17* ca5bed07d0e7 Richard Henderson:
   tcg/ppc: Use new registers for LQ destination
18* 1d513e06d966 Natanael Copa:
   util: fix build with musl libc on ppc64le
19* ca8b0cc8e917 Gerd Hoffmann:
   tests/acpi: allow tests/data/acpi/virt/SSDT.memhp changes
20* c3667412582c Gerd Hoffmann:
   edk2: update to git snapshot
21* 6f79fa5f097a Gerd Hoffmann:
   edk2: update build config, set PcdUninstallMemAttrProtocol = TRUE.
22* 505872015196 Gerd Hoffmann:
   edk2: update binaries to git snapshot
23* 55abfc1ffbe5 Gerd Hoffmann:
   tests/acpi: update expected data files
24* 704f7cad5105 Gerd Hoffmann:
   tests/acpi: disallow tests/data/acpi/virt/SSDT.memhp changes
25* c98873ee4a0c Samuel Tardieu:
   tests/qtest/virtio-ccw: Fix device presence checking
26* e358a25a97c7 Ilya Leoshkevich:
   target/s390x: Fix LAE setting a wrong access register
27* 52a21689cd82 Peter Maydell:
   .gitlab-ci.d/buildtest.yml: Work around htags bug when environment is 
   large
28* b16a45bc5e0e Alex Bennée:
   readthodocs: fully specify a build environment
29* 92039f61af89 Helge Deller:
   hw/hppa/machine: Allow up to 3840 MB total memory
30* d8a3220005d7 Helge Deller:
   hw/hppa/machine: Disable default devices with --nodefaults option
31* 3b57c15f0205 Helge Deller:
   hw/pci-host/astro: Add missing astro & elroy registers for NetBSD
32* 6ce18d530638 Helge Deller:
   target/hppa: Fix PDC address translation on PA2.0 with PSW.W=0
33* ed35afcb331a Helge Deller:
   hw/hppa: Move software power button address back into PDC
34* 5915b67013eb Helge Deller:
   target/hppa: Avoid accessing %gr0 when raising exception
35* 3824e0d643f3 Helge Deller:
   target/hppa: Export function hppa_set_ior_and_isr()
36* 910ada0225d1 Helge Deller:
   target/hppa: Fix IOR and ISR on unaligned access trap
37* 31efbe72c6cc Helge Deller:
   target/hppa: Fix IOR and ISR on error in probe
38* 4bda8224fa89 Helge Deller:
   target/hppa: Update SeaBIOS-hppa to version 15
39* 410c2a4d75f5 Anastasia Belova:
   load_elf: fix iterator's type for elf file processing
40* a58506b748b8 Richard Henderson:
   target/i386: Do not re-compute new pc with CF_PCREL
41* 2926eab89699 guoguangyao:
   target/i386: fix incorrect EIP in PC-relative translation blocks
42* 729ba8e933f8 Paolo Bonzini:
   target/i386: pcrel: store low bits of physical address in data[0]
43* 484aecf2d3a7 Philippe Mathieu-Daudé:
   backends/cryptodev: Do not ignore throttle/backends Errors
44* 3b14a555fdb6 Gerd Hoffmann:
   hw/pflash: refactor pflash_data_write()
45* 5dd58358a570 Gerd Hoffmann:
   hw/pflash: use ldn_{be,le}_p and stn_{be,le}_p
46* 284a7ee2e290 Gerd Hoffmann:
   hw/pflash: implement update buffer for block writes
47* 44ce1b5d2fc7 Nick Briggs:
   migration/rdma: define htonll/ntohll only if not predefined
48* 84a6835e004c Mark

[Stable-8.2.1 55/71] acpi/tests/avocado/bits: wait for 200 seconds for SHUTDOWN event from bits VM

2024-01-28 Thread Michael Tokarev

From: Ani Sinha 

By default, the timeout to receive any specified event from the QEMU VM is 60
seconds set by the python avocado test framework. Please see event_wait() and
events_wait() in python/qemu/machine/machine.py. If the matching event is not
triggered within that interval, an asyncio.TimeoutError is generated. Since the
timeout for the bits avocado test is 200 secs, we need to make event_wait()
timeout of the same value as well so that an early timeout is not triggered by
the avocado framework.

CC: peter.mayd...@linaro.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2077
Signed-off-by: Ani Sinha 
Reviewed-by: Daniel P. Berrangé 
Message-id: 20240117042556.3360190-1-anisi...@redhat.com
Signed-off-by: Peter Maydell 
(cherry picked from commit 7ef4c41e91d59d72a3b8bc022a6cb3e81787a50a)
Signed-off-by: Michael Tokarev 

diff --git a/tests/avocado/acpi-bits.py b/tests/avocado/acpi-bits.py
index 68b9e98d4e..efe4f52ee0 100644
--- a/tests/avocado/acpi-bits.py
+++ b/tests/avocado/acpi-bits.py
@@ -54,6 +54,8 @@
 deps = ["xorriso", "mformat"] # dependent tools needed in the test setup/box.
 supported_platforms = ['x86_64'] # supported test platforms.
 
+# default timeout of 120 secs is sometimes not enough for bits test.
+BITS_TIMEOUT = 200
 
 def which(tool):
 """ looks up the full path for @tool, returns None if not found
@@ -133,7 +135,7 @@ class AcpiBitsTest(QemuBaseTest): #pylint: 
disable=too-many-instance-attributes
 
 """
 # in slower systems the test can take as long as 3 minutes to complete.
-timeout = 200
+timeout = BITS_TIMEOUT
 
 def __init__(self, *args, **kwargs):
 super().__init__(*args, **kwargs)
@@ -400,7 +402,8 @@ def test_acpi_smbios_bits(self):
 
 # biosbits has been configured to run all the specified test suites
 # in batch mode and then automatically initiate a vm shutdown.
-# Rely on avocado's unit test timeout.
-self._vm.event_wait('SHUTDOWN')
+# Set timeout to BITS_TIMEOUT for SHUTDOWN event from bits VM at par
+# with the avocado test timeout.
+self._vm.event_wait('SHUTDOWN', timeout=BITS_TIMEOUT)
 self._vm.wait(timeout=None)
 self.parse_log()
-- 
2.39.2

[Stable-8.2.1 66/71] iotests: port 141 to Python for reliable QMP testing

2024-01-28 Thread Michael Tokarev

From: Stefan Hajnoczi 

The common.qemu bash functions allow tests to interact with the QMP
monitor of a QEMU process. I spent two days trying to update 141 when
the order of the test output changed, but found it would still fail
occassionally because printf() and QMP events race with synchronous QMP
communication.

I gave up and ported 141 to the existing Python API for QMP tests. The
Python API is less affected by the order in which QEMU prints output
because it does not print all QMP traffic by default.

The next commit changes the order in which QMP messages are received.
Make 141 reliable first.

Cc: Hanna Czenczek 
Signed-off-by: Stefan Hajnoczi 
Message-ID: <20240118144823.1497953-3-stefa...@redhat.com>
Reviewed-by: Kevin Wolf 
Signed-off-by: Kevin Wolf 
(cherry picked from commit 9ee2dd4c22a3639c5462b3fc20df60c005c3de64)
Signed-off-by: Michael Tokarev 

diff --git a/tests/qemu-iotests/141 b/tests/qemu-iotests/141
index a37030ee17..a7d3985a02 100755
--- a/tests/qemu-iotests/141
+++ b/tests/qemu-iotests/141
@@ -1,9 +1,12 @@
-#!/usr/bin/env bash
+#!/usr/bin/env python3
 # group: rw auto quick
 #
 # Test case for ejecting BDSs with block jobs still running on them
 #
-# Copyright (C) 2016 Red Hat, Inc.
+# Originally written in bash by Hanna Czenczek, ported to Python by Stefan
+# Hajnoczi.
+#
+# Copyright Red Hat
 #
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
@@ -19,177 +22,129 @@
 # along with this program.  If not, see .
 #
 
-# creator
-owner=hre...@redhat.com
-
-seq="$(basename $0)"
-echo "QA output created by $seq"
-
-status=1   # failure is the default!
-
-_cleanup()
-{
-_cleanup_qemu
-_cleanup_test_img
-for img in "$TEST_DIR"/{b,m,o}.$IMGFMT; do
-_rm_test_img "$img"
-done
-}
-trap "_cleanup; exit \$status" 0 1 2 3 15
-
-# get standard environment, filters and checks
-. ./common.rc
-. ./common.filter
-. ./common.qemu
-
-# Needs backing file and backing format support
-_supported_fmt qcow2 qed
-_supported_proto file
-_supported_os Linux
-
-
-test_blockjob()
-{
-_send_qemu_cmd $QEMU_HANDLE \
-"{'execute': 'blockdev-add',
-  'arguments': {
-  'node-name': 'drv0',
-  'driver': '$IMGFMT',
-  'file': {
-  'driver': 'file',
-  'filename': '$TEST_IMG'
-  }}}" \
-'return'
-
-# If "$2" is an event, we may or may not see it before the
-# {"return": {}}.  Therefore, filter the {"return": {}} out both
-# here and in the next command.  (Naturally, if we do not see it
-# here, we will see it before the next command can be executed,
-# so it will appear in the next _send_qemu_cmd's output.)
-_send_qemu_cmd $QEMU_HANDLE \
-"$1" \
-"$2" \
-| _filter_img_create | _filter_qmp_empty_return
-
-# We want this to return an error because the block job is still running
-_send_qemu_cmd $QEMU_HANDLE \
-"{'execute': 'blockdev-del',
-  'arguments': {'node-name': 'drv0'}}" \
-'error' | _filter_generated_node_ids | _filter_qmp_empty_return
-
-_send_qemu_cmd $QEMU_HANDLE \
-"{'execute': 'block-job-cancel',
-  'arguments': {'device': 'job0'}}" \
-"$3"
-
-_send_qemu_cmd $QEMU_HANDLE \
-"{'execute': 'blockdev-del',
-  'arguments': {'node-name': 'drv0'}}" \
-'return'
-}
-
-
-TEST_IMG="$TEST_DIR/b.$IMGFMT" _make_test_img 1M
-TEST_IMG="$TEST_DIR/m.$IMGFMT" _make_test_img -b "$TEST_DIR/b.$IMGFMT" -F 
$IMGFMT 1M
-_make_test_img -b "$TEST_DIR/m.$IMGFMT" 1M -F $IMGFMT
-
-_launch_qemu -nodefaults
-
-_send_qemu_cmd $QEMU_HANDLE \
-"{'execute': 'qmp_capabilities'}" \
-'return'
-
-echo
-echo '=== Testing drive-backup ==='
-echo
-
-# drive-backup will not send BLOCK_JOB_READY by itself, and cancelling the job
-# will consequently result in BLOCK_JOB_CANCELLED being emitted.
-
-test_blockjob \
-"{'execute': 'drive-backup',
-  'arguments': {'job-id': 'job0',
-'device': 'drv0',
-'target': '$TEST_DIR/o.$IMGFMT',
-'format': '$IMGFMT',
-'sync': 'none'}}" \
-'return' \
-'"status": "null"'
-
-echo
-echo '=== Testing drive-mirror ==='
-echo
-
-# drive-mirror will send BLOCK_JOB_READY basically immediately, and cancelling
-# the job will consequently result in BLOCK_JOB_COMPLETED being emitted.
-
-test_blockjob \
-"{'execute': 'drive-mirror',
-  'arguments': {'job-id': 'job0',
-'device': 'drv0',
-'target': '$TEST_DIR/o.$IMGFMT',
-'format': '$IMGFMT',
-'sync': 'none'}}" \
-'BLOCK_JOB_READY' \
-'"status": "null"'
-
-echo
-echo '=== Testing active block-commit ==='
-echo
-
-# An active block-commit will send BLOCK_JOB_READY basically immediately, and
-# cancelling

[Stable-8.2.1 61/71] linux-user/riscv: Adjust vdso signal frame cfa offsets

2024-01-28 Thread Michael Tokarev

From: Richard Henderson 

A typo in sizeof_reg put the registers at the wrong offset.

Simplify the expressions to use positive addresses from the
start of uc_mcontext instead of negative addresses from the
end of uc_mcontext.

Reported-by: Vineet Gupta 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Alistair Francis 
Signed-off-by: Richard Henderson 
(cherry picked from commit 1b21fe27e75a59bfe2513f5abcc6a18cfc35cfc8)
Signed-off-by: Michael Tokarev 

diff --git a/linux-user/riscv/vdso-32.so b/linux-user/riscv/vdso-32.so
index 1ad1e5cbbb..c2ce2a4757 100755
Binary files a/linux-user/riscv/vdso-32.so and b/linux-user/riscv/vdso-32.so 
differ
diff --git a/linux-user/riscv/vdso-64.so b/linux-user/riscv/vdso-64.so
index 83992bebe6..ae49f5b043 100755
Binary files a/linux-user/riscv/vdso-64.so and b/linux-user/riscv/vdso-64.so 
differ
diff --git a/linux-user/riscv/vdso.S b/linux-user/riscv/vdso.S
index a86d8fc488..c37275233a 100644
--- a/linux-user/riscv/vdso.S
+++ b/linux-user/riscv/vdso.S
@@ -101,12 +101,12 @@ endf __vdso_flush_icache
.cfi_startproc simple
.cfi_signal_frame
 
-#define sizeof_reg (__riscv_xlen / 4)
+#define sizeof_reg (__riscv_xlen / 8)
 #define sizeof_freg8
-#define B_GR   (offsetof_uc_mcontext - sizeof_rt_sigframe)
-#define B_FR   (offsetof_uc_mcontext - sizeof_rt_sigframe + offsetof_freg0)
+#define B_GR   0
+#define B_FR   offsetof_freg0
 
-   .cfi_def_cfa2, sizeof_rt_sigframe
+   .cfi_def_cfa2, offsetof_uc_mcontext
 
/* Return address */
.cfi_return_column 64
-- 
2.39.2

[Stable-8.2.1 57/71] tcg/s390x: Fix encoding of VRIc, VRSa, VRSc insns

2024-01-28 Thread Michael Tokarev

From: Richard Henderson 

While the format names the second vector register 'v3',
it is still in the second position (bits 12-15) and
the argument to RXB must match.

Example error:
 -   e7 00 00 10 2a 33   verllf  %v16,%v0,16
 +   e7 00 00 10 2c 33   verllf  %v16,%v16,16

Cc: qemu-sta...@nongnu.org
Reported-by: Michael Tokarev 
Fixes: 22cb37b4172 ("tcg/s390x: Implement vector shift operations")
Fixes: 79cada8693d ("tcg/s390x: Implement tcg_out_dup*_vec")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2054
Reviewed-by: Thomas Huth 
Tested-by: Michael Tokarev 
Message-Id: <20240117213646.159697-2-richard.hender...@linaro.org>
Signed-off-by: Richard Henderson 
(cherry picked from commit c1ddc18f37108498f45d57afd6bf33a23b703648)
Signed-off-by: Michael Tokarev 

diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index fbee43d3b0..7f6b84aa2c 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -683,7 +683,7 @@ static void tcg_out_insn_VRIc(TCGContext *s, S390Opcode op,
 tcg_debug_assert(is_vector_reg(v3));
 tcg_out16(s, (op & 0xff00) | ((v1 & 0xf) << 4) | (v3 & 0xf));
 tcg_out16(s, i2);
-tcg_out16(s, (op & 0x00ff) | RXB(v1, 0, v3, 0) | (m4 << 12));
+tcg_out16(s, (op & 0x00ff) | RXB(v1, v3, 0, 0) | (m4 << 12));
 }
 
 static void tcg_out_insn_VRRa(TCGContext *s, S390Opcode op,
@@ -738,7 +738,7 @@ static void tcg_out_insn_VRSa(TCGContext *s, S390Opcode op, 
TCGReg v1,
 tcg_debug_assert(is_vector_reg(v3));
 tcg_out16(s, (op & 0xff00) | ((v1 & 0xf) << 4) | (v3 & 0xf));
 tcg_out16(s, b2 << 12 | d2);
-tcg_out16(s, (op & 0x00ff) | RXB(v1, 0, v3, 0) | (m4 << 12));
+tcg_out16(s, (op & 0x00ff) | RXB(v1, v3, 0, 0) | (m4 << 12));
 }
 
 static void tcg_out_insn_VRSb(TCGContext *s, S390Opcode op, TCGReg v1,
@@ -762,7 +762,7 @@ static void tcg_out_insn_VRSc(TCGContext *s, S390Opcode op, 
TCGReg r1,
 tcg_debug_assert(is_vector_reg(v3));
 tcg_out16(s, (op & 0xff00) | (r1 << 4) | (v3 & 0xf));
 tcg_out16(s, b2 << 12 | d2);
-tcg_out16(s, (op & 0x00ff) | RXB(0, 0, v3, 0) | (m4 << 12));
+tcg_out16(s, (op & 0x00ff) | RXB(0, v3, 0, 0) | (m4 << 12));
 }
 
 static void tcg_out_insn_VRX(TCGContext *s, S390Opcode op, TCGReg v1,
-- 
2.39.2

[Stable-8.1.5 29/36] virtio-net: correctly copy vnet header when flushing TX

2024-01-28 Thread Michael Tokarev

From: Jason Wang 

When HASH_REPORT is negotiated, the guest_hdr_len might be larger than
the size of the mergeable rx buffer header. Using
virtio_net_hdr_mrg_rxbuf during the header swap might lead a stack
overflow in this case. Fixing this by using virtio_net_hdr_v1_hash
instead.

Reported-by: Xiao Lei 
Cc: Yuri Benditovich 
Cc: qemu-sta...@nongnu.org
Cc: Mauro Matteo Cascella 
Fixes: CVE-2023-6693
Fixes: e22f0603fb2f ("virtio-net: reference implementation of hash report")
Reviewed-by: Michael Tokarev 
Signed-off-by: Jason Wang 
(cherry picked from commit 2220e8189fb94068dbad333228659fbac819abb0)
Signed-off-by: Michael Tokarev 

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 9a93a2df01..fbed66e6ca 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -667,6 +667,11 @@ static void virtio_net_set_mrg_rx_bufs(VirtIONet *n, int 
mergeable_rx_bufs,
 
 n->mergeable_rx_bufs = mergeable_rx_bufs;
 
+/*
+ * Note: when extending the vnet header, please make sure to
+ * change the vnet header copying logic in virtio_net_flush_tx()
+ * as well.
+ */
 if (version_1) {
 n->guest_hdr_len = hash_report ?
 sizeof(struct virtio_net_hdr_v1_hash) :
@@ -2674,7 +2679,7 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
 ssize_t ret;
 unsigned int out_num;
 struct iovec sg[VIRTQUEUE_MAX_SIZE], sg2[VIRTQUEUE_MAX_SIZE + 1], 
*out_sg;
-struct virtio_net_hdr_mrg_rxbuf mhdr;
+struct virtio_net_hdr_v1_hash vhdr;
 
 elem = virtqueue_pop(q->tx_vq, sizeof(VirtQueueElement));
 if (!elem) {
@@ -2691,7 +2696,7 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
 }
 
 if (n->has_vnet_hdr) {
-if (iov_to_buf(out_sg, out_num, 0, , n->guest_hdr_len) <
+if (iov_to_buf(out_sg, out_num, 0, , n->guest_hdr_len) <
 n->guest_hdr_len) {
 virtio_error(vdev, "virtio-net header incorrect");
 virtqueue_detach_element(q->tx_vq, elem, 0);
@@ -2699,8 +2704,8 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
 return -EINVAL;
 }
 if (n->needs_vnet_hdr_swap) {
-virtio_net_hdr_swap(vdev, (void *) );
-sg2[0].iov_base = 
+virtio_net_hdr_swap(vdev, (void *) );
+sg2[0].iov_base = 
 sg2[0].iov_len = n->guest_hdr_len;
 out_num = iov_copy([1], ARRAY_SIZE(sg2) - 1,
out_sg, out_num,
-- 
2.39.2

[Stable-8.1.5 31/36] iotests: add filter_qmp_generated_node_ids()

2024-01-28 Thread Michael Tokarev

From: Stefan Hajnoczi 

Add a filter function for QMP responses that contain QEMU's
automatically generated node ids. The ids change between runs and must
be masked in the reference output.

The next commit will use this new function.

Signed-off-by: Stefan Hajnoczi 
Message-ID: <20240118144823.1497953-2-stefa...@redhat.com>
Reviewed-by: Kevin Wolf 
Signed-off-by: Kevin Wolf 
(cherry picked from commit da62b507a20510d819bcfbe8f5e573409b954006)
Signed-off-by: Michael Tokarev 

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index ef66fbd62b..1dd41e68ee 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -646,6 +646,13 @@ def _filter(_key, value):
 def filter_generated_node_ids(msg):
 return re.sub("#block[0-9]+", "NODE_NAME", msg)
 
+def filter_qmp_generated_node_ids(qmsg):
+def _filter(_key, value):
+if is_str(value):
+return filter_generated_node_ids(value)
+return value
+return filter_qmp(qmsg, _filter)
+
 def filter_img_info(output: str, filename: str,
 drop_child_info: bool = True) -> str:
 lines = []
-- 
2.39.2

[Stable-7.2.9 27/30] iotests: port 141 to Python for reliable QMP testing

2024-01-28 Thread Michael Tokarev

From: Stefan Hajnoczi 

The common.qemu bash functions allow tests to interact with the QMP
monitor of a QEMU process. I spent two days trying to update 141 when
the order of the test output changed, but found it would still fail
occassionally because printf() and QMP events race with synchronous QMP
communication.

I gave up and ported 141 to the existing Python API for QMP tests. The
Python API is less affected by the order in which QEMU prints output
because it does not print all QMP traffic by default.

The next commit changes the order in which QMP messages are received.
Make 141 reliable first.

Cc: Hanna Czenczek 
Signed-off-by: Stefan Hajnoczi 
Message-ID: <20240118144823.1497953-3-stefa...@redhat.com>
Reviewed-by: Kevin Wolf 
Signed-off-by: Kevin Wolf 
(cherry picked from commit 9ee2dd4c22a3639c5462b3fc20df60c005c3de64)
Signed-off-by: Michael Tokarev 

diff --git a/tests/qemu-iotests/141 b/tests/qemu-iotests/141
index a37030ee17..a7d3985a02 100755
--- a/tests/qemu-iotests/141
+++ b/tests/qemu-iotests/141
@@ -1,9 +1,12 @@
-#!/usr/bin/env bash
+#!/usr/bin/env python3
 # group: rw auto quick
 #
 # Test case for ejecting BDSs with block jobs still running on them
 #
-# Copyright (C) 2016 Red Hat, Inc.
+# Originally written in bash by Hanna Czenczek, ported to Python by Stefan
+# Hajnoczi.
+#
+# Copyright Red Hat
 #
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
@@ -19,177 +22,129 @@
 # along with this program.  If not, see .
 #
 
-# creator
-owner=hre...@redhat.com
-
-seq="$(basename $0)"
-echo "QA output created by $seq"
-
-status=1   # failure is the default!
-
-_cleanup()
-{
-_cleanup_qemu
-_cleanup_test_img
-for img in "$TEST_DIR"/{b,m,o}.$IMGFMT; do
-_rm_test_img "$img"
-done
-}
-trap "_cleanup; exit \$status" 0 1 2 3 15
-
-# get standard environment, filters and checks
-. ./common.rc
-. ./common.filter
-. ./common.qemu
-
-# Needs backing file and backing format support
-_supported_fmt qcow2 qed
-_supported_proto file
-_supported_os Linux
-
-
-test_blockjob()
-{
-_send_qemu_cmd $QEMU_HANDLE \
-"{'execute': 'blockdev-add',
-  'arguments': {
-  'node-name': 'drv0',
-  'driver': '$IMGFMT',
-  'file': {
-  'driver': 'file',
-  'filename': '$TEST_IMG'
-  }}}" \
-'return'
-
-# If "$2" is an event, we may or may not see it before the
-# {"return": {}}.  Therefore, filter the {"return": {}} out both
-# here and in the next command.  (Naturally, if we do not see it
-# here, we will see it before the next command can be executed,
-# so it will appear in the next _send_qemu_cmd's output.)
-_send_qemu_cmd $QEMU_HANDLE \
-"$1" \
-"$2" \
-| _filter_img_create | _filter_qmp_empty_return
-
-# We want this to return an error because the block job is still running
-_send_qemu_cmd $QEMU_HANDLE \
-"{'execute': 'blockdev-del',
-  'arguments': {'node-name': 'drv0'}}" \
-'error' | _filter_generated_node_ids | _filter_qmp_empty_return
-
-_send_qemu_cmd $QEMU_HANDLE \
-"{'execute': 'block-job-cancel',
-  'arguments': {'device': 'job0'}}" \
-"$3"
-
-_send_qemu_cmd $QEMU_HANDLE \
-"{'execute': 'blockdev-del',
-  'arguments': {'node-name': 'drv0'}}" \
-'return'
-}
-
-
-TEST_IMG="$TEST_DIR/b.$IMGFMT" _make_test_img 1M
-TEST_IMG="$TEST_DIR/m.$IMGFMT" _make_test_img -b "$TEST_DIR/b.$IMGFMT" -F 
$IMGFMT 1M
-_make_test_img -b "$TEST_DIR/m.$IMGFMT" 1M -F $IMGFMT
-
-_launch_qemu -nodefaults
-
-_send_qemu_cmd $QEMU_HANDLE \
-"{'execute': 'qmp_capabilities'}" \
-'return'
-
-echo
-echo '=== Testing drive-backup ==='
-echo
-
-# drive-backup will not send BLOCK_JOB_READY by itself, and cancelling the job
-# will consequently result in BLOCK_JOB_CANCELLED being emitted.
-
-test_blockjob \
-"{'execute': 'drive-backup',
-  'arguments': {'job-id': 'job0',
-'device': 'drv0',
-'target': '$TEST_DIR/o.$IMGFMT',
-'format': '$IMGFMT',
-'sync': 'none'}}" \
-'return' \
-'"status": "null"'
-
-echo
-echo '=== Testing drive-mirror ==='
-echo
-
-# drive-mirror will send BLOCK_JOB_READY basically immediately, and cancelling
-# the job will consequently result in BLOCK_JOB_COMPLETED being emitted.
-
-test_blockjob \
-"{'execute': 'drive-mirror',
-  'arguments': {'job-id': 'job0',
-'device': 'drv0',
-'target': '$TEST_DIR/o.$IMGFMT',
-'format': '$IMGFMT',
-'sync': 'none'}}" \
-'BLOCK_JOB_READY' \
-'"status": "null"'
-
-echo
-echo '=== Testing active block-commit ==='
-echo
-
-# An active block-commit will send BLOCK_JOB_READY basically immediately, and
-# cancelling

[Stable-8.1.5 25/36] readthodocs: fully specify a build environment

2024-01-28 Thread Michael Tokarev

From: Alex Bennée 

This is now expected by rtd so I've expanded using their example as
22.04 is one of our supported platforms. I tried to work out if there
was an easy way to re-generate a requirements.txt from our
pythondeps.toml but in the end went for the easier solution.

Cc:  
Signed-off-by: Alex Bennée 
Message-Id: <20231221174200.2693694-1-alex.ben...@linaro.org>
(cherry picked from commit b16a45bc5e0e329a16af8a2e020a6e7044f9afa2)
Signed-off-by: Michael Tokarev 

diff --git a/.readthedocs.yml b/.readthedocs.yml
index 7fb7b8dd61..0b262469ce 100644
--- a/.readthedocs.yml
+++ b/.readthedocs.yml
@@ -5,16 +5,21 @@
 # Required
 version: 2
 
+# Set the version of Python and other tools you might need
+build:
+  os: ubuntu-22.04
+  tools:
+python: "3.11"
+
 # Build documentation in the docs/ directory with Sphinx
 sphinx:
   configuration: docs/conf.py
 
+# We recommend specifying your dependencies to enable reproducible builds:
+# https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
+python:
+  install:
+- requirements: docs/requirements.txt
+
 # We want all the document formats
 formats: all
-
-# For consistency, we require that QEMU's Sphinx extensions
-# run with at least the same minimum version of Python that
-# we require for other Python in our codebase (our conf.py
-# enforces this, and some code needs it.)
-python:
-  version: 3.6
diff --git a/docs/requirements.txt b/docs/requirements.txt
new file mode 100644
index 00..691e5218ec
--- /dev/null
+++ b/docs/requirements.txt
@@ -0,0 +1,2 @@
+sphinx==5.3.0
+sphinx_rtd_theme==1.1.1
-- 
2.39.2

[Stable-8.1.5 33/36] monitor: only run coroutine commands in qemu_aio_context

2024-01-28 Thread Michael Tokarev

From: Stefan Hajnoczi 

monitor_qmp_dispatcher_co() runs in the iohandler AioContext that is not
polled during nested event loops. The coroutine currently reschedules
itself in the main loop's qemu_aio_context AioContext, which is polled
during nested event loops. One known problem is that QMP device-add
calls drain_call_rcu(), which temporarily drops the BQL, leading to all
sorts of havoc like other vCPU threads re-entering device emulation code
while another vCPU thread is waiting in device emulation code with
aio_poll().

Paolo Bonzini suggested running non-coroutine QMP handlers in the
iohandler AioContext. This avoids trouble with nested event loops. His
original idea was to move coroutine rescheduling to
monitor_qmp_dispatch(), but I resorted to moving it to qmp_dispatch()
because we don't know if the QMP handler needs to run in coroutine
context in monitor_qmp_dispatch(). monitor_qmp_dispatch() would have
been nicer since it's associated with the monitor implementation and not
as general as qmp_dispatch(), which is also used by qemu-ga.

A number of qemu-iotests need updated .out files because the order of
QMP events vs QMP responses has changed.

Solves Issue #1933.

Cc: qemu-sta...@nongnu.org
Fixes: 7bed89958bfbf40df9ca681cefbdca63abdde39d ("device_core: use 
drain_call_rcu in in qmp_device_add")
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2215192
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2214985
Buglink: https://issues.redhat.com/browse/RHEL-17369
Signed-off-by: Stefan Hajnoczi 
Message-ID: <20240118144823.1497953-4-stefa...@redhat.com>
Reviewed-by: Kevin Wolf 
Tested-by: Fiona Ebner 
Signed-off-by: Kevin Wolf 
(cherry picked from commit effd60c878176bcaf97fa7ce2b12d04bb8ead6f7)
Signed-off-by: Michael Tokarev 

diff --git a/monitor/qmp.c b/monitor/qmp.c
index 6eee450fe4..a239945e8d 100644
--- a/monitor/qmp.c
+++ b/monitor/qmp.c
@@ -321,14 +321,6 @@ void coroutine_fn monitor_qmp_dispatcher_co(void *data)
 qemu_coroutine_yield();
 }
 
-/*
- * Move the coroutine from iohandler_ctx to qemu_aio_context for
- * executing the command handler so that it can make progress if it
- * involves an AIO_WAIT_WHILE().
- */
-aio_co_schedule(qemu_get_aio_context(), qmp_dispatcher_co);
-qemu_coroutine_yield();
-
 /* Process request */
 if (req_obj->req) {
 if (trace_event_get_state(TRACE_MONITOR_QMP_CMD_IN_BAND)) {
@@ -355,15 +347,6 @@ void coroutine_fn monitor_qmp_dispatcher_co(void *data)
 }
 
 qmp_request_free(req_obj);
-
-/*
- * Yield and reschedule so the main loop stays responsive.
- *
- * Move back to iohandler_ctx so that nested event loops for
- * qemu_aio_context don't start new monitor commands.
- */
-aio_co_schedule(iohandler_get_aio_context(), qmp_dispatcher_co);
-qemu_coroutine_yield();
 }
 qatomic_set(_dispatcher_co, NULL);
 }
diff --git a/qapi/qmp-dispatch.c b/qapi/qmp-dispatch.c
index 28b6bb..176b549473 100644
--- a/qapi/qmp-dispatch.c
+++ b/qapi/qmp-dispatch.c
@@ -206,9 +206,31 @@ QDict *coroutine_mixed_fn qmp_dispatch(const 
QmpCommandList *cmds, QObject *requ
 assert(!(oob && qemu_in_coroutine()));
 assert(monitor_cur() == NULL);
 if (!!(cmd->options & QCO_COROUTINE) == qemu_in_coroutine()) {
+if (qemu_in_coroutine()) {
+/*
+ * Move the coroutine from iohandler_ctx to qemu_aio_context for
+ * executing the command handler so that it can make progress if it
+ * involves an AIO_WAIT_WHILE().
+ */
+aio_co_schedule(qemu_get_aio_context(), qemu_coroutine_self());
+qemu_coroutine_yield();
+}
+
 monitor_set_cur(qemu_coroutine_self(), cur_mon);
 cmd->fn(args, , );
 monitor_set_cur(qemu_coroutine_self(), NULL);
+
+if (qemu_in_coroutine()) {
+/*
+ * Yield and reschedule so the main loop stays responsive.
+ *
+ * Move back to iohandler_ctx so that nested event loops for
+ * qemu_aio_context don't start new monitor commands.
+ */
+aio_co_schedule(iohandler_get_aio_context(),
+qemu_coroutine_self());
+qemu_coroutine_yield();
+}
 } else {
/*
 * Actual context doesn't match the one the command needs.
@@ -232,7 +254,7 @@ QDict *coroutine_mixed_fn qmp_dispatch(const QmpCommandList 
*cmds, QObject *requ
 .errp   = ,
 .co = qemu_coroutine_self(),
 };
-aio_bh_schedule_oneshot(qemu_get_aio_context(), do_qmp_dispatch_bh,
+aio_bh_schedule_oneshot(iohandler_get_aio_context(), 
do_qmp_dispatch_bh,
 );
 qemu_coroutine_yield();
 }
diff --git a/tests/qemu-iotests/060.out b/tests/qemu-iotests/060.out
index

[Stable-7.2.9 29/30] qtest: bump aspeed_smc-test timeout to 6 minutes

2024-01-28 Thread Michael Tokarev

From: Daniel P. Berrangé 

On a loaded system with --enable-debug, this test can take longer than
5 minutes. Raising the timeout to 6 minutes gives greater headroom for
such situations.

Signed-off-by: Daniel P. Berrangé 
[thuth: Increase the timeout to 6 minutes for very loaded systems]
Signed-off-by: Thomas Huth 
Message-Id: <20231215070357.10888-11-th...@redhat.com>
Signed-off-by: Alex Bennée 
(cherry picked from commit e8a12fe31f776c60fec993513cd1b1e66c2b8e29)
Signed-off-by: Michael Tokarev 
(Mjt: context fixup in tests/qtest/meson.build)

diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index c07a5b1a5f..c00b92b89a 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -6,6 +6,7 @@ endif
 
 slow_qtests = {
   'ahci-test' : 60,
+  'aspeed_smc-test': 360,
   'bios-tables-test' : 120,
   'boot-serial-test' : 60,
   'migration-test' : 150,
-- 
2.39.2

[Stable-8.1.5 32/36] iotests: port 141 to Python for reliable QMP testing

2024-01-28 Thread Michael Tokarev

From: Stefan Hajnoczi 

The common.qemu bash functions allow tests to interact with the QMP
monitor of a QEMU process. I spent two days trying to update 141 when
the order of the test output changed, but found it would still fail
occassionally because printf() and QMP events race with synchronous QMP
communication.

I gave up and ported 141 to the existing Python API for QMP tests. The
Python API is less affected by the order in which QEMU prints output
because it does not print all QMP traffic by default.

The next commit changes the order in which QMP messages are received.
Make 141 reliable first.

Cc: Hanna Czenczek 
Signed-off-by: Stefan Hajnoczi 
Message-ID: <20240118144823.1497953-3-stefa...@redhat.com>
Reviewed-by: Kevin Wolf 
Signed-off-by: Kevin Wolf 
(cherry picked from commit 9ee2dd4c22a3639c5462b3fc20df60c005c3de64)
Signed-off-by: Michael Tokarev 

diff --git a/tests/qemu-iotests/141 b/tests/qemu-iotests/141
index a37030ee17..a7d3985a02 100755
--- a/tests/qemu-iotests/141
+++ b/tests/qemu-iotests/141
@@ -1,9 +1,12 @@
-#!/usr/bin/env bash
+#!/usr/bin/env python3
 # group: rw auto quick
 #
 # Test case for ejecting BDSs with block jobs still running on them
 #
-# Copyright (C) 2016 Red Hat, Inc.
+# Originally written in bash by Hanna Czenczek, ported to Python by Stefan
+# Hajnoczi.
+#
+# Copyright Red Hat
 #
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
@@ -19,177 +22,129 @@
 # along with this program.  If not, see .
 #
 
-# creator
-owner=hre...@redhat.com
-
-seq="$(basename $0)"
-echo "QA output created by $seq"
-
-status=1   # failure is the default!
-
-_cleanup()
-{
-_cleanup_qemu
-_cleanup_test_img
-for img in "$TEST_DIR"/{b,m,o}.$IMGFMT; do
-_rm_test_img "$img"
-done
-}
-trap "_cleanup; exit \$status" 0 1 2 3 15
-
-# get standard environment, filters and checks
-. ./common.rc
-. ./common.filter
-. ./common.qemu
-
-# Needs backing file and backing format support
-_supported_fmt qcow2 qed
-_supported_proto file
-_supported_os Linux
-
-
-test_blockjob()
-{
-_send_qemu_cmd $QEMU_HANDLE \
-"{'execute': 'blockdev-add',
-  'arguments': {
-  'node-name': 'drv0',
-  'driver': '$IMGFMT',
-  'file': {
-  'driver': 'file',
-  'filename': '$TEST_IMG'
-  }}}" \
-'return'
-
-# If "$2" is an event, we may or may not see it before the
-# {"return": {}}.  Therefore, filter the {"return": {}} out both
-# here and in the next command.  (Naturally, if we do not see it
-# here, we will see it before the next command can be executed,
-# so it will appear in the next _send_qemu_cmd's output.)
-_send_qemu_cmd $QEMU_HANDLE \
-"$1" \
-"$2" \
-| _filter_img_create | _filter_qmp_empty_return
-
-# We want this to return an error because the block job is still running
-_send_qemu_cmd $QEMU_HANDLE \
-"{'execute': 'blockdev-del',
-  'arguments': {'node-name': 'drv0'}}" \
-'error' | _filter_generated_node_ids | _filter_qmp_empty_return
-
-_send_qemu_cmd $QEMU_HANDLE \
-"{'execute': 'block-job-cancel',
-  'arguments': {'device': 'job0'}}" \
-"$3"
-
-_send_qemu_cmd $QEMU_HANDLE \
-"{'execute': 'blockdev-del',
-  'arguments': {'node-name': 'drv0'}}" \
-'return'
-}
-
-
-TEST_IMG="$TEST_DIR/b.$IMGFMT" _make_test_img 1M
-TEST_IMG="$TEST_DIR/m.$IMGFMT" _make_test_img -b "$TEST_DIR/b.$IMGFMT" -F 
$IMGFMT 1M
-_make_test_img -b "$TEST_DIR/m.$IMGFMT" 1M -F $IMGFMT
-
-_launch_qemu -nodefaults
-
-_send_qemu_cmd $QEMU_HANDLE \
-"{'execute': 'qmp_capabilities'}" \
-'return'
-
-echo
-echo '=== Testing drive-backup ==='
-echo
-
-# drive-backup will not send BLOCK_JOB_READY by itself, and cancelling the job
-# will consequently result in BLOCK_JOB_CANCELLED being emitted.
-
-test_blockjob \
-"{'execute': 'drive-backup',
-  'arguments': {'job-id': 'job0',
-'device': 'drv0',
-'target': '$TEST_DIR/o.$IMGFMT',
-'format': '$IMGFMT',
-'sync': 'none'}}" \
-'return' \
-'"status": "null"'
-
-echo
-echo '=== Testing drive-mirror ==='
-echo
-
-# drive-mirror will send BLOCK_JOB_READY basically immediately, and cancelling
-# the job will consequently result in BLOCK_JOB_COMPLETED being emitted.
-
-test_blockjob \
-"{'execute': 'drive-mirror',
-  'arguments': {'job-id': 'job0',
-'device': 'drv0',
-'target': '$TEST_DIR/o.$IMGFMT',
-'format': '$IMGFMT',
-'sync': 'none'}}" \
-'BLOCK_JOB_READY' \
-'"status": "null"'
-
-echo
-echo '=== Testing active block-commit ==='
-echo
-
-# An active block-commit will send BLOCK_JOB_READY basically immediately, and
-# cancelling

[Stable-8.1.5 v3 00/36] Patch Round-up for stable 8.1.5, frozen on 2024-01-27

2024-01-28 Thread Michael Tokarev

The following patches are queued for QEMU stable v8.1.5:

  https://gitlab.com/qemu-project/qemu/-/commits/staging-8.1

Patch freeze is 2024-01-27 (frozen), and the release is planned for 2024-01-29:

  https://wiki.qemu.org/Planning/8.1

Please respond here or CC qemu-sta...@nongnu.org on any additional patches
you think should (or shouldn't) be included in the release.

The changes which are staging for inclusion, with the original commit hash
from master branch, are given below the bottom line.

This is supposed to be last release of 8.1.x stable/bugfix series.

Thanks!

/mjt

--
01* d3007d348ada Kevin Wolf:
   block: Fix crash when loading snapshot on inactive node
02* 5a7f21efaf99 Kevin Wolf:
   vl: Improve error message for conflicting -incoming and -loadvm
03* bb6e2511eb48 Kevin Wolf:
   iotests: Basic tests for internal snapshots
04* 25145a7d7735 Pavel Pisa:
   hw/net/can/sja1000: fix bug for single acceptance filter and standard frame
05* 5cb0e7abe163 Xu Lu:
   target/riscv: Fix mcycle/minstret increment behavior
06* 4ad87cd4b225 Michael Tokarev:
   chardev/char.c: fix "abstract device type" error message
07* 82a65e3188ab Peter Maydell:
   hw/intc/arm_gicv3_cpuif: handle LPIs in in the list registers
08* 1d513e06d966 Natanael Copa:
   util: fix build with musl libc on ppc64le
09* c98873ee4a0c Samuel Tardieu:
   tests/qtest/virtio-ccw: Fix device presence checking
10* e358a25a97c7 Ilya Leoshkevich:
   target/s390x: Fix LAE setting a wrong access register
11* 52a21689cd82 Peter Maydell:
   .gitlab-ci.d/buildtest.yml: Work around htags bug when environment is large
12* 410c2a4d75f5 Anastasia Belova:
   load_elf: fix iterator's type for elf file processing
13* a58506b748b8 Richard Henderson:
   target/i386: Do not re-compute new pc with CF_PCREL
14* 2926eab89699 guoguangyao:
   target/i386: fix incorrect EIP in PC-relative translation blocks
15* 729ba8e933f8 Paolo Bonzini:
   target/i386: pcrel: store low bits of physical address in data[0]
16* 484aecf2d3a7 Philippe Mathieu-Daudé:
   backends/cryptodev: Do not ignore throttle/backends Errors
17* 3b14a555fdb6 Gerd Hoffmann:
   hw/pflash: refactor pflash_data_write()
18* 5dd58358a570 Gerd Hoffmann:
   hw/pflash: use ldn_{be,le}_p and stn_{be,le}_p
19* 284a7ee2e290 Gerd Hoffmann:
   hw/pflash: implement update buffer for block writes
20* 44ce1b5d2fc7 Nick Briggs:
   migration/rdma: define htonll/ntohll only if not predefined
21* 84a6835e004c Mark Cave-Ayland:
   hw/scsi/esp-pci: use correct address register for PCI DMA transfers
22* 6b41417d934b Mark Cave-Ayland:
   hw/scsi/esp-pci: generate PCI interrupt from separate ESP and PCI sources
23* 1e8e6644e063 Mark Cave-Ayland:
   hw/scsi/esp-pci: synchronise setting of DMA_STAT_DONE with ESP completion 
   interrupt
24* c2d7de557d19 Mark Cave-Ayland:
   hw/scsi/esp-pci: set DMA_STAT_BCMBLT when BLAST command issued
25 b16a45bc5e0e Alex Bennée:
   readthodocs: fully specify a build environment
26 4b06bb5826 Paolo Bonzini:
   accel/tcg: Revert mapping of PCREL translation block to multiple virtual 
   addresses
27 8a9be7992426 Fiona Ebner:
   block/io: clear BDRV_BLOCK_RECURSE flag after recursing in 
   bdrv_co_block_status
28 9f6523e8e468 Joseph Burt:
   tcg/arm: Fix SIGILL in tcg_out_qemu_st_direct
29 2220e8189fb9 Jason Wang:
   virtio-net: correctly copy vnet header when flushing TX
30 a9c8ea95470c Ari Sundholm:
   block/blklogwrites: Fix a bug when logging "write zeroes" operations.
31 da62b507a205 Stefan Hajnoczi:
   iotests: add filter_qmp_generated_node_ids()
32 9ee2dd4c22a3 Stefan Hajnoczi:
   iotests: port 141 to Python for reliable QMP testing
33 effd60c87817 Stefan Hajnoczi:
   monitor: only run coroutine commands in qemu_aio_context
34 e8a12fe31f77 Daniel P. Berrangé:
   qtest: bump aspeed_smc-test timeout to 6 minutes
35 604927e357c2 Max Filippov:
   target/xtensa: fix OOB TLB entry access
36 6fffc8378562 Peter Maydell:
   target/arm: Fix A64 scalar SQSHRN and SQRSHRN

(commit(s) marked with * were in previous series and are not resent)

[Stable-7.2.9 24/30] virtio-net: correctly copy vnet header when flushing TX

2024-01-28 Thread Michael Tokarev

From: Jason Wang 

When HASH_REPORT is negotiated, the guest_hdr_len might be larger than
the size of the mergeable rx buffer header. Using
virtio_net_hdr_mrg_rxbuf during the header swap might lead a stack
overflow in this case. Fixing this by using virtio_net_hdr_v1_hash
instead.

Reported-by: Xiao Lei 
Cc: Yuri Benditovich 
Cc: qemu-sta...@nongnu.org
Cc: Mauro Matteo Cascella 
Fixes: CVE-2023-6693
Fixes: e22f0603fb2f ("virtio-net: reference implementation of hash report")
Reviewed-by: Michael Tokarev 
Signed-off-by: Jason Wang 
(cherry picked from commit 2220e8189fb94068dbad333228659fbac819abb0)
Signed-off-by: Michael Tokarev 

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 06f35ac2d8..412cba4927 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -664,6 +664,11 @@ static void virtio_net_set_mrg_rx_bufs(VirtIONet *n, int 
mergeable_rx_bufs,
 
 n->mergeable_rx_bufs = mergeable_rx_bufs;
 
+/*
+ * Note: when extending the vnet header, please make sure to
+ * change the vnet header copying logic in virtio_net_flush_tx()
+ * as well.
+ */
 if (version_1) {
 n->guest_hdr_len = hash_report ?
 sizeof(struct virtio_net_hdr_v1_hash) :
@@ -2630,7 +2635,7 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
 ssize_t ret;
 unsigned int out_num;
 struct iovec sg[VIRTQUEUE_MAX_SIZE], sg2[VIRTQUEUE_MAX_SIZE + 1], 
*out_sg;
-struct virtio_net_hdr_mrg_rxbuf mhdr;
+struct virtio_net_hdr_v1_hash vhdr;
 
 elem = virtqueue_pop(q->tx_vq, sizeof(VirtQueueElement));
 if (!elem) {
@@ -2647,7 +2652,7 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
 }
 
 if (n->has_vnet_hdr) {
-if (iov_to_buf(out_sg, out_num, 0, , n->guest_hdr_len) <
+if (iov_to_buf(out_sg, out_num, 0, , n->guest_hdr_len) <
 n->guest_hdr_len) {
 virtio_error(vdev, "virtio-net header incorrect");
 virtqueue_detach_element(q->tx_vq, elem, 0);
@@ -2655,8 +2660,8 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
 return -EINVAL;
 }
 if (n->needs_vnet_hdr_swap) {
-virtio_net_hdr_swap(vdev, (void *) );
-sg2[0].iov_base = 
+virtio_net_hdr_swap(vdev, (void *) );
+sg2[0].iov_base = 
 sg2[0].iov_len = n->guest_hdr_len;
 out_num = iov_copy([1], ARRAY_SIZE(sg2) - 1,
out_sg, out_num,
-- 
2.39.2

[Stable-8.1.5 26/36] accel/tcg: Revert mapping of PCREL translation block to multiple virtual addresses

2024-01-28 Thread Michael Tokarev

From: Paolo Bonzini 

This is causing regressions that have not been analyzed yet.  Revert the
change on stable branches.

Cc: qemu-sta...@nongnu.org
Cc: Michael Tokarev 
Related: https://gitlab.com/qemu-project/qemu/-/issues/2092
Signed-off-by: Paolo Bonzini 
Signed-off-by: Michael Tokarev 

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index c724e8b6f1..d10c8eb956 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -182,7 +182,7 @@ static bool tb_lookup_cmp(const void *p, const void *d)
 const TranslationBlock *tb = p;
 const struct tb_desc *desc = d;
 
-if ((tb_cflags(tb) & CF_PCREL || tb->pc == desc->pc) &&
+if (tb->pc == desc->pc &&
 tb_page_addr0(tb) == desc->page_addr0 &&
 tb->cs_base == desc->cs_base &&
 tb->flags == desc->flags &&
@@ -232,7 +232,7 @@ static TranslationBlock *tb_htable_lookup(CPUState *cpu, 
vaddr pc,
 return NULL;
 }
 desc.page_addr0 = phys_pc;
-h = tb_hash_func(phys_pc, (cflags & CF_PCREL ? 0 : pc),
+h = tb_hash_func(phys_pc, pc,
  flags, cs_base, cflags);
 return qht_lookup_custom(_ctx.htable, , h, tb_lookup_cmp);
 }
diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c
index 85684f2b3d..5c7a76bf88 100644
--- a/accel/tcg/tb-maint.c
+++ b/accel/tcg/tb-maint.c
@@ -46,7 +46,7 @@ static bool tb_cmp(const void *ap, const void *bp)
 const TranslationBlock *a = ap;
 const TranslationBlock *b = bp;
 
-return ((tb_cflags(a) & CF_PCREL || a->pc == b->pc) &&
+return (a->pc == b->pc &&
 a->cs_base == b->cs_base &&
 a->flags == b->flags &&
 (tb_cflags(a) & ~CF_INVALID) == (tb_cflags(b) & ~CF_INVALID) &&
@@ -916,7 +916,7 @@ static void do_tb_phys_invalidate(TranslationBlock *tb, 
bool rm_from_page_list)
 
 /* remove the TB from the hash list */
 phys_pc = tb_page_addr0(tb);
-h = tb_hash_func(phys_pc, (orig_cflags & CF_PCREL ? 0 : tb->pc),
+h = tb_hash_func(phys_pc, tb->pc,
  tb->flags, tb->cs_base, orig_cflags);
 if (!qht_remove(_ctx.htable, tb, h)) {
 return;
@@ -983,7 +983,7 @@ TranslationBlock *tb_link_page(TranslationBlock *tb)
 tb_record(tb);
 
 /* add in the hash table */
-h = tb_hash_func(tb_page_addr0(tb), (tb->cflags & CF_PCREL ? 0 : tb->pc),
+h = tb_hash_func(tb_page_addr0(tb), tb->pc,
  tb->flags, tb->cs_base, tb->cflags);
 qht_insert(_ctx.htable, tb, h, _tb);
 
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index b2d4e22c17..678ddeff37 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -326,9 +326,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 
 gen_code_buf = tcg_ctx->code_gen_ptr;
 tb->tc.ptr = tcg_splitwx_to_rx(gen_code_buf);
-if (!(cflags & CF_PCREL)) {
-tb->pc = pc;
-}
+tb->pc = pc;
 tb->cs_base = cs_base;
 tb->flags = flags;
 tb->cflags = cflags;
-- 
2.39.2

[Stable-8.1.5 27/36] block/io: clear BDRV_BLOCK_RECURSE flag after recursing in bdrv_co_block_status

2024-01-28 Thread Michael Tokarev

From: Fiona Ebner 

Using fleecing backup like in [0] on a qcow2 image (with metadata
preallocation) can lead to the following assertion failure:

> bdrv_co_do_block_status: Assertion `!(ret & BDRV_BLOCK_ZERO)' failed.

In the reproducer [0], it happens because the BDRV_BLOCK_RECURSE flag
will be set by the qcow2 driver, so the caller will recursively check
the file child. Then the BDRV_BLOCK_ZERO set too. Later up the call
chain, in bdrv_co_do_block_status() for the snapshot-access driver,
the assertion failure will happen, because both flags are set.

To fix it, clear the recurse flag after the recursive check was done.

In detail:

> #0  qcow2_co_block_status

Returns 0x45 = BDRV_BLOCK_RECURSE | BDRV_BLOCK_DATA |
BDRV_BLOCK_OFFSET_VALID.

> #1  bdrv_co_do_block_status

Because of the data flag, bdrv_co_do_block_status() will now also set
BDRV_BLOCK_ALLOCATED. Because of the recurse flag,
bdrv_co_do_block_status() for the bdrv_file child will be called,
which returns 0x16 = BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_OFFSET_VALID |
BDRV_BLOCK_ZERO. Now the return value inherits the zero flag.

Returns 0x57 = BDRV_BLOCK_RECURSE | BDRV_BLOCK_DATA |
BDRV_BLOCK_OFFSET_VALID | BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_ZERO.

> #2  bdrv_co_common_block_status_above
> #3  bdrv_co_block_status_above
> #4  bdrv_co_block_status
> #5  cbw_co_snapshot_block_status
> #6  bdrv_co_snapshot_block_status
> #7  snapshot_access_co_block_status
> #8  bdrv_co_do_block_status

Return value is propagated all the way up to here, where the assertion
failure happens, because BDRV_BLOCK_RECURSE and BDRV_BLOCK_ZERO are
both set.

> #9  bdrv_co_common_block_status_above
> #10 bdrv_co_block_status_above
> #11 block_copy_block_status
> #12 block_copy_dirty_clusters
> #13 block_copy_common
> #14 block_copy_async_co_entry
> #15 coroutine_trampoline

[0]:

> #!/bin/bash
> rm /tmp/disk.qcow2
> ./qemu-img create /tmp/disk.qcow2 -o preallocation=metadata -f qcow2 1G
> ./qemu-img create /tmp/fleecing.qcow2 -f qcow2 1G
> ./qemu-img create /tmp/backup.qcow2 -f qcow2 1G
> ./qemu-system-x86_64 --qmp stdio \
> --blockdev 
> qcow2,node-name=node0,file.driver=file,file.filename=/tmp/disk.qcow2 \
> --blockdev 
> qcow2,node-name=node1,file.driver=file,file.filename=/tmp/fleecing.qcow2 \
> --blockdev 
> qcow2,node-name=node2,file.driver=file,file.filename=/tmp/backup.qcow2 \
> < {"execute": "qmp_capabilities"}
> {"execute": "blockdev-add", "arguments": { "driver": "copy-before-write", 
> "file": "node0", "target": "node1", "node-name": "node3" } }
> {"execute": "blockdev-add", "arguments": { "driver": "snapshot-access", 
> "file": "node3", "node-name": "snap0" } }
> {"execute": "blockdev-backup", "arguments": { "device": "snap0", "target": 
> "node1", "sync": "full", "job-id": "backup0" } }
> EOF

Signed-off-by: Fiona Ebner 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Message-id: 20240116154839.401030-1-f.eb...@proxmox.com
Signed-off-by: Stefan Hajnoczi 
(cherry picked from commit 8a9be7992426c8920d4178e7dca59306a18c7a3a)
Signed-off-by: Michael Tokarev 

diff --git a/block/io.c b/block/io.c
index 055fcf7438..83d1b1dfdc 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2585,6 +2585,16 @@ bdrv_co_block_status(BlockDriverState *bs, bool 
want_zero,
 ret |= (ret2 & BDRV_BLOCK_ZERO);
 }
 }
+
+/*
+ * Now that the recursive search was done, clear the flag. Otherwise,
+ * with more complicated block graphs like snapshot-access ->
+ * copy-before-write -> qcow2, where the return value will be 
propagated
+ * further up to a parent bdrv_co_do_block_status() call, both the
+ * BDRV_BLOCK_RECURSE and BDRV_BLOCK_ZERO flags would be set, which is
+ * not allowed.
+ */
+ret &= ~BDRV_BLOCK_RECURSE;
 }
 
 out:
-- 
2.39.2

[Stable-7.2.9 v3 00/30] Patch Round-up for stable 7.2.9, frozen on 2024-01-27

2024-01-28 Thread Michael Tokarev

The following patches are queued for QEMU stable v7.2.9:

  https://gitlab.com/qemu-project/qemu/-/commits/staging-7.2

Patch freeze is 2024-01-27 (frozen), and the release is planned for 2024-01-29:

  https://wiki.qemu.org/Planning/7.2

Please respond here or CC qemu-sta...@nongnu.org on any additional patches
you think should (or shouldn't) be included in the release.

The changes which are staging for inclusion, with the original commit hash
from master branch, are given below the bottom line.

Thanks!

/mjt

--
01* d3007d348ada Kevin Wolf:
   block: Fix crash when loading snapshot on inactive node
02* 5a7f21efaf99 Kevin Wolf:
   vl: Improve error message for conflicting -incoming and -loadvm
03* bb6e2511eb48 Kevin Wolf:
   iotests: Basic tests for internal snapshots
04* 5cb0e7abe163 Xu Lu:
   target/riscv: Fix mcycle/minstret increment behavior
05* 4ad87cd4b225 Michael Tokarev:
   chardev/char.c: fix "abstract device type" error message
06* 82a65e3188ab Peter Maydell:
   hw/intc/arm_gicv3_cpuif: handle LPIs in in the list registers
07* e358a25a97c7 Ilya Leoshkevich:
   target/s390x: Fix LAE setting a wrong access register
08* 52a21689cd82 Peter Maydell:
   .gitlab-ci.d/buildtest.yml: Work around htags bug when environment is large
09* 410c2a4d75f5 Anastasia Belova:
   load_elf: fix iterator's type for elf file processing
10* b5e0d5d22fbf Richard Henderson:
   target/i386: Fix 32-bit wrapping of pc/eip computation
11* a58506b748b8 Richard Henderson:
   target/i386: Do not re-compute new pc with CF_PCREL
12* 2926eab89699 guoguangyao:
   target/i386: fix incorrect EIP in PC-relative translation blocks
13* 729ba8e933f8 Paolo Bonzini:
   target/i386: pcrel: store low bits of physical address in data[0]
14* 3b14a555fdb6 Gerd Hoffmann:
   hw/pflash: refactor pflash_data_write()
15* 5dd58358a570 Gerd Hoffmann:
   hw/pflash: use ldn_{be,le}_p and stn_{be,le}_p
16* 284a7ee2e290 Gerd Hoffmann:
   hw/pflash: implement update buffer for block writes
17* 84a6835e004c Mark Cave-Ayland:
   hw/scsi/esp-pci: use correct address register for PCI DMA transfers
18* 6b41417d934b Mark Cave-Ayland:
   hw/scsi/esp-pci: generate PCI interrupt from separate ESP and PCI sources
19* 1e8e6644e063 Mark Cave-Ayland:
   hw/scsi/esp-pci: synchronise setting of DMA_STAT_DONE with ESP completion 
   interrupt
20* c2d7de557d19 Mark Cave-Ayland:
   hw/scsi/esp-pci: set DMA_STAT_BCMBLT when BLAST command issued
21 b16a45bc5e0e Alex Bennée:
   readthodocs: fully specify a build environment
22 b67924a048 Paolo Bonzini:
   accel/tcg: Revert mapping of PCREL translation block to multiple virtual 
   addresses
23 8a9be7992426 Fiona Ebner:
   block/io: clear BDRV_BLOCK_RECURSE flag after recursing in 
   bdrv_co_block_status
24 2220e8189fb9 Jason Wang:
   virtio-net: correctly copy vnet header when flushing TX
25 a9c8ea95470c Ari Sundholm:
   block/blklogwrites: Fix a bug when logging "write zeroes" operations.
26 da62b507a205 Stefan Hajnoczi:
   iotests: add filter_qmp_generated_node_ids()
27 9ee2dd4c22a3 Stefan Hajnoczi:
   iotests: port 141 to Python for reliable QMP testing
28 effd60c87817 Stefan Hajnoczi:
   monitor: only run coroutine commands in qemu_aio_context
29 e8a12fe31f77 Daniel P. Berrangé:
   qtest: bump aspeed_smc-test timeout to 6 minutes
30 604927e357c2 Max Filippov:
   target/xtensa: fix OOB TLB entry access

(commit(s) marked with * were in previous series and are not resent)

[Stable-7.2.9 30/30] target/xtensa: fix OOB TLB entry access

2024-01-28 Thread Michael Tokarev

From: Max Filippov 

r[id]tlb[01], [iw][id]tlb opcodes use TLB way index passed in a register
by the guest. The host uses 3 bits of the index for ITLB indexing and 4
bits for DTLB, but there's only 7 entries in the ITLB array and 10 in
the DTLB array, so a malicious guest may trigger out-of-bound access to
these arrays.

Change split_tlb_entry_spec return type to bool to indicate whether TLB
way passed to it is valid. Change get_tlb_entry to return NULL in case
invalid TLB way is requested. Add assertion to xtensa_tlb_get_entry that
requested TLB way and entry indices are valid. Add checks to the
[rwi]tlb helpers that requested TLB way is valid and return 0 or do
nothing when it's not.

Cc: qemu-sta...@nongnu.org
Fixes: b67ea0cd7441 ("target-xtensa: implement memory protection options")
Signed-off-by: Max Filippov 
Reviewed-by: Peter Maydell 
Message-id: 20231215120307.545381-1-jcmvb...@gmail.com
Signed-off-by: Peter Maydell 
(cherry picked from commit 604927e357c2b292c70826e4ce42574ad126ef32)
Signed-off-by: Michael Tokarev 

diff --git a/target/xtensa/mmu_helper.c b/target/xtensa/mmu_helper.c
index fa66e8e867..c4b4df4d74 100644
--- a/target/xtensa/mmu_helper.c
+++ b/target/xtensa/mmu_helper.c
@@ -226,22 +226,31 @@ static void split_tlb_entry_spec_way(const CPUXtensaState 
*env, uint32_t v,
  * Split TLB address into TLB way, entry index and VPN (with index).
  * See ISA, 4.6.5.5 - 4.6.5.8 for the TLB addressing format
  */
-static void split_tlb_entry_spec(CPUXtensaState *env, uint32_t v, bool dtlb,
-uint32_t *vpn, uint32_t *wi, uint32_t *ei)
+static bool split_tlb_entry_spec(CPUXtensaState *env, uint32_t v, bool dtlb,
+ uint32_t *vpn, uint32_t *wi, uint32_t *ei)
 {
 if (xtensa_option_enabled(env->config, XTENSA_OPTION_MMU)) {
 *wi = v & (dtlb ? 0xf : 0x7);
-split_tlb_entry_spec_way(env, v, dtlb, vpn, *wi, ei);
+if (*wi < (dtlb ? env->config->dtlb.nways : env->config->itlb.nways)) {
+split_tlb_entry_spec_way(env, v, dtlb, vpn, *wi, ei);
+return true;
+} else {
+return false;
+}
 } else {
 *vpn = v & REGION_PAGE_MASK;
 *wi = 0;
 *ei = (v >> 29) & 0x7;
+return true;
 }
 }
 
 static xtensa_tlb_entry *xtensa_tlb_get_entry(CPUXtensaState *env, bool dtlb,
   unsigned wi, unsigned ei)
 {
+const xtensa_tlb *tlb = dtlb ? >config->dtlb : >config->itlb;
+
+assert(wi < tlb->nways && ei < tlb->way_size[wi]);
 return dtlb ?
 env->dtlb[wi] + ei :
 env->itlb[wi] + ei;
@@ -254,11 +263,14 @@ static xtensa_tlb_entry *get_tlb_entry(CPUXtensaState 
*env,
 uint32_t wi;
 uint32_t ei;
 
-split_tlb_entry_spec(env, v, dtlb, , , );
-if (pwi) {
-*pwi = wi;
+if (split_tlb_entry_spec(env, v, dtlb, , , )) {
+if (pwi) {
+*pwi = wi;
+}
+return xtensa_tlb_get_entry(env, dtlb, wi, ei);
+} else {
+return NULL;
 }
-return xtensa_tlb_get_entry(env, dtlb, wi, ei);
 }
 
 static void xtensa_tlb_set_entry_mmu(const CPUXtensaState *env,
@@ -484,7 +496,12 @@ uint32_t HELPER(rtlb0)(CPUXtensaState *env, uint32_t v, 
uint32_t dtlb)
 if (xtensa_option_enabled(env->config, XTENSA_OPTION_MMU)) {
 uint32_t wi;
 const xtensa_tlb_entry *entry = get_tlb_entry(env, v, dtlb, );
-return (entry->vaddr & get_vpn_mask(env, dtlb, wi)) | entry->asid;
+
+if (entry) {
+return (entry->vaddr & get_vpn_mask(env, dtlb, wi)) | entry->asid;
+} else {
+return 0;
+}
 } else {
 return v & REGION_PAGE_MASK;
 }
@@ -493,7 +510,12 @@ uint32_t HELPER(rtlb0)(CPUXtensaState *env, uint32_t v, 
uint32_t dtlb)
 uint32_t HELPER(rtlb1)(CPUXtensaState *env, uint32_t v, uint32_t dtlb)
 {
 const xtensa_tlb_entry *entry = get_tlb_entry(env, v, dtlb, NULL);
-return entry->paddr | entry->attr;
+
+if (entry) {
+return entry->paddr | entry->attr;
+} else {
+return 0;
+}
 }
 
 void HELPER(itlb)(CPUXtensaState *env, uint32_t v, uint32_t dtlb)
@@ -501,7 +523,7 @@ void HELPER(itlb)(CPUXtensaState *env, uint32_t v, uint32_t 
dtlb)
 if (xtensa_option_enabled(env->config, XTENSA_OPTION_MMU)) {
 uint32_t wi;
 xtensa_tlb_entry *entry = get_tlb_entry(env, v, dtlb, );
-if (entry->variable && entry->asid) {
+if (entry && entry->variable && entry->asid) {
 tlb_flush_page(env_cpu(env), entry->vaddr);
 entry->asid = 0;
 }
@@ -539,8 +561,9 @@ void HELPER(wtlb)(CPUXtensaState *env, uint32_t p, uint32_t 
v, uint32_t dtlb)
 uint32_t vpn;
 uint32_t wi;
 uint32_t ei;
-split_tlb_entry_spec(env, v, dtlb, , , );
-xtensa_tlb_set_entry(env, dtlb, wi, ei, vpn, p);
+if (split_tlb_entry_spec(env, v, dtlb, , , )) {
+xtensa_tlb_set_entry(env, dtlb, wi, ei, vpn, p);
+}
 }
 
 /*!
-- 
2.39.2

[Stable-7.2.9 28/30] monitor: only run coroutine commands in qemu_aio_context

2024-01-28 Thread Michael Tokarev

From: Stefan Hajnoczi 

monitor_qmp_dispatcher_co() runs in the iohandler AioContext that is not
polled during nested event loops. The coroutine currently reschedules
itself in the main loop's qemu_aio_context AioContext, which is polled
during nested event loops. One known problem is that QMP device-add
calls drain_call_rcu(), which temporarily drops the BQL, leading to all
sorts of havoc like other vCPU threads re-entering device emulation code
while another vCPU thread is waiting in device emulation code with
aio_poll().

Paolo Bonzini suggested running non-coroutine QMP handlers in the
iohandler AioContext. This avoids trouble with nested event loops. His
original idea was to move coroutine rescheduling to
monitor_qmp_dispatch(), but I resorted to moving it to qmp_dispatch()
because we don't know if the QMP handler needs to run in coroutine
context in monitor_qmp_dispatch(). monitor_qmp_dispatch() would have
been nicer since it's associated with the monitor implementation and not
as general as qmp_dispatch(), which is also used by qemu-ga.

A number of qemu-iotests need updated .out files because the order of
QMP events vs QMP responses has changed.

Solves Issue #1933.

Cc: qemu-sta...@nongnu.org
Fixes: 7bed89958bfbf40df9ca681cefbdca63abdde39d ("device_core: use 
drain_call_rcu in in qmp_device_add")
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2215192
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2214985
Buglink: https://issues.redhat.com/browse/RHEL-17369
Signed-off-by: Stefan Hajnoczi 
Message-ID: <20240118144823.1497953-4-stefa...@redhat.com>
Reviewed-by: Kevin Wolf 
Tested-by: Fiona Ebner 
Signed-off-by: Kevin Wolf 
(cherry picked from commit effd60c878176bcaf97fa7ce2b12d04bb8ead6f7)
Signed-off-by: Michael Tokarev 
(Mjt: omit changes to tests missing in 7.2)

diff --git a/monitor/qmp.c b/monitor/qmp.c
index 092c527b6f..acd0a350c2 100644
--- a/monitor/qmp.c
+++ b/monitor/qmp.c
@@ -296,14 +296,6 @@ void coroutine_fn monitor_qmp_dispatcher_co(void *data)
 qemu_coroutine_yield();
 }
 
-/*
- * Move the coroutine from iohandler_ctx to qemu_aio_context for
- * executing the command handler so that it can make progress if it
- * involves an AIO_WAIT_WHILE().
- */
-aio_co_schedule(qemu_get_aio_context(), qmp_dispatcher_co);
-qemu_coroutine_yield();
-
 /* Process request */
 if (req_obj->req) {
 if (trace_event_get_state(TRACE_MONITOR_QMP_CMD_IN_BAND)) {
@@ -330,15 +322,6 @@ void coroutine_fn monitor_qmp_dispatcher_co(void *data)
 }
 
 qmp_request_free(req_obj);
-
-/*
- * Yield and reschedule so the main loop stays responsive.
- *
- * Move back to iohandler_ctx so that nested event loops for
- * qemu_aio_context don't start new monitor commands.
- */
-aio_co_schedule(iohandler_get_aio_context(), qmp_dispatcher_co);
-qemu_coroutine_yield();
 }
 }
 
diff --git a/qapi/qmp-dispatch.c b/qapi/qmp-dispatch.c
index 0990873ec8..5d000fae87 100644
--- a/qapi/qmp-dispatch.c
+++ b/qapi/qmp-dispatch.c
@@ -206,9 +206,31 @@ QDict *qmp_dispatch(const QmpCommandList *cmds, QObject 
*request,
 assert(!(oob && qemu_in_coroutine()));
 assert(monitor_cur() == NULL);
 if (!!(cmd->options & QCO_COROUTINE) == qemu_in_coroutine()) {
+if (qemu_in_coroutine()) {
+/*
+ * Move the coroutine from iohandler_ctx to qemu_aio_context for
+ * executing the command handler so that it can make progress if it
+ * involves an AIO_WAIT_WHILE().
+ */
+aio_co_schedule(qemu_get_aio_context(), qemu_coroutine_self());
+qemu_coroutine_yield();
+}
+
 monitor_set_cur(qemu_coroutine_self(), cur_mon);
 cmd->fn(args, , );
 monitor_set_cur(qemu_coroutine_self(), NULL);
+
+if (qemu_in_coroutine()) {
+/*
+ * Yield and reschedule so the main loop stays responsive.
+ *
+ * Move back to iohandler_ctx so that nested event loops for
+ * qemu_aio_context don't start new monitor commands.
+ */
+aio_co_schedule(iohandler_get_aio_context(),
+qemu_coroutine_self());
+qemu_coroutine_yield();
+}
 } else {
/*
 * Actual context doesn't match the one the command needs.
@@ -232,7 +254,7 @@ QDict *qmp_dispatch(const QmpCommandList *cmds, QObject 
*request,
 .errp   = ,
 .co = qemu_coroutine_self(),
 };
-aio_bh_schedule_oneshot(qemu_get_aio_context(), do_qmp_dispatch_bh,
+aio_bh_schedule_oneshot(iohandler_get_aio_context(), 
do_qmp_dispatch_bh,
 );
 qemu_coroutine_yield();
 }
diff --git a/tests/qemu-iotests/060.out b/tests/qemu-iotests/060.out
index 329977d9b9..a37bf446e9 100644
---

[Stable-7.2.9 22/30] accel/tcg: Revert mapping of PCREL translation block to multiple virtual addresses

2024-01-28 Thread Michael Tokarev

From: Paolo Bonzini 

This is causing regressions that have not been analyzed yet.  Revert the
change on stable branches.

Cc: qemu-sta...@nongnu.org
Cc: Michael Tokarev 
Related: https://gitlab.com/qemu-project/qemu/-/issues/2092
Signed-off-by: Paolo Bonzini 
Signed-off-by: Michael Tokarev 

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 356fe348de..68fef3e01f 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -186,7 +186,7 @@ static bool tb_lookup_cmp(const void *p, const void *d)
 const TranslationBlock *tb = p;
 const struct tb_desc *desc = d;
 
-if ((TARGET_TB_PCREL || tb_pc(tb) == desc->pc) &&
+if (tb_pc(tb) == desc->pc &&
 tb_page_addr0(tb) == desc->page_addr0 &&
 tb->cs_base == desc->cs_base &&
 tb->flags == desc->flags &&
@@ -238,7 +238,7 @@ static TranslationBlock *tb_htable_lookup(CPUState *cpu, 
target_ulong pc,
 return NULL;
 }
 desc.page_addr0 = phys_pc;
-h = tb_hash_func(phys_pc, (TARGET_TB_PCREL ? 0 : pc),
+h = tb_hash_func(phys_pc, pc,
  flags, cflags, *cpu->trace_dstate);
 return qht_lookup_custom(_ctx.htable, , h, tb_lookup_cmp);
 }
diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c
index 0cdb35548c..9d9f651c78 100644
--- a/accel/tcg/tb-maint.c
+++ b/accel/tcg/tb-maint.c
@@ -34,7 +34,7 @@ static bool tb_cmp(const void *ap, const void *bp)
 const TranslationBlock *a = ap;
 const TranslationBlock *b = bp;
 
-return ((TARGET_TB_PCREL || tb_pc(a) == tb_pc(b)) &&
+return (tb_pc(a) == tb_pc(b) &&
 a->cs_base == b->cs_base &&
 a->flags == b->flags &&
 (tb_cflags(a) & ~CF_INVALID) == (tb_cflags(b) & ~CF_INVALID) &&
@@ -269,7 +269,7 @@ static void do_tb_phys_invalidate(TranslationBlock *tb, 
bool rm_from_page_list)
 
 /* remove the TB from the hash list */
 phys_pc = tb_page_addr0(tb);
-h = tb_hash_func(phys_pc, (TARGET_TB_PCREL ? 0 : tb_pc(tb)),
+h = tb_hash_func(phys_pc, tb_pc(tb),
  tb->flags, orig_cflags, tb->trace_vcpu_dstate);
 if (!qht_remove(_ctx.htable, tb, h)) {
 return;
@@ -459,7 +459,7 @@ TranslationBlock *tb_link_page(TranslationBlock *tb, 
tb_page_addr_t phys_pc,
 }
 
 /* add in the hash table */
-h = tb_hash_func(phys_pc, (TARGET_TB_PCREL ? 0 : tb_pc(tb)),
+h = tb_hash_func(phys_pc, tb_pc(tb),
  tb->flags, tb->cflags, tb->trace_vcpu_dstate);
 qht_insert(_ctx.htable, tb, h, _tb);
 
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index ac3ee3740c..ed8ddee6e8 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -818,9 +818,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 
 gen_code_buf = tcg_ctx->code_gen_ptr;
 tb->tc.ptr = tcg_splitwx_to_rx(gen_code_buf);
-#if !TARGET_TB_PCREL
 tb->pc = pc;
-#endif
 tb->cs_base = cs_base;
 tb->flags = flags;
 tb->cflags = cflags;
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 9b7bfbf09a..db677c856b 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -503,7 +503,6 @@ struct tb_tc {
 };
 
 struct TranslationBlock {
-#if !TARGET_TB_PCREL
 /*
  * Guest PC corresponding to this block.  This must be the true
  * virtual address.  Therefore e.g. x86 stores EIP + CS_BASE, and
@@ -518,7 +517,6 @@ struct TranslationBlock {
  * deposited into the "current" PC.
  */
 target_ulong pc;
-#endif
 
 /*
  * Target-specific data associated with the TranslationBlock, e.g.:
@@ -604,11 +602,7 @@ struct TranslationBlock {
 /* Hide the read to avoid ifdefs for TARGET_TB_PCREL. */
 static inline target_ulong tb_pc(const TranslationBlock *tb)
 {
-#if TARGET_TB_PCREL
-qemu_build_not_reached();
-#else
 return tb->pc;
-#endif
 }
 
 /* Hide the qatomic_read to make code a little easier on the eyes */
-- 
2.39.2

[Stable-7.2.9 25/30] block/blklogwrites: Fix a bug when logging "write zeroes" operations.

2024-01-28 Thread Michael Tokarev

From: Ari Sundholm 

There is a bug in the blklogwrites driver pertaining to logging "write
zeroes" operations, causing log corruption. This can be easily observed
by setting detect-zeroes to something other than "off" for the driver.

The issue is caused by a concurrency bug pertaining to the fact that
"write zeroes" operations have to be logged in two parts: first the log
entry metadata, then the zeroed-out region. While the log entry
metadata is being written by bdrv_co_pwritev(), another operation may
begin in the meanwhile and modify the state of the blklogwrites driver.
This is as intended by the coroutine-driven I/O model in QEMU, of
course.

Unfortunately, this specific scenario is mishandled. A short example:
1. Initially, in the current operation (#1), the current log sector
number in the driver state is only incremented by the number of sectors
taken by the log entry metadata, after which the log entry metadata is
written. The current operation yields.
2. Another operation (#2) may start while the log entry metadata is
being written. It uses the current log position as the start offset for
its log entry. This is in the sector right after the operation #1 log
entry metadata, which is bad!
3. After bdrv_co_pwritev() returns (#1), the current log sector
number is reread from the driver state in order to find out the start
offset for bdrv_co_pwrite_zeroes(). This is an obvious blunder, as the
offset will be the sector right after the (misplaced) operation #2 log
entry, which means that the zeroed-out region begins at the wrong
offset.
4. As a result of the above, the log is corrupt.

Fix this by only reading the driver metadata once, computing the
offsets and sizes in one go (including the optional zeroed-out region)
and setting the log sector number to the appropriate value for the next
operation in line.

Signed-off-by: Ari Sundholm 
Cc: qemu-sta...@nongnu.org
Message-ID: <20240109184646.1128475-1-meg...@gmx.com>
Reviewed-by: Kevin Wolf 
Signed-off-by: Kevin Wolf 
(cherry picked from commit a9c8ea95470c27a8a02062b67f9fa6940e828ab6)
Signed-off-by: Michael Tokarev 

diff --git a/block/blklogwrites.c b/block/blklogwrites.c
index cef9efe55d..ad589c4a2e 100644
--- a/block/blklogwrites.c
+++ b/block/blklogwrites.c
@@ -321,22 +321,39 @@ typedef struct {
 static void coroutine_fn blk_log_writes_co_do_log(BlkLogWritesLogReq *lr)
 {
 BDRVBlkLogWritesState *s = lr->bs->opaque;
-uint64_t cur_log_offset = s->cur_log_sector << s->sectorbits;
 
-s->nr_entries++;
-s->cur_log_sector +=
-ROUND_UP(lr->qiov->size, s->sectorsize) >> s->sectorbits;
+/*
+ * Determine the offsets and sizes of different parts of the entry, and
+ * update the state of the driver.
+ *
+ * This needs to be done in one go, before any actual I/O is done, as the
+ * log entry may have to be written in two parts, and the state of the
+ * driver may be modified by other driver operations while waiting for the
+ * I/O to complete.
+ */
+const uint64_t entry_start_sector = s->cur_log_sector;
+const uint64_t entry_offset = entry_start_sector << s->sectorbits;
+const uint64_t qiov_aligned_size = ROUND_UP(lr->qiov->size, s->sectorsize);
+const uint64_t entry_aligned_size = qiov_aligned_size +
+ROUND_UP(lr->zero_size, s->sectorsize);
+const uint64_t entry_nr_sectors = entry_aligned_size >> s->sectorbits;
 
-lr->log_ret = bdrv_co_pwritev(s->log_file, cur_log_offset, lr->qiov->size,
+s->nr_entries++;
+s->cur_log_sector += entry_nr_sectors;
+
+/*
+ * Write the log entry. Note that if this is a "write zeroes" operation,
+ * only the entry header is written here, with the zeroing being done
+ * separately below.
+ */
+lr->log_ret = bdrv_co_pwritev(s->log_file, entry_offset, lr->qiov->size,
   lr->qiov, 0);
 
 /* Logging for the "write zeroes" operation */
 if (lr->log_ret == 0 && lr->zero_size) {
-cur_log_offset = s->cur_log_sector << s->sectorbits;
-s->cur_log_sector +=
-ROUND_UP(lr->zero_size, s->sectorsize) >> s->sectorbits;
+const uint64_t zeroes_offset = entry_offset + qiov_aligned_size;
 
-lr->log_ret = bdrv_co_pwrite_zeroes(s->log_file, cur_log_offset,
+lr->log_ret = bdrv_co_pwrite_zeroes(s->log_file, zeroes_offset,
 lr->zero_size, 0);
 }
 
-- 
2.39.2

[Stable-7.2.9 26/30] iotests: add filter_qmp_generated_node_ids()

2024-01-28 Thread Michael Tokarev

From: Stefan Hajnoczi 

Add a filter function for QMP responses that contain QEMU's
automatically generated node ids. The ids change between runs and must
be masked in the reference output.

The next commit will use this new function.

Signed-off-by: Stefan Hajnoczi 
Message-ID: <20240118144823.1497953-2-stefa...@redhat.com>
Reviewed-by: Kevin Wolf 
Signed-off-by: Kevin Wolf 
(cherry picked from commit da62b507a20510d819bcfbe8f5e573409b954006)
Signed-off-by: Michael Tokarev 
(Mjt: context fix in tests/qemu-iotests/iotests.py due to
 v7.2.0-939-gbcc6777ad6 "iotests: Filter child node information")

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index da7d6637e1..6f7c3c0e44 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -642,6 +642,13 @@ def _filter(_key, value):
 def filter_generated_node_ids(msg):
 return re.sub("#block[0-9]+", "NODE_NAME", msg)
 
+def filter_qmp_generated_node_ids(qmsg):
+def _filter(_key, value):
+if is_str(value):
+return filter_generated_node_ids(value)
+return value
+return filter_qmp(qmsg, _filter)
+
 def filter_img_info(output, filename):
 lines = []
 for line in output.split('\n'):
-- 
2.39.2

[Stable-7.2.9 21/30] readthodocs: fully specify a build environment

2024-01-28 Thread Michael Tokarev

From: Alex Bennée 

This is now expected by rtd so I've expanded using their example as
22.04 is one of our supported platforms. I tried to work out if there
was an easy way to re-generate a requirements.txt from our
pythondeps.toml but in the end went for the easier solution.

Cc:  
Signed-off-by: Alex Bennée 
Message-Id: <20231221174200.2693694-1-alex.ben...@linaro.org>
(cherry picked from commit b16a45bc5e0e329a16af8a2e020a6e7044f9afa2)
Signed-off-by: Michael Tokarev 

diff --git a/.readthedocs.yml b/.readthedocs.yml
index 7fb7b8dd61..0b262469ce 100644
--- a/.readthedocs.yml
+++ b/.readthedocs.yml
@@ -5,16 +5,21 @@
 # Required
 version: 2
 
+# Set the version of Python and other tools you might need
+build:
+  os: ubuntu-22.04
+  tools:
+python: "3.11"
+
 # Build documentation in the docs/ directory with Sphinx
 sphinx:
   configuration: docs/conf.py
 
+# We recommend specifying your dependencies to enable reproducible builds:
+# https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
+python:
+  install:
+- requirements: docs/requirements.txt
+
 # We want all the document formats
 formats: all
-
-# For consistency, we require that QEMU's Sphinx extensions
-# run with at least the same minimum version of Python that
-# we require for other Python in our codebase (our conf.py
-# enforces this, and some code needs it.)
-python:
-  version: 3.6
diff --git a/docs/requirements.txt b/docs/requirements.txt
new file mode 100644
index 00..691e5218ec
--- /dev/null
+++ b/docs/requirements.txt
@@ -0,0 +1,2 @@
+sphinx==5.3.0
+sphinx_rtd_theme==1.1.1
-- 
2.39.2

Re: [PATCH 07/33] target: Uninline cpu_mmu_index()

2024-01-28 Thread Philippe Mathieu-Daudé


On 28/1/24 17:41, Philippe Mathieu-Daudé wrote:

On 28/1/24 05:41, Richard Henderson wrote:

From: Anton Johansson 

Uninlines the target-defined cpu_mmu_index() function by moving its
definition to target/*/cpu.c.  This allows for compiling memory access
functions in accel/tcg/cputlb.c without having to know target specifics.

Signed-off-by: Anton Johansson 
Message-Id: <20240119144024.14289-13-a...@rev.ng>
Reviewed-by: Richard Henderson 
Signed-off-by: Richard Henderson 
---
  include/exec/cpu-common.h | 10 ++
  target/alpha/cpu.h    |  9 -
  target/arm/cpu.h  | 13 -
  target/avr/cpu.h  |  7 ---
  target/cris/cpu.h |  4 
  target/hexagon/cpu.h  |  9 -
  target/hppa/cpu.h | 13 -
  target/i386/cpu.h |  7 ---
  target/loongarch/cpu.h    | 12 
  target/m68k/cpu.h |  4 
  target/microblaze/cpu.h   | 15 ---
  target/mips/cpu.h |  5 -
  target/nios2/cpu.h    |  6 --
  target/openrisc/cpu.h | 12 
  target/ppc/cpu.h  |  8 
  target/riscv/cpu.h    |  3 ---
  target/rx/cpu.h   |  5 -
  target/s390x/cpu.h    | 31 ---
  target/sh4/cpu.h  | 10 --
  target/sparc/cpu.h    | 28 
  target/tricore/cpu.h  |  5 -
  target/xtensa/cpu.h   |  5 -
  target/alpha/cpu.c    |  8 
  target/arm/cpu.c  |  5 +
  target/avr/cpu.c  |  5 +
  target/cris/cpu.c |  4 
  target/hexagon/cpu.c  |  9 +
  target/hppa/cpu.c | 13 +
  target/i386/cpu.c |  7 +++
  target/loongarch/cpu.c    | 12 
  target/m68k/cpu.c |  5 +
  target/microblaze/cpu.c   | 16 
  target/mips/cpu.c |  5 +
  target/nios2/cpu.c    |  6 ++
  target/openrisc/cpu.c | 12 
  target/ppc/cpu.c  |  9 +
  target/riscv/cpu_helper.c |  2 +-
  target/rx/cpu.c   |  5 +
  target/s390x/cpu.c    | 31 +++
  target/sh4/cpu.c  | 13 +
  target/sparc/cpu.c    | 28 
  target/tricore/cpu.c  |  5 +
  target/xtensa/cpu.c   |  4 
  43 files changed, 213 insertions(+), 212 deletions(-)

diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 3109c6b67d..4724135f30 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -34,6 +34,16 @@ void cpu_list_lock(void);
  void cpu_list_unlock(void);
  unsigned int cpu_list_generation_id_get(void);
+/**
+ * cpu_mmu_index:
+ * @env: The cpu environment
+ * @ifetch: True for code access, false for data access.
+ *
+ * Return the core mmu index for the current translation regime.
+ * This function is used by generic TCG code paths.
+ */
+int cpu_mmu_index(CPUArchState *env, bool ifetch);
+
  void tcg_iommu_init_notifier_list(CPUState *cpu);
  void tcg_iommu_free_notifier_list(CPUState *cpu);


I'm kind of reluctant to use CPUArchState in a -common.h header
(except in include/hw/core/cpu.h::cpu_env).

Last Wednesday community call I mentioned to Anton I have a branch
going in the same direction he is taking, and suggested him to wait
to compare and unify our works.


My bad I suppose, I should have replied to Anton series cover
to update you about the plan discussed during that call.

Also, if "This allows for compiling memory access functions in
accel/tcg/cputlb.c without having to know target specifics." I'd
rather see the corresponding meson change in the same patch.

Re: [PATCH 07/33] target: Uninline cpu_mmu_index()

2024-01-28 Thread Philippe Mathieu-Daudé


On 28/1/24 05:41, Richard Henderson wrote:

From: Anton Johansson 

Uninlines the target-defined cpu_mmu_index() function by moving its
definition to target/*/cpu.c.  This allows for compiling memory access
functions in accel/tcg/cputlb.c without having to know target specifics.

Signed-off-by: Anton Johansson 
Message-Id: <20240119144024.14289-13-a...@rev.ng>
Reviewed-by: Richard Henderson 
Signed-off-by: Richard Henderson 
---
  include/exec/cpu-common.h | 10 ++
  target/alpha/cpu.h|  9 -
  target/arm/cpu.h  | 13 -
  target/avr/cpu.h  |  7 ---
  target/cris/cpu.h |  4 
  target/hexagon/cpu.h  |  9 -
  target/hppa/cpu.h | 13 -
  target/i386/cpu.h |  7 ---
  target/loongarch/cpu.h| 12 
  target/m68k/cpu.h |  4 
  target/microblaze/cpu.h   | 15 ---
  target/mips/cpu.h |  5 -
  target/nios2/cpu.h|  6 --
  target/openrisc/cpu.h | 12 
  target/ppc/cpu.h  |  8 
  target/riscv/cpu.h|  3 ---
  target/rx/cpu.h   |  5 -
  target/s390x/cpu.h| 31 ---
  target/sh4/cpu.h  | 10 --
  target/sparc/cpu.h| 28 
  target/tricore/cpu.h  |  5 -
  target/xtensa/cpu.h   |  5 -
  target/alpha/cpu.c|  8 
  target/arm/cpu.c  |  5 +
  target/avr/cpu.c  |  5 +
  target/cris/cpu.c |  4 
  target/hexagon/cpu.c  |  9 +
  target/hppa/cpu.c | 13 +
  target/i386/cpu.c |  7 +++
  target/loongarch/cpu.c| 12 
  target/m68k/cpu.c |  5 +
  target/microblaze/cpu.c   | 16 
  target/mips/cpu.c |  5 +
  target/nios2/cpu.c|  6 ++
  target/openrisc/cpu.c | 12 
  target/ppc/cpu.c  |  9 +
  target/riscv/cpu_helper.c |  2 +-
  target/rx/cpu.c   |  5 +
  target/s390x/cpu.c| 31 +++
  target/sh4/cpu.c  | 13 +
  target/sparc/cpu.c| 28 
  target/tricore/cpu.c  |  5 +
  target/xtensa/cpu.c   |  4 
  43 files changed, 213 insertions(+), 212 deletions(-)

diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 3109c6b67d..4724135f30 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -34,6 +34,16 @@ void cpu_list_lock(void);
  void cpu_list_unlock(void);
  unsigned int cpu_list_generation_id_get(void);
  
+/**

+ * cpu_mmu_index:
+ * @env: The cpu environment
+ * @ifetch: True for code access, false for data access.
+ *
+ * Return the core mmu index for the current translation regime.
+ * This function is used by generic TCG code paths.
+ */
+int cpu_mmu_index(CPUArchState *env, bool ifetch);
+
  void tcg_iommu_init_notifier_list(CPUState *cpu);
  void tcg_iommu_free_notifier_list(CPUState *cpu);


I'm kind of reluctant to use CPUArchState in a -common.h header
(except in include/hw/core/cpu.h::cpu_env).

Last Wednesday community call I mentioned to Anton I have a branch
going in the same direction he is taking, and suggested him to wait
to compare and unify our works.

Re: [PATCH 00/33] tcg patch queue, pre-pull

2024-01-28 Thread Philippe Mathieu-Daudé


On 28/1/24 05:41, Richard Henderson wrote:

Collect some patch sets, cherry-pick from others, with a few
changes of my own.  Patches that lack review:

   26-include-qemu-Add-TCGCPUOps-typedef-to-typedefs.h.patch
   27-target-loongarch-Constify-loongarch_tcg_ops.patch
   28-accel-tcg-Use-CPUState.cc-instead-of-CPU_GET_CLASS-i.patch
   31-accel-tcg-Inline-need_replay_interrupt.patch


r~


Anton Johansson (11):
   include/exec: Move vaddr defines to separate file
   hw/core: Include vaddr.h from cpu.h
   target: Use vaddr in gen_intermediate_code
   include/exec: Use vaddr in DisasContextBase for virtual addresses
   include/exec: typedef abi_ptr to vaddr
   target: Uninline cpu_mmu_index()
   target: Uninline cpu_get_tb_cpu_state()
   include/exec: Move PAGE_* macros to common header
   include/exec: Move cpu_*()/cpu_env() to common header
   include/hw/core: Move do_interrupt in TCGCPUOps
   include/hw/core: Remove i386 conditional on fake_user_interrupt


Nitpick: preferably s/^include\// in subject :)

Re: [PATCH 10/33] include/exec: Move cpu_*()/cpu_env() to common header

2024-01-28 Thread Philippe Mathieu-Daudé


On 28/1/24 05:41, Richard Henderson wrote:

From: Anton Johansson 

Functions are target independent.

Signed-off-by: Anton Johansson 
Message-Id: <20240119144024.14289-17-a...@rev.ng>
Reviewed-by: Richard Henderson 
Signed-off-by: Richard Henderson 
---
  include/exec/cpu-all.h| 25 -
  include/exec/cpu-common.h | 26 ++
  2 files changed, 26 insertions(+), 25 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH 16/33] accel/tcg/cpu-exec: Use RCU_READ_LOCK_GUARD

2024-01-28 Thread Philippe Mathieu-Daudé


On 28/1/24 05:41, Richard Henderson wrote:

From: Philippe Mathieu-Daudé 

Replace the manual rcu_read_(un)lock calls in cpu_exec().

Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <20240124074201.8239-2-phi...@linaro.org>
[rth: Use RCU_READ_LOCK_GUARD not WITH_RCU_READ_LOCK_GUARD]


Reviewed-by: Philippe Mathieu-Daudé 


Signed-off-by: Richard Henderson 
---
  accel/tcg/cpu-exec.c | 4 +---
  1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 40c268bfa1..950dad63cb 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -1050,7 +1050,7 @@ int cpu_exec(CPUState *cpu)
  return EXCP_HALTED;
  }
  
-rcu_read_lock();

+RCU_READ_LOCK_GUARD();
  cpu_exec_enter(cpu);
  
  /*

@@ -1064,8 +1064,6 @@ int cpu_exec(CPUState *cpu)
  ret = cpu_exec_setjmp(cpu, );
  
  cpu_exec_exit(cpu);

-rcu_read_unlock();
-
  return ret;
  }

Re: [PATCH 26/33] include/qemu: Add TCGCPUOps typedef to typedefs.h

2024-01-28 Thread Philippe Mathieu-Daudé


On 28/1/24 05:42, Richard Henderson wrote:

QEMU coding style recommends using structure typedefs.

Signed-off-by: Richard Henderson 
---
  include/hw/core/cpu.h  | 5 +
  include/qemu/typedefs.h| 1 +
  bsd-user/signal.c  | 4 ++--
  linux-user/signal.c| 4 ++--
  target/alpha/cpu.c | 2 +-
  target/arm/cpu.c   | 2 +-
  target/arm/tcg/cpu32.c | 2 +-
  target/avr/cpu.c   | 2 +-
  target/cris/cpu.c  | 4 ++--
  target/hexagon/cpu.c   | 2 +-
  target/hppa/cpu.c  | 2 +-
  target/i386/tcg/tcg-cpu.c  | 2 +-
  target/loongarch/cpu.c | 2 +-
  target/m68k/cpu.c  | 2 +-
  target/microblaze/cpu.c| 2 +-
  target/mips/cpu.c  | 2 +-
  target/nios2/cpu.c | 2 +-
  target/openrisc/cpu.c  | 2 +-
  target/ppc/cpu_init.c  | 2 +-
  target/riscv/tcg/tcg-cpu.c | 2 +-
  target/rx/cpu.c| 2 +-
  target/s390x/cpu.c | 2 +-
  target/sh4/cpu.c   | 2 +-
  target/sparc/cpu.c | 2 +-
  target/tricore/cpu.c   | 2 +-
  target/xtensa/cpu.c| 2 +-
  26 files changed, 29 insertions(+), 31 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH v2 10/23] target/i386: Prefer fast cpu_env() over slower CPU QOM cast macro

2024-01-28 Thread Philippe Mathieu-Daudé


On 27/1/24 13:21, Zhao Liu wrote:

Hi Philippe,

On Fri, Jan 26, 2024 at 11:03:52PM +0100, Philippe Mathieu-Daudé wrote:

Date: Fri, 26 Jan 2024 23:03:52 +0100
From: Philippe Mathieu-Daudé 
Subject: [PATCH v2 10/23] target/i386: Prefer fast cpu_env() over slower
  CPU QOM cast macro
X-Mailer: git-send-email 2.41.0

Mechanical patch produced running the command documented
in scripts/coccinelle/cpu_env.cocci_template header.

Signed-off-by: Philippe Mathieu-Daudé 
---
  target/i386/hvf/vmx.h   | 13 +++---
  hw/i386/vmmouse.c   |  6 ++---
  hw/i386/xen/xen-hvm.c   |  3 +--
  target/i386/arch_memory_mapping.c   |  3 +--
  target/i386/cpu-dump.c  |  3 +--
  target/i386/cpu.c   | 37 +--
  target/i386/helper.c| 39 -
  target/i386/hvf/hvf.c   |  8 ++
  target/i386/hvf/x86.c   |  4 +--
  target/i386/hvf/x86_emu.c   |  6 ++---
  target/i386/hvf/x86_task.c  | 10 +++-
  target/i386/hvf/x86hvf.c|  6 ++---
  target/i386/kvm/kvm.c   |  6 ++---
  target/i386/kvm/xen-emu.c   | 32 ---
  target/i386/tcg/sysemu/bpt_helper.c |  3 +--
  target/i386/tcg/tcg-cpu.c   | 14 +++
  target/i386/tcg/user/excp_helper.c  |  3 +--
  target/i386/tcg/user/seg_helper.c   |  3 +--
  18 files changed, 59 insertions(+), 140 deletions(-)



[snip]


diff --git a/target/i386/hvf/x86hvf.c b/target/i386/hvf/x86hvf.c
index 3b1ef5f49a..1e7fd587fe 100644
--- a/target/i386/hvf/x86hvf.c
+++ b/target/i386/hvf/x86hvf.c
@@ -238,8 +238,7 @@ void hvf_get_msrs(CPUState *cs)
  
  int hvf_put_registers(CPUState *cs)

  {
-X86CPU *x86cpu = X86_CPU(cs);
-CPUX86State *env = >env;
+CPUX86State *env = cpu_env(cs);
  
  wreg(cs->accel->fd, HV_X86_RAX, env->regs[R_EAX]);

  wreg(cs->accel->fd, HV_X86_RBX, env->regs[R_EBX]);
@@ -282,8 +281,7 @@ int hvf_put_registers(CPUState *cs)
  
  int hvf_get_registers(CPUState *cs)

  {
-X86CPU *x86cpu = X86_CPU(cs);
-CPUX86State *env = >env;
+CPUX86State *env = cpu_env(cs);
  
  env->regs[R_EAX] = rreg(cs->accel->fd, HV_X86_RAX);

  env->regs[R_EBX] = rreg(cs->accel->fd, HV_X86_RBX);


In this file, there's another corner case:

diff --git a/target/i386/hvf/x86hvf.c b/target/i386/hvf/x86hvf.c
index 3b1ef5f49a8a..9a145aa5aa4f 100644
--- a/target/i386/hvf/x86hvf.c
+++ b/target/i386/hvf/x86hvf.c
@@ -342,8 +342,7 @@ void vmx_clear_int_window_exiting(CPUState *cs)

  bool hvf_inject_interrupts(CPUState *cs)
  {
-X86CPU *x86cpu = X86_CPU(cs);
-CPUX86State *env = >env;
+CPUX86State *env = cpu_env(cs);

  uint8_t vector;
  uint64_t intr_type;
@@ -408,7 +407,7 @@ bool hvf_inject_interrupts(CPUState *cs)
  if (!(env->hflags & HF_INHIBIT_IRQ_MASK) &&
  (cs->interrupt_request & CPU_INTERRUPT_HARD) &&
  (env->eflags & IF_MASK) && !(info & VMCS_INTR_VALID)) {
-int line = cpu_get_pic_interrupt(>env);
+int line = cpu_get_pic_interrupt(env);
  cs->interrupt_request &= ~CPU_INTERRUPT_HARD;
  if (line >= 0) {
  wvmcs(cs->accel->fd, VMCS_ENTRY_INTR_INFO, line |


For this special case, I'm not sure if the script can cover it as well,
otherwise maybe it's OK to be cleaned up manually ;-).


BTW I forgot to mention I had to skip target/i386/tcg/translate.c
(7100 LoC) because it is too complex for Coccinelle.

Re: [PATCH 04/17] migration/multifd: Set p->running = true in the right place

2024-01-28 Thread Avihai Horon




On 25/01/2024 22:57, Fabiano Rosas wrote:

External email: Use caution opening links or attachments


Avihai Horon  writes:


The commit in the fixes line moved multifd thread creation to a
different location, but forgot to move the p->running = true assignment
as well. Thus, p->running is set to true before multifd thread is
actually created.

p->running is used in multifd_save_cleanup() to decide whether to join
the multifd thread or not.

With TLS, an error in multifd_tls_channel_connect() can lead to a
segmentation fault because p->running is true but p->thread is never
initialized, so multifd_save_cleanup() tries to join an uninitialized
thread.

Fix it by moving p->running = true assignment right after multifd thread
creation. Also move qio_channel_set_delay() to there, as this is where
it used to be originally.

Fixes: 29647140157a ("migration/tls: add support for multifd tls-handshake")
Signed-off-by: Avihai Horon 

Just for context, I haven't looked at this patch yet, but we were
planning to remove p->running altogether:

https://lore.kernel.org/r/20231110200241.20679-1-faro...@suse.de


Thanks for putting me in the picture.
I see that there has been a discussion about the multifd 
creation/treadown flow.
In light of this discussion, I can already see a few problems in my 
series that I didn't notice before (such as the TLS handshake thread leak).
The thread you mentioned here and some of my patches point out some 
problems in multifd creation/treardown. I guess we can discuss it and 
see what's the best way to solve them.


Regarding this patch, your solution indeed solves the bug that this 
patch addresses, so maybe this could be dropped (or only noted in your 
patch).


Maybe I should also put you (and Peter) in context for this whole series 
-- I am writing it as preparation for adding a separate migration 
channel for VFIO device migration, so VFIO devices could be migrated in 
parallel.

So this series tries to lay down some foundations to facilitate it.

Re: [PATCH v3 00/46] Rework matching of network devices to -nic options

2024-01-28 Thread David Woodhouse

On Fri, 2024-01-26 at 17:51 +0100, Thomas Huth wrote:
> On 25/01/2024 01.38, Jason Wang wrote:
> > On Wed, Jan 24, 2024 at 9:14 PM David Woodhouse  wrote:
> > > 
> > > Hi Jason,
> > > 
> > > I think this series probably lives or dies with you. I think it's a
> > > worthwhile cleanup, but I no longer have an immediate need for it; I
> > > shipped a slightly ugly workaround in QEMU 8.2.
> > > 
> > > Please could you let me know if it's worth persisting with it?
> > 
> > Yes it is.
> 
> Agreed! It would be great of getting rid of the ugly global nd_table[] 
> finally!
> 
> > Thomas, I remember you've done tweaks for -nic in the past. would you
> > like to review this series?
> 
> I tried to skim through the patches today, some nits here and there, but 
> nothing sever, so IMHO it should be fine once they are fixed.

Thank you for the reviews! I think I've covered everything in v4 which
I posted yesterday, except possible one Reviewed-by: tag went AWOL
which I'll round up for next time.


smime.p7s
Description: S/MIME cryptographic signature

Re: [PATCH v2 2/2] e1000e: fix link state on resume

2024-01-28 Thread Akihiko Odaki


On 2024/01/24 19:40, Laurent Vivier wrote:

On resume e1000e_vm_state_change() always calls e1000e_autoneg_resume()
that sets link_down to false, and thus activates the link even
if we have disabled it.

The problem can be reproduced starting qemu in paused state (-S) and
then set the link to down. When we resume the machine the link appears
to be up.

Reproducer:

# qemu-system-x86_64 ... -device e1000e,netdev=netdev0,id=net0 -S

{"execute": "qmp_capabilities" }
{"execute": "set_link", "arguments": {"name": "net0", "up": false}}
{"execute": "cont" }

To fix the problem, merge the content of e1000e_vm_state_change()
into e1000e_core_post_load() as e1000 does.

Buglink: https://issues.redhat.com/browse/RHEL-21867
Fixes: 6f3fbe4ed06a ("net: Introduce e1000e device emulation")
Suggested-by: Akihiko Odaki 
Signed-off-by: Laurent Vivier 
---

Notes:
 v2: Add Fixes: and a comment about e1000e_intrmgr_resume() purpose.


Thanks for taking care of this.

Reviewed-by: Akihiko Odaki

Re: [PATCH v2 1/2] igb: fix link state on resume

2024-01-28 Thread Akihiko Odaki


On 2024/01/24 19:29, Laurent Vivier wrote:

On resume igb_vm_state_change() always calls igb_autoneg_resume()
that sets link_down to false, and thus activates the link even
if we have disabled it.

The problem can be reproduced starting qemu in paused state (-S) and
then set the link to down. When we resume the machine the link appears
to be up.

Reproducer:

# qemu-system-x86_64 ... -device igb,netdev=netdev0,id=net0 -S

{"execute": "qmp_capabilities" }
{"execute": "set_link", "arguments": {"name": "net0", "up": false}}
{"execute": "cont" }

To fix the problem, merge the content of igb_vm_state_change()
into igb_core_post_load() as e1000 does.

Buglink: https://issues.redhat.com/browse/RHEL-21867
Fixes: 3a977deebe6b ("Intrdocue igb device emulation")
Cc: akihiko.od...@daynix.com
Suggested-by: Akihiko Odaki 
Signed-off-by: Laurent Vivier 
---

Notes:
 v2: Add Fixes: and a comment about igb_intrmgr_resume() purpose.


Reviewed-by: Akihiko Odaki

Re: [PATCH v2 21/23] target/tricore: Prefer fast cpu_env() over slower CPU QOM cast macro

2024-01-28 Thread Bastian Koppelmann

On Fri, Jan 26, 2024 at 11:04:03PM +0100, Philippe Mathieu-Daudé wrote:
> Mechanical patch produced running the command documented
> in scripts/coccinelle/cpu_env.cocci_template header.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  target/tricore/cpu.c   | 20 
>  target/tricore/gdbstub.c   |  6 ++
>  target/tricore/helper.c|  3 +--
>  target/tricore/translate.c |  3 +--
>  4 files changed, 8 insertions(+), 24 deletions(-)

Reviewed-by: Bastian Koppelmann 

Cheers,
Bastian

[PATCH v10 3/3] target/riscv: Validate misa_mxl_max only once

2024-01-28 Thread Akihiko Odaki

misa_mxl_max is now a class member and initialized only once for each
class. This also moves the initialization of gdb_core_xml_file which
will be referenced before realization in the future.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Alistair Francis 
---
 target/riscv/cpu.c | 21 +
 target/riscv/tcg/tcg-cpu.c | 23 ---
 2 files changed, 21 insertions(+), 23 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 4b742901e76e..4425bee1275e 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1292,6 +1292,26 @@ static const MISAExtInfo misa_ext_info_arr[] = {
 MISA_EXT_INFO(RVG, "g", "General purpose (IMAFD_Zicsr_Zifencei)"),
 };
 
+static void riscv_cpu_validate_misa_mxl(RISCVCPUClass *mcc)
+{
+CPUClass *cc = CPU_CLASS(mcc);
+
+/* Validate that MISA_MXL is set properly. */
+switch (mcc->misa_mxl_max) {
+#ifdef TARGET_RISCV64
+case MXL_RV64:
+case MXL_RV128:
+cc->gdb_core_xml_file = "riscv-64bit-cpu.xml";
+break;
+#endif
+case MXL_RV32:
+cc->gdb_core_xml_file = "riscv-32bit-cpu.xml";
+break;
+default:
+g_assert_not_reached();
+}
+}
+
 static int riscv_validate_misa_info_idx(uint32_t bit)
 {
 int idx;
@@ -1833,6 +1853,7 @@ static void riscv_cpu_class_init(ObjectClass *c, void 
*data)
 RISCVCPUClass *mcc = RISCV_CPU_CLASS(c);
 
 mcc->misa_mxl_max = (uint32_t)(uintptr_t)data;
+riscv_cpu_validate_misa_mxl(mcc);
 }
 
 static void riscv_isa_string_ext(RISCVCPU *cpu, char **isa_str,
diff --git a/target/riscv/tcg/tcg-cpu.c b/target/riscv/tcg/tcg-cpu.c
index 20062acd0f0b..df198ee3a312 100644
--- a/target/riscv/tcg/tcg-cpu.c
+++ b/target/riscv/tcg/tcg-cpu.c
@@ -268,27 +268,6 @@ static void riscv_cpu_validate_misa_priv(CPURISCVState 
*env, Error **errp)
 }
 }
 
-static void riscv_cpu_validate_misa_mxl(RISCVCPU *cpu)
-{
-RISCVCPUClass *mcc = RISCV_CPU_GET_CLASS(cpu);
-CPUClass *cc = CPU_CLASS(mcc);
-
-/* Validate that MISA_MXL is set properly. */
-switch (mcc->misa_mxl_max) {
-#ifdef TARGET_RISCV64
-case MXL_RV64:
-case MXL_RV128:
-cc->gdb_core_xml_file = "riscv-64bit-cpu.xml";
-break;
-#endif
-case MXL_RV32:
-cc->gdb_core_xml_file = "riscv-32bit-cpu.xml";
-break;
-default:
-g_assert_not_reached();
-}
-}
-
 static void riscv_cpu_validate_priv_spec(RISCVCPU *cpu, Error **errp)
 {
 CPURISCVState *env = >env;
@@ -935,8 +914,6 @@ static bool tcg_cpu_realize(CPUState *cs, Error **errp)
 return false;
 }
 
-riscv_cpu_validate_misa_mxl(cpu);
-
 #ifndef CONFIG_USER_ONLY
 CPURISCVState *env = >env;
 Error *local_err = NULL;

-- 
2.43.0

[PATCH v10 2/3] target/riscv: Move misa_mxl_max to class

2024-01-28 Thread Akihiko Odaki

misa_mxl_max is common for all instances of a RISC-V CPU class so they
are better put into class.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Alistair Francis 
---
 target/riscv/cpu.h |   4 +-
 hw/riscv/boot.c|   3 +-
 target/riscv/cpu.c | 162 -
 target/riscv/gdbstub.c |  12 ++--
 target/riscv/kvm/kvm-cpu.c |  10 +--
 target/riscv/machine.c |   7 +-
 target/riscv/tcg/tcg-cpu.c |  12 ++--
 target/riscv/translate.c   |   3 +-
 8 files changed, 114 insertions(+), 99 deletions(-)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 5f3955c38db4..d269d53e59c6 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -185,7 +185,6 @@ struct CPUArchState {
 
 /* RISCVMXL, but uint32_t for vmstate migration */
 uint32_t misa_mxl;  /* current mxl */
-uint32_t misa_mxl_max;  /* max mxl for this cpu */
 uint32_t misa_ext;  /* current extensions */
 uint32_t misa_ext_mask; /* max ext for this cpu */
 uint32_t xl;/* current xlen */
@@ -466,6 +465,7 @@ struct RISCVCPUClass {
 
 DeviceRealize parent_realize;
 ResettablePhases parent_phases;
+uint32_t misa_mxl_max;  /* max mxl for this cpu */
 };
 
 static inline int riscv_has_ext(CPURISCVState *env, target_ulong ext)
@@ -771,7 +771,7 @@ enum riscv_pmu_event_idx {
 /* used by tcg/tcg-cpu.c*/
 void isa_ext_update_enabled(RISCVCPU *cpu, uint32_t ext_offset, bool en);
 bool isa_ext_is_enabled(RISCVCPU *cpu, uint32_t ext_offset);
-void riscv_cpu_set_misa(CPURISCVState *env, RISCVMXL mxl, uint32_t ext);
+void riscv_cpu_set_misa_ext(CPURISCVState *env, uint32_t ext);
 
 typedef struct RISCVCPUMultiExtConfig {
 const char *name;
diff --git a/hw/riscv/boot.c b/hw/riscv/boot.c
index 0ffca05189f0..12f9792245a4 100644
--- a/hw/riscv/boot.c
+++ b/hw/riscv/boot.c
@@ -36,7 +36,8 @@
 
 bool riscv_is_32bit(RISCVHartArrayState *harts)
 {
-return harts->harts[0].env.misa_mxl_max == MXL_RV32;
+RISCVCPUClass *mcc = RISCV_CPU_GET_CLASS(>harts[0]);
+return mcc->misa_mxl_max == MXL_RV32;
 }
 
 /*
diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 8cbfc7e781ad..4b742901e76e 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -281,9 +281,8 @@ const char *riscv_cpu_get_trap_name(target_ulong cause, 
bool async)
 }
 }
 
-void riscv_cpu_set_misa(CPURISCVState *env, RISCVMXL mxl, uint32_t ext)
+void riscv_cpu_set_misa_ext(CPURISCVState *env, uint32_t ext)
 {
-env->misa_mxl_max = env->misa_mxl = mxl;
 env->misa_ext_mask = env->misa_ext = ext;
 }
 
@@ -396,11 +395,7 @@ static void riscv_any_cpu_init(Object *obj)
 {
 RISCVCPU *cpu = RISCV_CPU(obj);
 CPURISCVState *env = >env;
-#if defined(TARGET_RISCV32)
-riscv_cpu_set_misa(env, MXL_RV32, RVI | RVM | RVA | RVF | RVD | RVC | RVU);
-#elif defined(TARGET_RISCV64)
-riscv_cpu_set_misa(env, MXL_RV64, RVI | RVM | RVA | RVF | RVD | RVC | RVU);
-#endif
+riscv_cpu_set_misa_ext(env, RVI | RVM | RVA | RVF | RVD | RVC | RVU);
 
 #ifndef CONFIG_USER_ONLY
 set_satp_mode_max_supported(RISCV_CPU(obj),
@@ -421,16 +416,14 @@ static void riscv_max_cpu_init(Object *obj)
 {
 RISCVCPU *cpu = RISCV_CPU(obj);
 CPURISCVState *env = >env;
-RISCVMXL mlx = MXL_RV64;
 
-#ifdef TARGET_RISCV32
-mlx = MXL_RV32;
-#endif
-riscv_cpu_set_misa(env, mlx, 0);
 env->priv_ver = PRIV_VERSION_LATEST;
 #ifndef CONFIG_USER_ONLY
-set_satp_mode_max_supported(RISCV_CPU(obj), mlx == MXL_RV32 ?
-VM_1_10_SV32 : VM_1_10_SV57);
+#ifdef TARGET_RISCV32
+set_satp_mode_max_supported(cpu, VM_1_10_SV32);
+#else
+set_satp_mode_max_supported(cpu, VM_1_10_SV57);
+#endif
 #endif
 }
 
@@ -438,8 +431,6 @@ static void riscv_max_cpu_init(Object *obj)
 static void rv64_base_cpu_init(Object *obj)
 {
 CPURISCVState *env = _CPU(obj)->env;
-/* We set this in the realise function */
-riscv_cpu_set_misa(env, MXL_RV64, 0);
 /* Set latest version of privileged specification */
 env->priv_ver = PRIV_VERSION_LATEST;
 #ifndef CONFIG_USER_ONLY
@@ -451,8 +442,7 @@ static void rv64_sifive_u_cpu_init(Object *obj)
 {
 RISCVCPU *cpu = RISCV_CPU(obj);
 CPURISCVState *env = >env;
-riscv_cpu_set_misa(env, MXL_RV64,
-   RVI | RVM | RVA | RVF | RVD | RVC | RVS | RVU);
+riscv_cpu_set_misa_ext(env, RVI | RVM | RVA | RVF | RVD | RVC | RVS | RVU);
 env->priv_ver = PRIV_VERSION_1_10_0;
 #ifndef CONFIG_USER_ONLY
 set_satp_mode_max_supported(RISCV_CPU(obj), VM_1_10_SV39);
@@ -470,7 +460,7 @@ static void rv64_sifive_e_cpu_init(Object *obj)
 CPURISCVState *env = _CPU(obj)->env;
 RISCVCPU *cpu = RISCV_CPU(obj);
 
-riscv_cpu_set_misa(env, MXL_RV64, RVI | RVM | RVA | RVC | RVU);
+riscv_cpu_set_misa_ext(env, RVI | RVM | RVA | RVC | RVU);
 env->priv_ver = PRIV_VERSION_1_10_0;
 #ifndef CONFIG_USER_ONLY
 set_satp_mode_max_supported(cpu, VM_1_10_MBARE);
@@ -487,7 +477,7 @@ static void

[PATCH v10 1/3] target/riscv: Remove misa_mxl validation

2024-01-28 Thread Akihiko Odaki

It is initialized with a simple assignment and there is little room for
error. In fact, the validation is even more complex.

Signed-off-by: Akihiko Odaki 
Acked-by: LIU Zhiwei 
Reviewed-by: Daniel Henrique Barboza 
Acked-by: Alistair Francis 
---
 target/riscv/tcg/tcg-cpu.c | 15 +++
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/target/riscv/tcg/tcg-cpu.c b/target/riscv/tcg/tcg-cpu.c
index 14133ff66568..b85b0d036a61 100644
--- a/target/riscv/tcg/tcg-cpu.c
+++ b/target/riscv/tcg/tcg-cpu.c
@@ -268,7 +268,7 @@ static void riscv_cpu_validate_misa_priv(CPURISCVState 
*env, Error **errp)
 }
 }
 
-static void riscv_cpu_validate_misa_mxl(RISCVCPU *cpu, Error **errp)
+static void riscv_cpu_validate_misa_mxl(RISCVCPU *cpu)
 {
 RISCVCPUClass *mcc = RISCV_CPU_GET_CLASS(cpu);
 CPUClass *cc = CPU_CLASS(mcc);
@@ -288,11 +288,6 @@ static void riscv_cpu_validate_misa_mxl(RISCVCPU *cpu, 
Error **errp)
 default:
 g_assert_not_reached();
 }
-
-if (env->misa_mxl_max != env->misa_mxl) {
-error_setg(errp, "misa_mxl_max must be equal to misa_mxl");
-return;
-}
 }
 
 static void riscv_cpu_validate_priv_spec(RISCVCPU *cpu, Error **errp)
@@ -932,7 +927,6 @@ static bool riscv_cpu_is_vendor(Object *cpu_obj)
 static bool tcg_cpu_realize(CPUState *cs, Error **errp)
 {
 RISCVCPU *cpu = RISCV_CPU(cs);
-Error *local_err = NULL;
 
 if (!riscv_cpu_tcg_compatible(cpu)) {
 g_autofree char *name = riscv_cpu_get_name(cpu);
@@ -941,14 +935,11 @@ static bool tcg_cpu_realize(CPUState *cs, Error **errp)
 return false;
 }
 
-riscv_cpu_validate_misa_mxl(cpu, _err);
-if (local_err != NULL) {
-error_propagate(errp, local_err);
-return false;
-}
+riscv_cpu_validate_misa_mxl(cpu);
 
 #ifndef CONFIG_USER_ONLY
 CPURISCVState *env = >env;
+Error *local_err = NULL;
 
 CPU(cs)->tcg_cflags |= CF_PCREL;
 

-- 
2.43.0

1 2 >

1 - 100 of 101 matches

Mail list logo