Re: [PATCH v3 17/32] target/tricore: Use generic helper to show CPU model names

2023-09-06 Thread Bastian Koppelmann
On Thu, Sep 07, 2023 at 10:35:38AM +1000, Gavin Shan wrote:
> For target/tricore, the CPU type name is always the combination of the
> CPU model name and suffix. The CPU model names have been correctly
> shown in tricore_cpu_list_entry().
> 
> Use generic helper cpu_model_from_type() to show the CPU model names
> in the above function. tricore_cpu_class_by_name() is also improved
> by merging the condition of '@oc == NULL' to object_class_dynamic_cast().
> 
> Signed-off-by: Gavin Shan 
> ---
>  target/tricore/cpu.c|  9 +
>  target/tricore/helper.c | 13 +
>  2 files changed, 10 insertions(+), 12 deletions(-)

Reviewed-by: Bastian Koppelmann 

Cheers,
Bastian
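For context, cpu_model_from_type() derives the model name by stripping the fixed type-name suffix. A rough standalone analog of that behavior (the suffix string and function name here are illustrative, not QEMU's actual helper):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Rough analog of QEMU's cpu_model_from_type(): strip a fixed
 * "-tricore-cpu" style suffix from a CPU type name, returning a
 * heap-allocated model name, or NULL if the suffix does not match. */
static char *model_from_type(const char *typename, const char *suffix)
{
    size_t tl = strlen(typename), sl = strlen(suffix);
    char *model;

    if (tl <= sl || strcmp(typename + tl - sl, suffix) != 0) {
        return NULL;
    }
    model = malloc(tl - sl + 1);
    if (model) {
        memcpy(model, typename, tl - sl);
        model[tl - sl] = '\0';
    }
    return model;
}
```

With this shape, tricore_cpu_list_entry() only has to print the stripped name instead of hand-formatting it.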



Re: [PATCH v2 02/12] hw/arm/virt-acpi-build.c: Migrate virtio creation to common location

2023-09-06 Thread Alistair Francis
On Fri, Aug 25, 2023 at 12:31 AM Sunil V L  wrote:
>
> RISC-V also needs to create the virtio in DSDT in the same way as ARM. So,
> instead of duplicating the code, move this function to the device specific
> file which is common across architectures.
>
> Suggested-by: Igor Mammedov 
> Signed-off-by: Sunil V L 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  hw/arm/virt-acpi-build.c| 29 ++---
>  hw/virtio/meson.build   |  1 +
>  hw/virtio/virtio-acpi.c | 28 
>  include/hw/virtio/virtio-acpi.h | 11 +++
>  4 files changed, 42 insertions(+), 27 deletions(-)
>  create mode 100644 hw/virtio/virtio-acpi.c
>  create mode 100644 include/hw/virtio/virtio-acpi.h
>
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index b8e725d953..69733f6663 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -58,6 +58,7 @@
>  #include "migration/vmstate.h"
>  #include "hw/acpi/ghes.h"
>  #include "hw/acpi/viot.h"
> +#include "hw/virtio/virtio-acpi.h"
>
>  #define ARM_SPI_BASE 32
>
> @@ -118,32 +119,6 @@ static void acpi_dsdt_add_flash(Aml *scope, const MemMapEntry *flash_memmap)
>  aml_append(scope, dev);
>  }
>
> -static void acpi_dsdt_add_virtio(Aml *scope,
> - const MemMapEntry *virtio_mmio_memmap,
> - uint32_t mmio_irq, int num)
> -{
> -hwaddr base = virtio_mmio_memmap->base;
> -hwaddr size = virtio_mmio_memmap->size;
> -int i;
> -
> -for (i = 0; i < num; i++) {
> -uint32_t irq = mmio_irq + i;
> -Aml *dev = aml_device("VR%02u", i);
> -aml_append(dev, aml_name_decl("_HID", aml_string("LNRO0005")));
> -aml_append(dev, aml_name_decl("_UID", aml_int(i)));
> -aml_append(dev, aml_name_decl("_CCA", aml_int(1)));
> -
> -Aml *crs = aml_resource_template();
> -aml_append(crs, aml_memory32_fixed(base, size, AML_READ_WRITE));
> -aml_append(crs,
> -   aml_interrupt(AML_CONSUMER, AML_LEVEL, AML_ACTIVE_HIGH,
> - AML_EXCLUSIVE, &irq, 1));
> -aml_append(dev, aml_name_decl("_CRS", crs));
> -aml_append(scope, dev);
> -base += size;
> -}
> -}
> -
>  static void acpi_dsdt_add_pci(Aml *scope, const MemMapEntry *memmap,
>uint32_t irq, VirtMachineState *vms)
>  {
> @@ -850,7 +825,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
>  acpi_dsdt_add_flash(scope, &memmap[VIRT_FLASH]);
>  }
>  fw_cfg_acpi_dsdt_add(scope, &memmap[VIRT_FW_CFG]);
> -acpi_dsdt_add_virtio(scope, &memmap[VIRT_MMIO],
> +virtio_acpi_dsdt_add(scope, &memmap[VIRT_MMIO],
>  (irqmap[VIRT_MMIO] + ARM_SPI_BASE), NUM_VIRTIO_TRANSPORTS);
>  acpi_dsdt_add_pci(scope, memmap, irqmap[VIRT_PCIE] + ARM_SPI_BASE, vms);
>  if (vms->acpi_dev) {
> diff --git a/hw/virtio/meson.build b/hw/virtio/meson.build
> index 13e7c6c272..3ae1242bcf 100644
> --- a/hw/virtio/meson.build
> +++ b/hw/virtio/meson.build
> @@ -75,3 +75,4 @@ system_ss.add(when: 'CONFIG_ALL', if_true: files('virtio-stub.c'))
>  system_ss.add(files('virtio-hmp-cmds.c'))
>
>  specific_ss.add_all(when: 'CONFIG_VIRTIO', if_true: specific_virtio_ss)
> +system_ss.add(when: 'CONFIG_ACPI', if_true: files('virtio-acpi.c'))
> diff --git a/hw/virtio/virtio-acpi.c b/hw/virtio/virtio-acpi.c
> new file mode 100644
> index 00..977499defd
> --- /dev/null
> +++ b/hw/virtio/virtio-acpi.c
> @@ -0,0 +1,28 @@
> +#include "hw/virtio/virtio-acpi.h"
> +#include "hw/acpi/aml-build.h"
> +
> +void virtio_acpi_dsdt_add(Aml *scope,
> +  const MemMapEntry *virtio_mmio_memmap,
> +  uint32_t mmio_irq, int num)
> +{
> +hwaddr base = virtio_mmio_memmap->base;
> +hwaddr size = virtio_mmio_memmap->size;
> +int i;
> +
> +for (i = 0; i < num; i++) {
> +uint32_t irq = mmio_irq + i;
> +Aml *dev = aml_device("VR%02u", i);
> +aml_append(dev, aml_name_decl("_HID", aml_string("LNRO0005")));
> +aml_append(dev, aml_name_decl("_UID", aml_int(i)));
> +aml_append(dev, aml_name_decl("_CCA", aml_int(1)));
> +
> +Aml *crs = aml_resource_template();
> +aml_append(crs, aml_memory32_fixed(base, size, AML_READ_WRITE));
> +aml_append(crs,
> +   aml_interrupt(AML_CONSUMER, AML_LEVEL, AML_ACTIVE_HIGH,
> + AML_EXCLUSIVE, &irq, 1));
> +aml_append(dev, aml_name_decl("_CRS", crs));
> +aml_append(scope, dev);
> +base += size;
> +}
> +}
> diff --git a/include/hw/virtio/virtio-acpi.h b/include/hw/virtio/virtio-acpi.h
> new file mode 100644
> index 00..b8687b1b42
> --- /dev/null
> +++ b/include/hw/virtio/virtio-acpi.h
> @@ -0,0 +1,11 @@
> +#ifndef VIRTIO_ACPI_H
> +#define VIRTIO_ACPI_H
> +
> +#include "qemu/osdep.h"
> +#include "exec/hwaddr.h"
> +
> 
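The loop in virtio_acpi_dsdt_add() above places one transport per fixed-size slot. A standalone sketch of just the address/IRQ arithmetic (illustrative types and values; the real function emits AML devices rather than filling a table):

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t hwaddr;

struct slot {
    hwaddr base;
    hwaddr size;
    uint32_t irq;
};

/* Transport i sits at base + i * size and is wired to irq mmio_irq + i,
 * mirroring how the AML loop advances `base` and computes `irq`. */
static void layout_transports(hwaddr base, hwaddr size, uint32_t mmio_irq,
                              int num, struct slot *out)
{
    for (int i = 0; i < num; i++) {
        out[i].base = base + (hwaddr)i * size;
        out[i].size = size;
        out[i].irq  = mmio_irq + i;
    }
}
```

This is why the function only needs the first MemMapEntry and the first IRQ: the rest of the layout is implied.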

Re: [PATCH v2 01/12] hw/arm/virt-acpi-build.c: Migrate fw_cfg creation to common location

2023-09-06 Thread Alistair Francis
On Fri, Aug 25, 2023 at 12:30 AM Sunil V L  wrote:
>
> RISC-V also needs to use the same code to create fw_cfg in DSDT. So, avoid
> code duplication by moving the code in arm and riscv to a device specific
> file.
>
> Suggested-by: Igor Mammedov 
> Signed-off-by: Sunil V L 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  hw/arm/virt-acpi-build.c   | 19 ++-
>  hw/nvram/fw_cfg-acpi.c | 17 +
>  hw/nvram/meson.build   |  1 +
>  hw/riscv/virt-acpi-build.c | 19 ++-
>  include/hw/nvram/fw_cfg_acpi.h |  9 +
>  5 files changed, 31 insertions(+), 34 deletions(-)
>  create mode 100644 hw/nvram/fw_cfg-acpi.c
>  create mode 100644 include/hw/nvram/fw_cfg_acpi.h
>
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index 6b674231c2..b8e725d953 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -35,7 +35,7 @@
>  #include "target/arm/cpu.h"
>  #include "hw/acpi/acpi-defs.h"
>  #include "hw/acpi/acpi.h"
> -#include "hw/nvram/fw_cfg.h"
> +#include "hw/nvram/fw_cfg_acpi.h"
>  #include "hw/acpi/bios-linker-loader.h"
>  #include "hw/acpi/aml-build.h"
>  #include "hw/acpi/utils.h"
> @@ -94,21 +94,6 @@ static void acpi_dsdt_add_uart(Aml *scope, const MemMapEntry *uart_memmap,
>  aml_append(scope, dev);
>  }
>
> -static void acpi_dsdt_add_fw_cfg(Aml *scope, const MemMapEntry *fw_cfg_memmap)
> -{
> -Aml *dev = aml_device("FWCF");
> -aml_append(dev, aml_name_decl("_HID", aml_string("QEMU0002")));
> -/* device present, functioning, decoding, not shown in UI */
> -aml_append(dev, aml_name_decl("_STA", aml_int(0xB)));
> -aml_append(dev, aml_name_decl("_CCA", aml_int(1)));
> -
> -Aml *crs = aml_resource_template();
> -aml_append(crs, aml_memory32_fixed(fw_cfg_memmap->base,
> -   fw_cfg_memmap->size, AML_READ_WRITE));
> -aml_append(dev, aml_name_decl("_CRS", crs));
> -aml_append(scope, dev);
> -}
> -
>  static void acpi_dsdt_add_flash(Aml *scope, const MemMapEntry *flash_memmap)
>  {
>  Aml *dev, *crs;
> @@ -864,7 +849,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
>  if (vmc->acpi_expose_flash) {
>  acpi_dsdt_add_flash(scope, &memmap[VIRT_FLASH]);
>  }
> -acpi_dsdt_add_fw_cfg(scope, &memmap[VIRT_FW_CFG]);
> +fw_cfg_acpi_dsdt_add(scope, &memmap[VIRT_FW_CFG]);
>  acpi_dsdt_add_virtio(scope, &memmap[VIRT_MMIO],
>  (irqmap[VIRT_MMIO] + ARM_SPI_BASE), NUM_VIRTIO_TRANSPORTS);
>  acpi_dsdt_add_pci(scope, memmap, irqmap[VIRT_PCIE] + ARM_SPI_BASE, vms);
> diff --git a/hw/nvram/fw_cfg-acpi.c b/hw/nvram/fw_cfg-acpi.c
> new file mode 100644
> index 00..4eeb81bc36
> --- /dev/null
> +++ b/hw/nvram/fw_cfg-acpi.c
> @@ -0,0 +1,17 @@
> +#include "hw/nvram/fw_cfg_acpi.h"
> +#include "hw/acpi/aml-build.h"
> +
> +void fw_cfg_acpi_dsdt_add(Aml *scope, const MemMapEntry *fw_cfg_memmap)
> +{
> +Aml *dev = aml_device("FWCF");
> +aml_append(dev, aml_name_decl("_HID", aml_string("QEMU0002")));
> +/* device present, functioning, decoding, not shown in UI */
> +aml_append(dev, aml_name_decl("_STA", aml_int(0xB)));
> +aml_append(dev, aml_name_decl("_CCA", aml_int(1)));
> +
> +Aml *crs = aml_resource_template();
> +aml_append(crs, aml_memory32_fixed(fw_cfg_memmap->base,
> +   fw_cfg_memmap->size, AML_READ_WRITE));
> +aml_append(dev, aml_name_decl("_CRS", crs));
> +aml_append(scope, dev);
> +}
> diff --git a/hw/nvram/meson.build b/hw/nvram/meson.build
> index 988dff6f8e..69e6df5aac 100644
> --- a/hw/nvram/meson.build
> +++ b/hw/nvram/meson.build
> @@ -21,3 +21,4 @@ system_ss.add(when: 'CONFIG_XLNX_EFUSE_ZYNQMP', if_true: files(
>  system_ss.add(when: 'CONFIG_XLNX_BBRAM', if_true: files('xlnx-bbram.c'))
>
>  specific_ss.add(when: 'CONFIG_PSERIES', if_true: files('spapr_nvram.c'))
> +specific_ss.add(when: 'CONFIG_ACPI', if_true: files('fw_cfg-acpi.c'))
> diff --git a/hw/riscv/virt-acpi-build.c b/hw/riscv/virt-acpi-build.c
> index 7331248f59..d8772c2821 100644
> --- a/hw/riscv/virt-acpi-build.c
> +++ b/hw/riscv/virt-acpi-build.c
> @@ -28,6 +28,7 @@
>  #include "hw/acpi/acpi.h"
>  #include "hw/acpi/aml-build.h"
>  #include "hw/acpi/utils.h"
> +#include "hw/nvram/fw_cfg_acpi.h"
>  #include "qapi/error.h"
>  #include "qemu/error-report.h"
>  #include "sysemu/reset.h"
> @@ -97,22 +98,6 @@ static void acpi_dsdt_add_cpus(Aml *scope, RISCVVirtState *s)
>  }
>  }
>
> -static void acpi_dsdt_add_fw_cfg(Aml *scope, const MemMapEntry *fw_cfg_memmap)
> -{
> -Aml *dev = aml_device("FWCF");
> -aml_append(dev, aml_name_decl("_HID", aml_string("QEMU0002")));
> -
> -/* device present, functioning, decoding, not shown in UI */
> -aml_append(dev, aml_name_decl("_STA", aml_int(0xB)));
> -aml_append(dev, aml_name_decl("_CCA", aml_int(1)));
> -
> -Aml *crs = aml_resource_template();
> -aml_append(crs, 
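On the _STA value used above: 0xB sets the present, enabled/decoding, and functioning bits while leaving "shown in UI" clear, which is exactly what the comment in the moved code says. A small check (bit names per the ACPI specification's _STA definition):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* _STA return bits (ACPI specification, Device Status): */
#define STA_PRESENT     (1u << 0)
#define STA_ENABLED     (1u << 1)  /* enabled and decoding its resources */
#define STA_SHOWN_IN_UI (1u << 2)
#define STA_FUNCTIONING (1u << 3)

/* True when a _STA value means: present, functioning, decoding,
 * not shown in UI -- the state fw_cfg advertises with 0xB. */
static bool sta_hidden_but_working(uint8_t sta)
{
    return (sta & STA_PRESENT) && (sta & STA_ENABLED) &&
           (sta & STA_FUNCTIONING) && !(sta & STA_SHOWN_IN_UI);
}
```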

Re: [Virtio-fs] Status of DAX for virtio-fs/virtiofsd?

2023-09-06 Thread Hao Xu



On 9/6/23 21:57, Stefan Hajnoczi wrote:

On Wed, 6 Sept 2023 at 09:07, Hao Xu  wrote:

On 5/18/23 00:26, Stefan Hajnoczi wrote:

On Wed, 17 May 2023 at 11:54, Alex Bennée  wrote:
Hi Alex,
There were two unresolved issues:

1. How to inject SIGBUS when the guest accesses a page that's beyond
the end-of-file.

Hi Stefan,
Does this SIGBUS issue exist if the guest kernel can be trusted? Since in

that case, we can check the offset value in guest kernel.

The scenario is:
1. A guest userspace process has a DAX file mmapped.
2. The host or another guest that is also sharing the directory
truncates the file. The pages mmapped by our guest are no longer
valid.
3. The guest loads from an mmapped page and a vmexit occurs.
4. Now the host must inject a SIGBUS into the guest. There is
currently no way to do this.

I believe this scenario doesn't happen within a single guest, because
the guest kernel will raise SIGBUS itself without a vmexit if another
process inside that same guest truncates the file.

Another scenario is when the guest kernel accesses the DAX pages. A
vmexit can occur here too.

If you trust the host and all guests sharing the directory not to
truncate files that are mmapped, then this issue will not occur.

Stefan



I see, my use case should be fine since the directory is not shared and 
fs is read-only.


Thanks for the detailed explanation.


Regards,

Hao




Re: [PATCH v9 00/20] riscv: 'max' CPU, detect user choice in TCG

2023-09-06 Thread Alistair Francis
On Sat, Sep 2, 2023 at 5:48 AM Daniel Henrique Barboza
 wrote:
>
> Hi,
>
> This new version contains suggestions made by Andrew Jones in v8.
>
> Most notable change is the removal of the opensbi.py test in patch 11,
> which was replaced by a TuxBoot test. It's more suitable to test the
> integrity of all the extensions enabled by the 'max' CPU.
>
> The series is available in this branch:
>
> https://gitlab.com/danielhb/qemu/-/tree/max_cpu_user_choice_v9
>
> Patches missing acks: 11, 15
>
> Changes from v8:
> - patch 7:
>   - add g_assert(array) at the start of riscv_cpu_add_qdev_prop_array()
> - patch 8:
>   - add g_assert(array) at the start of riscv_cpu_add_kvm_unavail_prop_array()
> - patch 11:
>   - removed both opensbi.py tests
>   - added 2 'max' cpu tuxboot tests in tuxrun_baselines.py
> - patch 12:
>   - fixed typos in deprecated.rst
> - patch 15:
>   - use g_assert_not_reached() at the end of cpu_cfg_ext_get_min_version()
> - patch 19:
>   - added comment on top of riscv_cpu_add_misa_properties() explaining why
> we're not implementing user choice support for MISA properties
> - patch 20:
>   - warn_report() is now called after the G error conditions
> - v8 link: https://lore.kernel.org/qemu-riscv/20230824221440.484675-1-dbarb...@ventanamicro.com/
>
>
>
> Daniel Henrique Barboza (20):
>   target/riscv/cpu.c: split CPU options from riscv_cpu_extensions[]
>   target/riscv/cpu.c: skip 'bool' check when filtering KVM props
>   target/riscv/cpu.c: split kvm prop handling to its own helper
>   target/riscv: add DEFINE_PROP_END_OF_LIST() to riscv_cpu_options[]
>   target/riscv/cpu.c: split non-ratified exts from
> riscv_cpu_extensions[]
>   target/riscv/cpu.c: split vendor exts from riscv_cpu_extensions[]
>   target/riscv/cpu.c: add riscv_cpu_add_qdev_prop_array()
>   target/riscv/cpu.c: add riscv_cpu_add_kvm_unavail_prop_array()
>   target/riscv/cpu.c: limit cfg->vext_spec log message
>   target/riscv: add 'max' CPU type
>   avocado, risc-v: add tuxboot tests for 'max' CPU
>   target/riscv: deprecate the 'any' CPU type
>   target/riscv/cpu.c: use offset in isa_ext_is_enabled/update_enabled
>   target/riscv: make CPUCFG() macro public
>   target/riscv/cpu.c: introduce cpu_cfg_ext_auto_update()
>   target/riscv/cpu.c: use cpu_cfg_ext_auto_update() during realize()
>   target/riscv/cpu.c: introduce RISCVCPUMultiExtConfig
>   target/riscv: use isa_ext_update_enabled() in
> init_max_cpu_extensions()
>   target/riscv/cpu.c: honor user choice in cpu_cfg_ext_auto_update()
>   target/riscv/cpu.c: consider user option with RVG

Thanks!

Applied to riscv-to-apply.next

Alistair

>
>  docs/about/deprecated.rst |  12 +
>  target/riscv/cpu-qom.h|   1 +
>  target/riscv/cpu.c| 564 +-
>  target/riscv/cpu.h|   2 +
>  target/riscv/kvm.c|   8 +-
>  tests/avocado/tuxrun_baselines.py |  32 ++
>  6 files changed, 450 insertions(+), 169 deletions(-)
>
> --
> 2.41.0
>
>



Re: [PATCH v9 15/20] target/riscv/cpu.c: introduce cpu_cfg_ext_auto_update()

2023-09-06 Thread Alistair Francis
On Sat, Sep 2, 2023 at 5:49 AM Daniel Henrique Barboza
 wrote:
>
> During realize() time we're activating a lot of extensions based on some
> criteria, e.g.:
>
> if (cpu->cfg.ext_zk) {
> cpu->cfg.ext_zkn = true;
> cpu->cfg.ext_zkr = true;
> cpu->cfg.ext_zkt = true;
> }
>
> This practice resulted in at least one case where we ended up enabling
> something we shouldn't: RVC enabling zca/zcd/zcf when using a CPU that
> has priv_spec older than 1.12.0.
>
> We're also not considering user choice. There's no way of doing it now
> but this is about to change in the next few patches.
>
> cpu_cfg_ext_auto_update() will check for priv version mismatches before
> enabling extensions. If we have a mismatch between the current priv
> version and the extension we want to enable, do not enable it. In the
> near future, this same function will also consider user choice when
> deciding if we're going to enable/disable an extension or not.
>
> For now let's use it to handle zca/zcd/zcf enablement if RVC is enabled.
>
> Signed-off-by: Daniel Henrique Barboza 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/cpu.c | 43 ---
>  1 file changed, 40 insertions(+), 3 deletions(-)
>
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index 43c68e1792..a4876df5f4 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -177,6 +177,43 @@ static void isa_ext_update_enabled(RISCVCPU *cpu, uint32_t ext_offset,
>  *ext_enabled = en;
>  }
>
> +static int cpu_cfg_ext_get_min_version(uint32_t ext_offset)
> +{
> +int i;
> +
> +for (i = 0; i < ARRAY_SIZE(isa_edata_arr); i++) {
> +if (isa_edata_arr[i].ext_enable_offset != ext_offset) {
> +continue;
> +}
> +
> +return isa_edata_arr[i].min_version;
> +}
> +
> +g_assert_not_reached();
> +}
> +
> +static void cpu_cfg_ext_auto_update(RISCVCPU *cpu, uint32_t ext_offset,
> +bool value)
> +{
> +CPURISCVState *env = &cpu->env;
> +bool prev_val = isa_ext_is_enabled(cpu, ext_offset);
> +int min_version;
> +
> +if (prev_val == value) {
> +return;
> +}
> +
> +if (value && env->priv_ver != PRIV_VERSION_LATEST) {
> +/* Do not enable it if priv_ver is older than min_version */
> +min_version = cpu_cfg_ext_get_min_version(ext_offset);
> +if (env->priv_ver < min_version) {
> +return;
> +}
> +}
> +
> +isa_ext_update_enabled(cpu, ext_offset, value);
> +}
> +
>  const char * const riscv_int_regnames[] = {
>  "x0/zero", "x1/ra",  "x2/sp",  "x3/gp",  "x4/tp",  "x5/t0",   "x6/t1",
>  "x7/t2",   "x8/s0",  "x9/s1",  "x10/a0", "x11/a1", "x12/a2",  "x13/a3",
> @@ -1268,12 +1305,12 @@ void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, Error **errp)
>
>  /* zca, zcd and zcf has a PRIV 1.12.0 restriction */
>  if (riscv_has_ext(env, RVC) && env->priv_ver >= PRIV_VERSION_1_12_0) {
> -cpu->cfg.ext_zca = true;
> +cpu_cfg_ext_auto_update(cpu, CPU_CFG_OFFSET(ext_zca), true);
>  if (riscv_has_ext(env, RVF) && env->misa_mxl_max == MXL_RV32) {
> -cpu->cfg.ext_zcf = true;
> +cpu_cfg_ext_auto_update(cpu, CPU_CFG_OFFSET(ext_zcf), true);
>  }
>  if (riscv_has_ext(env, RVD)) {
> -cpu->cfg.ext_zcd = true;
> +cpu_cfg_ext_auto_update(cpu, CPU_CFG_OFFSET(ext_zcd), true);
>  }
>  }
>
> --
> 2.41.0
>
>
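The effect of the new helper can be modeled standalone. The extension table and "offsets" below are made up for illustration and are not QEMU's actual isa_edata_arr:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { PRIV_1_11 = 111, PRIV_1_12 = 112, PRIV_LATEST = PRIV_1_12 };

struct ext_cfg {
    int min_version;  /* priv spec the extension was ratified against */
    bool enabled;
};

/* Two hypothetical extensions, indexed by "offset": */
static struct ext_cfg cfg[] = {
    { PRIV_1_11, false },
    { PRIV_1_12, false },
};

/* Mirror of cpu_cfg_ext_auto_update(): only gate *enabling*, and only
 * when the CPU's priv version is pinned below the extension's minimum. */
static void auto_update(int priv_ver, size_t offset, bool value)
{
    struct ext_cfg *e = &cfg[offset];

    if (e->enabled == value) {
        return;
    }
    if (value && priv_ver != PRIV_LATEST && priv_ver < e->min_version) {
        return;  /* too old: silently skip, as the patch does */
    }
    e->enabled = value;
}
```

The zca/zcd/zcf hunk above is exactly this pattern: the enables still happen, but each one is now filtered through the priv-version check.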



Re: [PATCH RESEND 15/15] ppc: spapr: Document Nested PAPR API

2023-09-06 Thread Nicholas Piggin
On Wed Sep 6, 2023 at 2:33 PM AEST, Harsh Prateek Bora wrote:
> Adding initial documentation about Nested PAPR API to describe the set
> of APIs and its usage. Also talks about the Guest State Buffer elements
> and its format, which is used between L0/L1 to communicate L2 state.

I would move this patch first (well, behind any cleanup and preparation
patches, but before any new API additions).

Thanks,
Nick

>
> Signed-off-by: Michael Neuling 
> Signed-off-by: Harsh Prateek Bora 
> ---
>  docs/devel/nested-papr.txt | 500 +
>  1 file changed, 500 insertions(+)
>  create mode 100644 docs/devel/nested-papr.txt
>
> diff --git a/docs/devel/nested-papr.txt b/docs/devel/nested-papr.txt
> new file mode 100644
> index 00..c5c2ba7e50
> --- /dev/null
> +++ b/docs/devel/nested-papr.txt
> @@ -0,0 +1,500 @@
> +Nested PAPR API (aka KVM on PowerVM)
> +
> +
> +This API aims at providing support to enable nested virtualization with
> +KVM on PowerVM. The existing support for nested KVM on PowerNV was
> +introduced with the cap-nested-hv option. With a slight design change,
> +a new cap-nested-papr option is added to enable this on papr/pseries, e.g.:
> +
> +  qemu-system-ppc64 -cpu POWER10 -machine pseries,cap-nested-papr=true ...
> +
> +Work by:
> +Michael Neuling 
> +Vaibhav Jain 
> +Jordan Niethe 
> +Harsh Prateek Bora 
> +Shivaprasad G Bhat 
> +Kautuk Consul 
> +
> +Below taken from the kernel documentation:
> +
> +Introduction
> +
> +
> +This document explains how a guest operating system can act as a
> +hypervisor and run nested guests through the use of hypercalls, if the
> +hypervisor has implemented them. The terms L0, L1, and L2 are used to
> +refer to different software entities. L0 is the hypervisor mode entity
> +that would normally be called the "host" or "hypervisor". L1 is a
> +guest virtual machine that is directly run under L0 and is initiated
> +and controlled by L0. L2 is a guest virtual machine that is initiated
> +and controlled by L1 acting as a hypervisor. A significant design change
> +wrt existing API is that now the entire L2 state is maintained within L0.
> +
> +Existing Nested-HV API
> +==
> +
> +Linux/KVM has had support for nesting as an L0 or L1 since 2018.
> +
> +The L0 code was added::
> +
> +   commit 8e3f5fc1045dc49fd175b978c5457f5f51e7a2ce
> +   Author: Paul Mackerras 
> +   Date:   Mon Oct 8 16:31:03 2018 +1100
> +   KVM: PPC: Book3S HV: Framework and hcall stubs for nested virtualization
> +
> +The L1 code was added::
> +
> +   commit 360cae313702cdd0b90f82c261a8302fecef030a
> +   Author: Paul Mackerras 
> +   Date:   Mon Oct 8 16:31:04 2018 +1100
> +   KVM: PPC: Book3S HV: Nested guest entry via hypercall
> +
> +This API works primarily using a single hcall, h_enter_nested(). This
> +call is made by the L1 to tell the L0 to start an L2 vCPU with the given
> +state. The L0 then starts this L2 and runs until an L2 exit condition
> +is reached. Once the L2 exits, the state of the L2 is given back to
> +the L1 by the L0. The full L2 vCPU state is always transferred from
> +and to L1 when the L2 is run. The L0 doesn't keep any state on the L2
> +vCPU (except in the short sequence in the L0 on L1 -> L2 entry and L2
> +-> L1 exit).
> +
> +The only state kept by the L0 is the partition table. The L1 registers
> +its partition table using the h_set_partition_table() hcall. All
> +other state held by the L0 about the L2s is cached state (such as
> +shadow page tables).
> +
> +The L1 may run any L2 or vCPU without first informing the L0. It
> +simply starts the vCPU using h_enter_nested(). The creation of L2s and
> +vCPUs is done implicitly whenever h_enter_nested() is called.
> +
> +In this document, we call this existing API the v1 API.
> +
> +New PAPR API
> +============
> +
> +The new PAPR API changes from the v1 API such that creating the L2 and
> +associated vCPUs is explicit. In this document, we call this the v2
> +API.
> +
> +h_enter_nested() is replaced with H_GUEST_VCPU_RUN().  Before this can
> +be called the L1 must explicitly create the L2 using h_guest_create()
> +and any associated vCPUs created with h_guest_create_vcpu(). Getting
> +and setting vCPU state can also be performed using the h_guest_{g|s}et
> +hcalls.
> +
> +The basic execution flow for an L1 to create an L2, run it, and
> +delete it is:
> +
> +- L1 and L0 negotiate capabilities with H_GUEST_{G,S}ET_CAPABILITIES()
> +  (normally at L1 boot time).
> +
> +- L1 requests the L0 to create an L2 with H_GUEST_CREATE() and receives a 
> token
> +
> +- L1 requests the L0 to create an L2 vCPU with H_GUEST_CREATE_VCPU()
> +
> +- L1 and L0 communicate the vCPU state using the H_GUEST_{G,S}ET() hcall
> +
> +- L1 requests the L0 to run the vCPU using H_GUEST_RUN_VCPU() hcall
> +
> +- L1 deletes L2 with H_GUEST_DELETE()
> +
> +More details of the individual hcalls follows:
> +
> +HCALL 
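The lifecycle in the bullet list above can be sketched as a call sequence. The stubs below only record ordering and are obviously not the real hcall interfaces:

```c
#include <assert.h>
#include <string.h>

static const char *calls[8];
static int ncalls;

static void hcall(const char *name)
{
    calls[ncalls++] = name;  /* stand-in: just log the hcall name */
}

/* One pass through the v2 API lifecycle described above. */
static void l2_lifecycle(void)
{
    ncalls = 0;
    hcall("H_GUEST_SET_CAPABILITIES");  /* negotiated at L1 boot */
    hcall("H_GUEST_CREATE");            /* returns a guest token */
    hcall("H_GUEST_CREATE_VCPU");
    hcall("H_GUEST_SET");               /* push initial vCPU state */
    hcall("H_GUEST_RUN_VCPU");
    hcall("H_GUEST_GET");               /* read exit state back */
    hcall("H_GUEST_DELETE");
}
```

The contrast with v1 is visible in the sequence itself: creation and state transfer are explicit steps instead of side effects of h_enter_nested().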

Re: [PATCH RESEND 13/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_RUN_VCPU

2023-09-06 Thread Nicholas Piggin
On Wed Sep 6, 2023 at 2:33 PM AEST, Harsh Prateek Bora wrote:
> Once the L1 has created a nested guest and its associated VCPU, it can
> request for the execution of nested guest by setting its initial state
> which can be done either using the h_guest_set_state or using the input
> buffers along with the call to h_guest_run_vcpu(). On guest exit, L0
> uses output buffers to convey the exit cause to the L1. L0 takes care of
> switching context from L1 to L2 during guest entry and restores L1 context
> on guest exit.
>
> Unlike nested-hv, L2 (nested) guest's entire state is retained with
> L0 after guest exit and restored on next entry in case of nested-papr.
>
> Signed-off-by: Michael Neuling 
> Signed-off-by: Kautuk Consul 
> Signed-off-by: Harsh Prateek Bora 
> ---
>  hw/ppc/spapr_nested.c   | 471 +++-
>  include/hw/ppc/spapr_cpu_core.h |   7 +-
>  include/hw/ppc/spapr_nested.h   |   6 +
>  3 files changed, 408 insertions(+), 76 deletions(-)
>
> diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
> index 67e389a762..3605f27115 100644
> --- a/hw/ppc/spapr_nested.c
> +++ b/hw/ppc/spapr_nested.c
> @@ -12,6 +12,17 @@
>  #ifdef CONFIG_TCG
>  #define PRTS_MASK  0x1f
>  
> +static void exit_nested_restore_vcpu(PowerPCCPU *cpu, int excp,
> + SpaprMachineStateNestedGuestVcpu *vcpu);
> +static void exit_process_output_buffer(PowerPCCPU *cpu,
> +  SpaprMachineStateNestedGuest *guest,
> +  target_ulong vcpuid,
> +  target_ulong *r3);
> +static void restore_common_regs(CPUPPCState *dst, CPUPPCState *src);
> +static bool vcpu_check(SpaprMachineStateNestedGuest *guest,
> +   target_ulong vcpuid,
> +   bool inoutbuf);
> +
>  static target_ulong h_set_ptbl(PowerPCCPU *cpu,
> SpaprMachineState *spapr,
> target_ulong opcode,
> @@ -187,21 +198,21 @@ static target_ulong h_enter_nested(PowerPCCPU *cpu,
>  return H_PARAMETER;
>  }
>  
> -spapr_cpu->nested_host_state = g_try_new(struct nested_ppc_state, 1);
> -if (!spapr_cpu->nested_host_state) {
> +spapr_cpu->nested_hv_host = g_try_new(struct nested_ppc_state, 1);
> +if (!spapr_cpu->nested_hv_host) {
>  return H_NO_MEM;
>  }

Don't rename existing thing in the same patch as adding new thing.

>  
>  assert(env->spr[SPR_LPIDR] == 0);
>  assert(env->spr[SPR_DPDES] == 0);
> -nested_save_state(spapr_cpu->nested_host_state, cpu);
> +nested_save_state(spapr_cpu->nested_hv_host, cpu);
>  
>  len = sizeof(*regs);
>  regs = address_space_map(CPU(cpu)->as, regs_ptr, &len, false,
>  MEMTXATTRS_UNSPECIFIED);
>  if (!regs || len != sizeof(*regs)) {
>  address_space_unmap(CPU(cpu)->as, regs, len, 0, false);
> -g_free(spapr_cpu->nested_host_state);
> +g_free(spapr_cpu->nested_hv_host);
>  return H_P2;
>  }
>  
> @@ -276,105 +287,146 @@ static target_ulong h_enter_nested(PowerPCCPU *cpu,
>  
>  void spapr_exit_nested(PowerPCCPU *cpu, int excp)
>  {
> +SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> +CPUState *cs = CPU(cpu);

I think it would be worth seeing how it looks to split these into
original and papr functions rather than try mash them together.

>  CPUPPCState *env = &cpu->env;
>  SpaprCpuState *spapr_cpu = spapr_cpu_state(cpu);
> +target_ulong r3_return = env->excp_vectors[excp]; /* hcall return value */
>  struct nested_ppc_state l2_state;
> -target_ulong hv_ptr = spapr_cpu->nested_host_state->gpr[4];
> -target_ulong regs_ptr = spapr_cpu->nested_host_state->gpr[5];
> -target_ulong hsrr0, hsrr1, hdar, asdr, hdsisr;
> +target_ulong hv_ptr, regs_ptr;
> +target_ulong hsrr0 = 0, hsrr1 = 0, hdar = 0, asdr = 0, hdsisr = 0;
>  struct kvmppc_hv_guest_state *hvstate;
>  struct kvmppc_pt_regs *regs;
>  hwaddr len;
> +target_ulong lpid = 0, vcpuid = 0;
> +struct SpaprMachineStateNestedGuestVcpu *vcpu = NULL;
> +struct SpaprMachineStateNestedGuest *guest = NULL;
>  
>  assert(spapr_cpu->in_nested);
> -
> -nested_save_state(&l2_state, cpu);
> -hsrr0 = env->spr[SPR_HSRR0];
> -hsrr1 = env->spr[SPR_HSRR1];
> -hdar = env->spr[SPR_HDAR];
> -hdsisr = env->spr[SPR_HDSISR];
> -asdr = env->spr[SPR_ASDR];
> +if (spapr->nested.api == NESTED_API_KVM_HV) {
> +nested_save_state(&l2_state, cpu);
> +hsrr0 = env->spr[SPR_HSRR0];
> +hsrr1 = env->spr[SPR_HSRR1];
> +hdar = env->spr[SPR_HDAR];
> +hdsisr = env->spr[SPR_HDSISR];
> +asdr = env->spr[SPR_ASDR];
> +} else if (spapr->nested.api == NESTED_API_PAPR) {
> +lpid = spapr_cpu->nested_papr_host->gpr[5];
> +vcpuid = spapr_cpu->nested_papr_host->gpr[6];
> +guest = 

Re: [PATCH v9 11/20] avocado, risc-v: add tuxboot tests for 'max' CPU

2023-09-06 Thread Alistair Francis
On Sat, Sep 2, 2023 at 5:51 AM Daniel Henrique Barboza
 wrote:
>
> Add smoke tests to ensure that we'll not break the 'max' CPU type when
> adding new frozen/ratified RISC-V extensions.
>
> Signed-off-by: Daniel Henrique Barboza 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  tests/avocado/tuxrun_baselines.py | 32 +++
>  1 file changed, 32 insertions(+)
>
> diff --git a/tests/avocado/tuxrun_baselines.py b/tests/avocado/tuxrun_baselines.py
> index e12250eabb..c99bea6c0b 100644
> --- a/tests/avocado/tuxrun_baselines.py
> +++ b/tests/avocado/tuxrun_baselines.py
> @@ -501,6 +501,38 @@ def test_riscv64(self):
>
>  self.common_tuxrun(csums=sums)
>
> +def test_riscv32_maxcpu(self):
> +"""
> +:avocado: tags=arch:riscv32
> +:avocado: tags=machine:virt
> +:avocado: tags=cpu:max
> +:avocado: tags=tuxboot:riscv32
> +"""
> +sums = { "Image" :
> + 
> "89599407d7334de629a40e7ad6503c73670359eb5f5ae9d686353a3d6deccbd5",
> + "fw_jump.elf" :
> + 
> "f2ef28a0b77826f79d085d3e4aa686f1159b315eff9099a37046b18936676985",
> + "rootfs.ext4.zst" :
> + 
> "7168d296d0283238ea73cd5a775b3dd608e55e04c7b92b76ecce31bb13108cba" }
> +
> +self.common_tuxrun(csums=sums)
> +
> +def test_riscv64_maxcpu(self):
> +"""
> +:avocado: tags=arch:riscv64
> +:avocado: tags=machine:virt
> +:avocado: tags=cpu:max
> +:avocado: tags=tuxboot:riscv64
> +"""
> +sums = { "Image" :
> + 
> "cd634badc65e52fb63465ec99e309c0de0369f0841b7d9486f9729e119bac25e",
> + "fw_jump.elf" :
> + 
> "6e3373abcab4305fe151b564a4c71110d833c21f2c0a1753b7935459e36aedcf",
> + "rootfs.ext4.zst" :
> + 
> "b18e3a3bdf27be03da0b285e84cb71bf09eca071c3a087b42884b6982ed679eb" }
> +
> +self.common_tuxrun(csums=sums)
> +
>  def test_s390(self):
>  """
>  :avocado: tags=arch:s390x
> --
> 2.41.0
>
>



Re: [PULL for-6.2 0/7] Ide patches

2023-09-06 Thread John Snow
I guess the last time I sent IDE patches was for 6.2 and that tag got
stuck in my git-publish invocation, oops. I am not suggesting we break
the laws of causality to merge these patches.

On Wed, Sep 6, 2023 at 11:42 PM John Snow  wrote:
>
> The following changes since commit c152379422a204109f34ca2b43ecc538c7d738ae:
>
>   Merge tag 'ui-pull-request' of https://gitlab.com/marcandre.lureau/qemu into staging (2023-09-06 11:16:01 -0400)
>
> are available in the Git repository at:
>
>   https://gitlab.com/jsnow/qemu.git tags/ide-pull-request
>
> for you to fetch changes up to 9f89423537653de07ca40c18b5ff5b70b104cc93:
>
>   hw/ide/ahci: fix broken SError handling (2023-09-06 22:48:04 -0400)
>
> 
> IDE Pull request
>
> 
>
> Niklas Cassel (7):
>   hw/ide/core: set ERR_STAT in unsupported command completion
>   hw/ide/ahci: write D2H FIS when processing NCQ command
>   hw/ide/ahci: simplify and document PxCI handling
>   hw/ide/ahci: PxSACT and PxCI is cleared when PxCMD.ST is cleared
>   hw/ide/ahci: PxCI should not get cleared when ERR_STAT is set
>   hw/ide/ahci: fix ahci_write_fis_sdb()
>   hw/ide/ahci: fix broken SError handling
>
>  tests/qtest/libqos/ahci.h |   8 ++-
>  hw/ide/ahci.c | 110 +++---
>  hw/ide/core.c |   2 +-
>  tests/qtest/libqos/ahci.c | 106 +++-
>  4 files changed, 163 insertions(+), 63 deletions(-)
>
> --
> 2.41.0
>
>




[PULL for-6.2 4/7] hw/ide/ahci: PxSACT and PxCI is cleared when PxCMD.ST is cleared

2023-09-06 Thread John Snow
From: Niklas Cassel 

According to AHCI 1.3.1 definition of PxSACT:
This field is cleared when PxCMD.ST is written from a '1' to a '0' by
software. This field is not cleared by a COMRESET or a software reset.

According to AHCI 1.3.1 definition of PxCI:
This field is also cleared when PxCMD.ST is written from a '1' to a '0'
by software.

Clearing PxCMD.ST is part of the error recovery procedure, see
AHCI 1.3.1, section "6.2 Error Recovery".

If we don't clear PxCI on error recovery, the previous command will
incorrectly still be marked as pending after error recovery.

Signed-off-by: Niklas Cassel 
Reviewed-by: Philippe Mathieu-Daudé 
Message-id: 20230609140844.202795-6-...@flawful.org
Signed-off-by: John Snow 
---
 hw/ide/ahci.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 3deaf01add..a31e6fa65e 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -329,6 +329,11 @@ static void ahci_port_write(AHCIState *s, int port, int 
offset, uint32_t val)
 ahci_check_irq(s);
 break;
 case AHCI_PORT_REG_CMD:
+if ((pr->cmd & PORT_CMD_START) && !(val & PORT_CMD_START)) {
+pr->scr_act = 0;
+pr->cmd_issue = 0;
+}
+
 /* Block any Read-only fields from being set;
  * including LIST_ON and FIS_ON.
  * The spec requires to set ICC bits to zero after the ICC change
-- 
2.41.0




[PULL for-6.2 6/7] hw/ide/ahci: fix ahci_write_fis_sdb()

2023-09-06 Thread John Snow
From: Niklas Cassel 

When there is an error, we need to raise a TFES error irq, see AHCI 1.3.1,
5.3.13.1 SDB:Entry.

If ERR_STAT is set, we jump to state ERR:FatalTaskfile, which will raise
a TFES IRQ unconditionally, regardless if the I bit is set in the FIS or
not.

Thus, we should never raise a normal IRQ after having sent an error IRQ.

It is valid to signal successfully completed commands as finished in the
same SDB FIS that generates the error IRQ. The important thing is that
commands that did not complete successfully (e.g. commands that were
aborted, do not get the finished bit set).

Before this commit, there was never a TFES IRQ raised on NCQ error.

Signed-off-by: Niklas Cassel 
Reviewed-by: Philippe Mathieu-Daudé 
Message-id: 20230609140844.202795-8-...@flawful.org
Signed-off-by: John Snow 
---
 hw/ide/ahci.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 12aaadc554..ef6c9fc378 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -806,8 +806,14 @@ static void ahci_write_fis_sdb(AHCIState *s, 
NCQTransferState *ncq_tfs)
 pr->scr_act &= ~ad->finished;
 ad->finished = 0;
 
-/* Trigger IRQ if interrupt bit is set (which currently, it always is) */
-if (sdb_fis->flags & 0x40) {
+/*
+ * TFES IRQ is always raised if ERR_STAT is set, regardless of I bit.
+ * If ERR_STAT is not set, trigger SDBS IRQ if interrupt bit is set
+ * (which currently, it always is).
+ */
+if (sdb_fis->status & ERR_STAT) {
+ahci_trigger_irq(s, ad, AHCI_PORT_IRQ_BIT_TFES);
+} else if (sdb_fis->flags & 0x40) {
 ahci_trigger_irq(s, ad, AHCI_PORT_IRQ_BIT_SDBS);
 }
 }
-- 
2.41.0




[PULL for-6.2 7/7] hw/ide/ahci: fix broken SError handling

2023-09-06 Thread John Snow
From: Niklas Cassel 

When encountering an NCQ error, you should not write the NCQ tag to the
SError register. This is completely wrong.

The SError register has a clear definition, where each bit represents a
different error, see PxSERR definition in AHCI 1.3.1.

If we write a random value (like the NCQ tag) in SError, e.g. Linux will
read SError, and will trigger arbitrary error handling depending on the
NCQ tag that happened to be executing.

In case of success, ncq_cb() will call ncq_finish().
In case of error, ncq_cb() will call ncq_err() (which will clear
ncq_tfs->used), and then call ncq_finish(), thus using ncq_tfs->used is
sufficient to tell if finished should get set or not.

Signed-off-by: Niklas Cassel 
Reviewed-by: Philippe Mathieu-Daudé 
Message-id: 20230609140844.202795-9-...@flawful.org
Signed-off-by: John Snow 
---
 hw/ide/ahci.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index ef6c9fc378..d0a774bc17 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -1012,7 +1012,6 @@ static void ncq_err(NCQTransferState *ncq_tfs)
 
 ide_state->error = ABRT_ERR;
 ide_state->status = READY_STAT | ERR_STAT;
-ncq_tfs->drive->port_regs.scr_err |= (1 << ncq_tfs->tag);
 qemu_sglist_destroy(&ncq_tfs->sglist);
 ncq_tfs->used = 0;
 }
@@ -1022,7 +1021,7 @@ static void ncq_finish(NCQTransferState *ncq_tfs)
 /* If we didn't error out, set our finished bit. Errored commands
  * do not get a bit set for the SDB FIS ACT register, nor do they
  * clear the outstanding bit in scr_act (PxSACT). */
-if (!(ncq_tfs->drive->port_regs.scr_err & (1 << ncq_tfs->tag))) {
+if (ncq_tfs->used) {
 ncq_tfs->drive->finished |= (1 << ncq_tfs->tag);
 }
 
-- 
2.41.0




[PULL for-6.2 0/7] Ide patches

2023-09-06 Thread John Snow
The following changes since commit c152379422a204109f34ca2b43ecc538c7d738ae:

  Merge tag 'ui-pull-request' of https://gitlab.com/marcandre.lureau/qemu into 
staging (2023-09-06 11:16:01 -0400)

are available in the Git repository at:

  https://gitlab.com/jsnow/qemu.git tags/ide-pull-request

for you to fetch changes up to 9f89423537653de07ca40c18b5ff5b70b104cc93:

  hw/ide/ahci: fix broken SError handling (2023-09-06 22:48:04 -0400)


IDE Pull request



Niklas Cassel (7):
  hw/ide/core: set ERR_STAT in unsupported command completion
  hw/ide/ahci: write D2H FIS when processing NCQ command
  hw/ide/ahci: simplify and document PxCI handling
  hw/ide/ahci: PxSACT and PxCI is cleared when PxCMD.ST is cleared
  hw/ide/ahci: PxCI should not get cleared when ERR_STAT is set
  hw/ide/ahci: fix ahci_write_fis_sdb()
  hw/ide/ahci: fix broken SError handling

 tests/qtest/libqos/ahci.h |   8 ++-
 hw/ide/ahci.c | 110 +++---
 hw/ide/core.c |   2 +-
 tests/qtest/libqos/ahci.c | 106 +++-
 4 files changed, 163 insertions(+), 63 deletions(-)

-- 
2.41.0





[PULL for-6.2 5/7] hw/ide/ahci: PxCI should not get cleared when ERR_STAT is set

2023-09-06 Thread John Snow
From: Niklas Cassel 

For NCQ, PxCI is cleared on command queued successfully.
For non-NCQ, PxCI is cleared on command completed successfully.
Successfully means ERR_STAT, BUSY and DRQ are all cleared.

A command that has ERR_STAT set, does not get to clear PxCI.
See AHCI 1.3.1, section 5.3.8, states RegFIS:Entry and RegFIS:ClearCI,
and 5.3.16.5 ERR:FatalTaskfile.

In the case of non-NCQ commands, not clearing PxCI is needed in order
for host software to be able to see which command slot that failed.

Signed-off-by: Niklas Cassel 
Message-id: 20230609140844.202795-7-...@flawful.org
Signed-off-by: John Snow 
---
 tests/qtest/libqos/ahci.h |   8 ++-
 hw/ide/ahci.c |   7 ++-
 tests/qtest/libqos/ahci.c | 106 --
 3 files changed, 88 insertions(+), 33 deletions(-)

diff --git a/tests/qtest/libqos/ahci.h b/tests/qtest/libqos/ahci.h
index 88835b6228..48017864bf 100644
--- a/tests/qtest/libqos/ahci.h
+++ b/tests/qtest/libqos/ahci.h
@@ -590,11 +590,9 @@ void ahci_set_command_header(AHCIQState *ahci, uint8_t 
port,
 void ahci_destroy_command(AHCIQState *ahci, uint8_t port, uint8_t slot);
 
 /* AHCI sanity check routines */
-void ahci_port_check_error(AHCIQState *ahci, uint8_t port,
-   uint32_t imask, uint8_t emask);
-void ahci_port_check_interrupts(AHCIQState *ahci, uint8_t port,
-uint32_t intr_mask);
-void ahci_port_check_nonbusy(AHCIQState *ahci, uint8_t port, uint8_t slot);
+void ahci_port_check_error(AHCIQState *ahci, AHCICommand *cmd);
+void ahci_port_check_interrupts(AHCIQState *ahci, AHCICommand *cmd);
+void ahci_port_check_nonbusy(AHCIQState *ahci, AHCICommand *cmd);
 void ahci_port_check_d2h_sanity(AHCIQState *ahci, uint8_t port, uint8_t slot);
 void ahci_port_check_pio_sanity(AHCIQState *ahci, AHCICommand *cmd);
 void ahci_port_check_cmd_sanity(AHCIQState *ahci, AHCICommand *cmd);
diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index a31e6fa65e..12aaadc554 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -1523,7 +1523,8 @@ static void ahci_clear_cmd_issue(AHCIDevice *ad, uint8_t 
slot)
 {
 IDEState *ide_state = &ad->port.ifs[0];
 
-if (!(ide_state->status & (BUSY_STAT | DRQ_STAT))) {
+if (!(ide_state->status & ERR_STAT) &&
+!(ide_state->status & (BUSY_STAT | DRQ_STAT))) {
 ad->port_regs.cmd_issue &= ~(1 << slot);
 }
 }
@@ -1532,6 +1533,7 @@ static void ahci_clear_cmd_issue(AHCIDevice *ad, uint8_t 
slot)
 static void ahci_cmd_done(const IDEDMA *dma)
 {
 AHCIDevice *ad = DO_UPCAST(AHCIDevice, dma, dma);
+IDEState *ide_state = &ad->port.ifs[0];
 
 trace_ahci_cmd_done(ad->hba, ad->port_no);
 
@@ -1548,7 +1550,8 @@ static void ahci_cmd_done(const IDEDMA *dma)
  */
 ahci_write_fis_d2h(ad, true);
 
-if (ad->port_regs.cmd_issue && !ad->check_bh) {
+if (!(ide_state->status & ERR_STAT) &&
+ad->port_regs.cmd_issue && !ad->check_bh) {
 ad->check_bh = qemu_bh_new_guarded(ahci_check_cmd_bh, ad,
&ad->mem_reentrancy_guard);
 qemu_bh_schedule(ad->check_bh);
diff --git a/tests/qtest/libqos/ahci.c b/tests/qtest/libqos/ahci.c
index f53f12aa99..a2c94c6e06 100644
--- a/tests/qtest/libqos/ahci.c
+++ b/tests/qtest/libqos/ahci.c
@@ -404,57 +404,110 @@ void ahci_port_clear(AHCIQState *ahci, uint8_t port)
 /**
  * Check a port for errors.
  */
-void ahci_port_check_error(AHCIQState *ahci, uint8_t port,
-   uint32_t imask, uint8_t emask)
+void ahci_port_check_error(AHCIQState *ahci, AHCICommand *cmd)
 {
+uint8_t port = cmd->port;
 uint32_t reg;
 
-/* The upper 9 bits of the IS register all indicate errors. */
-reg = ahci_px_rreg(ahci, port, AHCI_PX_IS);
-reg &= ~imask;
-reg >>= 23;
-g_assert_cmphex(reg, ==, 0);
+/* If expecting TF error, ensure that TFES is set. */
+if (cmd->errors) {
+reg = ahci_px_rreg(ahci, port, AHCI_PX_IS);
+ASSERT_BIT_SET(reg, AHCI_PX_IS_TFES);
+} else {
+/* The upper 9 bits of the IS register all indicate errors. */
+reg = ahci_px_rreg(ahci, port, AHCI_PX_IS);
+reg &= ~cmd->interrupts;
+reg >>= 23;
+g_assert_cmphex(reg, ==, 0);
+}
 
-/* The Sata Error Register should be empty. */
+/* The Sata Error Register should be empty, even when expecting TF error. 
*/
 reg = ahci_px_rreg(ahci, port, AHCI_PX_SERR);
 g_assert_cmphex(reg, ==, 0);
 
+/* If expecting TF error, and TFES was set, perform error recovery
+ * (see AHCI 1.3 section 6.2.2.1) such that we can send new commands. */
+if (cmd->errors) {
+/* This will clear PxCI. */
+ahci_px_clr(ahci, port, AHCI_PX_CMD, AHCI_PX_CMD_ST);
+
+/* The port has 500ms to disengage. */
+usleep(500000);
+reg = ahci_px_rreg(ahci, port, AHCI_PX_CMD);
+ASSERT_BIT_CLEAR(reg, AHCI_PX_CMD_CR);
+
+/* Clear PxIS. */
+reg = ahci_px_rreg(ahci, port, AHCI_PX_IS);
+  

[PULL for-6.2 3/7] hw/ide/ahci: simplify and document PxCI handling

2023-09-06 Thread John Snow
From: Niklas Cassel 

The AHCI spec states that:
For NCQ, PxCI is cleared on command queued successfully.

For non-NCQ, PxCI is cleared on command completed successfully.
(A non-NCQ command that completes with error does not clear PxCI.)

The current QEMU implementation either clears PxCI in check_cmd(),
or in ahci_cmd_done().

check_cmd() will clear PxCI for a command if handle_cmd() returns 0.
handle_cmd() will return -1 if BUSY or DRQ is set.

The QEMU implementation for NCQ commands will currently not set BUSY
or DRQ, so they will always have PxCI cleared by handle_cmd().
ahci_cmd_done() will never even get called for NCQ commands.

Non-NCQ commands are executed by ide_bus_exec_cmd().
Non-NCQ commands in QEMU are implemented either in a sync or in an async
way.

For non-NCQ commands implemented in a sync way, the command handler will
return true, and when ide_bus_exec_cmd() sees that a command handler
returns true, it will call ide_cmd_done() (which will call
ahci_cmd_done()). For a command implemented in a sync way,
ahci_cmd_done() will do nothing (since busy_slot is not set). Instead,
after ide_bus_exec_cmd() has finished, check_cmd() will clear PxCI for
these commands.

For non-NCQ commands implemented in an async way (using either aiocb or
pio_aiocb), the command handler will return false, ide_bus_exec_cmd()
will not call ide_cmd_done(), instead it is expected that the async
callback function will call ide_cmd_done() once the async command is
done. handle_cmd() will set busy_slot, if and only if BUSY or DRQ is
set, and this is checked _after_ ide_bus_exec_cmd() has returned.
handle_cmd() will return -1, so check_cmd() will not clear PxCI.
When the async callback calls ide_cmd_done() (which will call
ahci_cmd_done()), it will see that busy_slot is set, and
ahci_cmd_done() will clear PxCI.

This seems racy, since busy_slot is set _after_ ide_bus_exec_cmd() has
returned. The callback might come before busy_slot gets set. And it is
quite confusing that ahci_cmd_done() will be called for all non-NCQ
commands when the command is done, but will only clear PxCI in certain
cases, even though it will always write a D2H FIS and raise an IRQ.

Even worse, in the case where ahci_cmd_done() does not clear PxCI, it
still raises an IRQ. Host software might thus read an old PxCI value,
since PxCI is cleared (by check_cmd()) after the IRQ has been raised.

Try to simplify this by always setting busy_slot for non-NCQ commands,
such that ahci_cmd_done() will always be responsible for clearing PxCI
for non-NCQ commands.

For NCQ commands, clear PxCI when we receive the D2H FIS, but before
raising the IRQ, see AHCI 1.3.1, section 5.3.8, states RegFIS:Entry and
RegFIS:ClearCI.

Signed-off-by: Niklas Cassel 
Message-id: 20230609140844.202795-5-...@flawful.org
Signed-off-by: John Snow 
---
 hw/ide/ahci.c | 70 ---
 1 file changed, 50 insertions(+), 20 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 4b272397fd..3deaf01add 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -41,9 +41,10 @@
 #include "trace.h"
 
 static void check_cmd(AHCIState *s, int port);
-static int handle_cmd(AHCIState *s, int port, uint8_t slot);
+static void handle_cmd(AHCIState *s, int port, uint8_t slot);
 static void ahci_reset_port(AHCIState *s, int port);
 static bool ahci_write_fis_d2h(AHCIDevice *ad, bool d2h_fis_i);
+static void ahci_clear_cmd_issue(AHCIDevice *ad, uint8_t slot);
 static void ahci_init_d2h(AHCIDevice *ad);
 static int ahci_dma_prepare_buf(const IDEDMA *dma, int32_t limit);
 static bool ahci_map_clb_address(AHCIDevice *ad);
@@ -591,9 +592,8 @@ static void check_cmd(AHCIState *s, int port)
 
 if ((pr->cmd & PORT_CMD_START) && pr->cmd_issue) {
 for (slot = 0; (slot < 32) && pr->cmd_issue; slot++) {
-if ((pr->cmd_issue & (1U << slot)) &&
-!handle_cmd(s, port, slot)) {
-pr->cmd_issue &= ~(1U << slot);
+if (pr->cmd_issue & (1U << slot)) {
+handle_cmd(s, port, slot);
 }
 }
 }
@@ -1123,6 +1123,22 @@ static void process_ncq_command(AHCIState *s, int port, 
const uint8_t *cmd_fis,
 return;
 }
 
+/*
+ * A NCQ command clears the bit in PxCI after the command has been QUEUED
+ * successfully (ERROR not set, BUSY and DRQ cleared).
+ *
+ * For NCQ commands, PxCI will always be cleared here.
+ *
+ * (Once the NCQ command is COMPLETED, the device will send a SDB FIS with
+ * the interrupt bit set, which will clear PxSACT and raise an interrupt.)
+ */
+ahci_clear_cmd_issue(ad, slot);
+
+/*
+ * In reality, for NCQ commands, PxCI is cleared after receiving a D2H FIS
+ * without the interrupt bit set, but since ahci_write_fis_d2h() can raise
+ * an IRQ on error, we need to call them in reverse order.
+ */
 ahci_write_fis_d2h(ad, false);
 
 ncq_tfs->used = 1;
@@ -1197,6 +1213,7 @@ static void 

[PULL for-6.2 2/7] hw/ide/ahci: write D2H FIS when processing NCQ command

2023-09-06 Thread John Snow
From: Niklas Cassel 

The way that BUSY + PxCI is cleared for NCQ (FPDMA QUEUED) commands is
described in SATA 3.5a Gold:

11.15 FPDMA QUEUED command protocol
DFPDMAQ2: ClearInterfaceBsy
"Transmit Register Device to Host FIS with the BSY bit cleared to zero
and the DRQ bit cleared to zero and Interrupt bit cleared to zero to
mark interface ready for the next command."

PxCI is currently cleared by handle_cmd(), but we don't write the D2H
FIS to the FIS Receive Area that actually caused PxCI to be cleared.

Similar to how ahci_pio_transfer() calls ahci_write_fis_pio() with an
additional parameter to write a PIO Setup FIS without raising an IRQ,
add a parameter to ahci_write_fis_d2h() so that ahci_write_fis_d2h()
also can write the FIS to the FIS Receive Area without raising an IRQ.

Change process_ncq_command() to call ahci_write_fis_d2h() without
raising an IRQ (similar to ahci_pio_transfer()), such that the FIS
Receive Area is in sync with the PxTFD shadow register.

E.g. Linux reads status and error fields from the FIS Receive Area
directly, so it is wise to keep the FIS Receive Area and the PxTFD
shadow register in sync.

Signed-off-by: Niklas Cassel 
Message-id: 20230609140844.202795-4-...@flawful.org
Signed-off-by: John Snow 
---
 hw/ide/ahci.c | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 48d550f633..4b272397fd 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -43,7 +43,7 @@
 static void check_cmd(AHCIState *s, int port);
 static int handle_cmd(AHCIState *s, int port, uint8_t slot);
 static void ahci_reset_port(AHCIState *s, int port);
-static bool ahci_write_fis_d2h(AHCIDevice *ad);
+static bool ahci_write_fis_d2h(AHCIDevice *ad, bool d2h_fis_i);
 static void ahci_init_d2h(AHCIDevice *ad);
 static int ahci_dma_prepare_buf(const IDEDMA *dma, int32_t limit);
 static bool ahci_map_clb_address(AHCIDevice *ad);
@@ -618,7 +618,7 @@ static void ahci_init_d2h(AHCIDevice *ad)
 return;
 }
 
-if (ahci_write_fis_d2h(ad)) {
+if (ahci_write_fis_d2h(ad, true)) {
 ad->init_d2h_sent = true;
 /* We're emulating receiving the first Reg H2D Fis from the device;
  * Update the SIG register, but otherwise proceed as normal. */
@@ -850,7 +850,7 @@ static void ahci_write_fis_pio(AHCIDevice *ad, uint16_t 
len, bool pio_fis_i)
 }
 }
 
-static bool ahci_write_fis_d2h(AHCIDevice *ad)
+static bool ahci_write_fis_d2h(AHCIDevice *ad, bool d2h_fis_i)
 {
 AHCIPortRegs *pr = &ad->port_regs;
 uint8_t *d2h_fis;
@@ -864,7 +864,7 @@ static bool ahci_write_fis_d2h(AHCIDevice *ad)
 d2h_fis = &ad->res_fis[RES_FIS_RFIS];
 
 d2h_fis[0] = SATA_FIS_TYPE_REGISTER_D2H;
-d2h_fis[1] = (1 << 6); /* interrupt bit */
+d2h_fis[1] = d2h_fis_i ? (1 << 6) : 0; /* interrupt bit */
 d2h_fis[2] = s->status;
 d2h_fis[3] = s->error;
 
@@ -890,7 +890,10 @@ static bool ahci_write_fis_d2h(AHCIDevice *ad)
 ahci_trigger_irq(ad->hba, ad, AHCI_PORT_IRQ_BIT_TFES);
 }
 
-ahci_trigger_irq(ad->hba, ad, AHCI_PORT_IRQ_BIT_DHRS);
+if (d2h_fis_i) {
+ahci_trigger_irq(ad->hba, ad, AHCI_PORT_IRQ_BIT_DHRS);
+}
+
 return true;
 }
 
@@ -1120,6 +1123,8 @@ static void process_ncq_command(AHCIState *s, int port, 
const uint8_t *cmd_fis,
 return;
 }
 
+ahci_write_fis_d2h(ad, false);
+
 ncq_tfs->used = 1;
 ncq_tfs->drive = ad;
 ncq_tfs->slot = slot;
@@ -1506,7 +1511,7 @@ static void ahci_cmd_done(const IDEDMA *dma)
 }
 
 /* update d2h status */
-ahci_write_fis_d2h(ad);
+ahci_write_fis_d2h(ad, true);
 
 if (ad->port_regs.cmd_issue && !ad->check_bh) {
 ad->check_bh = qemu_bh_new_guarded(ahci_check_cmd_bh, ad,
-- 
2.41.0




[PULL for-6.2 1/7] hw/ide/core: set ERR_STAT in unsupported command completion

2023-09-06 Thread John Snow
From: Niklas Cassel 

Currently, the first time sending an unsupported command
(e.g. READ LOG DMA EXT) will not have ERR_STAT set in the completion.
Sending the unsupported command again, will correctly have ERR_STAT set.

When ide_cmd_permitted() returns false, it calls ide_abort_command().
ide_abort_command() first calls ide_transfer_stop(), which will call
ide_transfer_halt() and ide_cmd_done(), after that ide_abort_command()
sets ERR_STAT in status.

ide_cmd_done() for AHCI will call ahci_write_fis_d2h() which writes the
current status in the FIS, and raises an IRQ. (The status here will not
have ERR_STAT set!).

Thus, we cannot call ide_transfer_stop() before setting ERR_STAT, as
ide_transfer_stop() will result in the FIS being written and an IRQ
being raised.

The reason why it works the second time, is that ERR_STAT will still
be set from the previous command, so when writing the FIS, the
completion will correctly have ERR_STAT set.

Set ERR_STAT before writing the FIS (calling cmd_done), so that we will
raise an error IRQ correctly when receiving an unsupported command.

Signed-off-by: Niklas Cassel 
Reviewed-by: Philippe Mathieu-Daudé 
Message-id: 20230609140844.202795-3-...@flawful.org
Signed-off-by: John Snow 
---
 hw/ide/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index ee116891ed..b5e0dcd29b 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -533,9 +533,9 @@ BlockAIOCB *ide_issue_trim(
 
 void ide_abort_command(IDEState *s)
 {
-ide_transfer_stop(s);
 s->status = READY_STAT | ERR_STAT;
 s->error = ABRT_ERR;
+ide_transfer_stop(s);
 }
 
 static void ide_set_retry(IDEState *s)
-- 
2.41.0




Re: [PATCH RESEND 11/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_[GET|SET]_STATE

2023-09-06 Thread Nicholas Piggin
On Wed Sep 6, 2023 at 2:33 PM AEST, Harsh Prateek Bora wrote:
> L1 can request to get/set state of any of the supported Guest State
> Buffer (GSB) elements using h_guest_[get|set]_state hcalls.
> These hcalls need to do some necessary validation checks for each
> get/set request based on the flags passed and operation supported.
>
> Signed-off-by: Michael Neuling 
> Signed-off-by: Harsh Prateek Bora 
> ---
>  hw/ppc/spapr_nested.c | 267 ++
>  include/hw/ppc/spapr_nested.h |  22 +++
>  2 files changed, 289 insertions(+)
>
> diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
> index 6fbb1bcb02..498e7286fa 100644
> --- a/hw/ppc/spapr_nested.c
> +++ b/hw/ppc/spapr_nested.c
> @@ -897,6 +897,138 @@ void init_nested(void)
>  }
>  }
>  
> +static struct guest_state_element *guest_state_element_next(
> +struct guest_state_element *element,
> +int64_t *len,
> +int64_t *num_elements)
> +{
> +uint16_t size;
> +
> +/* size is of element->value[] only. Not whole guest_state_element */
> +size = be16_to_cpu(element->size);
> +
> +if (len) {
> +*len -= size + offsetof(struct guest_state_element, value);
> +}
> +
> +if (num_elements) {
> +*num_elements -= 1;
> +}
> +
> +return (struct guest_state_element *)(element->value + size);
> +}
> +
> +static
> +struct guest_state_element_type *guest_state_element_type_find(uint16_t id)
> +{
> +int i;
> +
> +for (i = 0; i < ARRAY_SIZE(guest_state_element_types); i++)
> +if (id == guest_state_element_types[i].id) {
> +return &guest_state_element_types[i];
> +}
> +
> +return NULL;
> +}
> +
> +static void print_element(struct guest_state_element *element,
> +  struct guest_state_request *gsr)
> +{
> +printf("id:0x%04x size:0x%04x %s ",
> +   be16_to_cpu(element->id), be16_to_cpu(element->size),
> +   gsr->flags & GUEST_STATE_REQUEST_SET ? "set" : "get");
> +printf("buf:0x%016lx ...\n", be64_to_cpu(*(uint64_t *)element->value));

No printfs. These could be GUEST_ERROR qemu logs if anything; if you
keep them, make sure they're relatively well-formed messages, i.e.,
something a Linux/KVM developer could use to understand what went
wrong. I.e., no __func__, which is internal to QEMU; use
"H_GUEST_GET_STATE" etc. Ditto for all the rest of the printfs.

> +}
> +
> +static bool guest_state_request_check(struct guest_state_request *gsr)
> +{
> +int64_t num_elements, len = gsr->len;
> +struct guest_state_buffer *gsb = gsr->gsb;
> +struct guest_state_element *element;
> +struct guest_state_element_type *type;
> +uint16_t id, size;
> +
> +/* gsb->num_elements = 0 == 32 bits long */
> +assert(len >= 4);

I haven't looked closely, but can the guest crash the host with
malformed requests here?

This API is pretty complicated, make sure you sanitize all inputs
carefully, as early as possible, and without too deep a call and
control flow chain from the API entry point.


> +
> +num_elements = be32_to_cpu(gsb->num_elements);
> +element = gsb->elements;
> +len -= sizeof(gsb->num_elements);
> +
> +/* Walk the buffer to validate the length */
> +while (num_elements) {
> +
> +id = be16_to_cpu(element->id);
> +size = be16_to_cpu(element->size);
> +
> +if (false) {
> +print_element(element, gsr);
> +}
> +/* buffer size too small */
> +if (len < 0) {
> +return false;
> +}
> +
> +type = guest_state_element_type_find(id);
> +if (!type) {
> +printf("%s: Element ID %04x unknown\n", __func__, id);
> +print_element(element, gsr);
> +return false;
> +}
> +
> +if (id == GSB_HV_VCPU_IGNORED_ID) {
> +goto next_element;
> +}
> +
> +if (size != type->size) {
> +printf("%s: Size mismatch. Element ID:%04x. Size Exp:%i 
> Got:%i\n",
> +   __func__, id, type->size, size);
> +print_element(element, gsr);
> +return false;
> +}
> +
> +if ((type->flags & GUEST_STATE_ELEMENT_TYPE_FLAG_READ_ONLY) &&
> +(gsr->flags & GUEST_STATE_REQUEST_SET)) {
> +printf("%s: trying to set a read-only Element ID:%04x.\n",
> +   __func__, id);
> +return false;
> +}
> +
> +if (type->flags & GUEST_STATE_ELEMENT_TYPE_FLAG_GUEST_WIDE) {
> +/* guest wide element type */
> +if (!(gsr->flags & GUEST_STATE_REQUEST_GUEST_WIDE)) {
> +printf("%s: trying to set a guest wide Element ID:%04x.\n",
> +   __func__, id);
> +return false;
> +}
> +} else {
> +/* thread wide element type */
> +if (gsr->flags & GUEST_STATE_REQUEST_GUEST_WIDE) {
> +printf("%s: trying to set a thread wide Element 

Re: [PATCH] hw/riscv: split RAM into low and high memory

2023-09-06 Thread Alistair Francis
On Thu, Aug 3, 2023 at 10:47 AM Wu, Fei  wrote:
>
> On 8/1/2023 6:46 AM, Daniel Henrique Barboza wrote:
> >
> >
> > On 7/30/23 22:53, Fei Wu wrote:
> >> riscv virt platform's memory started at 0x80000000 and
> >> straddled the 4GiB boundary. Curiously enough, this choice
> >> of a memory layout will prevent from launching a VM with
> >> a bit more than 2000MiB and PCIe pass-thru on an x86 host, due
> >> to identity mapping requirements for the MSI doorbell on x86,
> >> and these (APIC/IOAPIC) live right below 4GiB.
> >>
> >> So just split the RAM range into two portions:
> >> - 1 GiB range from 0x80000000 to 0xc0000000.
> >> - The remainder at 0x100000000
> >>
> >> ...leaving a hole between the ranges.
> >
> > I am afraid this breaks some existing distro setups, like Ubuntu. After
> > this patch
> > this emulation stopped working:
> >
> > ~/work/qemu/build/qemu-system-riscv64 \
> > -machine virt -nographic -m 8G -smp 8 \
> > -kernel ./uboot-ubuntu/usr/lib/u-boot/qemu-riscv64_smode/uboot.elf \
> > -drive file=snapshot.img,format=qcow2,if=virtio \
> > -netdev bridge,id=bridge1,br=virbr0 -device
> > virtio-net-pci,netdev=bridge1
> >
> >
> > This is basically a guest created via the official Canonical tutorial:
> >
> > https://wiki.ubuntu.com/RISC-V/QEMU
> >
> > The error being thrown:
> >
> > =
> >
> > Boot HART ID  : 4
> > Boot HART Domain  : root
> > Boot HART Priv Version: v1.12
> > Boot HART Base ISA: rv64imafdch
> > Boot HART ISA Extensions  : time,sstc
> > Boot HART PMP Count   : 16
> > Boot HART PMP Granularity : 4
> > Boot HART PMP Address Bits: 54
> > Boot HART MHPM Count  : 16
> > Boot HART MIDELEG : 0x1666
> > Boot HART MEDELEG : 0x00f0b509
> >
> >
> > U-Boot 2022.07+dfsg-1ubuntu4.2 (Nov 24 2022 - 18:47:41 +)
> >
> > CPU:
> > rv64imafdch_zicbom_zicboz_zicsr_zifencei_zihintpause_zawrs_zfa_zca_zcd_zba_zbb_zbc_zbs_sstc_svadu
> > Model: riscv-virtio,qemu
> > DRAM:  Unhandled exception: Store/AMO access fault
> > EPC: 802018b8 RA: 802126a0 TVAL: ff733f90
> >
> > Code: b823 06b2 bc23 06b2 b023 08b2 b423 08b2 (b823 08b2)
> >
> >
> > resetting ...
> > System reset not supported on this platform
> > ### ERROR ### Please RESET the board ###
> > QEMU 8.0.90 monitor - type 'help' for more infor
> > =
> >
> >
> > Based on the change made I can make an educated guess on what is going
> > wrong.
> > We have another board with a similar memory topology you're making here,
> > the
> > Microchip Polarfire (microchip_pfsoc.c). We were having some problems
> > with this
> > board while trying to consolidate the boot process between all boards in
> > hw/riscv/boot.c because of its non-continuous RAM bank. The full story
> > can be
> > read in the commit message of 4b402886ac89 ("hw/riscv: change
> > riscv_compute_fdt_addr()
> > semantics") but the short version can be seen in riscv_compute_fdt_addr()
> > from boot.c:
> >
> >  - if ram_start is less than 3072MiB, the FDT will be  put at the lowest
> > value
> > between 3072 MiB and the end of that RAM bank;
> >
> > - if ram_start is higher than 3072 MiB the FDT will be put at the end of
> > the
> > RAM bank.
> >
> > So, after this patch, since riscv_compute_fdt_addr() is being used with
> > the now
> > lower RAM bank, the fdt is being put in LOW_MEM - fdt_size for any setup
> > that has
> > more than 1Gb RAM, and this breaks assumptions made by uboot and Ubuntu
> > and possibly
> > others that are trying to retrieve the FDT from the gap that you created
> > between
> > low and hi mem in this patch.
> >
> > In fact, this same Ubuntu guest I mentioned above will boot if I put
> > only 1 Gb of RAM
> > (-m 1Gb). If I try with -m 1.1Gb I reproduce this error. This can be a
> > validation of
> > the guess I'm making here: Ubuntu is trying to fetch stuff (probably the
> > fdt) from
> > the gap between the memory areas.
> >
> > This change on top of this patch doesn't work either:
> >
> > $ git diff
> > diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
> > index 8fbdc7220c..dfff48d849 100644
> > --- a/hw/riscv/virt.c
> > +++ b/hw/riscv/virt.c
> > @@ -1335,9 +1335,16 @@ static void virt_machine_done(Notifier *notifier,
> > void *data)
> >   kernel_start_addr, true, NULL);
> >  }
> >
> > -fdt_load_addr = riscv_compute_fdt_addr(memmap[VIRT_DRAM].base,
> > +if (machine->ram_size < memmap[VIRT_DRAM].size) {
> > +fdt_load_addr = riscv_compute_fdt_addr(memmap[VIRT_DRAM].base,
> > memmap[VIRT_DRAM].size,
> > machine);
> > +} else {
> > +fdt_load_addr =
> > riscv_compute_fdt_addr(memmap[VIRT_DRAM_HIGH].base,
> > +   memmap[VIRT_DRAM_HIGH].size,
> > +   machine);
> > +}
> > +
> >
> > This would put 

Re: [PATCH] docs/devel: Add cross-compiling doc

2023-09-06 Thread Alistair Francis
On Wed, Jul 26, 2023 at 10:08 PM Andrew Jones  wrote:
>
> Add instructions for how to cross-compile QEMU for RISC-V. The
> file is named generically because there's no reason not to collect
> other architectures steps into the same file, especially because
> several subsections like those for cross-compiling QEMU dependencies
> using meson and a cross-file could be shared. Additionally, other
> approaches to creating sysroots, such as with debootstrap, may be
> documented in this file in the future.
>
> Signed-off-by: Andrew Jones 

I get a warning when building this:

qemu/docs/devel/cross-compiling.rst: WARNING: document isn't included
in any toctree

Do you mind adding a toc reference to it and sending a v2?

Alistair

> ---
>  docs/devel/cross-compiling.rst | 221 +
>  1 file changed, 221 insertions(+)
>  create mode 100644 docs/devel/cross-compiling.rst
>
> diff --git a/docs/devel/cross-compiling.rst b/docs/devel/cross-compiling.rst
> new file mode 100644
> index ..1b988ba54e4c
> --- /dev/null
> +++ b/docs/devel/cross-compiling.rst
> @@ -0,0 +1,221 @@
> +.. SPDX-License-Identifier: GPL-2.0-or-later
> +
> +
> +Cross-compiling QEMU
> +
> +
> +Cross-compiling QEMU first requires the preparation of a cross-toolchain
> +and the cross-compiling of QEMU's dependencies. While the steps will be
> +similar across architectures, each architecture will have its own specific
> +recommendations. This document collects architecture-specific procedures
> +and hints that may be used to cross-compile QEMU, where typically the host
> +environment is x86.
> +
> +RISC-V
> +==
> +
> +Toolchain
> +-
> +
> +Select a root directory for the cross environment
> +^
> +
> +Export an environment variable pointing to a root directory
> +for the cross environment. For example, ::
> +
> +  $ export PREFIX="$HOME/opt/riscv"
> +
> +Create a work directory
> +^^^
> +
> +Tools and several components will need to be downloaded and built. Create
> +a directory for all the work, ::
> +
> +  $ export WORK_DIR="$HOME/work/xqemu"
> +  $ mkdir -p "$WORK_DIR"
> +
> +Select and prepare the toolchain
> +
> +
> +Select a toolchain such as [riscv-toolchain]_ and follow its instructions
> +for building and installing it to ``$PREFIX``, e.g. ::
> +
> +  $ cd "$WORK_DIR"
> +  $ git clone https://github.com/riscv/riscv-gnu-toolchain
> +  $ cd riscv-gnu-toolchain
> +  $ ./configure --prefix="$PREFIX"
> +  $ make -j$(nproc) linux
> +
> +Set the ``$CROSS_COMPILE`` environment variable to the prefix of the cross
> +tools and add the tools to ``$PATH``, ::
> +
> +$ export CROSS_COMPILE=riscv64-unknown-linux-gnu-
> +$ export PATH="$PREFIX/bin:$PATH"
> +
> +Also set ``$SYSROOT``, where all QEMU cross-compiled dependencies will be
> +installed. The toolchain installation likely created a 'sysroot' directory
> +at ``$PREFIX/sysroot``, which is the default location for most cross
> +tools, making it a good location, ::
> +
> +  $ mkdir -p "$PREFIX/sysroot"
> +  $ export SYSROOT="$PREFIX/sysroot"
> +
> +Create a pkg-config wrapper
> +^^^
> +
> +The build processes of QEMU and some of its dependencies depend on
> +pkg-config. Create a wrapper script for it which works for the cross
> +environment: ::
> +
> +  $ cat <<EOF >"$PREFIX/bin/${CROSS_COMPILE}pkg-config"
> +  #!/bin/sh
> +
> +  [ "\$SYSROOT" ] || exit 1
> +
> +  export PKG_CONFIG_PATH=
> +  export 
> PKG_CONFIG_LIBDIR="\${SYSROOT}/usr/lib/pkgconfig:\${SYSROOT}/usr/lib64/pkgconfig:\${SYSROOT}/usr/share/pkgconfig"
> +
> +  exec pkg-config "\$@"
> +  EOF
> +  $ chmod +x "$PREFIX/bin/${CROSS_COMPILE}pkg-config"
> +
> +Create a cross-file for meson builds
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +meson setup, used by some of QEMU's dependencies, needs a "cross-file" to
> +configure the cross environment. Create one, ::
> +
> +  $ cd "$WORK_DIR"
> +  $ cat <<EOF >cross_file.txt
> +  [host_machine]
> +  system = 'linux'
> +  cpu_family = 'riscv64'
> +  cpu = 'riscv64'
> +  endian = 'little'
> +
> +  [binaries]
> +  c = '${CROSS_COMPILE}gcc'
> +  cpp = '${CROSS_COMPILE}g++'
> +  ar = '${CROSS_COMPILE}ar'
> +  ld = '${CROSS_COMPILE}ld'
> +  objcopy = '${CROSS_COMPILE}objcopy'
> +  strip = '${CROSS_COMPILE}strip'
> +  pkgconfig = '${CROSS_COMPILE}pkg-config'
> +  EOF
> +
> +Cross-compile dependencies
> +--------------------------
> +
> +glibc
> +^^^^^
> +
> +If [riscv-toolchain]_ was selected for the toolchain then this step is
> +already complete and glibc has already been installed into ``$SYSROOT``.
> +Otherwise, cross-compile glibc and install it to ``$SYSROOT``.
> +
> +libffi
> +^^^^^^
> +
> +::
> +
> +  $ cd "$WORK_DIR"
> +  $ git clone https://gitlab.freedesktop.org/gstreamer/meson-ports/libffi.git
> +  $ cd libffi
> +  $ meson setup --cross-file ../cross_file.txt --prefix="$SYSROOT/usr" _build
> +  $ 

Re: [PATCH] target/riscv: update checks on writing pmpcfg for ePMP to version 1.0

2023-09-06 Thread Alistair Francis
On Tue, Sep 5, 2023 at 2:30 AM Alvin Chang  wrote:
>
> Current checks on writing pmpcfg for ePMP follow ePMP version 0.9.1.
> However, ePMP specification has already been ratified, and there are
> some differences between version 0.9.1 and 1.0. In this commit we update
> the checks of writing pmpcfg to follow ePMP version 1.0.
>
> When mseccfg.MML is set, the constraints to modify PMP rules are:
> 1. Locked rules cannot be removed or modified until a PMP reset, unless
>mseccfg.RLB is set.
> 2. From Smepmp specification version 1.0, chapter 2 section 4b:
>Adding a rule with executable privileges that either is M-mode-only
>or a locked Shared-Region is not possible and such pmpcfg writes are
>ignored, leaving pmpcfg unchanged.
>
> The commit transfers the value of pmpcfg into the index of the ePMP
> truth table, and checks the rules by aforementioned specification
> changes.
>
> Signed-off-by: Alvin Chang 

Thanks for the patch!

As part of this change we should convert ePMP over to Smepmp and drop
the experimental status

Alistair

> ---
>  target/riscv/pmp.c | 51 ++
>  1 file changed, 42 insertions(+), 9 deletions(-)
>
> diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c
> index a08cd95658..c036ca3e70 100644
> --- a/target/riscv/pmp.c
> +++ b/target/riscv/pmp.c
> @@ -99,16 +99,49 @@ static void pmp_write_cfg(CPURISCVState *env, uint32_t 
> pmp_index, uint8_t val)
>  locked = false;
>  }
>
> -/* mseccfg.MML is set */
> -if (MSECCFG_MML_ISSET(env)) {
> -/* not adding execute bit */
> -if ((val & PMP_LOCK) != 0 && (val & PMP_EXEC) != PMP_EXEC) {
> -locked = false;
> -}
> -/* shared region and not adding X bit */
> -if ((val & PMP_LOCK) != PMP_LOCK &&
> -(val & 0x7) != (PMP_WRITE | PMP_EXEC)) {
> +/*
> + * mseccfg.MML is set. Locked rules cannot be removed or modified
> + * until a PMP reset. Besides, from Smepmp specification version 
> 1.0
> + * , chapter 2 section 4b says:
> + * Adding a rule with executable privileges that either is
> + * M-mode-only or a locked Shared-Region is not possible and such
> + * pmpcfg writes are ignored, leaving pmpcfg unchanged.
> + */
> +if (MSECCFG_MML_ISSET(env) && !pmp_is_locked(env, pmp_index)) {
> +/*
> + * Convert the PMP permissions to match the truth table in 
> the
> + * ePMP spec.
> + */
> +const uint8_t epmp_operation =
> +((val & PMP_LOCK) >> 4) | ((val & PMP_READ) << 2) |
> +(val & PMP_WRITE) | ((val & PMP_EXEC) >> 2);
> +
> +switch (epmp_operation) {
> +/* pmpcfg.L = 0. Neither M-mode-only nor locked 
> Shared-Region */
> +case 0:
> +case 1:
> +case 2:
> +case 3:
> +case 4:
> +case 5:
> +case 6:
> +case 7:
> +/* pmpcfg.L = 1 and pmpcfg.X = 0 (but case 10 is not 
> allowed) */
> +case 8:
> +case 12:
> +case 14:
> +/* pmpcfg.LRWX = 1111 */
> +case 15:  /* Read-only locked Shared-Region on all modes */
>  locked = false;
> +break;
> +/* Other rules which add new code regions are not allowed */
> +case 9:
> +case 10:  /* Execute-only locked Shared-Region on all modes 
> */
> +case 11:
> +case 13:
> +break;
> +default:
> +g_assert_not_reached();
>  }
>  }
>  } else {
> --
> 2.34.1
>
>



Re: [PATCH v3] target/riscv: don't read CSR in riscv_csrrw_do64

2023-09-06 Thread Alistair Francis
On Tue, Aug 8, 2023 at 7:10 PM Nikita Shubin  wrote:
>
> From: Nikita Shubin 
>
> As per ISA:
>
> "For CSRRWI, if rd=x0, then the instruction shall not read the CSR and
> shall not cause any of the side effects that might occur on a CSR read."
>
> trans_csrrwi() and trans_csrrw() call do_csrw() if rd=x0, do_csrw() calls
> riscv_csrrw_do64(), via helper_csrw() passing NULL as *ret_value.
>
> Signed-off-by: Nikita Shubin 

Thanks!

Applied to riscv-to-apply.next

Alistair

> ---
> Changelog v2:
> - fixed uninitialized old_value
>
> Changelog v3:
> - reword comment and commit message as Daniel suggested
>
> ---
>  target/riscv/csr.c | 24 +++-
>  1 file changed, 15 insertions(+), 9 deletions(-)
>
> diff --git a/target/riscv/csr.c b/target/riscv/csr.c
> index ea7585329e..c5564d6d53 100644
> --- a/target/riscv/csr.c
> +++ b/target/riscv/csr.c
> @@ -3908,21 +3908,27 @@ static RISCVException riscv_csrrw_do64(CPURISCVState 
> *env, int csrno,
> target_ulong write_mask)
>  {
>  RISCVException ret;
> -target_ulong old_value;
> +target_ulong old_value = 0;
>
>  /* execute combined read/write operation if it exists */
>  if (csr_ops[csrno].op) {
>  return csr_ops[csrno].op(env, csrno, ret_value, new_value, 
> write_mask);
>  }
>
> -/* if no accessor exists then return failure */
> -if (!csr_ops[csrno].read) {
> -return RISCV_EXCP_ILLEGAL_INST;
> -}
> -/* read old value */
> -ret = csr_ops[csrno].read(env, csrno, &old_value);
> -if (ret != RISCV_EXCP_NONE) {
> -return ret;
> +/*
> + * ret_value == NULL means that rd=x0 and we're coming from helper_csrw()
> + * and we can't throw side effects caused by CSR reads.
> + */
> +if (ret_value) {
> +/* if no accessor exists then return failure */
> +if (!csr_ops[csrno].read) {
> +return RISCV_EXCP_ILLEGAL_INST;
> +}
> +/* read old value */
> +ret = csr_ops[csrno].read(env, csrno, &old_value);
> +if (ret != RISCV_EXCP_NONE) {
> +return ret;
> +}
>  }
>
>  /* write value if writable and write mask set, otherwise drop writes */
> --
> 2.39.2
>
>



Re: [PATCH RESEND 10/15] ppc: spapr: Initialize the GSB Elements lookup table.

2023-09-06 Thread Nicholas Piggin
Might be good to add a common nested: prefix to all patches actually.

On Wed Sep 6, 2023 at 2:33 PM AEST, Harsh Prateek Bora wrote:
> This is a first step towards enabling support for nested PAPR hcalls for
> providing the get/set of various Guest State Buffer (GSB) elements via
> h_guest_[g|s]et_state hcalls. This enables for identifying correct
> callbacks for get/set for each of the elements supported via
> h_guest_[g|s]et_state hcalls, support for which is added in next patch.

Changelog could use work.

>
> Signed-off-by: Michael Neuling 
> Signed-off-by: Shivaprasad G Bhat 
> Signed-off-by: Harsh Prateek Bora 
> ---
>  hw/ppc/spapr_hcall.c  |   1 +
>  hw/ppc/spapr_nested.c | 487 ++
>  include/hw/ppc/ppc.h  |   2 +
>  include/hw/ppc/spapr_nested.h | 102 +++
>  4 files changed, 592 insertions(+)
>
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index 9b1f225d4a..ca609cb5a4 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -1580,6 +1580,7 @@ static void hypercall_register_types(void)
>  spapr_register_hypercall(KVMPPC_H_UPDATE_DT, h_update_dt);
>  
>  spapr_register_nested();
> +init_nested();

This is for hcall registration, not general subsystem init I think.
Arguably not sure if it matters, it just looks odd for everything
else to be an hcall except this. I would just add a new init
function.

And actually now I look closer at this, I would not do your papr
hcall init in the cap apply function, if it is possible to do
inside spapr_register_nested(), then that function could look at
which caps are enabled and register the appropriate hcalls. Then
no change to move this into cap code.

>  }
>  
>  type_init(hypercall_register_types)
> diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
> index e7956685af..6fbb1bcb02 100644
> --- a/hw/ppc/spapr_nested.c
> +++ b/hw/ppc/spapr_nested.c

[snip]

My eyes are going square, I'll review this later.

> diff --git a/include/hw/ppc/ppc.h b/include/hw/ppc/ppc.h
> index e095c002dc..d7acc28d17 100644
> --- a/include/hw/ppc/ppc.h
> +++ b/include/hw/ppc/ppc.h
> @@ -33,6 +33,8 @@ struct ppc_tb_t {
>  QEMUTimer *decr_timer;
>  /* Hypervisor decrementer management */
>  uint64_t hdecr_next;/* Tick for next hdecr interrupt  */
> +/* TB that HDEC should fire and return ctrl back to the Host partition */
> +uint64_t hdecr_expiry_tb;

Why is this here?

>  QEMUTimer *hdecr_timer;
>  int64_t purr_offset;
>  void *opaque;
> diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
> index 2e8c6ba1ca..3c0d6a486e 100644
> --- a/include/hw/ppc/spapr_nested.h
> +++ b/include/hw/ppc/spapr_nested.h

[snip]

>  
> +struct guest_state_element_type {
> +uint16_t id;
> +int size;
> +#define GUEST_STATE_ELEMENT_TYPE_FLAG_GUEST_WIDE 0x1
> +#define GUEST_STATE_ELEMENT_TYPE_FLAG_READ_ONLY  0x2
> +   uint16_t flags;
> +void *(*location)(SpaprMachineStateNestedGuest *, target_ulong);
> +size_t offset;
> +void (*copy)(void *, void *, bool);
> +uint64_t mask;
> +};

I have to wonder whether this is the best way to go. Having
these indicrect function calls and array of "ops" like this
might be limiting the compiler. I wonder if it should just
be done in a switch table, which is how most interpreters
I've seen (which admittedly is not many) seem to do it.

Thanks,
Nick




Re: [PATCH v3] target/riscv: don't read CSR in riscv_csrrw_do64

2023-09-06 Thread Alistair Francis
On Tue, Aug 8, 2023 at 7:10 PM Nikita Shubin  wrote:
>
> From: Nikita Shubin 
>
> As per ISA:
>
> "For CSRRWI, if rd=x0, then the instruction shall not read the CSR and
> shall not cause any of the side effects that might occur on a CSR read."
>
> trans_csrrwi() and trans_csrrw() call do_csrw() if rd=x0, do_csrw() calls
> riscv_csrrw_do64(), via helper_csrw() passing NULL as *ret_value.
>
> Signed-off-by: Nikita Shubin 

Reviewed-by: Alistair Francis 

Alistair

> ---
> Changelog v2:
> - fixed uninitialized old_value
>
> Changelog v3:
> - reword comment and commit message as Daniel suggested
>
> ---
>  target/riscv/csr.c | 24 +++-
>  1 file changed, 15 insertions(+), 9 deletions(-)
>
> diff --git a/target/riscv/csr.c b/target/riscv/csr.c
> index ea7585329e..c5564d6d53 100644
> --- a/target/riscv/csr.c
> +++ b/target/riscv/csr.c
> @@ -3908,21 +3908,27 @@ static RISCVException riscv_csrrw_do64(CPURISCVState 
> *env, int csrno,
> target_ulong write_mask)
>  {
>  RISCVException ret;
> -target_ulong old_value;
> +target_ulong old_value = 0;
>
>  /* execute combined read/write operation if it exists */
>  if (csr_ops[csrno].op) {
>  return csr_ops[csrno].op(env, csrno, ret_value, new_value, 
> write_mask);
>  }
>
> -/* if no accessor exists then return failure */
> -if (!csr_ops[csrno].read) {
> -return RISCV_EXCP_ILLEGAL_INST;
> -}
> -/* read old value */
> -ret = csr_ops[csrno].read(env, csrno, &old_value);
> -if (ret != RISCV_EXCP_NONE) {
> -return ret;
> +/*
> + * ret_value == NULL means that rd=x0 and we're coming from helper_csrw()
> + * and we can't throw side effects caused by CSR reads.
> + */
> +if (ret_value) {
> +/* if no accessor exists then return failure */
> +if (!csr_ops[csrno].read) {
> +return RISCV_EXCP_ILLEGAL_INST;
> +}
> +/* read old value */
> +ret = csr_ops[csrno].read(env, csrno, &old_value);
> +if (ret != RISCV_EXCP_NONE) {
> +return ret;
> +}
>  }
>
>  /* write value if writable and write mask set, otherwise drop writes */
> --
> 2.39.2
>
>



Re: [PATCH RESEND 09/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_CREATE_VCPU

2023-09-06 Thread Nicholas Piggin
On Wed Sep 6, 2023 at 2:33 PM AEST, Harsh Prateek Bora wrote:
> This patch implements support for hcall H_GUEST_CREATE_VCPU which is
> used to instantiate a new VCPU for a previously created nested guest.
> The L1 provides the guest-id (returned by L0 during the call to
> H_GUEST_CREATE) and an associated unique vcpu-id to refer to this
> instance in future calls. It is assumed that vcpu-ids are being
> allocated in a sequential manner and max vcpu limit is 2048.
>
> Signed-off-by: Michael Neuling 
> Signed-off-by: Shivaprasad G Bhat 
> Signed-off-by: Harsh Prateek Bora 
> ---
>  hw/ppc/spapr_nested.c | 110 ++
>  include/hw/ppc/spapr.h|   1 +
>  include/hw/ppc/spapr_nested.h |   1 +
>  3 files changed, 112 insertions(+)
>
> diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
> index 09bbbfb341..e7956685af 100644
> --- a/hw/ppc/spapr_nested.c
> +++ b/hw/ppc/spapr_nested.c
> @@ -376,6 +376,47 @@ void spapr_exit_nested(PowerPCCPU *cpu, int excp)
>  address_space_unmap(CPU(cpu)->as, regs, len, len, true);
>  }
>  
> +static
> +SpaprMachineStateNestedGuest *spapr_get_nested_guest(SpaprMachineState 
> *spapr,
> + target_ulong lpid)
> +{
> +SpaprMachineStateNestedGuest *guest;
> +
> +guest = g_hash_table_lookup(spapr->nested.guests, GINT_TO_POINTER(lpid));
> +return guest;
> +}

Are you namespacing the new API stuff with papr or no? Might be good to
reduce confusion.

> +
> +static bool vcpu_check(SpaprMachineStateNestedGuest *guest,
> +   target_ulong vcpuid,
> +   bool inoutbuf)

What's it checking? That the id is valid? Allocated? Enabled?

> +{
> +struct SpaprMachineStateNestedGuestVcpu *vcpu;
> +
> +if (vcpuid >= NESTED_GUEST_VCPU_MAX) {
> +return false;
> +}
> +
> +if (!(vcpuid < guest->vcpus)) {
> +return false;
> +}
> +
> +vcpu = &guest->vcpu[vcpuid];
> +if (!vcpu->enabled) {
> +return false;
> +}
> +
> +if (!inoutbuf) {
> +return true;
> +}
> +
> +/* Check to see if the in/out buffers are registered */
> +if (vcpu->runbufin.addr && vcpu->runbufout.addr) {
> +return true;
> +}
> +
> +return false;
> +}
> +
>  static target_ulong h_guest_get_capabilities(PowerPCCPU *cpu,
>   SpaprMachineState *spapr,
>   target_ulong opcode,
> @@ -448,6 +489,11 @@ static void
>  destroy_guest_helper(gpointer value)
>  {
>  struct SpaprMachineStateNestedGuest *guest = value;
> +int i = 0;

Don't need to set i = 0 twice. A newline would be good though.

> +for (i = 0; i < guest->vcpus; i++) {
> +cpu_ppc_tb_free(&guest->vcpu[i].env);
> +}
> +g_free(guest->vcpu);
>  g_free(guest);
>  }
>  
> @@ -518,6 +564,69 @@ static target_ulong h_guest_create(PowerPCCPU *cpu,
>  return H_SUCCESS;
>  }
>  
> +static target_ulong h_guest_create_vcpu(PowerPCCPU *cpu,
> +SpaprMachineState *spapr,
> +target_ulong opcode,
> +target_ulong *args)
> +{
> +CPUPPCState *env = &cpu->env, *l2env;
> +target_ulong flags = args[0];
> +target_ulong lpid = args[1];
> +target_ulong vcpuid = args[2];
> +SpaprMachineStateNestedGuest *guest;
> +
> +if (flags) { /* don't handle any flags for now */
> +return H_UNSUPPORTED_FLAG;
> +}
> +
> +guest = spapr_get_nested_guest(spapr, lpid);
> +if (!guest) {
> +return H_P2;
> +}
> +
> +if (vcpuid < guest->vcpus) {
> +return H_IN_USE;
> +}
> +
> +if (guest->vcpus >= NESTED_GUEST_VCPU_MAX) {
> +return H_P3;
> +}
> +
> +if (guest->vcpus) {
> +struct SpaprMachineStateNestedGuestVcpu *vcpus;

Ditto for using typedefs. Do a sweep for this.

> +vcpus = g_try_renew(struct SpaprMachineStateNestedGuestVcpu,
> +guest->vcpu,
> +guest->vcpus + 1);

g_try_renew doesn't work with NULL mem? That's unfortunate.

> +if (!vcpus) {
> +return H_NO_MEM;
> +}
> +memset(&vcpus[guest->vcpus], 0,
> +   sizeof(struct SpaprMachineStateNestedGuestVcpu));
> +guest->vcpu = vcpus;
> +l2env = &vcpus[guest->vcpus].env;
> +} else {
> +guest->vcpu = g_try_new0(struct SpaprMachineStateNestedGuestVcpu, 1);
> +if (guest->vcpu == NULL) {
> +return H_NO_MEM;
> +}
> +l2env = &guest->vcpu->env;
> +}

These two legs seem to be doing the same thing in different
ways wrt l2env. Just assign guest->vcpu in the branches and
get the l2env from guest->vcpu[guest->vcpus] afterward, no?

> +/* need to memset to zero otherwise we leak L1 state to L2 */
> +memset(l2env, 0, sizeof(CPUPPCState));

AFAIKS you just zeroed it above.

> +/* 

Re: [PATCH RESEND 14/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_DELETE

2023-09-06 Thread Nicholas Piggin
On Wed Sep 6, 2023 at 2:33 PM AEST, Harsh Prateek Bora wrote:
> This hcall is used by L1 to delete a guest entry in L0 or can also be
> used to delete all guests if needed (usually in shutdown scenarios).

I'd squash with at least the create hcall.

>
> Signed-off-by: Michael Neuling 
> Signed-off-by: Harsh Prateek Bora 
> ---
>  hw/ppc/spapr_nested.c | 32 
>  include/hw/ppc/spapr_nested.h |  1 +
>  2 files changed, 33 insertions(+)
>
> diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
> index 3605f27115..5afdad4990 100644
> --- a/hw/ppc/spapr_nested.c
> +++ b/hw/ppc/spapr_nested.c
> @@ -1692,6 +1692,37 @@ static void exit_process_output_buffer(PowerPCCPU *cpu,
>  return;
>  }
>  
> +static target_ulong h_guest_delete(PowerPCCPU *cpu,
> +   SpaprMachineState *spapr,
> +   target_ulong opcode,
> +   target_ulong *args)
> +{
> +target_ulong flags = args[0];
> +target_ulong lpid = args[1];
> +struct SpaprMachineStateNestedGuest *guest;
> +
> +if (!spapr_get_cap(spapr, SPAPR_CAP_NESTED_PAPR)) {
> +return H_FUNCTION;
> +}

If you only register these hcalls when you apply the cap, then you
don't need to test it, right?

Open question as to whether it's better to register hcalls when
enabling such caps, or do the tests for them here. I guess the
former makes sense.

> +
> +/* handle flag deleteAllGuests, remaining bits reserved */

This comment is confusing. What is flag deleteAllGuests?

H_GUEST_DELETE_ALL_MASK? Is that a mask, or a flag?

> +if (flags & ~H_GUEST_DELETE_ALL_MASK) {
> +return H_UNSUPPORTED_FLAG;
> +} else if (flags & H_GUEST_DELETE_ALL_MASK) {
> +g_hash_table_destroy(spapr->nested.guests);
> +return H_SUCCESS;
> +}
> +
> +guest = g_hash_table_lookup(spapr->nested.guests, GINT_TO_POINTER(lpid));
> +if (!guest) {
> +return H_P2;
> +}
> +
> +g_hash_table_remove(spapr->nested.guests, GINT_TO_POINTER(lpid));
> +
> +return H_SUCCESS;
> +}
> +
>  void spapr_register_nested(void)
>  {
>  spapr_register_hypercall(KVMPPC_H_SET_PARTITION_TABLE, h_set_ptbl);
> @@ -1709,6 +1740,7 @@ void spapr_register_nested_phyp(void)
>  spapr_register_hypercall(H_GUEST_SET_STATE   , h_guest_set_state);
>  spapr_register_hypercall(H_GUEST_GET_STATE   , h_guest_get_state);
>  spapr_register_hypercall(H_GUEST_RUN_VCPU, h_guest_run_vcpu);
> +spapr_register_hypercall(H_GUEST_DELETE  , h_guest_delete);
>  }
>  
>  #else
> diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
> index ca5d28c06e..9eb43778ad 100644
> --- a/include/hw/ppc/spapr_nested.h
> +++ b/include/hw/ppc/spapr_nested.h
> @@ -209,6 +209,7 @@
>  #define H_GUEST_GETSET_STATE_FLAG_GUEST_WIDE 0x8000 /* BE in GSB 
> */
>  #define GUEST_STATE_REQUEST_GUEST_WIDE   0x1
>  #define GUEST_STATE_REQUEST_SET  0x2
> +#define H_GUEST_DELETE_ALL_MASK  0x8000ULL
>  
>  #define GUEST_STATE_ELEMENT(i, sz, s, f, ptr, c) { \
>  .id = (i), \




Re: [PATCH] target/riscv: Align the AIA model to v1.0 ratified spec

2023-09-06 Thread Alistair Francis
On Wed, Aug 16, 2023 at 4:18 PM Tommy Wu  wrote:
>
> According to the new spec, when vsiselect has a reserved value, attempts
> from M-mode or HS-mode to access vsireg, or from VS-mode to access
> sireg, should preferably raise an illegal instruction exception.
>
> Signed-off-by: Tommy Wu 
> Reviewed-by: Frank Chang 

Thanks!

Applied to riscv-to-apply.next

Alistair

> ---
>  target/riscv/csr.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/target/riscv/csr.c b/target/riscv/csr.c
> index ea7585329e..e4244b8dac 100644
> --- a/target/riscv/csr.c
> +++ b/target/riscv/csr.c
> @@ -1685,7 +1685,7 @@ static int rmw_iprio(target_ulong xlen,
>  static int rmw_xireg(CPURISCVState *env, int csrno, target_ulong *val,
>   target_ulong new_val, target_ulong wr_mask)
>  {
> -bool virt;
> +bool virt, isel_reserved;
>  uint8_t *iprio;
>  int ret = -EINVAL;
>  target_ulong priv, isel, vgein;
> @@ -1695,6 +1695,7 @@ static int rmw_xireg(CPURISCVState *env, int csrno, 
> target_ulong *val,
>
>  /* Decode register details from CSR number */
>  virt = false;
> +isel_reserved = false;
>  switch (csrno) {
>  case CSR_MIREG:
>  iprio = env->miprio;
> @@ -1739,11 +1740,13 @@ static int rmw_xireg(CPURISCVState *env, int csrno, 
> target_ulong *val,
>riscv_cpu_mxl_bits(env)),
>  val, new_val, wr_mask);
>  }
> +} else {
> +isel_reserved = true;
>  }
>
>  done:
>  if (ret) {
> -return (env->virt_enabled && virt) ?
> +return (env->virt_enabled && virt && !isel_reserved) ?
> RISCV_EXCP_VIRT_INSTRUCTION_FAULT : RISCV_EXCP_ILLEGAL_INST;
>  }
>  return RISCV_EXCP_NONE;
> --
> 2.27.0
>
>



Re: [PATCH] docs/devel: Add cross-compiling doc

2023-09-06 Thread Alistair Francis
On Wed, Jul 26, 2023 at 10:08 PM Andrew Jones  wrote:
>
> Add instructions for how to cross-compile QEMU for RISC-V. The
> file is named generically because there's no reason not to collect
> other architectures steps into the same file, especially because
> several subsections like those for cross-compiling QEMU dependencies
> using meson and a cross-file could be shared. Additionally, other
> approaches to creating sysroots, such as with debootstrap, may be
> documented in this file in the future.
>
> Signed-off-by: Andrew Jones 

Thanks!

Applied to riscv-to-apply.next

Alistair

> ---
>  docs/devel/cross-compiling.rst | 221 +
>  1 file changed, 221 insertions(+)
>  create mode 100644 docs/devel/cross-compiling.rst
>
> diff --git a/docs/devel/cross-compiling.rst b/docs/devel/cross-compiling.rst
> new file mode 100644
> index ..1b988ba54e4c
> --- /dev/null
> +++ b/docs/devel/cross-compiling.rst
> @@ -0,0 +1,221 @@
> +.. SPDX-License-Identifier: GPL-2.0-or-later
> +
> +====================
> +Cross-compiling QEMU
> +====================
> +
> +Cross-compiling QEMU first requires the preparation of a cross-toolchain
> +and the cross-compiling of QEMU's dependencies. While the steps will be
> +similar across architectures, each architecture will have its own specific
> +recommendations. This document collects architecture-specific procedures
> +and hints that may be used to cross-compile QEMU, where typically the host
> +environment is x86.
> +
> +RISC-V
> +======
> +
> +Toolchain
> +---------
> +
> +Select a root directory for the cross environment
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +Export an environment variable pointing to a root directory
> +for the cross environment. For example, ::
> +
> +  $ export PREFIX="$HOME/opt/riscv"
> +
> +Create a work directory
> +^^^^^^^^^^^^^^^^^^^^^^^
> +
> +Tools and several components will need to be downloaded and built. Create
> +a directory for all the work, ::
> +
> +  $ export WORK_DIR="$HOME/work/xqemu"
> +  $ mkdir -p "$WORK_DIR"
> +
> +Select and prepare the toolchain
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +Select a toolchain such as [riscv-toolchain]_ and follow its instructions
> +for building and installing it to ``$PREFIX``, e.g. ::
> +
> +  $ cd "$WORK_DIR"
> +  $ git clone https://github.com/riscv/riscv-gnu-toolchain
> +  $ cd riscv-gnu-toolchain
> +  $ ./configure --prefix="$PREFIX"
> +  $ make -j$(nproc) linux
> +
> +Set the ``$CROSS_COMPILE`` environment variable to the prefix of the cross
> +tools and add the tools to ``$PATH``, ::
> +
> +$ export CROSS_COMPILE=riscv64-unknown-linux-gnu-
> +$ export PATH="$PREFIX/bin:$PATH"
> +
> +Also set ``$SYSROOT``, where all QEMU cross-compiled dependencies will be
> +installed. The toolchain installation likely created a 'sysroot' directory
> +at ``$PREFIX/sysroot``, which is the default location for most cross
> +tools, making it a good location, ::
> +
> +  $ mkdir -p "$PREFIX/sysroot"
> +  $ export SYSROOT="$PREFIX/sysroot"
> +
> +Create a pkg-config wrapper
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +The build processes of QEMU and some of its dependencies depend on
> +pkg-config. Create a wrapper script for it which works for the cross
> +environment: ::
> +
> +  $ cat <<EOF >"$PREFIX/bin/${CROSS_COMPILE}pkg-config"
> +  #!/bin/sh
> +
> +  [ "\$SYSROOT" ] || exit 1
> +
> +  export PKG_CONFIG_PATH=
> +  export 
> PKG_CONFIG_LIBDIR="\${SYSROOT}/usr/lib/pkgconfig:\${SYSROOT}/usr/lib64/pkgconfig:\${SYSROOT}/usr/share/pkgconfig"
> +
> +  exec pkg-config "\$@"
> +  EOF
> +  $ chmod +x "$PREFIX/bin/${CROSS_COMPILE}pkg-config"
> +
> +Create a cross-file for meson builds
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +meson setup, used by some of QEMU's dependencies, needs a "cross-file" to
> +configure the cross environment. Create one, ::
> +
> +  $ cd "$WORK_DIR"
> +  $ cat <<EOF >cross_file.txt
> +  [host_machine]
> +  system = 'linux'
> +  cpu_family = 'riscv64'
> +  cpu = 'riscv64'
> +  endian = 'little'
> +
> +  [binaries]
> +  c = '${CROSS_COMPILE}gcc'
> +  cpp = '${CROSS_COMPILE}g++'
> +  ar = '${CROSS_COMPILE}ar'
> +  ld = '${CROSS_COMPILE}ld'
> +  objcopy = '${CROSS_COMPILE}objcopy'
> +  strip = '${CROSS_COMPILE}strip'
> +  pkgconfig = '${CROSS_COMPILE}pkg-config'
> +  EOF
> +
> +Cross-compile dependencies
> +--------------------------
> +
> +glibc
> +^^^^^
> +
> +If [riscv-toolchain]_ was selected for the toolchain then this step is
> +already complete and glibc has already been installed into ``$SYSROOT``.
> +Otherwise, cross-compile glibc and install it to ``$SYSROOT``.
> +
> +libffi
> +^^^^^^
> +
> +::
> +
> +  $ cd "$WORK_DIR"
> +  $ git clone https://gitlab.freedesktop.org/gstreamer/meson-ports/libffi.git
> +  $ cd libffi
> +  $ meson setup --cross-file ../cross_file.txt --prefix="$SYSROOT/usr" _build
> +  $ ninja -C _build
> +  $ ninja -C _build install
> +
> +*Building libffi separately avoids a compilation error generated when
> +building it as 

Re: [PATCH RESEND 08/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_CREATE

2023-09-06 Thread Nicholas Piggin
On Wed Sep 6, 2023 at 2:33 PM AEST, Harsh Prateek Bora wrote:
> This hcall is used by L1 to indicate to L0 that a new nested guest needs
> to be created and therefore necessary resource allocation shall be made.
> The L0 uses a hash table for nested guest specific resource management.
> This data structure is further utilized by other hcalls to operate on
> related members during entire life cycle of the nested guest.

Similar comment for changelog re detail. Detailed specification of API
and implementation could go in comments or documentation if useful.

>
> Signed-off-by: Michael Neuling 
> Signed-off-by: Shivaprasad G Bhat 
> Signed-off-by: Harsh Prateek Bora 
> ---
>  hw/ppc/spapr_nested.c | 75 +++
>  include/hw/ppc/spapr_nested.h |  3 ++
>  2 files changed, 78 insertions(+)
>
> diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
> index 9af65f257f..09bbbfb341 100644
> --- a/hw/ppc/spapr_nested.c
> +++ b/hw/ppc/spapr_nested.c
> @@ -444,6 +444,80 @@ static target_ulong h_guest_set_capabilities(PowerPCCPU 
> *cpu,
>  return H_SUCCESS;
>  }
>  
> +static void
> +destroy_guest_helper(gpointer value)
> +{
> +struct SpaprMachineStateNestedGuest *guest = value;
> +g_free(guest);
> +}
> +
> +static target_ulong h_guest_create(PowerPCCPU *cpu,
> +   SpaprMachineState *spapr,
> +   target_ulong opcode,
> +   target_ulong *args)
> +{
> +CPUPPCState *env = &cpu->env;
> +target_ulong flags = args[0];
> +target_ulong continue_token = args[1];
> +uint64_t lpid;
> +int nguests = 0;
> +struct SpaprMachineStateNestedGuest *guest;
> +
> +if (flags) { /* don't handle any flags for now */
> +return H_UNSUPPORTED_FLAG;
> +}
> +
> +if (continue_token != -1) {
> +return H_P2;
> +}
> +
> +if (!spapr_get_cap(spapr, SPAPR_CAP_NESTED_PAPR)) {
> +return H_FUNCTION;
> +}
> +
> +if (!spapr->nested.capabilities_set) {
> +return H_STATE;
> +}
> +
> +if (!spapr->nested.guests) {
> +spapr->nested.lpid_max = NESTED_GUEST_MAX;
> +spapr->nested.guests = g_hash_table_new_full(NULL,
> + NULL,
> + NULL,
> + destroy_guest_helper);

Is lpid_max only used by create? Probably no need to have it in spapr
then->nested then. Also, do we even need to have a limit?

> +}
> +
> +nguests = g_hash_table_size(spapr->nested.guests);
> +
> +if (nguests == spapr->nested.lpid_max) {
> +return H_NO_MEM;
> +}
> +
> +/* Lookup for available lpid */
> +for (lpid = 1; lpid < spapr->nested.lpid_max; lpid++) {

PAPR API calls it "guest ID" I think. Should change all references to
lpid to that.

> +if (!(g_hash_table_lookup(spapr->nested.guests,
> +  GINT_TO_POINTER(lpid)))) {
> +break;
> +}
> +}
> +if (lpid == spapr->nested.lpid_max) {
> +return H_NO_MEM;
> +}
> +
> +guest = g_try_new0(struct SpaprMachineStateNestedGuest, 1);
> +if (!guest) {
> +return H_NO_MEM;
> +}
> +
> +guest->pvr_logical = spapr->nested.pvr_base;
> +
> +g_hash_table_insert(spapr->nested.guests, GINT_TO_POINTER(lpid), guest);
> +printf("%s: lpid: %lu (MAX: %i)\n", __func__, lpid, 
> spapr->nested.lpid_max);

Remove printf.

> +
> +env->gpr[4] = lpid;
> +return H_SUCCESS;
> +}
> +
>  void spapr_register_nested(void)
>  {
>  spapr_register_hypercall(KVMPPC_H_SET_PARTITION_TABLE, h_set_ptbl);
> @@ -456,6 +530,7 @@ void spapr_register_nested_phyp(void)
>  {
>  spapr_register_hypercall(H_GUEST_GET_CAPABILITIES, 
> h_guest_get_capabilities);
>  spapr_register_hypercall(H_GUEST_SET_CAPABILITIES, 
> h_guest_set_capabilities);
> +spapr_register_hypercall(H_GUEST_CREATE  , h_guest_create);
>  }
>  
>  #else
> diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
> index a7996251cb..7841027df8 100644
> --- a/include/hw/ppc/spapr_nested.h
> +++ b/include/hw/ppc/spapr_nested.h
> @@ -197,6 +197,9 @@
>  #define H_GUEST_CAP_P9_MODE_BMAP1
>  #define H_GUEST_CAP_P10_MODE_BMAP   2
>  
> +/* Nested PAPR API macros */
> +#define NESTED_GUEST_MAX 4096

Prefix with PAPR_?

Thanks,
Nick

> +
>  typedef struct SpaprMachineStateNestedGuest {
>  unsigned long vcpus;
>  struct SpaprMachineStateNestedGuestVcpu *vcpu;




RE: [PATCH v1 21/22] vfio/pci: Allow the selection of a given iommu backend

2023-09-06 Thread Duan, Zhenzhong



>-Original Message-
>From: Jason Gunthorpe 
>Sent: Thursday, September 7, 2023 9:11 AM
>To: Alex Williamson 
>Subject: Re: [PATCH v1 21/22] vfio/pci: Allow the selection of a given iommu
>backend
>
>On Wed, Sep 06, 2023 at 01:09:26PM -0600, Alex Williamson wrote:
>> On Wed, 6 Sep 2023 15:10:39 -0300
>> Jason Gunthorpe  wrote:
>>
>> > On Wed, Aug 30, 2023 at 06:37:53PM +0800, Zhenzhong Duan wrote:
>> > > Note the /dev/iommu device may have been pre-opened by a
>> > > management tool such as libvirt. This mode is no more considered
>> > > for the legacy backend. So let's remove the "TODO" comment.
>> >
>> > Can you show an example of that syntax too?
>>
>> Unless you're just looking for something in the commit log,
>
>Yeah, I was thinking the commit log
>
>> patch 16/ added the following to the qemu help output:
>>
>> +#ifdef CONFIG_IOMMUFD
>> +``-object iommufd,id=id[,fd=fd]``
>> +Creates an iommufd backend which allows control of DMA mapping
>> +through the /dev/iommu device.
>> +
>> +The ``id`` parameter is a unique ID which frontends (such as
>> +vfio-pci of vdpa) will use to connect withe the iommufd backend.
>> +
>> +The ``fd`` parameter is an optional pre-opened file descriptor
>> +resulting from /dev/iommu opening. Usually the iommufd is shared
> +across all subsystems, bringing the benefit of centralized
>> +reference counting.
>> +#endif

Thanks for point out this issue.
I can think of two choices:
1. squash this patch into PATCH16
2. keep this patch separate and pull the fd-passing related changes from PATCH16
into this one
Please kindly suggest which way is preferred by the community.

Btw: I only enable fd passing for the vfio-pci device; let me know if it's
preferred to include all other vfio devices in this series, and I'll add them.

>>
>> > Also, the vfio device should be openable externally as well
>>
>> Appears to be added in the very next patch in the series.  Thanks,
>
>Indeed, I got confused because this removed the TODO - that could
>reasonably be pushed to the next patch and include a bit more detail
>in the commit message

Good idea, will fix.

Thanks
Zhenzhong



Re: [PATCH] docs/devel: Add cross-compiling doc

2023-09-06 Thread Alistair Francis
On Wed, Jul 26, 2023 at 10:08 PM Andrew Jones  wrote:
>
> Add instructions for how to cross-compile QEMU for RISC-V. The
> file is named generically because there's no reason not to collect
> other architectures steps into the same file, especially because
> several subsections like those for cross-compiling QEMU dependencies
> using meson and a cross-file could be shared. Additionally, other
> approaches to creating sysroots, such as with debootstrap, may be
> documented in this file in the future.
>
> Signed-off-by: Andrew Jones 

Acked-by: Alistair Francis 

Alistair

> ---
>  docs/devel/cross-compiling.rst | 221 +
>  1 file changed, 221 insertions(+)
>  create mode 100644 docs/devel/cross-compiling.rst
>
> diff --git a/docs/devel/cross-compiling.rst b/docs/devel/cross-compiling.rst
> new file mode 100644
> index ..1b988ba54e4c
> --- /dev/null
> +++ b/docs/devel/cross-compiling.rst
> @@ -0,0 +1,221 @@
> +.. SPDX-License-Identifier: GPL-2.0-or-later
> +
> +====================
> +Cross-compiling QEMU
> +====================
> +
> +Cross-compiling QEMU first requires the preparation of a cross-toolchain
> +and the cross-compiling of QEMU's dependencies. While the steps will be
> +similar across architectures, each architecture will have its own specific
> +recommendations. This document collects architecture-specific procedures
> +and hints that may be used to cross-compile QEMU, where typically the host
> +environment is x86.
> +
> +RISC-V
> +======
> +
> +Toolchain
> +---------
> +
> +Select a root directory for the cross environment
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +Export an environment variable pointing to a root directory
> +for the cross environment. For example, ::
> +
> +  $ export PREFIX="$HOME/opt/riscv"
> +
> +Create a work directory
> +^^^^^^^^^^^^^^^^^^^^^^^
> +
> +Tools and several components will need to be downloaded and built. Create
> +a directory for all the work, ::
> +
> +  $ export WORK_DIR="$HOME/work/xqemu"
> +  $ mkdir -p "$WORK_DIR"
> +
> +Select and prepare the toolchain
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +Select a toolchain such as [riscv-toolchain]_ and follow its instructions
> +for building and installing it to ``$PREFIX``, e.g. ::
> +
> +  $ cd "$WORK_DIR"
> +  $ git clone https://github.com/riscv/riscv-gnu-toolchain
> +  $ cd riscv-gnu-toolchain
> +  $ ./configure --prefix="$PREFIX"
> +  $ make -j$(nproc) linux
> +
> +Set the ``$CROSS_COMPILE`` environment variable to the prefix of the cross
> +tools and add the tools to ``$PATH``, ::
> +
> +$ export CROSS_COMPILE=riscv64-unknown-linux-gnu-
> +$ export PATH="$PREFIX/bin:$PATH"
> +
> +Also set ``$SYSROOT``, where all QEMU cross-compiled dependencies will be
> +installed. The toolchain installation likely created a 'sysroot' directory
> +at ``$PREFIX/sysroot``, which is the default location for most cross
> +tools, making it a good location, ::
> +
> +  $ mkdir -p "$PREFIX/sysroot"
> +  $ export SYSROOT="$PREFIX/sysroot"
> +
> +Create a pkg-config wrapper
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +The build processes of QEMU and some of its dependencies depend on
> +pkg-config. Create a wrapper script for it which works for the cross
> +environment: ::
> +
> +  $ cat <<EOF >"$PREFIX/bin/${CROSS_COMPILE}pkg-config"
> +  #!/bin/sh
> +
> +  [ "\$SYSROOT" ] || exit 1
> +
> +  export PKG_CONFIG_PATH=
> +  export PKG_CONFIG_LIBDIR="\${SYSROOT}/usr/lib/pkgconfig:\${SYSROOT}/usr/lib64/pkgconfig:\${SYSROOT}/usr/share/pkgconfig"
> +
> +  exec pkg-config "\$@"
> +  EOF
> +  $ chmod +x "$PREFIX/bin/${CROSS_COMPILE}pkg-config"
> +
> +Create a cross-file for meson builds
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +meson setup, used by some of QEMU's dependencies, needs a "cross-file" to
> +configure the cross environment. Create one, ::
> +
> +  $ cd "$WORK_DIR"
> +  $ cat <<EOF >cross_file.txt
> +  [host_machine]
> +  system = 'linux'
> +  cpu_family = 'riscv64'
> +  cpu = 'riscv64'
> +  endian = 'little'
> +
> +  [binaries]
> +  c = '${CROSS_COMPILE}gcc'
> +  cpp = '${CROSS_COMPILE}g++'
> +  ar = '${CROSS_COMPILE}ar'
> +  ld = '${CROSS_COMPILE}ld'
> +  objcopy = '${CROSS_COMPILE}objcopy'
> +  strip = '${CROSS_COMPILE}strip'
> +  pkgconfig = '${CROSS_COMPILE}pkg-config'
> +  EOF
> +
> +Cross-compile dependencies
> +--------------------------
> +
> +glibc
> +^^^^^
> +
> +If [riscv-toolchain]_ was selected for the toolchain then this step is
> +already complete and glibc has already been installed into ``$SYSROOT``.
> +Otherwise, cross-compile glibc and install it to ``$SYSROOT``.
> +
> +libffi
> +^^^^^^
> +
> +::
> +
> +  $ cd "$WORK_DIR"
> +  $ git clone https://gitlab.freedesktop.org/gstreamer/meson-ports/libffi.git
> +  $ cd libffi
> +  $ meson setup --cross-file ../cross_file.txt --prefix="$SYSROOT/usr" _build
> +  $ ninja -C _build
> +  $ ninja -C _build install
> +
> +*Building libffi seperately avoids a compilation error generated when
> +building it as a 

Re: [PATCH RESEND 07/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_SET_CAPABILITIES

2023-09-06 Thread Nicholas Piggin
On Wed Sep 6, 2023 at 2:33 PM AEST, Harsh Prateek Bora wrote:
> This patch implements nested PAPR hcall H_GUEST_SET_CAPABILITIES.
> This is used by L1 to set capabilities of the nested guest being
> created. The capabilities being set are subset of the capabilities
> returned from the previous call to H_GUEST_GET_CAPABILITIES hcall.
> Currently, it only supports P9/P10 capability check through PVR.
>
> Signed-off-by: Michael Neuling 
> Signed-off-by: Harsh Prateek Bora 
> ---
>  hw/ppc/spapr.c|  1 +
>  hw/ppc/spapr_nested.c | 46 +++
>  include/hw/ppc/spapr_nested.h |  3 +++
>  3 files changed, 50 insertions(+)
>
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index cbab7a825f..7c6f6ee25d 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3443,6 +3443,7 @@ static void spapr_instance_init(Object *obj)
>  "Host serial number to advertise in guest device tree");
>  /* Nested */
>  spapr->nested.api = 0;
> +spapr->nested.capabilities_set = false;

I would actually think about moving spapr->nested init into
spapr_nested.c.

>  }
>  
>  static void spapr_machine_finalizefn(Object *obj)
> diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
> index 37f3a49be2..9af65f257f 100644
> --- a/hw/ppc/spapr_nested.c
> +++ b/hw/ppc/spapr_nested.c
> @@ -399,6 +399,51 @@ static target_ulong h_guest_get_capabilities(PowerPCCPU *cpu,
>  return H_SUCCESS;
>  }
>  
> +static target_ulong h_guest_set_capabilities(PowerPCCPU *cpu,
> + SpaprMachineState *spapr,
> + target_ulong opcode,
> +  target_ulong *args)
> +{
> +CPUPPCState *env = &cpu->env;
> +target_ulong flags = args[0];
> +target_ulong capabilities = args[1];
> +
> +if (flags) { /* don't handle any flags capabilities for now */
> +return H_PARAMETER;
> +}
> +
> +

May need to do a pass over whitespace.

> +/* isn't supported */
> +if (capabilities & H_GUEST_CAPABILITIES_COPY_MEM) {
> +env->gpr[4] = 0;
> +return H_P2;
> +}
> +
> +if ((env->spr[SPR_PVR] & CPU_POWERPC_POWER_SERVER_MASK) ==
> +(CPU_POWERPC_POWER9_BASE)) {
> +/* We are a P9 */
> +if (!(capabilities & H_GUEST_CAPABILITIES_P9_MODE)) {
> +env->gpr[4] = 1;
> +return H_P2;
> +}
> +}
> +
> +if ((env->spr[SPR_PVR] & CPU_POWERPC_POWER_SERVER_MASK) ==
> +(CPU_POWERPC_POWER10_BASE)) {
> +/* We are a P10 */

The 2 comments above aren't helpful. Just remove them.

> +if (!(capabilities & H_GUEST_CAPABILITIES_P10_MODE)) {
> +env->gpr[4] = 2;
> +return H_P2;
> +}
> +}
> +
> +spapr->nested.capabilities_set = true;

Is it okay to set twice? If not, add a check. If yes, remove
capabilities_set until it's needed.
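
If the answer is "reject the second call", the guard is tiny. A standalone
sketch of that option (illustrative names and return codes, not the actual
hcall plumbing):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical return codes standing in for H_SUCCESS / H_STATE. */
enum { RC_SUCCESS = 0, RC_STATE = -1 };

struct nested_state {
    bool capabilities_set;
    uint64_t caps;
};

/* Reject a second H_GUEST_SET_CAPABILITIES-style call outright. */
int set_capabilities_once(struct nested_state *n, uint64_t caps)
{
    if (n->capabilities_set) {
        return RC_STATE;    /* capabilities already negotiated */
    }
    n->caps = caps;
    n->capabilities_set = true;
    return RC_SUCCESS;
}
```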

> +
> +spapr->nested.pvr_base = env->spr[SPR_PVR];
> +
> +return H_SUCCESS;
> +}
> +
>  void spapr_register_nested(void)
>  {
>  spapr_register_hypercall(KVMPPC_H_SET_PARTITION_TABLE, h_set_ptbl);
> @@ -410,6 +455,7 @@ void spapr_register_nested(void)
>  void spapr_register_nested_phyp(void)
>  {
>  spapr_register_hypercall(H_GUEST_GET_CAPABILITIES, h_guest_get_capabilities);
> +spapr_register_hypercall(H_GUEST_SET_CAPABILITIES, h_guest_set_capabilities);
>  }
>  
>  #else
> diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
> index ce198e9f70..a7996251cb 100644
> --- a/include/hw/ppc/spapr_nested.h
> +++ b/include/hw/ppc/spapr_nested.h
> @@ -193,6 +193,9 @@
> +#define H_GUEST_CAPABILITIES_COPY_MEM 0x8000000000000000
> +#define H_GUEST_CAPABILITIES_P9_MODE  0x4000000000000000
> +#define H_GUEST_CAPABILITIES_P10_MODE 0x2000000000000000
> +#define H_GUEST_CAP_COPY_MEM_BMAP   0
> +#define H_GUEST_CAP_P9_MODE_BMAP    1
> +#define H_GUEST_CAP_P10_MODE_BMAP   2
>  
>  typedef struct SpaprMachineStateNestedGuest {
>  unsigned long vcpus;




Re: [PATCH RESEND 06/15] ppc: spapr: Implement nested PAPR hcall - H_GUEST_GET_CAPABILITIES

2023-09-06 Thread Nicholas Piggin
On Wed Sep 6, 2023 at 2:33 PM AEST, Harsh Prateek Bora wrote:
> This patch implements nested PAPR hcall H_GUEST_GET_CAPABILITIES and
> also enables registration of nested PAPR hcalls whenever an L0 is
> launched with cap-nested-papr=true. The common registration routine
> shall be used by future patches for registration of related hcall
> support
> being added. This hcall is used by L1 kernel to get the set of guest
> capabilities that are supported by L0 (Qemu TCG).

Changelog can drop "This patch". Probably don't have to be so
detailed here either -- we already established that PAPR hcalls can
be used with cap-nested-papr in the last patch, we know that L1
kernels make the hcalls to the vhyp, etc.

"Introduce the nested PAPR hcall H_GUEST_GET_CAPABILITIES which
is used to query the capabilities of the API and the L2 guests
it provides."

I would squash this with set.

>
> Signed-off-by: Michael Neuling 
> Signed-off-by: Harsh Prateek Bora 
> ---
>  hw/ppc/spapr_caps.c   |  1 +
>  hw/ppc/spapr_nested.c | 35 +++
>  include/hw/ppc/spapr_nested.h |  6 ++
>  3 files changed, 42 insertions(+)
>
> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
> index d3b9f107aa..cbe53a79ec 100644
> --- a/hw/ppc/spapr_caps.c
> +++ b/hw/ppc/spapr_caps.c
> @@ -511,6 +511,7 @@ static void cap_nested_papr_apply(SpaprMachineState *spapr,
>  return;
>  }
>  spapr->nested.api = NESTED_API_PAPR;
> +spapr_register_nested_phyp();
>  } else if (kvm_enabled()) {
>  /*
>   * this gets executed in L1 qemu when L2 is launched,

Hmm, this doesn't match nested HV registration. If you want to register
the hcalls in the cap apply, can you move spapr_register_nested()
there first? It may make more sense to go in as a dummy function with
the cap patch first, since you don't introduce all hcalls together.

Also phyp->papr. Scrub for phyp please.

> diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
> index a669470f1a..37f3a49be2 100644
> --- a/hw/ppc/spapr_nested.c
> +++ b/hw/ppc/spapr_nested.c
> @@ -6,6 +6,7 @@
>  #include "hw/ppc/spapr.h"
>  #include "hw/ppc/spapr_cpu_core.h"
>  #include "hw/ppc/spapr_nested.h"
> +#include "cpu-models.h"
>  
>  #ifdef CONFIG_TCG
>  #define PRTS_MASK  0x1f
> @@ -375,6 +376,29 @@ void spapr_exit_nested(PowerPCCPU *cpu, int excp)
>  address_space_unmap(CPU(cpu)->as, regs, len, len, true);
>  }
>  
> +static target_ulong h_guest_get_capabilities(PowerPCCPU *cpu,
> + SpaprMachineState *spapr,
> + target_ulong opcode,
> + target_ulong *args)
> +{
> +CPUPPCState *env = &cpu->env;
> +target_ulong flags = args[0];
> +
> +if (flags) { /* don't handle any flags capabilities for now */
> +return H_PARAMETER;
> +}
> +
> +if ((env->spr[SPR_PVR] & CPU_POWERPC_POWER_SERVER_MASK) ==
> +(CPU_POWERPC_POWER9_BASE))
> +env->gpr[4] = H_GUEST_CAPABILITIES_P9_MODE;
> +
> +if ((env->spr[SPR_PVR] & CPU_POWERPC_POWER_SERVER_MASK) ==
> +(CPU_POWERPC_POWER10_BASE))
> +env->gpr[4] = H_GUEST_CAPABILITIES_P10_MODE;
> +
> +return H_SUCCESS;
> +}
> +
>  void spapr_register_nested(void)
>  {
>  spapr_register_hypercall(KVMPPC_H_SET_PARTITION_TABLE, h_set_ptbl);
> @@ -382,6 +406,12 @@ void spapr_register_nested(void)
>  spapr_register_hypercall(KVMPPC_H_TLB_INVALIDATE, h_tlb_invalidate);
>  spapr_register_hypercall(KVMPPC_H_COPY_TOFROM_GUEST, h_copy_tofrom_guest);
>  }
> +
> +void spapr_register_nested_phyp(void)
> +{
> +spapr_register_hypercall(H_GUEST_GET_CAPABILITIES, h_guest_get_capabilities);
> +}
> +
>  #else
>  void spapr_exit_nested(PowerPCCPU *cpu, int excp)
>  {
> @@ -392,4 +422,9 @@ void spapr_register_nested(void)
>  {
>  /* DO NOTHING */
>  }
> +
> +void spapr_register_nested_phyp(void)
> +{
> +/* DO NOTHING */
> +}
>  #endif
> diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
> index f8db31075b..ce198e9f70 100644
> --- a/include/hw/ppc/spapr_nested.h
> +++ b/include/hw/ppc/spapr_nested.h
> @@ -189,6 +189,11 @@
>  /* End of list of Guest State Buffer Element IDs */
>  #define GSB_LAST    GSB_VCPU_SPR_ASDR
>  
> +/* Bit masks to be used in nested PAPR API */
> +#define H_GUEST_CAPABILITIES_COPY_MEM 0x8000000000000000
> +#define H_GUEST_CAPABILITIES_P9_MODE  0x4000000000000000
> +#define H_GUEST_CAPABILITIES_P10_MODE 0x2000000000000000

See introducing these defines with the patch that uses them isn't so
bad :)

Thanks,
Nick

> +
>  typedef struct SpaprMachineStateNestedGuest {
>  unsigned long vcpus;
>  struct SpaprMachineStateNestedGuestVcpu *vcpu;
> @@ -331,6 +336,7 @@ struct nested_ppc_state {
>  };
>  
>  void spapr_register_nested(void);
> +void spapr_register_nested_phyp(void);
>  void spapr_exit_nested(PowerPCCPU *cpu, int excp);
>  

Re: [PATCH RESEND 05/15] ppc: spapr: Introduce cap-nested-papr for nested PAPR API

2023-09-06 Thread Nicholas Piggin
On Wed Sep 6, 2023 at 2:33 PM AEST, Harsh Prateek Bora wrote:
> This patch introduces a new cmd line option cap-nested-papr to enable
> support for nested PAPR API by setting the nested.api version accordingly.
> It requires the user to launch the L0 Qemu in TCG mode and then L1 Linux
> can then launch the nested guest in KVM mode. Unlike cap-nested-hv,
> this is meant for nested guest on pseries (PowerVM) where L0 retains
> whole state of the nested guest. Both APIs are thus mutually exclusive.
> Support for related hcalls is being added in next set of patches.

Oh, this should be about the final patch too, when you have built
the code to actually support said capability.

Thanks,
Nick

>
> Signed-off-by: Michael Neuling 
> Signed-off-by: Harsh Prateek Bora 
> ---
>  hw/ppc/spapr.c |  2 ++
>  hw/ppc/spapr_caps.c| 48 ++
>  include/hw/ppc/spapr.h |  5 -
>  3 files changed, 54 insertions(+), 1 deletion(-)
>
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 0aa9f21516..cbab7a825f 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2092,6 +2092,7 @@ static const VMStateDescription vmstate_spapr = {
>  &vmstate_spapr_cap_fwnmi,
>  &vmstate_spapr_fwnmi,
>  &vmstate_spapr_cap_rpt_invalidate,
> +&vmstate_spapr_cap_nested_papr,
>  NULL
>  }
>  };
> @@ -4685,6 +4686,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>  smc->default_caps.caps[SPAPR_CAP_IBS] = SPAPR_CAP_WORKAROUND;
>  smc->default_caps.caps[SPAPR_CAP_HPT_MAXPAGESIZE] = 16; /* 64kiB */
>  smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
> +smc->default_caps.caps[SPAPR_CAP_NESTED_PAPR] = SPAPR_CAP_OFF;
>  smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
>  smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_ON;
>  smc->default_caps.caps[SPAPR_CAP_FWNMI] = SPAPR_CAP_ON;
> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
> index a3a790b026..d3b9f107aa 100644
> --- a/hw/ppc/spapr_caps.c
> +++ b/hw/ppc/spapr_caps.c
> @@ -491,6 +491,44 @@ static void cap_nested_kvm_hv_apply(SpaprMachineState *spapr,
>  }
>  }
>  
> +static void cap_nested_papr_apply(SpaprMachineState *spapr,
> +uint8_t val, Error **errp)
> +{
> +ERRP_GUARD();
> +PowerPCCPU *cpu = POWERPC_CPU(first_cpu);
> +CPUPPCState *env = >env;
> +
> +if (!val) {
> +/* capability disabled by default */
> +return;
> +}
> +
> +if (tcg_enabled()) {
> +if (!(env->insns_flags2 & PPC2_ISA300)) {
> +error_setg(errp, "Nested-PAPR only supported on POWER9 and later");
> +error_append_hint(errp,
> +  "Try appending -machine cap-nested-papr=off\n");
> +return;
> +}
> +spapr->nested.api = NESTED_API_PAPR;
> +} else if (kvm_enabled()) {
> +/*
> + * this gets executed in L1 qemu when L2 is launched,
> + * needs kvm-hv support in L1 kernel.
> + */
> +if (!kvmppc_has_cap_nested_kvm_hv()) {
> +error_setg(errp,
> +   "KVM implementation does not support Nested-HV");
> +error_append_hint(errp,
> +  "Try appending -machine cap-nested-hv=off\n");
> +} else if (kvmppc_set_cap_nested_kvm_hv(val) < 0) {
> +error_setg(errp, "Error enabling cap-nested-hv with KVM");
> +error_append_hint(errp,
> +  "Try appending -machine cap-nested-hv=off\n");
> +}
> +}
> +}
> +
>  static void cap_large_decr_apply(SpaprMachineState *spapr,
>   uint8_t val, Error **errp)
>  {
> @@ -736,6 +774,15 @@ SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
>  .type = "bool",
>  .apply = cap_nested_kvm_hv_apply,
>  },
> +[SPAPR_CAP_NESTED_PAPR] = {
> +.name = "nested-papr",
> +.description = "Allow Nested PAPR (Phyp)",
> +.index = SPAPR_CAP_NESTED_PAPR,
> +.get = spapr_cap_get_bool,
> +.set = spapr_cap_set_bool,
> +.type = "bool",
> +.apply = cap_nested_papr_apply,
> +},
>  [SPAPR_CAP_LARGE_DECREMENTER] = {
>  .name = "large-decr",
>  .description = "Allow Large Decrementer",
> @@ -920,6 +967,7 @@ SPAPR_CAP_MIG_STATE(sbbc, SPAPR_CAP_SBBC);
>  SPAPR_CAP_MIG_STATE(ibs, SPAPR_CAP_IBS);
>  SPAPR_CAP_MIG_STATE(hpt_maxpagesize, SPAPR_CAP_HPT_MAXPAGESIZE);
>  SPAPR_CAP_MIG_STATE(nested_kvm_hv, SPAPR_CAP_NESTED_KVM_HV);
> +SPAPR_CAP_MIG_STATE(nested_papr, SPAPR_CAP_NESTED_PAPR);
>  SPAPR_CAP_MIG_STATE(large_decr, SPAPR_CAP_LARGE_DECREMENTER);
>  SPAPR_CAP_MIG_STATE(ccf_assist, SPAPR_CAP_CCF_ASSIST);
>  SPAPR_CAP_MIG_STATE(fwnmi, SPAPR_CAP_FWNMI);
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index c8b42af430..8a6e9ce929 100644
> --- a/include/hw/ppc/spapr.h
> +++ 

Re: [PATCH RESEND 05/15] ppc: spapr: Introduce cap-nested-papr for nested PAPR API

2023-09-06 Thread Nicholas Piggin
On Wed Sep 6, 2023 at 2:33 PM AEST, Harsh Prateek Bora wrote:
> This patch introduces a new cmd line option cap-nested-papr to enable
> support for nested PAPR API by setting the nested.api version accordingly.
> It requires the user to launch the L0 Qemu in TCG mode and then L1 Linux
> can then launch the nested guest in KVM mode. Unlike cap-nested-hv,
> this is meant for nested guest on pseries (PowerVM) where L0 retains
> whole state of the nested guest. Both APIs are thus mutually exclusive.
> Support for related hcalls is being added in next set of patches.

This changelog could use some work too.

"Introduce a SPAPR capability cap-nested-papr which provides a nested
HV facility to the guest. This is similar to cap-nested-hv, but uses
a different (incompatible) API and so they are mutually exclusive."

You could add some documentation to say recent Linux pseries guests
support both, and explain more about KVM and PowerVM support there too,
if it is relevant.

>
> Signed-off-by: Michael Neuling 
> Signed-off-by: Harsh Prateek Bora 
> ---
>  hw/ppc/spapr.c |  2 ++
>  hw/ppc/spapr_caps.c| 48 ++
>  include/hw/ppc/spapr.h |  5 -
>  3 files changed, 54 insertions(+), 1 deletion(-)
>
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 0aa9f21516..cbab7a825f 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2092,6 +2092,7 @@ static const VMStateDescription vmstate_spapr = {
>  &vmstate_spapr_cap_fwnmi,
>  &vmstate_spapr_fwnmi,
>  &vmstate_spapr_cap_rpt_invalidate,
> +&vmstate_spapr_cap_nested_papr,
>  NULL
>  }
>  };
> @@ -4685,6 +4686,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>  smc->default_caps.caps[SPAPR_CAP_IBS] = SPAPR_CAP_WORKAROUND;
>  smc->default_caps.caps[SPAPR_CAP_HPT_MAXPAGESIZE] = 16; /* 64kiB */
>  smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
> +smc->default_caps.caps[SPAPR_CAP_NESTED_PAPR] = SPAPR_CAP_OFF;
>  smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
>  smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_ON;
>  smc->default_caps.caps[SPAPR_CAP_FWNMI] = SPAPR_CAP_ON;
> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
> index a3a790b026..d3b9f107aa 100644
> --- a/hw/ppc/spapr_caps.c
> +++ b/hw/ppc/spapr_caps.c
> @@ -491,6 +491,44 @@ static void cap_nested_kvm_hv_apply(SpaprMachineState *spapr,
>  }
>  }
>  
> +static void cap_nested_papr_apply(SpaprMachineState *spapr,
> +uint8_t val, Error **errp)
> +{
> +ERRP_GUARD();
> +PowerPCCPU *cpu = POWERPC_CPU(first_cpu);
> +CPUPPCState *env = &cpu->env;
> +
> +if (!val) {
> +/* capability disabled by default */
> +return;
> +}
> +
> +if (tcg_enabled()) {
> +if (!(env->insns_flags2 & PPC2_ISA300)) {
> +error_setg(errp, "Nested-PAPR only supported on POWER9 and later");
> +error_append_hint(errp,
> +  "Try appending -machine cap-nested-papr=off\n");
> +return;
> +}
> +spapr->nested.api = NESTED_API_PAPR;

I'm not seeing any mutual exclusion with the other cap here. What if
you enable them both? Lucky dip?

It would actually be nice to enable both even if you just choose the
mode after the first hcall is made. I think you could actually support
both (even concurrently) quite easily.

For now this is probably okay if you fix mutex.
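
The mutual exclusion being asked for is a one-line check. A standalone sketch
(API constants mirror the series; the function name and error convention are
illustrative, not the actual cap-apply code):

```c
enum nested_api {
    NESTED_API_NONE   = 0,
    NESTED_API_KVM_HV = 1,
    NESTED_API_PAPR   = 2,
};

/* Allow selecting an API only when none (or the same one) is active. */
int select_nested_api(enum nested_api *current, enum nested_api requested)
{
    if (*current != NESTED_API_NONE && *current != requested) {
        return -1;   /* cap-nested-hv and cap-nested-papr are mutually exclusive */
    }
    *current = requested;
    return 0;
}
```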


> +} else if (kvm_enabled()) {
> +/*
> + * this gets executed in L1 qemu when L2 is launched,
> + * needs kvm-hv support in L1 kernel.
> + */
> +if (!kvmppc_has_cap_nested_kvm_hv()) {
> +error_setg(errp,
> +   "KVM implementation does not support Nested-HV");
> +error_append_hint(errp,
> +  "Try appending -machine cap-nested-hv=off\n");
> +} else if (kvmppc_set_cap_nested_kvm_hv(val) < 0) {
> +error_setg(errp, "Error enabling cap-nested-hv with KVM");
> +error_append_hint(errp,
> +  "Try appending -machine cap-nested-hv=off\n");
> +}

This is just copy and pasted from the other cap, isn't it?

> +}
> +}
> +
>  static void cap_large_decr_apply(SpaprMachineState *spapr,
>   uint8_t val, Error **errp)
>  {
> @@ -736,6 +774,15 @@ SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
>  .type = "bool",
>  .apply = cap_nested_kvm_hv_apply,
>  },
> +[SPAPR_CAP_NESTED_PAPR] = {
> +.name = "nested-papr",
> +.description = "Allow Nested PAPR (Phyp)",
> +.index = SPAPR_CAP_NESTED_PAPR,
> +.get = spapr_cap_get_bool,
> +.set = spapr_cap_set_bool,
> +.type = "bool",
> +.apply = cap_nested_papr_apply,
> +},

Should scrub "Phyp". "Phyp" and PowerVM also doesn't mean anything for

Re: [PATCH RESEND 04/15] ppc: spapr: Start using nested.api for nested kvm-hv api

2023-09-06 Thread Nicholas Piggin
On Wed Sep 6, 2023 at 2:33 PM AEST, Harsh Prateek Bora wrote:
> With this patch, isolating kvm-hv nested api code to be executed only
> when cap-nested-hv is set. This helps keeping api specific logic
> mutually exclusive.

Changelog needs a bit of improvement. Emphasis on "why" for changelogs.
If you take a changeset that makes a single logical change to the code,
you should be able to understand why that is done. You could make some
assumptions about the bigger series when it comes to details so don't
have to explain from first principles. But if it's easy to explain why
the high level, you could.

Why are we adding this fundamentally? So that the spapr nested code can
be extended to support a second API.

This patch should add the api field to the struct, and also the
NESTED_API_KVM_HV definition.

Thanks,
Nick

>
> Signed-off-by: Michael Neuling 
> Signed-off-by: Harsh Prateek Bora 
> ---
>  hw/ppc/spapr.c  | 7 ++-
>  hw/ppc/spapr_caps.c | 1 +
>  2 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index e44686b04d..0aa9f21516 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1334,8 +1334,11 @@ static bool spapr_get_pate(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu,
>  /* Copy PATE1:GR into PATE0:HR */
>  entry->dw0 = spapr->patb_entry & PATE0_HR;
>  entry->dw1 = spapr->patb_entry;
> +return true;
> +}
> +assert(spapr->nested.api);
>  
> -} else {
> +if (spapr->nested.api == NESTED_API_KVM_HV) {
>  uint64_t patb, pats;
>  
>  assert(lpid != 0);
> @@ -3437,6 +3440,8 @@ static void spapr_instance_init(Object *obj)
>  spapr_get_host_serial, spapr_set_host_serial);
>  object_property_set_description(obj, "host-serial",
>  "Host serial number to advertise in guest device tree");
> +/* Nested */
> +spapr->nested.api = 0;
>  }
>  
>  static void spapr_machine_finalizefn(Object *obj)
> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
> index 5a0755d34f..a3a790b026 100644
> --- a/hw/ppc/spapr_caps.c
> +++ b/hw/ppc/spapr_caps.c
> @@ -454,6 +454,7 @@ static void cap_nested_kvm_hv_apply(SpaprMachineState *spapr,
>  return;
>  }
>  
> +spapr->nested.api = NESTED_API_KVM_HV;
>  if (kvm_enabled()) {
>  if (!ppc_check_compat(cpu, CPU_POWERPC_LOGICAL_3_00, 0,
>spapr->max_compat_pvr)) {




Re: [PATCH RESEND 03/15] ppc: spapr: Use SpaprMachineStateNested's ptcr instead of nested_ptcr

2023-09-06 Thread Nicholas Piggin
On Wed Sep 6, 2023 at 2:33 PM AEST, Harsh Prateek Bora wrote:
> Use nested guest state specific struct for storing related info.

So this is the patch I would introduce the SpaprMachineStateNested
struct, with just the .ptrc member. Add other members to it as they
are used in later patches.

>
> Signed-off-by: Michael Neuling 
> Signed-off-by: Harsh Prateek Bora 
> ---
>  hw/ppc/spapr.c | 4 ++--
>  hw/ppc/spapr_nested.c  | 4 ++--
>  include/hw/ppc/spapr.h | 3 ++-
>  3 files changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 07e91e3800..e44686b04d 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1340,8 +1340,8 @@ static bool spapr_get_pate(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu,
>  
>  assert(lpid != 0);
>  
> -patb = spapr->nested_ptcr & PTCR_PATB;
> -pats = spapr->nested_ptcr & PTCR_PATS;
> +patb = spapr->nested.ptcr & PTCR_PATB;
> +pats = spapr->nested.ptcr & PTCR_PATS;
>  
>  /* Check if partition table is properly aligned */
>  if (patb & MAKE_64BIT_MASK(0, pats + 12)) {

At this point I wonder if we should first move the nested part of
spapr_get_pate into nested code. It's a bit of a wart to have it
here when most of the other nested cases are abstracted from
non-nested code quite well.
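
One way to picture that refactoring is a thin dispatch layer, where the vhyp
callback only handles lpid 0 and defers everything else to nested code.
Function names and return values here are illustrative, not the actual QEMU
interfaces:

```c
#include <stdbool.h>
#include <stdint.h>

struct pate { uint64_t dw0, dw1; };

/* Illustrative per-API lookups; real code would read guest memory. */
static bool get_pate_own(struct pate *e)       { e->dw0 = 1; e->dw1 = 0; return true; }
static bool get_pate_nested_hv(struct pate *e) { e->dw0 = 2; e->dw1 = 0; return true; }

/* lpid == 0 selects the machine's own entry; otherwise defer to nested code. */
bool spapr_get_pate_sketch(uint64_t lpid, int nested_api, struct pate *e)
{
    if (lpid == 0) {
        return get_pate_own(e);
    }
    switch (nested_api) {
    case 1: /* NESTED_API_KVM_HV */
        return get_pate_nested_hv(e);
    default:
        return false;
    }
}
```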

> diff --git a/hw/ppc/spapr_nested.c b/hw/ppc/spapr_nested.c
> index 121aa96ddc..a669470f1a 100644
> --- a/hw/ppc/spapr_nested.c
> +++ b/hw/ppc/spapr_nested.c
> @@ -25,7 +25,7 @@ static target_ulong h_set_ptbl(PowerPCCPU *cpu,
>  return H_PARAMETER;
>  }
>  
> -spapr->nested_ptcr = ptcr; /* Save new partition table */
> +spapr->nested.ptcr = ptcr; /* Save new partition table */
>  
>  return H_SUCCESS;
>  }
> @@ -157,7 +157,7 @@ static target_ulong h_enter_nested(PowerPCCPU *cpu,
>  struct kvmppc_pt_regs *regs;
>  hwaddr len;
>  
> -if (spapr->nested_ptcr == 0) {
> +if (spapr->nested.ptcr == 0) {
>  return H_NOT_AVAILABLE;
>  }
>  
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 3990fed1d9..c8b42af430 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -12,6 +12,7 @@
>  #include "hw/ppc/spapr_xive.h"  /* For SpaprXive */
>  #include "hw/ppc/xics.h"/* For ICSState */
>  #include "hw/ppc/spapr_tpm_proxy.h"
> +#include "hw/ppc/spapr_nested.h" /* for SpaprMachineStateNested */
>  
>  struct SpaprVioBus;
>  struct SpaprPhbState;
> @@ -216,7 +217,7 @@ struct SpaprMachineState {
>  uint32_t vsmt;   /* Virtual SMT mode (KVM's "core stride") */
>  
>  /* Nested HV support (TCG only) */
> -uint64_t nested_ptcr;
> +struct SpaprMachineStateNested nested;

I think convention says to use the typedef for these?

Thanks,
Nick

>  
>  Notifier epow_notifier;
>  QTAILQ_HEAD(, SpaprEventLogEntry) pending_events;




Re: [PATCH v1 21/22] vfio/pci: Allow the selection of a given iommu backend

2023-09-06 Thread Jason Gunthorpe
On Wed, Sep 06, 2023 at 01:09:26PM -0600, Alex Williamson wrote:
> On Wed, 6 Sep 2023 15:10:39 -0300
> Jason Gunthorpe  wrote:
> 
> > On Wed, Aug 30, 2023 at 06:37:53PM +0800, Zhenzhong Duan wrote:
> > > Note the /dev/iommu device may have been pre-opened by a
> > > management tool such as libvirt. This mode is no more considered
> > > for the legacy backend. So let's remove the "TODO" comment.  
> > 
> > Can you show an example of that syntax too?
> 
> Unless you're just looking for something in the commit log, 

Yeah, I was thinking the commit log

> patch 16/ added the following to the qemu help output:
> 
> +#ifdef CONFIG_IOMMUFD
> +``-object iommufd,id=id[,fd=fd]``
> +Creates an iommufd backend which allows control of DMA mapping
> +through the /dev/iommu device.
> +
> +The ``id`` parameter is a unique ID which frontends (such as
> +vfio-pci of vdpa) will use to connect withe the iommufd backend.
> +
> +The ``fd`` parameter is an optional pre-opened file descriptor
> +resulting from /dev/iommu opening. Usually the iommufd is shared
> +accross all subsystems, bringing the benefit of centralized
> +reference counting.
> +#endif
>  
> > Also, the vfio device should be openable externally as well
> 
> Appears to be added in the very next patch in the series.  Thanks,

Indeed, I got confused because this removed the TODO - that could
reasonably be pushed to the next patch and include a bit more detail
in the commit message

Jason



Re: [PATCH RESEND 02/15] ppc: spapr: Add new/extend structs to support Nested PAPR API

2023-09-06 Thread Nicholas Piggin
On Wed Sep 6, 2023 at 2:33 PM AEST, Harsh Prateek Bora wrote:
> This patch introduces new data structures to be used with Nested PAPR
> API. Also extends kvmppc_hv_guest_state with additional set of registers
> supported with nested PAPR API.
>
> Signed-off-by: Michael Neuling 
> Signed-off-by: Shivaprasad G Bhat 
> Signed-off-by: Harsh Prateek Bora 
> ---
>  include/hw/ppc/spapr_nested.h | 48 +++
>  1 file changed, 48 insertions(+)
>
> diff --git a/include/hw/ppc/spapr_nested.h b/include/hw/ppc/spapr_nested.h
> index 5cb668dd53..f8db31075b 100644
> --- a/include/hw/ppc/spapr_nested.h
> +++ b/include/hw/ppc/spapr_nested.h
> @@ -189,6 +189,39 @@
>  /* End of list of Guest State Buffer Element IDs */
>  #define GSB_LAST    GSB_VCPU_SPR_ASDR
>  
> +typedef struct SpaprMachineStateNestedGuest {
> +unsigned long vcpus;
> +struct SpaprMachineStateNestedGuestVcpu *vcpu;
> +uint64_t parttbl[2];
> +uint32_t pvr_logical;
> +uint64_t tb_offset;
> +} SpaprMachineStateNestedGuest;
> +
> +struct SpaprMachineStateNested {
> +
> +uint8_t api;
> +#define NESTED_API_KVM_HV  1
> +#define NESTED_API_PAPR2
> +uint64_t ptcr;
> +uint32_t lpid_max;
> +uint32_t pvr_base;
> +bool capabilities_set;
> +GHashTable *guests;
> +};
> +
> +struct SpaprMachineStateNestedGuestVcpuRunBuf {
> +uint64_t addr;
> +uint64_t size;
> +};
> +
> +typedef struct SpaprMachineStateNestedGuestVcpu {
> +bool enabled;
> +struct SpaprMachineStateNestedGuestVcpuRunBuf runbufin;
> +struct SpaprMachineStateNestedGuestVcpuRunBuf runbufout;
> +CPUPPCState env;
> +int64_t tb_offset;
> +int64_t dec_expiry_tb;
> +} SpaprMachineStateNestedGuestVcpu;
>  
>  /*
>   * Register state for entering a nested guest with H_ENTER_NESTED.
> @@ -228,6 +261,21 @@ struct kvmppc_hv_guest_state {
>  uint64_t dawr1;
>  uint64_t dawrx1;
>  /* Version 2 ends here */
> +uint64_t dec;
> +uint64_t fscr;
> +uint64_t fpscr;
> +uint64_t bescr;
> +uint64_t ebbhr;
> +uint64_t ebbrr;
> +uint64_t tar;
> +uint64_t dexcr;
> +uint64_t hdexcr;
> +uint64_t hashkeyr;
> +uint64_t hashpkeyr;
> +uint64_t ctrl;
> +uint64_t vscr;
> +uint64_t vrsave;
> +ppc_vsr_t vsr[64];
>  };

Why? I can't see where it's used... This is API for the original HV
hcalls which is possibly now broken because the code uses sizeof()
when mapping it.

In general I'm not a fan of splitting patches by the type of code they
add. Definitions for external APIs okay. But for things like internal
structures I prefer added where they are introduced.

It's actually harder to review a patch if related / dependent changes
aren't in it, IMO. What should be split is unrelated or independent
changes and logical steps. Same goes for hcalls too actually. Take a
look at the series that introduced nested HV. 120f738a467 adds all the
hcalls, all the structures, etc. 

So I would also hink about squashing at least get/set capabilities
hcalls together, and guest create/delete, and probably vcpu create/run.

Thanks,
Nick



Re: [RFC 1/3] hmp: avoid the nested event loop in handle_hmp_command()

2023-09-06 Thread Dr. David Alan Gilbert
* Stefan Hajnoczi (stefa...@redhat.com) wrote:
> Coroutine HMP commands currently run to completion in a nested event
> loop with the Big QEMU Lock (BQL) held. The call_rcu thread also uses
> the BQL and cannot process work while the coroutine monitor command is
> running. A deadlock occurs when monitor commands attempt to wait for
> call_rcu work to finish.

I hate to think if there's anywhere else that ends up doing that
other than the monitors.

But, not knowing the semantics of the rcu code, it looks kind of OK to
me from the monitor.

(Do you ever get anything like qemu quitting from one of the other
monitors while this coroutine hasn't been run?)

Dave

> This patch refactors the HMP monitor to use the existing event loop
> instead of creating a nested event loop. This will allow the next
> patches to rely on draining call_rcu work.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  monitor/hmp.c | 28 +++-
>  1 file changed, 15 insertions(+), 13 deletions(-)
> 
> diff --git a/monitor/hmp.c b/monitor/hmp.c
> index 69c1b7e98a..6cff2810aa 100644
> --- a/monitor/hmp.c
> +++ b/monitor/hmp.c
> @@ -,15 +,17 @@ typedef struct HandleHmpCommandCo {
>  Monitor *mon;
>  const HMPCommand *cmd;
>  QDict *qdict;
> -bool done;
>  } HandleHmpCommandCo;
>  
> -static void handle_hmp_command_co(void *opaque)
> +static void coroutine_fn handle_hmp_command_co(void *opaque)
>  {
>  HandleHmpCommandCo *data = opaque;
> +
>  handle_hmp_command_exec(data->mon, data->cmd, data->qdict);
>  monitor_set_cur(qemu_coroutine_self(), NULL);
> -data->done = true;
> +qobject_unref(data->qdict);
> +monitor_resume(data->mon);
> +g_free(data);
>  }
>  
>  void handle_hmp_command(MonitorHMP *mon, const char *cmdline)
> @@ -1157,20 +1159,20 @@ void handle_hmp_command(MonitorHMP *mon, const char 
> *cmdline)
>  Monitor *old_mon = monitor_set_cur(qemu_coroutine_self(), 
> &mon->common);
>  handle_hmp_command_exec(&mon->common, cmd, qdict);
>  monitor_set_cur(qemu_coroutine_self(), old_mon);
> +qobject_unref(qdict);
>  } else {
> -HandleHmpCommandCo data = {
> -.mon = &mon->common,
> -.cmd = cmd,
> -.qdict = qdict,
> -.done = false,
> -};
> -Coroutine *co = qemu_coroutine_create(handle_hmp_command_co, &data);
> +HandleHmpCommandCo *data; /* freed by handle_hmp_command_co() */
> +
> +data = g_new(HandleHmpCommandCo, 1);
> +data->mon = &mon->common;
> +data->cmd = cmd;
> +data->qdict = qdict; /* freed by handle_hmp_command_co() */
> +
> +Coroutine *co = qemu_coroutine_create(handle_hmp_command_co, data);
> +monitor_suspend(&mon->common); /* resumed by handle_hmp_command_co() 
> */
> +monitor_set_cur(co, &mon->common);
>  aio_co_enter(qemu_get_aio_context(), co);
> -AIO_WAIT_WHILE_UNLOCKED(NULL, !data.done);
>  }
> -
> -qobject_unref(qdict);
>  }
>  
>  static void cmd_completion(MonitorHMP *mon, const char *name, const char 
> *list)
> -- 
> 2.41.0
> 
> 
-- 
 -Open up your eyes, open up your mind, open up your code ---   
/ Dr. David Alan Gilbert|   Running GNU/Linux   | Happy  \ 
\dave @ treblig.org |   | In Hex /
 \ _|_ http://www.treblig.org   |___/
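[Editorial note] The refactor discussed in this thread can be sketched, very loosely, as replacing "spin a nested event loop until a done flag flips" with "suspend, queue the work on the existing loop, and resume from the completion callback". The toy model below uses a hand-rolled work queue; none of the names correspond to real QEMU APIs, they only mirror the shape of the change.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*work_fn)(void *opaque);
typedef struct { work_fn fn; void *opaque; } Work;

/* Toy stand-in for the existing (outer) event loop's work queue. */
static Work queue[16];
static size_t q_head, q_tail;

static void queue_work(work_fn fn, void *opaque)
{
    queue[q_tail].fn = fn;
    queue[q_tail].opaque = opaque;
    q_tail++;
}

/* Run one pending item; returns 0 once the queue is drained. */
static int loop_run_once(void)
{
    if (q_head == q_tail) {
        return 0;
    }
    Work w = queue[q_head++];
    w.fn(w.opaque);
    return 1;
}

typedef struct { int suspended; int command_done; } ToyMonitor;

/* The "coroutine" body: does the work, then resumes the monitor. */
static void toy_command(void *opaque)
{
    ToyMonitor *mon = opaque;
    mon->command_done = 1;
    mon->suspended = 0;      /* plays the role of monitor_resume() */
}

/*
 * After the refactor: suspend the monitor, queue the command on the
 * existing loop, and return immediately -- no nested loop, and no
 * wait on a "done" flag that would block other queued work.
 */
static void handle_command(ToyMonitor *mon)
{
    mon->suspended = 1;      /* plays the role of monitor_suspend() */
    queue_work(toy_command, mon);
}
```

Because handle_command() returns before the command runs, other work already queued on the same loop (the analogue of call_rcu processing) is no longer starved.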



[PATCH v3 32/32] hw/riscv/shakti_c: Check CPU type in machine_run_board_init()

2023-09-06 Thread Gavin Shan
Set mc->valid_cpu_types so that the user specified CPU type can
be validated in machine_run_board_init(). We needn't do it
ourselves.

Signed-off-by: Gavin Shan 
---
 hw/riscv/shakti_c.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/hw/riscv/shakti_c.c b/hw/riscv/shakti_c.c
index 12ea74b032..fc83ed4db4 100644
--- a/hw/riscv/shakti_c.c
+++ b/hw/riscv/shakti_c.c
@@ -28,6 +28,10 @@
 #include "exec/address-spaces.h"
 #include "hw/riscv/boot.h"
 
+static const char * const valid_cpu_types[] = {
+RISCV_CPU_TYPE_NAME("shakti-c"),
+NULL
+};
 
 static const struct MemmapEntry {
 hwaddr base;
@@ -47,12 +51,6 @@ static void shakti_c_machine_state_init(MachineState *mstate)
 ShaktiCMachineState *sms = RISCV_SHAKTI_MACHINE(mstate);
 MemoryRegion *system_memory = get_system_memory();
 
-/* Allow only Shakti C CPU for this platform */
-if (strcmp(mstate->cpu_type, TYPE_RISCV_CPU_SHAKTI_C) != 0) {
-error_report("This board can only be used with Shakti C CPU");
-exit(1);
-}
-
 /* Initialize SoC */
 object_initialize_child(OBJECT(mstate), "soc", >soc,
 TYPE_RISCV_SHAKTI_SOC);
@@ -85,6 +83,7 @@ static void shakti_c_machine_class_init(ObjectClass *klass, 
void *data)
 mc->desc = "RISC-V Board compatible with Shakti SDK";
 mc->init = shakti_c_machine_state_init;
 mc->default_cpu_type = TYPE_RISCV_CPU_SHAKTI_C;
+mc->valid_cpu_types = valid_cpu_types;
 mc->default_ram_id = "riscv.shakti.c.ram";
 }
 
-- 
2.41.0




[PATCH v3 31/32] hw/arm: Check CPU type in machine_run_board_init()

2023-09-06 Thread Gavin Shan
Set mc->valid_cpu_types so that the user specified CPU type can
be validated in machine_run_board_init(). We needn't do it
ourselves.

Signed-off-by: Gavin Shan 
---
 hw/arm/bananapi_m2u.c   | 12 ++--
 hw/arm/cubieboard.c | 12 ++--
 hw/arm/mps2-tz.c| 20 ++--
 hw/arm/mps2.c   | 25 +++--
 hw/arm/msf2-som.c   | 12 ++--
 hw/arm/musca.c  | 13 ++---
 hw/arm/npcm7xx_boards.c | 13 ++---
 hw/arm/orangepi.c   | 12 ++--
 8 files changed, 69 insertions(+), 50 deletions(-)

diff --git a/hw/arm/bananapi_m2u.c b/hw/arm/bananapi_m2u.c
index 74121d8966..2d8551aa67 100644
--- a/hw/arm/bananapi_m2u.c
+++ b/hw/arm/bananapi_m2u.c
@@ -29,6 +29,11 @@
 
 static struct arm_boot_info bpim2u_binfo;
 
+static const char * const valid_cpu_types[] = {
+ARM_CPU_TYPE_NAME("cortex-a7"),
+NULL
+};
+
 /*
  * R40 can boot from mmc0 and mmc2, and bpim2u has two mmc interface, one is
  * connected to sdcard and another mount an emmc media.
@@ -70,12 +75,6 @@ static void bpim2u_init(MachineState *machine)
 exit(1);
 }
 
-/* Only allow Cortex-A7 for this board */
-if (strcmp(machine->cpu_type, ARM_CPU_TYPE_NAME("cortex-a7")) != 0) {
-error_report("This board can only be used with cortex-a7 CPU");
-exit(1);
-}
-
 r40 = AW_R40(object_new(TYPE_AW_R40));
 object_property_add_child(OBJECT(machine), "soc", OBJECT(r40));
 object_unref(OBJECT(r40));
@@ -138,6 +137,7 @@ static void bpim2u_machine_init(MachineClass *mc)
 mc->max_cpus = AW_R40_NUM_CPUS;
 mc->default_cpus = AW_R40_NUM_CPUS;
 mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-a7");
+mc->valid_cpu_types = valid_cpu_types;
 mc->default_ram_size = 1 * GiB;
 mc->default_ram_id = "bpim2u.ram";
 }
diff --git a/hw/arm/cubieboard.c b/hw/arm/cubieboard.c
index 8c7fa91529..f77fd5fe6c 100644
--- a/hw/arm/cubieboard.c
+++ b/hw/arm/cubieboard.c
@@ -28,6 +28,11 @@ static struct arm_boot_info cubieboard_binfo = {
 .board_id = 0x1008,
 };
 
+static const char * const valid_cpu_types[] = {
+ARM_CPU_TYPE_NAME("cortex-a8"),
+NULL
+};
+
 static void cubieboard_init(MachineState *machine)
 {
 AwA10State *a10;
@@ -51,12 +56,6 @@ static void cubieboard_init(MachineState *machine)
 exit(1);
 }
 
-/* Only allow Cortex-A8 for this board */
-if (strcmp(machine->cpu_type, ARM_CPU_TYPE_NAME("cortex-a8")) != 0) {
-error_report("This board can only be used with cortex-a8 CPU");
-exit(1);
-}
-
 a10 = AW_A10(object_new(TYPE_AW_A10));
 object_property_add_child(OBJECT(machine), "soc", OBJECT(a10));
 object_unref(OBJECT(a10));
@@ -115,6 +114,7 @@ static void cubieboard_machine_init(MachineClass *mc)
 {
 mc->desc = "cubietech cubieboard (Cortex-A8)";
 mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-a8");
+mc->valid_cpu_types = valid_cpu_types;
 mc->default_ram_size = 1 * GiB;
 mc->init = cubieboard_init;
 mc->block_default_type = IF_IDE;
diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
index eae3639da2..d7bb6d965f 100644
--- a/hw/arm/mps2-tz.c
+++ b/hw/arm/mps2-tz.c
@@ -190,6 +190,16 @@ OBJECT_DECLARE_TYPE(MPS2TZMachineState, 
MPS2TZMachineClass, MPS2TZ_MACHINE)
 /* For cpu{0,1}_mpu_{ns,s}, means "leave at SSE's default value" */
 #define MPU_REGION_DEFAULT UINT32_MAX
 
+static const char * const valid_cpu_types[] = {
+ARM_CPU_TYPE_NAME("cortex-m33"),
+NULL
+};
+
+static const char * const mps3tz_an547_valid_cpu_types[] = {
+ARM_CPU_TYPE_NAME("cortex-m55"),
+NULL
+};
+
 static const uint32_t an505_oscclk[] = {
 4000,
 2458,
@@ -809,12 +819,6 @@ static void mps2tz_common_init(MachineState *machine)
 int num_ppcs;
 int i;
 
-if (strcmp(machine->cpu_type, mc->default_cpu_type) != 0) {
-error_report("This board can only be used with CPU %s",
- mc->default_cpu_type);
-exit(1);
-}
-
 if (machine->ram_size != mc->default_ram_size) {
 char *sz = size_to_str(mc->default_ram_size);
 error_report("Invalid RAM size, should be %s", sz);
@@ -1321,6 +1325,7 @@ static void mps2tz_an505_class_init(ObjectClass *oc, void 
*data)
 mc->max_cpus = mc->default_cpus;
 mmc->fpga_type = FPGA_AN505;
 mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-m33");
+mc->valid_cpu_types = valid_cpu_types;
 mmc->scc_id = 0x41045050;
 mmc->sysclk_frq = 20 * 1000 * 1000; /* 20MHz */
 mmc->apb_periph_frq = mmc->sysclk_frq;
@@ -1350,6 +1355,7 @@ static void mps2tz_an521_class_init(ObjectClass *oc, void 
*data)
 mc->max_cpus = mc->default_cpus;
 mmc->fpga_type = FPGA_AN521;
 mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-m33");
+mc->valid_cpu_types = valid_cpu_types;
 mmc->scc_id = 0x41045210;
 mmc->sysclk_frq = 20 * 1000 * 1000; /* 20MHz */
 mmc->apb_periph_frq = mmc->sysclk_frq;
@@ -1379,6 +1385,7 @@ static void 

[PATCH v3 29/32] hw/arm/virt: Hide host CPU model for tcg

2023-09-06 Thread Gavin Shan
The 'host' CPU model isn't available until KVM or HVF is enabled.
For example, the following error messages are seen when the guest
is started with option '-cpu cortex-a8' on tcg.

  qemu-system-aarch64: Invalid CPU type: cortex-a8
  The valid types are: cortex-a7, cortex-a15, cortex-a35, cortex-a55,
   cortex-a72, cortex-a76, a64fx, neoverse-n1,
   neoverse-v1, cortex-a53, cortex-a57, (null),
   max

Hide 'host' CPU model until KVM or HVF is enabled.

Signed-off-by: Gavin Shan 
---
 hw/arm/virt.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 762780e677..bd0ad15028 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -217,7 +217,9 @@ static const char * const valid_cpu_types[] = {
 #endif
 ARM_CPU_TYPE_NAME("cortex-a53"),
 ARM_CPU_TYPE_NAME("cortex-a57"),
+#if defined(CONFIG_KVM) || defined(CONFIG_HVF)
 ARM_CPU_TYPE_NAME("host"),
+#endif
 ARM_CPU_TYPE_NAME("max"),
 NULL
 };
-- 
2.41.0




[PATCH v3 27/32] machine: Print CPU model name instead of CPU type name

2023-09-06 Thread Gavin Shan
The names of supported CPU models instead of CPU types should be
printed when the user specified CPU type isn't supported, to be
consistent with the output from '-cpu ?'.

Correct the error messages to print CPU model names instead of CPU
type names.

Signed-off-by: Gavin Shan 
---
 hw/core/machine.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 93a327927f..6b701526ae 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -1357,6 +1357,7 @@ static void is_cpu_type_supported(MachineState *machine, 
Error **errp)
 MachineClass *mc = MACHINE_GET_CLASS(machine);
 ObjectClass *oc = object_class_by_name(machine->cpu_type);
 CPUClass *cc;
+char *model;
 int i;
 
 /*
@@ -1373,11 +1374,18 @@ static void is_cpu_type_supported(MachineState 
*machine, Error **errp)
 
 /* The user specified CPU type isn't valid */
 if (!mc->valid_cpu_types[i]) {
-error_setg(errp, "Invalid CPU type: %s", machine->cpu_type);
-error_append_hint(errp, "The valid types are: %s",
-  mc->valid_cpu_types[0]);
+model = cpu_model_from_type(machine->cpu_type);
+error_setg(errp, "Invalid CPU type: %s", model);
+g_free(model);
+
+model = cpu_model_from_type(mc->valid_cpu_types[0]);
+error_append_hint(errp, "The valid types are: %s", model);
+g_free(model);
+
 for (i = 1; mc->valid_cpu_types[i]; i++) {
-error_append_hint(errp, ", %s", mc->valid_cpu_types[i]);
+model = cpu_model_from_type(mc->valid_cpu_types[i]);
+error_append_hint(errp, ", %s", model);
+g_free(model);
 }
 
 error_append_hint(errp, "\n");
-- 
2.41.0




[PATCH v3 30/32] hw/arm/sbsa-ref: Check CPU type in machine_run_board_init()

2023-09-06 Thread Gavin Shan
Set mc->valid_cpu_types so that the user specified CPU type can
be validated in machine_run_board_init(). We needn't do it
ourselves.

Signed-off-by: Gavin Shan 
---
 hw/arm/sbsa-ref.c | 21 +++--
 1 file changed, 3 insertions(+), 18 deletions(-)

diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
index bc89eb4806..f24be53ea2 100644
--- a/hw/arm/sbsa-ref.c
+++ b/hw/arm/sbsa-ref.c
@@ -149,26 +149,15 @@ static const int sbsa_ref_irqmap[] = {
 [SBSA_GWDT_WS0] = 16,
 };
 
-static const char * const valid_cpus[] = {
+static const char * const valid_cpu_types[] = {
 ARM_CPU_TYPE_NAME("cortex-a57"),
 ARM_CPU_TYPE_NAME("cortex-a72"),
 ARM_CPU_TYPE_NAME("neoverse-n1"),
 ARM_CPU_TYPE_NAME("neoverse-v1"),
 ARM_CPU_TYPE_NAME("max"),
+NULL,
 };
 
-static bool cpu_type_valid(const char *cpu)
-{
-int i;
-
-for (i = 0; i < ARRAY_SIZE(valid_cpus); i++) {
-if (strcmp(cpu, valid_cpus[i]) == 0) {
-return true;
-}
-}
-return false;
-}
-
 static uint64_t sbsa_ref_cpu_mp_affinity(SBSAMachineState *sms, int idx)
 {
 uint8_t clustersz = ARM_DEFAULT_CPUS_PER_CLUSTER;
@@ -730,11 +719,6 @@ static void sbsa_ref_init(MachineState *machine)
 const CPUArchIdList *possible_cpus;
 int n, sbsa_max_cpus;
 
-if (!cpu_type_valid(machine->cpu_type)) {
-error_report("sbsa-ref: CPU type %s not supported", machine->cpu_type);
-exit(1);
-}
-
 if (kvm_enabled()) {
 error_report("sbsa-ref: KVM is not supported for this machine");
 exit(1);
@@ -899,6 +883,7 @@ static void sbsa_ref_class_init(ObjectClass *oc, void *data)
 mc->init = sbsa_ref_init;
 mc->desc = "QEMU 'SBSA Reference' ARM Virtual Machine";
 mc->default_cpu_type = ARM_CPU_TYPE_NAME("neoverse-n1");
+mc->valid_cpu_types = valid_cpu_types;
 mc->max_cpus = 512;
 mc->pci_allow_0_address = true;
 mc->minimum_page_bits = 12;
-- 
2.41.0




[PATCH v3 28/32] hw/arm/virt: Check CPU type in machine_run_board_init()

2023-09-06 Thread Gavin Shan
Set mc->valid_cpu_types so that the user specified CPU type can be
validated in machine_run_board_init(). We needn't do the check
ourselves.

Signed-off-by: Gavin Shan 
---
 hw/arm/virt.c | 21 +++--
 1 file changed, 3 insertions(+), 18 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index a13c658bbf..762780e677 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -203,7 +203,7 @@ static const int a15irqmap[] = {
 [VIRT_PLATFORM_BUS] = 112, /* ...to 112 + PLATFORM_BUS_NUM_IRQS -1 */
 };
 
-static const char *valid_cpus[] = {
+static const char * const valid_cpu_types[] = {
 #ifdef CONFIG_TCG
 ARM_CPU_TYPE_NAME("cortex-a7"),
 ARM_CPU_TYPE_NAME("cortex-a15"),
@@ -219,20 +219,9 @@ static const char *valid_cpus[] = {
 ARM_CPU_TYPE_NAME("cortex-a57"),
 ARM_CPU_TYPE_NAME("host"),
 ARM_CPU_TYPE_NAME("max"),
+NULL
 };
 
-static bool cpu_type_valid(const char *cpu)
-{
-int i;
-
-for (i = 0; i < ARRAY_SIZE(valid_cpus); i++) {
-if (strcmp(cpu, valid_cpus[i]) == 0) {
-return true;
-}
-}
-return false;
-}
-
 static void create_randomness(MachineState *ms, const char *node)
 {
 struct {
@@ -2030,11 +2019,6 @@ static void machvirt_init(MachineState *machine)
 unsigned int smp_cpus = machine->smp.cpus;
 unsigned int max_cpus = machine->smp.max_cpus;
 
-if (!cpu_type_valid(machine->cpu_type)) {
-error_report("mach-virt: CPU type %s not supported", 
machine->cpu_type);
-exit(1);
-}
-
 possible_cpus = mc->possible_cpu_arch_ids(machine);
 
 /*
@@ -2953,6 +2937,7 @@ static void virt_machine_class_init(ObjectClass *oc, void 
*data)
 #else
 mc->default_cpu_type = ARM_CPU_TYPE_NAME("max");
 #endif
+mc->valid_cpu_types = valid_cpu_types;
 mc->get_default_cpu_node_id = virt_get_default_cpu_node_id;
 mc->kvm_type = virt_kvm_type;
 assert(!mc->get_hotplug_handler);
-- 
2.41.0




[PATCH v3 25/32] machine: Use error handling when CPU type is checked

2023-09-06 Thread Gavin Shan
QEMU will be terminated if the specified CPU type isn't supported
in machine_run_board_init(). The list of supported CPU type names
is tracked by mc->valid_cpu_types.

The error handling can be used to propagate error messages, to be
consistent with how the errors are handled for other situations in the
same function.

No functional change intended.

Suggested-by: Igor Mammedov 
Signed-off-by: Gavin Shan 
---
 hw/core/machine.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index da699cf4e1..6d3f8e133f 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -1358,6 +1358,7 @@ void machine_run_board_init(MachineState *machine, const 
char *mem_path, Error *
 MachineClass *machine_class = MACHINE_GET_CLASS(machine);
 ObjectClass *oc = object_class_by_name(machine->cpu_type);
 CPUClass *cc;
+Error *local_err = NULL;
 
 /* This checkpoint is required by replay to separate prior clock
reading from the other reads, because timer polling functions query
@@ -1426,15 +1427,16 @@ void machine_run_board_init(MachineState *machine, 
const char *mem_path, Error *
 
 if (!machine_class->valid_cpu_types[i]) {
 /* The user specified CPU is not valid */
-error_report("Invalid CPU type: %s", machine->cpu_type);
-error_printf("The valid types are: %s",
- machine_class->valid_cpu_types[0]);
> +error_setg(&local_err, "Invalid CPU type: %s", machine->cpu_type);
> +error_append_hint(&local_err, "The valid types are: %s",
> +  machine_class->valid_cpu_types[0]);
>  for (i = 1; machine_class->valid_cpu_types[i]; i++) {
> -error_printf(", %s", machine_class->valid_cpu_types[i]);
> +error_append_hint(&local_err, ", %s",
> +  machine_class->valid_cpu_types[i]);
>  }
> -error_printf("\n");
> +error_append_hint(&local_err, "\n");
>  
> -exit(1);
> +error_propagate(errp, local_err);
 }
 }
 
-- 
2.41.0




[PATCH v3 26/32] machine: Introduce helper is_cpu_type_supported()

2023-09-06 Thread Gavin Shan
The logic to check whether the specified CPU type is supported in
machine_run_board_init() is independent enough. Factor it out
into helper is_cpu_type_supported(). With this, machine_run_board_init()
looks a bit cleaner. While at it, @machine_class is renamed to
@mc to avoid code spanning multiple lines. The comments are
tweaked a bit too.

No functional change intended.

Signed-off-by: Gavin Shan 
---
 hw/core/machine.c | 82 +--
 1 file changed, 44 insertions(+), 38 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 6d3f8e133f..93a327927f 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -1352,12 +1352,50 @@ out:
 return r;
 }
 
+static void is_cpu_type_supported(MachineState *machine, Error **errp)
+{
+MachineClass *mc = MACHINE_GET_CLASS(machine);
+ObjectClass *oc = object_class_by_name(machine->cpu_type);
+CPUClass *cc;
+int i;
+
+/*
+ * Check if the user specified CPU type is supported when the valid
+ * CPU types have been determined. Note that the user specified CPU
+ * type is provided through '-cpu' option.
+ */
+if (mc->valid_cpu_types && machine->cpu_type) {
+for (i = 0; mc->valid_cpu_types[i]; i++) {
+if (object_class_dynamic_cast(oc, mc->valid_cpu_types[i])) {
+break;
+}
+}
+
+/* The user specified CPU type isn't valid */
+if (!mc->valid_cpu_types[i]) {
+error_setg(errp, "Invalid CPU type: %s", machine->cpu_type);
+error_append_hint(errp, "The valid types are: %s",
+  mc->valid_cpu_types[0]);
+for (i = 1; mc->valid_cpu_types[i]; i++) {
+error_append_hint(errp, ", %s", mc->valid_cpu_types[i]);
+}
+
+error_append_hint(errp, "\n");
+return;
+}
+}
+
+/* Check if CPU type is deprecated and warn if so */
+cc = CPU_CLASS(oc);
+if (cc && cc->deprecation_note) {
+warn_report("CPU model %s is deprecated -- %s",
+machine->cpu_type, cc->deprecation_note);
+}
+}
 
 void machine_run_board_init(MachineState *machine, const char *mem_path, Error 
**errp)
 {
 MachineClass *machine_class = MACHINE_GET_CLASS(machine);
-ObjectClass *oc = object_class_by_name(machine->cpu_type);
-CPUClass *cc;
 Error *local_err = NULL;
 
 /* This checkpoint is required by replay to separate prior clock
@@ -1409,42 +1447,10 @@ void machine_run_board_init(MachineState *machine, 
const char *mem_path, Error *
 machine->ram = machine_consume_memdev(machine, machine->memdev);
 }
 
-/* If the machine supports the valid_cpu_types check and the user
- * specified a CPU with -cpu check here that the user CPU is supported.
- */
-if (machine_class->valid_cpu_types && machine->cpu_type) {
-int i;
-
-for (i = 0; machine_class->valid_cpu_types[i]; i++) {
-if (object_class_dynamic_cast(oc,
-  machine_class->valid_cpu_types[i])) {
-/* The user specificed CPU is in the valid field, we are
- * good to go.
- */
-break;
-}
-}
-
-if (!machine_class->valid_cpu_types[i]) {
-/* The user specified CPU is not valid */
> -error_setg(&local_err, "Invalid CPU type: %s", machine->cpu_type);
> -error_append_hint(&local_err, "The valid types are: %s",
> -  machine_class->valid_cpu_types[0]);
> -for (i = 1; machine_class->valid_cpu_types[i]; i++) {
> -error_append_hint(&local_err, ", %s",
> -  machine_class->valid_cpu_types[i]);
> -}
> -error_append_hint(&local_err, "\n");
> -
> -error_propagate(errp, local_err);
-}
-}
-
-/* Check if CPU type is deprecated and warn if so */
-cc = CPU_CLASS(oc);
-if (cc && cc->deprecation_note) {
-warn_report("CPU model %s is deprecated -- %s", machine->cpu_type,
-cc->deprecation_note);
+/* Check if the CPU type is supported */
> +is_cpu_type_supported(machine, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
 }
 
 if (machine->cgs) {
-- 
2.41.0
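[Editorial note] As background for the checks consolidated above: mc->valid_cpu_types is a NULL-terminated array scanned linearly. A minimal standalone sketch follows; it matches names with strcmp(), whereas the real code uses object_class_dynamic_cast(), which also accepts subclasses of a listed type.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for object_class_dynamic_cast(): the real
 * check matches by class hierarchy; this sketch matches by exact
 * type name only.
 */
static int type_matches(const char *cpu_type, const char *valid)
{
    return strcmp(cpu_type, valid) == 0;
}

static int cpu_type_supported(const char *cpu_type,
                              const char * const *valid_cpu_types)
{
    size_t i;

    if (!valid_cpu_types || !cpu_type) {
        return 1;    /* no restriction configured for this board */
    }
    for (i = 0; valid_cpu_types[i]; i++) {    /* NULL-terminated scan */
        if (type_matches(cpu_type, valid_cpu_types[i])) {
            return 1;
        }
    }
    return 0;
}
```

When the scan falls off the end (valid_cpu_types[i] == NULL), the type is rejected and the error hint listing the valid models is built.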




[PATCH v3 24/32] machine: Constify MachineClass::valid_cpu_types[i]

2023-09-06 Thread Gavin Shan
Constify MachineClass::valid_cpu_types[i], as suggested by Richard
Henderson.

Suggested-by: Richard Henderson 
Signed-off-by: Gavin Shan 
---
 hw/m68k/q800.c  | 2 +-
 include/hw/boards.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/m68k/q800.c b/hw/m68k/q800.c
index b770b71d54..45f2b58b26 100644
--- a/hw/m68k/q800.c
+++ b/hw/m68k/q800.c
@@ -596,7 +596,7 @@ static GlobalProperty hw_compat_q800[] = {
 };
 static const size_t hw_compat_q800_len = G_N_ELEMENTS(hw_compat_q800);
 
-static const char *q800_machine_valid_cpu_types[] = {
+static const char * const q800_machine_valid_cpu_types[] = {
 M68K_CPU_TYPE_NAME("m68040"),
 NULL
 };
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 3b541ffd24..49c328d928 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -268,7 +268,7 @@ struct MachineClass {
 bool has_hotpluggable_cpus;
 bool ignore_memory_transaction_failures;
 int numa_mem_align_shift;
-const char **valid_cpu_types;
+const char * const *valid_cpu_types;
 strList *allowed_dynamic_sysbus_devices;
 bool auto_enable_numa_with_memhp;
 bool auto_enable_numa_with_memdev;
-- 
2.41.0




[PATCH v3 23/32] Mark cpu_list() supported on all targets

2023-09-06 Thread Gavin Shan
Remove the false conditions and comments since cpu_list() has been
supported on all targets.

Signed-off-by: Gavin Shan 
---
 bsd-user/main.c | 3 ---
 cpu.c   | 3 ---
 2 files changed, 6 deletions(-)

diff --git a/bsd-user/main.c b/bsd-user/main.c
index f913cb55a7..3a2d84f14b 100644
--- a/bsd-user/main.c
+++ b/bsd-user/main.c
@@ -378,10 +378,7 @@ int main(int argc, char **argv)
 } else if (!strcmp(r, "cpu")) {
 cpu_model = argv[optind++];
 if (is_help_option(cpu_model)) {
-/* XXX: implement xxx_cpu_list for targets that still miss it 
*/
-#if defined(cpu_list)
 cpu_list();
-#endif
 exit(1);
 }
 } else if (!strcmp(r, "B")) {
diff --git a/cpu.c b/cpu.c
index a19e33ff96..01bff086f8 100644
--- a/cpu.c
+++ b/cpu.c
@@ -302,10 +302,7 @@ char *cpu_model_from_type(const char *typename)
 
 void list_cpus(void)
 {
-/* XXX: implement xxx_cpu_list for targets that still miss it */
-#if defined(cpu_list)
 cpu_list();
-#endif
 }
 
 #if defined(CONFIG_USER_ONLY)
-- 
2.41.0




[PATCH v3 22/32] target/nios2: Implement nios2_cpu_list()

2023-09-06 Thread Gavin Shan
Implement nios2_cpu_list() to support cpu_list(). With this applied,
the available CPU model names, which are the same as the CPU type
names, are shown below.

  $ ./build/qemu-system-nios2 -cpu ?
  Available CPUs:
nios2-cpu

Signed-off-by: Gavin Shan 
---
 target/nios2/cpu.c | 20 
 target/nios2/cpu.h |  3 +++
 2 files changed, 23 insertions(+)

diff --git a/target/nios2/cpu.c b/target/nios2/cpu.c
index bc5cbf81c2..80af24eb69 100644
--- a/target/nios2/cpu.c
+++ b/target/nios2/cpu.c
@@ -21,6 +21,7 @@
 #include "qemu/osdep.h"
 #include "qemu/module.h"
 #include "qapi/error.h"
+#include "qemu/qemu-print.h"
 #include "cpu.h"
 #include "exec/log.h"
 #include "gdbstub/helpers.h"
@@ -111,6 +112,25 @@ static void iic_set_irq(void *opaque, int irq, int level)
 }
 #endif
 
+static void nios2_cpu_list_entry(gpointer data, gpointer user_data)
+{
+const char *typename = object_class_get_name(OBJECT_CLASS(data));
+char *model = cpu_model_from_type(typename);
+
+qemu_printf("  %s\n", model);
+g_free(model);
+}
+
+void nios2_cpu_list(void)
+{
+GSList *list;
+
+list = object_class_get_list_sorted(TYPE_NIOS2_CPU, false);
+qemu_printf("Available CPUs:\n");
+g_slist_foreach(list, nios2_cpu_list_entry, NULL);
+g_slist_free(list);
+}
+
 static void nios2_cpu_initfn(Object *obj)
 {
 Nios2CPU *cpu = NIOS2_CPU(obj);
diff --git a/target/nios2/cpu.h b/target/nios2/cpu.h
index 477a3161fd..6d21b7e8f4 100644
--- a/target/nios2/cpu.h
+++ b/target/nios2/cpu.h
@@ -292,6 +292,9 @@ bool nios2_cpu_tlb_fill(CPUState *cs, vaddr address, int 
size,
 MMUAccessType access_type, int mmu_idx,
 bool probe, uintptr_t retaddr);
 #endif
+void nios2_cpu_list(void);
+
+#define cpu_list nios2_cpu_list
 
 typedef CPUNios2State CPUArchState;
 typedef Nios2CPU ArchCPU;
-- 
2.41.0




[PATCH v3 21/32] target/microblaze: Implement microblaze_cpu_list()

2023-09-06 Thread Gavin Shan
Implement microblaze_cpu_list() to support cpu_list(). With this applied,
the available CPU model names, which are the same as the CPU type
names, are shown below.

  $ ./build/qemu-system-microblaze -cpu ?
  Available CPUs:
microblaze-cpu

Signed-off-by: Gavin Shan 
---
 target/microblaze/cpu.c | 20 
 target/microblaze/cpu.h |  3 +++
 2 files changed, 23 insertions(+)

diff --git a/target/microblaze/cpu.c b/target/microblaze/cpu.c
index 03c2c4db1f..fc7a5dee5b 100644
--- a/target/microblaze/cpu.c
+++ b/target/microblaze/cpu.c
@@ -24,6 +24,7 @@
 #include "qemu/osdep.h"
 #include "qemu/log.h"
 #include "qapi/error.h"
+#include "qemu/qemu-print.h"
 #include "cpu.h"
 #include "qemu/module.h"
 #include "hw/qdev-properties.h"
@@ -291,6 +292,25 @@ static void mb_cpu_realizefn(DeviceState *dev, Error 
**errp)
 mcc->parent_realize(dev, errp);
 }
 
+static void microblaze_cpu_list_entry(gpointer data, gpointer user_data)
+{
+const char *typename = object_class_get_name(OBJECT_CLASS(data));
+char *model = cpu_model_from_type(typename);
+
+qemu_printf("  %s\n", model);
+g_free(model);
+}
+
+void microblaze_cpu_list(void)
+{
+GSList *list;
+
+list = object_class_get_list_sorted(TYPE_MICROBLAZE_CPU, false);
+qemu_printf("Available CPUs:\n");
+g_slist_foreach(list, microblaze_cpu_list_entry, NULL);
+g_slist_free(list);
+}
+
 static void mb_cpu_initfn(Object *obj)
 {
 MicroBlazeCPU *cpu = MICROBLAZE_CPU(obj);
diff --git a/target/microblaze/cpu.h b/target/microblaze/cpu.h
index f6cab6ce19..b5775c2966 100644
--- a/target/microblaze/cpu.h
+++ b/target/microblaze/cpu.h
@@ -372,6 +372,9 @@ int mb_cpu_gdb_read_register(CPUState *cpu, GByteArray 
*buf, int reg);
 int mb_cpu_gdb_write_register(CPUState *cpu, uint8_t *buf, int reg);
 int mb_cpu_gdb_read_stack_protect(CPUArchState *cpu, GByteArray *buf, int reg);
 int mb_cpu_gdb_write_stack_protect(CPUArchState *cpu, uint8_t *buf, int reg);
+void microblaze_cpu_list(void);
+
+#define cpu_list microblaze_cpu_list
 
 static inline uint32_t mb_cpu_read_msr(const CPUMBState *env)
 {
-- 
2.41.0




[PATCH v3 20/32] target/hppa: Implement hppa_cpu_list()

2023-09-06 Thread Gavin Shan
Implement hppa_cpu_list() to support cpu_list(). With this applied,
the available CPU model names, which are the same as the CPU type
names, are shown below.

  $ ./build/qemu-system-hppa -cpu ?
  Available CPUs:
hppa-cpu

Signed-off-by: Gavin Shan 
---
 target/hppa/cpu.c | 19 +++
 target/hppa/cpu.h |  3 +++
 2 files changed, 22 insertions(+)

diff --git a/target/hppa/cpu.c b/target/hppa/cpu.c
index 11022f9c99..873402bf9c 100644
--- a/target/hppa/cpu.c
+++ b/target/hppa/cpu.c
@@ -143,6 +143,25 @@ static void hppa_cpu_realizefn(DeviceState *dev, Error 
**errp)
 #endif
 }
 
+static void hppa_cpu_list_entry(gpointer data, gpointer user_data)
+{
+const char *typename = object_class_get_name(OBJECT_CLASS(data));
+char *model = cpu_model_from_type(typename);
+
+qemu_printf("  %s\n", model);
+g_free(model);
+}
+
+void hppa_cpu_list(void)
+{
+GSList *list;
+
+list = object_class_get_list_sorted(TYPE_HPPA_CPU, false);
+qemu_printf("Available CPUs:\n");
+g_slist_foreach(list, hppa_cpu_list_entry, NULL);
+g_slist_free(list);
+}
+
 static void hppa_cpu_initfn(Object *obj)
 {
 CPUState *cs = CPU(obj);
diff --git a/target/hppa/cpu.h b/target/hppa/cpu.h
index fa13694dab..19759f5f62 100644
--- a/target/hppa/cpu.h
+++ b/target/hppa/cpu.h
@@ -351,5 +351,8 @@ void hppa_cpu_alarm_timer(void *);
 int hppa_artype_for_page(CPUHPPAState *env, target_ulong vaddr);
 #endif
 G_NORETURN void hppa_dynamic_excp(CPUHPPAState *env, int excp, uintptr_t ra);
+void hppa_cpu_list(void);
+
+#define cpu_list hppa_cpu_list
 
 #endif /* HPPA_CPU_H */
-- 
2.41.0




[PATCH v3 19/32] target/xtensa: Improve xtensa_cpu_class_by_name()

2023-09-06 Thread Gavin Shan
Improve xtensa_cpu_class_by_name() by merging the condition of
'@oc == NULL' to object_class_dynamic_cast().

Signed-off-by: Gavin Shan 
---
 target/xtensa/cpu.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/target/xtensa/cpu.c b/target/xtensa/cpu.c
index acaf8c905f..9d682611aa 100644
--- a/target/xtensa/cpu.c
+++ b/target/xtensa/cpu.c
@@ -141,11 +141,12 @@ static ObjectClass *xtensa_cpu_class_by_name(const char 
*cpu_model)
 typename = g_strdup_printf(XTENSA_CPU_TYPE_NAME("%s"), cpu_model);
 oc = object_class_by_name(typename);
 g_free(typename);
-if (oc == NULL || !object_class_dynamic_cast(oc, TYPE_XTENSA_CPU) ||
-object_class_is_abstract(oc)) {
-return NULL;
+if (object_class_dynamic_cast(oc, TYPE_XTENSA_CPU) &&
+!object_class_is_abstract(oc)) {
+return oc;
 }
-return oc;
+
+return NULL;
 }
 
 static void xtensa_cpu_disas_set_info(CPUState *cs, disassemble_info *info)
-- 
2.41.0




[PATCH v3 17/32] target/tricore: Use generic helper to show CPU model names

2023-09-06 Thread Gavin Shan
For target/tricore, the CPU type name is always the combination of the
CPU model name and suffix. The CPU model names have been correctly
shown in tricore_cpu_list_entry().

Use generic helper cpu_model_from_type() to show the CPU model names
in the above function. tricore_cpu_class_by_name() is also improved
by merging the condition of '@oc == NULL' to object_class_dynamic_cast().

Signed-off-by: Gavin Shan 
---
 target/tricore/cpu.c|  9 +
 target/tricore/helper.c | 13 +
 2 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/target/tricore/cpu.c b/target/tricore/cpu.c
index 133a9ac70e..066249e50d 100644
--- a/target/tricore/cpu.c
+++ b/target/tricore/cpu.c
@@ -140,11 +140,12 @@ static ObjectClass *tricore_cpu_class_by_name(const char 
*cpu_model)
 typename = g_strdup_printf(TRICORE_CPU_TYPE_NAME("%s"), cpu_model);
 oc = object_class_by_name(typename);
 g_free(typename);
-if (!oc || !object_class_dynamic_cast(oc, TYPE_TRICORE_CPU) ||
-object_class_is_abstract(oc)) {
-return NULL;
+if (object_class_dynamic_cast(oc, TYPE_TRICORE_CPU) &&
+!object_class_is_abstract(oc)) {
+return oc;
 }
-return oc;
+
+return NULL;
 }
 
 static void tc1796_initfn(Object *obj)
diff --git a/target/tricore/helper.c b/target/tricore/helper.c
index 6d076ac36f..21f4e1f1a3 100644
--- a/target/tricore/helper.c
+++ b/target/tricore/helper.c
@@ -98,14 +98,11 @@ bool tricore_cpu_tlb_fill(CPUState *cs, vaddr address, int size,
 
 static void tricore_cpu_list_entry(gpointer data, gpointer user_data)
 {
-ObjectClass *oc = data;
-const char *typename;
-char *name;
-
-typename = object_class_get_name(oc);
-name = g_strndup(typename, strlen(typename) - strlen("-" TYPE_TRICORE_CPU));
-qemu_printf("  %s\n", name);
-g_free(name);
+const char *typename = object_class_get_name(OBJECT_CLASS(data));
+char *model = cpu_model_from_type(typename);
+
+qemu_printf("  %s\n", model);
+g_free(model);
 }
 
 void tricore_cpu_list(void)
-- 
2.41.0




[PATCH v3 18/32] target/sparc: Improve sparc_cpu_class_by_name()

2023-09-06 Thread Gavin Shan
Improve sparc_cpu_class_by_name() by validating @oc, to ensure it's
a child of TYPE_SPARC_CPU, since it's possible for other types of
classes to have TYPE_SPARC_CPU as the suffix of their type names.

Signed-off-by: Gavin Shan 
---
 target/sparc/cpu.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/target/sparc/cpu.c b/target/sparc/cpu.c
index 130ab8f578..20417707da 100644
--- a/target/sparc/cpu.c
+++ b/target/sparc/cpu.c
@@ -745,7 +745,12 @@ static ObjectClass *sparc_cpu_class_by_name(const char *cpu_model)
 typename = sparc_cpu_type_name(cpu_model);
 oc = object_class_by_name(typename);
 g_free(typename);
-return oc;
+if (object_class_dynamic_cast(oc, TYPE_SPARC_CPU) &&
+!object_class_is_abstract(oc)) {
+return oc;
+}
+
+return NULL;
 }
 
 static void sparc_cpu_realizefn(DeviceState *dev, Error **errp)
-- 
2.41.0




[PATCH v3 16/32] target/sh4: Use generic helper to show CPU model names

2023-09-06 Thread Gavin Shan
For target/sh4, the CPU type name can be: (1) the combination of the
CPU model name and suffix; (2) TYPE_SH7750R_CPU when the CPU model
name is "any". The CPU model names have been correctly shown in
superh_cpu_list_entry().

Use generic helper cpu_model_from_type() to show the CPU model name
in the above function. Besides, superh_cpu_class_by_name() is improved
by avoiding "goto out" and validating the CPU class.

Signed-off-by: Gavin Shan 
---
 target/sh4/cpu.c | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/target/sh4/cpu.c b/target/sh4/cpu.c
index 61769ffdfa..ca06e2ce99 100644
--- a/target/sh4/cpu.c
+++ b/target/sh4/cpu.c
@@ -125,9 +125,10 @@ static void superh_cpu_disas_set_info(CPUState *cpu, disassemble_info *info)
 static void superh_cpu_list_entry(gpointer data, gpointer user_data)
 {
 const char *typename = object_class_get_name(OBJECT_CLASS(data));
-int len = strlen(typename) - strlen(SUPERH_CPU_TYPE_SUFFIX);
+char *model = cpu_model_from_type(typename);
 
-qemu_printf("%.*s\n", len, typename);
+qemu_printf("  %s\n", model);
+g_free(model);
 }
 
 void sh4_cpu_list(void)
@@ -135,6 +136,7 @@ void sh4_cpu_list(void)
 GSList *list;
 
 list = object_class_get_list_sorted(TYPE_SUPERH_CPU, false);
+qemu_printf("Available CPUs:\n");
 g_slist_foreach(list, superh_cpu_list_entry, NULL);
 g_slist_free(list);
 }
@@ -146,20 +148,20 @@ static ObjectClass *superh_cpu_class_by_name(const char *cpu_model)
 
 s = g_ascii_strdown(cpu_model, -1);
 if (strcmp(s, "any") == 0) {
-oc = object_class_by_name(TYPE_SH7750R_CPU);
-goto out;
+typename = g_strdup(TYPE_SH7750R_CPU);
+} else {
+typename = g_strdup_printf(SUPERH_CPU_TYPE_NAME("%s"), s);
 }
 
-typename = g_strdup_printf(SUPERH_CPU_TYPE_NAME("%s"), s);
 oc = object_class_by_name(typename);
-if (oc != NULL && object_class_is_abstract(oc)) {
-oc = NULL;
-}
-
-out:
 g_free(s);
 g_free(typename);
-return oc;
+if (object_class_dynamic_cast(oc, TYPE_SUPERH_CPU) &&
+!object_class_is_abstract(oc)) {
+return oc;
+}
+
+return NULL;
 }
 
 static void sh7750r_cpu_initfn(Object *obj)
-- 
2.41.0




[PATCH v3 15/32] target/s390x: Use generic helper to show CPU model names

2023-09-06 Thread Gavin Shan
For target/s390x, the CPU type name is always the combination of the
CPU model name and suffix. The CPU model names have been correctly
shown in s390_print_cpu_model_list_entry() and create_cpu_model_list().

Use generic helper cpu_model_from_type() to show the CPU model names
in the above two functions. Besides, we need to validate the CPU class
in s390_cpu_class_by_name(), as other targets do.

Signed-off-by: Gavin Shan 
---
 target/s390x/cpu_models.c| 18 +++---
 target/s390x/cpu_models_sysemu.c |  9 -
 2 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/target/s390x/cpu_models.c b/target/s390x/cpu_models.c
index 91ce896491..103e9072b8 100644
--- a/target/s390x/cpu_models.c
+++ b/target/s390x/cpu_models.c
@@ -338,7 +338,8 @@ static void s390_print_cpu_model_list_entry(gpointer data, gpointer user_data)
 {
 const S390CPUClass *scc = S390_CPU_CLASS((ObjectClass *)data);
 CPUClass *cc = CPU_CLASS(scc);
-char *name = g_strdup(object_class_get_name((ObjectClass *)data));
+const char *typename = object_class_get_name((ObjectClass *)data);
+char *model = cpu_model_from_type(typename);
 g_autoptr(GString) details = g_string_new("");
 
 if (scc->is_static) {
@@ -355,14 +356,12 @@ static void s390_print_cpu_model_list_entry(gpointer data, gpointer user_data)
 g_string_truncate(details, details->len - 2);
 }
 
-/* strip off the -s390x-cpu */
-g_strrstr(name, "-" TYPE_S390_CPU)[0] = 0;
 if (details->len) {
-qemu_printf("s390 %-15s %-35s (%s)\n", name, scc->desc, details->str);
+qemu_printf("s390 %-15s %-35s (%s)\n", model, scc->desc, details->str);
 } else {
-qemu_printf("s390 %-15s %-35s\n", name, scc->desc);
+qemu_printf("s390 %-15s %-35s\n", model, scc->desc);
 }
-g_free(name);
+g_free(model);
 }
 
 static gint s390_cpu_list_compare(gconstpointer a, gconstpointer b)
@@ -916,7 +915,12 @@ ObjectClass *s390_cpu_class_by_name(const char *name)
 
 oc = object_class_by_name(typename);
 g_free(typename);
-return oc;
+if (object_class_dynamic_cast(oc, TYPE_S390_CPU) &&
+!object_class_is_abstract(oc)) {
+return oc;
+}
+
+return NULL;
 }
 
 static const TypeInfo qemu_s390_cpu_type_info = {
diff --git a/target/s390x/cpu_models_sysemu.c b/target/s390x/cpu_models_sysemu.c
index 63981bf36b..c41af253d3 100644
--- a/target/s390x/cpu_models_sysemu.c
+++ b/target/s390x/cpu_models_sysemu.c
@@ -55,17 +55,16 @@ static void create_cpu_model_list(ObjectClass *klass, void *opaque)
 struct CpuDefinitionInfoListData *cpu_list_data = opaque;
 CpuDefinitionInfoList **cpu_list = &cpu_list_data->list;
 CpuDefinitionInfo *info;
-char *name = g_strdup(object_class_get_name(klass));
+const char *typename = object_class_get_name(klass);
+char *model = cpu_model_from_type(typename);
 S390CPUClass *scc = S390_CPU_CLASS(klass);
 
-/* strip off the -s390x-cpu */
-g_strrstr(name, "-" TYPE_S390_CPU)[0] = 0;
 info = g_new0(CpuDefinitionInfo, 1);
-info->name = name;
+info->name = model;
 info->has_migration_safe = true;
 info->migration_safe = scc->is_migration_safe;
 info->q_static = scc->is_static;
-info->q_typename = g_strdup(object_class_get_name(klass));
+info->q_typename = g_strdup(typename);
 /* check for unavailable features */
 if (cpu_list_data->model) {
 Object *obj;
-- 
2.41.0




[PATCH v3 14/32] target/rx: Use generic helper to show CPU model names

2023-09-06 Thread Gavin Shan
For target/rx, the CPU type name can be: (1) the combination of the
CPU model name and suffix; (2) the same as the CPU model name. The CPU
type names have been shown in rx_cpu_list_entry().

Use generic helper cpu_model_from_type() to show the CPU model names
in rx_cpu_list_entry(). Besides, rx_cpu_class_by_name() is improved
by merging the condition of '@oc == NULL' to object_class_dynamic_cast().

Signed-off-by: Gavin Shan 
---
 target/rx/cpu.c | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/target/rx/cpu.c b/target/rx/cpu.c
index 157e57da0f..ff0ced1f3d 100644
--- a/target/rx/cpu.c
+++ b/target/rx/cpu.c
@@ -91,9 +91,11 @@ static void rx_cpu_reset_hold(Object *obj)
 
 static void rx_cpu_list_entry(gpointer data, gpointer user_data)
 {
-ObjectClass *oc = data;
+const char *typename = object_class_get_name(OBJECT_CLASS(data));
+char *model = cpu_model_from_type(typename);
 
-qemu_printf("  %s\n", object_class_get_name(oc));
+qemu_printf("  %s\n", model);
+g_free(model);
 }
 
 void rx_cpu_list(void)
@@ -111,18 +113,20 @@ static ObjectClass *rx_cpu_class_by_name(const char *cpu_model)
 char *typename;
 
 oc = object_class_by_name(cpu_model);
-if (oc != NULL && object_class_dynamic_cast(oc, TYPE_RX_CPU) != NULL &&
+if (object_class_dynamic_cast(oc, TYPE_RX_CPU) &&
 !object_class_is_abstract(oc)) {
 return oc;
 }
+
 typename = g_strdup_printf(RX_CPU_TYPE_NAME("%s"), cpu_model);
 oc = object_class_by_name(typename);
 g_free(typename);
-if (oc != NULL && object_class_is_abstract(oc)) {
-oc = NULL;
+if (object_class_dynamic_cast(oc, TYPE_RX_CPU) &&
+!object_class_is_abstract(oc)) {
+return oc;
 }
 
-return oc;
+return NULL;
 }
 
 static void rx_cpu_realize(DeviceState *dev, Error **errp)
-- 
2.41.0




[PATCH v3 13/32] target/riscv: Use generic helper to show CPU model names

2023-09-06 Thread Gavin Shan
For target/riscv, the CPU type name is always the combination of the
CPU model name and suffix. The CPU model names have been correctly
shown in riscv_cpu_list_entry() and riscv_cpu_add_definition().

Use generic helper cpu_model_from_type() to show the CPU model names
in the above two functions, and adjust the format of the output from
riscv_cpu_list_entry() to match other targets. Besides, the function
riscv_cpu_class_by_name() is improved by renaming @cpuname to @model,
since it holds the CPU model name, and by merging the condition of
'@oc == NULL' into object_class_dynamic_cast().

Signed-off-by: Gavin Shan 
---
 target/riscv/cpu.c| 23 +--
 target/riscv/riscv-qmp-cmds.c |  3 +--
 2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 6b93b04453..a525e24c5a 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -612,18 +612,19 @@ static ObjectClass *riscv_cpu_class_by_name(const char *cpu_model)
 {
 ObjectClass *oc;
 char *typename;
-char **cpuname;
+char **model;
 
-cpuname = g_strsplit(cpu_model, ",", 1);
-typename = g_strdup_printf(RISCV_CPU_TYPE_NAME("%s"), cpuname[0]);
+model = g_strsplit(cpu_model, ",", 1);
+typename = g_strdup_printf(RISCV_CPU_TYPE_NAME("%s"), model[0]);
 oc = object_class_by_name(typename);
-g_strfreev(cpuname);
+g_strfreev(model);
 g_free(typename);
-if (!oc || !object_class_dynamic_cast(oc, TYPE_RISCV_CPU) ||
-object_class_is_abstract(oc)) {
-return NULL;
+if (object_class_dynamic_cast(oc, TYPE_RISCV_CPU) &&
+!object_class_is_abstract(oc)) {
+return oc;
 }
-return oc;
+
+return NULL;
 }
 
 static void riscv_cpu_dump_state(CPUState *cs, FILE *f, int flags)
@@ -2211,9 +2212,10 @@ static gint riscv_cpu_list_compare(gconstpointer a, gconstpointer b)
 static void riscv_cpu_list_entry(gpointer data, gpointer user_data)
 {
 const char *typename = object_class_get_name(OBJECT_CLASS(data));
-int len = strlen(typename) - strlen(RISCV_CPU_TYPE_SUFFIX);
+char *model = cpu_model_from_type(typename);
 
-qemu_printf("%.*s\n", len, typename);
+qemu_printf("  %s\n", model);
+g_free(model);
 }
 
 void riscv_cpu_list(void)
@@ -2222,6 +2224,7 @@ void riscv_cpu_list(void)
 
 list = object_class_get_list(TYPE_RISCV_CPU, false);
 list = g_slist_sort(list, riscv_cpu_list_compare);
+qemu_printf("Available CPUs:\n");
 g_slist_foreach(list, riscv_cpu_list_entry, NULL);
 g_slist_free(list);
 }
diff --git a/target/riscv/riscv-qmp-cmds.c b/target/riscv/riscv-qmp-cmds.c
index 5ecff1afb3..22f728673f 100644
--- a/target/riscv/riscv-qmp-cmds.c
+++ b/target/riscv/riscv-qmp-cmds.c
@@ -35,8 +35,7 @@ static void riscv_cpu_add_definition(gpointer data, gpointer user_data)
 const char *typename = object_class_get_name(oc);
 ObjectClass *dyn_class;
 
-info->name = g_strndup(typename,
-   strlen(typename) - strlen("-" TYPE_RISCV_CPU));
+info->name = cpu_model_from_type(typename);
 info->q_typename = g_strdup(typename);
 
 dyn_class = object_class_dynamic_cast(oc, TYPE_RISCV_DYNAMIC_CPU);
-- 
2.41.0




[PATCH v3 12/32] target/ppc: Use generic helper to show CPU model names

2023-09-06 Thread Gavin Shan
For target/ppc, the CPU type name can be: (1) The combination of
the CPU model name and suffix; (2) the type name of the class whose
PVR matches with the specified one; (3) alias of the CPU model, plus
suffix; (4) MachineClass::default_cpu_type when the CPU model name
is "max". All the possible information, the CPU model name, aliases
of the CPU models and PVRs are all shown in ppc_cpu_list_entry().

Use generic helper cpu_model_from_type() in ppc_cpu_list_entry(),
and rename @name to @model since it points to the CPU model name
instead of the CPU type name.

Signed-off-by: Gavin Shan 
---
 target/ppc/cpu_init.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 02b7aad9b0..7281402331 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -7019,16 +7019,15 @@ static void ppc_cpu_list_entry(gpointer data, gpointer user_data)
 PowerPCCPUClass *pcc = POWERPC_CPU_CLASS(oc);
 DeviceClass *family = DEVICE_CLASS(ppc_cpu_get_family_class(pcc));
 const char *typename = object_class_get_name(oc);
-char *name;
+char *model;
 int i;
 
 if (unlikely(strcmp(typename, TYPE_HOST_POWERPC_CPU) == 0)) {
 return;
 }
 
-name = g_strndup(typename,
- strlen(typename) - strlen(POWERPC_CPU_TYPE_SUFFIX));
-qemu_printf("PowerPC %-16s PVR %08x\n", name, pcc->pvr);
+model = cpu_model_from_type(typename);
+qemu_printf("PowerPC %-16s PVR %08x\n", model, pcc->pvr);
 for (i = 0; ppc_cpu_aliases[i].alias != NULL; i++) {
 PowerPCCPUAlias *alias = &ppc_cpu_aliases[i];
 ObjectClass *alias_oc = ppc_cpu_class_by_name(alias->model);
@@ -7045,10 +7044,10 @@ static void ppc_cpu_list_entry(gpointer data, gpointer user_data)
 alias->alias, family->desc);
 } else {
 qemu_printf("PowerPC %-16s (alias for %s)\n",
-alias->alias, name);
+alias->alias, model);
 }
 }
-g_free(name);
+g_free(model);
 }
 
 void ppc_cpu_list(void)
-- 
2.41.0




[PATCH v3 11/32] target/openrisc: Use generic helper to show CPU model names

2023-09-06 Thread Gavin Shan
For target/openrisc, the CPU type name is always the combination of
the CPU model name and suffix. The CPU model names have been correctly
shown in openrisc_cpu_list_entry().

Use generic helper cpu_model_from_type() to show the CPU model names
in openrisc_cpu_list_entry(), and @name is renamed to @model since it
points to the CPU model name instead of the CPU type name. Besides,
openrisc_cpu_class_by_name() is simplified since the condition of
'@oc == NULL' has been covered by object_class_dynamic_cast().

Signed-off-by: Gavin Shan 
---
 target/openrisc/cpu.c | 23 ++-
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/target/openrisc/cpu.c b/target/openrisc/cpu.c
index 61d748cfdc..2284c0187b 100644
--- a/target/openrisc/cpu.c
+++ b/target/openrisc/cpu.c
@@ -168,11 +168,12 @@ static ObjectClass *openrisc_cpu_class_by_name(const char *cpu_model)
 typename = g_strdup_printf(OPENRISC_CPU_TYPE_NAME("%s"), cpu_model);
 oc = object_class_by_name(typename);
 g_free(typename);
-if (oc != NULL && (!object_class_dynamic_cast(oc, TYPE_OPENRISC_CPU) ||
-   object_class_is_abstract(oc))) {
-return NULL;
+if (object_class_dynamic_cast(oc, TYPE_OPENRISC_CPU) &&
+!object_class_is_abstract(oc)) {
+return oc;
 }
-return oc;
+
+return NULL;
 }
 
 static void or1200_initfn(Object *obj)
@@ -280,15 +281,11 @@ static gint openrisc_cpu_list_compare(gconstpointer a, gconstpointer b)
 
 static void openrisc_cpu_list_entry(gpointer data, gpointer user_data)
 {
-ObjectClass *oc = data;
-const char *typename;
-char *name;
-
-typename = object_class_get_name(oc);
-name = g_strndup(typename,
- strlen(typename) - strlen("-" TYPE_OPENRISC_CPU));
-qemu_printf("  %s\n", name);
-g_free(name);
+const char *typename = object_class_get_name(OBJECT_CLASS(data));
+char *model = cpu_model_from_type(typename);
+
+qemu_printf("  %s\n", model);
+g_free(model);
 }
 
 void cpu_openrisc_list(void)
-- 
2.41.0




[PATCH v3 10/32] target/mips: Use generic helper to show CPU model names

2023-09-06 Thread Gavin Shan
For target/mips, the CPU type name is always the combination of the
CPU model name and suffix. The CPU model names have been shown
correctly in mips_cpu_list(), which fetches the CPU model names from
the pre-defined array. It's different from other targets and lacks
flexibility.

Implement mips_cpu_list() by fetching the CPU model names from the
available CPU classes. Besides, the retrieved class needs to be
validated before it's returned in mips_cpu_class_by_name(), as other
targets do.

Signed-off-by: Gavin Shan 
---
 target/mips/cpu-defs.c.inc |  9 -
 target/mips/cpu.c  | 25 -
 target/mips/sysemu/mips-qmp-cmds.c |  3 +--
 3 files changed, 25 insertions(+), 12 deletions(-)

diff --git a/target/mips/cpu-defs.c.inc b/target/mips/cpu-defs.c.inc
index c0c389c59a..fbf787d8ce 100644
--- a/target/mips/cpu-defs.c.inc
+++ b/target/mips/cpu-defs.c.inc
@@ -1018,15 +1018,6 @@ const mips_def_t mips_defs[] =
 };
 const int mips_defs_number = ARRAY_SIZE(mips_defs);
 
-void mips_cpu_list(void)
-{
-int i;
-
-for (i = 0; i < ARRAY_SIZE(mips_defs); i++) {
-qemu_printf("MIPS '%s'\n", mips_defs[i].name);
-}
-}
-
 static void fpu_init (CPUMIPSState *env, const mips_def_t *def)
 {
 int i;
diff --git a/target/mips/cpu.c b/target/mips/cpu.c
index 63da1948fd..3431acbd99 100644
--- a/target/mips/cpu.c
+++ b/target/mips/cpu.c
@@ -532,7 +532,12 @@ static ObjectClass *mips_cpu_class_by_name(const char *cpu_model)
 typename = mips_cpu_type_name(cpu_model);
 oc = object_class_by_name(typename);
 g_free(typename);
-return oc;
+if (object_class_dynamic_cast(oc, TYPE_MIPS_CPU) &&
+!object_class_is_abstract(oc)) {
+return oc;
+}
+
+return NULL;
 }
 
 #ifndef CONFIG_USER_ONLY
@@ -566,6 +571,24 @@ static const struct TCGCPUOps mips_tcg_ops = {
 };
 #endif /* CONFIG_TCG */
 
+static void mips_cpu_list_entry(gpointer data, gpointer user_data)
+{
+const char *typename = object_class_get_name(OBJECT_CLASS(data));
+char *model = cpu_model_from_type(typename);
+
+qemu_printf("  %s\n", model);
+g_free(model);
+}
+
+void mips_cpu_list(void)
+{
+GSList *list;
+list = object_class_get_list_sorted(TYPE_MIPS_CPU, false);
+qemu_printf("Available CPUs:\n");
+g_slist_foreach(list, mips_cpu_list_entry, NULL);
+g_slist_free(list);
+}
+
 static void mips_cpu_class_init(ObjectClass *c, void *data)
 {
 MIPSCPUClass *mcc = MIPS_CPU_CLASS(c);
diff --git a/target/mips/sysemu/mips-qmp-cmds.c b/target/mips/sysemu/mips-qmp-cmds.c
index 6db4626412..7340ac70ba 100644
--- a/target/mips/sysemu/mips-qmp-cmds.c
+++ b/target/mips/sysemu/mips-qmp-cmds.c
@@ -19,8 +19,7 @@ static void mips_cpu_add_definition(gpointer data, gpointer user_data)
 
 typename = object_class_get_name(oc);
 info = g_malloc0(sizeof(*info));
-info->name = g_strndup(typename,
-   strlen(typename) - strlen("-" TYPE_MIPS_CPU));
+info->name = cpu_model_from_type(typename);
 info->q_typename = g_strdup(typename);
 
 QAPI_LIST_PREPEND(*cpu_list, info);
-- 
2.41.0




[PATCH v3 09/32] target/m68k: Use generic helper to show CPU model names

2023-09-06 Thread Gavin Shan
For target/m68k, the CPU type name is always the combination of the
CPU model name and suffix. The CPU model names have been shown
correctly in m68k_cpu_list_entry().

Use generic helper cpu_model_from_type() to show the CPU model name
in m68k_cpu_list_entry(), rename @name to @model since it's for the
CPU model name instead of the CPU type name, and adjust the output
format to match other targets.

Signed-off-by: Gavin Shan 
---
 target/m68k/helper.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/target/m68k/helper.c b/target/m68k/helper.c
index 0a1544cd68..47f2cee69a 100644
--- a/target/m68k/helper.c
+++ b/target/m68k/helper.c
@@ -49,14 +49,11 @@ static gint m68k_cpu_list_compare(gconstpointer a, gconstpointer b)
 
 static void m68k_cpu_list_entry(gpointer data, gpointer user_data)
 {
-ObjectClass *c = data;
-const char *typename;
-char *name;
-
-typename = object_class_get_name(c);
-name = g_strndup(typename, strlen(typename) - strlen("-" TYPE_M68K_CPU));
-qemu_printf("%s\n", name);
-g_free(name);
+const char *typename = object_class_get_name(OBJECT_CLASS(data));
+char *model = cpu_model_from_type(typename);
+
+qemu_printf("  %s\n", model);
+g_free(model);
 }
 
 void m68k_cpu_list(void)
@@ -65,6 +62,7 @@ void m68k_cpu_list(void)
 
 list = object_class_get_list(TYPE_M68K_CPU, false);
 list = g_slist_sort(list, m68k_cpu_list_compare);
+qemu_printf("Available CPUs:\n");
 g_slist_foreach(list, m68k_cpu_list_entry, NULL);
 g_slist_free(list);
 }
-- 
2.41.0




[PATCH v3 08/32] target/loongarch: Use generic helper to show CPU model names

2023-09-06 Thread Gavin Shan
For target/loongarch, the CPU type name can be: (1) the combination of
the CPU model name and suffix; (2) the same as the CPU model name. The CPU
model names have been shown correctly in loongarch_cpu_list_entry()
and loongarch_cpu_add_definition() by following (1).

Use generic helper cpu_model_from_type() in the above two functions
to show the CPU model names. The format of the output from cpu_list()
is also adjusted to match other targets.

Signed-off-by: Gavin Shan 
---
 target/loongarch/cpu.c| 5 -
 target/loongarch/loongarch-qmp-cmds.c | 3 +--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 65f9320e34..3ab8e4f792 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -473,14 +473,17 @@ static void loongarch_la132_initfn(Object *obj)
 static void loongarch_cpu_list_entry(gpointer data, gpointer user_data)
 {
 const char *typename = object_class_get_name(OBJECT_CLASS(data));
+char *model = cpu_model_from_type(typename);
 
-qemu_printf("%s\n", typename);
+qemu_printf("  %s\n", model);
+g_free(model);
 }
 
 void loongarch_cpu_list(void)
 {
 GSList *list;
 list = object_class_get_list_sorted(TYPE_LOONGARCH_CPU, false);
+qemu_printf("Available CPUs:\n");
 g_slist_foreach(list, loongarch_cpu_list_entry, NULL);
 g_slist_free(list);
 }
diff --git a/target/loongarch/loongarch-qmp-cmds.c b/target/loongarch/loongarch-qmp-cmds.c
index 6c25957881..815ceaf0ea 100644
--- a/target/loongarch/loongarch-qmp-cmds.c
+++ b/target/loongarch/loongarch-qmp-cmds.c
@@ -17,8 +17,7 @@ static void loongarch_cpu_add_definition(gpointer data, gpointer user_data)
 CpuDefinitionInfo *info = g_new0(CpuDefinitionInfo, 1);
 const char *typename = object_class_get_name(oc);
 
-info->name = g_strndup(typename,
-   strlen(typename) - strlen("-" TYPE_LOONGARCH_CPU));
+info->name = cpu_model_from_type(typename);
 info->q_typename = g_strdup(typename);
 
 QAPI_LIST_PREPEND(*cpu_list, info);
-- 
2.41.0




[PATCH v3 07/32] target/i386: Use generic helper to show CPU model names

2023-09-06 Thread Gavin Shan
For target/i386, the CPU type name is always the combination of the
CPU model name and suffix. The CPU model names have been shown
correctly in x86_cpu_list_entry().

Use generic helper cpu_model_from_type() to get the CPU model name
from the CPU type name in x86_cpu_class_get_model_name(), and rename
@name to @model in x86_cpu_list_entry() since it points to the CPU
model name instead of the CPU type name.

Signed-off-by: Gavin Shan 
---
 target/i386/cpu.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 00f913b638..31f1d0379e 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1741,8 +1741,7 @@ static char *x86_cpu_class_get_model_name(X86CPUClass *cc)
 {
 const char *class_name = object_class_get_name(OBJECT_CLASS(cc));
 assert(g_str_has_suffix(class_name, X86_CPU_TYPE_SUFFIX));
-return g_strndup(class_name,
- strlen(class_name) - strlen(X86_CPU_TYPE_SUFFIX));
+return cpu_model_from_type(class_name);
 }
 
 typedef struct X86CPUVersionDefinition {
@@ -5544,7 +5543,7 @@ static void x86_cpu_list_entry(gpointer data, gpointer user_data)
 {
 ObjectClass *oc = data;
 X86CPUClass *cc = X86_CPU_CLASS(oc);
-g_autofree char *name = x86_cpu_class_get_model_name(cc);
+g_autofree char *model = x86_cpu_class_get_model_name(cc);
 g_autofree char *desc = g_strdup(cc->model_description);
 g_autofree char *alias_of = x86_cpu_class_get_alias_of(cc);
 g_autofree char *model_id = x86_cpu_class_get_model_id(cc);
@@ -5568,7 +5567,7 @@ static void x86_cpu_list_entry(gpointer data, gpointer user_data)
 desc = g_strdup_printf("%s (deprecated)", olddesc);
 }
 
-qemu_printf("x86 %-20s  %s\n", name, desc);
+qemu_printf("x86 %-20s  %s\n", model, desc);
 }
 
 /* list available CPU models and flags */
-- 
2.41.0




[PATCH v3 06/32] target/hexagon: Use generic helper to show CPU model names

2023-09-06 Thread Gavin Shan
For target/hexagon, the CPU type name is always the combination of
the CPU model name and suffix. The CPU model names have been shown
correctly in hexagon_cpu_list_entry().

Use generic helper cpu_model_from_type() to show the CPU model names
in hexagon_cpu_list_entry(), and rename @name to @model since it points
to the CPU model name instead of the CPU type name.

Signed-off-by: Gavin Shan 
---
 target/hexagon/cpu.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/target/hexagon/cpu.c b/target/hexagon/cpu.c
index f155936289..3d0174e6f1 100644
--- a/target/hexagon/cpu.c
+++ b/target/hexagon/cpu.c
@@ -34,13 +34,11 @@ static void hexagon_v73_cpu_init(Object *obj) { }
 
 static void hexagon_cpu_list_entry(gpointer data, gpointer user_data)
 {
-ObjectClass *oc = data;
-char *name = g_strdup(object_class_get_name(oc));
-if (g_str_has_suffix(name, HEXAGON_CPU_TYPE_SUFFIX)) {
-name[strlen(name) - strlen(HEXAGON_CPU_TYPE_SUFFIX)] = '\0';
-}
-qemu_printf("  %s\n", name);
-g_free(name);
+const char *typename = object_class_get_name(OBJECT_CLASS(data));
+char *model = cpu_model_from_type(typename);
+
+qemu_printf("  %s\n", model);
+g_free(model);
 }
 
 void hexagon_cpu_list(void)
-- 
2.41.0




[PATCH v3 02/32] target/alpha: Use generic helper to show CPU model names

2023-09-06 Thread Gavin Shan
For target/alpha, the CPU type name can be: (1) the combination of
the CPU model name and suffix; (2) the same as the CPU model name;
(3) a pre-defined alias. It's correct to show the CPU model name for
(1) or the CPU type name for (2) in cpu_list() because both of them
can be resolved by alpha_cpu_class_by_name() successfully.

Let's follow (1) to show the CPU model names, with the suffix
stripped. With this, the output is compatible with most cases.

Signed-off-by: Gavin Shan 
---
 target/alpha/cpu.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/target/alpha/cpu.c b/target/alpha/cpu.c
index 270ae787b1..9e15c3245b 100644
--- a/target/alpha/cpu.c
+++ b/target/alpha/cpu.c
@@ -89,9 +89,11 @@ static void alpha_cpu_realizefn(DeviceState *dev, Error **errp)
 
 static void alpha_cpu_list_entry(gpointer data, gpointer user_data)
 {
-ObjectClass *oc = data;
+const char *typename = object_class_get_name(OBJECT_CLASS(data));
+char *model = cpu_model_from_type(typename);
 
-qemu_printf("  %s\n", object_class_get_name(oc));
+qemu_printf("  %s\n", model);
+g_free(model);
 }
 
 void alpha_cpu_list(void)
-- 
2.41.0




[PATCH v3 03/32] target/arm: Use generic helper to show CPU model names

2023-09-06 Thread Gavin Shan
For target/arm, the CPU type name can be: (1) the combination of
the CPU model name and suffix; (2) alias "any" corresponding to
"max-arm-cpu" when CONFIG_USER_ONLY is enabled. The CPU model names
have been already shown in cpu_list() and query_cpu_definitions()
by following (1).

Use generic helper cpu_model_from_type() to show the CPU model names.
The variable @name is renamed to @model in arm_cpu_list_entry() since
it points to the CPU model name instead of the CPU type name.

Signed-off-by: Gavin Shan 
---
 target/arm/arm-qmp-cmds.c |  6 ++
 target/arm/helper.c   | 12 +---
 2 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/target/arm/arm-qmp-cmds.c b/target/arm/arm-qmp-cmds.c
index c8fa524002..51fddaefc3 100644
--- a/target/arm/arm-qmp-cmds.c
+++ b/target/arm/arm-qmp-cmds.c
@@ -233,12 +233,10 @@ static void arm_cpu_add_definition(gpointer data, gpointer user_data)
 ObjectClass *oc = data;
 CpuDefinitionInfoList **cpu_list = user_data;
 CpuDefinitionInfo *info;
-const char *typename;
+const char *typename = object_class_get_name(oc);
 
-typename = object_class_get_name(oc);
 info = g_malloc0(sizeof(*info));
-info->name = g_strndup(typename,
-   strlen(typename) - strlen("-" TYPE_ARM_CPU));
+info->name = cpu_model_from_type(typename);
 info->q_typename = g_strdup(typename);
 
 QAPI_LIST_PREPEND(*cpu_list, info);
diff --git a/target/arm/helper.c b/target/arm/helper.c
index e3f5a7d2bd..7b8257b496 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -9409,17 +9409,15 @@ static void arm_cpu_list_entry(gpointer data, gpointer user_data)
 {
 ObjectClass *oc = data;
 CPUClass *cc = CPU_CLASS(oc);
-const char *typename;
-char *name;
+const char *typename = object_class_get_name(oc);
+char *model = cpu_model_from_type(typename);
 
-typename = object_class_get_name(oc);
-name = g_strndup(typename, strlen(typename) - strlen("-" TYPE_ARM_CPU));
 if (cc->deprecation_note) {
-qemu_printf("  %s (deprecated)\n", name);
+qemu_printf("  %s (deprecated)\n", model);
 } else {
-qemu_printf("  %s\n", name);
+qemu_printf("  %s\n", model);
 }
-g_free(name);
+g_free(model);
 }
 
 void arm_cpu_list(void)
-- 
2.41.0




[PATCH v3 04/32] target/avr: Use generic helper to show CPU model names

2023-09-06 Thread Gavin Shan
For target/avr, the CPU model name is resolved as being the same as
the CPU type name in avr_cpu_class_by_name(). Actually, the CPU type
name is the combination of the CPU model name and suffix.

Support the resolution from the combination of CPU model name and
suffix to the CPU type name in avr_cpu_class_by_name(), and use
the generic helper cpu_model_from_type() to show CPU model names
in cpu_list(), with adjusted format to match with other targets.

Signed-off-by: Gavin Shan 
---
 target/avr/cpu.c | 23 ++-
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/target/avr/cpu.c b/target/avr/cpu.c
index 8f741f258c..cef9f84e32 100644
--- a/target/avr/cpu.c
+++ b/target/avr/cpu.c
@@ -157,13 +157,23 @@ static void avr_cpu_initfn(Object *obj)
 static ObjectClass *avr_cpu_class_by_name(const char *cpu_model)
 {
 ObjectClass *oc;
+char *typename;
 
 oc = object_class_by_name(cpu_model);
-if (object_class_dynamic_cast(oc, TYPE_AVR_CPU) == NULL ||
-object_class_is_abstract(oc)) {
-oc = NULL;
+if (object_class_dynamic_cast(oc, TYPE_AVR_CPU) &&
+!object_class_is_abstract(oc)) {
+return oc;
 }
-return oc;
+
+typename = g_strdup_printf(AVR_CPU_TYPE_NAME("%s"), cpu_model);
+oc = object_class_by_name(typename);
+g_free(typename);
+if (object_class_dynamic_cast(oc, TYPE_AVR_CPU) &&
+!object_class_is_abstract(oc)) {
+return oc;
+}
+
+return NULL;
 }
 
 static void avr_cpu_dump_state(CPUState *cs, FILE *f, int flags)
@@ -366,14 +376,17 @@ typedef struct AVRCPUInfo {
 static void avr_cpu_list_entry(gpointer data, gpointer user_data)
 {
 const char *typename = object_class_get_name(OBJECT_CLASS(data));
+char *model = cpu_model_from_type(typename);
 
-qemu_printf("%s\n", typename);
+qemu_printf("  %s\n", model);
+g_free(model);
 }
 
 void avr_cpu_list(void)
 {
 GSList *list;
 list = object_class_get_list_sorted(TYPE_AVR_CPU, false);
+qemu_printf("Available CPUs:\n");
 g_slist_foreach(list, avr_cpu_list_entry, NULL);
 g_slist_free(list);
 }
-- 
2.41.0




[PATCH v3 05/32] target/cris: Use generic helper to show CPU model names

2023-09-06 Thread Gavin Shan
For target/cris, the CPU type name can be: (1) the combination of
the CPU model name and suffix; (2) alias "any" corresponding to
"crisv32-cris-cpu". The CPU model names have been shown correctly
by following (1).

Use generic helper cpu_model_from_type() to show the CPU model
names in cris_cpu_list_entry(), and rename @name to @model since
it points to the CPU model name instead of the CPU type name.

Signed-off-by: Gavin Shan 
---
 target/cris/cpu.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/target/cris/cpu.c b/target/cris/cpu.c
index a6a93c2359..420a2f75fb 100644
--- a/target/cris/cpu.c
+++ b/target/cris/cpu.c
@@ -122,11 +122,10 @@ static void cris_cpu_list_entry(gpointer data, gpointer user_data)
 {
 ObjectClass *oc = data;
 const char *typename = object_class_get_name(oc);
-char *name;
+char *model = cpu_model_from_type(typename);
 
-name = g_strndup(typename, strlen(typename) - strlen(CRIS_CPU_TYPE_SUFFIX));
-qemu_printf("  %s\n", name);
-g_free(name);
+qemu_printf("  %s\n", model);
+g_free(model);
 }
 
 void cris_cpu_list(void)
-- 
2.41.0




[PATCH v3 00/32] Unified CPU type check

2023-09-06 Thread Gavin Shan
There are two places where the user specified CPU type is checked to see
if it's supported or allowed by the board: machine_run_board_init() and
mc->init(). We don't have to maintain two duplicate sets of logic. This
series intends to move the check to machine_run_board_init().

PATCH[01]Adds a generic helper cpu_model_from_type() to convert CPU
 type name to CPU model name.
PATCH[02-19] Uses cpu_model_from_type() in the individual targets
PATCH[20-23] Implements cpu_list() for the missed targets
PATCH[24-27] Improves the CPU type validation in machine_run_board_init()
PATCH[28-32] Validate the CPU type in machine_run_board_init() for the
 individual boards

v1: https://lists.nongnu.org/archive/html/qemu-arm/2023-07/msg00302.html
v2: https://lists.nongnu.org/archive/html/qemu-arm/2023-07/msg00528.html

Testing
===

With the following command lines, the output messages are varied before
and after the series is applied.

  ./build/qemu-system-aarch64\
  -accel tcg -machine virt,gic-version=3 \
  -cpu cortex-a8 -smp maxcpus=2,cpus=1   \
:

Before the series is applied:

  qemu-system-aarch64: mach-virt: CPU type cortex-a8-arm-cpu not supported

After the series is applied:

  qemu-system-aarch64: Invalid CPU type: cortex-a8-arm-cpu
  The valid models are: cortex-a7, cortex-a15, cortex-a35, cortex-a55,
cortex-a72, cortex-a76, a64fx, neoverse-n1,
neoverse-v1, cortex-a53, cortex-a57, max

Changelog
=
v3:
  * Generic helper cpu_model_from_type()(Igor)
  * Apply cpu_model_from_type() to the individual targets   (Igor)
  * Implement cpu_list() for the missed targets (Gavin)
  * Remove mc->valid_cpu_models (Richard)
  * Separate patch to constify mc->valid_cpu_types   (Gavin)
v2:
  * Constify mc->valid_cpu_types(Richard)
  * Print the supported CPU models, instead of typenames(Peter)
  * Misc improvements for the helper to do the check(Igor)
  * More patches to move the check  (Marcin)

Gavin Shan (32):
  cpu: Add helper cpu_model_from_type()
  target/alpha: Use generic helper to show CPU model names
  target/arm: Use generic helper to show CPU model names
  target/avr: Use generic helper to show CPU model names
  target/cris: Use generic helper to show CPU model names
  target/hexagon: Use generic helper to show CPU model names
  target/i386: Use generic helper to show CPU model names
  target/loongarch: Use generic helper to show CPU model names
  target/m68k: Use generic helper to show CPU model names
  target/mips: Use generic helper to show CPU model names
  target/openrisc: Use generic helper to show CPU model names
  target/ppc: Use generic helper to show CPU model names
  target/riscv: Use generic helper to show CPU model names
  target/rx: Use generic helper to show CPU model names
  target/s390x: Use generic helper to show CPU model names
  target/sh4: Use generic helper to show CPU model names
  target/tricore: Use generic helper to show CPU model names
  target/sparc: Improve sparc_cpu_class_by_name()
  target/xtensa: Improve xtensa_cpu_class_by_name()
  target/hppa: Implement hppa_cpu_list()
  target/microblaze: Implement microblaze_cpu_list()
  target/nios2: Implement nios2_cpu_list()
  Mark cpu_list() supported on all targets
  machine: Constify MachineClass::valid_cpu_types[i]
  machine: Use error handling when CPU type is checked
  machine: Introduce helper is_cpu_type_supported()
  machine: Print CPU model name instead of CPU type name
  hw/arm/virt: Check CPU type in machine_run_board_init()
  hw/arm/virt: Hide host CPU model for tcg
  hw/arm/sbsa-ref: Check CPU type in machine_run_board_init()
  hw/arm: Check CPU type in machine_run_board_init()
  hw/riscv/shakti_c: Check CPU type in machine_run_board_init()

 bsd-user/main.c   |  3 -
 cpu.c | 19 +-
 hw/arm/bananapi_m2u.c | 12 ++--
 hw/arm/cubieboard.c   | 12 ++--
 hw/arm/mps2-tz.c  | 20 --
 hw/arm/mps2.c | 25 ++--
 hw/arm/msf2-som.c | 12 ++--
 hw/arm/musca.c| 13 ++--
 hw/arm/npcm7xx_boards.c   | 13 ++--
 hw/arm/orangepi.c | 12 ++--
 hw/arm/sbsa-ref.c | 21 +--
 hw/arm/virt.c | 23 ++-
 hw/core/machine.c | 90 ---
 hw/m68k/q800.c|  2 +-
 hw/riscv/shakti_c.c   | 11 ++--
 include/hw/boards.h   |  2 +-
 include/hw/core/cpu.h | 12 
 target/alpha/cpu.c|  6 +-
 target/arm/arm-qmp-cmds.c |  6 +-
 target/arm/helper.c   | 12 ++--
 target/avr/cpu.c  

[PATCH v3 01/32] cpu: Add helper cpu_model_from_type()

2023-09-06 Thread Gavin Shan
Add helper cpu_model_from_type() to extract the CPU model name from
the CPU type name in two circumstances: (1) The CPU type name is the
combination of the CPU model name and suffix. (2) The CPU type name
is the same as the CPU model name.

The helper will be used in the subsequent patches to convert the
CPU type name to the CPU model name.

Suggested-by: Igor Mammedov 
Signed-off-by: Gavin Shan 
---
 cpu.c | 16 
 include/hw/core/cpu.h | 12 
 2 files changed, 28 insertions(+)

diff --git a/cpu.c b/cpu.c
index 1c948d1161..a19e33ff96 100644
--- a/cpu.c
+++ b/cpu.c
@@ -284,6 +284,22 @@ const char *parse_cpu_option(const char *cpu_option)
 return cpu_type;
 }
 
+char *cpu_model_from_type(const char *typename)
+{
+const char *suffix = "-" CPU_RESOLVING_TYPE;
+
+if (!object_class_by_name(typename)) {
+return NULL;
+}
+
+if (strlen(typename) > strlen(suffix) &&
+!strcmp(typename + strlen(typename) - strlen(suffix), suffix)) {
+return g_strndup(typename, strlen(typename) - strlen(suffix));
+}
+
+return g_strdup(typename);
+}
+
 void list_cpus(void)
 {
 /* XXX: implement xxx_cpu_list for targets that still miss it */
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 92a4234439..6e76d95490 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -657,6 +657,18 @@ CPUState *cpu_create(const char *typename);
  */
 const char *parse_cpu_option(const char *cpu_option);
 
+/**
+ * cpu_model_from_type:
+ * @typename: The CPU type name
+ *
+ * Extract the CPU model name from the CPU type name. The
+ * CPU type name is either the combination of the CPU model
+ * name and suffix, or the same as the CPU model name.
+ *
+ * Returns: CPU model name
+ */
+char *cpu_model_from_type(const char *typename);
+
 /**
  * cpu_has_work:
  * @cpu: The vCPU to check.
-- 
2.41.0




[PATCH v3 5/6] cxl/mailbox, type3: Implement MHD get info command callback

2023-09-06 Thread Gregory Price
For multi-headed type 3 devices, this command reports logical device
mappings for each head.  Implement a callback which can be
initialized by MHD devices to field these commands.

Reports "unsupported" if the command is called but the callback is
not implemented.

Signed-off-by: Gregory Price 
---
 hw/cxl/cxl-mailbox-utils.c  | 21 +
 hw/mem/cxl_type3.c  |  1 +
 include/hw/cxl/cxl_device.h |  6 ++
 3 files changed, 28 insertions(+)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index b64bbdf45d..3177a59de3 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -91,6 +91,8 @@ enum {
 #define GET_PHYSICAL_PORT_STATE 0x1
 TUNNEL = 0x53,
 #define MANAGEMENT_COMMAND 0x0
+MHD = 0x55,
+#define GET_MHD_INFO 0x0
 };
 
 /* CCI Message Format CXL r3.0 Figure 7-19 */
@@ -184,6 +186,23 @@ static CXLRetCode cmd_tunnel_management_cmd(const struct cxl_cmd *cmd,
 return CXL_MBOX_INVALID_INPUT;
 }
 
+/*
+ * CXL r3.0 section 7.6.7.5.1 - Get Multi-Headed Info (Opcode 5500h)
+ */
+static CXLRetCode cmd_mhd_get_info(const struct cxl_cmd *cmd,
+   uint8_t *payload_in, size_t len_in,
+   uint8_t *payload_out, size_t *len_out,
+   CXLCCI *cci)
+{
+CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
+CXLType3Class *cvc = CXL_TYPE3_GET_CLASS(ct3d);
+if (cvc->mhd_get_info) {
+return cvc->mhd_get_info(cmd, payload_in, len_in, payload_out,
+ len_out, cci);
+}
+return CXL_MBOX_UNSUPPORTED;
+}
+
 static CXLRetCode cmd_events_get_records(const struct cxl_cmd *cmd,
  uint8_t *payload_in, size_t len_in,
  uint8_t *payload_out, size_t *len_out,
@@ -1598,6 +1617,8 @@ static const struct cxl_cmd cxl_cmd_set[256][256] = {
 cmd_media_inject_poison, 8, 0 },
 [MEDIA_AND_POISON][CLEAR_POISON] = { "MEDIA_AND_POISON_CLEAR_POISON",
 cmd_media_clear_poison, 72, 0 },
+[MHD][GET_MHD_INFO] = {"GET_MULTI_HEADED_INFO",
+cmd_mhd_get_info, 2, 0},
 };
 
 static const struct cxl_cmd cxl_cmd_set_dcd[256][256] = {
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 1fb3ffeca8..307d7c1fd8 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -2120,6 +2120,7 @@ static void ct3_class_init(ObjectClass *oc, void *data)
 cvc->get_lsa = get_lsa;
 cvc->set_lsa = set_lsa;
 cvc->set_cacheline = set_cacheline;
+cvc->mhd_get_info = NULL;
 cvc->mhd_access_valid = NULL;
 }
 
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index 37893f8626..4a5d4bd98b 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -508,6 +508,12 @@ struct CXLType3Class {
 bool (*set_cacheline)(CXLType3Dev *ct3d, uint64_t dpa_offset, uint8_t *data);
 
 /* Multi-headed Device */
+CXLRetCode (*mhd_get_info)(const struct cxl_cmd *cmd,
+   uint8_t *payload_in,
+   size_t len_in,
+   uint8_t *payload_out,
+   size_t *len_out,
+   CXLCCI *cci);
 bool (*mhd_access_valid)(PCIDevice *d, uint64_t addr, unsigned int size);
 };
 
-- 
2.39.1




Re: [PATCH RESEND 01/15] ppc: spapr: Introduce Nested PAPR API related macros

2023-09-06 Thread Nicholas Piggin
On Wed Sep 6, 2023 at 2:33 PM AEST, Harsh Prateek Bora wrote:
> Adding new macros for the new hypercall op-codes, their return codes,
> Guest State Buffer (GSB) element IDs and few registers which shall be
> used in following patches to support Nested PAPR API.
>
> Signed-off-by: Michael Neuling 
> Signed-off-by: Shivaprasad G Bhat 
> Signed-off-by: Harsh Prateek Bora 
> ---
>  include/hw/ppc/spapr.h|  23 -
>  include/hw/ppc/spapr_nested.h | 186 ++
>  target/ppc/cpu.h  |   2 +
>  3 files changed, 209 insertions(+), 2 deletions(-)
>
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 538b2dfb89..3990fed1d9 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -367,6 +367,16 @@ struct SpaprMachineState {
>  #define H_NOOP -63
>  #define H_UNSUPPORTED -67
>  #define H_OVERLAP -68
> +#define H_STATE   -75

[snip]

I didn't go through to make sure all the numbers are correct, but
generally looks okay. Are these just copied from KVM sources (or
vice versa)?

> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index 25fac9577a..6f7f9b9d58 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -1587,9 +1587,11 @@ void ppc_compat_add_property(Object *obj, const char 
> *name,
>  #define SPR_PSPB  (0x09F)
>  #define SPR_DPDES (0x0B0)
>  #define SPR_DAWR0 (0x0B4)
> +#define SPR_DAWR1 (0x0B5)
>  #define SPR_RPR   (0x0BA)
>  #define SPR_CIABR (0x0BB)
>  #define SPR_DAWRX0(0x0BC)
> +#define SPR_DAWRX1(0x0BD)
>  #define SPR_HFSCR (0x0BE)
>  #define SPR_VRSAVE(0x100)
>  #define SPR_USPRG0(0x100)

Stray change? Should be in 2nd DAWR patch, presumably.

Thanks,
Nick



[PATCH v3 4/6] cxl/type3: add an optional mhd validation function for memory accesses

2023-09-06 Thread Gregory Price
When memory accesses are made, some MHSLDs validate that the address
is within the scope of their allocated sections.  To do this, the base
device must call an optional function set by inheriting devices.

Signed-off-by: Gregory Price 
---
 hw/mem/cxl_type3.c  | 15 +++
 include/hw/cxl/cxl_device.h |  3 +++
 2 files changed, 18 insertions(+)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 6e3309dc11..1fb3ffeca8 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -1034,6 +1034,7 @@ void ct3_realize(PCIDevice *pci_dev, Error **errp)
 goto err_release_cdat;
 }
 }
+
 return;
 
 err_release_cdat:
@@ -1249,6 +1250,7 @@ MemTxResult cxl_type3_read(PCIDevice *d, hwaddr host_addr, uint64_t *data,
unsigned size, MemTxAttrs attrs)
 {
 CXLType3Dev *ct3d = CXL_TYPE3(d);
+CXLType3Class *cvc = CXL_TYPE3_GET_CLASS(ct3d);
 uint64_t dpa_offset = 0;
 AddressSpace *as = NULL;
 int res;
@@ -1259,6 +1261,11 @@ MemTxResult cxl_type3_read(PCIDevice *d, hwaddr host_addr, uint64_t *data,
 return MEMTX_ERROR;
 }
 
+if (cvc->mhd_access_valid &&
+!cvc->mhd_access_valid(d, dpa_offset, size)) {
+return MEMTX_ERROR;
+}
+
 if (sanitize_running(&ct3d->cci)) {
 qemu_guest_getrandom_nofail(data, size);
 return MEMTX_OK;
@@ -1270,6 +1277,7 @@ MemTxResult cxl_type3_write(PCIDevice *d, hwaddr host_addr, uint64_t data,
 unsigned size, MemTxAttrs attrs)
 {
 CXLType3Dev *ct3d = CXL_TYPE3(d);
+CXLType3Class *cvc = CXL_TYPE3_GET_CLASS(ct3d);
 uint64_t dpa_offset = 0;
 AddressSpace *as = NULL;
 int res;
@@ -1279,6 +1287,12 @@ MemTxResult cxl_type3_write(PCIDevice *d, hwaddr host_addr, uint64_t data,
 if (res) {
 return MEMTX_ERROR;
 }
+
+if (cvc->mhd_access_valid &&
+!cvc->mhd_access_valid(d, dpa_offset, size)) {
+return MEMTX_ERROR;
+}
+
 if (sanitize_running(&ct3d->cci)) {
 return MEMTX_OK;
 }
@@ -2106,6 +2120,7 @@ static void ct3_class_init(ObjectClass *oc, void *data)
 cvc->get_lsa = get_lsa;
 cvc->set_lsa = set_lsa;
 cvc->set_cacheline = set_cacheline;
+cvc->mhd_access_valid = NULL;
 }
 
 static const TypeInfo ct3d_info = {
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index 9c37a54699..37893f8626 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -506,6 +506,9 @@ struct CXLType3Class {
 void (*set_lsa)(CXLType3Dev *ct3d, const void *buf, uint64_t size,
 uint64_t offset);
 bool (*set_cacheline)(CXLType3Dev *ct3d, uint64_t dpa_offset, uint8_t *data);
+
+/* Multi-headed Device */
+bool (*mhd_access_valid)(PCIDevice *d, uint64_t addr, unsigned int size);
 };
 
 struct CSWMBCCIDev {
-- 
2.39.1




[PATCH v3 6/6] cxl/vendor: SK hynix Niagara Multi-Headed SLD Device

2023-09-06 Thread Gregory Price
Create a new device to emulate the SK hynix Niagara MHSLD platform.

This device has custom CCI commands that allow for applying isolation
to each memory block between hosts. This enables an early form of
dynamic capacity, whereby the NUMA node maps the entire region, but
the host is responsible for asking the device which memory blocks
are allocated to it, and therefore may be onlined.

To instantiate:

-device cxl-skh-niagara,cxl-type3,bus=rp0,volatile-memdev=mem0,id=cxl-mem0,sn=6,mhd-head=0,mhd-shmid=15

The linux kernel will require raw CXL commands enabled to allow for
passing through of Niagara CXL commands via the CCI mailbox.

The Niagara MH-SLD has a shared memory region that must be initialized
using the 'init_niagara' tool located in the vendor subdirectory

usage: init_niagara <heads> <sections> <section_size> <shmid>
heads : number of heads on the device
sections  : number of sections
section_size  : size of a section in 128mb increments
shmid : shmid produced by ipcmk

Example:
$shmid1=ipcmk -M 131072
./init_niagara 4 32 1 $shmid1

Signed-off-by: Gregory Price 
---
 hw/cxl/Kconfig  |   4 +
 hw/cxl/meson.build  |   2 +
 hw/cxl/vendor/meson.build   |   1 +
 hw/cxl/vendor/skhynix/.gitignore|   1 +
 hw/cxl/vendor/skhynix/init_niagara.c|  99 +
 hw/cxl/vendor/skhynix/meson.build   |   1 +
 hw/cxl/vendor/skhynix/skhynix_niagara.c | 514 
 hw/cxl/vendor/skhynix/skhynix_niagara.h | 161 
 8 files changed, 783 insertions(+)
 create mode 100644 hw/cxl/vendor/meson.build
 create mode 100644 hw/cxl/vendor/skhynix/.gitignore
 create mode 100644 hw/cxl/vendor/skhynix/init_niagara.c
 create mode 100644 hw/cxl/vendor/skhynix/meson.build
 create mode 100644 hw/cxl/vendor/skhynix/skhynix_niagara.c
 create mode 100644 hw/cxl/vendor/skhynix/skhynix_niagara.h

diff --git a/hw/cxl/Kconfig b/hw/cxl/Kconfig
index c9b2e46bac..dd6c54b54d 100644
--- a/hw/cxl/Kconfig
+++ b/hw/cxl/Kconfig
@@ -2,5 +2,9 @@ config CXL
 bool
 default y if PCI_EXPRESS
 
+config CXL_VENDOR
+bool
+default y
+
 config I2C_MCTP_CXL
 bool
diff --git a/hw/cxl/meson.build b/hw/cxl/meson.build
index 1393821fc4..e8c8c1355a 100644
--- a/hw/cxl/meson.build
+++ b/hw/cxl/meson.build
@@ -15,3 +15,5 @@ system_ss.add(when: 'CONFIG_CXL',
 system_ss.add(when: 'CONFIG_I2C_MCTP_CXL', if_true: files('i2c_mctp_cxl.c'))
 
 system_ss.add(when: 'CONFIG_ALL', if_true: files('cxl-host-stubs.c'))
+
+subdir('vendor')
diff --git a/hw/cxl/vendor/meson.build b/hw/cxl/vendor/meson.build
new file mode 100644
index 00..12db8991f1
--- /dev/null
+++ b/hw/cxl/vendor/meson.build
@@ -0,0 +1 @@
+subdir('skhynix')
diff --git a/hw/cxl/vendor/skhynix/.gitignore b/hw/cxl/vendor/skhynix/.gitignore
new file mode 100644
index 00..6d96de38ea
--- /dev/null
+++ b/hw/cxl/vendor/skhynix/.gitignore
@@ -0,0 +1 @@
+init_niagara
diff --git a/hw/cxl/vendor/skhynix/init_niagara.c b/hw/cxl/vendor/skhynix/init_niagara.c
new file mode 100644
index 00..2c189dc33c
--- /dev/null
+++ b/hw/cxl/vendor/skhynix/init_niagara.c
@@ -0,0 +1,99 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * Copyright (c) 2023 MemVerge Inc.
+ * Copyright (c) 2023 SK hynix Inc.
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct niagara_state {
+uint8_t nr_heads;
+uint8_t nr_lds;
+uint8_t ldmap[65536];
+uint32_t total_sections;
+uint32_t free_sections;
+uint32_t section_size;
+uint32_t sections[];
+};
+
+int main(int argc, char *argv[])
+{
+int shmid = 0;
+uint32_t sections = 0;
+uint32_t section_size = 0;
+uint32_t heads = 0;
+struct niagara_state *niagara_state = NULL;
+size_t state_size;
+uint8_t i;
+
+if (argc != 5) {
+printf("usage: init_niagara
\n"
+"\theads : number of heads on the device\n"
+"\tsections  : number of sections\n"
+"\tsection_size  : size of a section in 128mb increments\n"
+"\tshmid : /tmp/mytoken.tmp\n\n"
+"It is recommended your shared memory region is at least 
128kb\n");
+return -1;
+}
+
+/* must have at least 1 head */
+heads = (uint32_t)atoi(argv[1]);
+if (heads == 0 || heads > 32) {
+printf("bad heads argument (1-32)\n");
+return -1;
+}
+
+/* Get number of sections */
+sections = (uint32_t)atoi(argv[2]);
+if (sections == 0) {
+printf("bad sections argument\n");
+return -1;
+}
+
+section_size = (uint32_t)atoi(argv[3]);
+if (section_size == 0) {
+printf("bad section size argument\n");
+return -1;
+}
+
+shmid = (uint32_t)atoi(argv[4]);
+if (shmid == 0) {
+printf("bad shmid argument\n");
+return -1;
+}
+
+niagara_state = shmat(shmid, NULL, 0);
+if 

[PATCH v3 3/6] cxl/type3: Expose ct3 functions so that inheriters can call them

2023-09-06 Thread Gregory Price
For devices built on top of ct3, we need the realize, exit, and
reset functions exposed to correctly start up and tear down.

Signed-off-by: Gregory Price 
---
 hw/mem/cxl_type3.c  | 6 +++---
 include/hw/cxl/cxl_device.h | 4 
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 80d596ee10..6e3309dc11 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -950,7 +950,7 @@ static DOEProtocol doe_spdm_prot[] = {
 { }
 };
 
-static void ct3_realize(PCIDevice *pci_dev, Error **errp)
+void ct3_realize(PCIDevice *pci_dev, Error **errp)
 {
 CXLType3Dev *ct3d = CXL_TYPE3(pci_dev);
 CXLComponentState *cxl_cstate = &ct3d->cxl_cstate;
@@ -1054,7 +1054,7 @@ err_address_space_free:
 return;
 }
 
-static void ct3_exit(PCIDevice *pci_dev)
+void ct3_exit(PCIDevice *pci_dev)
 {
 CXLType3Dev *ct3d = CXL_TYPE3(pci_dev);
 CXLComponentState *cxl_cstate = &ct3d->cxl_cstate;
@@ -1285,7 +1285,7 @@ MemTxResult cxl_type3_write(PCIDevice *d, hwaddr host_addr, uint64_t data,
 return address_space_write(as, dpa_offset, attrs, &data, size);
 }
 
-static void ct3d_reset(DeviceState *dev)
+void ct3d_reset(DeviceState *dev)
 {
 CXLType3Dev *ct3d = CXL_TYPE3(dev);
 uint32_t *reg_state = ct3d->cxl_cstate.crb.cache_mem_registers;
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index e824c5ade8..9c37a54699 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -524,6 +524,10 @@ MemTxResult cxl_type3_read(PCIDevice *d, hwaddr host_addr, uint64_t *data,
 MemTxResult cxl_type3_write(PCIDevice *d, hwaddr host_addr, uint64_t data,
 unsigned size, MemTxAttrs attrs);
 
+void ct3_realize(PCIDevice *pci_dev, Error **errp);
+void ct3_exit(PCIDevice *pci_dev);
+void ct3d_reset(DeviceState *d);
+
 uint64_t cxl_device_get_timestamp(CXLDeviceState *cxlds);
 
 void cxl_event_init(CXLDeviceState *cxlds, int start_msg_num);
-- 
2.39.1




[PATCH v3 1/6] cxl/mailbox: move mailbox effect definitions to a header

2023-09-06 Thread Gregory Price
Preparation for allowing devices to define their own CCI commands

Signed-off-by: Gregory Price 
---
 hw/cxl/cxl-mailbox-utils.c   | 30 +-
 include/hw/cxl/cxl_mailbox.h | 18 ++
 2 files changed, 31 insertions(+), 17 deletions(-)
 create mode 100644 include/hw/cxl/cxl_mailbox.h

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 4e8651ebe2..b64bbdf45d 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -12,6 +12,7 @@
 #include "hw/pci/msix.h"
 #include "hw/cxl/cxl.h"
 #include "hw/cxl/cxl_events.h"
+#include "hw/cxl/cxl_mailbox.h"
 #include "hw/pci/pci.h"
 #include "hw/pci-bridge/cxl_upstream_port.h"
 #include "qemu/cutils.h"
@@ -1561,28 +1562,21 @@ static CXLRetCode cmd_dcd_release_dyn_cap(const struct cxl_cmd *cmd,
 return CXL_MBOX_SUCCESS;
 }
 
-#define IMMEDIATE_CONFIG_CHANGE (1 << 1)
-#define IMMEDIATE_DATA_CHANGE (1 << 2)
-#define IMMEDIATE_POLICY_CHANGE (1 << 3)
-#define IMMEDIATE_LOG_CHANGE (1 << 4)
-#define SECURITY_STATE_CHANGE (1 << 5)
-#define BACKGROUND_OPERATION (1 << 6)
-
 static const struct cxl_cmd cxl_cmd_set[256][256] = {
 [EVENTS][GET_RECORDS] = { "EVENTS_GET_RECORDS",
 cmd_events_get_records, 1, 0 },
 [EVENTS][CLEAR_RECORDS] = { "EVENTS_CLEAR_RECORDS",
-cmd_events_clear_records, ~0, IMMEDIATE_LOG_CHANGE },
+cmd_events_clear_records, ~0, CXL_MBOX_IMMEDIATE_LOG_CHANGE },
 [EVENTS][GET_INTERRUPT_POLICY] = { "EVENTS_GET_INTERRUPT_POLICY",
   cmd_events_get_interrupt_policy, 0, 0 },
 [EVENTS][SET_INTERRUPT_POLICY] = { "EVENTS_SET_INTERRUPT_POLICY",
   cmd_events_set_interrupt_policy,
-  ~0, IMMEDIATE_CONFIG_CHANGE },
+  ~0, CXL_MBOX_IMMEDIATE_CONFIG_CHANGE },
 [FIRMWARE_UPDATE][GET_INFO] = { "FIRMWARE_UPDATE_GET_INFO",
 cmd_firmware_update_get_info, 0, 0 },
 [TIMESTAMP][GET] = { "TIMESTAMP_GET", cmd_timestamp_get, 0, 0 },
 [TIMESTAMP][SET] = { "TIMESTAMP_SET", cmd_timestamp_set, 8,
- IMMEDIATE_POLICY_CHANGE },
+ CXL_MBOX_IMMEDIATE_POLICY_CHANGE },
 [LOGS][GET_SUPPORTED] = { "LOGS_GET_SUPPORTED", cmd_logs_get_supported, 0, 0 },
 [LOGS][GET_LOG] = { "LOGS_GET_LOG", cmd_logs_get_log, 0x18, 0 },
 [IDENTIFY][MEMORY_DEVICE] = { "IDENTIFY_MEMORY_DEVICE",
@@ -1591,9 +1585,11 @@ static const struct cxl_cmd cxl_cmd_set[256][256] = {
 cmd_ccls_get_partition_info, 0, 0 },
 [CCLS][GET_LSA] = { "CCLS_GET_LSA", cmd_ccls_get_lsa, 8, 0 },
 [CCLS][SET_LSA] = { "CCLS_SET_LSA", cmd_ccls_set_lsa,
-~0, IMMEDIATE_CONFIG_CHANGE | IMMEDIATE_DATA_CHANGE },
+~0, CXL_MBOX_IMMEDIATE_CONFIG_CHANGE | CXL_MBOX_IMMEDIATE_DATA_CHANGE },
 [SANITIZE][OVERWRITE] = { "SANITIZE_OVERWRITE", cmd_sanitize_overwrite, 0,
-IMMEDIATE_DATA_CHANGE | SECURITY_STATE_CHANGE | BACKGROUND_OPERATION },
+(CXL_MBOX_IMMEDIATE_DATA_CHANGE |
+ CXL_MBOX_SECURITY_STATE_CHANGE |
+ CXL_MBOX_BACKGROUND_OPERATION)},
 [PERSISTENT_MEM][GET_SECURITY_STATE] = { "GET_SECURITY_STATE",
 cmd_get_security_state, 0, 0 },
 [MEDIA_AND_POISON][GET_POISON_LIST] = { "MEDIA_AND_POISON_GET_POISON_LIST",
@@ -1612,10 +1608,10 @@ static const struct cxl_cmd cxl_cmd_set_dcd[256][256] = {
 8, 0 },
 [DCD_CONFIG][ADD_DYN_CAP_RSP] = {
 "ADD_DCD_DYNAMIC_CAPACITY_RESPONSE", cmd_dcd_add_dyn_cap_rsp,
-~0, IMMEDIATE_DATA_CHANGE },
+~0, CXL_MBOX_IMMEDIATE_DATA_CHANGE },
 [DCD_CONFIG][RELEASE_DYN_CAP] = {
 "RELEASE_DCD_DYNAMIC_CAPACITY", cmd_dcd_release_dyn_cap,
-~0, IMMEDIATE_DATA_CHANGE },
+~0, CXL_MBOX_IMMEDIATE_DATA_CHANGE },
 };
 
 static const struct cxl_cmd cxl_cmd_set_sw[256][256] = {
@@ -1628,7 +1624,7 @@ static const struct cxl_cmd cxl_cmd_set_sw[256][256] = {
  */
 [TIMESTAMP][GET] = { "TIMESTAMP_GET", cmd_timestamp_get, 0, 0 },
 [TIMESTAMP][SET] = { "TIMESTAMP_SET", cmd_timestamp_set, 8,
- IMMEDIATE_POLICY_CHANGE },
+ CXL_MBOX_IMMEDIATE_POLICY_CHANGE },
 [LOGS][GET_SUPPORTED] = { "LOGS_GET_SUPPORTED", cmd_logs_get_supported, 0,
   0 },
 [LOGS][GET_LOG] = { "LOGS_GET_LOG", cmd_logs_get_log, 0x18, 0 },
@@ -1670,7 +1666,7 @@ int cxl_process_cci_message(CXLCCI *cci, uint8_t set, uint8_t cmd,
 }
 
 /* Only one bg command at a time */
-if ((cxl_cmd->effect & BACKGROUND_OPERATION) &&
+if ((cxl_cmd->effect & CXL_MBOX_BACKGROUND_OPERATION) &&
 cci->bg.runtime > 0) {
 return CXL_MBOX_BUSY;
 }
@@ -1691,7 +1687,7 @@ int cxl_process_cci_message(CXLCCI *cci, uint8_t set, uint8_t cmd,
 }
 
 ret = (*h)(cxl_cmd, pl_in, len_in, pl_out, len_out, cci);
-if ((cxl_cmd->effect & BACKGROUND_OPERATION) &&
+if ((cxl_cmd->effect & 

[PATCH v3 2/6] cxl/type3: Cleanup multiple CXL_TYPE3() calls in read/write functions

2023-09-06 Thread Gregory Price
Call CXL_TYPE3 once at top of function to avoid multiple invocations.

Signed-off-by: Gregory Price 
---
 hw/mem/cxl_type3.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index fd9d134d46..80d596ee10 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -1248,17 +1248,18 @@ static int cxl_type3_hpa_to_as_and_dpa(CXLType3Dev *ct3d,
 MemTxResult cxl_type3_read(PCIDevice *d, hwaddr host_addr, uint64_t *data,
unsigned size, MemTxAttrs attrs)
 {
+CXLType3Dev *ct3d = CXL_TYPE3(d);
 uint64_t dpa_offset = 0;
 AddressSpace *as = NULL;
 int res;
 
-res = cxl_type3_hpa_to_as_and_dpa(CXL_TYPE3(d), host_addr, size,
+res = cxl_type3_hpa_to_as_and_dpa(ct3d, host_addr, size,
   &as, &dpa_offset);
 if (res) {
 return MEMTX_ERROR;
 }
 
-if (sanitize_running(&CXL_TYPE3(d)->cci)) {
+if (sanitize_running(&ct3d->cci)) {
 qemu_guest_getrandom_nofail(data, size);
 return MEMTX_OK;
 }
@@ -1268,16 +1269,17 @@ MemTxResult cxl_type3_read(PCIDevice *d, hwaddr host_addr, uint64_t *data,
 MemTxResult cxl_type3_write(PCIDevice *d, hwaddr host_addr, uint64_t data,
 unsigned size, MemTxAttrs attrs)
 {
+CXLType3Dev *ct3d = CXL_TYPE3(d);
 uint64_t dpa_offset = 0;
 AddressSpace *as = NULL;
 int res;
 
-res = cxl_type3_hpa_to_as_and_dpa(CXL_TYPE3(d), host_addr, size,
+res = cxl_type3_hpa_to_as_and_dpa(ct3d, host_addr, size,
   &as, &dpa_offset);
 if (res) {
 return MEMTX_ERROR;
 }
-if (sanitize_running(&CXL_TYPE3(d)->cci)) {
+if (sanitize_running(&ct3d->cci)) {
 return MEMTX_OK;
 }
 return address_space_write(as, dpa_offset, attrs, , size);
-- 
2.39.1




[PATCH v3 0/6] CXL: SK hynix Niagara MHSLD Device

2023-09-06 Thread Gregory Price
v3:
- 6 patch series, first 5 are pull-aheads that can be merged separately
- cci: added MHD back into main mailbox, but implemented a callback
   pattern.  type-3 devices leave the callback null by default and
   report unsupported if nothing implements it.
- cleanup and formatting

v2:
- 5 patch series, first 4 are pull-aheads that can be merged separately
- cci: rebased on 8-30 branch from jic23, dropped cci patches
- mailbox: dropped MHD commands, integrated into niagara (for now)
- mailbox: refactor CCI defines to avoid redefinition in niagara
- type3: cleanup duplicate typecasting
- type3: expose ct3 functions so inheriting devices may access them
- type3: add optional mhd validation function for memory access
- niagara: refactor to make niagara inherit type3 and override behavior
- niagara: refactor command definitions and types into header to make
   understanding the device a bit easier for users
- style and formatting

This patch set includes an emulation of the SK hynix Niagara MHSLD 
platform with custom CCI commands that allow for isolation of memory
blocks between attached hosts.

This device allows hosts to request memory blocks directly from the
device, rather than requiring the full DCD command set.  As a matter
of simplicity, this is beneficial for testing and for applications of
dynamic memory pooling on top of the 1.1 and 2.0 specifications.

Note that these CCI commands are not servicable without a proper
driver or the kernel allowing raw CXL commands to be passed through
the mailbox driver, so users should enable
`CONFIG_CXL_MEM_RAW_COMMANDS=y` on the kernel of their QEMU instance
if they wish to test it.

Signed-off-by: Gregory Price 


Gregory Price (6):
  cxl/mailbox: move mailbox effect definitions to a header
  cxl/type3: Cleanup multiple CXL_TYPE3() calls in read/write functions
  cxl/type3: Expose ct3 functions so that inheriters can call them
  cxl/type3: add an optional mhd validation function for memory accesses
  cxl/mailbox,type3: Implement MHD get info command callback
  cxl/vendor: SK hynix Niagara Multi-Headed SLD Device

 hw/cxl/Kconfig  |   4 +
 hw/cxl/cxl-mailbox-utils.c  |  51 ++-
 hw/cxl/meson.build  |   2 +
 hw/cxl/vendor/meson.build   |   1 +
 hw/cxl/vendor/skhynix/.gitignore|   1 +
 hw/cxl/vendor/skhynix/init_niagara.c|  99 +
 hw/cxl/vendor/skhynix/meson.build   |   1 +
 hw/cxl/vendor/skhynix/skhynix_niagara.c | 514 
 hw/cxl/vendor/skhynix/skhynix_niagara.h | 161 
 hw/mem/cxl_type3.c  |  32 +-
 include/hw/cxl/cxl_device.h |  13 +
 include/hw/cxl/cxl_mailbox.h|  18 +
 12 files changed, 873 insertions(+), 24 deletions(-)
 create mode 100644 hw/cxl/vendor/meson.build
 create mode 100644 hw/cxl/vendor/skhynix/.gitignore
 create mode 100644 hw/cxl/vendor/skhynix/init_niagara.c
 create mode 100644 hw/cxl/vendor/skhynix/meson.build
 create mode 100644 hw/cxl/vendor/skhynix/skhynix_niagara.c
 create mode 100644 hw/cxl/vendor/skhynix/skhynix_niagara.h
 create mode 100644 include/hw/cxl/cxl_mailbox.h

-- 
2.39.1




Re: [PATCH 1/1] migration: skip poisoned memory pages on "ram saving" phase

2023-09-06 Thread William Roche

On 9/6/23 17:16, Peter Xu wrote:


Just a note..

Probably fine for now to reuse block page size, but IIUC the right thing to
do is to fetch it from the signal info (in QEMU's sigbus_handler()) of
kernel_siginfo.si_addr_lsb.

At least for x86 I think that stores the "shift" of covered poisoned page
(one needs to track the Linux handling of VM_FAULT_HWPOISON_LARGE for a
huge page, though.. not aware of any man page for that).  It'll then work
naturally when Linux huge pages will start to support sub-huge-page-size
poisoning someday.  We can definitely leave that for later.



I totally agree with that !



--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1145,7 +1145,8 @@ static int save_zero_page_to_file(PageSearchStatus *pss, QEMUFile *file,
  uint8_t *p = block->host + offset;
  int len = 0;
  
-if (buffer_is_zero(p, TARGET_PAGE_SIZE)) {

+if ((kvm_enabled() && kvm_hwpoisoned_page(block, (void *)offset)) ||


Can we move this out of zero page handling?  Zero detection is not
guaranteed to always be the 1st thing to do when processing a guest page.
Currently it'll already skip either rdma or when compression enabled, so
it'll keep crashing there.

Perhaps at the entry of ram_save_target_page_legacy()?


Right, as expected, using migration compression with poisoned pages 
crashes even with this fix...


The difficulty I see in placing the poisoned page verification at the
entry of ram_save_target_page_legacy() is deciding how to skip the found
poisoned page(s), if any.


Should I continue to treat them as zero pages written with
save_zero_page_to_file? Or should I handle the case where compression is
in use and add new code that compresses an empty page with
save_compress_page()?


And what about an RDMA memory region impacted by a memory error?
This is an important aspect.
Does anyone know how this situation is dealt with, and how it should be
handled in QEMU?


--
Thanks,
William.



[PATCH v3 0/3] Fix MCE handling on AMD hosts

2023-09-06 Thread John Allen
In the event that a guest process attempts to access memory that has
been poisoned in response to a deferred uncorrected MCE, an AMD system
will currently generate a SIGBUS error which will result in the entire
guest being shutdown. Ideally, we only want to kill the guest process
that accessed poisoned memory in this case.

This support has been included in qemu for Intel hosts for a long time,
but there are a couple of changes needed for AMD hosts. First, we will
need to expose the SUCCOR cpuid bit to guests. Second, we need to modify
the MCE injection code to avoid Intel specific behavior when we are
running on an AMD host.

v2:
  - Add "succor" feature word.
  - Add case to kvm_arch_get_supported_cpuid for the SUCCOR feature.

v3:
  - Reorder series. Only enable SUCCOR after bugs have been fixed.
  - Introduce new patch ignoring AO errors.

John Allen (2):
  i386: Fix MCE support for AMD hosts
  i386: Add support for SUCCOR feature

William Roche (1):
  i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest

 target/i386/cpu.c | 18 +-
 target/i386/cpu.h |  4 
 target/i386/helper.c  |  4 
 target/i386/kvm/kvm.c | 34 --
 4 files changed, 49 insertions(+), 11 deletions(-)

-- 
2.39.3




[PATCH v3 1/3] i386: Fix MCE support for AMD hosts

2023-09-06 Thread John Allen
For the most part, AMD hosts can use the same MCE injection code as Intel, but
there are instances where the qemu implementation is Intel specific. First, MCE
delivery works differently on AMD and does not support broadcast. Second,
kvm_mce_inject generates MCEs that include a number of Intel specific status
bits. Modify kvm_mce_inject to properly generate MCEs on AMD platforms.

Reported-by: William Roche 
Signed-off-by: John Allen 
---
v3:
  - Update to latest qemu code that introduces using MCG_STATUS_RIPV in the
case of a BUS_MCEERR_AR on a non-AMD machine.
---
 target/i386/helper.c  |  4 
 target/i386/kvm/kvm.c | 17 +++--
 2 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/target/i386/helper.c b/target/i386/helper.c
index 89aa696c6d..9547e2b09d 100644
--- a/target/i386/helper.c
+++ b/target/i386/helper.c
@@ -91,6 +91,10 @@ int cpu_x86_support_mca_broadcast(CPUX86State *env)
 int family = 0;
 int model = 0;
 
+if (IS_AMD_CPU(env)) {
+return 0;
+}
+
 cpu_x86_version(env, &family, &model);
 if ((family == 6 && model >= 14) || family > 6) {
 return 1;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 639a242ad8..5fce74aac5 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -590,16 +590,21 @@ static void kvm_mce_inject(X86CPU *cpu, hwaddr paddr, int 
code)
 CPUState *cs = CPU(cpu);
 CPUX86State *env = &cpu->env;
 uint64_t status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN |
-  MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S;
+  MCI_STATUS_MISCV | MCI_STATUS_ADDRV;
 uint64_t mcg_status = MCG_STATUS_MCIP;
 int flags = 0;
 
-if (code == BUS_MCEERR_AR) {
-status |= MCI_STATUS_AR | 0x134;
-mcg_status |= MCG_STATUS_RIPV | MCG_STATUS_EIPV;
+if (!IS_AMD_CPU(env)) {
+status |= MCI_STATUS_S;
+if (code == BUS_MCEERR_AR) {
+status |= MCI_STATUS_AR | 0x134;
+mcg_status |= MCG_STATUS_RIPV | MCG_STATUS_EIPV;
+} else {
+status |= 0xc0;
+mcg_status |= MCG_STATUS_RIPV;
+}
 } else {
-status |= 0xc0;
-mcg_status |= MCG_STATUS_RIPV;
+mcg_status |= MCG_STATUS_EIPV | MCG_STATUS_RIPV;
 }
 
 flags = cpu_x86_support_mca_broadcast(env) ? MCE_INJECT_BROADCAST : 0;
-- 
2.39.3




[PATCH v3 2/3] i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest

2023-09-06 Thread John Allen
From: William Roche 

AMD guests can't currently deal with BUS_MCEERR_AO MCE injection
as it panics the VM kernel. We filter this event and provide a
warning message.

Signed-off-by: William Roche 
---
v3:
  - New patch
---
 target/i386/kvm/kvm.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 5fce74aac5..4d42d3ed4c 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -604,6 +604,10 @@ static void kvm_mce_inject(X86CPU *cpu, hwaddr paddr, int 
code)
 mcg_status |= MCG_STATUS_RIPV;
 }
 } else {
+if (code == BUS_MCEERR_AO) {
+/* XXX we don't support BUS_MCEERR_AO injection on AMD yet */
+return;
+}
 mcg_status |= MCG_STATUS_EIPV | MCG_STATUS_RIPV;
 }
 
@@ -655,7 +659,9 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void 
*addr)
 if (ram_addr != RAM_ADDR_INVALID &&
 kvm_physical_memory_addr_from_host(c->kvm_state, addr, )) {
 kvm_hwpoison_page_add(ram_addr);
-kvm_mce_inject(cpu, paddr, code);
+if (!IS_AMD_CPU(env) || code != BUS_MCEERR_AO) {
+kvm_mce_inject(cpu, paddr, code);
+}
 
 /*
  * Use different logging severity based on error type.
@@ -668,8 +674,9 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void 
*addr)
 addr, paddr, "BUS_MCEERR_AR");
 } else {
  warn_report("Guest MCE Memory Error at QEMU addr %p and "
- "GUEST addr 0x%" HWADDR_PRIx " of type %s injected",
- addr, paddr, "BUS_MCEERR_AO");
+ "GUEST addr 0x%" HWADDR_PRIx " of type %s %s",
+ addr, paddr, "BUS_MCEERR_AO",
+ IS_AMD_CPU(env) ? "ignored on AMD guest" : "injected");
 }
 
 return;
-- 
2.39.3




[PATCH v3 3/3] i386: Add support for SUCCOR feature

2023-09-06 Thread John Allen
Add cpuid bit definition for the SUCCOR feature. This cpuid bit is required to
be exposed to guests to allow them to handle machine check exceptions on AMD
hosts.

Reported-by: William Roche 
Reviewed-by: Joao Martins 
Signed-off-by: John Allen 

v2:
  - Add "succor" feature word.
  - Add case to kvm_arch_get_supported_cpuid for the SUCCOR feature.
---
 target/i386/cpu.c | 18 +-
 target/i386/cpu.h |  4 
 target/i386/kvm/kvm.c |  2 ++
 3 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 00f913b638..d90d3a9489 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1029,6 +1029,22 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 .tcg_features = TCG_APM_FEATURES,
 .unmigratable_flags = CPUID_APM_INVTSC,
 },
+[FEAT_8000_0007_EBX] = {
+.type = CPUID_FEATURE_WORD,
+.feat_names = {
+NULL, "succor", NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+},
+.cpuid = { .eax = 0x80000007, .reg = R_EBX, },
+.tcg_features = 0,
+.unmigratable_flags = 0,
+},
 [FEAT_8000_0008_EBX] = {
 .type = CPUID_FEATURE_WORD,
 .feat_names = {
@@ -6554,7 +6570,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 break;
 case 0x80000007:
 *eax = 0;
-*ebx = 0;
+*ebx = env->features[FEAT_8000_0007_EBX];
 *ecx = 0;
 *edx = env->features[FEAT_8000_0007_EDX];
 break;
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index a6000e93bd..f5afc5e4fd 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -598,6 +598,7 @@ typedef enum FeatureWord {
 FEAT_7_1_EAX,   /* CPUID[EAX=7,ECX=1].EAX */
 FEAT_8000_0001_EDX, /* CPUID[8000_0001].EDX */
 FEAT_8000_0001_ECX, /* CPUID[8000_0001].ECX */
+FEAT_8000_0007_EBX, /* CPUID[8000_0007].EBX */
 FEAT_8000_0007_EDX, /* CPUID[8000_0007].EDX */
 FEAT_8000_0008_EBX, /* CPUID[8000_0008].EBX */
 FEAT_8000_0021_EAX, /* CPUID[8000_0021].EAX */
@@ -942,6 +943,9 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
 /* Packets which contain IP payload have LIP values */
 #define CPUID_14_0_ECX_LIP  (1U << 31)
 
+/* RAS Features */
+#define CPUID_8000_0007_EBX_SUCCOR  (1U << 1)
+
 /* CLZERO instruction */
 #define CPUID_8000_0008_EBX_CLZERO  (1U << 0)
 /* Always save/restore FP error pointers */
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 4d42d3ed4c..0255863421 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -477,6 +477,8 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t 
function,
  */
 cpuid_1_edx = kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
 ret |= cpuid_1_edx & CPUID_EXT2_AMD_ALIASES;
+} else if (function == 0x80000007 && reg == R_EBX) {
+ret |= CPUID_8000_0007_EBX_SUCCOR;
 } else if (function == KVM_CPUID_FEATURES && reg == R_EAX) {
 /* kvm_pv_unhalt is reported by GET_SUPPORTED_CPUID, but it can't
  * be enabled without the in-kernel irqchip
-- 
2.39.3




Re: Tips for local testing guestfwd

2023-09-06 Thread Felix Wu
Hi,
I noticed why the chardev socket backend disconnects, and I would like to
make this an RFC to see how I should fix it.
Current scenario after boot-up:

   1. tcp_chr_read_poll keeps polling the slirp_socket_can_recv, and
   slirp_socket_can_recv returns 0 since slirp_find_ctl_socket couldn't
   find the guestfwd socket.
   2. The 0 returned in step 1 is assigned to s->max_size (s is a
SocketChardev
   *), and the socket chardev handler won't read since the readable size is 0.
   3. When the 1st request is sent, the guestfwd socket is added into the
   slirp's socket list, instead of 0, tcp_chr_read_poll will return the
   result of sopreprbuf > 0.
   4. tcp_chr_read reads the thing.
   5. tcp_chr_read_poll still returns things > 0, which is the output of
   sopreprbuf.
   6. tcp_chr_read reads the thing again, but there's nothing in the
   buffer, so it's unhappy, and closes the connection.
   7. any follow-up requests won't be handled.

These tcp_chr* functions are in file [1], and slirp_* are in file [2].

My questions:
1. Since this thing doesn't work on the 2nd and later requests, I want to know
how it is supposed to work. To avoid asking people vaguely, I will lay out
my understanding below; please correct me if I am wrong:
a. The state machine in chardev socket should maintain a connected
state (s->state
== TCP_CHARDEV_STATE_CONNECTED), this means no change in [1].
b. slirp_socket_can_recv should return 0 once all data is read, instead of
the outcome of sopreprbuf. This means I need to remove the socket or change
its state to no file descriptor [3], namely somehow reset it.
c. When a new request comes in, it will need to add the socket back to this
slirp instance's socket list, populate its file descriptor, and establish
the connection.

b and c sounds convoluted so I want to check.

2. What does the sopreprbuf function [3] actually return?
Since it's returned to the tcp_chr_read_poll function, I thought it's the
readable bytes in the socket, but in my test I noticed following thing:

tcp_chr_read_poll_size : s->max_size: 132480
tcp_chr_read : size: 2076
tcp_chr_read_poll_size : s->max_size: 129600
tcp_chr_read : size: 0

Even when there is nothing remaining in the buffer (read size 0), the poll
result is still non-zero, so the read function keeps reading until it
becomes unhappy.
Also, 132480 - 129600 = 2880 vs 2076: the byte counts don't match.

Either I need to go with the way in question 1, b.c. steps, or I don't need
to delete the socket, but the sopreprbuf wasn't proper to be used there and
I need to correct it.
Also updated https://gitlab.com/qemu-project/qemu/-/issues/1835.

Any feedback will be appreciated, thanks!
Felix

[1].
https://gitlab.com/qemu-project/qemu/-/blob/master/chardev/char-socket.c#L141
[2].
https://gitlab.freedesktop.org/slirp/libslirp/-/blob/master/src/slirp.c#L1582
[3].
https://gitlab.freedesktop.org/slirp/libslirp/-/blob/master/src/socket.h#L221

On Wed, Aug 23, 2023 at 10:27 AM Felix Wu  wrote:

> Update on debugging this thing (already updated
> https://gitlab.com/qemu-project/qemu/-/issues/1835):
> I saw that `tcp_chr_free_connection` was called after the first response
> being successfully sent:
> ```
>
> slirp_guestfwd_write guestfwd_write: size 80
> tcp_chr_write tcp_chr_write: s->state:2
> tcp_chr_write tcp_chr_write: len:80
> qemu_chr_write_parameter len: 80 // tracking qemu_chr_write
> qemu_chr_write_res len: 80 // same thing
> tcp_chr_free_connection tcp_chr_free_connection: state: 2, changing it to disconnect
> tcp_chr_change_state tcp_chr_change_state: state: 2, next state: 0 // state 2==connected, 0==disconnected.
>
> ```
> And after that, the state of `SocketChardev` remained disconnected, and
> when the 2nd request came in, the `tcp_chr_write` dropped it directly.
> Maybe this state machine should be reset after every connection? Not sure.
>
> On Thu, Aug 17, 2023 at 11:58 AM Felix Wu  wrote:
>
>> Hi Samuel,
>>
>> Thanks for the clarification! I missed the email so didn't reply in time,
>> but was able to figure it out.
>>
>> Hi everyone,
>> IPv6 guestfwd works in my local test but it has a weird bug: if you send
>> two requests, the first one gets the correct response, but the second one
>> gets stuck.
>> I am using a simple http server for this test, and just noticed this bug
>> also exists in IPv4 guestfwd. I've documented it in
>> https://gitlab.com/qemu-project/qemu/-/issues/1835.
>>
>> Just want to check if anyone has seen the same issue before.
>>
>> Thanks! Felix
>>
>> On Thu, Jul 20, 2023 at 7:54 AM Samuel Thibault 
>> wrote:
>>
>>> Hello,
>>>
>>> Felix Wu, on Tue, 18 Jul 2023 18:12:16 -0700, wrote:
>>> > 02 == SYN so it looks good. But both tcpdump and wireshark (looking
>>> into packet
>>> > dump provided by QEMU invocation)
>>>
>>> Which packet dump?
>>>
>>> > I added multiple prints inside slirp and confirmed the ipv6 version of
>>> [1] was
>>> > reached.
>>> > in tcp_output function [2], I got following print:
>>> > qemu-system-aarch64: info: Slirp: 

[PATCH] migration: Unify and trace vmstate field_exists() checks

2023-09-06 Thread Peter Xu
For both save/load we actually share the logic on deciding whether a field
should exist.  Merge the checks into a helper and use it for both save and
load.  When doing so, add documentations and reformat the code to make it
much easier to read.

The real benefit here (besides code cleanups) is we add a trace-point for
this; this is a known spot where we can easily break migration
compatibilities between binaries, and this trace point will be critical for
us to identify such issues.

For example, this will be handy when debugging things like:

https://gitlab.com/qemu-project/qemu/-/issues/932

Signed-off-by: Peter Xu 
---
 migration/vmstate.c| 34 ++
 migration/trace-events |  1 +
 2 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/migration/vmstate.c b/migration/vmstate.c
index 31842c3afb..73e74ddea0 100644
--- a/migration/vmstate.c
+++ b/migration/vmstate.c
@@ -25,6 +25,30 @@ static int vmstate_subsection_save(QEMUFile *f, const 
VMStateDescription *vmsd,
 static int vmstate_subsection_load(QEMUFile *f, const VMStateDescription *vmsd,
void *opaque);
 
+/* Whether this field should exist, for either saving or loading the VM */
+static bool
+vmstate_field_exists(const VMStateDescription *vmsd, const VMStateField *field,
+ void *opaque, int version_id)
+{
+bool result;
+
+if (field->field_exists) {
+/* If there's the function checker, that's the solo truth */
+result = field->field_exists(opaque, version_id);
+trace_vmstate_field_exists(vmsd->name, field->name, field->version_id,
+   version_id, result);
+} else {
+/*
+ * Otherwise, we only save/load if field version is same or older.
+ * For example, when loading from an old binary with old version,
+ * we ignore new fields with newer version_ids.
+ */
+result = field->version_id <= version_id;
+}
+
+return result;
+}
+
 static int vmstate_n_elems(void *opaque, const VMStateField *field)
 {
 int n_elems = 1;
@@ -104,10 +128,7 @@ int vmstate_load_state(QEMUFile *f, const 
VMStateDescription *vmsd,
 }
 while (field->name) {
 trace_vmstate_load_state_field(vmsd->name, field->name);
-if ((field->field_exists &&
- field->field_exists(opaque, version_id)) ||
-(!field->field_exists &&
- field->version_id <= version_id)) {
+if (vmstate_field_exists(vmsd, field, opaque, version_id)) {
 void *first_elem = opaque + field->offset;
 int i, n_elems = vmstate_n_elems(opaque, field);
 int size = vmstate_size(opaque, field);
@@ -342,10 +363,7 @@ int vmstate_save_state_v(QEMUFile *f, const 
VMStateDescription *vmsd,
 }
 
 while (field->name) {
-if ((field->field_exists &&
- field->field_exists(opaque, version_id)) ||
-(!field->field_exists &&
- field->version_id <= version_id)) {
+if (vmstate_field_exists(vmsd, field, opaque, version_id)) {
 void *first_elem = opaque + field->offset;
 int i, n_elems = vmstate_n_elems(opaque, field);
 int size = vmstate_size(opaque, field);
diff --git a/migration/trace-events b/migration/trace-events
index 4666f19325..446db0b7ce 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -66,6 +66,7 @@ vmstate_save_state_loop(const char *name, const char *field, 
int n_elems) "%s/%s
 vmstate_save_state_top(const char *idstr) "%s"
 vmstate_subsection_save_loop(const char *name, const char *sub) "%s/%s"
 vmstate_subsection_save_top(const char *idstr) "%s"
+vmstate_field_exists(const char *vmsd, const char *name, int field_version, 
int version, int result) "%s:%s field_version %d version %d result %d"
 
 # vmstate-types.c
 get_qtailq(const char *name, int version_id) "%s v%d"
-- 
2.41.0




Re: [PATCH v3 1/4] docs/qcow2: add the zoned format feature

2023-09-06 Thread Stefan Hajnoczi
On Mon, Aug 28, 2023 at 11:09:52PM +0800, Sam Li wrote:
> Add the specs for the zoned format feature of the qcow2 driver.
> The qcow2 file can be taken as zoned device and passed through by
> virtio-blk device or NVMe ZNS device to the guest given zoned
> information.
> 
> Signed-off-by: Sam Li 
> ---
>  docs/system/qemu-block-drivers.rst.inc | 39 ++
>  1 file changed, 39 insertions(+)
> 
> diff --git a/docs/system/qemu-block-drivers.rst.inc 
> b/docs/system/qemu-block-drivers.rst.inc
> index 105cb9679c..640ab151a7 100644
> --- a/docs/system/qemu-block-drivers.rst.inc
> +++ b/docs/system/qemu-block-drivers.rst.inc
> @@ -172,6 +172,45 @@ This section describes each format and the options that 
> are supported for it.
>  filename`` to check if the NOCOW flag is set or not (Capital 'C' is
>  NOCOW flag).
>  
> +  .. option:: zoned
> +The zoned interface of zoned storage divices can different forms which
> +is referred to as models. This option uses number to represent, 1 for
> +host-managed and 0 for non-zoned.

I would simplify this paragraph down to:

  1 for a host-managed zoned device or 0 for a non-zoned device.

> +
> +  .. option:: zone_size
> +
> +The size of a zone of the zoned device in bytes. The device is divided

The first sentence is a little confusing due to the repetition of the
word "zone". It can be shortened:

  The size of a zone, in bytes.

> +into zones of this size with the exception of the last zone, which may
> +be smaller.
> +
> +  .. option:: zone_capacity
> +
> +The initial capacity value for all zones. The capacity must be less than

  The initial capacity value, in bytes, for all zones.

> +or equal to zone size. If the last zone is smaller, then its capacity is
> +capped. The device follows the ZBC protocol tends to have the same size
> +as its zone.

I think the last sentence says that ZBC devices tend to have capacity ==
len whereas ZNS devices may have a unique capacity for each zone? You
could drop this last sentence completely.

> +
> +The zone capacity is per zone and may be different between zones in real
> +devices. For simplicity, limits QCow2 emulation to the same zone capacity
> +for all zones.

The last sentence:

  For simplicity, qcow2 sets all zones to the same capacity.

> +
> +  .. option:: zone_nr_conv
> +
> +The number of conventional zones of the zoned device.
> +
> +  .. option:: max_open_zones
> +
> +The maximal allowed open zones.
> +
> +  .. option:: max_active_zones
> +
> +The limit of the zones with implicit open, explicit open or closed state.
> +
> +  .. option:: max_append_sectors
> +
> +The maximal sectors in 512B blocks that is allowed to append to zones
> +while writing.

Rephrasing:

  The maximum number of 512-byte sectors in a zone append request.

> +
>  .. program:: image-formats
>  .. option:: qed
>  
> -- 
> 2.40.1
> 


signature.asc
Description: PGP signature


Re: [PATCH v2 1/2] i386: Add support for SUCCOR feature

2023-09-06 Thread Moger, Babu

Hi John,

On 9/5/2023 10:01 AM, John Allen wrote:

On Fri, Sep 01, 2023 at 11:30:53AM +0100, Joao Martins wrote:

On 26/07/2023 21:41, John Allen wrote:

Add cpuid bit definition for the SUCCOR feature. This cpuid bit is required to
be exposed to guests to allow them to handle machine check exceptions on AMD
hosts.

Reported-by: William Roche
Signed-off-by: John Allen

I think this is matching the last discussion:

Reviewed-by: Joao Martins

The patch ordering doesn't look correct though. Perhaps we should expose succor
only after MCE is fixed so this patch would be the second, not the first?

Yes, that makes sense. I will address this and send another version of
the series with the correct ordering.


Also, this should in generally be OK for -cpu host, but might be missing a third
patch that adds "succor" to the AMD models e.g.

Babu,

I think we previously discussed adding this to the models later in a
separate series. Is this your preferred course of action or can we add
it with this series?



Yes. We can add it later as a separate series. We just added EPYC-Genoa.
We don't want to add EPYC-Genoa-v2 at this point. We have a few more
features pending as well.


Thanks

Babu


Re: [PATCH v1 21/22] vfio/pci: Allow the selection of a given iommu backend

2023-09-06 Thread Alex Williamson
On Wed, 6 Sep 2023 15:10:39 -0300
Jason Gunthorpe  wrote:

> On Wed, Aug 30, 2023 at 06:37:53PM +0800, Zhenzhong Duan wrote:
> > Note the /dev/iommu device may have been pre-opened by a
> > management tool such as libvirt. This mode is no more considered
> > for the legacy backend. So let's remove the "TODO" comment.  
> 
> Can you show an example of that syntax too?

Unless you're just looking for something in the commit log, patch 16/
added the following to the qemu help output:

+#ifdef CONFIG_IOMMUFD
+``-object iommufd,id=id[,fd=fd]``
+Creates an iommufd backend which allows control of DMA mapping
+through the /dev/iommu device.
+
+The ``id`` parameter is a unique ID which frontends (such as
+vfio-pci or vdpa) will use to connect with the iommufd backend.
+
+The ``fd`` parameter is an optional pre-opened file descriptor
+resulting from /dev/iommu opening. Usually the iommufd is shared
+across all subsystems, bringing the benefit of centralized
+reference counting.
+#endif
 
> Also, the vfio device should be openable externally as well

Appears to be added in the very next patch in the series.  Thanks,

Alex




[RFC 1/3] hmp: avoid the nested event loop in handle_hmp_command()

2023-09-06 Thread Stefan Hajnoczi
Coroutine HMP commands currently run to completion in a nested event
loop with the Big QEMU Lock (BQL) held. The call_rcu thread also uses
the BQL and cannot process work while the coroutine monitor command is
running. A deadlock occurs when monitor commands attempt to wait for
call_rcu work to finish.

This patch refactors the HMP monitor to use the existing event loop
instead of creating a nested event loop. This will allow the next
patches to rely on draining call_rcu work.

Signed-off-by: Stefan Hajnoczi 
---
 monitor/hmp.c | 28 +++-
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/monitor/hmp.c b/monitor/hmp.c
index 69c1b7e98a..6cff2810aa 100644
--- a/monitor/hmp.c
+++ b/monitor/hmp.c
@@ -,15 +,17 @@ typedef struct HandleHmpCommandCo {
 Monitor *mon;
 const HMPCommand *cmd;
 QDict *qdict;
-bool done;
 } HandleHmpCommandCo;
 
-static void handle_hmp_command_co(void *opaque)
+static void coroutine_fn handle_hmp_command_co(void *opaque)
 {
 HandleHmpCommandCo *data = opaque;
+
 handle_hmp_command_exec(data->mon, data->cmd, data->qdict);
 monitor_set_cur(qemu_coroutine_self(), NULL);
-data->done = true;
+qobject_unref(data->qdict);
+monitor_resume(data->mon);
+g_free(data);
 }
 
 void handle_hmp_command(MonitorHMP *mon, const char *cmdline)
@@ -1157,20 +1159,20 @@ void handle_hmp_command(MonitorHMP *mon, const char 
*cmdline)
 Monitor *old_mon = monitor_set_cur(qemu_coroutine_self(), &mon->common);
 handle_hmp_command_exec(>common, cmd, qdict);
 monitor_set_cur(qemu_coroutine_self(), old_mon);
+qobject_unref(qdict);
 } else {
-HandleHmpCommandCo data = {
-.mon = &mon->common,
-.cmd = cmd,
-.qdict = qdict,
-.done = false,
-};
-Coroutine *co = qemu_coroutine_create(handle_hmp_command_co, );
+HandleHmpCommandCo *data; /* freed by handle_hmp_command_co() */
+
+data = g_new(HandleHmpCommandCo, 1);
+data->mon = &mon->common;
+data->cmd = cmd;
+data->qdict = qdict; /* freed by handle_hmp_command_co() */
+
+Coroutine *co = qemu_coroutine_create(handle_hmp_command_co, data);
+monitor_suspend(&mon->common); /* resumed by handle_hmp_command_co() */
 monitor_set_cur(co, &mon->common);
 aio_co_enter(qemu_get_aio_context(), co);
-AIO_WAIT_WHILE_UNLOCKED(NULL, !data.done);
 }
-
-qobject_unref(qdict);
 }
 
 static void cmd_completion(MonitorHMP *mon, const char *name, const char *list)
-- 
2.41.0




[RFC 2/3] rcu: add drain_call_rcu_co() API

2023-09-06 Thread Stefan Hajnoczi
drain_call_rcu() has limitations that make it unsuitable for use in
qmp_device_add(). Introduce a new coroutine version of drain_call_rcu()
with the same functionality but that does not drop the BQL. The next
patch will use it to fix qmp_device_add().

Signed-off-by: Stefan Hajnoczi 
---
 MAINTAINERS |  2 ++
 docs/devel/rcu.txt  | 21 +
 include/qemu/rcu.h  |  1 +
 util/rcu-internal.h |  8 +++
 util/rcu-co.c   | 55 +
 util/rcu.c  |  3 ++-
 util/meson.build|  2 +-
 7 files changed, 90 insertions(+), 2 deletions(-)
 create mode 100644 util/rcu-internal.h
 create mode 100644 util/rcu-co.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 3b29568ed4..7f98253bda 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2908,6 +2908,8 @@ F: include/qemu/rcu*.h
 F: tests/unit/rcutorture.c
 F: tests/unit/test-rcu-*.c
 F: util/rcu.c
+F: util/rcu-co.c
+F: util/rcu-internal.h
 
 Human Monitor (HMP)
 M: Dr. David Alan Gilbert 
diff --git a/docs/devel/rcu.txt b/docs/devel/rcu.txt
index 2e6cc607a1..344764527f 100644
--- a/docs/devel/rcu.txt
+++ b/docs/devel/rcu.txt
@@ -130,6 +130,27 @@ The core RCU API is small:
 
 g_free_rcu(, rcu);
 
+ void coroutine_fn drain_call_rcu_co(void);
+
+drain_call_rcu_co() yields until the reclamation phase is finished.
+Reclaimer functions previously submitted with call_rcu1() in this
+thread will have finished by the time drain_call_rcu_co() returns.
+
+ void drain_call_rcu(void);
+
+drain_call_rcu() releases the Big QEMU Lock (BQL), if held, waits until
+the reclamation phase is finished, and then re-acquires the BQL, if
+previously held.  Reclaimer functions previously submitted with
+call_rcu1() in this thread will have finished by the time
+drain_call_rcu() returns.
+
+drain_call_rcu() has the following limitations:
+1. It deadlocks when called within an RCU read-side critical section.
+2. All functions on the call stack must be designed to handle dropping
+   the BQL.
+
+Prefer drain_call_rcu_co() over drain_call_rcu().
+
  typeof(*p) qatomic_rcu_read(p);
 
 qatomic_rcu_read() is similar to qatomic_load_acquire(), but it makes
diff --git a/include/qemu/rcu.h b/include/qemu/rcu.h
index fea058aa9f..53055df1dc 100644
--- a/include/qemu/rcu.h
+++ b/include/qemu/rcu.h
@@ -141,6 +141,7 @@ struct rcu_head {
 };
 
 void call_rcu1(struct rcu_head *head, RCUCBFunc *func);
+void coroutine_fn drain_call_rcu_co(void);
 void drain_call_rcu(void);
 
 /* The operands of the minus operator must have the same type,
diff --git a/util/rcu-internal.h b/util/rcu-internal.h
new file mode 100644
index 00..7d85366d54
--- /dev/null
+++ b/util/rcu-internal.h
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: LGPL-2.1-or-later */
+
+#ifndef RCU_INTERNAL_H
+#define RCU_INTERNAL_H
+
+extern int in_drain_call_rcu;
+
+#endif /* RCU_INTERNAL_H */
diff --git a/util/rcu-co.c b/util/rcu-co.c
new file mode 100644
index 00..920fcacb7a
--- /dev/null
+++ b/util/rcu-co.c
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: LGPL-2.1-or-later */
+/*
+ * RCU APIs for coroutines
+ *
+ * The RCU coroutine APIs are kept separate from the main RCU code to avoid
+ * depending on AioContext APIs in rcu.c. This is necessary because at least
+ * tests/unit/ptimer-test.c has replacement functions for AioContext APIs that
+ * conflict with the real functions.
+ *
+ * It's also nice to logically separate the core RCU code from the coroutine
+ * APIs :).
+ */
+#include "qemu/osdep.h"
+#include "block/aio.h"
+#include "qemu/atomic.h"
+#include "qemu/coroutine.h"
+#include "qemu/rcu.h"
+#include "rcu-internal.h"
+
+typedef struct {
+struct rcu_head rcu;
+Coroutine *co;
+} RcuDrainCo;
+
+static void drain_call_rcu_co_bh(void *opaque)
+{
+RcuDrainCo *data = opaque;
+
+/* Re-enter drain_call_rcu_co() where it yielded */
+aio_co_wake(data->co);
+}
+
+static void drain_call_rcu_co_cb(struct rcu_head *node)
+{
+RcuDrainCo *data = container_of(node, RcuDrainCo, rcu);
+AioContext *ctx = qemu_coroutine_get_aio_context(data->co);
+
+/*
+ * drain_call_rcu_co() might still be running in its thread, so schedule a
+ * BH in its thread. The BH only runs after the coroutine has yielded.
+ */
+aio_bh_schedule_oneshot(ctx, drain_call_rcu_co_bh, data);
+}
+
+void coroutine_fn drain_call_rcu_co(void)
+{
+RcuDrainCo data = {
+.co = qemu_coroutine_self(),
+};
+
+qatomic_inc(&in_drain_call_rcu);
+call_rcu1(&data.rcu, drain_call_rcu_co_cb);
+qemu_coroutine_yield(); /* wait for drain_call_rcu_co_bh() */
+qatomic_dec(&in_drain_call_rcu);
+}
diff --git a/util/rcu.c b/util/rcu.c
index e587bcc483..2519bd7d5c 100644
--- a/util/rcu.c
+++ b/util/rcu.c
@@ -32,6 +32,7 @@
 #include "qemu/thread.h"
 #include "qemu/main-loop.h"
 #include "qemu/lockable.h"
+#include "rcu-internal.h"
 #if 

[RFC 3/3] qmp: make qmp_device_add() a coroutine

2023-09-06 Thread Stefan Hajnoczi
It is not safe to call drain_call_rcu() from qmp_device_add() because
some call stacks are not prepared for drain_call_rcu() to drop the Big
QEMU Lock (BQL).

For example, device emulation code is protected by the BQL but when it
calls aio_poll() -> ... -> qmp_device_add() -> drain_call_rcu() then the
BQL is dropped. See bz#2215192 below for a concrete bug of this type.

Another limitation of drain_call_rcu() is that it cannot be invoked
within an RCU read-side critical section since the reclamation phase
cannot complete until the end of the critical section. Unfortunately,
call stacks have been seen where this happens (see bz#2214985 below).

Switch to drain_call_rcu_co() to avoid these problems. This requires
making qmp_device_add() a coroutine. qdev_device_add() is not designed
to be called from coroutines, so it must be invoked from a BH and then
switch back to the coroutine.

Fixes: 7bed89958bfbf40df9ca681cefbdca63abdde39d ("device_core: use 
drain_call_rcu in in qmp_device_add")
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2215192
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2214985
Signed-off-by: Stefan Hajnoczi 
---
 qapi/qdev.json |  1 +
 include/monitor/qdev.h |  3 ++-
 monitor/qmp-cmds.c |  2 +-
 softmmu/qdev-monitor.c | 34 ++
 hmp-commands.hx|  1 +
 5 files changed, 35 insertions(+), 6 deletions(-)

diff --git a/qapi/qdev.json b/qapi/qdev.json
index 6bc5a733b8..78e9d7f7b8 100644
--- a/qapi/qdev.json
+++ b/qapi/qdev.json
@@ -79,6 +79,7 @@
 ##
 { 'command': 'device_add',
   'data': {'driver': 'str', '*bus': 'str', '*id': 'str'},
+  'coroutine': true,
   'gen': false, # so we can get the additional arguments
   'features': ['json-cli', 'json-cli-hotplug'] }
 
diff --git a/include/monitor/qdev.h b/include/monitor/qdev.h
index 1d57bf6577..1fed9eb9ea 100644
--- a/include/monitor/qdev.h
+++ b/include/monitor/qdev.h
@@ -5,7 +5,8 @@
 
 void hmp_info_qtree(Monitor *mon, const QDict *qdict);
 void hmp_info_qdm(Monitor *mon, const QDict *qdict);
-void qmp_device_add(QDict *qdict, QObject **ret_data, Error **errp);
+void coroutine_fn
+qmp_device_add(QDict *qdict, QObject **ret_data, Error **errp);
 
 int qdev_device_help(QemuOpts *opts);
 DeviceState *qdev_device_add(QemuOpts *opts, Error **errp);
diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
index b0f948d337..a7419226fe 100644
--- a/monitor/qmp-cmds.c
+++ b/monitor/qmp-cmds.c
@@ -202,7 +202,7 @@ static void __attribute__((__constructor__)) monitor_init_qmp_commands(void)
qmp_init_marshal(&qmp_commands);
 
qmp_register_command(&qmp_commands, "device_add",
- qmp_device_add, 0, 0);
+ qmp_device_add, QCO_COROUTINE, 0);
 
QTAILQ_INIT(&qmp_cap_negotiation_commands);
qmp_register_command(&qmp_cap_negotiation_commands, "qmp_capabilities",
diff --git a/softmmu/qdev-monitor.c b/softmmu/qdev-monitor.c
index 74f4e41338..85ae62f7cf 100644
--- a/softmmu/qdev-monitor.c
+++ b/softmmu/qdev-monitor.c
@@ -839,8 +839,28 @@ void hmp_info_qdm(Monitor *mon, const QDict *qdict)
 qdev_print_devinfos(true);
 }
 
-void qmp_device_add(QDict *qdict, QObject **ret_data, Error **errp)
+typedef struct {
+Coroutine *co;
+QemuOpts *opts;
+Error **errp;
+DeviceState *dev;
+} QmpDeviceAdd;
+
+static void qmp_device_add_bh(void *opaque)
 {
+QmpDeviceAdd *data = opaque;
+
+data->dev = qdev_device_add(data->opts, data->errp);
+aio_co_wake(data->co);
+}
+
+void coroutine_fn
+qmp_device_add(QDict *qdict, QObject **ret_data, Error **errp)
+{
+QmpDeviceAdd data = {
+.co = qemu_coroutine_self(),
+.errp = errp,
+};
 QemuOpts *opts;
 DeviceState *dev;
 
@@ -852,7 +872,13 @@ void qmp_device_add(QDict *qdict, QObject **ret_data, Error **errp)
 qemu_opts_del(opts);
 return;
 }
-dev = qdev_device_add(opts, errp);
+
+/* Perform qdev_device_add() call outside coroutine context */
+data.opts = opts;
+aio_bh_schedule_oneshot(qemu_coroutine_get_aio_context(data.co),
+qmp_device_add_bh, &data);
+qemu_coroutine_yield();
+dev = data.dev;
 
 /*
  * Drain all pending RCU callbacks. This is done because
@@ -863,7 +889,7 @@ void qmp_device_add(QDict *qdict, QObject **ret_data, Error **errp)
  * will finish its job completely once qmp command returns result
  * to the user
  */
-drain_call_rcu();
+drain_call_rcu_co();
 
 if (!dev) {
 qemu_opts_del(opts);
@@ -956,7 +982,7 @@ void qmp_device_del(const char *id, Error **errp)
 }
 }
 
-void hmp_device_add(Monitor *mon, const QDict *qdict)
+void coroutine_fn hmp_device_add(Monitor *mon, const QDict *qdict)
 {
 Error *err = NULL;
 
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 2cbd0f77a0..c737d1fd64 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -695,6 +695,7 @@ ERST
 .params = "driver[,prop=value][,...]",
 .help   = "add device, 

[RFC 0/3] qmp: make qmp_device_add() a coroutine

2023-09-06 Thread Stefan Hajnoczi
It is not safe to call drain_call_rcu() from qmp_device_add() because
some call stacks are not prepared for drain_call_rcu() to drop the Big
QEMU Lock (BQL).

For example, device emulation code is protected by the BQL but when it
calls aio_poll() -> ... -> qmp_device_add() -> drain_call_rcu() then the
BQL is dropped. See https://bugzilla.redhat.com/show_bug.cgi?id=2215192 for a
concrete bug of this type.

Another limitation of drain_call_rcu() is that it cannot be invoked within an
RCU read-side critical section since the reclamation phase cannot complete
until the end of the critical section. Unfortunately, call stacks have been
seen where this happens (see
https://bugzilla.redhat.com/show_bug.cgi?id=2214985).

This patch series introduces drain_call_rcu_co(), which does the same thing as
drain_call_rcu() but asynchronously. By yielding back to the event loop we can
wait until the caller drops the BQL and leaves its RCU read-side critical
section.

Patch 1 changes HMP so that coroutine monitor commands yield back to the event
loop instead of running inside a nested event loop.

Patch 2 introduces the new drain_call_rcu_co() API.

Patch 3 converts qmp_device_add() into a coroutine monitor command and uses
drain_call_rcu_co().

I'm sending this as an RFC because I don't have confirmation yet that the bugs
mentioned above are fixed by this patch series.

Stefan Hajnoczi (3):
  hmp: avoid the nested event loop in handle_hmp_command()
  rcu: add drain_call_rcu_co() API
  qmp: make qmp_device_add() a coroutine

 MAINTAINERS|  2 ++
 docs/devel/rcu.txt | 21 
 qapi/qdev.json |  1 +
 include/monitor/qdev.h |  3 ++-
 include/qemu/rcu.h |  1 +
 util/rcu-internal.h|  8 ++
 monitor/hmp.c  | 28 +++--
 monitor/qmp-cmds.c |  2 +-
 softmmu/qdev-monitor.c | 34 +++---
 util/rcu-co.c  | 55 ++
 util/rcu.c |  3 ++-
 hmp-commands.hx|  1 +
 util/meson.build   |  2 +-
 13 files changed, 140 insertions(+), 21 deletions(-)
 create mode 100644 util/rcu-internal.h
 create mode 100644 util/rcu-co.c

-- 
2.41.0




Re: [PULL 00/52] UI patches

2023-09-06 Thread Stefan Hajnoczi
Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any 
user-visible changes.




Re: [PULL 00/13] linux-user patch queue

2023-09-06 Thread Stefan Hajnoczi
Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any 
user-visible changes.




Re: [PULL 00/26] aspeed queue

2023-09-06 Thread Stefan Hajnoczi
Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any 
user-visible changes.




Re: [PATCH v1 21/22] vfio/pci: Allow the selection of a given iommu backend

2023-09-06 Thread Jason Gunthorpe
On Wed, Aug 30, 2023 at 06:37:53PM +0800, Zhenzhong Duan wrote:
> Note the /dev/iommu device may have been pre-opened by a
> management tool such as libvirt. This mode is no more considered
> for the legacy backend. So let's remove the "TODO" comment.

Can you show an example of that syntax too?

Also, the vfio device should be openable externally as well

Jason



  1   2   3   >