date:20210611

Re: [RFC PATCH 4/5] qmp: Added qemu-ebpf-rss-path command.

2021-06-11 Thread Markus Armbruster

Andrew Melnychenko  writes:

> New qmp command to query ebpf helper.
> It's crucial that QEMU and the helper are in sync.
> Technically, the helper should pass eBPF fds that QEMU can accept,
> and different QEMU builds may have different eBPF programs and helpers.
> QEMU returns the helper that should "fit" its virtio-net.
>
> Signed-off-by: Andrew Melnychenko 
> ---
>  monitor/qmp-cmds.c | 78 ++
>  qapi/misc.json | 29 +
>  2 files changed, 107 insertions(+)
>
> diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
> index f7d64a6457..5dd2a58ea2 100644
> --- a/monitor/qmp-cmds.c
> +++ b/monitor/qmp-cmds.c
> @@ -351,3 +351,81 @@ void qmp_display_reload(DisplayReloadOptions *arg, Error **errp)
>  abort();
>  }
>  }
> +
> +#ifdef CONFIG_LINUX
> +
> +static const char *get_dirname(char *path)
> +{
> +    char *sep;
> +
> +    sep = strrchr(path, '/');
> +    if (sep == path) {
> +        return "/";
> +    } else if (sep) {
> +        *sep = 0;
> +        return path;
> +    }
> +    return ".";
> +}
> +
> +static char *find_helper(const char *name)
> +{
> +    char qemu_exec[PATH_MAX];
> +    const char *qemu_dir = NULL;
> +    char *helper = NULL;
> +
> +    if (name == NULL) {
> +        return NULL;
> +    }
> +
> +    if (readlink("/proc/self/exe", qemu_exec, PATH_MAX) > 0) {
> +        qemu_dir = get_dirname(qemu_exec);
> +
> +        helper = g_strdup_printf("%s/%s", qemu_dir, name);
> +        if (access(helper, F_OK) == 0) {
> +            return helper;
> +        }
> +        g_free(helper);
> +    }
> +
> +    helper = g_strdup_printf("%s/%s", CONFIG_QEMU_HELPERDIR, name);
> +    if (access(helper, F_OK) == 0) {
> +        return helper;
> +    }
> +    g_free(helper);
> +
> +    return NULL;
> +}

This returns the helper in the same directory as the running executable,
or as a fallback the helper in CONFIG_QEMU_HELPERDIR.

Checking F_OK (existence) instead of X_OK is odd.

It uses /proc/self/exe to find the running executable's directory.  This
is specific to Linux[*].  You get different behavior on Linux vs. other
systems.

CONFIG_QEMU_HELPERDIR is $prefix/libexec/.

If $prefix is /usr, then qemu-system-FOO is normally installed in
/usr/bin/, and the helper in /usr/libexec/.  We look for the helper in
the wrong place first, and the right one only when it isn't in the wrong
place.  Feels overcomplicated and fragile.
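The two lookup details mentioned above can be made concrete. The following is a minimal sketch, not part of the patch, assuming a POSIX/Linux host: it checks `X_OK` instead of `F_OK`, and it NUL-terminates the `readlink()` result, which the quoted `find_helper()` does not do (`readlink()` never appends a terminator). The `helperdir` parameter standing in for `CONFIG_QEMU_HELPERDIR` and the `try_helper()` function are hypothetical. It keeps the patch's lookup order, so it does not address the ordering problem described here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Return the directory part of PATH; may truncate PATH in place. */
static const char *get_dirname(char *path)
{
    char *sep = strrchr(path, '/');

    if (sep == path) {
        return "/";
    } else if (sep) {
        *sep = '\0';
        return path;
    }
    return ".";
}

/* Hypothetical helper: return a malloc'd "DIR/NAME" if it is executable. */
static char *try_helper(const char *dir, const char *name)
{
    size_t len;
    char *path;

    assert(dir != NULL && name != NULL);
    len = strlen(dir) + 1 + strlen(name) + 1;
    path = malloc(len);
    if (path == NULL) {
        return NULL;
    }
    snprintf(path, len, "%s/%s", dir, name);
    if (access(path, X_OK) == 0) {   /* must be executable, not just exist */
        return path;
    }
    free(path);
    return NULL;
}

/* HELPERDIR plays the role of CONFIG_QEMU_HELPERDIR (an assumption). */
static char *find_helper(const char *name, const char *helperdir)
{
    char exe[PATH_MAX];
    ssize_t n;

    if (name == NULL) {
        return NULL;
    }
    /* readlink() does not NUL-terminate: reserve one byte and add it. */
    n = readlink("/proc/self/exe", exe, sizeof(exe) - 1);
    if (n > 0) {
        char *helper;

        exe[n] = '\0';
        helper = try_helper(get_dirname(exe), name);
        if (helper != NULL) {
            return helper;
        }
    }
    return try_helper(helperdir, name);
}
```

The caller owns the returned string and must `free()` it.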

Consider the following scenario:

* The system has a binary package's /usr/bin/qemu-system-x86_64 and
  /usr/libexec/qemu-ebpf-rss-helper installed

* Alice builds her own QEMU with prefix /usr (and no intention to
  install), resulting in bld/qemu-system-x86_64, bld/qemu-ebpf-rss-helper,
  and a symlink bld/x86_64-softmmu/qemu-system-x86_64.

Now:

* If Alice runs bld/qemu-system-x86_64, and the host is Linux,
  find_helper() returns bld/qemu-ebpf-rss-helper.  Good.

* If the host isn't Linux, it returns /usr/libexec/qemu-ebpf-rss-helper.
  Not good.

* If Alice runs bld/x86_64-softmmu/qemu-system-x86_64, it also returns
  /usr/libexec/qemu-ebpf-rss-helper.  Not good.

> +
> +HelperPathList *qmp_query_helper_paths(Error **errp)
> +{
> +    HelperPathList *ret = NULL;
> +    const char *helpers_list[] = {
> +#ifdef CONFIG_EBPF
> +        "qemu-ebpf-rss-helper",
> +#endif
> +        NULL
> +    };
> +    const char **helper_iter = helpers_list;
> +
> +    for (; *helper_iter != NULL; ++helper_iter) {
> +        char *path = find_helper(*helper_iter);
> +        if (path) {
> +            HelperPath *helper = g_new0(HelperPath, 1);
> +            helper->name = g_strdup(*helper_iter);
> +            helper->path = path;
> +
> +            QAPI_LIST_PREPEND(ret, helper);
> +        }
> +    }
> +
> +    return ret;
> +}
> +#else
> +
> +HelperPathList *qmp_query_helper_paths(Error **errp)
> +{
> +    return NULL;
> +}
> +
> +#endif
> diff --git a/qapi/misc.json b/qapi/misc.json
> index 156f98203e..023bd2120d 100644
> --- a/qapi/misc.json
> +++ b/qapi/misc.json
> @@ -519,3 +519,32 @@
>   'data': { '*option': 'str' },
>   'returns': ['CommandLineOptionInfo'],
>   'allow-preconfig': true }
> +
> +##
> +# @HelperPath:
> +#
> +# Name of the helper and binary location.
> +##
> +{ 'struct': 'HelperPath',
> +  'data': {'name': 'str', 'path': 'str'} }
> +
> +##
> +# @query-helper-paths:
> +#
> +# Query specific eBPF RSS helper for current qemu binary.
> +#
> +# Returns: list of object that contains name and path for helper.
> +#
> +# Example:
> +#
> +# -> { "execute": "query-helper-paths" }
> +# <- { "return": [
> +#        {
> +#          "name": "qemu-ebpf-rss-helper",
> +#          "path": "/usr/local/libexec/qemu-ebpf-rss-helper"
> +#        }
> +#      ]
> +#    }
> +#
> +##
> +{ 'command': 'query-helper-paths', 'returns': ['HelperPath'] }

The name query-helper-paths is generic, the documented purpose "Query
specific eBPF RSS helper" is specific.

qemu-ebpf-rss-helper isn't necessarily the only helper that needs to be

[RFC PATCH V1 0/3] tpm: Eliminate TPM related code if CONFIG_TPM is not set

2021-06-11 Thread Stefan Berger

The following patches entirely eliminate TPM-related code if CONFIG_TPM
is not set.

  Stefan

Stefan Berger (3):
  acpi: Eliminate all TPM related code if CONFIG_TPM is not set
  arm: Eliminate all TPM related code if CONFIG_TPM is not set
  sysemu: Make TPM structures inaccessible if CONFIG_TPM is not defined

 hw/acpi/aml-build.c  |  2 ++
 hw/arm/sysbus-fdt.c  |  4 
 hw/arm/virt-acpi-build.c |  6 ++
 hw/arm/virt.c|  2 ++
 hw/i386/acpi-build.c | 20 
 include/hw/acpi/tpm.h|  4 
 include/sysemu/tpm.h |  6 +-
 include/sysemu/tpm_backend.h |  6 +-
 stubs/tpm.c  |  4 
 9 files changed, 48 insertions(+), 6 deletions(-)

-- 
2.31.1
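The approach in this series is plain conditional compilation: the definition of each TPM table builder and every call to it sit under the same `#ifdef CONFIG_TPM`, so a build without TPM contains neither. A toy sketch (`build_tpm_tables()` and `build_acpi_tables()` are hypothetical, not QEMU functions) shows why both sides need the guard: guarding only the definition leaves a call to an undeclared function, and guarding only the call leaves an unused static function behind.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* CONFIG_TPM would come from the build system; compile with
 * -DCONFIG_TPM to get the TPM-enabled variant. */

#ifdef CONFIG_TPM
/* Only exists in TPM-enabled builds. */
static size_t build_tpm_tables(char *buf, size_t len)
{
    assert(buf != NULL && len >= 4);
    memcpy(buf, "TPM2", 4);        /* placeholder for a real table */
    return 4;
}
#endif

/* Common code: the call site carries the same guard as the definition. */
static size_t build_acpi_tables(char *buf, size_t len)
{
    size_t used = 0;

#ifdef CONFIG_TPM
    used += build_tpm_tables(buf + used, len - used);
#else
    (void)buf;                     /* silence unused-parameter warnings */
    (void)len;
#endif
    return used;
}
```

In a build without `-DCONFIG_TPM`, `build_acpi_tables()` emits nothing and no TPM code is compiled at all.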

[RFC PATCH V1 1/3] acpi: Eliminate all TPM related code if CONFIG_TPM is not set

2021-06-11 Thread Stefan Berger

Cc: M: Michael S. Tsirkin 
Cc: Igor Mammedov 
Signed-off-by: Stefan Berger 
---
 hw/acpi/aml-build.c  |  2 ++
 hw/arm/virt-acpi-build.c |  2 ++
 hw/i386/acpi-build.c | 20 
 include/hw/acpi/tpm.h|  4 
 stubs/tpm.c  |  4 
 5 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index f0035d2b4a..d5103e6d7b 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -2044,6 +2044,7 @@ build_hdr:
  "FACP", tbl->len - fadt_start, f->rev, oem_id, oem_table_id);
 }
 
+#ifdef CONFIG_TPM
 /*
  * build_tpm2 - Build the TPM2 table as specified in
  * table 7: TCG Hardware Interface Description Table Format for TPM 2.0
@@ -2101,6 +2102,7 @@ void build_tpm2(GArray *table_data, BIOSLinker *linker, GArray *tcpalog,
  (void *)(table_data->data + tpm2_start),
  "TPM2", table_data->len - tpm2_start, 4, oem_id, oem_table_id);
 }
+#endif
 
 Aml *build_crs(PCIHostState *host, CrsRangeSet *range_set, uint32_t io_offset,
uint32_t mmio32_offset, uint64_t mmio64_offset,
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 60fe2e65a7..6b3c1fdb0a 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -745,11 +745,13 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
 build_iort(tables_blob, tables->linker, vms);
 }
 
+#ifdef CONFIG_TPM
 if (tpm_get_version(tpm_find()) == TPM_VERSION_2_0) {
 acpi_add_table(table_offsets, tables_blob);
 build_tpm2(tables_blob, tables->linker, tables->tcpalog, vms->oem_id,
vms->oem_table_id);
 }
+#endif
 
 /* XSDT is pointed to by RSDP */
 xsdt = tables_blob->len;
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 80bee00da6..796ffc6f5c 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -107,7 +107,9 @@ typedef struct AcpiPmInfo {
 typedef struct AcpiMiscInfo {
 bool is_piix4;
 bool has_hpet;
+#ifdef CONFIG_TPM
 TPMVersion tpm_version;
+#endif
 const unsigned char *dsdt_code;
 unsigned dsdt_size;
 uint16_t pvpanic_port;
@@ -286,7 +288,9 @@ static void acpi_get_misc_info(AcpiMiscInfo *info)
 }
 
 info->has_hpet = hpet_find();
+#ifdef CONFIG_TPM
 info->tpm_version = tpm_get_version(tpm_find());
+#endif
 info->pvpanic_port = pvpanic_port();
 info->applesmc_io_base = applesmc_port();
 }
@@ -1371,7 +1375,9 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
 uint32_t nr_mem = machine->ram_slots;
 int root_bus_limit = 0xFF;
 PCIBus *bus = NULL;
+#ifdef CONFIG_TPM
 TPMIf *tpm = tpm_find();
+#endif
 int i;
 VMBusBridge *vmbus_bridge = vmbus_bridge_find();
 
@@ -1604,10 +1610,12 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
 }
 }
 
+#ifdef CONFIG_TPM
 if (TPM_IS_TIS_ISA(tpm_find())) {
 aml_append(crs, aml_memory32_fixed(TPM_TIS_ADDR_BASE,
TPM_TIS_ADDR_SIZE, AML_READ_WRITE));
 }
+#endif
 aml_append(scope, aml_name_decl("_CRS", crs));
 
 /* reserve GPE0 block resources */
@@ -1753,6 +1761,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
 /* Scan all PCI buses. Generate tables to support hotplug. */
 build_append_pci_bus_devices(scope, bus, pm->pcihp_bridge_en);
 
+#ifdef CONFIG_TPM
 if (TPM_IS_TIS_ISA(tpm)) {
 if (misc->tpm_version == TPM_VERSION_2_0) {
 dev = aml_device("TPM");
@@ -1780,11 +1789,13 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
 
 aml_append(scope, dev);
 }
+#endif
 
 aml_append(sb_scope, scope);
 }
 }
 
+#ifdef CONFIG_TPM
 if (TPM_IS_CRB(tpm)) {
 dev = aml_device("TPM");
 aml_append(dev, aml_name_decl("_HID", aml_string("MSFT0101")));
@@ -1799,6 +1810,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
 
 aml_append(sb_scope, dev);
 }
+#endif
 
 aml_append(dsdt, sb_scope);
 
@@ -1828,6 +1840,7 @@ build_hpet(GArray *table_data, BIOSLinker *linker, const char *oem_id,
  "HPET", sizeof(*hpet), 1, oem_id, oem_table_id);
 }
 
+#ifdef CONFIG_TPM
 static void
 build_tpm_tcpa(GArray *table_data, BIOSLinker *linker, GArray *tcpalog,
const char *oem_id, const char *oem_table_id)
@@ -1854,6 +1867,7 @@ build_tpm_tcpa(GArray *table_data, BIOSLinker *linker, GArray *tcpalog,
  (void *)(table_data->data + tcpa_start),
  "TCPA", sizeof(*tcpa), 2, oem_id, oem_table_id);
 }
+#endif
 
 #define HOLE_640K_START  (640 * KiB)
 #define HOLE_640K_END   (1 * MiB)
@@ -2403,6 +2417,7 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
 build_hpet(tables_blob, tables->linker, x86ms->oem_id,
x86ms->oem_table_id);
 }
+#ifdef CONFIG_TPM
 if (misc.tpm_version != TPM_VERSION_UNSPEC) {

[RFC PATCH V1 2/3] arm: Eliminate all TPM related code if CONFIG_TPM is not set

2021-06-11 Thread Stefan Berger

Cc: Peter Maydell 
Signed-off-by: Stefan Berger 
---
 hw/arm/sysbus-fdt.c  | 4 
 hw/arm/virt-acpi-build.c | 4 
 hw/arm/virt.c| 2 ++
 3 files changed, 10 insertions(+)

diff --git a/hw/arm/sysbus-fdt.c b/hw/arm/sysbus-fdt.c
index 6b6906f4cf..48c5fe9bf1 100644
--- a/hw/arm/sysbus-fdt.c
+++ b/hw/arm/sysbus-fdt.c
@@ -437,6 +437,7 @@ static bool vfio_platform_match(SysBusDevice *sbdev,
 
 #endif /* CONFIG_LINUX */
 
+#ifdef CONFIG_TPM
 /*
  * add_tpm_tis_fdt_node: Create a DT node for TPM TIS
  *
@@ -467,6 +468,7 @@ static int add_tpm_tis_fdt_node(SysBusDevice *sbdev, void *opaque)
 g_free(nodename);
 return 0;
 }
+#endif
 
 static int no_fdt_node(SysBusDevice *sbdev, void *opaque)
 {
@@ -488,7 +490,9 @@ static const BindingEntry bindings[] = {
 TYPE_BINDING(TYPE_VFIO_AMD_XGBE, add_amd_xgbe_fdt_node),
 VFIO_PLATFORM_BINDING("amd,xgbe-seattle-v1a", add_amd_xgbe_fdt_node),
 #endif
+#ifdef CONFIG_TPM
 TYPE_BINDING(TYPE_TPM_TIS_SYSBUS, add_tpm_tis_fdt_node),
+#endif
 TYPE_BINDING(TYPE_RAMFB_DEVICE, no_fdt_node),
 TYPE_BINDING("", NULL), /* last element */
 };
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 6b3c1fdb0a..f1024843dd 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -205,6 +205,7 @@ static void acpi_dsdt_add_gpio(Aml *scope, const MemMapEntry *gpio_memmap,
 aml_append(scope, dev);
 }
 
+#ifdef CONFIG_TPM
 static void acpi_dsdt_add_tpm(Aml *scope, VirtMachineState *vms)
 {
 PlatformBusDevice *pbus = PLATFORM_BUS_DEVICE(vms->platform_bus_dev);
@@ -236,6 +237,7 @@ static void acpi_dsdt_add_tpm(Aml *scope, VirtMachineState *vms)
 aml_append(dev, aml_name_decl("_CRS", crs));
 aml_append(scope, dev);
 }
+#endif
 
 static void
 build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
@@ -642,7 +644,9 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
 }
 
 acpi_dsdt_add_power_button(scope);
+#ifdef CONFIG_TPM
 acpi_dsdt_add_tpm(scope, vms);
+#endif
 
 aml_append(dsdt, scope);
 
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 840758666d..9122e22ee0 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2599,7 +2599,9 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
 machine_class_allow_dynamic_sysbus_dev(mc, TYPE_VFIO_AMD_XGBE);
 machine_class_allow_dynamic_sysbus_dev(mc, TYPE_RAMFB_DEVICE);
 machine_class_allow_dynamic_sysbus_dev(mc, TYPE_VFIO_PLATFORM);
+#ifdef CONFIG_TPM
 machine_class_allow_dynamic_sysbus_dev(mc, TYPE_TPM_TIS_SYSBUS);
+#endif
 mc->block_default_type = IF_VIRTIO;
 mc->no_cdrom = 1;
 mc->pci_allow_0_address = true;
-- 
2.31.1

[RFC PATCH V1 3/3] sysemu: Make TPM structures inaccessible if CONFIG_TPM is not defined

2021-06-11 Thread Stefan Berger

Signed-off-by: Stefan Berger 
---
 include/sysemu/tpm.h | 6 +-
 include/sysemu/tpm_backend.h | 6 +-
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/include/sysemu/tpm.h b/include/sysemu/tpm.h
index 1a85564e47..2ca3fa32ee 100644
--- a/include/sysemu/tpm.h
+++ b/include/sysemu/tpm.h
@@ -15,10 +15,12 @@
 #include "qapi/qapi-types-tpm.h"
 #include "qom/object.h"
 
-int tpm_config_parse(QemuOptsList *opts_list, const char *optarg);
 int tpm_init(void);
 void tpm_cleanup(void);
 
+#ifdef CONFIG_TPM
+int tpm_config_parse(QemuOptsList *opts_list, const char *optarg);
+
 typedef enum TPMVersion {
 TPM_VERSION_UNSPEC = 0,
 TPM_VERSION_1_2 = 1,
@@ -73,4 +75,6 @@ static inline TPMVersion tpm_get_version(TPMIf *ti)
 return TPM_IF_GET_CLASS(ti)->get_version(ti);
 }
 
+#endif /* CONFIG_TPM */
+
 #endif /* QEMU_TPM_H */
diff --git a/include/sysemu/tpm_backend.h b/include/sysemu/tpm_backend.h
index 6f078f5f48..8fd3269c11 100644
--- a/include/sysemu/tpm_backend.h
+++ b/include/sysemu/tpm_backend.h
@@ -18,6 +18,8 @@
 #include "sysemu/tpm.h"
 #include "qapi/error.h"
 
+#ifdef CONFIG_TPM
+
 #define TYPE_TPM_BACKEND "tpm-backend"
 OBJECT_DECLARE_TYPE(TPMBackend, TPMBackendClass,
 TPM_BACKEND)
@@ -209,4 +211,6 @@ TPMInfo *tpm_backend_query_tpm(TPMBackend *s);
 
 TPMBackend *qemu_find_tpm_be(const char *id);
 
-#endif
+#endif /* CONFIG_TPM */
+
+#endif /* TPM_BACKEND_H */
-- 
2.31.1

[PATCH v2 2/3] hw/nvme: namespace parameter for EUI-64

2021-06-11 Thread Heinrich Schuchardt

The EUI-64 field is the only identifier for NVMe namespaces in UEFI device
paths. Add a new namespace property "eui64" that lets the user specify
the EUI-64.

Signed-off-by: Heinrich Schuchardt 
Acked-by: Klaus Jensen 
---
v2:
fix typo %s/EUI64/EUI-64/
---
 docs/system/nvme.rst |  4 +++
 hw/nvme/ctrl.c   | 58 ++--
 hw/nvme/ns.c |  2 ++
 hw/nvme/nvme.h   |  1 +
 4 files changed, 42 insertions(+), 23 deletions(-)

diff --git a/docs/system/nvme.rst b/docs/system/nvme.rst
index f7f63d6bf6..b5f8288d7c 100644
--- a/docs/system/nvme.rst
+++ b/docs/system/nvme.rst
@@ -81,6 +81,10 @@ There are a number of parameters available:
   Set the UUID of the namespace. This will be reported as a "Namespace UUID"
   descriptor in the Namespace Identification Descriptor List.

+``eui64``
+  Set the EUI-64 of the namespace. This will be reported as an "IEEE Extended
+  Unique Identifier" descriptor in the Namespace Identification Descriptor List.
+
 ``bus``
   If there are more ``nvme`` devices defined, this parameter may be used to
   attach the namespace to a specific ``nvme`` device (identified by an ``id``
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 8dd9cb2ccb..f37c4fd635 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -4436,19 +4436,19 @@ static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeRequest *req)
     NvmeIdentify *c = (NvmeIdentify *)&req->cmd;
     uint32_t nsid = le32_to_cpu(c->nsid);
     uint8_t list[NVME_IDENTIFY_DATA_SIZE] = {};
-
-    struct data {
-        struct {
-            NvmeIdNsDescr hdr;
-            uint8_t v[NVME_NIDL_UUID];
-        } uuid;
-        struct {
-            NvmeIdNsDescr hdr;
-            uint8_t v;
-        } csi;
-    };
-
-    struct data *ns_descrs = (struct data *)list;
+    uint8_t *pos = list;
+    struct {
+        NvmeIdNsDescr hdr;
+        uint8_t v[NVME_NIDL_UUID];
+    } QEMU_PACKED uuid;
+    struct {
+        NvmeIdNsDescr hdr;
+        uint64_t v;
+    } QEMU_PACKED eui64;
+    struct {
+        NvmeIdNsDescr hdr;
+        uint8_t v;
+    } QEMU_PACKED csi;

     trace_pci_nvme_identify_ns_descr_list(nsid);

@@ -4462,17 +4462,29 @@ static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeRequest *req)
     }

     /*
-     * Because the NGUID and EUI64 fields are 0 in the Identify Namespace data
-     * structure, a Namespace UUID (nidt = 3h) must be reported in the
-     * Namespace Identification Descriptor. Add the namespace UUID here.
+     * If the EUI-64 field is 0 and the NGUID field is 0, the namespace must
+     * provide a valid Namespace UUID in the Namespace Identification Descriptor
+     * data structure. QEMU does not yet support setting NGUID.
      */
-    ns_descrs->uuid.hdr.nidt = NVME_NIDT_UUID;
-    ns_descrs->uuid.hdr.nidl = NVME_NIDL_UUID;
-    memcpy(&ns_descrs->uuid.v, ns->params.uuid.data, NVME_NIDL_UUID);
-
-    ns_descrs->csi.hdr.nidt = NVME_NIDT_CSI;
-    ns_descrs->csi.hdr.nidl = NVME_NIDL_CSI;
-    ns_descrs->csi.v = ns->csi;
+    uuid.hdr.nidt = NVME_NIDT_UUID;
+    uuid.hdr.nidl = NVME_NIDL_UUID;
+    memcpy(uuid.v, ns->params.uuid.data, NVME_NIDL_UUID);
+    memcpy(pos, &uuid, sizeof(uuid));
+    pos += sizeof(uuid);
+
+    if (ns->params.eui64) {
+        eui64.hdr.nidt = NVME_NIDT_EUI64;
+        eui64.hdr.nidl = NVME_NIDL_EUI64;
+        eui64.v = cpu_to_be64(ns->params.eui64);
+        memcpy(pos, &eui64, sizeof(eui64));
+        pos += sizeof(eui64);
+    }
+
+    csi.hdr.nidt = NVME_NIDT_CSI;
+    csi.hdr.nidl = NVME_NIDL_CSI;
+    csi.v = ns->csi;
+    memcpy(pos, &csi, sizeof(csi));
+    pos += sizeof(csi);

     return nvme_c2h(n, list, sizeof(list), req);
 }
diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index 3fec9c6273..45e457de6a 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -77,6 +77,7 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **errp)
 id_ns->mssrl = cpu_to_le16(ns->params.mssrl);
 id_ns->mcl = cpu_to_le32(ns->params.mcl);
 id_ns->msrc = ns->params.msrc;
+id_ns->eui64 = cpu_to_be64(ns->params.eui64);

 ds = 31 - clz32(ns->blkconf.logical_block_size);
 ms = ns->params.ms;
@@ -511,6 +512,7 @@ static Property nvme_ns_props[] = {
 DEFINE_PROP_BOOL("shared", NvmeNamespace, params.shared, false),
 DEFINE_PROP_UINT32("nsid", NvmeNamespace, params.nsid, 0),
 DEFINE_PROP_UUID("uuid", NvmeNamespace, params.uuid),
+DEFINE_PROP_UINT64("eui64", NvmeNamespace, params.eui64, 0),
 DEFINE_PROP_UINT16("ms", NvmeNamespace, params.ms, 0),
 DEFINE_PROP_UINT8("mset", NvmeNamespace, params.mset, 0),
 DEFINE_PROP_UINT8("pi", NvmeNamespace, params.pi, 0),
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 93a7e0e538..ac90e13d7b 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -83,6 +83,7 @@ typedef struct NvmeNamespaceParams {
 bool shared;
 uint32_t nsid;
 QemuUUID uuid;
+uint64_t eui64;

 uint16_t ms;
 uint8_t  mset;
--
2.30.2
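The descriptor list that `nvme_identify_ns_descr_list()` produces after this patch can be sketched independently of QEMU. This is an illustration, not QEMU code: `put_descr()` and `build_descr_list()` are hypothetical names; the 4-byte header (NIDT, NIDL, two reserved bytes) and the type codes 1h (EUI-64), 3h (UUID) and 4h (CSI) follow the NVMe base specification; and the EUI-64 is stored big-endian, as `cpu_to_be64()` does in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Namespace Identifier Types (NIDT) from the NVMe base specification. */
enum { NIDT_EUI64 = 0x1, NIDT_UUID = 0x3, NIDT_CSI = 0x4 };

/* Append one descriptor: NIDT, NIDL, 2 reserved bytes, then NIDL bytes of ID. */
static uint8_t *put_descr(uint8_t *pos, uint8_t nidt,
                          const uint8_t *nid, uint8_t nidl)
{
    assert(nidl <= 16);
    pos[0] = nidt;
    pos[1] = nidl;
    pos[2] = 0;                 /* reserved */
    pos[3] = 0;                 /* reserved */
    memcpy(pos + 4, nid, nidl);
    return pos + 4 + nidl;
}

/* Fill LIST (zeroed by the caller) and return the number of bytes used.
 * An EUI-64 of 0 means "not set", so that descriptor is skipped. */
static size_t build_descr_list(uint8_t *list, const uint8_t uuid[16],
                               uint64_t eui64, uint8_t csi)
{
    uint8_t *pos = list;

    pos = put_descr(pos, NIDT_UUID, uuid, 16);
    if (eui64) {
        uint8_t be[8];
        for (int i = 0; i < 8; i++) {
            be[i] = (uint8_t)(eui64 >> (56 - 8 * i));   /* big-endian */
        }
        pos = put_descr(pos, NIDT_EUI64, be, 8);
    }
    pos = put_descr(pos, NIDT_CSI, &csi, 1);
    return (size_t)(pos - list);
}
```

Advancing a `pos` cursor, as the patch does, lets the optional EUI-64 entry be omitted without leaving a gap: with an EUI-64 the list is 20 + 12 + 5 = 37 bytes, without it 20 + 5 = 25.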

[PULL 34/34] docs/devel: Explain in more detail the TB chaining mechanisms

2021-06-11 Thread Richard Henderson

From: Luis Pires 

Signed-off-by: Luis Pires 
Message-Id: <20210601125143.191165-1-luis.pi...@eldorado.org.br>
Signed-off-by: Richard Henderson 
---
 docs/devel/tcg.rst | 101 -
 1 file changed, 90 insertions(+), 11 deletions(-)

diff --git a/docs/devel/tcg.rst b/docs/devel/tcg.rst
index 4ebde44b9d..a65fb7b1c4 100644
--- a/docs/devel/tcg.rst
+++ b/docs/devel/tcg.rst
@@ -11,13 +11,14 @@ performances.
 QEMU's dynamic translation backend is called TCG, for "Tiny Code
 Generator". For more information, please take a look at ``tcg/README``.
 
-Some notable features of QEMU's dynamic translator are:
+The following sections outline some notable features and implementation
+details of QEMU's dynamic translator.
 
 CPU state optimisations
 -----------------------
 
-The target CPUs have many internal states which change the way it
-evaluates instructions. In order to achieve a good speed, the
+The target CPUs have many internal states which change the way they
+evaluate instructions. In order to achieve a good speed, the
 translation phase considers that some state information of the virtual
 CPU cannot change in it. The state is recorded in the Translation
 Block (TB). If the state changes (e.g. privilege level), a new TB will
@@ -31,17 +32,95 @@ Direct block chaining
 ---------------------
 
 After each translated basic block is executed, QEMU uses the simulated
-Program Counter (PC) and other cpu state information (such as the CS
+Program Counter (PC) and other CPU state information (such as the CS
 segment base value) to find the next basic block.
 
-In order to accelerate the most common cases where the new simulated PC
-is known, QEMU can patch a basic block so that it jumps directly to the
-next one.
+In its simplest, less optimized form, this is done by exiting from the
+current TB, going through the TB epilogue, and then back to the
+main loop. That’s where QEMU looks for the next TB to execute,
+translating it from the guest architecture if it isn’t already available
+in memory. Then QEMU proceeds to execute this next TB, starting at the
+prologue and then moving on to the translated instructions.
 
-The most portable code uses an indirect jump. An indirect jump makes
-it easier to make the jump target modification atomic. On some host
-architectures (such as x86 or PowerPC), the ``JUMP`` opcode is
-directly patched so that the block chaining has no overhead.
+Exiting from the TB this way will cause the ``cpu_exec_interrupt()``
+callback to be re-evaluated before executing additional instructions.
+It is mandatory to exit this way after any CPU state changes that may
+unmask interrupts.
+
+In order to accelerate the cases where the TB for the new
+simulated PC is already available, QEMU has mechanisms that allow
+multiple TBs to be chained directly, without having to go back to the
+main loop as described above. These mechanisms are:
+
+``lookup_and_goto_ptr``
+^^^
+
+Calling ``tcg_gen_lookup_and_goto_ptr()`` will emit a call to
+``helper_lookup_tb_ptr``. This helper will look for an existing TB that
+matches the current CPU state. If the destination TB is available its
+code address is returned, otherwise the address of the JIT epilogue is
+returned. The call to the helper is always followed by the tcg ``goto_ptr``
+opcode, which branches to the returned address. In this way, we either
+branch to the next TB or return to the main loop.
+
+``goto_tb + exit_tb``
+^
+
+The translation code usually implements branching by performing the
+following steps:
+
+1. Call ``tcg_gen_goto_tb()`` passing a jump slot index (either 0 or 1)
+   as a parameter.
+
+2. Emit TCG instructions to update the CPU state with any information
+   that has been assumed constant and is required by the main loop to
+   correctly locate and execute the next TB. For most guests, this is
+   just the PC of the branch destination, but others may store additional
+   data. The information updated in this step must be inferable from both
+   ``cpu_get_tb_cpu_state()`` and ``cpu_restore_state()``.
+
+3. Call ``tcg_gen_exit_tb()`` passing the address of the current TB and
+   the jump slot index again.
+
+Step 1, ``tcg_gen_goto_tb()``, will emit a ``goto_tb`` TCG
+instruction that later on gets translated to a jump to an address
+associated with the specified jump slot. Initially, this is the address
+of step 2's instructions, which update the CPU state information. Step 3,
+``tcg_gen_exit_tb()``, exits from the current TB returning a tagged
+pointer composed of the last executed TB’s address and the jump slot
+index.
+
+The first time this whole sequence is executed, step 1 simply jumps
+to step 2. Then the CPU state information gets updated and we exit from
+the current TB. As a result, the behavior is very similar to the less
+optimized form described earlier in this section.
+
+Next, the main loop looks for the next TB to execute using the
+current CPU

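The tagged exit value from step 3, and how the main loop can use it to chain two TBs, can be modeled in a few lines. This is a sketch with hypothetical types and names (`TB`, `exit_value()`, `maybe_chain()`); it assumes TBs are at least 4-byte aligned so the low two bits are free for the slot index, and it does not reproduce QEMU's real `TranslationBlock` layout or its patching of host code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a translation block. */
typedef struct TB {
    uintptr_t pc;
    struct TB *jmp_dest[2];   /* chained successor per jump slot, or NULL */
} TB;

/* Step 3: pack "last executed TB + jump slot index" into one word. */
static uintptr_t exit_value(const TB *tb, unsigned slot)
{
    assert(slot < 2 && ((uintptr_t)tb & 3) == 0);
    return (uintptr_t)tb | slot;
}

static TB *exit_tb(uintptr_t v)       { return (TB *)(v & ~(uintptr_t)3); }
static unsigned exit_slot(uintptr_t v) { return (unsigned)(v & 3); }

/* Main loop: after finding NEXT for the new CPU state, record it in the
 * slot the previous TB exited through, so the next execution of that
 * goto_tb can jump straight to NEXT instead of exiting. */
static void maybe_chain(uintptr_t last, TB *next)
{
    TB *prev = exit_tb(last);

    if (prev != NULL) {
        prev->jmp_dest[exit_slot(last)] = next;
    }
}
```

Once a slot is filled, step 1's jump no longer falls through to the state update and `exit_tb`, which is where the speedup comes from.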
[PATCH v2 1/3] hw: virt: consider hw_compat_6_0

2021-06-11 Thread Heinrich Schuchardt

virt-6.0 must consider hw_compat_6_0.

Fixes: da7e13c00b59 ("hw: add compat machines for 6.1")
Signed-off-by: Heinrich Schuchardt 
Reviewed-by: Cornelia Huck 
---
v2:
add missing Fixes: tag
---
 hw/arm/virt.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 840758666d..8bc3b408fe 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2764,6 +2764,8 @@ DEFINE_VIRT_MACHINE_AS_LATEST(6, 1)

 static void virt_machine_6_0_options(MachineClass *mc)
 {
+virt_machine_6_1_options(mc);
+compat_props_add(mc->compat_props, hw_compat_6_0, hw_compat_6_0_len);
 }
 DEFINE_VIRT_MACHINE(6, 0)

--
2.30.2
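The fix restores the usual chaining convention for versioned machine types: each older type first applies the options of the next newer one, then appends its own `hw_compat_X_Y` list, so compatibility settings accumulate going back in time. A simplified model with hypothetical types and helpers (`CompatProp`, `MachineOpts`, `add_compat()`, `lookup()`); it mirrors the shape of `virt_machine_6_0_options()` but not QEMU's real `MachineClass` or `GlobalProperty` handling.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for QEMU's types. */
typedef struct {
    const char *driver, *property, *value;
} CompatProp;

typedef struct {
    const CompatProp *props[16];
    size_t nprops;
} MachineOpts;

/* Two entries taken from hw_compat_6_0 as shown in this thread. */
static const CompatProp compat_6_0[] = {
    { "gpex-pcihost", "allow-unmapped-accesses", "false" },
    { "i8042", "extended-state", "false" },
};

static void add_compat(MachineOpts *m, const CompatProp *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(m->nprops < sizeof(m->props) / sizeof(m->props[0]));
        m->props[m->nprops++] = &p[i];
    }
}

/* Latest machine type: no compat properties. */
static void machine_6_1_options(MachineOpts *m) { (void)m; }

/* Older type = newer type's options + its own compat list. */
static void machine_6_0_options(MachineOpts *m)
{
    machine_6_1_options(m);
    add_compat(m, compat_6_0, sizeof(compat_6_0) / sizeof(compat_6_0[0]));
}

/* Value a device would see for DRIVER.PROPERTY, or NULL if not overridden. */
static const char *lookup(const MachineOpts *m, const char *drv, const char *prop)
{
    for (size_t i = 0; i < m->nprops; i++) {
        if (!strcmp(m->props[i]->driver, drv) &&
            !strcmp(m->props[i]->property, prop)) {
            return m->props[i]->value;
        }
    }
    return NULL;
}
```

Without the two added lines, virt-6.0 would get no overrides at all and would behave like virt-6.1 for every property in `hw_compat_6_0`.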

[PULL 32/34] tcg/arm: Fix tcg_out_op function signature

2021-06-11 Thread Richard Henderson

From: "Jose R. Ziviani" 

Commit 5e8892db93 fixed several function signatures, but missed tcg_out_op
for arm. This patch fixes it as well.

Signed-off-by: Jose R. Ziviani 
Message-Id: <20210610224450.23425-1-jzivi...@suse.de>
Signed-off-by: Richard Henderson 
---
 tcg/arm/tcg-target.c.inc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tcg/arm/tcg-target.c.inc b/tcg/arm/tcg-target.c.inc
index f4c9cb8f9f..5157143246 100644
--- a/tcg/arm/tcg-target.c.inc
+++ b/tcg/arm/tcg-target.c.inc
@@ -1984,7 +1984,8 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
 static void tcg_out_epilogue(TCGContext *s);
 
 static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
-const TCGArg *args, const int *const_args)
+const TCGArg args[TCG_MAX_OP_ARGS],
+const int const_args[TCG_MAX_OP_ARGS])
 {
 TCGArg a0, a1, a2, a3, a4, a5;
 int c;
-- 
2.25.1
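The new signature does not change the ABI: in C, an array parameter such as `const TCGArg args[TCG_MAX_OP_ARGS]` is adjusted to a pointer. What it buys is consistency between declaration and definition, presumably the goal of commit 5e8892db93; GCC 11's `-Warray-parameter` (enabled by `-Wall`) warns when declarations of the same function spell an array parameter in inconsistent forms. A minimal sketch, assuming the forward declaration uses the bounded form; `Arg`, `MAX_OP_ARGS` and `sum_args()` are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_OP_ARGS 16          /* hypothetical stand-in for TCG_MAX_OP_ARGS */
typedef uintptr_t Arg;          /* hypothetical stand-in for TCGArg */

/* Forward declaration and definition spell the parameter identically,
 * which keeps -Warray-parameter quiet. */
static Arg sum_args(const Arg args[MAX_OP_ARGS], int n);

static Arg sum_args(const Arg args[MAX_OP_ARGS], int n)
{
    /* Inside the function, args is a plain pointer; the bound is
     * documentation, not part of the type. */
    Arg total = 0;

    assert(n >= 0 && n <= MAX_OP_ARGS);
    for (int i = 0; i < n; i++) {
        total += args[i];
    }
    return total;
}

/* The adjusted type is const Arg *, so this assignment type-checks. */
static Arg (*const sum_fn)(const Arg *, int) = sum_args;
```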

[PATCH v2 3/3] hw/nvme: default for namespace EUI-64

2021-06-11 Thread Heinrich Schuchardt

On machine types newer than 6.0, replace a missing EUI-64 with a generated
value.

Signed-off-by: Heinrich Schuchardt 
---
v2:
new patch
---
 docs/system/nvme.rst | 2 ++
 hw/core/machine.c| 1 +
 hw/nvme/ns.c | 9 +
 hw/nvme/nvme.h   | 2 ++
 4 files changed, 14 insertions(+)

diff --git a/docs/system/nvme.rst b/docs/system/nvme.rst
index b5f8288d7c..33a15c7dbc 100644
--- a/docs/system/nvme.rst
+++ b/docs/system/nvme.rst
@@ -84,6 +84,8 @@ There are a number of parameters available:
 ``eui64``
  Set the EUI-64 of the namespace. This will be reported as an "IEEE Extended
  Unique Identifier" descriptor in the Namespace Identification Descriptor List.
+  Since machine type 6.1, a non-zero default value is used if the parameter
+  is not provided. For earlier machine types, the field defaults to 0.

 ``bus``
   If there are more ``nvme`` devices defined, this parameter may be used to
diff --git a/hw/core/machine.c b/hw/core/machine.c
index 55b9bc7817..d0e934 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -39,6 +39,7 @@
 GlobalProperty hw_compat_6_0[] = {
 { "gpex-pcihost", "allow-unmapped-accesses", "false" },
 { "i8042", "extended-state", "false"},
+{ "nvme-ns", "eui64-default", "off"},
 };
 const size_t hw_compat_6_0_len = G_N_ELEMENTS(hw_compat_6_0);

diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index 45e457de6a..4275c3db63 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -56,6 +56,7 @@ void nvme_ns_init_format(NvmeNamespace *ns)

 static int nvme_ns_init(NvmeNamespace *ns, Error **errp)
 {
+static uint64_t ns_count;
 NvmeIdNs *id_ns = &ns->id_ns;
 uint8_t ds;
 uint16_t ms;
@@ -73,6 +74,12 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **errp)
 id_ns->nmic |= NVME_NMIC_NS_SHARED;
 }

+/* Substitute a missing EUI-64 by an autogenerated one */
+++ns_count;
+if (!ns->params.eui64 && ns->params.eui64_default) {
+ns->params.eui64 = ns_count + NVME_EUI64_DEFAULT;
+}
+
 /* simple copy */
 id_ns->mssrl = cpu_to_le16(ns->params.mssrl);
 id_ns->mcl = cpu_to_le32(ns->params.mcl);
@@ -533,6 +540,8 @@ static Property nvme_ns_props[] = {
params.max_open_zones, 0),
 DEFINE_PROP_UINT32("zoned.descr_ext_size", NvmeNamespace,
params.zd_extension_size, 0),
+DEFINE_PROP_BOOL("eui64-default", NvmeNamespace, params.eui64_default,
+ true),
 DEFINE_PROP_END_OF_LIST(),
 };

diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index ac90e13d7b..3fb869731d 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -26,6 +26,7 @@

 #define NVME_MAX_CONTROLLERS 32
 #define NVME_MAX_NAMESPACES  256
+#define NVME_EUI64_DEFAULT 0x27fed9272381cbd0UL

 typedef struct NvmeCtrl NvmeCtrl;
 typedef struct NvmeNamespace NvmeNamespace;
@@ -84,6 +85,7 @@ typedef struct NvmeNamespaceParams {
 uint32_t nsid;
 QemuUUID uuid;
 uint64_t eui64;
+bool eui64_default;

 uint16_t ms;
 uint8_t  mset;
--
2.30.2

[PATCH v2 0/3] hw/nvme: namespace parameter for EUI-64

2021-06-11 Thread Heinrich Schuchardt

The EUI-64 field is the only identifier for NVMe namespaces in UEFI device
paths. Add a new namespace property "eui64" that lets the user specify
the EUI-64.

v2:
include patch for hw_compat_6_0
add a patch to implement default values for the EUI-64

Heinrich Schuchardt (3):
  hw: virt: consider hw_compat_6_0
  hw/nvme: namespace parameter for EUI-64
  hw/nvme: default for namespace EUI-64

 docs/system/nvme.rst |  6 +
 hw/arm/virt.c|  2 ++
 hw/core/machine.c|  1 +
 hw/nvme/ctrl.c   | 58 ++--
 hw/nvme/ns.c | 11 +
 hw/nvme/nvme.h   |  3 +++
 6 files changed, 58 insertions(+), 23 deletions(-)

--
2.30.2

[PULL 25/34] util/osdep: Add qemu_mprotect_rw

2021-06-11 Thread Richard Henderson

For --enable-tcg-interpreter on Windows, we will need this.

Reviewed-by: Alex Bennée 
Reviewed-by: Luis Pires 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 include/qemu/osdep.h | 1 +
 util/osdep.c | 9 +
 2 files changed, 10 insertions(+)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 4c6f2390be..236a045671 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -512,6 +512,7 @@ void sigaction_invoke(struct sigaction *action,
 #endif
 
 int qemu_madvise(void *addr, size_t len, int advice);
+int qemu_mprotect_rw(void *addr, size_t size);
 int qemu_mprotect_rwx(void *addr, size_t size);
 int qemu_mprotect_none(void *addr, size_t size);
 
diff --git a/util/osdep.c b/util/osdep.c
index 66d01b9160..42a0a4986a 100644
--- a/util/osdep.c
+++ b/util/osdep.c
@@ -97,6 +97,15 @@ static int qemu_mprotect__osdep(void *addr, size_t size, int prot)
 #endif
 }
 
+int qemu_mprotect_rw(void *addr, size_t size)
+{
+#ifdef _WIN32
+return qemu_mprotect__osdep(addr, size, PAGE_READWRITE);
+#else
+return qemu_mprotect__osdep(addr, size, PROT_READ | PROT_WRITE);
+#endif
+}
+
 int qemu_mprotect_rwx(void *addr, size_t size)
 {
 #ifdef _WIN32
-- 
2.25.1
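On POSIX hosts the new `qemu_mprotect_rw()` amounts to `mprotect()` with `PROT_READ | PROT_WRITE`; on Windows it passes `PAGE_READWRITE` instead. A minimal POSIX-only sketch; `mprotect_rw_sketch()` and `map_none_page()` are hypothetical helpers, and this version simply returns `mprotect()`'s 0 or -1 without any error reporting. `mprotect()` expects a page-aligned address, which is why the demo maps a whole page.

```c
#define _DEFAULT_SOURCE             /* MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Make [addr, addr + size) readable and writable (not executable). */
static int mprotect_rw_sketch(void *addr, size_t size)
{
    assert(addr != NULL);
    return mprotect(addr, size, PROT_READ | PROT_WRITE);
}

/* Map one inaccessible page to demonstrate the transition. */
static void *map_none_page(size_t *size_out)
{
    size_t size = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED) {
        return NULL;
    }
    *size_out = size;
    return p;
}
```

Writing to the page before the `mprotect_rw_sketch()` call would fault; afterwards it succeeds.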

[PULL 33/34] softfloat: Fix tp init in float32_exp2

2021-06-11 Thread Richard Henderson

Typo in the conversion to FloatParts64.

Fixes: 572c4d862ff2
Fixes: Coverity CID 1457457
Signed-off-by: Richard Henderson 
Reviewed-by: Alex Bennée 
Message-Id: <20210607223812.110596-1-richard.hender...@linaro.org>
Signed-off-by: Richard Henderson 
---
 fpu/softfloat.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 1cb162882b..4d0160fe9c 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -4818,7 +4818,7 @@ float32 float32_exp2(float32 a, float_status *status)
 
 float_raise(float_flag_inexact, status);
 
-float64_unpack_canonical(&xnp, float64_ln2, status);
+float64_unpack_canonical(&tp, float64_ln2, status);
 xp = *parts_mul(&xp, &tp, status);
 xnp = xp;
 
-- 
2.25.1
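Why `tp` must hold ln 2 here: the computation rests on 2^x = e^(x ln 2), so `xp` is first scaled by ln 2 and then fed into a power series. The coefficient loop that follows in softfloat.c is not shown in this patch; the double-precision sketch below assumes a 15-term Taylor series with coefficients 1/n!, and `exp2_series()` is a hypothetical name intended for small |x| only.

```c
#include <assert.h>

/* 2^x = e^(x ln 2) = sum over n >= 0 of (x ln 2)^n / n!, truncated. */
static double exp2_series(double x)
{
    const double ln2 = 0.69314718055994530942;
    double xp = x * ln2;      /* the product the fixed line sets up */
    double xnp = xp;          /* current power (x ln 2)^(i + 1) */
    double rp = 1.0;          /* n = 0 term */
    double fact = 1.0;

    assert(x >= -1.0 && x <= 1.0);
    for (int i = 0; i < 15; i++) {
        fact *= (double)(i + 1);   /* (i + 1)! */
        rp += xnp / fact;
        xnp *= xp;
    }
    return rp;
}
```

With `tp` left uninitialized, as before the fix, the scaled argument and therefore every term of the series would be garbage.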

[PULL 29/34] tcg: Move tcg_init_ctx and tcg_ctx from accel/tcg/

2021-06-11 Thread Richard Henderson

These variables belong to the jit side, not the user side.

Since tcg_init_ctx is no longer used outside of tcg/, move
the declaration to tcg-internal.h.

Reviewed-by: Alex Bennée 
Reviewed-by: Luis Pires 
Suggested-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 include/tcg/tcg.h | 1 -
 tcg/tcg-internal.h| 1 +
 accel/tcg/translate-all.c | 3 ---
 tcg/tcg.c | 3 +++
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index f48b5aa166..e95abac9f4 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -689,7 +689,6 @@ static inline bool temp_readonly(TCGTemp *ts)
 return ts->kind >= TEMP_FIXED;
 }
 
-extern TCGContext tcg_init_ctx;
 extern __thread TCGContext *tcg_ctx;
 extern const void *tcg_code_gen_epilogue;
 extern uintptr_t tcg_splitwx_diff;
diff --git a/tcg/tcg-internal.h b/tcg/tcg-internal.h
index f9906523da..181f86507a 100644
--- a/tcg/tcg-internal.h
+++ b/tcg/tcg-internal.h
@@ -27,6 +27,7 @@
 
 #define TCG_HIGHWATER 1024
 
+extern TCGContext tcg_init_ctx;
 extern TCGContext **tcg_ctxs;
 extern unsigned int tcg_cur_ctxs;
 extern unsigned int tcg_max_ctxs;
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 59609d62d5..7929a7e320 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -218,9 +218,6 @@ static int v_l2_levels;
 
 static void *l1_map[V_L1_MAX_SIZE];
 
-/* code generation context */
-TCGContext tcg_init_ctx;
-__thread TCGContext *tcg_ctx;
 TBContext tb_ctx;
 
 static void page_table_config_init(void)
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 4bb35b455b..81da553244 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -149,6 +149,9 @@ static bool tcg_target_const_match(int64_t val, TCGType type, int ct);
 static int tcg_out_ldst_finalize(TCGContext *s);
 #endif
 
+TCGContext tcg_init_ctx;
+__thread TCGContext *tcg_ctx;
+
 TCGContext **tcg_ctxs;
 unsigned int tcg_cur_ctxs;
 unsigned int tcg_max_ctxs;
-- 
2.25.1

[PULL 24/34] tcg: Sink qemu_madvise call to common code

2021-06-11 Thread Richard Henderson

Move the call out of the N versions of alloc_code_gen_buffer
and into tcg_region_init.

Reviewed-by: Alex Bennée 
Reviewed-by: Luis Pires 
Signed-off-by: Richard Henderson 
---
 tcg/region.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/tcg/region.c b/tcg/region.c
index e6c80b35b1..2e541cd2bf 100644
--- a/tcg/region.c
+++ b/tcg/region.c
@@ -559,7 +559,6 @@ static int alloc_code_gen_buffer(size_t tb_size, int splitwx, Error **errp)
 error_setg_errno(errp, errno, "mprotect of jit buffer");
 return false;
 }
-qemu_madvise(buf, size, QEMU_MADV_HUGEPAGE);
 
 region.start_aligned = buf;
 region.total_size = size;
@@ -635,9 +634,6 @@ static int alloc_code_gen_buffer_anon(size_t size, int prot,
 }
 #endif
 
-/* Request large pages for the buffer.  */
-qemu_madvise(buf, size, QEMU_MADV_HUGEPAGE);
-
 region.start_aligned = buf;
 region.total_size = size;
 return prot;
@@ -687,9 +683,6 @@ static bool alloc_code_gen_buffer_splitwx_memfd(size_t size, Error **errp)
 region.total_size = size;
 tcg_splitwx_diff = buf_rx - buf_rw;
 
-/* Request large pages for the buffer and the splitwx.  */
-qemu_madvise(buf_rw, size, QEMU_MADV_HUGEPAGE);
-qemu_madvise(buf_rx, size, QEMU_MADV_HUGEPAGE);
 return PROT_READ | PROT_WRITE;
 
  fail_rx:
@@ -857,6 +850,13 @@ void tcg_region_init(size_t tb_size, int splitwx, unsigned max_cpus)
  splitwx, &error_fatal);
 assert(have_prot >= 0);
 
+/* Request large pages for the buffer and the splitwx.  */
+qemu_madvise(region.start_aligned, region.total_size, QEMU_MADV_HUGEPAGE);
+if (tcg_splitwx_diff) {
+qemu_madvise(region.start_aligned + tcg_splitwx_diff,
+ region.total_size, QEMU_MADV_HUGEPAGE);
+}
+
 /*
  * Make region_size a multiple of page_size, using aligned as the start.
  * As a result of this we might end up with a few extra pages at the end of
-- 
2.25.1
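
After this patch, the huge-page request happens once, in tcg_region_init(). The rw view is always advised, and the rx alias is also advised when tcg_splitwx_diff is non-zero. The following is a Linux-only sketch of that logic. It calls madvise(MADV_HUGEPAGE) directly in place of qemu_madvise(QEMU_MADV_HUGEPAGE). demo_request_hugepages() is hypothetical and returns the number of views it advised.

```c
#define _DEFAULT_SOURCE /* exposes MADV_HUGEPAGE and MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Advise the rw view, plus the rx alias when the buffer is split
 * (rx = rw + splitwx_diff).  Huge pages are only a hint, so madvise()
 * failures are ignored.  Returns the number of views advised.
 */
static int demo_request_hugepages(void *rw, size_t size, uintptr_t splitwx_diff)
{
    int views = 1;

    assert(rw != NULL && size != 0);
    (void)madvise(rw, size, MADV_HUGEPAGE);
    if (splitwx_diff) {
        void *rx = (void *)((uintptr_t)rw + splitwx_diff);
        (void)madvise(rx, size, MADV_HUGEPAGE);
        views++;
    }
    return views;
}
```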

[PULL 30/34] tcg: Introduce tcg_remove_ops_after

2021-06-11 Thread Richard Henderson

Introduce a function to remove everything emitted
since a given point.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 include/tcg/tcg.h | 10 ++
 tcg/tcg.c | 13 +
 2 files changed, 23 insertions(+)

diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index e95abac9f4..1d056ed0ed 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -1071,6 +1071,16 @@ void tcg_op_remove(TCGContext *s, TCGOp *op);
 TCGOp *tcg_op_insert_before(TCGContext *s, TCGOp *op, TCGOpcode opc);
 TCGOp *tcg_op_insert_after(TCGContext *s, TCGOp *op, TCGOpcode opc);
 
+/**
+ * tcg_remove_ops_after:
+ * @op: target operation
+ *
+ * Discard any opcodes emitted since @op.  Expected usage is to save
+ * a starting point with tcg_last_op(), speculatively emit opcodes,
+ * then decide whether or not to keep those opcodes after the fact.
+ */
+void tcg_remove_ops_after(TCGOp *op);
+
 void tcg_optimize(TCGContext *s);
 
 /* Allocate a new temporary and initialize it with a constant. */
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 81da553244..ca482c2301 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -2083,6 +2083,19 @@ void tcg_op_remove(TCGContext *s, TCGOp *op)
 #endif
 }
 
+void tcg_remove_ops_after(TCGOp *op)
+{
+TCGContext *s = tcg_ctx;
+
+while (true) {
+TCGOp *last = tcg_last_op();
+if (last == op) {
+return;
+}
+tcg_op_remove(s, last);
+}
+}
+
 static TCGOp *tcg_op_alloc(TCGOpcode opc)
 {
 TCGContext *s = tcg_ctx;
-- 
2.25.1
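
The save/speculate/discard pattern described in the tcg_remove_ops_after() comment can be illustrated on a toy list. TCG keeps TCGOps in a QTAILQ. The sketch instead uses a hypothetical circular list with a sentinel, so that "last op" is also a valid mark when the list is empty. As in the real function, the mark must still be on the list. All demo_* names are illustrative only.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct DemoOp {
    int opc;
    struct DemoOp *prev, *next;
} DemoOp;

/* Circular list with a sentinel: head.next is first, head.prev is last. */
typedef struct {
    DemoOp head;
} DemoCtx;

static void demo_init(DemoCtx *s)
{
    s->head.prev = s->head.next = &s->head;
}

/* Returns the sentinel when the list is empty, so it is still a valid mark. */
static DemoOp *demo_last_op(DemoCtx *s)
{
    return s->head.prev;
}

static DemoOp *demo_emit(DemoCtx *s, int opc)
{
    DemoOp *op = malloc(sizeof(*op));

    assert(op != NULL);
    op->opc = opc;
    op->prev = s->head.prev;
    op->next = &s->head;
    s->head.prev->next = op;
    s->head.prev = op;
    return op;
}

static void demo_op_remove(DemoOp *op)
{
    op->prev->next = op->next;
    op->next->prev = op->prev;
    free(op);
}

/* Same shape as tcg_remove_ops_after(): drop the tail until @op is last. */
static void demo_remove_ops_after(DemoCtx *s, DemoOp *op)
{
    while (demo_last_op(s) != op) {
        demo_op_remove(demo_last_op(s));
    }
}

static int demo_count(DemoCtx *s)
{
    int n = 0;

    for (DemoOp *op = s->head.next; op != &s->head; op = op->next) {
        n++;
    }
    return n;
}
```

A caller saves mark = demo_last_op(), emits ops speculatively, and then either keeps them or calls demo_remove_ops_after(s, mark).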

[PULL 22/34] tcg: Allocate code_gen_buffer into struct tcg_region_state

2021-06-11 Thread Richard Henderson

Do not mess around with setting values within tcg_init_ctx.
Put the values into 'region' directly, which is where they
will live for the lifetime of the program.

Reviewed-by: Alex Bennée 
Reviewed-by: Luis Pires 
Signed-off-by: Richard Henderson 
---
 tcg/region.c | 64 ++--
 1 file changed, 27 insertions(+), 37 deletions(-)

diff --git a/tcg/region.c b/tcg/region.c
index 5beba41412..ed7efba4b4 100644
--- a/tcg/region.c
+++ b/tcg/region.c
@@ -70,13 +70,12 @@ static size_t tree_size;
 
 bool in_code_gen_buffer(const void *p)
 {
-const TCGContext *s = &tcg_init_ctx;
 /*
  * Much like it is valid to have a pointer to the byte past the
  * end of an array (so long as you don't dereference it), allow
  * a pointer to the byte past the end of the code gen buffer.
  */
-return (size_t)(p - s->code_gen_buffer) <= s->code_gen_buffer_size;
+return (size_t)(p - region.start_aligned) <= region.total_size;
 }
 
 #ifdef CONFIG_DEBUG_TCG
@@ -562,8 +561,8 @@ static bool alloc_code_gen_buffer(size_t tb_size, int splitwx, Error **errp)
 }
 qemu_madvise(buf, size, QEMU_MADV_HUGEPAGE);
 
-tcg_ctx->code_gen_buffer = buf;
-tcg_ctx->code_gen_buffer_size = size;
+region.start_aligned = buf;
+region.total_size = size;
 return true;
 }
 #elif defined(_WIN32)
@@ -584,8 +583,8 @@ static bool alloc_code_gen_buffer(size_t size, int splitwx, Error **errp)
 return false;
 }
 
-tcg_ctx->code_gen_buffer = buf;
-tcg_ctx->code_gen_buffer_size = size;
+region.start_aligned = buf;
+region.total_size = size;
 return true;
 }
 #else
@@ -637,8 +636,8 @@ static bool alloc_code_gen_buffer_anon(size_t size, int prot,
 /* Request large pages for the buffer.  */
 qemu_madvise(buf, size, QEMU_MADV_HUGEPAGE);
 
-tcg_ctx->code_gen_buffer = buf;
-tcg_ctx->code_gen_buffer_size = size;
+region.start_aligned = buf;
+region.total_size = size;
 return true;
 }
 
@@ -659,8 +658,8 @@ static bool alloc_code_gen_buffer_splitwx_memfd(size_t size, Error **errp)
 return false;
 }
 /* The size of the mapping may have been adjusted. */
-size = tcg_ctx->code_gen_buffer_size;
-buf_rx = tcg_ctx->code_gen_buffer;
+buf_rx = region.start_aligned;
+size = region.total_size;
 #endif
 
 buf_rw = qemu_memfd_alloc("tcg-jit", size, 0, &fd, errp);
@@ -682,8 +681,8 @@ static bool alloc_code_gen_buffer_splitwx_memfd(size_t size, Error **errp)
 #endif
 
 close(fd);
-tcg_ctx->code_gen_buffer = buf_rw;
-tcg_ctx->code_gen_buffer_size = size;
+region.start_aligned = buf_rw;
+region.total_size = size;
 tcg_splitwx_diff = buf_rx - buf_rw;
 
 /* Request large pages for the buffer and the splitwx.  */
@@ -734,7 +733,7 @@ static bool alloc_code_gen_buffer_splitwx_vmremap(size_t size, Error **errp)
 return false;
 }
 
-buf_rw = (mach_vm_address_t)tcg_ctx->code_gen_buffer;
+buf_rw = region.start_aligned;
 buf_rx = 0;
 ret = mach_vm_remap(mach_task_self(),
 &buf_rx,
@@ -846,11 +845,8 @@ static bool alloc_code_gen_buffer(size_t size, int splitwx, Error **errp)
  */
 void tcg_region_init(size_t tb_size, int splitwx, unsigned max_cpus)
 {
-void *buf, *aligned, *end;
-size_t total_size;
 size_t page_size;
 size_t region_size;
-size_t n_regions;
 size_t i;
 bool ok;
 
@@ -858,39 +854,33 @@ void tcg_region_init(size_t tb_size, int splitwx, unsigned max_cpus)
   splitwx, &error_fatal);
 assert(ok);
 
-buf = tcg_init_ctx.code_gen_buffer;
-total_size = tcg_init_ctx.code_gen_buffer_size;
-page_size = qemu_real_host_page_size;
-n_regions = tcg_n_regions(total_size, max_cpus);
-
-/* The first region will be 'aligned - buf' bytes larger than the others */
-aligned = QEMU_ALIGN_PTR_UP(buf, page_size);
-g_assert(aligned < tcg_init_ctx.code_gen_buffer + total_size);
-
 /*
  * Make region_size a multiple of page_size, using aligned as the start.
  * As a result of this we might end up with a few extra pages at the end of
  * the buffer; we will assign those to the last region.
  */
-region_size = (total_size - (aligned - buf)) / n_regions;
+region.n = tcg_n_regions(region.total_size, max_cpus);
+page_size = qemu_real_host_page_size;
+region_size = region.total_size / region.n;
 region_size = QEMU_ALIGN_DOWN(region_size, page_size);
 
 /* A region must have at least 2 pages; one code, one guard */
 g_assert(region_size >= 2 * page_size);
+region.stride = region_size;
+
+/* Reserve space for guard pages. */
+region.size = region_size - page_size;
+region.total_size -= page_size;
+
+/*
+ * The first region will be smaller than the others, via the prologue,
+ * which has yet to be allocated.  For now, the first region begins at
+ * the page boundary.
+ */
+
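
The region arithmetic in the new tcg_region_init() hunk can be checked in isolation. This sketch uses byte counts instead of pointers. It takes the region count n as an input rather than computing it with tcg_n_regions(). DemoRegions and demo_region_layout() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_ALIGN_DOWN(n, m) ((n) / (m) * (m))

typedef struct {
    size_t n;          /* number of regions */
    size_t stride;     /* region size including its guard page */
    size_t size;       /* usable bytes per region */
    size_t total_size; /* buffer size minus one guard page */
} DemoRegions;

static DemoRegions demo_region_layout(size_t total, size_t n, size_t page)
{
    DemoRegions r;
    size_t region_size;

    assert(n > 0 && page > 0);
    region_size = DEMO_ALIGN_DOWN(total / n, page);
    /* A region must have at least 2 pages; one code, one guard. */
    assert(region_size >= 2 * page);

    r.n = n;
    r.stride = region_size;
    r.size = region_size - page;   /* reserve the guard page */
    r.total_size = total - page;
    return r;
}
```

For example, take an 11-page buffer split three ways with 4 KiB pages. The stride is 12288 bytes, so the three strides cover 36864 bytes. The last region then picks up the remaining 4096 bytes up to total_size (40960).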

[PULL 21/34] tcg: Move in_code_gen_buffer and tests to region.c

2021-06-11 Thread Richard Henderson

Shortly, the full code_gen_buffer will only be visible
to region.c, so move in_code_gen_buffer out-of-line.

Move the debugging versions of tcg_splitwx_to_{rx,rw}
to region.c as well, so that the compiler gets to see
the implementation of in_code_gen_buffer.

This leaves exactly one use of in_code_gen_buffer outside
of region.c, in cpu_restore_state, which, being on the
exception path, is not performance critical.

Reviewed-by: Alex Bennée 
Reviewed-by: Luis Pires 
Signed-off-by: Richard Henderson 
---
 include/tcg/tcg.h | 11 +--
 tcg/region.c  | 34 ++
 tcg/tcg.c | 23 ---
 3 files changed, 35 insertions(+), 33 deletions(-)

diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index 2316a64139..f48b5aa166 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -695,16 +695,7 @@ extern const void *tcg_code_gen_epilogue;
 extern uintptr_t tcg_splitwx_diff;
 extern TCGv_env cpu_env;
 
-static inline bool in_code_gen_buffer(const void *p)
-{
-const TCGContext *s = &tcg_init_ctx;
-/*
- * Much like it is valid to have a pointer to the byte past the
- * end of an array (so long as you don't dereference it), allow
- * a pointer to the byte past the end of the code gen buffer.
- */
-return (size_t)(p - s->code_gen_buffer) <= s->code_gen_buffer_size;
-}
+bool in_code_gen_buffer(const void *p);
 
 #ifdef CONFIG_DEBUG_TCG
 const void *tcg_splitwx_to_rx(void *rw);
diff --git a/tcg/region.c b/tcg/region.c
index 445a278702..5beba41412 100644
--- a/tcg/region.c
+++ b/tcg/region.c
@@ -68,6 +68,40 @@ static struct tcg_region_state region;
 static void *region_trees;
 static size_t tree_size;
 
+bool in_code_gen_buffer(const void *p)
+{
+const TCGContext *s = &tcg_init_ctx;
+/*
+ * Much like it is valid to have a pointer to the byte past the
+ * end of an array (so long as you don't dereference it), allow
+ * a pointer to the byte past the end of the code gen buffer.
+ */
+return (size_t)(p - s->code_gen_buffer) <= s->code_gen_buffer_size;
+}
+
+#ifdef CONFIG_DEBUG_TCG
+const void *tcg_splitwx_to_rx(void *rw)
+{
+/* Pass NULL pointers unchanged. */
+if (rw) {
+g_assert(in_code_gen_buffer(rw));
+rw += tcg_splitwx_diff;
+}
+return rw;
+}
+
+void *tcg_splitwx_to_rw(const void *rx)
+{
+/* Pass NULL pointers unchanged. */
+if (rx) {
+rx -= tcg_splitwx_diff;
+/* Assert that we end with a pointer in the rw region. */
+g_assert(in_code_gen_buffer(rx));
+}
+return (void *)rx;
+}
+#endif /* CONFIG_DEBUG_TCG */
+
 /* compare a pointer @ptr and a tb_tc @s */
 static int ptr_cmp_tb_tc(const void *ptr, const struct tb_tc *s)
 {
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 9880d5205e..4bb35b455b 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -416,29 +416,6 @@ static const TCGTargetOpDef constraint_sets[] = {
 
 #include "tcg-target.c.inc"
 
-#ifdef CONFIG_DEBUG_TCG
-const void *tcg_splitwx_to_rx(void *rw)
-{
-/* Pass NULL pointers unchanged. */
-if (rw) {
-g_assert(in_code_gen_buffer(rw));
-rw += tcg_splitwx_diff;
-}
-return rw;
-}
-
-void *tcg_splitwx_to_rw(const void *rx)
-{
-/* Pass NULL pointers unchanged. */
-if (rx) {
-rx -= tcg_splitwx_diff;
-/* Assert that we end with a pointer in the rw region. */
-g_assert(in_code_gen_buffer(rx));
-}
-return (void *)rx;
-}
-#endif /* CONFIG_DEBUG_TCG */
-
 static void alloc_tcg_plugin_context(TCGContext *s)
 {
 #ifdef CONFIG_PLUGIN
-- 
2.25.1
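
in_code_gen_buffer() relies on unsigned wrap-around. When p is below the buffer start, the subtraction yields a huge unsigned value. A single <= therefore checks both bounds and still admits the one-past-the-end pointer. Below is a standalone sketch with a hypothetical static buffer. It subtracts uintptr_t values rather than using GCC's void-pointer arithmetic as QEMU does. It also mirrors the NULL pass-through of the debug tcg_splitwx_to_rx().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical buffer standing in for the code_gen_buffer. */
static char demo_buf[4096];

static bool demo_in_buffer(const void *p)
{
    /* p < start wraps to a huge value; p == start + size is accepted. */
    return (uintptr_t)p - (uintptr_t)demo_buf <= sizeof(demo_buf);
}

/* Split-wx offset; 0 means the rw and rx views are the same mapping. */
static uintptr_t demo_splitwx_diff;

/* Like the debug tcg_splitwx_to_rx(): NULL passes through unchanged. */
static const void *demo_splitwx_to_rx(void *rw)
{
    if (rw == NULL) {
        return NULL;
    }
    assert(demo_in_buffer(rw));
    return (const void *)((uintptr_t)rw + demo_splitwx_diff);
}
```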

[PULL 28/34] tcg: When allocating for !splitwx, begin with PROT_NONE

2021-06-11 Thread Richard Henderson

There's a change in mprotect() behaviour [1] in the latest macOS
on M1 and it's not yet clear if it's going to be fixed by Apple.

In this case, instead of changing permissions of N guard pages,
we change permissions of N rwx regions.  The same number of
syscalls are required either way.

[1] https://gist.github.com/hikalium/75ae822466ee4da13cbbe486498a191f

Reviewed-by: Luis Pires 
Signed-off-by: Richard Henderson 
---
 tcg/region.c | 19 +--
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/tcg/region.c b/tcg/region.c
index 294cbd8e65..0f6808afdb 100644
--- a/tcg/region.c
+++ b/tcg/region.c
@@ -770,12 +770,15 @@ static int alloc_code_gen_buffer(size_t size, int splitwx, Error **errp)
 error_free_or_abort(errp);
 }
 
-prot = PROT_READ | PROT_WRITE | PROT_EXEC;
+/*
+ * macOS 11.2 has a bug (Apple Feedback FB8994773) in which mprotect
+ * rejects a permission change from RWX -> NONE when reserving the
+ * guard pages later.  We can go the other way with the same number
+ * of syscalls, so always begin with PROT_NONE.
+ */
+prot = PROT_NONE;
 flags = MAP_PRIVATE | MAP_ANONYMOUS;
-#ifdef CONFIG_TCG_INTERPRETER
-/* The tcg interpreter does not need execute permission. */
-prot = PROT_READ | PROT_WRITE;
-#elif defined(CONFIG_DARWIN)
+#ifdef CONFIG_DARWIN
 /* Applicable to both iOS and macOS (Apple Silicon). */
 if (!splitwx) {
 flags |= MAP_JIT;
@@ -906,11 +909,7 @@ void tcg_region_init(size_t tb_size, int splitwx, unsigned max_cpus)
 }
 }
 if (have_prot != 0) {
-/*
- * macOS 11.2 has a bug (Apple Feedback FB8994773) in which mprotect
- * rejects a permission change from RWX -> NONE.  Guard pages are
- * nice for bug detection but are not essential; ignore any failure.
- */
+/* Guard pages are nice for bug detection but are not essential. */
 (void)qemu_mprotect_none(end, page_size);
 }
 }
-- 
2.25.1
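
Starting from PROT_NONE means every later mprotect() only adds permissions. Each region's code pages are opened up, and its trailing guard page is never touched, so no RWX -> NONE change is ever requested. The following POSIX sketch grants read/write only. demo_alloc_regions() is hypothetical. It assumes each region is `stride` bytes with its last page as the guard, which matches the region.size / region.stride split used in these patches.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Reserve n regions of `stride` bytes as PROT_NONE, then grant read/write
 * to all but the last page of each region.  The last page stays PROT_NONE
 * and serves as the guard.  Returns NULL on failure.
 */
static char *demo_alloc_regions(size_t n, size_t stride)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    char *buf;

    assert(n > 0 && stride >= 2 * page && stride % page == 0);
    buf = mmap(NULL, n * stride, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) {
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        /* Only ever widen permissions: NONE -> READ|WRITE. */
        if (mprotect(buf + i * stride, stride - page,
                     PROT_READ | PROT_WRITE) != 0) {
            munmap(buf, n * stride);
            return NULL;
        }
    }
    return buf;
}
```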

[PULL 18/34] tcg: Rename region.start to region.after_prologue

2021-06-11 Thread Richard Henderson

Give the field a name reflecting its actual meaning.

Reviewed-by: Luis Pires 
Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 tcg/region.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/tcg/region.c b/tcg/region.c
index 7a34c96d74..b143eaf69c 100644
--- a/tcg/region.c
+++ b/tcg/region.c
@@ -46,8 +46,8 @@ struct tcg_region_state {
 QemuMutex lock;
 
 /* fields set at init time */
-void *start;
 void *start_aligned;
+void *after_prologue;
 size_t n;
 size_t size; /* size of one region */
 size_t stride; /* .size + guard size */
@@ -276,7 +276,7 @@ static void tcg_region_bounds(size_t curr_region, void **pstart, void **pend)
 end = start + region.size;
 
 if (curr_region == 0) {
-start = region.start;
+start = region.after_prologue;
 }
 /* The final region may have a few extra pages due to earlier rounding. */
 if (curr_region == region.n - 1) {
@@ -855,7 +855,7 @@ void tcg_region_init(size_t tb_size, int splitwx, unsigned max_cpus)
 region.n = n_regions;
 region.size = region_size - page_size;
 region.stride = region_size;
-region.start = buf;
+region.after_prologue = buf;
 region.start_aligned = aligned;
 /* page-align the end, since its last page will be a guard page */
 end = QEMU_ALIGN_PTR_DOWN(buf + total_size, page_size);
@@ -895,15 +895,16 @@ void tcg_region_init(size_t tb_size, int splitwx, unsigned max_cpus)
 void tcg_region_prologue_set(TCGContext *s)
 {
 /* Deduct the prologue from the first region.  */
-g_assert(region.start == s->code_gen_buffer);
-region.start = s->code_ptr;
+g_assert(region.start_aligned == s->code_gen_buffer);
+region.after_prologue = s->code_ptr;
 
 /* Recompute boundaries of the first region. */
 tcg_region_assign(s, 0);
 
 /* Register the balance of the buffer with gdb. */
-tcg_register_jit(tcg_splitwx_to_rx(region.start),
- region.start_aligned + region.total_size - region.start);
+tcg_register_jit(tcg_splitwx_to_rx(region.after_prologue),
+ region.start_aligned + region.total_size -
+ region.after_prologue);
 }
 
 /*
-- 
2.25.1

[PULL 26/34] tcg: Round the tb_size default from qemu_get_host_physmem

2021-06-11 Thread Richard Henderson

If qemu_get_host_physmem returns an odd number of pages,
then physmem / 8 will not be a multiple of the page size.

The following was observed on a gitlab runner:

ERROR qtest-arm/boot-serial-test - Bail out!
ERROR:../util/osdep.c:80:qemu_mprotect__osdep: \
  assertion failed: (!(size & ~qemu_real_host_page_mask))

Reviewed-by: Alex Bennée 
Reviewed-by: Luis Pires 
Signed-off-by: Richard Henderson 
---
 tcg/region.c | 47 +--
 1 file changed, 21 insertions(+), 26 deletions(-)

diff --git a/tcg/region.c b/tcg/region.c
index 2e541cd2bf..e1790ce1e4 100644
--- a/tcg/region.c
+++ b/tcg/region.c
@@ -470,26 +470,6 @@ static size_t tcg_n_regions(size_t tb_size, unsigned max_cpus)
   (DEFAULT_CODE_GEN_BUFFER_SIZE_1 < MAX_CODE_GEN_BUFFER_SIZE \
? DEFAULT_CODE_GEN_BUFFER_SIZE_1 : MAX_CODE_GEN_BUFFER_SIZE)
 
-static size_t size_code_gen_buffer(size_t tb_size)
-{
-/* Size the buffer.  */
-if (tb_size == 0) {
-size_t phys_mem = qemu_get_host_physmem();
-if (phys_mem == 0) {
-tb_size = DEFAULT_CODE_GEN_BUFFER_SIZE;
-} else {
-tb_size = MIN(DEFAULT_CODE_GEN_BUFFER_SIZE, phys_mem / 8);
-}
-}
-if (tb_size < MIN_CODE_GEN_BUFFER_SIZE) {
-tb_size = MIN_CODE_GEN_BUFFER_SIZE;
-}
-if (tb_size > MAX_CODE_GEN_BUFFER_SIZE) {
-tb_size = MAX_CODE_GEN_BUFFER_SIZE;
-}
-return tb_size;
-}
-
 #ifdef __mips__
 /*
  * In order to use J and JAL within the code_gen_buffer, we require
@@ -841,13 +821,29 @@ static int alloc_code_gen_buffer(size_t size, int splitwx, Error **errp)
  */
 void tcg_region_init(size_t tb_size, int splitwx, unsigned max_cpus)
 {
-size_t page_size;
+const size_t page_size = qemu_real_host_page_size;
 size_t region_size;
 size_t i;
 int have_prot;
 
-have_prot = alloc_code_gen_buffer(size_code_gen_buffer(tb_size),
-  splitwx, &error_fatal);
+/* Size the buffer.  */
+if (tb_size == 0) {
+size_t phys_mem = qemu_get_host_physmem();
+if (phys_mem == 0) {
+tb_size = DEFAULT_CODE_GEN_BUFFER_SIZE;
+} else {
+tb_size = QEMU_ALIGN_DOWN(phys_mem / 8, page_size);
+tb_size = MIN(DEFAULT_CODE_GEN_BUFFER_SIZE, tb_size);
+}
+}
+if (tb_size < MIN_CODE_GEN_BUFFER_SIZE) {
+tb_size = MIN_CODE_GEN_BUFFER_SIZE;
+}
+if (tb_size > MAX_CODE_GEN_BUFFER_SIZE) {
+tb_size = MAX_CODE_GEN_BUFFER_SIZE;
+}
+
+have_prot = alloc_code_gen_buffer(tb_size, splitwx, &error_fatal);
 assert(have_prot >= 0);
 
 /* Request large pages for the buffer and the splitwx.  */
@@ -862,9 +858,8 @@ void tcg_region_init(size_t tb_size, int splitwx, unsigned max_cpus)
  * As a result of this we might end up with a few extra pages at the end of
  * the buffer; we will assign those to the last region.
  */
-region.n = tcg_n_regions(region.total_size, max_cpus);
-page_size = qemu_real_host_page_size;
-region_size = region.total_size / region.n;
+region.n = tcg_n_regions(tb_size, max_cpus);
+region_size = tb_size / region.n;
 region_size = QEMU_ALIGN_DOWN(region_size, page_size);
 
 /* A region must have at least 2 pages; one code, one guard */
-- 
2.25.1

[PULL 27/34] tcg: Merge buffer protection and guard page protection

2021-06-11 Thread Richard Henderson

Do not handle protections on a case-by-case basis in the
various alloc_code_gen_buffer instances; do it within a
single loop in tcg_region_init.

Reviewed-by: Alex Bennée 
Reviewed-by: Luis Pires 
Signed-off-by: Richard Henderson 
---
 tcg/region.c | 45 +++--
 1 file changed, 31 insertions(+), 14 deletions(-)

diff --git a/tcg/region.c b/tcg/region.c
index e1790ce1e4..294cbd8e65 100644
--- a/tcg/region.c
+++ b/tcg/region.c
@@ -535,11 +535,6 @@ static int alloc_code_gen_buffer(size_t tb_size, int splitwx, Error **errp)
 }
 #endif
 
-if (qemu_mprotect_rwx(buf, size)) {
-error_setg_errno(errp, errno, "mprotect of jit buffer");
-return false;
-}
-
 region.start_aligned = buf;
 region.total_size = size;
 
@@ -823,8 +818,7 @@ void tcg_region_init(size_t tb_size, int splitwx, unsigned max_cpus)
 {
 const size_t page_size = qemu_real_host_page_size;
 size_t region_size;
-size_t i;
-int have_prot;
+int have_prot, need_prot;
 
 /* Size the buffer.  */
 if (tb_size == 0) {
@@ -884,18 +878,41 @@ void tcg_region_init(size_t tb_size, int splitwx, unsigned max_cpus)
  * Set guard pages in the rw buffer, as that's the one into which
  * buffer overruns could occur.  Do not set guard pages in the rx
  * buffer -- let that one use hugepages throughout.
+ * Work with the page protections set up with the initial mapping.
  */
-for (i = 0; i < region.n; i++) {
+need_prot = PAGE_READ | PAGE_WRITE;
+#ifndef CONFIG_TCG_INTERPRETER
+if (tcg_splitwx_diff == 0) {
+need_prot |= PAGE_EXEC;
+}
+#endif
+for (size_t i = 0, n = region.n; i < n; i++) {
 void *start, *end;
 
 tcg_region_bounds(i, &start, &end);
+if (have_prot != need_prot) {
+int rc;
 
-/*
- * macOS 11.2 has a bug (Apple Feedback FB8994773) in which mprotect
- * rejects a permission change from RWX -> NONE.  Guard pages are
- * nice for bug detection but are not essential; ignore any failure.
- */
-(void)qemu_mprotect_none(end, page_size);
+if (need_prot == (PAGE_READ | PAGE_WRITE | PAGE_EXEC)) {
+rc = qemu_mprotect_rwx(start, end - start);
+} else if (need_prot == (PAGE_READ | PAGE_WRITE)) {
+rc = qemu_mprotect_rw(start, end - start);
+} else {
+g_assert_not_reached();
+}
+if (rc) {
+error_setg_errno(&error_fatal, errno,
+ "mprotect of jit buffer");
+}
+}
+if (have_prot != 0) {
+/*
+ * macOS 11.2 has a bug (Apple Feedback FB8994773) in which mprotect
+ * rejects a permission change from RWX -> NONE.  Guard pages are
+ * nice for bug detection but are not essential; ignore any failure.
+ */
+(void)qemu_mprotect_none(end, page_size);
+}
 }
 
 tcg_region_trees_init();
-- 
2.25.1
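
The protection the new loop requests depends only on two things: whether the buffer is split, and whether TCI is in use. The sketch below expresses that decision as pure functions. DEMO_PAGE_* are hypothetical flags standing in for QEMU's PAGE_READ, PAGE_WRITE and PAGE_EXEC. The interpreter case is passed as a bool instead of the CONFIG_TCG_INTERPRETER ifdef.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum {
    DEMO_PAGE_READ  = 1,
    DEMO_PAGE_WRITE = 2,
    DEMO_PAGE_EXEC  = 4,
};

/*
 * Read/write always; execute only when generated code runs from the same
 * mapping (no split-wx) and the TCG interpreter is not in use.
 */
static int demo_need_prot(uintptr_t splitwx_diff, bool interpreter)
{
    int need = DEMO_PAGE_READ | DEMO_PAGE_WRITE;

    if (!interpreter && splitwx_diff == 0) {
        need |= DEMO_PAGE_EXEC;
    }
    return need;
}

/* Regions are re-protected only when the initial mapping differs. */
static bool demo_must_mprotect(int have_prot, int need_prot)
{
    assert(need_prot == (DEMO_PAGE_READ | DEMO_PAGE_WRITE) ||
           need_prot == (DEMO_PAGE_READ | DEMO_PAGE_WRITE | DEMO_PAGE_EXEC));
    return have_prot != need_prot;
}
```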

[PULL 13/34] accel/tcg: Use MiB in tcg_init_machine

2021-06-11 Thread Richard Henderson

Reviewed-by: Alex Bennée 
Reviewed-by: Luis Pires 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 accel/tcg/tcg-all.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/accel/tcg/tcg-all.c b/accel/tcg/tcg-all.c
index e990180c4b..1ee89902c3 100644
--- a/accel/tcg/tcg-all.c
+++ b/accel/tcg/tcg-all.c
@@ -32,6 +32,7 @@
 #include "qemu/error-report.h"
 #include "qemu/accel.h"
 #include "qapi/qapi-builtin-visit.h"
+#include "qemu/units.h"
 #include "internal.h"
 
 struct TCGState {
@@ -115,7 +116,7 @@ static int tcg_init_machine(MachineState *ms)
 
 page_init();
 tb_htable_init();
-tcg_init(s->tb_size * 1024 * 1024, s->splitwx_enabled);
+tcg_init(s->tb_size * MiB, s->splitwx_enabled);
 
 #if defined(CONFIG_SOFTMMU)
 /*
-- 
2.25.1

[PULL 16/34] tcg: Move MAX_CODE_GEN_BUFFER_SIZE to tcg-target.h

2021-06-11 Thread Richard Henderson

Remove the ifdef ladder and move each define into the
appropriate header file.

Reviewed-by: Luis Pires 
Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 tcg/aarch64/tcg-target.h |  1 +
 tcg/arm/tcg-target.h |  1 +
 tcg/i386/tcg-target.h|  2 ++
 tcg/mips/tcg-target.h|  6 ++
 tcg/ppc/tcg-target.h |  2 ++
 tcg/riscv/tcg-target.h   |  1 +
 tcg/s390/tcg-target.h|  3 +++
 tcg/sparc/tcg-target.h   |  1 +
 tcg/tci/tcg-target.h |  1 +
 tcg/region.c | 33 +
 10 files changed, 23 insertions(+), 28 deletions(-)

diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 5ec30dba25..ef55f7c185 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -15,6 +15,7 @@
 
 #define TCG_TARGET_INSN_UNIT_SIZE  4
 #define TCG_TARGET_TLB_DISPLACEMENT_BITS 24
+#define MAX_CODE_GEN_BUFFER_SIZE  (2 * GiB)
 #undef TCG_TARGET_STACK_GROWSUP
 
 typedef enum {
diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index d6222ba2db..57fd0c0c74 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -60,6 +60,7 @@ extern int arm_arch;
 #undef TCG_TARGET_STACK_GROWSUP
 #define TCG_TARGET_INSN_UNIT_SIZE 4
 #define TCG_TARGET_TLB_DISPLACEMENT_BITS 16
+#define MAX_CODE_GEN_BUFFER_SIZE  UINT32_MAX
 
 typedef enum {
 TCG_REG_R0 = 0,
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index b693d3692d..ac10066c3e 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -31,9 +31,11 @@
 #ifdef __x86_64__
 # define TCG_TARGET_REG_BITS  64
 # define TCG_TARGET_NB_REGS   32
+# define MAX_CODE_GEN_BUFFER_SIZE  (2 * GiB)
 #else
 # define TCG_TARGET_REG_BITS  32
 # define TCG_TARGET_NB_REGS   24
+# define MAX_CODE_GEN_BUFFER_SIZE  UINT32_MAX
 #endif
 
 typedef enum {
diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
index c2c32fb38f..e81e824cab 100644
--- a/tcg/mips/tcg-target.h
+++ b/tcg/mips/tcg-target.h
@@ -39,6 +39,12 @@
 #define TCG_TARGET_TLB_DISPLACEMENT_BITS 16
 #define TCG_TARGET_NB_REGS 32
 
+/*
+ * We have a 256MB branch region, but leave room to make sure the
+ * main executable is also within that region.
+ */
+#define MAX_CODE_GEN_BUFFER_SIZE  (128 * MiB)
+
 typedef enum {
 TCG_REG_ZERO = 0,
 TCG_REG_AT,
diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index d1339afc66..c13ed5640a 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -27,8 +27,10 @@
 
 #ifdef _ARCH_PPC64
 # define TCG_TARGET_REG_BITS  64
+# define MAX_CODE_GEN_BUFFER_SIZE  (2 * GiB)
 #else
 # define TCG_TARGET_REG_BITS  32
+# define MAX_CODE_GEN_BUFFER_SIZE  (32 * MiB)
 #endif
 
 #define TCG_TARGET_NB_REGS 64
diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index 727c8df418..87ea94666b 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -34,6 +34,7 @@
 #define TCG_TARGET_INSN_UNIT_SIZE 4
 #define TCG_TARGET_TLB_DISPLACEMENT_BITS 20
 #define TCG_TARGET_NB_REGS 32
+#define MAX_CODE_GEN_BUFFER_SIZE  ((size_t)-1)
 
 typedef enum {
 TCG_REG_ZERO,
diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
index 641464eea4..b04b72b7eb 100644
--- a/tcg/s390/tcg-target.h
+++ b/tcg/s390/tcg-target.h
@@ -28,6 +28,9 @@
 #define TCG_TARGET_INSN_UNIT_SIZE 2
 #define TCG_TARGET_TLB_DISPLACEMENT_BITS 19
 
+/* We have a +- 4GB range on the branches; leave some slop.  */
+#define MAX_CODE_GEN_BUFFER_SIZE  (3 * GiB)
+
 typedef enum TCGReg {
 TCG_REG_R0 = 0,
 TCG_REG_R1,
diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
index f66f5d07dc..86bb9a2d39 100644
--- a/tcg/sparc/tcg-target.h
+++ b/tcg/sparc/tcg-target.h
@@ -30,6 +30,7 @@
 #define TCG_TARGET_INSN_UNIT_SIZE 4
 #define TCG_TARGET_TLB_DISPLACEMENT_BITS 32
 #define TCG_TARGET_NB_REGS 32
+#define MAX_CODE_GEN_BUFFER_SIZE  (2 * GiB)
 
 typedef enum {
 TCG_REG_G0 = 0,
diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 52af6d8bc5..d0b5f3fa64 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -43,6 +43,7 @@
 #define TCG_TARGET_INTERPRETER 1
 #define TCG_TARGET_INSN_UNIT_SIZE 1
 #define TCG_TARGET_TLB_DISPLACEMENT_BITS 32
+#define MAX_CODE_GEN_BUFFER_SIZE  ((size_t)-1)
 
 #if UINTPTR_MAX == UINT32_MAX
 # define TCG_TARGET_REG_BITS 32
diff --git a/tcg/region.c b/tcg/region.c
index 57069a38ff..13087aa0c9 100644
--- a/tcg/region.c
+++ b/tcg/region.c
@@ -401,37 +401,14 @@ static size_t tcg_n_regions(unsigned max_cpus)
 /*
  * Minimum size of the code gen buffer.  This number is randomly chosen,
  * but not so small that we can't have a fair number of TB's live.
+ *
+ * Maximum size, MAX_CODE_GEN_BUFFER_SIZE, is defined in tcg-target.h.
+ * Unless otherwise indicated, this is constrained by the range of
+ * direct branches on the host cpu, as used by the TCG implementation
+ * of goto_tb.
  */
 #define MIN_CODE_GEN_BUFFER_SIZE (1 * MiB)
 
-/*
- * Maximum size of the code gen buffer we'd like to use.  Unless otherwise
- * indicated, this is constrained by the range of direct branches on the

[PULL 20/34] tcg: Tidy split_cross_256mb

2021-06-11 Thread Richard Henderson

Return output buffer and size via output pointer arguments,
rather than returning size via tcg_ctx->code_gen_buffer_size.

Reviewed-by: Luis Pires 
Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 tcg/region.c | 19 +--
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/tcg/region.c b/tcg/region.c
index 037a01e4ed..445a278702 100644
--- a/tcg/region.c
+++ b/tcg/region.c
@@ -470,9 +470,10 @@ static inline bool cross_256mb(void *addr, size_t size)
 /*
  * We weren't able to allocate a buffer without crossing that boundary,
  * so make do with the larger portion of the buffer that doesn't cross.
- * Returns the new base of the buffer, and adjusts code_gen_buffer_size.
+ * Returns the new base and size of the buffer in *obuf and *osize.
  */
-static inline void *split_cross_256mb(void *buf1, size_t size1)
+static inline void split_cross_256mb(void **obuf, size_t *osize,
+ void *buf1, size_t size1)
 {
 void *buf2 = (void *)(((uintptr_t)buf1 + size1) & ~0x0ffffffful);
 size_t size2 = buf1 + size1 - buf2;
@@ -483,8 +484,8 @@ static inline void *split_cross_256mb(void *buf1, size_t size1)
 buf1 = buf2;
 }
 
-tcg_ctx->code_gen_buffer_size = size1;
-return buf1;
+*obuf = buf1;
+*osize = size1;
 }
 #endif
 
@@ -514,12 +515,10 @@ static bool alloc_code_gen_buffer(size_t tb_size, int splitwx, Error **errp)
 if (size > tb_size) {
 size = QEMU_ALIGN_DOWN(tb_size, qemu_real_host_page_size);
 }
-tcg_ctx->code_gen_buffer_size = size;
 
 #ifdef __mips__
 if (cross_256mb(buf, size)) {
-buf = split_cross_256mb(buf, size);
-size = tcg_ctx->code_gen_buffer_size;
+split_cross_256mb(&buf, &size, buf, size);
 }
 #endif
 
@@ -530,6 +529,7 @@ static bool alloc_code_gen_buffer(size_t tb_size, int splitwx, Error **errp)
 qemu_madvise(buf, size, QEMU_MADV_HUGEPAGE);
 
 tcg_ctx->code_gen_buffer = buf;
+tcg_ctx->code_gen_buffer_size = size;
 return true;
 }
 #elif defined(_WIN32)
@@ -566,7 +566,6 @@ static bool alloc_code_gen_buffer_anon(size_t size, int prot,
  "allocate %zu bytes for jit buffer", size);
 return false;
 }
-tcg_ctx->code_gen_buffer_size = size;
 
 #ifdef __mips__
 if (cross_256mb(buf, size)) {
@@ -588,8 +587,7 @@ static bool alloc_code_gen_buffer_anon(size_t size, int prot,
 /* fallthru */
 default:
 /* Split the original buffer.  Free the smaller half.  */
-buf2 = split_cross_256mb(buf, size);
-size2 = tcg_ctx->code_gen_buffer_size;
+split_cross_256mb(&buf2, &size2, buf, size);
 if (buf == buf2) {
 munmap(buf + size2, size - size2);
 } else {
@@ -606,6 +604,7 @@ static bool alloc_code_gen_buffer_anon(size_t size, int prot,
 qemu_madvise(buf, size, QEMU_MADV_HUGEPAGE);
 
 tcg_ctx->code_gen_buffer = buf;
+tcg_ctx->code_gen_buffer_size = size;
 return true;
 }
 
-- 
2.25.1
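
The 256 MiB boundary logic works on plain addresses, so it can be exercised without allocating anything. The sketch below operates on uintptr_t values. Two steps are not visible in this hunk: size1 = buf2 - buf1, and the comparison that keeps the larger half. They are assumed from the function's comment ("make do with the larger portion of the buffer that doesn't cross"). The demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_256MB_MASK ((uintptr_t)0x0fffffff)

/* True if addr and addr + size lie in different 256 MiB blocks. */
static bool demo_cross_256mb(uintptr_t addr, size_t size)
{
    return ((addr ^ (addr + size)) & ~DEMO_256MB_MASK) != 0;
}

/* Keep whichever side of the boundary is larger; results via out-params. */
static void demo_split_cross_256mb(uintptr_t *obuf, size_t *osize,
                                   uintptr_t buf1, size_t size1)
{
    uintptr_t buf2 = (buf1 + size1) & ~DEMO_256MB_MASK; /* the boundary */
    size_t size2 = buf1 + size1 - buf2;                 /* part above it */

    assert(demo_cross_256mb(buf1, size1));
    size1 = buf2 - buf1;                                /* part below it */
    if (size1 < size2) {
        size1 = size2;
        buf1 = buf2;
    }
    *obuf = buf1;
    *osize = size1;
}
```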

[PULL 23/34] tcg: Return the map protection from alloc_code_gen_buffer

2021-06-11 Thread Richard Henderson

Change the interface from a boolean error indication to a
negative error or a non-negative protection value.  For the moment
this is only an interface change; the new data is not yet used.

Reviewed-by: Alex Bennée 
Reviewed-by: Luis Pires 
Signed-off-by: Richard Henderson 
---
 tcg/region.c | 63 +++-
 1 file changed, 33 insertions(+), 30 deletions(-)

diff --git a/tcg/region.c b/tcg/region.c
index ed7efba4b4..e6c80b35b1 100644
--- a/tcg/region.c
+++ b/tcg/region.c
@@ -526,14 +526,14 @@ static inline void split_cross_256mb(void **obuf, size_t *osize,
 static uint8_t static_code_gen_buffer[DEFAULT_CODE_GEN_BUFFER_SIZE]
 __attribute__((aligned(CODE_GEN_ALIGN)));
 
-static bool alloc_code_gen_buffer(size_t tb_size, int splitwx, Error **errp)
+static int alloc_code_gen_buffer(size_t tb_size, int splitwx, Error **errp)
 {
 void *buf, *end;
 size_t size;
 
 if (splitwx > 0) {
 error_setg(errp, "jit split-wx not supported");
-return false;
+return -1;
 }
 
 /* page-align the beginning and end of the buffer */
@@ -563,16 +563,17 @@ static bool alloc_code_gen_buffer(size_t tb_size, int splitwx, Error **errp)
 
 region.start_aligned = buf;
 region.total_size = size;
-return true;
+
+return PROT_READ | PROT_WRITE;
 }
 #elif defined(_WIN32)
-static bool alloc_code_gen_buffer(size_t size, int splitwx, Error **errp)
+static int alloc_code_gen_buffer(size_t size, int splitwx, Error **errp)
 {
 void *buf;
 
 if (splitwx > 0) {
 error_setg(errp, "jit split-wx not supported");
-return false;
+return -1;
 }
 
 buf = VirtualAlloc(NULL, size, MEM_RESERVE | MEM_COMMIT,
@@ -585,11 +586,12 @@ static bool alloc_code_gen_buffer(size_t size, int splitwx, Error **errp)
 
 region.start_aligned = buf;
 region.total_size = size;
-return true;
+
+return PAGE_READ | PAGE_WRITE | PAGE_EXEC;
 }
 #else
-static bool alloc_code_gen_buffer_anon(size_t size, int prot,
-   int flags, Error **errp)
+static int alloc_code_gen_buffer_anon(size_t size, int prot,
+  int flags, Error **errp)
 {
 void *buf;
 
@@ -597,7 +599,7 @@ static bool alloc_code_gen_buffer_anon(size_t size, int prot,
 if (buf == MAP_FAILED) {
 error_setg_errno(errp, errno,
  "allocate %zu bytes for jit buffer", size);
-return false;
+return -1;
 }
 
 #ifdef __mips__
@@ -638,7 +640,7 @@ static bool alloc_code_gen_buffer_anon(size_t size, int prot,
 
 region.start_aligned = buf;
 region.total_size = size;
-return true;
+return prot;
 }
 
 #ifndef CONFIG_TCG_INTERPRETER
@@ -652,9 +654,9 @@ static bool alloc_code_gen_buffer_splitwx_memfd(size_t size, Error **errp)
 
 #ifdef __mips__
 /* Find space for the RX mapping, vs the 256MiB regions. */
-if (!alloc_code_gen_buffer_anon(size, PROT_NONE,
-MAP_PRIVATE | MAP_ANONYMOUS |
-MAP_NORESERVE, errp)) {
+if (alloc_code_gen_buffer_anon(size, PROT_NONE,
+   MAP_PRIVATE | MAP_ANONYMOUS |
+   MAP_NORESERVE, errp) < 0) {
 return false;
 }
 /* The size of the mapping may have been adjusted. */
@@ -688,7 +690,7 @@ static bool alloc_code_gen_buffer_splitwx_memfd(size_t size, Error **errp)
 /* Request large pages for the buffer and the splitwx.  */
 qemu_madvise(buf_rw, size, QEMU_MADV_HUGEPAGE);
 qemu_madvise(buf_rx, size, QEMU_MADV_HUGEPAGE);
-return true;
+return PROT_READ | PROT_WRITE;
 
  fail_rx:
 error_setg_errno(errp, errno, "failed to map shared memory for execute");
@@ -702,7 +704,7 @@ static bool alloc_code_gen_buffer_splitwx_memfd(size_t size, Error **errp)
 if (fd >= 0) {
 close(fd);
 }
-return false;
+return -1;
 }
 #endif /* CONFIG_POSIX */
 
@@ -721,7 +723,7 @@ extern kern_return_t mach_vm_remap(vm_map_t target_task,
vm_prot_t *max_protection,
vm_inherit_t inheritance);
 
-static bool alloc_code_gen_buffer_splitwx_vmremap(size_t size, Error **errp)
+static int alloc_code_gen_buffer_splitwx_vmremap(size_t size, Error **errp)
 {
 kern_return_t ret;
 mach_vm_address_t buf_rw, buf_rx;
@@ -730,7 +732,7 @@ static bool alloc_code_gen_buffer_splitwx_vmremap(size_t size, Error **errp)
 /* Map the read-write portion via normal anon memory. */
 if (!alloc_code_gen_buffer_anon(size, PROT_READ | PROT_WRITE,
 MAP_PRIVATE | MAP_ANONYMOUS, errp)) {
-return false;
+return -1;
 }
 
 buf_rw = region.start_aligned;
@@ -750,23 +752,23 @@ static bool alloc_code_gen_buffer_splitwx_vmremap(size_t size, Error **errp)
 /* TODO: Convert "ret" to a human readable error message. */

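The new contract (-1 for failure, otherwise the protection that was applied) changes how callers must test the result. Since `PROT_NONE` is 0 on common hosts such as Linux, a successful `PROT_NONE` mapping would look like a failure under the old `!` test, which is presumably why the mips reservation in `alloc_code_gen_buffer_splitwx_memfd` switched to `< 0`. The sketch below contrasts the two tests against a stand-in allocator; `fake_alloc`, `failed_new_style` and `failed_old_style` are hypothetical illustrations, not QEMU functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/mman.h> /* PROT_NONE, PROT_READ, PROT_WRITE */

/* Hypothetical allocator using the new contract: -1 on error, else prot. */
static int fake_alloc(bool fail, int prot)
{
    return fail ? -1 : prot;
}

/* Correct test under the new contract. */
static bool failed_new_style(int ret)
{
    return ret < 0;
}

/* Leftover boolean-style test: treats 0 as failure and -1 as success. */
static bool failed_old_style(int ret)
{
    return !ret;
}
```

By the same reasoning, the `!alloc_code_gen_buffer_anon(...)` check kept in the vmremap hunk would not see a -1 return.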
[PULL 12/34] accel/tcg: Merge tcg_exec_init into tcg_init_machine

2021-06-11 Thread Richard Henderson

There is only one caller, and shortly we will need access
to the MachineState, which tcg_init_machine already has.

Reviewed-by: Luis Pires 
Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 accel/tcg/internal.h  |  2 ++
 include/sysemu/tcg.h  |  2 --
 accel/tcg/tcg-all.c   | 16 +++-
 accel/tcg/translate-all.c | 21 ++---
 bsd-user/main.c   |  2 +-
 5 files changed, 20 insertions(+), 23 deletions(-)

diff --git a/accel/tcg/internal.h b/accel/tcg/internal.h
index e9c145e0fb..881bc1ede0 100644
--- a/accel/tcg/internal.h
+++ b/accel/tcg/internal.h
@@ -16,5 +16,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu, target_ulong pc,
   int cflags);
 
 void QEMU_NORETURN cpu_io_recompile(CPUState *cpu, uintptr_t retaddr);
+void page_init(void);
+void tb_htable_init(void);
 
 #endif /* ACCEL_TCG_INTERNAL_H */
diff --git a/include/sysemu/tcg.h b/include/sysemu/tcg.h
index 00349fb18a..53352450ff 100644
--- a/include/sysemu/tcg.h
+++ b/include/sysemu/tcg.h
@@ -8,8 +8,6 @@
 #ifndef SYSEMU_TCG_H
 #define SYSEMU_TCG_H
 
-void tcg_exec_init(unsigned long tb_size, int splitwx);
-
 #ifdef CONFIG_TCG
 extern bool tcg_allowed;
 #define tcg_enabled() (tcg_allowed)
diff --git a/accel/tcg/tcg-all.c b/accel/tcg/tcg-all.c
index 30d81ff7f5..e990180c4b 100644
--- a/accel/tcg/tcg-all.c
+++ b/accel/tcg/tcg-all.c
@@ -32,6 +32,7 @@
 #include "qemu/error-report.h"
 #include "qemu/accel.h"
 #include "qapi/qapi-builtin-visit.h"
+#include "internal.h"
 
 struct TCGState {
 AccelState parent_obj;
@@ -109,8 +110,21 @@ static int tcg_init_machine(MachineState *ms)
 {
 TCGState *s = TCG_STATE(current_accel());
 
-tcg_exec_init(s->tb_size * 1024 * 1024, s->splitwx_enabled);
+tcg_allowed = true;
 mttcg_enabled = s->mttcg_enabled;
+
+page_init();
+tb_htable_init();
+tcg_init(s->tb_size * 1024 * 1024, s->splitwx_enabled);
+
+#if defined(CONFIG_SOFTMMU)
+/*
+ * There's no guest base to take into account, so go ahead and
+ * initialize the prologue now.
+ */
+tcg_prologue_init(tcg_ctx);
+#endif
+
 return 0;
 }
 
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 4f563b8724..59609d62d5 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -408,7 +408,7 @@ bool cpu_restore_state(CPUState *cpu, uintptr_t host_pc, bool will_exit)
 return false;
 }
 
-static void page_init(void)
+void page_init(void)
 {
 page_size_init();
 page_table_config_init();
@@ -907,30 +907,13 @@ static bool tb_cmp(const void *ap, const void *bp)
 a->page_addr[1] == b->page_addr[1];
 }
 
-static void tb_htable_init(void)
+void tb_htable_init(void)
 {
 unsigned int mode = QHT_MODE_AUTO_RESIZE;
 
 qht_init(&tb_ctx.htable, tb_cmp, CODE_GEN_HTABLE_SIZE, mode);
 }
 
-/* Must be called before using the QEMU cpus. 'tb_size' is the size
-   (in bytes) allocated to the translation buffer. Zero means default
-   size. */
-void tcg_exec_init(unsigned long tb_size, int splitwx)
-{
-tcg_allowed = true;
-page_init();
-tb_htable_init();
-tcg_init(tb_size, splitwx);
-
-#if defined(CONFIG_SOFTMMU)
-/* There's no guest base to take into account, so go ahead and
-   initialize the prologue now.  */
-tcg_prologue_init(tcg_ctx);
-#endif
-}
-
 /* call with @p->lock held */
 static inline void invalidate_page_bitmap(PageDesc *p)
 {
diff --git a/bsd-user/main.c b/bsd-user/main.c
index 270cf2ca70..fe66204b6b 100644
--- a/bsd-user/main.c
+++ b/bsd-user/main.c
@@ -813,7 +813,7 @@ int main(int argc, char **argv)
 envlist_free(envlist);
 
 /*
- * Now that page sizes are configured in tcg_exec_init() we can do
+ * Now that page sizes are configured we can do
  * proper page alignment for guest_base.
  */
 guest_base = HOST_PAGE_ALIGN(guest_base);
-- 
2.25.1

[PULL 17/34] tcg: Replace region.end with region.total_size

2021-06-11 Thread Richard Henderson

A size is easier to work with than an end point,
particularly during initial buffer allocation.

Reviewed-by: Luis Pires 
Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 tcg/region.c | 30 ++
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/tcg/region.c b/tcg/region.c
index 13087aa0c9..7a34c96d74 100644
--- a/tcg/region.c
+++ b/tcg/region.c
@@ -48,10 +48,10 @@ struct tcg_region_state {
 /* fields set at init time */
 void *start;
 void *start_aligned;
-void *end;
 size_t n;
 size_t size; /* size of one region */
 size_t stride; /* .size + guard size */
+size_t total_size; /* size of entire buffer, >= n * stride */
 
 /* fields protected by the lock */
 size_t current; /* current region index */
@@ -278,8 +278,9 @@ static void tcg_region_bounds(size_t curr_region, void **pstart, void **pend)
 if (curr_region == 0) {
 start = region.start;
 }
+/* The final region may have a few extra pages due to earlier rounding. */
 if (curr_region == region.n - 1) {
-end = region.end;
+end = region.start_aligned + region.total_size;
 }
 
 *pstart = start;
@@ -817,8 +818,8 @@ static bool alloc_code_gen_buffer(size_t size, int splitwx, Error **errp)
  */
 void tcg_region_init(size_t tb_size, int splitwx, unsigned max_cpus)
 {
-void *buf, *aligned;
-size_t size;
+void *buf, *aligned, *end;
+size_t total_size;
 size_t page_size;
 size_t region_size;
 size_t n_regions;
@@ -830,19 +831,20 @@ void tcg_region_init(size_t tb_size, int splitwx, unsigned max_cpus)
 assert(ok);
 
 buf = tcg_init_ctx.code_gen_buffer;
-size = tcg_init_ctx.code_gen_buffer_size;
+total_size = tcg_init_ctx.code_gen_buffer_size;
 page_size = qemu_real_host_page_size;
 n_regions = tcg_n_regions(max_cpus);
 
 /* The first region will be 'aligned - buf' bytes larger than the others */
 aligned = QEMU_ALIGN_PTR_UP(buf, page_size);
-g_assert(aligned < tcg_init_ctx.code_gen_buffer + size);
+g_assert(aligned < tcg_init_ctx.code_gen_buffer + total_size);
+
 /*
  * Make region_size a multiple of page_size, using aligned as the start.
  * As a result of this we might end up with a few extra pages at the end of
  * the buffer; we will assign those to the last region.
  */
-region_size = (size - (aligned - buf)) / n_regions;
+region_size = (total_size - (aligned - buf)) / n_regions;
 region_size = QEMU_ALIGN_DOWN(region_size, page_size);
 
 /* A region must have at least 2 pages; one code, one guard */
@@ -856,9 +858,11 @@ void tcg_region_init(size_t tb_size, int splitwx, unsigned max_cpus)
 region.start = buf;
 region.start_aligned = aligned;
 /* page-align the end, since its last page will be a guard page */
-region.end = QEMU_ALIGN_PTR_DOWN(buf + size, page_size);
+end = QEMU_ALIGN_PTR_DOWN(buf + total_size, page_size);
 /* account for that last guard page */
-region.end -= page_size;
+end -= page_size;
+total_size = end - aligned;
+region.total_size = total_size;
 
 /*
  * Set guard pages in the rw buffer, as that's the one into which
@@ -899,7 +903,7 @@ void tcg_region_prologue_set(TCGContext *s)
 
 /* Register the balance of the buffer with gdb. */
 tcg_register_jit(tcg_splitwx_to_rx(region.start),
- region.end - region.start);
+ region.start_aligned + region.total_size - region.start);
 }
 
 /*
@@ -940,8 +944,10 @@ size_t tcg_code_capacity(void)
 
 /* no need for synchronization; these variables are set at init time */
 guard_size = region.stride - region.size;
-capacity = region.end + guard_size - region.start;
-capacity -= region.n * (guard_size + TCG_HIGHWATER);
+capacity = region.total_size;
+capacity -= (region.n - 1) * guard_size;
+capacity -= region.n * TCG_HIGHWATER;
+
 return capacity;
 }
 
-- 
2.25.1
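With `total_size` as the single source of truth, region bounds and capacity are all derived from `start_aligned`, `stride`, `size` and `total_size`. The sketch below replays that arithmetic on plain integers so the numbers can be checked by hand. It assumes one guard page per region (`size = stride - page_size`), consistent with the "one code, one guard" comment; `struct region_sketch`, `plan_regions`, `region_bounds` and `code_capacity` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TCG_HIGHWATER 1024 /* value from tcg/tcg-internal.h */

/* Hypothetical mirror of the init-time fields of struct tcg_region_state. */
struct region_sketch {
    uintptr_t start, start_aligned;
    size_t n, size, stride, total_size;
};

static uintptr_t align_up(uintptr_t x, size_t a)   { return (x + a - 1) / a * a; }
static uintptr_t align_down(uintptr_t x, size_t a) { return x / a * a; }

/* Same steps as tcg_region_init after this patch, one guard page per region. */
static struct region_sketch plan_regions(uintptr_t buf, size_t buf_size,
                                         size_t page_size, size_t n_regions)
{
    struct region_sketch r;
    uintptr_t aligned = align_up(buf, page_size);
    uintptr_t end;
    size_t region_size = (buf_size - (aligned - buf)) / n_regions;

    region_size = align_down(region_size, page_size);
    assert(region_size >= 2 * page_size); /* one code page, one guard page */

    r.start = buf;
    r.start_aligned = aligned;
    r.n = n_regions;
    r.stride = region_size;
    r.size = region_size - page_size;

    /* page-align the end, then account for the last guard page */
    end = align_down(buf + buf_size, page_size) - page_size;
    r.total_size = end - aligned;
    return r;
}

/* Like tcg_region_bounds: region 0 starts at the unaligned buffer start,
 * and the last region absorbs any leftover pages. */
static void region_bounds(const struct region_sketch *r, size_t i,
                          uintptr_t *pstart, uintptr_t *pend)
{
    uintptr_t start = r->start_aligned + i * r->stride;
    uintptr_t end = start + r->size;

    if (i == 0) {
        start = r->start;
    }
    if (i == r->n - 1) {
        end = r->start_aligned + r->total_size;
    }
    *pstart = start;
    *pend = end;
}

/* Like tcg_code_capacity: the last guard page is already excluded from
 * total_size, and TCG_HIGHWATER bytes are reserved in every region. */
static size_t code_capacity(const struct region_sketch *r)
{
    size_t guard_size = r->stride - r->size;
    size_t capacity = r->total_size;

    capacity -= (r->n - 1) * guard_size;
    capacity -= r->n * TCG_HIGHWATER;
    return capacity;
}
```

With a 1 MiB buffer starting 0x100 bytes past a page boundary, 4 KiB pages and 4 regions, each stride is 63 pages, the last region picks up three extra pages from rounding, and the capacity works out to 1,024,000 bytes.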

[PULL 14/34] accel/tcg: Pass down max_cpus to tcg_init

2021-06-11 Thread Richard Henderson

Start removing the include of hw/boards.h from tcg/.
Pass down the max_cpus value from tcg_init_machine,
where we have the MachineState already.

Reviewed-by: Luis Pires 
Reviewed-by: Alex Bennée 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 include/tcg/tcg.h   |  2 +-
 tcg/tcg-internal.h  |  2 +-
 accel/tcg/tcg-all.c | 10 +-
 tcg/region.c| 32 +++-
 tcg/tcg.c   | 10 --
 5 files changed, 26 insertions(+), 30 deletions(-)

diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index b3304ce095..2316a64139 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -905,7 +905,7 @@ static inline void *tcg_malloc(int size)
 }
 }
 
-void tcg_init(size_t tb_size, int splitwx);
+void tcg_init(size_t tb_size, int splitwx, unsigned max_cpus);
 void tcg_register_thread(void);
 void tcg_prologue_init(TCGContext *s);
 void tcg_func_start(TCGContext *s);
diff --git a/tcg/tcg-internal.h b/tcg/tcg-internal.h
index f13c564d9b..fcfeca232f 100644
--- a/tcg/tcg-internal.h
+++ b/tcg/tcg-internal.h
@@ -30,7 +30,7 @@
 extern TCGContext **tcg_ctxs;
 extern unsigned int n_tcg_ctxs;
 
-void tcg_region_init(size_t tb_size, int splitwx);
+void tcg_region_init(size_t tb_size, int splitwx, unsigned max_cpus);
 bool tcg_region_alloc(TCGContext *s);
 void tcg_region_initial_alloc(TCGContext *s);
 void tcg_region_prologue_set(TCGContext *s);
diff --git a/accel/tcg/tcg-all.c b/accel/tcg/tcg-all.c
index 1ee89902c3..00803f76d8 100644
--- a/accel/tcg/tcg-all.c
+++ b/accel/tcg/tcg-all.c
@@ -33,6 +33,9 @@
 #include "qemu/accel.h"
 #include "qapi/qapi-builtin-visit.h"
 #include "qemu/units.h"
+#if !defined(CONFIG_USER_ONLY)
+#include "hw/boards.h"
+#endif
 #include "internal.h"
 
 struct TCGState {
@@ -110,13 +113,18 @@ bool mttcg_enabled;
 static int tcg_init_machine(MachineState *ms)
 {
 TCGState *s = TCG_STATE(current_accel());
+#ifdef CONFIG_USER_ONLY
+unsigned max_cpus = 1;
+#else
+unsigned max_cpus = ms->smp.max_cpus;
+#endif
 
 tcg_allowed = true;
 mttcg_enabled = s->mttcg_enabled;
 
 page_init();
 tb_htable_init();
-tcg_init(s->tb_size * MiB, s->splitwx_enabled);
+tcg_init(s->tb_size * MiB, s->splitwx_enabled, max_cpus);
 
 #if defined(CONFIG_SOFTMMU)
 /*
diff --git a/tcg/region.c b/tcg/region.c
index 162b4d6486..877baf16f5 100644
--- a/tcg/region.c
+++ b/tcg/region.c
@@ -27,9 +27,6 @@
 #include "qapi/error.h"
 #include "exec/exec-all.h"
 #include "tcg/tcg.h"
-#if !defined(CONFIG_USER_ONLY)
-#include "hw/boards.h"
-#endif
 #include "tcg-internal.h"
 
 
@@ -366,27 +363,20 @@ void tcg_region_reset_all(void)
 tcg_region_tree_reset_all();
 }
 
+static size_t tcg_n_regions(unsigned max_cpus)
+{
 #ifdef CONFIG_USER_ONLY
-static size_t tcg_n_regions(void)
-{
 return 1;
-}
 #else
-/*
- * It is likely that some vCPUs will translate more code than others, so we
- * first try to set more regions than max_cpus, with those regions being of
- * reasonable size. If that's not possible we make do by evenly dividing
- * the code_gen_buffer among the vCPUs.
- */
-static size_t tcg_n_regions(void)
-{
+/*
+ * It is likely that some vCPUs will translate more code than others,
+ * so we first try to set more regions than max_cpus, with those regions
+ * being of reasonable size. If that's not possible we make do by evenly
+ * dividing the code_gen_buffer among the vCPUs.
+ */
 size_t i;
 
 /* Use a single region if all we have is one vCPU thread */
-#if !defined(CONFIG_USER_ONLY)
-MachineState *ms = MACHINE(qdev_get_machine());
-unsigned int max_cpus = ms->smp.max_cpus;
-#endif
 if (max_cpus == 1 || !qemu_tcg_mttcg_enabled()) {
 return 1;
 }
@@ -405,8 +395,8 @@ static size_t tcg_n_regions(void)
 }
 /* If we can't, then just allocate one region per vCPU thread */
 return max_cpus;
-}
 #endif
+}
 
 /*
  * Minimum size of the code gen buffer.  This number is randomly chosen,
@@ -848,7 +838,7 @@ static bool alloc_code_gen_buffer(size_t size, int splitwx, Error **errp)
  * in practice. Multi-threaded guests share most if not all of their translated
  * code, which makes parallel code generation less appealing than in softmmu.
  */
-void tcg_region_init(size_t tb_size, int splitwx)
+void tcg_region_init(size_t tb_size, int splitwx, unsigned max_cpus)
 {
 void *buf, *aligned;
 size_t size;
@@ -865,7 +855,7 @@ void tcg_region_init(size_t tb_size, int splitwx)
 buf = tcg_init_ctx.code_gen_buffer;
 size = tcg_init_ctx.code_gen_buffer_size;
 page_size = qemu_real_host_page_size;
-n_regions = tcg_n_regions();
+n_regions = tcg_n_regions(max_cpus);
 
 /* The first region will be 'aligned - buf' bytes larger than the others */
 aligned = QEMU_ALIGN_PTR_UP(buf, page_size);
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 2625d9e502..5cc384e205 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -576,7 +576,7 @@ static void process_op_defs(TCGContext

[PULL 07/34] tcg: Split out region.c

2021-06-11 Thread Richard Henderson

Reviewed-by: Luis Pires 
Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 tcg/tcg-internal.h |  37 +++
 tcg/region.c   | 572 +
 tcg/tcg.c  | 547 +--
 tcg/meson.build|   1 +
 4 files changed, 613 insertions(+), 544 deletions(-)
 create mode 100644 tcg/tcg-internal.h
 create mode 100644 tcg/region.c

diff --git a/tcg/tcg-internal.h b/tcg/tcg-internal.h
new file mode 100644
index 00..b1dda343c2
--- /dev/null
+++ b/tcg/tcg-internal.h
@@ -0,0 +1,37 @@
+/*
+ * Internal declarations for Tiny Code Generator for QEMU
+ *
+ * Copyright (c) 2008 Fabrice Bellard
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef TCG_INTERNAL_H
+#define TCG_INTERNAL_H 1
+
+#define TCG_HIGHWATER 1024
+
+extern TCGContext **tcg_ctxs;
+extern unsigned int n_tcg_ctxs;
+
+bool tcg_region_alloc(TCGContext *s);
+void tcg_region_initial_alloc(TCGContext *s);
+void tcg_region_prologue_set(TCGContext *s);
+
+#endif /* TCG_INTERNAL_H */
diff --git a/tcg/region.c b/tcg/region.c
new file mode 100644
index 00..6e34fcf775
--- /dev/null
+++ b/tcg/region.c
@@ -0,0 +1,572 @@
+/*
+ * Memory region management for Tiny Code Generator for QEMU
+ *
+ * Copyright (c) 2008 Fabrice Bellard
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "exec/exec-all.h"
+#include "tcg/tcg.h"
+#if !defined(CONFIG_USER_ONLY)
+#include "hw/boards.h"
+#endif
+#include "tcg-internal.h"
+
+
+struct tcg_region_tree {
+QemuMutex lock;
+GTree *tree;
+/* padding to avoid false sharing is computed at run-time */
+};
+
+/*
+ * We divide code_gen_buffer into equally-sized "regions" that TCG threads
+ * dynamically allocate from as demand dictates. Given appropriate region
+ * sizing, this minimizes flushes even when some TCG threads generate a lot
+ * more code than others.
+ */
+struct tcg_region_state {
+QemuMutex lock;
+
+/* fields set at init time */
+void *start;
+void *start_aligned;
+void *end;
+size_t n;
+size_t size; /* size of one region */
+size_t stride; /* .size + guard size */
+
+/* fields protected by the lock */
+size_t current; /* current region index */
+size_t agg_size_full; /* aggregate size of full regions */
+};
+
+static struct tcg_region_state region;
+
+/*
+ * This is an array of struct tcg_region_tree's, with padding.
+ * We use void * to simplify the computation of region_trees[i]; each
+ * struct is found every tree_size bytes.
+ */
+static void *region_trees;
+static size_t tree_size;
+
+/* compare a pointer @ptr and a tb_tc @s */
+static int ptr_cmp_tb_tc(const void *ptr, const struct tb_tc *s)
+{
+if (ptr >= s->ptr + s->size) {
+return 1;
+} else if (ptr < s->ptr) {
+return -1;
+}
+return 0;
+}
+
+static gint tb_tc_cmp(gconstpointer

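`ptr_cmp_tb_tc` orders a host address against a `[ptr, ptr + size)` range and returns 0 when the address falls inside it, which lets an ordered container of translated-code ranges answer "which block contains this host address?". QEMU feeds that comparison to a GTree; the sketch below swaps in the C library's `bsearch` over a sorted, non-overlapping array so it stays self-contained. `struct tc_range`, `addr_cmp_range` and `lookup_range` are hypothetical stand-ins, and addresses are `uintptr_t` to avoid arithmetic on `void *`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for struct tb_tc: host code start and length. */
struct tc_range {
    uintptr_t ptr;
    size_t size;
};

/* Same ordering rule as ptr_cmp_tb_tc: 0 if addr lies in [ptr, ptr + size). */
static int addr_cmp_range(const void *key, const void *elem)
{
    uintptr_t addr = *(const uintptr_t *)key;
    const struct tc_range *s = elem;

    if (addr >= s->ptr + s->size) {
        return 1;
    } else if (addr < s->ptr) {
        return -1;
    }
    return 0;
}

/* Find the range containing addr; NULL if addr falls in a gap. */
static const struct tc_range *lookup_range(const struct tc_range *ranges,
                                           size_t n, uintptr_t addr)
{
    return bsearch(&addr, ranges, n, sizeof(*ranges), addr_cmp_range);
}
```

Because a key inside a range compares equal to it, the same comparator serves both for inserting whole ranges and for looking up a single address.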
[PULL 19/34] tcg: Tidy tcg_n_regions

2021-06-11 Thread Richard Henderson

Compute the value using straight division and bounds,
rather than a loop.  Pass in tb_size rather than reading
from tcg_init_ctx.code_gen_buffer_size.

Reviewed-by: Luis Pires 
Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 tcg/region.c | 29 -
 1 file changed, 12 insertions(+), 17 deletions(-)

diff --git a/tcg/region.c b/tcg/region.c
index b143eaf69c..037a01e4ed 100644
--- a/tcg/region.c
+++ b/tcg/region.c
@@ -364,38 +364,33 @@ void tcg_region_reset_all(void)
 tcg_region_tree_reset_all();
 }
 
-static size_t tcg_n_regions(unsigned max_cpus)
+static size_t tcg_n_regions(size_t tb_size, unsigned max_cpus)
 {
 #ifdef CONFIG_USER_ONLY
 return 1;
 #else
+size_t n_regions;
+
 /*
  * It is likely that some vCPUs will translate more code than others,
  * so we first try to set more regions than max_cpus, with those regions
  * being of reasonable size. If that's not possible we make do by evenly
  * dividing the code_gen_buffer among the vCPUs.
  */
-size_t i;
-
 /* Use a single region if all we have is one vCPU thread */
 if (max_cpus == 1 || !qemu_tcg_mttcg_enabled()) {
 return 1;
 }
 
-/* Try to have more regions than max_cpus, with each region being >= 2 MB */
-for (i = 8; i > 0; i--) {
-size_t regions_per_thread = i;
-size_t region_size;
-
-region_size = tcg_init_ctx.code_gen_buffer_size;
-region_size /= max_cpus * regions_per_thread;
-
-if (region_size >= 2 * 1024u * 1024) {
-return max_cpus * regions_per_thread;
-}
+/*
+ * Try to have more regions than max_cpus, with each region being >= 2 MB.
+ * If we can't, then just allocate one region per vCPU thread.
+ */
+n_regions = tb_size / (2 * MiB);
+if (n_regions <= max_cpus) {
+return max_cpus;
 }
-/* If we can't, then just allocate one region per vCPU thread */
-return max_cpus;
+return MIN(n_regions, max_cpus * 8);
 #endif
 }
 
@@ -833,7 +828,7 @@ void tcg_region_init(size_t tb_size, int splitwx, unsigned max_cpus)
 buf = tcg_init_ctx.code_gen_buffer;
 total_size = tcg_init_ctx.code_gen_buffer_size;
 page_size = qemu_real_host_page_size;
-n_regions = tcg_n_regions(max_cpus);
+n_regions = tcg_n_regions(total_size, max_cpus);
 
 /* The first region will be 'aligned - buf' bytes larger than the others */
 aligned = QEMU_ALIGN_PTR_UP(buf, page_size);
-- 
2.25.1
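The new `tcg_n_regions` is a closed-form version of the old search: aim for regions of at least 2 MiB, never fewer than one per vCPU, and at most eight per vCPU. The standalone sketch below copies that logic, with the `qemu_tcg_mttcg_enabled()` call replaced by a parameter and local stand-ins (`SKETCH_MiB`, `min_size`) for QEMU's `MiB` and `MIN`; `n_regions_sketch` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MiB ((size_t)1024 * 1024)

static size_t min_size(size_t a, size_t b)
{
    return a < b ? a : b;
}

/* Mirrors tcg_n_regions() after this patch; mttcg stands in for
 * qemu_tcg_mttcg_enabled(). */
static size_t n_regions_sketch(size_t tb_size, unsigned max_cpus, bool mttcg)
{
    size_t n_regions;

    /* Use a single region if all we have is one vCPU thread. */
    if (max_cpus == 1 || !mttcg) {
        return 1;
    }
    /* Aim for regions of at least 2 MiB, but never fewer than one per vCPU. */
    n_regions = tb_size / (2 * SKETCH_MiB);
    if (n_regions <= max_cpus) {
        return max_cpus;
    }
    /* Cap at eight regions per vCPU, the old loop's upper bound. */
    return min_size(n_regions, (size_t)max_cpus * 8);
}
```

One consequence: the old loop only ever returned multiples of `max_cpus` (`max_cpus * i` for i = 8..1), while the division form can return any count between `max_cpus` and `8 * max_cpus`. For a 24 MiB buffer and 8 vCPUs the old loop yields 8 and the new code 12.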

[PULL 09/34] accel/tcg: Move alloc_code_gen_buffer to tcg/region.c

2021-06-11 Thread Richard Henderson

Buffer management is integral to tcg.  Do not leave the allocation
to code outside of tcg/.  This is code movement, with further
cleanups to follow.

Reviewed-by: Luis Pires 
Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 include/tcg/tcg.h |   2 +-
 accel/tcg/translate-all.c | 414 +---
 tcg/region.c  | 431 +-
 3 files changed, 428 insertions(+), 419 deletions(-)

diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index 74cb345308..834785fc23 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -873,7 +873,7 @@ void *tcg_malloc_internal(TCGContext *s, int size);
 void tcg_pool_reset(TCGContext *s);
 TranslationBlock *tcg_tb_alloc(TCGContext *s);
 
-void tcg_region_init(void);
+void tcg_region_init(size_t tb_size, int splitwx);
 void tb_destroy(TranslationBlock *tb);
 void tcg_region_reset_all(void);
 
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 337fbb11fa..ad7a25d9f0 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -18,7 +18,6 @@
  */
 
 #include "qemu/osdep.h"
-#include "qemu/units.h"
 #include "qemu-common.h"
 
 #define NO_CPU_IO_DEFS
@@ -49,7 +48,6 @@
 #include "exec/cputlb.h"
 #include "exec/translate-all.h"
 #include "qemu/bitmap.h"
-#include "qemu/error-report.h"
 #include "qemu/qemu-print.h"
 #include "qemu/timer.h"
 #include "qemu/main-loop.h"
@@ -895,408 +893,6 @@ static void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1,
 }
 }
 
-/* Minimum size of the code gen buffer.  This number is randomly chosen,
-   but not so small that we can't have a fair number of TB's live.  */
-#define MIN_CODE_GEN_BUFFER_SIZE (1 * MiB)
-
-/* Maximum size of the code gen buffer we'd like to use.  Unless otherwise
-   indicated, this is constrained by the range of direct branches on the
-   host cpu, as used by the TCG implementation of goto_tb.  */
-#if defined(__x86_64__)
-# define MAX_CODE_GEN_BUFFER_SIZE  (2 * GiB)
-#elif defined(__sparc__)
-# define MAX_CODE_GEN_BUFFER_SIZE  (2 * GiB)
-#elif defined(__powerpc64__)
-# define MAX_CODE_GEN_BUFFER_SIZE  (2 * GiB)
-#elif defined(__powerpc__)
-# define MAX_CODE_GEN_BUFFER_SIZE  (32 * MiB)
-#elif defined(__aarch64__)
-# define MAX_CODE_GEN_BUFFER_SIZE  (2 * GiB)
-#elif defined(__s390x__)
-  /* We have a +- 4GB range on the branches; leave some slop.  */
-# define MAX_CODE_GEN_BUFFER_SIZE  (3 * GiB)
-#elif defined(__mips__)
-  /* We have a 256MB branch region, but leave room to make sure the
- main executable is also within that region.  */
-# define MAX_CODE_GEN_BUFFER_SIZE  (128 * MiB)
-#else
-# define MAX_CODE_GEN_BUFFER_SIZE  ((size_t)-1)
-#endif
-
-#if TCG_TARGET_REG_BITS == 32
-#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (32 * MiB)
-#ifdef CONFIG_USER_ONLY
-/*
- * For user mode on smaller 32 bit systems we may run into trouble
- * allocating big chunks of data in the right place. On these systems
- * we utilise a static code generation buffer directly in the binary.
- */
-#define USE_STATIC_CODE_GEN_BUFFER
-#endif
-#else /* TCG_TARGET_REG_BITS == 64 */
-#ifdef CONFIG_USER_ONLY
-/*
- * As user-mode emulation typically means running multiple instances
- * of the translator don't go too nuts with our default code gen
- * buffer lest we make things too hard for the OS.
- */
-#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (128 * MiB)
-#else
-/*
- * We expect most system emulation to run one or two guests per host.
- * Users running large scale system emulation may want to tweak their
- * runtime setup via the tb-size control on the command line.
- */
-#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (1 * GiB)
-#endif
-#endif
-
-#define DEFAULT_CODE_GEN_BUFFER_SIZE \
-  (DEFAULT_CODE_GEN_BUFFER_SIZE_1 < MAX_CODE_GEN_BUFFER_SIZE \
-   ? DEFAULT_CODE_GEN_BUFFER_SIZE_1 : MAX_CODE_GEN_BUFFER_SIZE)
-
-static size_t size_code_gen_buffer(size_t tb_size)
-{
-/* Size the buffer.  */
-if (tb_size == 0) {
-size_t phys_mem = qemu_get_host_physmem();
-if (phys_mem == 0) {
-tb_size = DEFAULT_CODE_GEN_BUFFER_SIZE;
-} else {
-tb_size = MIN(DEFAULT_CODE_GEN_BUFFER_SIZE, phys_mem / 8);
-}
-}
-if (tb_size < MIN_CODE_GEN_BUFFER_SIZE) {
-tb_size = MIN_CODE_GEN_BUFFER_SIZE;
-}
-if (tb_size > MAX_CODE_GEN_BUFFER_SIZE) {
-tb_size = MAX_CODE_GEN_BUFFER_SIZE;
-}
-return tb_size;
-}
-
-#ifdef __mips__
-/* In order to use J and JAL within the code_gen_buffer, we require
-   that the buffer not cross a 256MB boundary.  */
-static inline bool cross_256mb(void *addr, size_t size)
-{
-return ((uintptr_t)addr ^ ((uintptr_t)addr + size)) & ~0x0ffffffful;
-}
-
-/* We weren't able to allocate a buffer without crossing that boundary,
-   so make do with the larger portion of the buffer that doesn't cross.
-   Returns the new base of the buffer, and adjusts code_gen_buffer_size.  */
-static inline void *split_cross_256mb(void *buf1, size_t

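`size_code_gen_buffer`, which this patch moves from translate-all.c to tcg/region.c, picks a default of min(DEFAULT_CODE_GEN_BUFFER_SIZE, host RAM / 8) when no tb-size is given, then clamps the result into [MIN_CODE_GEN_BUFFER_SIZE, MAX_CODE_GEN_BUFFER_SIZE]. The sketch below restates it as a pure function. It takes the host memory size as a parameter instead of calling `qemu_get_host_physmem()`, and it assumes the constants the `#if` ladder selects for a 64-bit x86_64 system-emulation build (1 MiB minimum, 2 GiB maximum, 1 GiB default); the `SKETCH_*` macros and the function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MiB ((size_t)1024 * 1024)
#define SKETCH_GiB (1024 * SKETCH_MiB)

/* Values the #if ladder selects for x86_64 system emulation (assumption). */
#define SKETCH_MIN_SIZE     (1 * SKETCH_MiB)
#define SKETCH_MAX_SIZE     (2 * SKETCH_GiB)
#define SKETCH_DEFAULT_SIZE (1 * SKETCH_GiB)

/* tb_size == 0 means "pick a default"; phys_mem == 0 means "unknown". */
static size_t size_code_gen_buffer_sketch(size_t tb_size, size_t phys_mem)
{
    if (tb_size == 0) {
        tb_size = SKETCH_DEFAULT_SIZE;
        if (phys_mem != 0 && phys_mem / 8 < tb_size) {
            tb_size = phys_mem / 8; /* use at most 1/8 of host RAM */
        }
    }
    if (tb_size < SKETCH_MIN_SIZE) {
        tb_size = SKETCH_MIN_SIZE;
    }
    if (tb_size > SKETCH_MAX_SIZE) {
        tb_size = SKETCH_MAX_SIZE;
    }
    return tb_size;
}
```

Under these assumptions a host with 2 GiB of RAM gets a 256 MiB buffer by default, a host whose RAM size cannot be determined gets the full 1 GiB default, and explicit requests are clamped into the 1 MiB to 2 GiB range.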
[PULL 08/34] accel/tcg: Inline cpu_gen_init

2021-06-11 Thread Richard Henderson

It consists of one function call and has only one caller.

Reviewed-by: Luis Pires 
Reviewed-by: Alex Bennée 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 accel/tcg/translate-all.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 04764626bc..337fbb11fa 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -245,11 +245,6 @@ static void page_table_config_init(void)
 assert(v_l2_levels >= 0);
 }
 
-static void cpu_gen_init(void)
-{
-tcg_context_init(&tcg_init_ctx);
-}
-
 /* Encode VAL as a signed leb128 sequence at P.
Return P incremented past the encoded value.  */
 static uint8_t *encode_sleb128(uint8_t *p, target_long val)
@@ -1331,7 +1326,7 @@ void tcg_exec_init(unsigned long tb_size, int splitwx)
 bool ok;
 
 tcg_allowed = true;
-cpu_gen_init();
+tcg_context_init(&tcg_init_ctx);
 page_init();
 tb_htable_init();
 
-- 
2.25.1

[PULL 15/34] tcg: Introduce tcg_max_ctxs

2021-06-11 Thread Richard Henderson

Finish the divorce of tcg/ from hw/, and do not take
the max cpu value from MachineState; just remember what
we were passed in tcg_init.

Reviewed-by: Luis Pires 
Reviewed-by: Alex Bennée 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/tcg-internal.h |  3 ++-
 tcg/region.c   |  6 +++---
 tcg/tcg.c  | 23 ++-
 3 files changed, 15 insertions(+), 17 deletions(-)

diff --git a/tcg/tcg-internal.h b/tcg/tcg-internal.h
index fcfeca232f..f9906523da 100644
--- a/tcg/tcg-internal.h
+++ b/tcg/tcg-internal.h
@@ -28,7 +28,8 @@
 #define TCG_HIGHWATER 1024
 
 extern TCGContext **tcg_ctxs;
-extern unsigned int n_tcg_ctxs;
+extern unsigned int tcg_cur_ctxs;
+extern unsigned int tcg_max_ctxs;
 
 void tcg_region_init(size_t tb_size, int splitwx, unsigned max_cpus);
 bool tcg_region_alloc(TCGContext *s);
diff --git a/tcg/region.c b/tcg/region.c
index 877baf16f5..57069a38ff 100644
--- a/tcg/region.c
+++ b/tcg/region.c
@@ -347,7 +347,7 @@ void tcg_region_initial_alloc(TCGContext *s)
 /* Call from a safe-work context */
 void tcg_region_reset_all(void)
 {
-unsigned int n_ctxs = qatomic_read(&n_tcg_ctxs);
+unsigned int n_ctxs = qatomic_read(&tcg_cur_ctxs);
 unsigned int i;
 
 qemu_mutex_lock(&region.lock);
@@ -934,7 +934,7 @@ void tcg_region_prologue_set(TCGContext *s)
  */
 size_t tcg_code_size(void)
 {
-unsigned int n_ctxs = qatomic_read(&n_tcg_ctxs);
+unsigned int n_ctxs = qatomic_read(&tcg_cur_ctxs);
 unsigned int i;
 size_t total;
 
@@ -970,7 +970,7 @@ size_t tcg_code_capacity(void)
 
 size_t tcg_tb_phys_invalidate_count(void)
 {
-unsigned int n_ctxs = qatomic_read(&n_tcg_ctxs);
+unsigned int n_ctxs = qatomic_read(&tcg_cur_ctxs);
 unsigned int i;
 size_t total = 0;
 
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 5cc384e205..9880d5205e 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -43,11 +43,6 @@
 #define NO_CPU_IO_DEFS
 
 #include "exec/exec-all.h"
-
-#if !defined(CONFIG_USER_ONLY)
-#include "hw/boards.h"
-#endif
-
 #include "tcg/tcg-op.h"
 
 #if UINTPTR_MAX == UINT32_MAX
@@ -155,7 +150,8 @@ static int tcg_out_ldst_finalize(TCGContext *s);
 #endif
 
 TCGContext **tcg_ctxs;
-unsigned int n_tcg_ctxs;
+unsigned int tcg_cur_ctxs;
+unsigned int tcg_max_ctxs;
 TCGv_env cpu_env = 0;
 const void *tcg_code_gen_epilogue;
 uintptr_t tcg_splitwx_diff;
@@ -475,7 +471,6 @@ void tcg_register_thread(void)
 #else
 void tcg_register_thread(void)
 {
-MachineState *ms = MACHINE(qdev_get_machine());
 TCGContext *s = g_malloc(sizeof(*s));
 unsigned int i, n;
 
@@ -491,8 +486,8 @@ void tcg_register_thread(void)
 }
 
 /* Claim an entry in tcg_ctxs */
-n = qatomic_fetch_inc(&n_tcg_ctxs);
-g_assert(n < ms->smp.max_cpus);
+n = qatomic_fetch_inc(&tcg_cur_ctxs);
+g_assert(n < tcg_max_ctxs);
 qatomic_set(&tcg_ctxs[n], s);
 
 if (n > 0) {
@@ -643,9 +638,11 @@ static void tcg_context_init(unsigned max_cpus)
  */
 #ifdef CONFIG_USER_ONLY
 tcg_ctxs = &tcg_ctx;
-n_tcg_ctxs = 1;
+tcg_cur_ctxs = 1;
+tcg_max_ctxs = 1;
 #else
-tcg_ctxs = g_new(TCGContext *, max_cpus);
+tcg_max_ctxs = max_cpus;
+tcg_ctxs = g_new0(TCGContext *, max_cpus);
 #endif
 
 tcg_debug_assert(!tcg_regset_test_reg(s->reserved_regs, TCG_AREG0));
@@ -3937,7 +3934,7 @@ static void tcg_reg_alloc_call(TCGContext *s, TCGOp *op)
 static inline
 void tcg_profile_snapshot(TCGProfile *prof, bool counters, bool table)
 {
-unsigned int n_ctxs = qatomic_read(&n_tcg_ctxs);
+unsigned int n_ctxs = qatomic_read(&tcg_cur_ctxs);
 unsigned int i;
 
 for (i = 0; i < n_ctxs; i++) {
@@ -4000,7 +3997,7 @@ void tcg_dump_op_count(void)
 
 int64_t tcg_cpu_exec_time(void)
 {
-unsigned int n_ctxs = qatomic_read(&n_tcg_ctxs);
+unsigned int n_ctxs = qatomic_read(&tcg_cur_ctxs);
 unsigned int i;
 int64_t ret = 0;
 
-- 
2.25.1
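
A rough sketch of the counter split above, for readers following along: a capacity fixed once at init (the role of tcg_max_ctxs) and a running count that each registering thread bumps atomically (the role of tcg_cur_ctxs), so the bound check no longer needs the MachineState. It uses C11 <stdatomic.h> and assert() in place of QEMU's qatomic_* helpers and g_assert(); struct ctx_registry and the registry_* functions are hypothetical names, not QEMU code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for tcg_ctxs / tcg_cur_ctxs / tcg_max_ctxs. */
struct ctx_registry {
    _Atomic(void *) *slots;  /* like tcg_ctxs: one slot per possible ctx */
    atomic_uint cur;         /* like tcg_cur_ctxs: slots claimed so far */
    unsigned int max;        /* like tcg_max_ctxs: fixed at init time */
};

static void registry_init(struct ctx_registry *r, unsigned int max)
{
    r->max = max;
    r->slots = malloc(max * sizeof(*r->slots));
    assert(r->slots != NULL);
    for (unsigned int i = 0; i < max; i++) {
        atomic_init(&r->slots[i], NULL);  /* cf. g_new0 zero-filling */
    }
    atomic_init(&r->cur, 0);
}

/* Claim the next slot; the bound comes from the stored maximum. */
static unsigned int registry_claim(struct ctx_registry *r, void *ctx)
{
    unsigned int n = atomic_fetch_add(&r->cur, 1);
    assert(n < r->max);
    atomic_store(&r->slots[n], ctx);      /* cf. qatomic_set */
    return n;
}

/* Readers snapshot the count once, as tcg_code_size() does. */
static unsigned int registry_count(struct ctx_registry *r)
{
    return atomic_load(&r->cur);
}
```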

[PULL 03/34] tcg: Re-order tcg_region_init vs tcg_prologue_init

2021-06-11 Thread Richard Henderson

Instead of delaying tcg_region_init until after tcg_prologue_init
is complete, do tcg_region_init first and let tcg_prologue_init
shrink the first region by the size of the generated prologue.

Reviewed-by: Luis Pires 
Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 accel/tcg/tcg-all.c       | 11 -
 accel/tcg/translate-all.c |  3 +++
 bsd-user/main.c           |  1 -
 linux-user/main.c         |  1 -
 tcg/tcg.c                 | 52 ++-
 5 files changed, 22 insertions(+), 46 deletions(-)

diff --git a/accel/tcg/tcg-all.c b/accel/tcg/tcg-all.c
index e378c2db73..f132033999 100644
--- a/accel/tcg/tcg-all.c
+++ b/accel/tcg/tcg-all.c
@@ -111,17 +111,6 @@ static int tcg_init(MachineState *ms)
 
 tcg_exec_init(s->tb_size * 1024 * 1024, s->splitwx_enabled);
 mttcg_enabled = s->mttcg_enabled;
-
-/*
- * Initialize TCG regions only for softmmu.
- *
- * This needs to be done later for user mode, because the prologue
- * generation needs to be delayed so that GUEST_BASE is already set.
- */
-#ifndef CONFIG_USER_ONLY
-tcg_region_init();
-#endif /* !CONFIG_USER_ONLY */
-
 return 0;
 }
 
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 1eefe6ea8d..04764626bc 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1339,6 +1339,9 @@ void tcg_exec_init(unsigned long tb_size, int splitwx)
splitwx, &error_fatal);
 assert(ok);
 
+/* TODO: allocating regions is hand-in-glove with code_gen_buffer. */
+tcg_region_init();
+
 #if defined(CONFIG_SOFTMMU)
 /* There's no guest base to take into account, so go ahead and
initialize the prologue now.  */
diff --git a/bsd-user/main.c b/bsd-user/main.c
index 9d370bc8f6..270cf2ca70 100644
--- a/bsd-user/main.c
+++ b/bsd-user/main.c
@@ -879,7 +879,6 @@ int main(int argc, char **argv)
  * the real value of GUEST_BASE into account.
  */
 tcg_prologue_init(tcg_ctx);
-tcg_region_init();
 
 /* build Task State */
 memset(ts, 0, sizeof(TaskState));
diff --git a/linux-user/main.c b/linux-user/main.c
index 4dfc47ad3b..2fb3a366a6 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -868,7 +868,6 @@ int main(int argc, char **argv, char **envp)
generating the prologue until now so that the prologue can take
the real value of GUEST_BASE into account.  */
 tcg_prologue_init(tcg_ctx);
-tcg_region_init();
 
 target_cpu_copy_regs(env, regs);
 
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 0dc271aac9..1e683b80e4 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1206,32 +1206,18 @@ TranslationBlock *tcg_tb_alloc(TCGContext *s)
 
 void tcg_prologue_init(TCGContext *s)
 {
-size_t prologue_size, total_size;
-void *buf0, *buf1;
+size_t prologue_size;
 
 /* Put the prologue at the beginning of code_gen_buffer.  */
-buf0 = s->code_gen_buffer;
-total_size = s->code_gen_buffer_size;
-s->code_ptr = buf0;
-s->code_buf = buf0;
+tcg_region_assign(s, 0);
+s->code_ptr = s->code_gen_ptr;
+s->code_buf = s->code_gen_ptr;
 s->data_gen_ptr = NULL;
 
-/*
- * The region trees are not yet configured, but tcg_splitwx_to_rx
- * needs the bounds for an assert.
- */
-region.start = buf0;
-region.end = buf0 + total_size;
-
 #ifndef CONFIG_TCG_INTERPRETER
-tcg_qemu_tb_exec = (tcg_prologue_fn *)tcg_splitwx_to_rx(buf0);
+tcg_qemu_tb_exec = (tcg_prologue_fn *)tcg_splitwx_to_rx(s->code_ptr);
 #endif
 
-/* Compute a high-water mark, at which we voluntarily flush the buffer
-   and start over.  The size here is arbitrary, significantly larger
-   than we expect the code generation for any one opcode to require.  */
-s->code_gen_highwater = s->code_gen_buffer + (total_size - TCG_HIGHWATER);
-
 #ifdef TCG_TARGET_NEED_POOL_LABELS
 s->pool_labels = NULL;
 #endif
@@ -1248,32 +1234,32 @@ void tcg_prologue_init(TCGContext *s)
 }
 #endif
 
-buf1 = s->code_ptr;
+prologue_size = tcg_current_code_size(s);
+
 #ifndef CONFIG_TCG_INTERPRETER
-flush_idcache_range((uintptr_t)tcg_splitwx_to_rx(buf0), (uintptr_t)buf0,
-tcg_ptr_byte_diff(buf1, buf0));
+flush_idcache_range((uintptr_t)tcg_splitwx_to_rx(s->code_buf),
+(uintptr_t)s->code_buf, prologue_size);
 #endif
 
-/* Deduct the prologue from the buffer.  */
-prologue_size = tcg_current_code_size(s);
-s->code_gen_ptr = buf1;
-s->code_gen_buffer = buf1;
-s->code_buf = buf1;
-total_size -= prologue_size;
-s->code_gen_buffer_size = total_size;
+/* Deduct the prologue from the first region.  */
+region.start = s->code_ptr;
 
-tcg_register_jit(tcg_splitwx_to_rx(s->code_gen_buffer), total_size);
+/* Recompute boundaries of the first region. */
+tcg_region_assign(s, 0);
+
+tcg_register_jit(tcg_splitwx_to_rx(region.start),
+ region.end -

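The ordering change in this patch can be modelled with a much simplified region table: the buffer is carved into equal strides, the prologue is emitted at the start of region 0, and only that region's start pointer is then moved past it. This is a sketch under stated assumptions, not QEMU's tcg_region_bounds(): it ignores guard pages, alignment and the special end of the last region, and region_state, region_bounds and region_prologue_set are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of the region bookkeeping. */
struct region_state {
    char *buf;      /* start of the whole code buffer */
    char *start;    /* first usable byte of region 0 */
    size_t stride;  /* bytes per region (no guard pages here) */
    size_t n;       /* number of regions */
};

/* Compute [*pstart, *pend) for region i; only region 0 is affected
 * once start has been moved past the prologue. */
static void region_bounds(const struct region_state *r, size_t i,
                          char **pstart, char **pend)
{
    assert(i < r->n);
    *pstart = (i == 0) ? r->start : r->buf + i * r->stride;
    *pend = r->buf + (i + 1) * r->stride;
}

/* After emitting prologue_size bytes at the start of region 0, deduct
 * them from that region (cf. tcg_region_prologue_set). */
static void region_prologue_set(struct region_state *r, size_t prologue_size)
{
    assert(r->start == r->buf);         /* the prologue goes first */
    assert(prologue_size < r->stride);  /* and must fit in region 0 */
    r->start = r->buf + prologue_size;
}
```
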
[PULL 04/34] tcg: Remove error return from tcg_region_initial_alloc__locked

2021-06-11 Thread Richard Henderson

All callers immediately assert on error, so move the assert
into the function itself.

Reviewed-by: Luis Pires 
Reviewed-by: Alex Bennée 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/tcg.c | 19 ++-
 1 file changed, 6 insertions(+), 13 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 1e683b80e4..ba690e0483 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -720,9 +720,10 @@ static bool tcg_region_alloc(TCGContext *s)
  * Perform a context's first region allocation.
  * This function does _not_ increment region.agg_size_full.
  */
-static inline bool tcg_region_initial_alloc__locked(TCGContext *s)
+static void tcg_region_initial_alloc__locked(TCGContext *s)
 {
-return tcg_region_alloc__locked(s);
+bool err = tcg_region_alloc__locked(s);
+g_assert(!err);
 }
 
 /* Call from a safe-work context */
@@ -737,9 +738,7 @@ void tcg_region_reset_all(void)
 
 for (i = 0; i < n_ctxs; i++) {
 TCGContext *s = qatomic_read(&tcg_ctxs[i]);
-bool err = tcg_region_initial_alloc__locked(s);
-
-g_assert(!err);
+tcg_region_initial_alloc__locked(s);
 }
 qemu_mutex_unlock();
 
@@ -876,11 +875,7 @@ void tcg_region_init(void)
 
 /* In user-mode we support only one ctx, so do the initial allocation now */
 #ifdef CONFIG_USER_ONLY
-{
-bool err = tcg_region_initial_alloc__locked(tcg_ctx);
-
-g_assert(!err);
-}
+tcg_region_initial_alloc__locked(tcg_ctx);
 #endif
 }
 
@@ -942,7 +937,6 @@ void tcg_register_thread(void)
 MachineState *ms = MACHINE(qdev_get_machine());
 TCGContext *s = g_malloc(sizeof(*s));
 unsigned int i, n;
-bool err;
 
 *s = tcg_init_ctx;
 
@@ -966,8 +960,7 @@ void tcg_register_thread(void)
 
 tcg_ctx = s;
 qemu_mutex_lock();
-err = tcg_region_initial_alloc__locked(tcg_ctx);
-g_assert(!err);
+tcg_region_initial_alloc__locked(s);
 qemu_mutex_unlock();
 }
 #endif /* !CONFIG_USER_ONLY */
-- 
2.25.1
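
The refactoring pattern in this patch, sketched with hypothetical names: the low-level allocator keeps its bool error return, and a void wrapper performs the assertion that every caller used to repeat. The call is kept outside assert() so it still runs if NDEBUG is defined; that detail is specific to this sketch, since QEMU uses g_assert() rather than assert().

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true on error, like tcg_region_alloc__locked(). */
static bool alloc_region__locked(unsigned int *next, unsigned int limit)
{
    if (*next >= limit) {
        return true;            /* no regions left */
    }
    (*next)++;
    return false;
}

/* All callers asserted on failure, so the check lives here now. */
static void initial_alloc__locked(unsigned int *next, unsigned int limit)
{
    bool err = alloc_region__locked(next, limit);  /* runs even with NDEBUG */
    assert(!err);
    (void)err;                  /* silence unused warning when NDEBUG is set */
}
```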

[PULL 31/34] tcg: Fix documentation for tcg_constant_* vs tcg_temp_free_*

2021-06-11 Thread Richard Henderson

At some point during the development of tcg_constant_*, I changed
my mind about whether such temps should be able to be passed to
tcg_temp_free_*.  The final version committed allows this, but the
commentary was not updated to match.

Fixes: c0522136adf
Reported-by: Peter Maydell 
Reviewed-by: Alex Bennée 
Reviewed-by: Luis Pires 
Signed-off-by: Richard Henderson 
---
 include/tcg/tcg.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index 1d056ed0ed..064dab383b 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -1095,7 +1095,8 @@ TCGv_vec tcg_const_ones_vec_matching(TCGv_vec);
 
 /*
  * Locate or create a read-only temporary that is a constant.
- * This kind of temporary need not and should not be freed.
+ * This kind of temporary need not be freed, but for convenience
+ * will be silently ignored by tcg_temp_free_*.
  */
 TCGTemp *tcg_constant_internal(TCGType type, int64_t val);
 
-- 
2.25.1
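
A minimal sketch of the behaviour the corrected comment describes, assuming a simplified temp record with a kind field: freeing a constant temp is a silent no-op, while freeing an ordinary temp releases it. The types and temp_free() are hypothetical stand-ins, not the TCG implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical temp kinds: constants are interned and shared. */
enum temp_kind { TEMP_NORMAL, TEMP_CONST };

struct temp {
    enum temp_kind kind;
    bool allocated;
};

static void temp_free(struct temp *t)
{
    if (t->kind == TEMP_CONST) {
        return;            /* constants need not be freed; ignore quietly */
    }
    assert(t->allocated);  /* double free of a normal temp is a bug */
    t->allocated = false;
}
```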

[PULL 06/34] tcg: Split out tcg_region_prologue_set

2021-06-11 Thread Richard Henderson

This has only one user, but will make more sense after some
code motion.

Always leave the tcg_init_ctx initialized to the first region,
in preparation for tcg_prologue_init().  This also requires
that we don't re-allocate the region for the first cpu, lest
we hit the assertion for the total number of regions allocated.

Reviewed-by: Luis Pires 
Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 tcg/tcg.c | 37 ++---
 1 file changed, 22 insertions(+), 15 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 36ea21d596..eca72990c1 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -882,10 +882,26 @@ void tcg_region_init(void)
 
 tcg_region_trees_init();
 
-/* In user-mode we support only one ctx, so do the initial allocation now */
-#ifdef CONFIG_USER_ONLY
-tcg_region_initial_alloc__locked(tcg_ctx);
-#endif
+/*
+ * Leave the initial context initialized to the first region.
+ * This will be the context into which we generate the prologue.
+ * It is also the only context for CONFIG_USER_ONLY.
+ */
+tcg_region_initial_alloc__locked(&tcg_init_ctx);
+}
+
+static void tcg_region_prologue_set(TCGContext *s)
+{
+/* Deduct the prologue from the first region.  */
+g_assert(region.start == s->code_gen_buffer);
+region.start = s->code_ptr;
+
+/* Recompute boundaries of the first region. */
+tcg_region_assign(s, 0);
+
+/* Register the balance of the buffer with gdb. */
+tcg_register_jit(tcg_splitwx_to_rx(region.start),
+ region.end - region.start);
 }
 
 #ifdef CONFIG_DEBUG_TCG
@@ -965,10 +981,10 @@ void tcg_register_thread(void)
 
 if (n > 0) {
 alloc_tcg_plugin_context(s);
+tcg_region_initial_alloc(s);
 }
 
 tcg_ctx = s;
-tcg_region_initial_alloc(s);
 }
 #endif /* !CONFIG_USER_ONLY */
 
@@ -1208,8 +1224,6 @@ void tcg_prologue_init(TCGContext *s)
 {
 size_t prologue_size;
 
-/* Put the prologue at the beginning of code_gen_buffer.  */
-tcg_region_assign(s, 0);
 s->code_ptr = s->code_gen_ptr;
 s->code_buf = s->code_gen_ptr;
 s->data_gen_ptr = NULL;
@@ -1241,14 +1255,7 @@ void tcg_prologue_init(TCGContext *s)
 (uintptr_t)s->code_buf, prologue_size);
 #endif
 
-/* Deduct the prologue from the first region.  */
-region.start = s->code_ptr;
-
-/* Recompute boundaries of the first region. */
-tcg_region_assign(s, 0);
-
-tcg_register_jit(tcg_splitwx_to_rx(region.start),
- region.end - region.start);
+tcg_region_prologue_set(s);
 
 #ifdef DEBUG_DISAS
 if (qemu_loglevel_mask(CPU_LOG_TB_OUT_ASM)) {
-- 
2.25.1
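
A simplified model of the allocation rule described in this patch, with hypothetical types: the initial context is given region 0 once at init, the first registered thread copies that context and keeps its region, and only later threads allocate. Region 0 is therefore never handed out twice, so the assertion on the total number of regions holds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical allocator state: regions are handed out by a counter. */
struct alloc_state {
    size_t next_region;
    size_t n_regions;
};

struct ctx {
    size_t region;  /* index of the region this context owns */
};

static void region_alloc(struct alloc_state *a, struct ctx *c)
{
    assert(a->next_region < a->n_regions);  /* total-count assertion */
    c->region = a->next_region++;
}

/* Thread n == 0 reuses the initial context's region; only later
 * threads allocate (cf. the "if (n > 0)" block in the patch). */
static void register_thread(struct alloc_state *a, const struct ctx *init,
                            struct ctx *c, unsigned int n)
{
    *c = *init;  /* cf. *s = tcg_init_ctx */
    if (n > 0) {
        region_alloc(a, c);
    }
}
```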

[PULL 00/34] tcg patch queue

2021-06-11 Thread Richard Henderson

This is mostly my code_gen_buffer cleanup, plus a few other random
changes thrown in, including a fix for a recent float32_exp2 bug.


r~


The following changes since commit 894fc4fd670aaf04a67dc7507739f914ff4bacf2:

  Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging (2021-06-11 09:21:48 +0100)

are available in the Git repository at:

  https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20210611

for you to fetch changes up to 60afaddc208d34f6dc86dd974f6e02724fba6eb6:

  docs/devel: Explain in more detail the TB chaining mechanisms (2021-06-11 09:41:25 -0700)


Clean up code_gen_buffer allocation.
Add tcg_remove_ops_after.
Fix tcg_constant_* documentation.
Improve TB chaining documentation.
Fix float32_exp2.


Jose R. Ziviani (1):
  tcg/arm: Fix tcg_out_op function signature

Luis Pires (1):
  docs/devel: Explain in more detail the TB chaining mechanisms

Richard Henderson (32):
  meson: Split out tcg/meson.build
  meson: Split out fpu/meson.build
  tcg: Re-order tcg_region_init vs tcg_prologue_init
  tcg: Remove error return from tcg_region_initial_alloc__locked
  tcg: Split out tcg_region_initial_alloc
  tcg: Split out tcg_region_prologue_set
  tcg: Split out region.c
  accel/tcg: Inline cpu_gen_init
  accel/tcg: Move alloc_code_gen_buffer to tcg/region.c
  accel/tcg: Rename tcg_init to tcg_init_machine
  tcg: Create tcg_init
  accel/tcg: Merge tcg_exec_init into tcg_init_machine
  accel/tcg: Use MiB in tcg_init_machine
  accel/tcg: Pass down max_cpus to tcg_init
  tcg: Introduce tcg_max_ctxs
  tcg: Move MAX_CODE_GEN_BUFFER_SIZE to tcg-target.h
  tcg: Replace region.end with region.total_size
  tcg: Rename region.start to region.after_prologue
  tcg: Tidy tcg_n_regions
  tcg: Tidy split_cross_256mb
  tcg: Move in_code_gen_buffer and tests to region.c
  tcg: Allocate code_gen_buffer into struct tcg_region_state
  tcg: Return the map protection from alloc_code_gen_buffer
  tcg: Sink qemu_madvise call to common code
  util/osdep: Add qemu_mprotect_rw
  tcg: Round the tb_size default from qemu_get_host_physmem
  tcg: Merge buffer protection and guard page protection
  tcg: When allocating for !splitwx, begin with PROT_NONE
  tcg: Move tcg_init_ctx and tcg_ctx from accel/tcg/
  tcg: Introduce tcg_remove_ops_after
  tcg: Fix documentation for tcg_constant_* vs tcg_temp_free_*
  softfloat: Fix tp init in float32_exp2

 docs/devel/tcg.rst        | 101 -
 meson.build               |  12 +-
 accel/tcg/internal.h      |   2 +
 include/qemu/osdep.h      |   1 +
 include/sysemu/tcg.h      |   2 -
 include/tcg/tcg.h         |  28 +-
 tcg/aarch64/tcg-target.h  |   1 +
 tcg/arm/tcg-target.h      |   1 +
 tcg/i386/tcg-target.h     |   2 +
 tcg/mips/tcg-target.h     |   6 +
 tcg/ppc/tcg-target.h      |   2 +
 tcg/riscv/tcg-target.h    |   1 +
 tcg/s390/tcg-target.h     |   3 +
 tcg/sparc/tcg-target.h    |   1 +
 tcg/tcg-internal.h        |  40 ++
 tcg/tci/tcg-target.h      |   1 +
 accel/tcg/tcg-all.c       |  32 +-
 accel/tcg/translate-all.c | 439 +---
 bsd-user/main.c           |   3 +-
 fpu/softfloat.c           |   2 +-
 linux-user/main.c         |   1 -
 tcg/region.c              | 999 ++
 tcg/tcg.c                 | 649 +++---
 util/osdep.c              |   9 +
 tcg/arm/tcg-target.c.inc  |   3 +-
 fpu/meson.build           |   1 +
 tcg/meson.build           |  14 +
 27 files changed, 1266 insertions(+), 1090 deletions(-)
 create mode 100644 tcg/tcg-internal.h
 create mode 100644 tcg/region.c
 create mode 100644 fpu/meson.build
 create mode 100644 tcg/meson.build

[PULL 11/34] tcg: Create tcg_init

2021-06-11 Thread Richard Henderson

Perform both tcg_context_init and tcg_region_init.
Do not leave this split to the caller.

Reviewed-by: Luis Pires 
Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 include/tcg/tcg.h         | 3 +--
 tcg/tcg-internal.h        | 1 +
 accel/tcg/translate-all.c | 3 +--
 tcg/tcg.c                 | 9 -
 4 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index 834785fc23..b3304ce095 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -873,7 +873,6 @@ void *tcg_malloc_internal(TCGContext *s, int size);
 void tcg_pool_reset(TCGContext *s);
 TranslationBlock *tcg_tb_alloc(TCGContext *s);
 
-void tcg_region_init(size_t tb_size, int splitwx);
 void tb_destroy(TranslationBlock *tb);
 void tcg_region_reset_all(void);
 
@@ -906,7 +905,7 @@ static inline void *tcg_malloc(int size)
 }
 }
 
-void tcg_context_init(TCGContext *s);
+void tcg_init(size_t tb_size, int splitwx);
 void tcg_register_thread(void);
 void tcg_prologue_init(TCGContext *s);
 void tcg_func_start(TCGContext *s);
diff --git a/tcg/tcg-internal.h b/tcg/tcg-internal.h
index b1dda343c2..f13c564d9b 100644
--- a/tcg/tcg-internal.h
+++ b/tcg/tcg-internal.h
@@ -30,6 +30,7 @@
 extern TCGContext **tcg_ctxs;
 extern unsigned int n_tcg_ctxs;
 
+void tcg_region_init(size_t tb_size, int splitwx);
 bool tcg_region_alloc(TCGContext *s);
 void tcg_region_initial_alloc(TCGContext *s);
 void tcg_region_prologue_set(TCGContext *s);
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index ad7a25d9f0..4f563b8724 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -920,10 +920,9 @@ static void tb_htable_init(void)
 void tcg_exec_init(unsigned long tb_size, int splitwx)
 {
 tcg_allowed = true;
-tcg_context_init(&tcg_init_ctx);
 page_init();
 tb_htable_init();
-tcg_region_init(tb_size, splitwx);
+tcg_init(tb_size, splitwx);
 
 #if defined(CONFIG_SOFTMMU)
 /* There's no guest base to take into account, so go ahead and
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 8c43c0f316..2625d9e502 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -576,8 +576,9 @@ static void process_op_defs(TCGContext *s);
 static TCGTemp *tcg_global_reg_new_internal(TCGContext *s, TCGType type,
 TCGReg reg, const char *name);
 
-void tcg_context_init(TCGContext *s)
+static void tcg_context_init(void)
 {
+TCGContext *s = &tcg_init_ctx;
 int op, total_args, n, i;
 TCGOpDef *def;
 TCGArgConstraint *args_ct;
@@ -654,6 +655,12 @@ void tcg_context_init(TCGContext *s)
 cpu_env = temp_tcgv_ptr(ts);
 }
 
+void tcg_init(size_t tb_size, int splitwx)
+{
+tcg_context_init();
+tcg_region_init(tb_size, splitwx);
+}
+
 /*
  * Allocate TBs right before their corresponding translated code, making
  * sure that TBs and code are on different cache lines.
-- 
2.25.1
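
The point of a single tcg_init() is that callers can no longer run the two setup steps out of order. A sketch of that shape with hypothetical names, where the assertion stands in for region setup depending on the already-initialized context:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static bool context_ready;   /* hypothetical: set by context_init() */
static size_t region_bytes;  /* hypothetical: set by region_init() */

/* Now private helpers, cf. tcg_context_init() becoming static. */
static void context_init(void)
{
    context_ready = true;
}

static void region_init(size_t tb_size)
{
    assert(context_ready);   /* region setup uses the initial context */
    region_bytes = tb_size;
}

/* The one public entry point fixes the order for every caller. */
void subsystem_init(size_t tb_size)
{
    context_init();
    region_init(tb_size);
}
```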

[PULL 02/34] meson: Split out fpu/meson.build

2021-06-11 Thread Richard Henderson

Reviewed-by: Luis Pires 
Reviewed-by: Alex Bennée 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 meson.build     | 4 +---
 fpu/meson.build | 1 +
 2 files changed, 2 insertions(+), 3 deletions(-)
 create mode 100644 fpu/meson.build

diff --git a/meson.build b/meson.build
index b5b2cf9e04..a2311eda6e 100644
--- a/meson.build
+++ b/meson.build
@@ -1968,9 +1968,6 @@ subdir('softmmu')
 
 common_ss.add(capstone)
 specific_ss.add(files('cpu.c', 'disas.c', 'gdbstub.c'), capstone)
-specific_ss.add(when: 'CONFIG_TCG', if_true: files(
-  'fpu/softfloat.c',
-))
 
 # Work around a gcc bug/misfeature wherein constant propagation looks
 # through an alias:
@@ -2001,6 +1998,7 @@ subdir('replay')
 subdir('semihosting')
 subdir('hw')
 subdir('tcg')
+subdir('fpu')
 subdir('accel')
 subdir('plugins')
 subdir('bsd-user')
diff --git a/fpu/meson.build b/fpu/meson.build
new file mode 100644
index 00..1a9992ded5
--- /dev/null
+++ b/fpu/meson.build
@@ -0,0 +1 @@
+specific_ss.add(when: 'CONFIG_TCG', if_true: files('softfloat.c'))
-- 
2.25.1

[PULL 05/34] tcg: Split out tcg_region_initial_alloc

2021-06-11 Thread Richard Henderson

This has only one user, and currently needs an ifdef,
but will make more sense after some code motion.

Reviewed-by: Luis Pires 
Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 tcg/tcg.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index ba690e0483..36ea21d596 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -726,6 +726,15 @@ static void tcg_region_initial_alloc__locked(TCGContext *s)
 g_assert(!err);
 }
 
+#ifndef CONFIG_USER_ONLY
+static void tcg_region_initial_alloc(TCGContext *s)
+{
+qemu_mutex_lock();
+tcg_region_initial_alloc__locked(s);
+qemu_mutex_unlock();
+}
+#endif
+
 /* Call from a safe-work context */
 void tcg_region_reset_all(void)
 {
@@ -959,9 +968,7 @@ void tcg_register_thread(void)
 }
 
 tcg_ctx = s;
-qemu_mutex_lock();
-tcg_region_initial_alloc__locked(s);
-qemu_mutex_unlock();
+tcg_region_initial_alloc(s);
 }
 #endif /* !CONFIG_USER_ONLY */
 
-- 
2.25.1
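
The lock-wrapper convention used here (a __locked function that expects the caller to hold the lock, plus a thin wrapper that takes and releases it) can be sketched with POSIX threads. region_lock and regions_used are hypothetical stand-ins for QEMU's region state, and QEMU itself goes through QemuMutex rather than using pthread_mutex_t directly.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for the region lock and its protected data. */
static pthread_mutex_t region_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned int regions_used;

/* Caller must hold region_lock (the "__locked" naming convention). */
static void initial_alloc__locked(void)
{
    regions_used++;
}

/* Convenience wrapper: take the lock, do the work, drop the lock. */
static void initial_alloc(void)
{
    pthread_mutex_lock(&region_lock);
    initial_alloc__locked();
    pthread_mutex_unlock(&region_lock);
}
```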

[PULL 01/34] meson: Split out tcg/meson.build

2021-06-11 Thread Richard Henderson

Reviewed-by: Luis Pires 
Reviewed-by: Alex Bennée 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 meson.build     |  8 +---
 tcg/meson.build | 13 +
 2 files changed, 14 insertions(+), 7 deletions(-)
 create mode 100644 tcg/meson.build

diff --git a/meson.build b/meson.build
index d2a9ce91f5..b5b2cf9e04 100644
--- a/meson.build
+++ b/meson.build
@@ -1970,14 +1970,7 @@ common_ss.add(capstone)
 specific_ss.add(files('cpu.c', 'disas.c', 'gdbstub.c'), capstone)
 specific_ss.add(when: 'CONFIG_TCG', if_true: files(
   'fpu/softfloat.c',
-  'tcg/optimize.c',
-  'tcg/tcg-common.c',
-  'tcg/tcg-op-gvec.c',
-  'tcg/tcg-op-vec.c',
-  'tcg/tcg-op.c',
-  'tcg/tcg.c',
 ))
-specific_ss.add(when: 'CONFIG_TCG_INTERPRETER', if_true: files('tcg/tci.c'))
 
 # Work around a gcc bug/misfeature wherein constant propagation looks
 # through an alias:
@@ -2007,6 +2000,7 @@ subdir('net')
 subdir('replay')
 subdir('semihosting')
 subdir('hw')
+subdir('tcg')
 subdir('accel')
 subdir('plugins')
 subdir('bsd-user')
diff --git a/tcg/meson.build b/tcg/meson.build
new file mode 100644
index 00..84064a341e
--- /dev/null
+++ b/tcg/meson.build
@@ -0,0 +1,13 @@
+tcg_ss = ss.source_set()
+
+tcg_ss.add(files(
+  'optimize.c',
+  'tcg.c',
+  'tcg-common.c',
+  'tcg-op.c',
+  'tcg-op-gvec.c',
+  'tcg-op-vec.c',
+))
+tcg_ss.add(when: 'CONFIG_TCG_INTERPRETER', if_true: files('tci.c'))
+
+specific_ss.add_all(when: 'CONFIG_TCG', if_true: tcg_ss)
-- 
2.25.1

[PULL 10/34] accel/tcg: Rename tcg_init to tcg_init_machine

2021-06-11 Thread Richard Henderson

We shortly want to use tcg_init for something else.
Since the hook is called init_machine, match that.

Reviewed-by: Luis Pires 
Reviewed-by: Alex Bennée 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 accel/tcg/tcg-all.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/accel/tcg/tcg-all.c b/accel/tcg/tcg-all.c
index f132033999..30d81ff7f5 100644
--- a/accel/tcg/tcg-all.c
+++ b/accel/tcg/tcg-all.c
@@ -105,7 +105,7 @@ static void tcg_accel_instance_init(Object *obj)
 
 bool mttcg_enabled;
 
-static int tcg_init(MachineState *ms)
+static int tcg_init_machine(MachineState *ms)
 {
 TCGState *s = TCG_STATE(current_accel());
 
@@ -189,7 +189,7 @@ static void tcg_accel_class_init(ObjectClass *oc, void *data)
 {
 AccelClass *ac = ACCEL_CLASS(oc);
 ac->name = "tcg";
-ac->init_machine = tcg_init;
+ac->init_machine = tcg_init_machine;
 ac->allowed = &tcg_allowed;
 
 object_class_property_add_str(oc, "thread",
-- 
2.25.1

[PATCH 3/8] util: Use real functions for thread-posix QemuRecMutex

2021-06-11 Thread Richard Henderson

From: Richard Henderson 

Move the declarations from thread-win32.h into thread.h
and remove the macro redirection from thread-posix.h.
This will be required by following cleanups.

Signed-off-by: Richard Henderson 
---
 include/qemu/thread-posix.h |  4 
 include/qemu/thread-win32.h |  6 --
 include/qemu/thread.h       |  9 ++---
 util/qemu-thread-posix.c    | 20
 4 files changed, 26 insertions(+), 13 deletions(-)

diff --git a/include/qemu/thread-posix.h b/include/qemu/thread-posix.h
index c903525062..cf8bc90468 100644
--- a/include/qemu/thread-posix.h
+++ b/include/qemu/thread-posix.h
@@ -5,10 +5,6 @@
 #include 
 
 typedef QemuMutex QemuRecMutex;
-#define qemu_rec_mutex_destroy qemu_mutex_destroy
-#define qemu_rec_mutex_lock_implqemu_mutex_lock_impl
-#define qemu_rec_mutex_trylock_impl qemu_mutex_trylock_impl
-#define qemu_rec_mutex_unlock qemu_mutex_unlock
 
 struct QemuMutex {
 pthread_mutex_t lock;
diff --git a/include/qemu/thread-win32.h b/include/qemu/thread-win32.h
index d0a1a9597e..d95af4498f 100644
--- a/include/qemu/thread-win32.h
+++ b/include/qemu/thread-win32.h
@@ -18,12 +18,6 @@ struct QemuRecMutex {
 bool initialized;
 };
 
-void qemu_rec_mutex_destroy(QemuRecMutex *mutex);
-void qemu_rec_mutex_lock_impl(QemuRecMutex *mutex, const char *file, int line);
-int qemu_rec_mutex_trylock_impl(QemuRecMutex *mutex, const char *file,
-int line);
-void qemu_rec_mutex_unlock(QemuRecMutex *mutex);
-
 struct QemuCond {
 CONDITION_VARIABLE var;
 bool initialized;
diff --git a/include/qemu/thread.h b/include/qemu/thread.h
index 5435763184..2c0d85f3bc 100644
--- a/include/qemu/thread.h
+++ b/include/qemu/thread.h
@@ -28,6 +28,12 @@ int qemu_mutex_trylock_impl(QemuMutex *mutex, const char *file, const int line);
 void qemu_mutex_lock_impl(QemuMutex *mutex, const char *file, const int line);
 void qemu_mutex_unlock_impl(QemuMutex *mutex, const char *file, const int line);
 
+void qemu_rec_mutex_init(QemuRecMutex *mutex);
+void qemu_rec_mutex_destroy(QemuRecMutex *mutex);
+void qemu_rec_mutex_lock_impl(QemuRecMutex *mutex, const char *file, int line);
+int qemu_rec_mutex_trylock_impl(QemuRecMutex *mutex, const char *file, int line);
+void qemu_rec_mutex_unlock(QemuRecMutex *mutex);
+
 typedef void (*QemuMutexLockFunc)(QemuMutex *m, const char *f, int l);
 typedef int (*QemuMutexTrylockFunc)(QemuMutex *m, const char *f, int l);
 typedef void (*QemuRecMutexLockFunc)(QemuRecMutex *m, const char *f, int l);
@@ -129,9 +135,6 @@ static inline int (qemu_rec_mutex_trylock)(QemuRecMutex *mutex)
 return qemu_rec_mutex_trylock(mutex);
 }
 
-/* Prototypes for other functions are in thread-posix.h/thread-win32.h.  */
-void qemu_rec_mutex_init(QemuRecMutex *mutex);
-
 void qemu_cond_init(QemuCond *cond);
 void qemu_cond_destroy(QemuCond *cond);
 
diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
index dcff5e7c5d..8e2b6653f5 100644
--- a/util/qemu-thread-posix.c
+++ b/util/qemu-thread-posix.c
@@ -124,6 +124,26 @@ void qemu_rec_mutex_init(QemuRecMutex *mutex)
 mutex->initialized = true;
 }
 
+void qemu_rec_mutex_destroy(QemuRecMutex *mutex)
+{
+qemu_mutex_destroy(mutex);
+}
+
+void qemu_rec_mutex_lock_impl(QemuRecMutex *mutex, const char *file, int line)
+{
+qemu_mutex_lock_impl(mutex, file, line);
+}
+
+int qemu_rec_mutex_trylock_impl(QemuRecMutex *mutex, const char *file, int line)
+{
+return qemu_mutex_trylock_impl(mutex, file, line);
+}
+
+void qemu_rec_mutex_unlock(QemuRecMutex *mutex)
+{
+qemu_mutex_unlock(mutex);
+}
+
 void qemu_cond_init(QemuCond *cond)
 {
 int err;
-- 
2.25.1
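
For reference, a recursive mutex on POSIX can be expressed as ordinary wrapper functions around a pthread mutex created with the PTHREAD_MUTEX_RECURSIVE type, which is the shape this patch moves to instead of #define aliases. The Mutex/RecMutex types and rec_mutex_* names are simplified, hypothetical stand-ins; QemuMutex also carries an initialized flag and lock-debugging data.

```c
#define _XOPEN_SOURCE 700  /* expose PTHREAD_MUTEX_RECURSIVE under -std=c11 */
#include <assert.h>
#include <pthread.h>

/* Hypothetical simplified types; on POSIX the rec mutex shares layout. */
typedef struct { pthread_mutex_t lock; } Mutex;
typedef Mutex RecMutex;

static void rec_mutex_init(RecMutex *m)
{
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    int err = pthread_mutex_init(&m->lock, &attr);
    assert(err == 0);
    (void)err;
    pthread_mutexattr_destroy(&attr);
}

/* Real functions, not macros: each has its own prototype and symbol. */
static void rec_mutex_lock(RecMutex *m)
{
    int err = pthread_mutex_lock(&m->lock);
    assert(err == 0);
    (void)err;
}

static int rec_mutex_trylock(RecMutex *m)
{
    return pthread_mutex_trylock(&m->lock);  /* 0 on success */
}

static void rec_mutex_unlock(RecMutex *m)
{
    int err = pthread_mutex_unlock(&m->lock);
    assert(err == 0);
    (void)err;
}

static void rec_mutex_destroy(RecMutex *m)
{
    pthread_mutex_destroy(&m->lock);
}
```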

Re: [PATCH 2/2] nbd: Add new qemu:joint-allocation metadata context

2021-06-11 Thread Nir Soffer

On Wed, Jun 9, 2021 at 9:01 PM Eric Blake  wrote:
>
> When trying to reconstruct a qcow2 chain using information provided
> over NBD, ovirt had been relying on an unsafe assumption that any
> portion of the qcow2 file advertised as sparse would defer to the
> backing image; this worked with what qemu 5.2 reports for a qcow2 BSD
> loaded with "backing":null.  However, in 6.0, commit 0da9856851 (nbd:
> server: Report holes for raw images) also had a side-effect of
> reporting unallocated zero clusters in qcow2 files as sparse.  This
> change is correct from the NBD spec perspective (advertising bits has
> always been optional based on how much information the server has
> available, and should only be used to optimize behavior when a bit is
> set, while not assuming semantics merely because a bit is clear), but
> means that a qcow2 file that uses an unallocated zero cluster to
> override a backing file now shows up as sparse over NBD, and causes
> ovirt to fail to reproduce that cluster (ie. ovirt was assuming it
> only had to write clusters where the bit was clear, and the 6.0
> behavior change shows the flaw in that assumption).
>
> The correct fix is for ovirt to additionally use the
> qemu:allocation-depth metadata context added in 5.2: after all, the
> actual determination for what is needed to recreate a qcow2 file is
> not whether a cluster is sparse, but whether the allocation-depth
> shows the cluster to be local.  But reproducing an image is more
> efficient when handling known-zero clusters, which means that ovirt
> has to track both base:allocation and qemu:allocation-depth metadata
> contexts simultaneously.  While NBD_CMD_BLOCK_STATUS is just fine
> sending back information for two contexts in parallel, it comes with
> some bookkeeping overhead at the client side: the two contexts need
> not report the same length of replies, and it involves more network
> traffic.

Since this change is not simple, and the chance that we also get the dirty
bitmap included in the result seems to be very low, I decided to check the
direction of merging multiple extents.

I started with merging "base:allocation" and "qemu:dirty-bitmap:xxx" since
we already have both. It was not hard to do, although it is not completely
tested yet.

Here is the merging code:
https://gerrit.ovirt.org/c/ovirt-imageio/+/115216/1/daemon/ovirt_imageio/_internal/nbdutil.py

To make merging easy and safe, we map the NBD_STATE_DIRTY bit to a private bit
so it cannot clash with the NBD_STATE_HOLE bit:
https://gerrit.ovirt.org/c/ovirt-imageio/+/115215/1/daemon/ovirt_imageio/_internal/nbd.py

Here is a functional test using qemu-nbd showing that it works:
https://gerrit.ovirt.org/c/ovirt-imageio/+/115216/1/daemon/test/client_test.py
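
The merge itself (done in Python in the linked ovirt-imageio change) can be sketched in C: walk two extent lists that describe the same byte range, split at every boundary of either list, combine the flags after moving NBD_STATE_DIRTY to a private bit, and coalesce neighbours whose flags match. The bit values for NBD_STATE_HOLE, NBD_STATE_ZERO and NBD_STATE_DIRTY follow the NBD protocol and QEMU's dirty-bitmap context; the value of EXTENT_DIRTY and all function names are assumptions for illustration, not the ovirt-imageio code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* base:allocation bits (NBD spec) and the qemu:dirty-bitmap bit.
 * HOLE and DIRTY are both bit 0, hence the remapping below. */
#define NBD_STATE_HOLE  (1u << 0)
#define NBD_STATE_ZERO  (1u << 1)
#define NBD_STATE_DIRTY (1u << 0)

#define EXTENT_DIRTY    (1u << 3)  /* hypothetical private bit */

struct extent {
    uint64_t length;
    uint32_t flags;
};

/* Merge allocation extents a[] with dirty-bitmap extents b[] (same
 * total length) into out[]. Returns the number of extents written. */
static size_t merge_extents(const struct extent *a, size_t na,
                            const struct extent *b, size_t nb,
                            struct extent *out, size_t nout)
{
    size_t i = 0, j = 0, k = 0;
    uint64_t a_left = na ? a[0].length : 0;
    uint64_t b_left = nb ? b[0].length : 0;

    while (i < na && j < nb && k < nout) {
        uint64_t len = a_left < b_left ? a_left : b_left;
        uint32_t flags = a[i].flags;

        if (b[j].flags & NBD_STATE_DIRTY) {
            flags |= EXTENT_DIRTY;  /* remap to avoid the bit 0 clash */
        }
        if (k > 0 && out[k - 1].flags == flags) {
            out[k - 1].length += len;  /* coalesce equal neighbours */
        } else {
            out[k].length = len;
            out[k].flags = flags;
            k++;
        }
        a_left -= len;
        b_left -= len;
        if (a_left == 0 && ++i < na) {
            a_left = a[i].length;
        }
        if (b_left == 0 && ++j < nb) {
            b_left = b[j].length;
        }
    }
    return k;
}
```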

I'll try to use "qemu:allocation-depth" in a similar way next week, probably
mapping depth > 0 to EXTENT_EXISTS, to use when reporting holes in
single qcow2 images.

If this is successful, we can start using this in the next ovirt release, and we
don't need "qemu:joint-allocation".

Nir
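
A sketch of the client-side decision discussed in this thread, assuming the client has already merged base:allocation with qemu:allocation-depth per extent. The allocation-depth context reports a depth rather than bit flags (0 for unallocated, 1 for the top layer, 2 or more for a backing layer). Following Eric's point, only extents local to the layer (depth 1) need writing, and the zero bit then allows a cheap zero write; depth_to_flags() shows Nir's proposed depth > 0 mapping for single images. The value of EXTENT_EXISTS and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NBD_STATE_HOLE (1u << 0)  /* base:allocation, per the NBD spec */
#define NBD_STATE_ZERO (1u << 1)  /* base:allocation, per the NBD spec */

#define EXTENT_EXISTS  (1u << 4)  /* hypothetical private bit */

enum action { SKIP, WRITE_ZEROES, COPY_DATA };

/* Proposed mapping for a single image: any depth > 0 exists. */
static uint32_t depth_to_flags(uint32_t depth)
{
    return depth > 0 ? EXTENT_EXISTS : 0;
}

/* Decide how to reproduce one extent of the top layer. */
static enum action classify(uint32_t alloc_flags, uint32_t depth)
{
    if (depth != 1) {
        return SKIP;          /* unallocated, or provided by a backing layer */
    }
    if (alloc_flags & NBD_STATE_ZERO) {
        return WRITE_ZEROES;  /* local zeros, even if also reported as a hole */
    }
    return COPY_DATA;
}
```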

[PATCH 8/8] configure: Remove probe for _Static_assert

2021-06-11 Thread Richard Henderson

From: Richard Henderson 

_Static_assert is part of C11, which is now required.

Signed-off-by: Richard Henderson 
---
 configure               | 18 --
 include/qemu/compiler.h | 11 ---
 2 files changed, 29 deletions(-)

diff --git a/configure b/configure
index 0489864667..debd50c085 100755
--- a/configure
+++ b/configure
@@ -5090,20 +5090,6 @@ if compile_prog "" "" ; then
 have_sysmacros=yes
 fi
 
-##
-# check for _Static_assert()
-
-have_static_assert=no
-cat > $TMPC << EOF
-_Static_assert(1, "success");
-int main(void) {
-return 0;
-}
-EOF
-if compile_prog "" "" ; then
-have_static_assert=yes
-fi
-
 ##
 # check for utmpx.h, it is missing e.g. on OpenBSD
 
@@ -6035,10 +6021,6 @@ if test "$have_sysmacros" = "yes" ; then
   echo "CONFIG_SYSMACROS=y" >> $config_host_mak
 fi
 
-if test "$have_static_assert" = "yes" ; then
-  echo "CONFIG_STATIC_ASSERT=y" >> $config_host_mak
-fi
-
 if test "$have_utmpx" = "yes" ; then
   echo "HAVE_UTMPX=y" >> $config_host_mak
 fi
diff --git a/include/qemu/compiler.h b/include/qemu/compiler.h
index 5766d61589..3baa5e3790 100644
--- a/include/qemu/compiler.h
+++ b/include/qemu/compiler.h
@@ -72,18 +72,7 @@
 int:(x) ? -1 : 1; \
 }
 
-/* QEMU_BUILD_BUG_MSG() emits the message given if _Static_assert is
- * supported; otherwise, it will be omitted from the compiler error
- * message (but as it remains present in the source code, it can still
- * be useful when debugging). */
-#if defined(CONFIG_STATIC_ASSERT)
 #define QEMU_BUILD_BUG_MSG(x, msg) _Static_assert(!(x), msg)
-#elif defined(__COUNTER__)
-#define QEMU_BUILD_BUG_MSG(x, msg) typedef QEMU_BUILD_BUG_ON_STRUCT(x) \
-glue(qemu_build_bug_on__, __COUNTER__) __attribute__((unused))
-#else
-#define QEMU_BUILD_BUG_MSG(x, msg)
-#endif
 
 #define QEMU_BUILD_BUG_ON(x) QEMU_BUILD_BUG_MSG(x, "not expecting: " #x)
 
-- 
2.25.1
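
With the probe gone, QEMU_BUILD_BUG_ON reduces to a plain C11 _Static_assert. A small example using the two macro definitions as they read after this patch; struct header is a hypothetical type used only to show a layout check:

```c
#include <assert.h>
#include <stdint.h>

/* The two definitions as they read after this patch. */
#define QEMU_BUILD_BUG_MSG(x, msg) _Static_assert(!(x), msg)
#define QEMU_BUILD_BUG_ON(x) QEMU_BUILD_BUG_MSG(x, "not expecting: " #x)

/* Hypothetical on-disk header whose size must not change. */
struct header {
    uint32_t magic;
    uint32_t version;
};

/* Checked at compile time; generates no code. */
QEMU_BUILD_BUG_ON(sizeof(struct header) != 8);
```

If the expected size in the check were wrong, compilation would stop with a diagnostic containing the "not expecting: ..." text built by the # stringizing operator.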

[PATCH 6/8] include/qemu/lockable: Use _Generic instead of QEMU_GENERIC

2021-06-11 Thread Richard Henderson

From: Richard Henderson 

This is both more and less complicated than our expansion
using __builtin_choose_expr and __builtin_types_compatible_p.

The expansion through QEMU_MAKE_LOCKABLE_ doesn't work because
we're not enumerating all of the types within the same _Generic,
which results in errors about unhandled cases.  We must also
handle void* explicitly, so that the NULL constant can be used.
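
A reduced example of the _Generic approach described above, with hypothetical Mutex and Spin types: every pointer type that can reach the selection, including void * for a NULL defined as ((void *)0), must appear in the same _Generic, or compilation fails for the unlisted case. Unlike the patch, which stores function pointers cast to QemuLockUnlockFunc *, this sketch calls the selected function directly, and null_lock() checks at run time what qemu_null_lockable() enforces with qemu_build_not_reached().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lock types standing in for QemuMutex and QemuSpin. */
typedef struct { int held; } Mutex;
typedef struct { int held; } Spin;

static void mutex_lock(Mutex *m) { m->held = 1; }
static void spin_lock(Spin *s)   { s->held = 1; }

/* Only a null pointer may be selected through the void * branch. */
static void null_lock(void *x)   { assert(x == NULL); (void)x; }

/* _Generic picks a function by the static type of x; all types that
 * may reach it are listed here, including void * for NULL. */
#define LOCK(x) _Generic((x),  \
    Mutex *: mutex_lock,       \
    Spin *: spin_lock,         \
    void *: null_lock)(x)
```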

Signed-off-by: Richard Henderson 
---
 include/qemu/lockable.h | 85 +
 1 file changed, 43 insertions(+), 42 deletions(-)

diff --git a/include/qemu/lockable.h b/include/qemu/lockable.h
index b620023141..9118d54200 100644
--- a/include/qemu/lockable.h
+++ b/include/qemu/lockable.h
@@ -24,19 +24,6 @@ struct QemuLockable {
 QemuLockUnlockFunc *unlock;
 };
 
-/* This function gives an error if an invalid, non-NULL pointer type is passed
- * to QEMU_MAKE_LOCKABLE.  For optimized builds, we can rely on dead-code elimination
- * from the compiler, and give the errors already at link time.
- */
-#if defined(__OPTIMIZE__) && !defined(__SANITIZE_ADDRESS__)
-void unknown_lock_type(void *);
-#else
-static inline void unknown_lock_type(void *unused)
-{
-abort();
-}
-#endif
-
 static inline __attribute__((__always_inline__)) QemuLockable *
 qemu_make_lockable(void *x, QemuLockable *lockable)
 {
@@ -46,57 +33,71 @@ qemu_make_lockable(void *x, QemuLockable *lockable)
 return x ? lockable : NULL;
 }
 
-/* Auxiliary macros to simplify QEMU_MAKE_LOCABLE.  */
-#define QEMU_LOCK_FUNC(x) ((QemuLockUnlockFunc *)\
-QEMU_GENERIC(x,  \
- (QemuMutex *, qemu_mutex_lock), \
- (QemuRecMutex *, qemu_rec_mutex_lock), \
- (CoMutex *, qemu_co_mutex_lock),\
- (QemuSpin *, qemu_spin_lock),   \
- unknown_lock_type))
+static inline __attribute__((__always_inline__)) QemuLockable *
+qemu_null_lockable(void *x)
+{
+if (x != NULL) {
+qemu_build_not_reached();
+}
+return NULL;
+}
 
-#define QEMU_UNLOCK_FUNC(x) ((QemuLockUnlockFunc *)  \
-QEMU_GENERIC(x,  \
- (QemuMutex *, qemu_mutex_unlock),   \
- (QemuRecMutex *, qemu_rec_mutex_unlock), \
- (CoMutex *, qemu_co_mutex_unlock),  \
- (QemuSpin *, qemu_spin_unlock), \
- unknown_lock_type))
+/* Auxiliary macros to simplify QEMU_MAKE_LOCKABLE.  */
+#define QEMU_LOCK_FUNC(x) ((QemuLockUnlockFunc *)  \
+_Generic((x), QemuMutex *: qemu_mutex_lock,\
+  QemuRecMutex *: qemu_rec_mutex_lock, \
+  CoMutex *: qemu_co_mutex_lock,   \
+  QemuSpin *: qemu_spin_lock))
+
+#define QEMU_UNLOCK_FUNC(x) ((QemuLockUnlockFunc *)  \
+_Generic((x), QemuMutex *: qemu_mutex_unlock,\
+  QemuRecMutex *: qemu_rec_mutex_unlock, \
+  CoMutex *: qemu_co_mutex_unlock,   \
+  QemuSpin *: qemu_spin_unlock))
 
 /* In C, compound literals have the lifetime of an automatic variable.
  * In C++ it would be different, but then C++ wouldn't need QemuLockable
  * either...
  */
-#define QEMU_MAKE_LOCKABLE_(x) (&(QemuLockable) { \
-.object = (x),   \
-.lock = QEMU_LOCK_FUNC(x),   \
-.unlock = QEMU_UNLOCK_FUNC(x),   \
+#define QML_OBJ_(x, name) (&(QemuLockable) {\
+.object = (x),  \
+.lock = (QemuLockUnlockFunc *) qemu_ ## name ## _lock,  \
+.unlock = (QemuLockUnlockFunc *) qemu_ ## name ## _unlock   \
 })
 
 /* QEMU_MAKE_LOCKABLE - Make a polymorphic QemuLockable
  *
- * @x: a lock object (currently one of QemuMutex, QemuRecMutex, CoMutex, QemuSpin).
+ * @x: a lock object (currently one of QemuMutex, QemuRecMutex,
+ * CoMutex, QemuSpin).
  *
  * Returns a QemuLockable object that can be passed around
  * to a function that can operate with locks of any kind, or
  * NULL if @x is %NULL.
+ *
+ * Note the special case for void *, so that we may pass "NULL".
  */
-#define QEMU_MAKE_LOCKABLE(x)\
-QEMU_GENERIC(x,  \
- (QemuLockable *, (x)),  \
- qemu_make_lockable((x), QEMU_MAKE_LOCKABLE_(x)))
+#define QEMU_MAKE_LOCKABLE(x)   \
+_Generic((x), QemuLockable *: (x),  \
+ void *: qemu_null_lockable(x), \
+ QemuMutex *: qemu_make_lockable(x, QML_OBJ_(x, mutex)),\
+ QemuRecMutex *: qemu_make_lockable(x, QML_OBJ_(x, rec_mutex)), \
+ CoMutex *: qemu_make_lockable(x, QML_OBJ_(x, co_mutex)),   \
+ QemuSpin *: qemu_make_lockable(x, QML_OBJ_(x,
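
The _Generic approach described in this patch can be tried outside QEMU with a minimal sketch. It assumes hypothetical stand-in types (DemoMutex, DemoSpin, DemoLockable) and demo_* functions rather than the real QEMU ones; the lock callbacks take void * so no function-pointer casts are needed, and the NULL case uses a runtime assert() where the patch uses qemu_build_not_reached(). It also assumes NULL expands to ((void *)0), as it does for C with GCC; if NULL were a plain 0, no association would match and compilation would fail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for two distinct lock types. */
typedef struct { int locked; } DemoMutex;
typedef struct { int locked; } DemoSpin;

typedef void DemoLockFunc(void *);

typedef struct {
    void *object;
    DemoLockFunc *lock;
    DemoLockFunc *unlock;
} DemoLockable;

/* Taking void * keeps every call through DemoLockFunc * well-defined. */
static void demo_mutex_lock(void *o)   { ((DemoMutex *)o)->locked = 1; }
static void demo_mutex_unlock(void *o) { ((DemoMutex *)o)->locked = 0; }
static void demo_spin_lock(void *o)    { ((DemoSpin *)o)->locked = 1; }
static void demo_spin_unlock(void *o)  { ((DemoSpin *)o)->locked = 0; }

static inline DemoLockable *demo_make_lockable(void *x, DemoLockable *l)
{
    return x ? l : NULL;
}

/* Only a null pointer constant should ever select the void * branch. */
static inline DemoLockable *demo_null_lockable(void *x)
{
    assert(x == NULL);
    return NULL;
}

/* The compound literal lives as long as the enclosing block. */
#define DEMO_OBJ_(x, name) (&(DemoLockable) {  \
        .object = (x),                          \
        .lock = demo_ ## name ## _lock,         \
        .unlock = demo_ ## name ## _unlock      \
    })

/* Every association must be a valid expression, even the unselected ones. */
#define DEMO_MAKE_LOCKABLE(x)                                           \
    _Generic((x), DemoLockable *: (x),                                  \
             void *: demo_null_lockable(x),                             \
             DemoMutex *: demo_make_lockable(x, DEMO_OBJ_(x, mutex)),   \
             DemoSpin *: demo_make_lockable(x, DEMO_OBJ_(x, spin)))
```

Because every accepted type sits in one _Generic list, a pointer of any other type is rejected at compile time instead of being caught by a link-time or run-time fallback.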

[PATCH 5/8] util: Use unique type for QemuRecMutex in thread-posix.h

2021-06-11 Thread Richard Henderson

From: Richard Henderson 

We will shortly convert lockable.h to _Generic, and we cannot
have two compatible types in the same expansion.  Wrap QemuMutex
in a struct, and unwrap in qemu-thread-posix.c.

Signed-off-by: Richard Henderson 
---
 include/qemu/thread-posix.h | 10 --
 util/qemu-thread-posix.c| 12 ++--
 2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/include/qemu/thread-posix.h b/include/qemu/thread-posix.h
index cf8bc90468..b792e6ef37 100644
--- a/include/qemu/thread-posix.h
+++ b/include/qemu/thread-posix.h
@@ -4,8 +4,6 @@
 #include 
 #include 
 
-typedef QemuMutex QemuRecMutex;
-
 struct QemuMutex {
 pthread_mutex_t lock;
 #ifdef CONFIG_DEBUG_MUTEX
@@ -15,6 +13,14 @@ struct QemuMutex {
 bool initialized;
 };
 
+/*
+ * QemuRecMutex cannot be a typedef of QemuMutex lest we have two
+ * compatible cases in _Generic.  See qemu/lockable.h.
+ */
+typedef struct QemuRecMutex {
+QemuMutex m;
+} QemuRecMutex;
+
 struct QemuCond {
 pthread_cond_t cond;
 bool initialized;
diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
index d990826ed8..fd9d714038 100644
--- a/util/qemu-thread-posix.c
+++ b/util/qemu-thread-posix.c
@@ -116,32 +116,32 @@ void qemu_rec_mutex_init(QemuRecMutex *mutex)
 
 pthread_mutexattr_init();
 pthread_mutexattr_settype(, PTHREAD_MUTEX_RECURSIVE);
-err = pthread_mutex_init(>lock, );
+err = pthread_mutex_init(>m.lock, );
 pthread_mutexattr_destroy();
 if (err) {
 error_exit(err, __func__);
 }
-mutex->initialized = true;
+mutex->m.initialized = true;
 }
 
 void qemu_rec_mutex_destroy(QemuRecMutex *mutex)
 {
-qemu_mutex_destroy(mutex);
+qemu_mutex_destroy(&mutex->m);
 }
 
 void qemu_rec_mutex_lock_impl(QemuRecMutex *mutex, const char *file, int line)
 {
-qemu_mutex_lock_impl(mutex, file, line);
+qemu_mutex_lock_impl(&mutex->m, file, line);
 }
 
 int qemu_rec_mutex_trylock_impl(QemuRecMutex *mutex, const char *file, int line)
 {
-return qemu_mutex_trylock_impl(mutex, file, line);
+return qemu_mutex_trylock_impl(&mutex->m, file, line);
 }
 
 void qemu_rec_mutex_unlock_impl(QemuRecMutex *mutex, const char *file, int line)
 {
-qemu_mutex_unlock_impl(mutex, file, line);
+qemu_mutex_unlock_impl(&mutex->m, file, line);
 }
 
 void qemu_cond_init(QemuCond *cond)
-- 
2.25.1
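
The reason for the wrapper struct can be seen in a small standalone sketch (hypothetical DemoMutex/DemoRecMutex names, not QEMU's). A typedef only adds a second name for an existing type, so an alias always selects the same _Generic association as the original, and listing both in one _Generic is a constraint violation. Wrapping the member in a new struct produces an incompatible type that can have its own association, at the cost of forwarding through the inner member as qemu-thread-posix.c now does.

```c
#include <assert.h>

/* Hypothetical types mirroring the thread-posix.h change. */
typedef struct DemoMutex {
    int depth;
} DemoMutex;

/* A typedef is only another name for the same type, so "DemoMutex *"
 * and "DemoAlias *" could not both appear in one _Generic. */
typedef DemoMutex DemoAlias;

/* Wrapping the mutex in a new struct creates a distinct, incompatible type. */
typedef struct DemoRecMutex {
    DemoMutex m;
} DemoRecMutex;

#define DEMO_KIND(x) _Generic((x), DemoMutex *: 1, DemoRecMutex *: 2, default: 0)

static void demo_mutex_lock(DemoMutex *mutex)
{
    mutex->depth++;
}

/* The wrapper type unwraps and forwards, like &mutex->m in the patch. */
static void demo_rec_mutex_lock(DemoRecMutex *mutex)
{
    demo_mutex_lock(&mutex->m);
}
```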

[PATCH 7/8] qemu/compiler: Remove QEMU_GENERIC

2021-06-11 Thread Richard Henderson

From: Richard Henderson 

All previous users now use C11 _Generic.

Signed-off-by: Richard Henderson 
---
 include/qemu/compiler.h | 40 
 1 file changed, 40 deletions(-)

diff --git a/include/qemu/compiler.h b/include/qemu/compiler.h
index 091c45248b..5766d61589 100644
--- a/include/qemu/compiler.h
+++ b/include/qemu/compiler.h
@@ -173,46 +173,6 @@
 #define QEMU_ALWAYS_INLINE
 #endif
 
-/* Implement C11 _Generic via GCC builtins.  Example:
- *
- *QEMU_GENERIC(x, (float, sinf), (long double, sinl), sin) (x)
- *
- * The first argument is the discriminator.  The last is the default value.
- * The middle ones are tuples in "(type, expansion)" format.
- */
-
-/* First, find out the number of generic cases.  */
-#define QEMU_GENERIC(x, ...) \
-QEMU_GENERIC_(typeof(x), __VA_ARGS__, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0)
-
-/* There will be extra arguments, but they are not used.  */
-#define QEMU_GENERIC_(x, a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, count, ...) \
-QEMU_GENERIC##count(x, a0, a1, a2, a3, a4, a5, a6, a7, a8, a9)
-
-/* Two more helper macros, this time to extract items from a parenthesized
- * list.
- */
-#define QEMU_FIRST_(a, b) a
-#define QEMU_SECOND_(a, b) b
-
-/* ... and a final one for the common part of the "recursion".  */
-#define QEMU_GENERIC_IF(x, type_then, else_)   \
-__builtin_choose_expr(__builtin_types_compatible_p(x,  \
-   QEMU_FIRST_ type_then), \
-  QEMU_SECOND_ type_then, else_)
-
-/* CPP poor man's "recursion".  */
-#define QEMU_GENERIC1(x, a0, ...) (a0)
-#define QEMU_GENERIC2(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC1(x, __VA_ARGS__))
-#define QEMU_GENERIC3(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC2(x, __VA_ARGS__))
-#define QEMU_GENERIC4(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC3(x, __VA_ARGS__))
-#define QEMU_GENERIC5(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC4(x, __VA_ARGS__))
-#define QEMU_GENERIC6(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC5(x, __VA_ARGS__))
-#define QEMU_GENERIC7(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC6(x, __VA_ARGS__))
-#define QEMU_GENERIC8(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC7(x, __VA_ARGS__))
-#define QEMU_GENERIC9(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC8(x, __VA_ARGS__))
-#define QEMU_GENERIC10(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC9(x, __VA_ARGS__))
-
 /**
  * qemu_build_not_reached()
  *
-- 
2.25.1
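
For reference, the usage example from the removed QEMU_GENERIC comment maps directly onto C11 _Generic. The sketch below (the demo_sin name is hypothetical) selects a function designator by argument type and then calls it; the default: association plays the role of QEMU_GENERIC's final argument.

```c
#include <math.h>

/* _Generic spelling of the removed comment's example:
 *   QEMU_GENERIC(x, (float, sinf), (long double, sinl), sin) (x)
 * The selection yields a function designator, which is then called. */
#define demo_sin(x) _Generic((x), float: sinf, long double: sinl, default: sin)(x)
```

Actually calling demo_sin() needs libm at link time (-lm with glibc). Unlike the builtin-based macro, an argument type with no matching association and no default is rejected at compile time.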

Re: [PATCH 0/8] configure: Change to -std=gnu11

2021-06-11 Thread Richard Henderson


On 6/11/21 4:33 PM, Richard Henderson wrote:

Now that we assume gcc 7.5 as a minimum, we have the option
of changing to a newer C standard.  The two major ones that
I think apply are _Generic and _Static_assert.


Poor editing there.  How about s/ones/new features/.


r~

[PATCH 1/8] configure: Use -std=gnu11

2021-06-11 Thread Richard Henderson

From: Richard Henderson 

Now that the minimum gcc version is 7.5, we can use C11.
This will allow lots of cleanups to the code, currently
hidden behind macros in include/qemu/compiler.h.

Signed-off-by: Richard Henderson 
---
 configure   | 4 ++--
 meson.build | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/configure b/configure
index 8dcb9965b2..0489864667 100755
--- a/configure
+++ b/configure
@@ -159,7 +159,7 @@ update_cxxflags() {
 # options which some versions of GCC's C++ compiler complain about
 # because they only make sense for C programs.
 QEMU_CXXFLAGS="$QEMU_CXXFLAGS -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS"
-CONFIGURE_CXXFLAGS=$(echo "$CONFIGURE_CFLAGS" | sed s/-std=gnu99/-std=gnu++11/)
+CONFIGURE_CXXFLAGS=$(echo "$CONFIGURE_CFLAGS" | sed s/-std=gnu11/-std=gnu++11/)
 for arg in $QEMU_CFLAGS; do
 case $arg in
 -Wstrict-prototypes|-Wmissing-prototypes|-Wnested-externs|\
@@ -538,7 +538,7 @@ QEMU_CFLAGS="-Wstrict-prototypes -Wredundant-decls $QEMU_CFLAGS"
 QEMU_CFLAGS="-D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE $QEMU_CFLAGS"
 
 # Flags that are needed during configure but later taken care of by Meson
-CONFIGURE_CFLAGS="-std=gnu99 -Wall"
+CONFIGURE_CFLAGS="-std=gnu11 -Wall"
 CONFIGURE_LDFLAGS=
 
 
diff --git a/meson.build b/meson.build
index d2a9ce91f5..c070cb6aa7 100644
--- a/meson.build
+++ b/meson.build
@@ -1,5 +1,5 @@
 project('qemu', ['c'], meson_version: '>=0.55.0',
-default_options: ['warning_level=1', 'c_std=gnu99', 'cpp_std=gnu++11', 'b_colorout=auto'] +
+default_options: ['warning_level=1', 'c_std=gnu11', 'cpp_std=gnu++11', 'b_colorout=auto'] +
  (meson.version().version_compare('>=0.56.0') ? [ 'b_staticpic=false' ] : []),
 version: run_command('head', meson.source_root() / 'VERSION').stdout().strip())
 
-- 
2.25.1
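
With -std=gnu11, the two C11 features the cover letter names are available unconditionally. A hypothetical guard like the following (not part of the patch) makes a translation unit fail fast if it is ever built in an older mode; __STDC_VERSION__ is 201112L for C11.

```c
/* Hypothetical guard: stop early if this file is not compiled as C11 or later. */
#if !defined(__STDC_VERSION__) || __STDC_VERSION__ < 201112L
#error "C11 or later is required, e.g. -std=gnu11"
#endif

/* _Static_assert: a compile-time check stated as the required condition. */
_Static_assert(sizeof(long) >= sizeof(int), "long must be at least as wide as int");

/* _Generic: selects an expression by the type of its unevaluated operand. */
#define DEMO_TYPE_ID(x) _Generic((x), float: 1, double: 2, default: 0)
```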

[PATCH 0/8] configure: Change to -std=gnu11

2021-06-11 Thread Richard Henderson

Now that we assume gcc 7.5 as a minimum, we have the option
of changing to a newer C standard.  The two major ones that
I think apply are _Generic and _Static_assert.

While Paolo created a remarkably functional replacement for _Generic
using builtins, the error messages that you get out of the keyword
are *vastly* more intelligible, and the syntax is easier to read.

While I'd like to prefer _Static_assert over QEMU_BUILD_BUG_ON
going forward, and would like to convert existing uses, that is
a much bigger job, especially since the test condition is inverted.
In the meantime, we can drop the configure detection.


r~


Richard Henderson (8):
  configure: Use -std=gnu11
  softfloat: Use _Generic instead of QEMU_GENERIC
  util: Use real functions for thread-posix QemuRecMutex
  util: Pass file+line to qemu_rec_mutex_unlock_impl
  util: Use unique type for QemuRecMutex in thread-posix.h
  include/qemu/lockable: Use _Generic instead of QEMU_GENERIC
  qemu/compiler: Remove QEMU_GENERIC
  configure: Remove probe for _Static_assert

 configure   | 22 +-
 meson.build |  2 +-
 include/qemu/compiler.h | 51 --
 include/qemu/lockable.h | 85 +++--
 include/qemu/thread-posix.h | 14 +++---
 include/qemu/thread-win32.h |  6 ---
 include/qemu/thread.h   | 15 ++-
 fpu/softfloat.c | 16 ---
 util/qemu-thread-posix.c| 24 ++-
 util/qemu-thread-win32.c|  2 +-
 10 files changed, 100 insertions(+), 137 deletions(-)

-- 
2.25.1
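
The inversion mentioned in the cover letter is visible in the two macros this series leaves in include/qemu/compiler.h: QEMU_BUILD_BUG_ON() takes the condition that must not hold, while _Static_assert() takes the condition that must hold. A minimal sketch, with uint32_t chosen only as an example:

```c
#include <stdint.h>

/* As in include/qemu/compiler.h after this series: the "bug" condition
 * is negated before being handed to _Static_assert(). */
#define QEMU_BUILD_BUG_MSG(x, msg) _Static_assert(!(x), msg)
#define QEMU_BUILD_BUG_ON(x) QEMU_BUILD_BUG_MSG(x, "not expecting: " #x)

/* Both lines express the same requirement with opposite conditions. */
QEMU_BUILD_BUG_ON(sizeof(uint32_t) != 4);
_Static_assert(sizeof(uint32_t) == 4, "uint32_t must be 4 bytes");
```

Converting each QEMU_BUILD_BUG_ON(cond) by hand therefore means rewriting it as _Static_assert(!(cond), ...) or restating the positive condition, which is why the tree-wide conversion is deferred.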

[PATCH 2/8] softfloat: Use _Generic instead of QEMU_GENERIC

2021-06-11 Thread Richard Henderson

From: Richard Henderson 

Signed-off-by: Richard Henderson 
---
 fpu/softfloat.c | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 1cb162882b..6f4aea7dee 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -686,11 +686,13 @@ static float128 float128_pack_raw(const FloatParts128 *p)
 #include "softfloat-specialize.c.inc"
 
 #define PARTS_GENERIC_64_128(NAME, P) \
-QEMU_GENERIC(P, (FloatParts128 *, parts128_##NAME), parts64_##NAME)
+_Generic((P), FloatParts64 *: parts64_##NAME, \
+  FloatParts128 *: parts128_##NAME)
 
 #define PARTS_GENERIC_64_128_256(NAME, P) \
-QEMU_GENERIC(P, (FloatParts256 *, parts256_##NAME), \
- (FloatParts128 *, parts128_##NAME), parts64_##NAME)
+_Generic((P), FloatParts64 *: parts64_##NAME, \
+  FloatParts128 *: parts128_##NAME, \
+  FloatParts256 *: parts256_##NAME)
 
 #define parts_default_nan(P, S)PARTS_GENERIC_64_128(default_nan, P)(P, S)
 #define parts_silence_nan(P, S)PARTS_GENERIC_64_128(silence_nan, P)(P, S)
@@ -892,11 +894,13 @@ static void parts128_log2(FloatParts128 *a, float_status *s, const FloatFmt *f);
  */
 
 #define FRAC_GENERIC_64_128(NAME, P) \
-QEMU_GENERIC(P, (FloatParts128 *, frac128_##NAME), frac64_##NAME)
+_Generic((P), FloatParts64 *: frac64_##NAME, \
+  FloatParts128 *: frac128_##NAME)
 
 #define FRAC_GENERIC_64_128_256(NAME, P) \
-QEMU_GENERIC(P, (FloatParts256 *, frac256_##NAME), \
- (FloatParts128 *, frac128_##NAME), frac64_##NAME)
+_Generic((P), FloatParts64 *: frac64_##NAME, \
+  FloatParts128 *: frac128_##NAME, \
+  FloatParts256 *: frac256_##NAME)
 
 static bool frac64_add(FloatParts64 *r, FloatParts64 *a, FloatParts64 *b)
 {
-- 
2.25.1
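
The PARTS_GENERIC_64_128() pattern combines token pasting with _Generic: the macro pastes NAME onto each size prefix and lets the pointer type of P choose among the results. A self-contained sketch with simplified, hypothetical FloatParts64/FloatParts128 definitions and a made-up "bits" operation (not softfloat code):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the softfloat types. */
typedef struct { uint64_t frac; } FloatParts64;
typedef struct { uint64_t frac_hi, frac_lo; } FloatParts128;

static int parts64_bits(FloatParts64 *p)   { (void)p; return 64; }
static int parts128_bits(FloatParts128 *p) { (void)p; return 128; }

/* Same shape as the patched macro: pick the function by P's pointer type. */
#define PARTS_GENERIC_64_128(NAME, P) \
    _Generic((P), FloatParts64 *: parts64_##NAME, \
                  FloatParts128 *: parts128_##NAME)

/* ...then call the selected function with P, like parts_default_nan(). */
#define parts_bits(P) PARTS_GENERIC_64_128(bits, P)(P)
```

One difference visible in the diff: QEMU_GENERIC fell back to parts64_##NAME for any unlisted type, whereas these _Generic selections have no default association, so an unexpected pointer type is now a compile-time error instead of silently selecting the 64-bit helper.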

[PATCH 4/8] util: Pass file+line to qemu_rec_mutex_unlock_impl

2021-06-11 Thread Richard Henderson

From: Richard Henderson 

Create macros for file+line expansion in qemu_rec_mutex_unlock
like we have for qemu_mutex_unlock.

Signed-off-by: Richard Henderson 
---
 include/qemu/thread.h| 10 +-
 util/qemu-thread-posix.c |  4 ++--
 util/qemu-thread-win32.c |  2 +-
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/include/qemu/thread.h b/include/qemu/thread.h
index 2c0d85f3bc..460568d67d 100644
--- a/include/qemu/thread.h
+++ b/include/qemu/thread.h
@@ -32,7 +32,7 @@ void qemu_rec_mutex_init(QemuRecMutex *mutex);
 void qemu_rec_mutex_destroy(QemuRecMutex *mutex);
 void qemu_rec_mutex_lock_impl(QemuRecMutex *mutex, const char *file, int line);
 int qemu_rec_mutex_trylock_impl(QemuRecMutex *mutex, const char *file, int line);
-void qemu_rec_mutex_unlock(QemuRecMutex *mutex);
+void qemu_rec_mutex_unlock_impl(QemuRecMutex *mutex, const char *file, int line);
 
 typedef void (*QemuMutexLockFunc)(QemuMutex *m, const char *f, int l);
 typedef int (*QemuMutexTrylockFunc)(QemuMutex *m, const char *f, int l);
@@ -110,6 +110,9 @@ extern QemuCondTimedWaitFunc qemu_cond_timedwait_func;
 #define qemu_mutex_unlock(mutex) \
 qemu_mutex_unlock_impl(mutex, __FILE__, __LINE__)
 
+#define qemu_rec_mutex_unlock(mutex) \
+qemu_rec_mutex_unlock_impl(mutex, __FILE__, __LINE__)
+
 static inline void (qemu_mutex_lock)(QemuMutex *mutex)
 {
 qemu_mutex_lock(mutex);
@@ -135,6 +138,11 @@ static inline int (qemu_rec_mutex_trylock)(QemuRecMutex *mutex)
 return qemu_rec_mutex_trylock(mutex);
 }
 
+static inline void (qemu_rec_mutex_unlock)(QemuRecMutex *mutex)
+{
+qemu_rec_mutex_unlock(mutex);
+}
+
 void qemu_cond_init(QemuCond *cond);
 void qemu_cond_destroy(QemuCond *cond);
 
diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
index 8e2b6653f5..d990826ed8 100644
--- a/util/qemu-thread-posix.c
+++ b/util/qemu-thread-posix.c
@@ -139,9 +139,9 @@ int qemu_rec_mutex_trylock_impl(QemuRecMutex *mutex, const char *file, int line)
 return qemu_mutex_trylock_impl(mutex, file, line);
 }
 
-void qemu_rec_mutex_unlock(QemuRecMutex *mutex)
+void qemu_rec_mutex_unlock_impl(QemuRecMutex *mutex, const char *file, int line)
 {
-qemu_mutex_unlock(mutex);
+qemu_mutex_unlock_impl(mutex, file, line);
 }
 
 void qemu_cond_init(QemuCond *cond)
diff --git a/util/qemu-thread-win32.c b/util/qemu-thread-win32.c
index cb5aa2018c..52eb19f351 100644
--- a/util/qemu-thread-win32.c
+++ b/util/qemu-thread-win32.c
@@ -105,7 +105,7 @@ int qemu_rec_mutex_trylock_impl(QemuRecMutex *mutex, const char *file, int line)
 return !TryEnterCriticalSection(&mutex->lock);
 }
 
-void qemu_rec_mutex_unlock(QemuRecMutex *mutex)
+void qemu_rec_mutex_unlock_impl(QemuRecMutex *mutex, const char *file, int line)
 {
 assert(mutex->initialized);
 LeaveCriticalSection(&mutex->lock);
-- 
2.25.1
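
The header change relies on a common C idiom: a function-like macro adds __FILE__ and __LINE__ at each call site, while writing the name in parentheses in the definition, as in (qemu_rec_mutex_unlock), keeps the macro from expanding there, so a real function with the same name still exists and its address can be taken. A standalone sketch with hypothetical demo_* names:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: remember where the last unlock came from. */
static const char *last_file;
static int last_line;

typedef struct { int locked; } DemoRecMutex;

static void demo_unlock_impl(DemoRecMutex *m, const char *file, int line)
{
    m->locked = 0;
    last_file = file;
    last_line = line;
}

/* Call sites get the caller's file and line through the macro... */
#define demo_unlock(m) demo_unlock_impl(m, __FILE__, __LINE__)

/* ...while "(demo_unlock)" is not followed by "(", so the macro does not
 * expand here and this defines a real function. Inside the body the
 * macro does expand. */
static inline void (demo_unlock)(DemoRecMutex *m)
{
    demo_unlock(m);
}
```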

[RFC] EventListener design for Python Async QMP library

2021-06-11 Thread John Snow

Hi all: I am continuing work on my asyncio-based QMP library for Python, 
which adds support for OOB executions, multiple simultaneous pending 
executions, and truly asynchronous event handling.


The library is what will fundamentally power the new qmp-shell that 
Niteesh is working on for his GSoC project this summer.


I would like to solicit feedback on one component of the design in 
particular: An interface I call the EventListener, which is an API 
designed to allow multiple concurrent coroutines to safely wait for and 
consume QMP events.


I have a document explaining their use on my GitLab fork. At the bottom 
of the document is a list of my own complaints about my design. If you'd 
like to take a peek at what I am cooking up and would like to offer 
feedback, now would be a pretty good time to do it before we get too far 
into development for the new qmp-shell.


The document is here: https://gitlab.com/jsnow/qemu/-/snippets/2133449

Any feedback, thoughts, etc are appreciated.

Thanks,
--js


Oh, and: The full library (Warning, with outdated docs, no tests, and 
quite a few TODO/FIXMEs scattered about) is here:


https://gitlab.com/jsnow/qemu/-/tree/python-async-qmp-aqmp/python/qemu/aqmp

But it's not in a state to ask for critique on the entire architecture 
just yet, there are still a few more suggestions from Stefan to 
implement from the last time I did so.

Re: [PATCH] qemu-{img,nbd}: Don't report zeroed cluster as a hole

2021-06-11 Thread Eric Blake

On Sat, Jun 12, 2021 at 12:23:06AM +0300, Nir Soffer wrote:
> > Otherwise, you do have a point: "depth":1 in isolation is ambiguous
> > between "not allocated anywhere in this 1-element chain" and
> > "allocated at the first backing file in this chain of length 2 or
> > more".  At which point you can indeed use "qemu-img info" to determine
> > the backing chain depth.  How painful is that extra step?  Does it
> > justify the addition of a new optional "backing":true to any portion
> > of the file that was beyond the end of the chain (and omit that line
> > for all other regions, rather than printing "backing":false)?
> 
> Dealing with depth: N + 1 is not that painful, but also not great.
> 
> I think it is worth a little more effort, and it will save time in the long 
> term
> for users and for developers. Better APIs need simpler and shorter
> documentation and require less support.
> 
> I'm not sure about backing: false, maybe absent: true to match libnbd?

In the patch [1], I did "backing":true if the cluster was not found in
the chain, and omitted the bool altogether when the cluster is
present.  If we like the name "absent":true better than
"backing":true, that's an easy change.  The libnbd change for nbdinfo
to report 'absent' instead of 'unallocated' has not yet been released,
so we have some leeway on naming choices.

[1] https://lists.gnu.org/archive/html/qemu-devel/2021-06/msg03067.html

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [PATCH] qemu-{img,nbd}: Don't report zeroed cluster as a hole

2021-06-11 Thread Nir Soffer

On Fri, Jun 11, 2021 at 9:34 PM Eric Blake  wrote:
>
> On Fri, Jun 11, 2021 at 08:35:01PM +0300, Nir Soffer wrote:
> > On Fri, Jun 11, 2021 at 4:28 PM Eric Blake  wrote:
> > >
> > > On Fri, Jun 11, 2021 at 10:09:09AM +0200, Kevin Wolf wrote:
> > > > > Yes, that might work as well.  But we didn't previously document
> > > > > depth to be optional.  Removing something from output risks breaking
> > > > > more downstream tools that expect it to be non-optional, compared to
> > > > > providing a new value.
> > > >
> > > > A negative value isn't any less unexpected than a missing key. I don't
> > > > think any existing tool would be able to handle it. Encoding different
> > > > meanings in a single value isn't very QAPI-like either. Usually strings
> > > > that are parsed are the problem, but negative integers really aren't that
> > > > much different. I don't really like this solution.
> > > >
> > > > Leaving out the depth feels like a better suggestion to me.
> > > >
> > > > But anyway, this seems to only happen at the end of the backing chain.
> > > > So if the backing chain consists of n images, why not report 'depth':
> > > > n + 1? So, in the above example, you would get 1. I think this has the
> > > > best chances of tools actually working correctly with the new output,
> > > > even though it's still not unlikely to break something.
> > >
> > > Ooh, I like that.  It is closer to reality - the file data really
> > > comes from the next depth, even if we have no filename at that depth.
> > > v2 of my patch coming up.
> >
> > How do you know the number of the layer? this info is not presented in
> > qemu-img map output.
...
> Otherwise, you do have a point: "depth":1 in isolation is ambiguous
> between "not allocated anywhere in this 1-element chain" and
> "allocated at the first backing file in this chain of length 2 or
> more".  At which point you can indeed use "qemu-img info" to determine
> the backing chain depth.  How painful is that extra step?  Does it
> justify the addition of a new optional "backing":true to any portion
> of the file that was beyond the end of the chain (and omit that line
> for all other regions, rather than printing "backing":false)?

Dealing with depth: N + 1 is not that painful, but also not great.

I think it is worth a little more effort, and it will save time in the long term
for users and for developers. Better APIs need simpler and shorter
documentation and require less support.

I'm not sure about backing: false, maybe absent: true to match libnbd?

Nir
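
In the encoding discussed in this thread (per the example, a region found in no layer of a single-image chain reports depth 1), a consumer can only tell "absent" from "allocated in a backing file" if it also knows the chain length. A hypothetical helper makes that extra input explicit; region_is_absent() and its parameters are illustrative only, and the chain length is assumed to come from a separate query such as "qemu-img info".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: depth is the value reported for a region, chain_len the
 * number of images in the backing chain (obtained separately). A region
 * beyond the end of the chain reports depth == chain_len. */
static bool region_is_absent(int depth, int chain_len)
{
    assert(depth >= 0 && depth <= chain_len);
    return depth == chain_len;
}
```

The same depth value flips meaning with chain_len, which is the ambiguity an explicit "backing":true or "absent":true key would remove.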

Re: [PATCH 00/11] python: move /scripts/qmp/qemu-ga-client.py to qemu.qmp package

2021-06-11 Thread John Snow


On 6/4/21 11:55 AM, John Snow wrote:

Delint and type the qemu-ga-client and move it over into
/python/qemu/qmp/qemu_ga_client.py.

Based-on: <20210603003719.1321369-1-js...@redhat.com>
GitLab: https://gitlab.com/jsnow/qemu/-/commits/python-package-qga
CI: https://gitlab.com/jsnow/qemu/-/pipelines/315122004

(Weakly based on "[PATCH v3 00/19] Python: move /scripts/qmp/qom* to
/python/qemu/qmp/qom*", for the purposes of avoiding context conflicts
in /python/setup.cfg, but is trivially rebased without it.)

Add a new console entrypoint to the package under "qemu-ga-client",
keeping the old name. (This makes a script named "qemu-ga-client"
available in your $PATH when you use pip to install the qemu.qmp
package.)

Add a forwarder shim back to /scripts/qmp/qemu-ga-client.py that
forwards to the new script, keeping functionality as it was in the old
location, at least for a little while. I intend to eventually deprecate
these forwarders, but not yet. (This allows you to use "qemu-ga-client"
from the scripts directory without needing to install the qemu.qmp
package.)

Now this script is protected against regressions against the qemu.qmp
package because it's part of it, and validated regularly by GitLab CI.

John Snow (11):
   scripts/qemu-ga-client: apply isort rules
   scripts/qemu-ga-client: apply (most) flake8 rules
   scripts/qemu-ga-client: Fix exception handling
   scripts/qemu-ga-client: replace deprecated optparse with argparse
   scripts/qemu-ga-client: add module docstring
   scripts/qemu-ga-client: apply (most) pylint rules
   python/qmp: Correct type of QMPReturnValue
   scripts/qemu-ga-client: add mypy type hints
   scripts/qemu-ga-client: move to python/qemu/qmp/qemu_ga_client.py
   python/qemu-ga-client: add entry point
   scripts/qemu-ga-client: Add forwarder shim

  python/qemu/qmp/__init__.py   |  25 ++-
  python/qemu/qmp/qemu_ga_client.py | 323 ++
  python/setup.cfg  |   1 +
  scripts/qmp/qemu-ga-client| 297 +--
  4 files changed, 341 insertions(+), 305 deletions(-)
  create mode 100644 python/qemu/qmp/qemu_ga_client.py



Thanks, preliminarily staged on my python branch:

https://gitlab.com/jsnow/qemu/-/commits/python

CI (covers this series and the scripts/qmp/qom* series):
https://gitlab.com/jsnow/qemu/-/pipelines/319584565

I intend to send a PR this coming Friday after staging the qmp-shell 
cleanup series.


--js

[PATCH v2] s390x/css: Selectively copy sense data to IRB

2021-06-11 Thread Eric Farman

The SCHIB.PMCW.CSENSE bit is used to determine whether the
IRB should be set up with sense data, but that bit only
indicates whether sense data is requested, not if it was
provided by the device. For virtual devices, this is fine.

For passthrough devices, hardware would present sense data
in IRB.ECW, but that field is only valid if IRB.SCSW.E and
IRB.ERW.S were also set.

Let's only build the sense data in the IRB if the first byte
of sense is nonzero (indicating it may have come from a virtual
device), or the IRB.SCSW.E bit is already set (indicating it
came from the hardware). That way, the guest driver can read
the sense data if valid, or respond with a Sense CCW to get
the sense data if it wants/needs.

Fixes: df1fe5bb4924 ("s390: Virtual channel subsystem support.")
Fixes: 334e76850bbb ("vfio/ccw: update sense data if a unit check is pending")
Signed-off-by: Eric Farman 
---

Notes:
v1->v2:
 - [MR] Add Fixes: tags
 - [CH] Reinstate the memcpy(sch->sense_data, irb.ecw) in vfio_ccw
 - [CH] Look at IRB.SCSW.E before copying sense into guest IRB

v1: 
https://lore.kernel.org/qemu-devel/20210610202011.391029-1-far...@linux.ibm.com/

 hw/s390x/css.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/hw/s390x/css.c b/hw/s390x/css.c
index bed46f5ec3..8935f948d5 100644
--- a/hw/s390x/css.c
+++ b/hw/s390x/css.c
@@ -1659,9 +1659,15 @@ int css_do_tsch_get_irb(SubchDev *sch, IRB *target_irb, int *irb_len)
 } else {
 irb.esw[0] = 0x0080;
 }
-/* If a unit check is pending, copy sense data. */
+/*
+ * If a unit check is pending and concurrent sense
+ * is requested, copy the sense data if the sense
+ * data is plausibly valid.
+ */
 if ((schib->scsw.dstat & SCSW_DSTAT_UNIT_CHECK) &&
-(schib->pmcw.chars & PMCW_CHARS_MASK_CSENSE)) {
+(schib->pmcw.chars & PMCW_CHARS_MASK_CSENSE) &&
+((schib->scsw.flags & SCSW_FLAGS_MASK_ECTL) ||
+ (sch->sense_data[0] != 0))) {
 int i;
 
 irb.scsw.flags |= SCSW_FLAGS_MASK_ESWF | SCSW_FLAGS_MASK_ECTL;
-- 
2.25.1
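
Read as a plain predicate, the v2 condition copies sense data only when a unit check is pending, concurrent sense was requested, and the data is plausibly valid (either ECTL is already set or the first sense byte is nonzero). A hypothetical helper with the bit tests already reduced to booleans, so no real SCSW/PMCW mask values are assumed; should_copy_sense() is illustrative and not a function in hw/s390x/css.c:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical restatement of the v2 check in css_do_tsch_get_irb(). */
static bool should_copy_sense(bool unit_check, bool csense_requested,
                              bool ectl_set, const uint8_t *sense_data)
{
    return unit_check && csense_requested &&
           (ectl_set || sense_data[0] != 0);
}
```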

Re: [PATCH 7/7] vhost-user-blk: Implement reconnection during realize

2021-06-11 Thread Raphael Norwitz

On Wed, Jun 09, 2021 at 05:46:58PM +0200, Kevin Wolf wrote:
> Commit dabefdd6 removed code that was supposed to try reconnecting
> during .realize(), but actually just crashed and had several design
> problems.
> 
> This adds the feature back without the crash in simple cases while also
> fixing some design problems: Reconnection is now only tried if there was
> a problem with the connection and not an error related to the content
> (which would fail again the same way in the next attempt). Reconnection
> is limited to three attempts (four with the initial attempt) so that we
> won't end up in an infinite loop if a problem is permanent. If the
> backend restarts three times in the very short time window of device
> initialisation, we have bigger problems and erroring out is the right
> course of action.
> 
> In the case that a connection error occurs and we reconnect, the error
> message is printed using error_report_err(), but otherwise ignored.
> 
> Signed-off-by: Kevin Wolf 

Reviewed-by: Raphael Norwitz 

> ---
>  hw/block/vhost-user-blk.c | 14 +-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index e49d2e4c83..f75a42bc62 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -455,8 +455,10 @@ static int vhost_user_blk_realize_connect(VHostUserBlk *s, Error **errp)
>  
>  static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
>  {
> +ERRP_GUARD();
>  VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>  VHostUserBlk *s = VHOST_USER_BLK(vdev);
> +int retries;
>  int i, ret;
>  
>  if (!s->chardev.chr) {
> @@ -498,7 +500,17 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
>  s->inflight = g_new0(struct vhost_inflight, 1);
>  s->vhost_vqs = g_new0(struct vhost_virtqueue, s->num_queues);
>  
> -ret = vhost_user_blk_realize_connect(s, errp);
> +retries = 3;
> +assert(!*errp);
> +do {
> +if (*errp) {
> +error_prepend(errp, "Reconnecting after error: ");
> +error_report_err(*errp);
> +*errp = NULL;
> +}
> +ret = vhost_user_blk_realize_connect(s, errp);
> +} while (ret == -EPROTO && retries--);
> +
>  if (ret < 0) {
>  goto virtio_err;
>  }
> -- 
> 2.30.2
>
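
The retry bound in the do/while loop is easy to misread: with retries = 3 and a post-decrement in the condition, vhost_user_blk_realize_connect() runs at most four times, matching "three attempts (four with the initial attempt)" in the commit message, and any error other than -EPROTO ends the loop at once. A standalone sketch of the same loop shape with a hypothetical fake_connect() stub; it omits the error_report_err() call the patch makes before each retry.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stub state: the first `failures` calls fail with
 * `fail_code`, later calls succeed. */
static int attempts;
static int failures;
static int fail_code = -EPROTO;

static int fake_connect(void)
{
    attempts++;
    return attempts <= failures ? fail_code : 0;
}

/* Same loop shape as the patch: only -EPROTO is retried, and the
 * post-decrement allows three retries after the initial attempt. */
static int connect_with_retries(void)
{
    int retries = 3;
    int ret;

    do {
        ret = fake_connect();
    } while (ret == -EPROTO && retries--);

    return ret;
}
```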

Re: [PATCH 6/7] vhost-user-blk: Factor out vhost_user_blk_realize_connect()

2021-06-11 Thread Raphael Norwitz

On Wed, Jun 09, 2021 at 05:46:57PM +0200, Kevin Wolf wrote:
> This function is the part that we will want to retry if the connection
> is lost during initialisation, so factor it out to keep the following
> patch simpler.
> 
> The error path for vhost_dev_get_config() forgot disconnecting the
> chardev, add this while touching the code.
> 
> Signed-off-by: Kevin Wolf 

Reviewed-by: Raphael Norwitz 

> ---
>  hw/block/vhost-user-blk.c | 48 ++-
>  1 file changed, 32 insertions(+), 16 deletions(-)
> 
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index 3770f715da..e49d2e4c83 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -423,6 +423,36 @@ static void vhost_user_blk_event(void *opaque, QEMUChrEvent event)
>  }
>  }
>  
> +static int vhost_user_blk_realize_connect(VHostUserBlk *s, Error **errp)
> +{
> +DeviceState *dev = &s->parent_obj.parent_obj;
> +int ret;
> +
> +s->connected = false;
> +
> +ret = qemu_chr_fe_wait_connected(&s->chardev, errp);
> +if (ret < 0) {
> +return ret;
> +}
> +
> +ret = vhost_user_blk_connect(dev, errp);
> +if (ret < 0) {
> +qemu_chr_fe_disconnect(&s->chardev);
> +return ret;
> +}
> +assert(s->connected);
> +
> +ret = vhost_dev_get_config(&s->dev, (uint8_t *)&s->blkcfg,
> +   sizeof(struct virtio_blk_config), errp);
> +if (ret < 0) {
> +qemu_chr_fe_disconnect(&s->chardev);
> +vhost_dev_cleanup(&s->dev);
> +return ret;
> +}
> +
> +return 0;
> +}
> +
>  static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
>  {
>  VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> @@ -467,22 +497,10 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
>  
>  s->inflight = g_new0(struct vhost_inflight, 1);
>  s->vhost_vqs = g_new0(struct vhost_virtqueue, s->num_queues);
> -s->connected = false;
> -
> -if (qemu_chr_fe_wait_connected(&s->chardev, errp) < 0) {
> -goto virtio_err;
> -}
>  
> -if (vhost_user_blk_connect(dev, errp) < 0) {
> -qemu_chr_fe_disconnect(&s->chardev);
> -goto virtio_err;
> -}
> -assert(s->connected);
> -
> -ret = vhost_dev_get_config(&s->dev, (uint8_t *)&s->blkcfg,
> -   sizeof(struct virtio_blk_config), errp);
> +ret = vhost_user_blk_realize_connect(s, errp);
>  if (ret < 0) {
> -goto vhost_err;
> +goto virtio_err;
>  }
>  
>  /* we're fully initialized, now we can operate, so add the handler */
> @@ -491,8 +509,6 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
>   NULL, true);
>  return;
>  
> -vhost_err:
> -vhost_dev_cleanup(&s->dev);
>  virtio_err:
>  g_free(s->vhost_vqs);
>  s->vhost_vqs = NULL;
> -- 
> 2.30.2
>

Re: [PATCH 5/7] vhost: Distinguish errors in vhost_dev_get_config()

2021-06-11 Thread Raphael Norwitz

On Wed, Jun 09, 2021 at 05:46:56PM +0200, Kevin Wolf wrote:
> Instead of just returning 0/-1 and letting the caller make up a
> meaningless error message, add an Error parameter to allow reporting the
> real error and switch to 0/-errno so that different kinds of errors can
> be distinguished in the caller.
> 
> Signed-off-by: Kevin Wolf 

Just one commit message suggestion.

Reviewed-by: Raphael Norwitz 

> ---
>  include/hw/virtio/vhost-backend.h |  2 +-
>  include/hw/virtio/vhost.h |  4 ++--
>  hw/block/vhost-user-blk.c |  9 +
>  hw/display/vhost-user-gpu.c   |  6 --
>  hw/input/vhost-user-input.c   |  6 --
>  hw/net/vhost_net.c|  2 +-
>  hw/virtio/vhost-user-vsock.c  |  9 +
>  hw/virtio/vhost-user.c| 24 
>  hw/virtio/vhost-vdpa.c|  2 +-
>  hw/virtio/vhost.c | 14 +++---
>  10 files changed, 46 insertions(+), 32 deletions(-)
> 
> diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
> index 728ebb0ed9..8475c5a29d 100644
> --- a/include/hw/virtio/vhost-backend.h
> +++ b/include/hw/virtio/vhost-backend.h
> @@ -98,7 +98,7 @@ typedef int (*vhost_set_config_op)(struct vhost_dev *dev, const uint8_t *data,
> uint32_t offset, uint32_t size,
> uint32_t flags);
>  typedef int (*vhost_get_config_op)(struct vhost_dev *dev, uint8_t *config,
> -   uint32_t config_len);
> +   uint32_t config_len, Error **errp);
>  
>  typedef int (*vhost_crypto_create_session_op)(struct vhost_dev *dev,
>void *session_info,
> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> index 2d7aaad67b..045d0fd9f2 100644
> --- a/include/hw/virtio/vhost.h
> +++ b/include/hw/virtio/vhost.h
> @@ -130,8 +130,8 @@ int vhost_net_set_backend(struct vhost_dev *hdev,
>struct vhost_vring_file *file);
>  
>  int vhost_device_iotlb_miss(struct vhost_dev *dev, uint64_t iova, int write);
> -int vhost_dev_get_config(struct vhost_dev *dev, uint8_t *config,
> - uint32_t config_len);
> +int vhost_dev_get_config(struct vhost_dev *hdev, uint8_t *config,
> + uint32_t config_len, Error **errp);
>  int vhost_dev_set_config(struct vhost_dev *dev, const uint8_t *data,
>   uint32_t offset, uint32_t size, uint32_t flags);
>  /* notifier callback in case vhost device config space changed
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index e9382e152a..3770f715da 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -91,11 +91,13 @@ static int vhost_user_blk_handle_config_change(struct vhost_dev *dev)
>  int ret;
>  struct virtio_blk_config blkcfg;
>  VHostUserBlk *s = VHOST_USER_BLK(dev->vdev);
> +Error *local_err = NULL;
>  
>  ret = vhost_dev_get_config(dev, (uint8_t *)&blkcfg,
> -   sizeof(struct virtio_blk_config));
> +   sizeof(struct virtio_blk_config),
> +   &local_err);
>  if (ret < 0) {
> -error_report("get config space failed");
> +error_report_err(local_err);
>  return -1;
>  }
>  
> @@ -478,9 +480,8 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
>  assert(s->connected);
>  
>  ret = vhost_dev_get_config(&s->dev, (uint8_t *)&s->blkcfg,
> -   sizeof(struct virtio_blk_config));
> +   sizeof(struct virtio_blk_config), errp);
>  if (ret < 0) {
> -error_setg(errp, "vhost-user-blk: get block config failed");
>  goto vhost_err;
>  }
>  
> diff --git a/hw/display/vhost-user-gpu.c b/hw/display/vhost-user-gpu.c
> index 6cdaa1c73b..389199e6ca 100644
> --- a/hw/display/vhost-user-gpu.c
> +++ b/hw/display/vhost-user-gpu.c
> @@ -415,14 +415,16 @@ vhost_user_gpu_get_config(VirtIODevice *vdev, uint8_t *config_data)
>  VirtIOGPUBase *b = VIRTIO_GPU_BASE(vdev);
>  struct virtio_gpu_config *vgconfig =
>  (struct virtio_gpu_config *)config_data;
> +Error *local_err = NULL;
>  int ret;
>  
>  memset(config_data, 0, sizeof(struct virtio_gpu_config));
>  
>  ret = vhost_dev_get_config(>vhost->dev,
> -   config_data, sizeof(struct virtio_gpu_config));
> +   config_data, sizeof(struct virtio_gpu_config),
> +   &local_err);
>  if (ret) {
> -error_report("vhost-user-gpu: get device config space failed");
> +error_report_err(local_err);
>  return;
>  }
>  
> diff --git a/hw/input/vhost-user-input.c b/hw/input/vhost-user-input.c
> index 63984a8ba7..273e96a7b1 100644
> ---

Re: [Virtio-fs] [PATCH v2 7/9] virtiofsd: Add inodes_by_handle hash table

2021-06-11 Thread Vivek Goyal

On Wed, Jun 09, 2021 at 05:55:49PM +0200, Max Reitz wrote:
> Currently, lo_inode.fhandle is always NULL and so we always keep an O_PATH
> FD in lo_inode.fd.  Therefore, when the respective inode is unlinked,
> its inode ID will remain in use until we drop our lo_inode (and
> lo_inode_put() thus closes the FD).  Therefore, lo_find() can safely use
> the inode ID as an lo_inode key, because any inode with an inode ID we
> find in lo_data.inodes (on the same filesystem) must be the exact same
> file.
> 
> This will change when we start setting lo_inode.fhandle so we do not
> have to keep an O_PATH FD open.  Then, unlinking such an inode will
> immediately remove it, so its ID can then be reused by newly created
> files, even while the lo_inode object is still there[1].
> 
> So creating a new file can then reuse the old file's inode ID, and
> looking up the new file would lead to us finding the old file's
> lo_inode, which is not ideal.
> 
> Luckily, just as file handles cause this problem, they also solve it:  A
> file handle contains a generation ID, which changes when an inode ID is
> reused, so the new file can be distinguished from the old one.  So all
> we need to do is to add a second map besides lo_data.inodes that maps
> file handles to lo_inodes, namely lo_data.inodes_by_handle.  For
> clarity, lo_data.inodes is renamed to lo_data.inodes_by_ids.
> 
> Unfortunately, we cannot rely on being able to generate file handles
> every time.

Hi Max, 

What are the cases where we cannot rely on being able to generate file
handles?

> Therefore, we still enter every lo_inode object into
> inodes_by_ids, but having an entry in inodes_by_handle is optional.  A
> potential inodes_by_handle entry then has precedence; the inodes_by_ids
> entry is just a fallback.

If we have to keep inodes_by_ids around, then can we just add fhandle
to the lo_key. That way we can manage with single hash table and still
be able to detect if inode ID has been reused.

Thanks
Vivek
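
Vivek's alternative, a single table whose key holds the inode IDs plus an optional file handle, could look roughly like the sketch below. struct sketch_key and its helpers are hypothetical, SKETCH_HANDLE_MAX is an assumed bound, and the handle bytes are treated as opaque. The point is that a reused inode ID with a new generation produces different handle bytes, and therefore an unequal key.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_HANDLE_MAX 128  /* assumed upper bound on handle size */

/* Hypothetical combined key: inode IDs plus an optional file handle. */
struct sketch_key {
    uint64_t ino;
    uint64_t dev;
    uint64_t mnt_id;
    bool has_handle;                           /* false if none generated */
    unsigned int handle_len;
    unsigned char handle[SKETCH_HANDLE_MAX];   /* opaque, incl. generation */
};

/* Hash only the IDs so keys with and without a handle share a bucket. */
static unsigned int sketch_key_hash(const struct sketch_key *k)
{
    return (unsigned int)(k->ino ^ (k->dev << 7) ^ (k->mnt_id << 13));
}

static bool sketch_key_equal(const struct sketch_key *a,
                             const struct sketch_key *b)
{
    if (a->ino != b->ino || a->dev != b->dev || a->mnt_id != b->mnt_id) {
        return false;
    }
    if (a->has_handle != b->has_handle) {
        return false;  /* policy choice: never mix handle/no-handle keys */
    }
    if (!a->has_handle) {
        return true;   /* no handle available: fall back to IDs only */
    }
    return a->handle_len == b->handle_len &&
           memcmp(a->handle, b->handle, a->handle_len) == 0;
}
```

Treating a key with a handle as unequal to one without is only one possible policy, chosen here for simplicity; the fallback case behaves like today's ID-only lo_key.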

> 
> Note that we do not generate lo_fhandle objects yet, and so we also do
> not enter anything into the inodes_by_handle map yet.  Also, all lookups
> skip that map.  We might manually create file handles with some code
> that is immediately removed by the next patch again, but that would
> break the assumption in lo_find() that every lo_inode with a non-NULL
> .fhandle must have an entry in inodes_by_handle and vice versa.  So we
> leave actually using the inodes_by_handle map for the next patch.
> 
> [1] If some application in the guest still has the file open, there is
> going to be a corresponding FD mapping in lo_data.fd_map.  In such a
> case, the inode will only go away once every application in the guest
> has closed it.  The problem described only applies to cases where the
> guest does not have the file open, and it is just in the dentry cache,
> basically.
> 
> Signed-off-by: Max Reitz 
> Reviewed-by: Connor Kuehl 
> ---
>  tools/virtiofsd/passthrough_ll.c | 80 +---
>  1 file changed, 64 insertions(+), 16 deletions(-)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index e665575401..793d2c333e 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -179,7 +179,8 @@ struct lo_data {
>  int announce_submounts;
>  bool use_statx;
>  struct lo_inode root;
> -GHashTable *inodes; /* protected by lo->mutex */
> +GHashTable *inodes_by_ids; /* protected by lo->mutex */
> +GHashTable *inodes_by_handle; /* protected by lo->mutex */
>  struct lo_map ino_map; /* protected by lo->mutex */
>  struct lo_map dirp_map; /* protected by lo->mutex */
>  struct lo_map fd_map; /* protected by lo->mutex */
> @@ -257,8 +258,9 @@ static struct {
>  /* That we loaded cap-ng in the current thread from the saved */
>  static __thread bool cap_loaded = 0;
>  
> -static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st,
> -uint64_t mnt_id);
> +static struct lo_inode *lo_find(struct lo_data *lo,
> +const struct lo_fhandle *fhandle,
> +struct stat *st, uint64_t mnt_id);
>  static int xattr_map_client(const struct lo_data *lo, const char *client_name,
>  char **out_name);
>  
> @@ -1032,18 +1034,39 @@ out_err:
>  fuse_reply_err(req, saverr);
>  }
>  
> -static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st,
> -uint64_t mnt_id)
> +static struct lo_inode *lo_find(struct lo_data *lo,
> +const struct lo_fhandle *fhandle,
> +struct stat *st, uint64_t mnt_id)
>  {
> -struct lo_inode *p;
> -struct lo_key key = {
> +struct lo_inode *p = NULL;
> +struct lo_key ids_key = {
>  .ino = st->st_ino,
>  .dev = st->st_dev,
>  .mnt_id = mnt_id,
>  };
>  
>

Re: [PATCH 4/7] vhost-user-blk: Add Error parameter to vhost_user_blk_start()

2021-06-11 Thread Raphael Norwitz

On Wed, Jun 09, 2021 at 05:46:55PM +0200, Kevin Wolf wrote:
> Instead of letting the caller make up a meaningless error message, add
> an Error parameter to allow reporting the real error.
> 
> Signed-off-by: Kevin Wolf 

Reviewed-by: Raphael Norwitz 
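
Switching from error_report("...: %d", -ret) to error_setg_errno(errp, -ret, ...) means the message carries a readable description of the errno value instead of a bare number. The callee returns a negative errno, so the caller negates it before formatting. The following sketch shows that formatting; sketch_format_errno and sketch_enable_notifiers are hypothetical stand-ins for QEMU's helpers and write into a caller-supplied buffer.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper in the spirit of error_setg_errno(): the message is
 * followed by ": " and the description of a positive errno value.
 */
static int sketch_format_errno(char *buf, size_t n, int os_errno,
                               const char *msg)
{
    return snprintf(buf, n, "%s: %s", msg, strerror(os_errno));
}

/* Hypothetical callee in the 0/-errno style. */
static int sketch_enable_notifiers(int supported)
{
    return supported ? 0 : -ENOSYS;
}
```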

> ---
>  hw/block/vhost-user-blk.c | 31 +++
>  1 file changed, 15 insertions(+), 16 deletions(-)
> 
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index 0cb56baefb..e9382e152a 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -113,7 +113,7 @@ const VhostDevConfigOps blk_ops = {
>  .vhost_dev_config_notifier = vhost_user_blk_handle_config_change,
>  };
>  
> -static int vhost_user_blk_start(VirtIODevice *vdev)
> +static int vhost_user_blk_start(VirtIODevice *vdev, Error **errp)
>  {
>  VHostUserBlk *s = VHOST_USER_BLK(vdev);
>  BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
> @@ -121,19 +121,19 @@ static int vhost_user_blk_start(VirtIODevice *vdev)
>  int i, ret;
>  
>  if (!k->set_guest_notifiers) {
> -error_report("binding does not support guest notifiers");
> +error_setg(errp, "binding does not support guest notifiers");
>  return -ENOSYS;
>  }
>  
>  ret = vhost_dev_enable_notifiers(&s->dev, vdev);
>  if (ret < 0) {
> -error_report("Error enabling host notifiers: %d", -ret);
> +error_setg_errno(errp, -ret, "Error enabling host notifiers");
>  return ret;
>  }
>  
>  ret = k->set_guest_notifiers(qbus->parent, s->dev.nvqs, true);
>  if (ret < 0) {
> -error_report("Error binding guest notifier: %d", -ret);
> +error_setg_errno(errp, -ret, "Error binding guest notifier");
>  goto err_host_notifiers;
>  }
>  
> @@ -141,27 +141,27 @@ static int vhost_user_blk_start(VirtIODevice *vdev)
>  
>  ret = vhost_dev_prepare_inflight(&s->dev, vdev);
>  if (ret < 0) {
> -error_report("Error set inflight format: %d", -ret);
> +error_setg_errno(errp, -ret, "Error setting inflight format");
>  goto err_guest_notifiers;
>  }
>  
>  if (!s->inflight->addr) {
>  ret = vhost_dev_get_inflight(&s->dev, s->queue_size, s->inflight);
>  if (ret < 0) {
> -error_report("Error get inflight: %d", -ret);
> +error_setg_errno(errp, -ret, "Error getting inflight");
>  goto err_guest_notifiers;
>  }
>  }
>  
>  ret = vhost_dev_set_inflight(&s->dev, s->inflight);
>  if (ret < 0) {
> -error_report("Error set inflight: %d", -ret);
> +error_setg_errno(errp, -ret, "Error setting inflight");
>  goto err_guest_notifiers;
>  }
>  
>  ret = vhost_dev_start(&s->dev, vdev);
>  if (ret < 0) {
> -error_report("Error starting vhost: %d", -ret);
> +error_setg_errno(errp, -ret, "Error starting vhost");
>  goto err_guest_notifiers;
>  }
>  s->started_vu = true;
> @@ -214,6 +214,7 @@ static void vhost_user_blk_set_status(VirtIODevice *vdev, uint8_t status)
>  {
>  VHostUserBlk *s = VHOST_USER_BLK(vdev);
>  bool should_start = virtio_device_started(vdev, status);
> +Error *local_err = NULL;
>  int ret;
>  
>  if (!vdev->vm_running) {
> @@ -229,10 +230,9 @@ static void vhost_user_blk_set_status(VirtIODevice *vdev, uint8_t status)
>  }
>  
>  if (should_start) {
> -ret = vhost_user_blk_start(vdev);
> +ret = vhost_user_blk_start(vdev, &local_err);
>  if (ret < 0) {
> -error_report("vhost-user-blk: vhost start failed: %s",
> - strerror(-ret));
> +error_reportf_err(local_err, "vhost-user-blk: vhost start failed: ");
>  qemu_chr_fe_disconnect(&s->chardev);
>  }
>  } else {
> @@ -270,6 +270,7 @@ static uint64_t vhost_user_blk_get_features(VirtIODevice *vdev,
>  static void vhost_user_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
>  {
>  VHostUserBlk *s = VHOST_USER_BLK(vdev);
> +Error *local_err = NULL;
>  int i, ret;
>  
>  if (!vdev->start_on_kick) {
> @@ -287,10 +288,9 @@ static void vhost_user_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
>  /* Some guests kick before setting VIRTIO_CONFIG_S_DRIVER_OK so start
>   * vhost here instead of waiting for .set_status().
>   */
> -ret = vhost_user_blk_start(vdev);
> +ret = vhost_user_blk_start(vdev, &local_err);
>  if (ret < 0) {
> -error_report("vhost-user-blk: vhost start failed: %s",
> - strerror(-ret));
> +error_reportf_err(local_err, "vhost-user-blk: vhost start failed: ");
>  qemu_chr_fe_disconnect(&s->chardev);
>  return;
>  }
> @@ -340,9 +340,8 @@ static int vhost_user_blk_connect(DeviceState *dev, Error **errp)
>  
>  /* restore vhost state */
>  if (virtio_device_started(vdev, vdev->status)) {
> -ret = vhost_user_blk_start(vdev);
> +ret =

Re: [PATCH 3/7] vhost: Return 0/-errno in vhost_dev_init()

2021-06-11 Thread Raphael Norwitz

On Wed, Jun 09, 2021 at 05:46:54PM +0200, Kevin Wolf wrote:
> Instead of just returning 0/-1 and letting the caller make up a
> meaningless error message, switch to 0/-errno so that different kinds of
> errors can be distinguished in the caller.
> 
> This involves changing a few more callbacks in VhostOps to return
> 0/-errno: .vhost_set_owner(), .vhost_get_features() and
> .vhost_virtqueue_set_busyloop_timeout(). The implementations of these
> functions are trivial as they generally just send a message to the
> backend.
> 
> Signed-off-by: Kevin Wolf 

Reviewed-by: Raphael Norwitz 

> ---
>  hw/virtio/vhost-backend.c |  4 +++-
>  hw/virtio/vhost-user.c| 10 +++---
>  hw/virtio/vhost-vdpa.c|  4 +++-
>  hw/virtio/vhost.c |  8 
>  4 files changed, 17 insertions(+), 9 deletions(-)
> 
> diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
> index f4f71cf58a..594d770b75 100644
> --- a/hw/virtio/vhost-backend.c
> +++ b/hw/virtio/vhost-backend.c
> @@ -24,10 +24,12 @@ static int vhost_kernel_call(struct vhost_dev *dev, unsigned long int request,
>   void *arg)
>  {
>  int fd = (uintptr_t) dev->opaque;
> +int ret;
>  
>  assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_KERNEL);
>  
> -return ioctl(fd, request, arg);
> +ret = ioctl(fd, request, arg);
> +return ret < 0 ? -errno : ret;
>  }
>  
>  static int vhost_kernel_init(struct vhost_dev *dev, void *opaque, Error **errp)
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 024cb201bb..889559d86a 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -1353,7 +1353,11 @@ static int vhost_user_get_u64(struct vhost_dev *dev, int request, uint64_t *u64)
>  
>  static int vhost_user_get_features(struct vhost_dev *dev, uint64_t *features)
>  {
> -return vhost_user_get_u64(dev, VHOST_USER_GET_FEATURES, features);
> +if (vhost_user_get_u64(dev, VHOST_USER_GET_FEATURES, features) < 0) {
> +return -EPROTO;
> +}
> +
> +return 0;
>  }
>  
>  static int vhost_user_set_owner(struct vhost_dev *dev)
> @@ -1364,7 +1368,7 @@ static int vhost_user_set_owner(struct vhost_dev *dev)
>  };
>  
>  if (vhost_user_write(dev, , NULL, 0) < 0) {
> -return -1;
> +return -EPROTO;
>  }
>  
>  return 0;
> @@ -1872,7 +1876,7 @@ static int vhost_user_backend_init(struct vhost_dev *dev, void *opaque,
>  
>  err = vhost_user_get_features(dev, &features);
>  if (err < 0) {
> -return -EPROTO;
> +return err;
>  }
>  
>  if (virtio_has_feature(features, VHOST_USER_F_PROTOCOL_FEATURES)) {
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index c2aadb57cb..71897c1a01 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -253,10 +253,12 @@ static int vhost_vdpa_call(struct vhost_dev *dev, unsigned long int request,
>  {
>  struct vhost_vdpa *v = dev->opaque;
>  int fd = v->device_fd;
> +int ret;
>  
>  assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
>  
> -return ioctl(fd, request, arg);
> +ret = ioctl(fd, request, arg);
> +return ret < 0 ? -errno : ret;
>  }
>  
>  static void vhost_vdpa_add_status(struct vhost_dev *dev, uint8_t status)
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index fd13135706..c7f9d8bb06 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -1309,13 +1309,13 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
>  
>  r = hdev->vhost_ops->vhost_set_owner(hdev);
>  if (r < 0) {
> -error_setg(errp, "vhost_set_owner failed");
> +error_setg_errno(errp, -r, "vhost_set_owner failed");
>  goto fail;
>  }
>  
>  r = hdev->vhost_ops->vhost_get_features(hdev, );
>  if (r < 0) {
> -error_setg(errp, "vhost_get_features failed");
> +error_setg_errno(errp, -r, "vhost_get_features failed");
>  goto fail;
>  }
>  
> @@ -1332,7 +1332,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
>  r = vhost_virtqueue_set_busyloop_timeout(hdev, hdev->vq_index + i,
>   busyloop_timeout);
>  if (r < 0) {
> -error_setg(errp, "Failed to set busyloop timeout");
> +error_setg_errno(errp, -r, "Failed to set busyloop timeout");
>  goto fail_busyloop;
>  }
>  }
> @@ -1391,7 +1391,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
>  if (used_memslots > hdev->vhost_ops->vhost_backend_memslots_limit(hdev)) {
>  error_setg(errp, "vhost backend memory slots limit is less"
> " than current number of present memory slots");
> -r = -1;
> +r = -EINVAL;
>  goto fail_busyloop;
>  }
>  
> -- 
> 2.30.2
>
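vhost_kernel_call() and vhost_vdpa_call() now turn ioctl()'s "-1 plus errno" convention into a negative errno return value. The same conversion is shown below as a standalone sketch; sketch_ioctl_call is a hypothetical name. Reading errno immediately after the failing call matters, because later library calls, such as those made while reporting the error, may overwrite it.

```c
#include <errno.h>
#include <sys/ioctl.h>

/*
 * Same conversion as in the patch: ioctl() reports failure as -1 with the
 * cause in errno; return -errno instead so callers can pass the cause on.
 */
static int sketch_ioctl_call(int fd, unsigned long request, void *arg)
{
    int ret = ioctl(fd, request, arg);

    return ret < 0 ? -errno : ret;
}
```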

Re: QEMU | USB Ethernet device (RNDIS) does not work on several tested operating systems (#198)

2021-06-11 Thread Paul Zimmerman

I will take a look. Might take me a couple of days to get to it though.

On Fri, Jun 11, 2021 at 4:10 AM Philippe Mathieu-Daudé  wrote:
>
> Cc'ing Paul Zimmerman for the hcd-dwc2 model.
>
> On Thu, Jun 10, 2021 at 2:10 PM programmingkidx (@programmingkidx) wrote:
> > 虎游 commented:
> >
> > The same problem occurred in my Raspberry Pi 3b+ emulation.
> >
> > Host: Debian 10 x86_64.
> > Guest rootfs: https://download.canaan-creative.com/avalon921/openwrt/latest/rpi3-modelb/openwrt-brcm2708-bcm2710-rpi-3-ext4-sdcard.img.gz
> > Guest kernel & dtb: https://github.com/dhruvvyas90/qemu-rpi-kernel/tree/master/native-emulation
> >
> > Command:
> >
> > /usr/local/bin/qemu-system-aarch64 -M raspi3 \
> >     -append "rw earlyprintk loglevel=8 console=ttyAMA0,115200 dwc_otg.lpm_enable=0 root=/dev/mmcblk0p2 rootdelay=1" \
> >     -dtb ../qemu-rpi-kernel/native-emulation/dtbs/bcm2710-rpi-3-b-plus.dtb \
> >     -drive file=avalon.img,format=raw,if=sd,id=root \
> >     -kernel ../qemu-rpi-kernel/native-emulation/5.4.51\ kernels/kernel8.img \
> >     -m 1G -smp 4 -usb -device usb-mouse -device usb-kbd -nographic -no-reboot \
> >     -device usb-net,netdev=eth0 \
> >     -netdev tap,id=eth0,ifname=avalon,script=no,downscript=no
> >
> > Output:
> >
> > [ 0.00] Booting Linux on physical CPU 0x00 [0x410fd034]
> > [ 0.00] Linux version 5.4.51-v8+ (dom@buildbot) (gcc version 5.4.0 20160609 (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9)) #1333 SMP PREEMPT Mon Aug 10 16:58:35 BST 2020
> > [ 0.00] Machine model: Raspberry Pi 3 Model B+
> >
> > ...
> >
> > usbnet: failed control transaction: request 0x8006 value 0x600 index 0x0 length 0xa
> > usbnet: failed control transaction: request 0x8006 value 0x600 index 0x0 length 0xa
> > usbnet: failed control transaction: request 0x8006 value 0x600 index 0x0 length 0xa
> > [ 3.688532] usb 1-1.3: New USB device found, idVendor=0525, idProduct=a4a2, bcdDevice= 0.00
> > [ 3.688739] usb 1-1.3: New USB device strings: Mfr=1, Product=2, SerialNumber=10
> > [ 3.689454] usb 1-1.3: Product: RNDIS/QEMU USB Network Device
> > [ 3.689563] usb 1-1.3: Manufacturer: QEMU
> > [ 3.689639] usb 1-1.3: SerialNumber: 1-1.3
> >
> > ...
> >
> > root@OpenWrt:/# ifconfig -a
> > lo        Link encap:Local Loopback
> >           inet addr:127.0.0.1  Mask:255.0.0.0
> >           UP LOOPBACK RUNNING  MTU:65536  Metric:1
> >           RX packets:156 errors:0 dropped:0 overruns:0 frame:0
> >           TX packets:156 errors:0 dropped:0 overruns:0 carrier:0
> >           collisions:0 txqueuelen:1000
> >           RX bytes:9836 (9.6 KiB)  TX bytes:9836 (9.6 KiB)
> >
> > root@OpenWrt:/# ip a
> > 1: lo:  mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
> >     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> >     inet 127.0.0.1/8 scope host lo
> >        valid_lft forever preferred_lft forever
> >
> > In the end, there is no NIC available in the guest OS.

Re: [PATCH V3 12/22] vfio-pci: cpr part 1

2021-06-11 Thread Steven Sistare

On 6/11/2021 2:15 PM, Steven Sistare wrote:
> On 5/24/2021 2:29 PM, Steven Sistare wrote:
>> On 5/21/2021 6:24 PM, Alex Williamson wrote:
>>> On Fri,  7 May 2021 05:25:10 -0700
>>> Steve Sistare  wrote:
>>>
 [...]
 diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
 index 7a4fb6c..f7ac9f03 100644
 --- a/hw/vfio/pci.c
 +++ b/hw/vfio/pci.c
 @@ -29,6 +29,8 @@
  #include "hw/qdev-properties.h"
  #include "hw/qdev-properties-system.h"
  #include "migration/vmstate.h"
 +#include "migration/cpr.h"
 +#include "qemu/env.h"
  #include "qemu/error-report.h"
  #include "qemu/main-loop.h"
  #include "qemu/module.h"
 @@ -1612,6 +1614,14 @@ static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled)
  }
  }
  
 +static void vfio_config_sync(VFIOPCIDevice *vdev, uint32_t offset, size_t len)
 +{
 +if (pread(vdev->vbasedev.fd, vdev->pdev.config + offset, len,
 +  vdev->config_offset + offset) != len) {
 +error_report("vfio_config_sync pread failed");
 +}
 +}
 +
  static void vfio_bar_prepare(VFIOPCIDevice *vdev, int nr)
  {
  VFIOBAR *bar = &vdev->bars[nr];
 @@ -1652,6 +1662,7 @@ static void vfio_bars_prepare(VFIOPCIDevice *vdev)
  static void vfio_bar_register(VFIOPCIDevice *vdev, int nr)
  {
  VFIOBAR *bar = &vdev->bars[nr];
 +PCIDevice *pdev = &vdev->pdev;
  char *name;
  
  if (!bar->size) {
 @@ -1672,7 +1683,10 @@ static void vfio_bar_register(VFIOPCIDevice *vdev, int nr)
  }
  }
  
 -pci_register_bar(&vdev->pdev, nr, bar->type, bar->mr);
 +pci_register_bar(pdev, nr, bar->type, bar->mr);
 +if (pdev->reused) {
 +vfio_config_sync(vdev, pci_bar(pdev, nr), 8);
>>>
>>> Assuming 64-bit BARs?  This might be the first case where we actually
>>> rely on the kernel BAR values, IIRC we usually use QEMU's emulation.
>>
>> No assumptions.  vfio_config_sync() preads a piece of config space using a
>> single system call, copying directly to the qemu buffer, not looking at
>> words or calling any action functions.
>>
>> [...] 
 @@ -3098,6 +3115,11 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
  vfio_register_req_notifier(vdev);
  vfio_setup_resetfn_quirk(vdev);
  
 +vfio_config_sync(vdev, pdev->msix_cap + PCI_MSIX_FLAGS, 2);
 +if (pdev->reused) {
 +pci_update_mappings(pdev);
 +}
 +
>>>
>>> Are the msix flag sync and mapping update related?  They seem
>>> independent to me.  A blank line and comment would be helpful.
>>
>> OK.
>>
>>> I expect we'd need to call msix_enabled() somewhere for the msix flag
>>> sync to be effective.
>>
>> Yes, vfio_pci_post_load in cpr part 2 calls msix_enabled.
>>
>>> Is there an assumption here of msi-x only support or is it not needed
>>> for msi or intx?
>>
>> The code supports msi-x and msi.  However, I should only be sync'ing
>> PCI_MSIX_FLAGS if pdev->cap_present & QEMU_PCI_CAP_MSIX.  And, I am
>> missing a sync for PCI_MSI_FLAGS.
>> I'll fix that.
> 
> Hi Alex, FYI, I am making more changes here.  The calls to vfio_config_sync
> fix pdev->config[] words that are initialized during vfio_realize(), by
> pread'ing from the live kernel config.  However, it makes more sense to
> suppress the undesired re-initialization, rather than undo the damage
> later.  Thus I will add a few more 'if (!pdev->reused)' guards in msix and
> pci bar init functions, and delete vfio_config_sync.
> 
> Most of the config is preserved in the kernel across restart.  However,
> the bits that are purely emulated (indicated by the emulated_config_bits
> mask) may be rejected when they are written through to the kernel, and
> thus are currently lost on restart.  I need to save pdev->config[] in the
> vmstate file, and in vfio_pci_post_load, merge it with the kernel config
> using emulated_config_bits.
> 
> Sound sane?

Furthermore, there is no need to check reused and suppress initialization
of msix and pci bar, as the vmstate loader fixes them up.

- Steve
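
Steve's plan of merging the saved pdev->config[] with the live kernel config through the emulated_config_bits mask amounts to a per-byte select. The following sketch is minimal. sketch_merge_config is a hypothetical name, and plain byte arrays stand in for QEMU's buffers.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Bits set in `emulated` are taken from the config saved in the vmstate
 * file; all other bits are taken from the live kernel config space.
 */
static void sketch_merge_config(uint8_t *out, const uint8_t *saved,
                                const uint8_t *kernel,
                                const uint8_t *emulated, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        out[i] = (uint8_t)((saved[i] & emulated[i]) |
                           (kernel[i] & (uint8_t)~emulated[i]));
    }
}
```

For example, with saved byte 0xAB, kernel byte 0x12 and mask 0xF0, the merged byte is 0xA2: the high nibble comes from the saved copy and the low nibble from the kernel.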

Re: [PATCH 2/7] vhost: Distinguish errors in vhost_backend_init()

2021-06-11 Thread Raphael Norwitz

On Wed, Jun 09, 2021 at 05:46:53PM +0200, Kevin Wolf wrote:
> Instead of just returning 0/-1 and letting the caller make up a
> meaningless error message, add an Error parameter to allow reporting the
> real error and switch to 0/-errno so that different kinds of errors can
> be distinguished in the caller.
> 
> Specifically, in vhost-user, EPROTO is used for all errors that relate
> to the connection itself, whereas other error codes are used for errors
> relating to the content of the connection. This will allow us later to
> automatically reconnect when the connection goes away, without ending up
> in an endless loop if it's a permanent error in the configuration.
> 
> Signed-off-by: Kevin Wolf 

Reviewed-by: Raphael Norwitz 
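
The EPROTO convention described above lets a caller decide whether reconnecting can help. A sketch of such a policy follows. sketch_connect_with_retry, the attempt limit, and the two example backends are hypothetical, since this patch only prepares the error codes and does not add reconnection itself.

```c
#include <errno.h>

/* Hypothetical connection attempt: returns 0 or a negative errno. */
typedef int (*sketch_connect_fn)(void *opaque);

/*
 * -EPROTO means the connection itself failed and may work on another
 * attempt; any other negative errno is treated as permanent, so the loop
 * stops instead of spinning forever. max_attempts bounds the retries.
 */
static int sketch_connect_with_retry(sketch_connect_fn fn, void *opaque,
                                     int max_attempts)
{
    int ret = -EPROTO;

    for (int i = 0; i < max_attempts; i++) {
        ret = fn(opaque);
        if (ret != -EPROTO) {
            break;  /* success, or an error that retrying cannot fix */
        }
    }
    return ret;
}

/* Example backend: fails with -EPROTO a given number of times, then works. */
struct sketch_flaky {
    int failures_left;
    int calls;
};

static int sketch_flaky_connect(void *opaque)
{
    struct sketch_flaky *f = opaque;

    f->calls++;
    if (f->failures_left > 0) {
        f->failures_left--;
        return -EPROTO;
    }
    return 0;
}

/* Example backend with a configuration error: always -EINVAL. */
static int sketch_bad_config(void *opaque)
{
    int *calls = opaque;

    (*calls)++;
    return -EINVAL;
}
```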

> ---
>  include/hw/virtio/vhost-backend.h |  3 ++-
>  hw/virtio/vhost-backend.c |  2 +-
>  hw/virtio/vhost-user.c| 41 ---
>  hw/virtio/vhost-vdpa.c|  2 +-
>  hw/virtio/vhost.c | 13 +-
>  5 files changed, 32 insertions(+), 29 deletions(-)
> 
> diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
> index 8a6f8e2a7a..728ebb0ed9 100644
> --- a/include/hw/virtio/vhost-backend.h
> +++ b/include/hw/virtio/vhost-backend.h
> @@ -37,7 +37,8 @@ struct vhost_scsi_target;
>  struct vhost_iotlb_msg;
>  struct vhost_virtqueue;
>  
> -typedef int (*vhost_backend_init)(struct vhost_dev *dev, void *opaque);
> +typedef int (*vhost_backend_init)(struct vhost_dev *dev, void *opaque,
> +  Error **errp);
>  typedef int (*vhost_backend_cleanup)(struct vhost_dev *dev);
>  typedef int (*vhost_backend_memslots_limit)(struct vhost_dev *dev);
>  
> diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
> index 31b33bde37..f4f71cf58a 100644
> --- a/hw/virtio/vhost-backend.c
> +++ b/hw/virtio/vhost-backend.c
> @@ -30,7 +30,7 @@ static int vhost_kernel_call(struct vhost_dev *dev, unsigned long int request,
>  return ioctl(fd, request, arg);
>  }
>  
> -static int vhost_kernel_init(struct vhost_dev *dev, void *opaque)
> +static int vhost_kernel_init(struct vhost_dev *dev, void *opaque, Error **errp)
>  {
>  assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_KERNEL);
>  
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index ee57abe045..024cb201bb 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -1856,7 +1856,8 @@ static int vhost_user_postcopy_notifier(NotifierWithReturn *notifier,
>  return 0;
>  }
>  
> -static int vhost_user_backend_init(struct vhost_dev *dev, void *opaque)
> +static int vhost_user_backend_init(struct vhost_dev *dev, void *opaque,
> +   Error **errp)
>  {
>  uint64_t features, protocol_features, ram_slots;
>  struct vhost_user *u;
> @@ -1871,7 +1872,7 @@ static int vhost_user_backend_init(struct vhost_dev *dev, void *opaque)
>  
>  err = vhost_user_get_features(dev, &features);
>  if (err < 0) {
> -return err;
> +return -EPROTO;
>  }
>  
>  if (virtio_has_feature(features, VHOST_USER_F_PROTOCOL_FEATURES)) {
> @@ -1880,7 +1881,7 @@ static int vhost_user_backend_init(struct vhost_dev *dev, void *opaque)
>  err = vhost_user_get_u64(dev, VHOST_USER_GET_PROTOCOL_FEATURES,
>   &protocol_features);
>  if (err < 0) {
> -return err;
> +return -EPROTO;
>  }
>  
>  dev->protocol_features =
> @@ -1891,14 +1892,14 @@ static int vhost_user_backend_init(struct vhost_dev *dev, void *opaque)
>  dev->protocol_features &= ~(1ULL << VHOST_USER_PROTOCOL_F_CONFIG);
>  } else if (!(protocol_features &
>  (1ULL << VHOST_USER_PROTOCOL_F_CONFIG))) {
> -error_report("Device expects VHOST_USER_PROTOCOL_F_CONFIG "
> -"but backend does not support it.");
> -return -1;
> +error_setg(errp, "Device expects VHOST_USER_PROTOCOL_F_CONFIG "
> +   "but backend does not support it.");
> +return -EINVAL;
>  }
>  
>  err = vhost_user_set_protocol_features(dev, dev->protocol_features);
>  if (err < 0) {
> -return err;
> +return -EPROTO;
>  }
>  
>  /* query the max queues we support if backend supports Multiple Queue */
> @@ -1906,12 +1907,12 @@ static int vhost_user_backend_init(struct vhost_dev *dev, void *opaque)
>  err = vhost_user_get_u64(dev, VHOST_USER_GET_QUEUE_NUM,
>   &dev->max_queues);
>  if (err < 0) {
> -return err;
> +return -EPROTO;
>  }
>  }
>  if (dev->num_queues && dev->max_queues < dev->num_queues) {
> -error_report("The maximum number of queues supported by the "
> - "backend is %" PRIu64,

Re: [PATCH v2 0/9] virtiofsd: Allow using file handles instead of O_PATH FDs

2021-06-11 Thread Vivek Goyal

On Wed, Jun 09, 2021 at 05:55:42PM +0200, Max Reitz wrote:
> Hi,
> 
> v1 cover letter for an overview:
> https://listman.redhat.com/archives/virtio-fs/2021-June/msg00033.html

Hi Max,

What's the impact of these patches on performance? Just trying to
get some idea of what to expect. Does performance remain more or less
the same, or should we expect a hit?

Thanks
Vivek

> 
> In v2, I (tried to) fix the bug Dave found, which is that
> get_file_handle() indiscriminately opened the given dirfd/name
> combination to get an O_RDONLY fd without checking whether we’re
> actually allowed to open dirfd/name; namely, we don’t allow ourselves to
> open files that aren’t regular files or directories.
> 
> So that openat(..., O_RDONLY) is changed to an openat(..., O_PATH), and we
> then check the file type with the statx() we’re doing anyway.  If the
> file is OK to open, we reopen it O_RDONLY with the help of
> /proc/self/fd, like we always do.
> 
> (This only affects patch 8.)
> 
> 
> git-backport-diff against v1:
> 
> Key:
> [----] : patches are identical
> [####] : number of functional differences between upstream/downstream patch
> [down] : patch is downstream-only
> The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively
> 
> 001/9:[----] [--] 'virtiofsd: Add TempFd structure'
> 002/9:[----] [--] 'virtiofsd: Use lo_inode_open() instead of openat()'
> 003/9:[----] [--] 'virtiofsd: Add lo_inode_fd() helper'
> 004/9:[----] [--] 'virtiofsd: Let lo_fd() return a TempFd'
> 005/9:[----] [--] 'virtiofsd: Let lo_inode_open() return a TempFd'
> 006/9:[----] [--] 'virtiofsd: Add lo_inode.fhandle'
> 007/9:[----] [--] 'virtiofsd: Add inodes_by_handle hash table'
> 008/9:[0045] [FC] 'virtiofsd: Optionally fill lo_inode.fhandle'
> 009/9:[----] [--] 'virtiofsd: Add lazy lo_do_find()'
> 
> 
> Max Reitz (9):
>   virtiofsd: Add TempFd structure
>   virtiofsd: Use lo_inode_open() instead of openat()
>   virtiofsd: Add lo_inode_fd() helper
>   virtiofsd: Let lo_fd() return a TempFd
>   virtiofsd: Let lo_inode_open() return a TempFd
>   virtiofsd: Add lo_inode.fhandle
>   virtiofsd: Add inodes_by_handle hash table
>   virtiofsd: Optionally fill lo_inode.fhandle
>   virtiofsd: Add lazy lo_do_find()
> 
>  tools/virtiofsd/helper.c  |   3 +
>  tools/virtiofsd/passthrough_ll.c  | 836 +-
>  tools/virtiofsd/passthrough_seccomp.c |   2 +
>  3 files changed, 694 insertions(+), 147 deletions(-)
> 
> -- 
> 2.31.1
> 
>
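The v2 fix (open with O_PATH, check the file type, then reopen through /proc/self/fd) can be sketched as a standalone function. This is not virtiofsd's code. It uses fstat() instead of statx() for brevity, returns -EPERM for refused file types as an arbitrary choice, and adds O_NOFOLLOW so that a symlink is refused rather than followed; all three are assumptions of the sketch.

```c
#define _GNU_SOURCE  /* for O_PATH */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Open dirfd/name with O_PATH first, check the file type, and only then
 * reopen it O_RDONLY through /proc/self/fd. Returns an fd or -errno.
 * Linux-only; fstat() on an O_PATH fd needs Linux 3.6 or later.
 */
static int sketch_open_checked(int dirfd, const char *name)
{
    char procname[64];
    struct stat st;
    int path_fd, fd;

    path_fd = openat(dirfd, name, O_PATH | O_NOFOLLOW);
    if (path_fd < 0) {
        return -errno;
    }
    if (fstat(path_fd, &st) < 0) {
        int err = -errno;
        close(path_fd);
        return err;
    }
    if (!S_ISREG(st.st_mode) && !S_ISDIR(st.st_mode)) {
        close(path_fd);
        return -EPERM;  /* FIFOs, device nodes, symlinks, ... are refused */
    }
    snprintf(procname, sizeof(procname), "/proc/self/fd/%d", path_fd);
    fd = open(procname, O_RDONLY);
    if (fd < 0) {
        fd = -errno;
    }
    close(path_fd);
    return fd;
}
```

The O_PATH open has no side effects even on special files, because opening a FIFO with O_PATH does not block the way an O_RDONLY open can. That is why the type check comes before the real open.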

Re: [PATCH 1/7] vhost: Add Error parameter to vhost_dev_init()

2021-06-11 Thread Raphael Norwitz

On Wed, Jun 09, 2021 at 05:46:52PM +0200, Kevin Wolf wrote:
> This allows callers to return better error messages instead of making
> one up while the real error ends up on stderr. Most callers can
> immediately make use of this because they already have an Error
> parameter themselves. The others just keep printing the error with
> error_report_err().
> 
> Signed-off-by: Kevin Wolf 

Reviewed-by: Raphael Norwitz 

> ---
>  include/hw/virtio/vhost.h    |  2 +-
>  backends/cryptodev-vhost.c   |  5 -
>  backends/vhost-user.c        |  4 ++--
>  hw/block/vhost-user-blk.c    |  4 ++--
>  hw/net/vhost_net.c           |  6 +-
>  hw/scsi/vhost-scsi.c         |  4 +---
>  hw/scsi/vhost-user-scsi.c    |  4 +---
>  hw/virtio/vhost-user-fs.c    |  3 +--
>  hw/virtio/vhost-user-vsock.c |  3 +--
>  hw/virtio/vhost-vsock.c      |  3 +--
>  hw/virtio/vhost.c            | 16 ++--
>  11 files changed, 29 insertions(+), 25 deletions(-)
> 
> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> index 21a9a52088..2d7aaad67b 100644
> --- a/include/hw/virtio/vhost.h
> +++ b/include/hw/virtio/vhost.h
> @@ -104,7 +104,7 @@ struct vhost_net {
>  
>  int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
> VhostBackendType backend_type,
> -   uint32_t busyloop_timeout);
> +   uint32_t busyloop_timeout, Error **errp);
>  void vhost_dev_cleanup(struct vhost_dev *hdev);
>  int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev);
>  void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev);
> diff --git a/backends/cryptodev-vhost.c b/backends/cryptodev-vhost.c
> index 8231e7f1bc..bc13e466b4 100644
> --- a/backends/cryptodev-vhost.c
> +++ b/backends/cryptodev-vhost.c
> @@ -52,6 +52,7 @@ cryptodev_vhost_init(
>  {
>  int r;
>  CryptoDevBackendVhost *crypto;
> +Error *local_err = NULL;
>  
>  crypto = g_new(CryptoDevBackendVhost, 1);
>  crypto->dev.max_queues = 1;
> @@ -66,8 +67,10 @@ cryptodev_vhost_init(
>  /* vhost-user needs vq_index to initiate a specific queue pair */
>  crypto->dev.vq_index = crypto->cc->queue_index * crypto->dev.nvqs;
>  
> -r = vhost_dev_init(&crypto->dev, options->opaque, options->backend_type, 0);
> +r = vhost_dev_init(&crypto->dev, options->opaque, options->backend_type, 0,
> +   &local_err);
>  if (r < 0) {
> +error_report_err(local_err);
>  goto fail;
>  }
>  
> diff --git a/backends/vhost-user.c b/backends/vhost-user.c
> index b366610e16..10b39992d2 100644
> --- a/backends/vhost-user.c
> +++ b/backends/vhost-user.c
> @@ -48,9 +48,9 @@ vhost_user_backend_dev_init(VhostUserBackend *b, VirtIODevice *vdev,
>  b->dev.nvqs = nvqs;
>  b->dev.vqs = g_new0(struct vhost_virtqueue, nvqs);
>  
> -ret = vhost_dev_init(&b->dev, &b->vhost_user, VHOST_BACKEND_TYPE_USER, 0);
> +ret = vhost_dev_init(&b->dev, &b->vhost_user, VHOST_BACKEND_TYPE_USER, 0,
> + errp);
>  if (ret < 0) {
> -error_setg_errno(errp, -ret, "vhost initialization failed");
>  return -1;
>  }
>  
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index c6210fad0c..0cb56baefb 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -332,9 +332,9 @@ static int vhost_user_blk_connect(DeviceState *dev, Error **errp)
>  
>  vhost_dev_set_config_notifier(&s->dev, &blk_ops);
>  
> -ret = vhost_dev_init(&s->dev, &s->vhost_user, VHOST_BACKEND_TYPE_USER, 0);
> +ret = vhost_dev_init(&s->dev, &s->vhost_user, VHOST_BACKEND_TYPE_USER, 0,
> + errp);
>  if (ret < 0) {
> -error_setg_errno(errp, -ret, "vhost initialization failed");
>  return ret;
>  }
>  
> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index 44c1ed92dc..447b119f85 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -22,6 +22,7 @@
>  #include "standard-headers/linux/vhost_types.h"
>  #include "hw/virtio/virtio-net.h"
>  #include "net/vhost_net.h"
> +#include "qapi/error.h"
>  #include "qemu/error-report.h"
>  #include "qemu/main-loop.h"
>  
> @@ -157,6 +158,7 @@ struct vhost_net *vhost_net_init(VhostNetOptions *options)
>  bool backend_kernel = options->backend_type == VHOST_BACKEND_TYPE_KERNEL;
>  struct vhost_net *net = g_new0(struct vhost_net, 1);
>  uint64_t features = 0;
> +Error *local_err = NULL;
>  
>  if (!options->net_backend) {
>  fprintf(stderr, "vhost-net requires net backend to be setup\n");
> @@ -187,8 +189,10 @@ struct vhost_net *vhost_net_init(VhostNetOptions *options)
>  }
>  
>  r = vhost_dev_init(&net->dev, options->opaque,
> -   options->backend_type, options->busyloop_timeout);
> +   options->backend_type, options->busyloop_timeout,
> +   &local_err);
>  if (r < 0) {
> +error_report_err(local_err);
>  goto fail;
>  }
>  if (backend_kernel)

[PATCH] target/arm: Implement MTE3

2021-06-11 Thread Peter Collingbourne

MTE3 introduces an asymmetric tag checking mode, in which loads are
checked synchronously and stores are checked asynchronously. Add
support for it.
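A minimal sketch of the dispatch this patch implements in `mte_check_fail()`, assuming the TCF encoding it switches on (0: no effect, 1: synchronous exception, 2: asynchronous TFSR flag, 3: the new asymmetric mode, synchronous for loads and asynchronous for stores). The enum and helper names are hypothetical; the TFSR bit selection mirrors the patch's use of address bit 55 when the translation regime has two VA ranges.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; the patch itself switches on the raw TCF value. */
typedef enum {
    TAG_FAIL_IGNORE, /* TCF == 0: tag check faults have no effect */
    TAG_FAIL_SYNC,   /* raise a synchronous data abort */
    TAG_FAIL_ASYNC,  /* set a TFSR_ELx flag, report later */
} TagFailAction;

static TagFailAction mte_tcf_action(unsigned tcf, bool is_write)
{
    assert(tcf <= 3); /* TCF is a 2-bit field */
    switch (tcf) {
    case 0:
        return TAG_FAIL_IGNORE;
    case 1:
        return TAG_FAIL_SYNC;
    case 2:
        return TAG_FAIL_ASYNC;
    default:
        /* 3: MTE3 asymmetric mode, async for stores, sync for loads */
        return is_write ? TAG_FAIL_ASYNC : TAG_FAIL_SYNC;
    }
}

/* TFSR bit to set on an async failure: TF1 for the upper VA range. */
static uint64_t mte_tfsr_bit(uint64_t dirty_ptr, bool two_ranges)
{
    unsigned select = two_ranges ? (unsigned)((dirty_ptr >> 55) & 1) : 0;
    return UINT64_C(1) << select;
}
```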

Signed-off-by: Peter Collingbourne 
---
 target/arm/cpu64.c  |  2 +-
 target/arm/mte_helper.c | 83 ++---
 2 files changed, 53 insertions(+), 32 deletions(-)

diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 1c23187d1a..c7a1626bec 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -683,7 +683,7 @@ static void aarch64_max_initfn(Object *obj)
  * during realize if the board provides no tag memory, much like
  * we do for EL2 with the virtualization=on property.
  */
-t = FIELD_DP64(t, ID_AA64PFR1, MTE, 2);
+t = FIELD_DP64(t, ID_AA64PFR1, MTE, 3);
 cpu->isar.id_aa64pfr1 = t;
 
 t = cpu->isar.id_aa64mmfr0;
diff --git a/target/arm/mte_helper.c b/target/arm/mte_helper.c
index 166b9d260f..7b76d871ff 100644
--- a/target/arm/mte_helper.c
+++ b/target/arm/mte_helper.c
@@ -538,13 +538,51 @@ void HELPER(stzgm_tags)(CPUARMState *env, uint64_t ptr, uint64_t val)
 }
 }
 
+static void mte_sync_check_fail(CPUARMState *env, uint32_t desc,
+uint64_t dirty_ptr, uintptr_t ra)
+{
+int is_write, syn;
+
+env->exception.vaddress = dirty_ptr;
+
+is_write = FIELD_EX32(desc, MTEDESC, WRITE);
+syn = syn_data_abort_no_iss(arm_current_el(env) != 0, 0, 0, 0, 0, is_write,
+0x11);
+raise_exception_ra(env, EXCP_DATA_ABORT, syn, exception_target_el(env), ra);
+g_assert_not_reached();
+}
+
+static void mte_async_check_fail(CPUARMState *env, uint32_t desc,
+ uint64_t dirty_ptr, uintptr_t ra,
+ ARMMMUIdx arm_mmu_idx, int el)
+{
+int select;
+
+if (regime_has_2_ranges(arm_mmu_idx)) {
+select = extract64(dirty_ptr, 55, 1);
+} else {
+select = 0;
+}
+env->cp15.tfsr_el[el] |= 1 << select;
+#ifdef CONFIG_USER_ONLY
+/*
+ * Stand in for a timer irq, setting _TIF_MTE_ASYNC_FAULT,
+ * which then sends a SIGSEGV when the thread is next scheduled.
+ * This cpu will return to the main loop at the end of the TB,
+ * which is rather sooner than "normal".  But the alternative
+ * is waiting until the next syscall.
+ */
+qemu_cpu_kick(env_cpu(env));
+#endif
+}
+
 /* Record a tag check failure.  */
 static void mte_check_fail(CPUARMState *env, uint32_t desc,
uint64_t dirty_ptr, uintptr_t ra)
 {
 int mmu_idx = FIELD_EX32(desc, MTEDESC, MIDX);
 ARMMMUIdx arm_mmu_idx = core_to_aa64_mmu_idx(mmu_idx);
-int el, reg_el, tcf, select, is_write, syn;
+int el, reg_el, tcf;
 uint64_t sctlr;
 
 reg_el = regime_el(env, arm_mmu_idx);
@@ -564,14 +602,8 @@ static void mte_check_fail(CPUARMState *env, uint32_t desc,
 switch (tcf) {
 case 1:
 /* Tag check fail causes a synchronous exception. */
-env->exception.vaddress = dirty_ptr;
-
-is_write = FIELD_EX32(desc, MTEDESC, WRITE);
-syn = syn_data_abort_no_iss(arm_current_el(env) != 0, 0, 0, 0, 0,
-is_write, 0x11);
-raise_exception_ra(env, EXCP_DATA_ABORT, syn,
-   exception_target_el(env), ra);
-/* noreturn, but fall through to the assert anyway */
+mte_sync_check_fail(env, desc, dirty_ptr, ra);
+break;
 
 case 0:
 /*
@@ -583,30 +615,19 @@ static void mte_check_fail(CPUARMState *env, uint32_t desc,
 
 case 2:
 /* Tag check fail causes asynchronous flag set.  */
-if (regime_has_2_ranges(arm_mmu_idx)) {
-select = extract64(dirty_ptr, 55, 1);
-} else {
-select = 0;
-}
-env->cp15.tfsr_el[el] |= 1 << select;
-#ifdef CONFIG_USER_ONLY
-/*
- * Stand in for a timer irq, setting _TIF_MTE_ASYNC_FAULT,
- * which then sends a SIGSEGV when the thread is next scheduled.
- * This cpu will return to the main loop at the end of the TB,
- * which is rather sooner than "normal".  But the alternative
- * is waiting until the next syscall.
- */
-qemu_cpu_kick(env_cpu(env));
-#endif
+mte_async_check_fail(env, desc, dirty_ptr, ra, arm_mmu_idx, el);
 break;
 
-default:
-/* Case 3: Reserved. */
-qemu_log_mask(LOG_GUEST_ERROR,
-  "Tag check failure with SCTLR_EL%d.TCF%s "
-  "set to reserved value %d\n",
-  reg_el, el ? "" : "0", tcf);
+case 3:
+/*
+ * Tag check fail causes asynchronous flag set for stores, or
+ * a synchronous exception for loads.
+ */
+if (FIELD_EX32(desc, MTEDESC, WRITE)) {
+mte_async_check_fail(env, desc, dirty_ptr, ra, arm_mmu_idx, el);
+} else {
+

[PATCH v2 2/1] qemu-img: Add "backing":true to unallocated map segments

2021-06-11 Thread Eric Blake

To save the user from having to check 'qemu-img info --backing-chain'
or another follow-up command to determine which "depth":n goes beyond
the chain, add a boolean field "backing" that is set only for
unallocated portions of the disk.
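The sketch below shows the intended JSON shape, assuming a simplified, hypothetical `MapEntrySketch` struct in place of the QAPI-generated `MapEntry`: `"backing": true` is printed only for ranges that no file in the chain provides and is omitted (never printed as false) everywhere else, with the same invariant the patch asserts, namely that such a range has a depth greater than zero. The real output also prints `offset` and `filename` when present; the sketch leaves them out.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the QAPI MapEntry struct. */
typedef struct {
    int64_t start, length, depth;
    bool zero, data;
    bool backing; /* true only if no file in the chain provides the data */
} MapEntrySketch;

/* Returns the formatted length, or -1 if buf is too small. */
static int format_map_entry(char *buf, size_t len, const MapEntrySketch *e)
{
    /* Only ranges beyond the end of the chain may carry "backing". */
    assert(!e->backing || e->depth > 0);
    int n = snprintf(buf, len,
                     "{ \"start\": %" PRId64 ", \"length\": %" PRId64
                     ", \"depth\": %" PRId64 "%s, \"zero\": %s, \"data\": %s }",
                     e->start, e->length, e->depth,
                     e->backing ? ", \"backing\": true" : "",
                     e->zero ? "true" : "false",
                     e->data ? "true" : "false");
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

Omitting the field rather than printing `"backing": false` keeps the output unchanged for every allocated range, which limits churn for existing consumers.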

Signed-off-by: Eric Blake 
---

Touches the same iotest output as 1/1.  If we decide that switching to
"depth":n+1 is too risky, and that the mere addition of "backing":true
while keeping "depth":n is good enough, then we'd have just one patch,
instead of this double churn.  Preferences?

 docs/tools/qemu-img.rst|  3 ++
 qapi/block-core.json   |  7 ++-
 qemu-img.c | 15 +-
 tests/qemu-iotests/122.out | 34 +++---
 tests/qemu-iotests/154.out | 96 +++---
 tests/qemu-iotests/179.out | 66 +-
 tests/qemu-iotests/223.out | 24 +-
 tests/qemu-iotests/244.out |  6 +--
 tests/qemu-iotests/252.out |  4 +-
 tests/qemu-iotests/274.out | 16 +++
 tests/qemu-iotests/291.out |  8 ++--
 tests/qemu-iotests/309.out |  4 +-
 12 files changed, 150 insertions(+), 133 deletions(-)

diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
index c155b1bf3cc8..fbc623b645c3 100644
--- a/docs/tools/qemu-img.rst
+++ b/docs/tools/qemu-img.rst
@@ -601,6 +601,9 @@ Command description:
 a ``depth``; for example, a depth of 2 refers to the backing file
 of the backing file of *FILENAME*.  Depth will be one larger than
 the chain length if no file in the chain provides the data.
+  - an optional ``backing`` field is present with value true if no
+file in the backing chain provides the data (making it easier to
+identify when ``depth`` exceeds the chain length).

   In JSON format, the ``offset`` field is optional; it is absent in
   cases where ``human`` format would omit the entry or exit with an error.
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 2ea294129e08..cebe12ba16a0 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -264,6 +264,9 @@
 # @offset: if present, the image file stores the data for this range
 #  in raw format at the given (host) offset
 #
+# @backing: if present, the range is not allocated within the backing
+#   chain (since 6.1)
+#
 # @filename: filename that is referred to by @offset
 #
 # Since: 2.6
@@ -271,8 +274,8 @@
 ##
 { 'struct': 'MapEntry',
   'data': {'start': 'int', 'length': 'int', 'data': 'bool',
-   'zero': 'bool', 'depth': 'int', '*offset': 'int',
-   '*filename': 'str' } }
+   'zero': 'bool', 'depth': 'int', '*backing': 'bool',
+   '*offset': 'int', '*filename': 'str' } }

 ##
 # @BlockdevCacheInfo:
diff --git a/qemu-img.c b/qemu-img.c
index 33a5cd012b8b..4d357f534803 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -2977,8 +2977,13 @@ static int dump_map_entry(OutputFormat output_format, MapEntry *e,
 break;
 case OFORMAT_JSON:
 printf("{ \"start\": %"PRId64", \"length\": %"PRId64","
-   " \"depth\": %"PRId64", \"zero\": %s, \"data\": %s",
-   e->start, e->length, e->depth,
+   " \"depth\": %"PRId64, e->start, e->length, e->depth);
+if (e->has_backing) {
+/* Backing should only be set at the end of the chain */
+assert(e->backing && e->depth > 0);
+printf(", \"backing\": true");
+}
+printf(", \"zero\": %s, \"data\": %s",
e->zero ? "true" : "false",
e->data ? "true" : "false");
 if (e->has_offset) {
@@ -2999,6 +3004,7 @@ static int get_block_status(BlockDriverState *bs, int64_t offset,
 {
 int ret;
 int depth;
+bool backing = false;
 BlockDriverState *file;
 bool has_offset;
 int64_t map;
@@ -3037,6 +3043,7 @@ static int get_block_status(BlockDriverState *bs, int64_t offset,
 }
 if (!(ret & BDRV_BLOCK_ALLOCATED)) {
 depth++;
+backing = true;
 }

 *e = (MapEntry) {
@@ -3047,6 +3054,8 @@ static int get_block_status(BlockDriverState *bs, int64_t offset,
 .offset = map,
 .has_offset = has_offset,
 .depth = depth,
+.has_backing = backing,
+.backing = backing,
 .has_filename = filename,
 .filename = filename,
 };
@@ -3072,6 +3081,8 @@ static inline bool entry_mergeable(const MapEntry *curr, const MapEntry *next)
 if (curr->has_offset && curr->offset + curr->length != next->offset) {
 return false;
 }
+/* backing should only ever be set for identical depth */
+assert(curr->backing == next->backing);
 return true;
 }

diff --git a/tests/qemu-iotests/122.out b/tests/qemu-iotests/122.out
index 779dab4847f0..c5aa2c9866f1 100644
--- a/tests/qemu-iotests/122.out
+++ b/tests/qemu-iotests/122.out
@@ -68,11 +68,11 @@ read 65536/65536 bytes at offset 4194304
 read 65536/65536 bytes at offset 8388608
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 [{ "start": 0, "length": 65536,

[PATCH] docs: Add '-device intel-iommu' entry

2021-06-11 Thread Peter Xu

The parameters of the intel-iommu device are non-trivial to understand.  Add
an entry for it so that users can refer to it when configuring the device.

There are actually a few more options, but I deliberately leave them out
because normal QEMU users should not need them.

Cc: Chao Yang 
Cc: Lei Yang 
Cc: Jing Zhao 
Cc: Jason Wang 
Cc: Michael S. Tsirkin 
Cc: Alex Williamson 
Signed-off-by: Peter Xu 
---
 qemu-options.hx | 32 
 1 file changed, 32 insertions(+)

diff --git a/qemu-options.hx b/qemu-options.hx
index 14258784b3a..4bb04243907 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -926,6 +926,38 @@ SRST
 
 ``-device pci-ipmi-bt,bmc=id``
 Like the KCS interface, but defines a BT interface on the PCI bus.
+
+``-device intel-iommu[,option=...]``
+This is only supported by ``-machine q35`` and enables Intel VT-d
+emulation within the guest.  It supports the following options:
+
+``intremap=on|off`` (default: auto)
+This enables the interrupt remapping feature in the guest.  It is
+required for full x2APIC support.  Currently it only supports the KVM
+kernel-irqchip modes ``off`` and ``split``; a full kernel-irqchip is
+not yet supported.
+
+``caching-mode=on|off`` (default: off)
+This enables caching mode for the emulated VT-d device.  When
+caching-mode is enabled, each guest DMA buffer mapping will generate an
+IOTLB invalidation from the guest IOMMU driver to the vIOMMU device in
+a synchronous way.  It is required for ``-device vfio-pci`` to work
+with the VT-d device, because host-assigned devices require the DMA
+mapping to be set up on the host before guest DMA starts.
+
+``device-iotlb=on|off`` (default: off)
+This enables the device-iotlb capability for the emulated VT-d device.
+So far virtio/vhost should be the only real user of this parameter,
+paired with ``ats=on`` configured for the device.
+
+``aw-bits=39|48`` (default: 39)
+This sets the address width of the IOVA address space.  The address
+space is 39 bits wide with 3-level IOMMU page tables, and 48 bits wide
+with 4-level IOMMU page tables.
+
+Please also refer to the wiki page for general scenarios of VT-d
+emulation in QEMU: https://wiki.qemu.org/Features/VT-d.
+
 ERST
 
 DEF("name", HAS_ARG, QEMU_OPTION_name,
-- 
2.31.1
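The 39- and 48-bit widths in the ``aw-bits`` entry follow from the VT-d page-table geometry, assuming 4 KiB pages (12 offset bits) and 512 eight-byte entries per table, so each level translates 9 bits. The helper names below are hypothetical; the sketch only does the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_4K  12u /* 4 KiB pages: 12 page-offset bits */
#define BITS_PER_LEVEL  9u /* 512 eight-byte entries per 4 KiB table */

/* IOVA width covered by a page-table walk with the given depth. */
static unsigned iova_width_for_levels(unsigned levels)
{
    return PAGE_SHIFT_4K + BITS_PER_LEVEL * levels;
}

/* Size of the resulting IOVA space in bytes. */
static uint64_t iova_space_bytes(unsigned levels)
{
    unsigned w = iova_width_for_levels(levels);

    assert(w < 64); /* keep the shift well-defined */
    return UINT64_C(1) << w;
}
```

So aw-bits=39 corresponds to a 512 GiB IOVA space and aw-bits=48 to 256 TiB.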

[PATCH v4 5/6] ACPI ERST: qtest for ERST

2021-06-11 Thread Eric DeVolder

This change provides a qtest that locates and then does a simple
interrogation of the ERST feature within the guest.
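The register protocol this test exercises can be modelled in a few lines. This is a hypothetical in-memory sketch, not the device code: it assumes the two-register layout defined by ERST_CSR_ACTION and ERST_CSR_VALUE in patch 3/6 (ACTION at offset 0, VALUE at offset 8, 16 bytes in total) and that the record buffer starts immediately after the registers, which is what the test's `base + 16` assertion checks. The base address used in the check is arbitrary.

```c
#include <assert.h>
#include <stdint.h>

#define CSR_ACTION 0u  /* mirrors ERST_CSR_ACTION (0 << 3) */
#define CSR_VALUE  8u  /* mirrors ERST_CSR_VALUE  (1 << 3) */
#define REG_LEN    16u /* two 64-bit registers */
#define ACTION_GET_ERROR_LOG_ADDRESS_RANGE 0xDu

/* Hypothetical model of the ERST ACTION/VALUE register pair. */
typedef struct {
    uint64_t base;  /* guest-physical address of the register block */
    uint64_t value; /* current contents of the VALUE register */
} ErstRegsModel;

/* Guest write: only the ACTION register is writable in this model. */
static void model_write(ErstRegsModel *m, uint64_t offset, uint64_t action)
{
    assert(offset == CSR_ACTION);
    if (action == ACTION_GET_ERROR_LOG_ADDRESS_RANGE) {
        /* Assumed: the record buffer follows the registers directly. */
        m->value = m->base + REG_LEN;
    }
}

/* Guest read: only the VALUE register is readable in this model. */
static uint64_t model_read(const ErstRegsModel *m, uint64_t offset)
{
    assert(offset == CSR_VALUE);
    return m->value;
}
```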

Signed-off-by: Eric DeVolder 
---
 tests/qtest/erst-test.c | 109 
 tests/qtest/meson.build |   2 +
 2 files changed, 111 insertions(+)
 create mode 100644 tests/qtest/erst-test.c

diff --git a/tests/qtest/erst-test.c b/tests/qtest/erst-test.c
new file mode 100644
index 000..7d97d62
--- /dev/null
+++ b/tests/qtest/erst-test.c
@@ -0,0 +1,109 @@
+/*
+ * QTest testcase for ACPI ERST
+ *
+ * Copyright (c) 2021 Oracle
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/bitmap.h"
+#include "qemu/uuid.h"
+#include "hw/acpi/acpi-defs.h"
+#include "boot-sector.h"
+#include "acpi-utils.h"
+#include "libqos/libqtest.h"
+#include "qapi/qmp/qdict.h"
+
+#define RSDP_ADDR_INVALID 0x10 /* RSDP must be below this address */
+
+static uint64_t acpi_find_erst(QTestState *qts)
+{
+uint32_t rsdp_offset;
+uint8_t rsdp_table[36 /* ACPI 2.0+ RSDP size */];
+uint32_t rsdt_len, table_length;
+uint8_t *rsdt, *ent;
+uint64_t base = 0;
+
+/* Wait for guest firmware to finish and start the payload. */
+boot_sector_test(qts);
+
+/* Tables should be initialized now. */
+rsdp_offset = acpi_find_rsdp_address(qts);
+
+g_assert_cmphex(rsdp_offset, <, RSDP_ADDR_INVALID);
+
+acpi_fetch_rsdp_table(qts, rsdp_offset, rsdp_table);
+acpi_fetch_table(qts, &rsdt, &rsdt_len, &rsdp_table[16 /* RsdtAddress */],
+ 4, "RSDT", true);
+
+ACPI_FOREACH_RSDT_ENTRY(rsdt, rsdt_len, ent, 4 /* Entry size */) {
+uint8_t *table_aml;
+acpi_fetch_table(qts, &table_aml, &table_length, ent, 4, NULL, true);
+if (!memcmp(table_aml + 0 /* Header Signature */, "ERST", 4)) {
+/*
+ * Picking up ERST base address from the Register Region
+ * specified as part of the first Serialization Instruction
+ * Action (which is a Begin Write Operation).
+ */
+memcpy(&base, &table_aml[56], sizeof(base));
+g_free(table_aml);
+break;
+}
+g_free(table_aml);
+}
+g_free(rsdt);
+return base;
+}
+
+static char disk[] = "tests/erst-test-disk-XX";
+
+#define ERST_CMD()  \
+"-accel kvm -accel tcg "\
+"-object memory-backend-file," \
+  "id=erstram,mem-path=tests/acpi-erst-XX,size=0x1,share=on " \
+"-device acpi-erst,memdev=erstram,bus=pci.0 " \
+"-drive id=hd0,if=none,file=%s,format=raw " \
+"-device ide-hd,drive=hd0 ", disk
+
+static void erst_get_error_log_address_range(void)
+{
+QTestState *qts;
+uint64_t log_address_range = 0;
+
+qts = qtest_initf(ERST_CMD());
+
+uint64_t base = acpi_find_erst(qts);
+g_assert(base != 0);
+
+/* Issue GET_ERROR_LOG_ADDRESS_RANGE command */
+qtest_writel(qts, base + 0, 0xD);
+/* Read GET_ERROR_LOG_ADDRESS_RANGE result */
+log_address_range = qtest_readq(qts, base + 8);
+
+/* Check addr_range is offset of base */
+g_assert((base + 16) == log_address_range);
+
+qtest_quit(qts);
+}
+
+int main(int argc, char **argv)
+{
+int ret;
+
+ret = boot_sector_init(disk);
+if (ret) {
+return ret;
+}
+
+g_test_init(&argc, &argv, NULL);
+
+qtest_add_func("/erst/get-error-log-address-range",
+   erst_get_error_log_address_range);
+
+ret = g_test_run();
+boot_sector_cleanup(disk);
+
+return ret;
+}
diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index 0c76738..deae443 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -66,6 +66,7 @@ qtests_i386 = \
  (config_all_devices.has_key('CONFIG_RTL8139_PCI') ? ['rtl8139-test'] : []) + \
  (config_all_devices.has_key('CONFIG_E1000E_PCI_EXPRESS') ? ['fuzz-e1000e-test'] : []) + \
  (config_all_devices.has_key('CONFIG_ESP_PCI') ? ['am53c974-test'] : []) + \
+  (config_all_devices.has_key('CONFIG_ACPI') ? ['erst-test'] : []) + \
  qtests_pci + \
   ['fdc-test',
'ide-test',
@@ -237,6 +238,7 @@ qtests = {
   'bios-tables-test': [io, 'boot-sector.c', 'acpi-utils.c', 'tpm-emu.c'],
   'cdrom-test': files('boot-sector.c'),
   'dbus-vmstate-test': files('migration-helpers.c') + dbus_vmstate1,
+  'erst-test': files('erst-test.c', 'boot-sector.c', 'acpi-utils.c'),
   'ivshmem-test': [rt, '../../contrib/ivshmem-server/ivshmem-server.c'],
   'migration-test': files('migration-helpers.c'),
   'pxe-test': files('boot-sector.c'),
-- 
1.8.3.1

[PATCH v4 4/6] ACPI ERST: create ACPI ERST table for pc/x86 machines.

2021-06-11 Thread Eric DeVolder

This change exposes ACPI ERST support for x86 guests.

Signed-off-by: Eric DeVolder 
---
 hw/i386/acpi-build.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index de98750..d8cae69 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -43,6 +43,7 @@
 #include "sysemu/tpm.h"
 #include "hw/acpi/tpm.h"
 #include "hw/acpi/vmgenid.h"
+#include "hw/acpi/erst.h"
 #include "hw/boards.h"
 #include "sysemu/tpm_backend.h"
 #include "hw/rtc/mc146818rtc_regs.h"
@@ -2388,6 +2389,10 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
 ACPI_DEVICE_IF(x86ms->acpi_dev), x86ms->oem_id,
 x86ms->oem_table_id);
 
+acpi_add_table(table_offsets, tables_blob);
+build_erst(tables_blob, tables->linker,
+   x86ms->oem_id, x86ms->oem_table_id);
+
 vmgenid_dev = find_vmgenid_dev();
 if (vmgenid_dev) {
 acpi_add_table(table_offsets, tables_blob);
-- 
1.8.3.1

[PATCH v4 3/6] ACPI ERST: support for ACPI ERST feature

2021-06-11 Thread Eric DeVolder

This change implements the support for the ACPI ERST feature[1,2].

To utilize ACPI ERST, a memory-backend-file object and acpi-erst
device must be created, for example:

 qemu ...
 -object memory-backend-file,id=erstram,mem-path=acpi-erst.backing,
  size=0x1,share=on
 -device acpi-erst,memdev=erstram,bus=pcie.0

For proper operation, the ACPI ERST device needs a memory-backend-file
object with the following parameters: mem-path, size, and share.

 - id: The id of the memory-backend-file object is used to associate
   this memory with the acpi-erst device.

 - size: The size of the ACPI ERST backing storage. This parameter is
   required.

 - mem-path: The location of the ACPI ERST backing storage file. This
   parameter is also required.

 - share: The share=on parameter is required so that updates to the
   ERST backing store are written through to the backing file. Without
   it, the file is mapped privately, so updates are not written back
   to it and do not persist (e.g. across a QEMU crash or restart).

The ACPI ERST device is a simple PCI device and requires these two
parameters:

 - memdev: The object id of the memory-backend-file.

 - bus: The name of the PCI bus to connect it to.

This change also includes erst.c in the build of general ACPI support.

[1] "Advanced Configuration and Power Interface Specification",
version 6.2, May 2017.
https://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf

[2] "Unified Extensible Firmware Interface Specification",
version 2.8, March 2019.
https://uefi.org/sites/default/files/resources/UEFI_Spec_2_8_final.pdf
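The CPER constants in erst.c (signature "CPER", record length at offset 20, record ID at offset 96, 128-byte minimum) follow the UEFI CPER record header layout [2]. The `THE_UEFI_CPER_RECORD_ID()` macro reads the ID through a `uint64_t *` cast, which relies on a suitably aligned buffer and a little-endian host. A minimal sketch of a portable alternative, assuming the same offsets, decodes the fields byte by byte; the function names are hypothetical, and the length check is an assumption of this sketch rather than something the patch is known to do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CPER_MIN_SIZE         128u /* size of the CPER record header */
#define CPER_LENGTH_OFFSET     20u /* 32-bit Record Length field */
#define CPER_RECORD_ID_OFFSET  96u /* 64-bit Record ID field */

/* Decode an n-byte little-endian integer without alignment assumptions. */
static uint64_t load_le(const uint8_t *p, unsigned n)
{
    uint64_t v = 0;

    assert(n <= 8);
    for (unsigned i = 0; i < n; i++) {
        v |= (uint64_t)p[i] << (8 * i);
    }
    return v;
}

/* Return true and store the Record ID if buf holds a plausible CPER record. */
static bool cper_record_id(const uint8_t *buf, size_t len, uint64_t *id)
{
    uint64_t rec_len;

    if (len < CPER_MIN_SIZE || memcmp(buf, "CPER", 4) != 0) {
        return false;
    }
    rec_len = load_le(buf + CPER_LENGTH_OFFSET, 4);
    if (rec_len < CPER_MIN_SIZE || rec_len > len) {
        return false;
    }
    *id = load_le(buf + CPER_RECORD_ID_OFFSET, 8);
    return true;
}
```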

Signed-off-by: Eric DeVolder 
---
 hw/acpi/erst.c  | 880 
 hw/acpi/meson.build |   1 +
 2 files changed, 881 insertions(+)
 create mode 100644 hw/acpi/erst.c

diff --git a/hw/acpi/erst.c b/hw/acpi/erst.c
new file mode 100644
index 000..1a72fad
--- /dev/null
+++ b/hw/acpi/erst.c
@@ -0,0 +1,880 @@
+/*
+ * ACPI Error Record Serialization Table, ERST, Implementation
+ *
+ * Copyright (c) 2021 Oracle and/or its affiliates.
+ *
+ * See ACPI specification,
+ * "ACPI Platform Error Interfaces" : "Error Serialization"
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation;
+ * version 2 of the License.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see 
+ */
+
+#include 
+#include 
+#include 
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "hw/qdev-core.h"
+#include "exec/memory.h"
+#include "qom/object.h"
+#include "hw/pci/pci.h"
+#include "qom/object_interfaces.h"
+#include "qemu/error-report.h"
+#include "migration/vmstate.h"
+#include "hw/qdev-properties.h"
+#include "hw/acpi/acpi.h"
+#include "hw/acpi/acpi-defs.h"
+#include "hw/acpi/aml-build.h"
+#include "hw/acpi/bios-linker-loader.h"
+#include "exec/address-spaces.h"
+#include "sysemu/hostmem.h"
+#include "hw/acpi/erst.h"
+
+#ifdef _ERST_DEBUG
+#define erst_debug(fmt, ...) \
+do { fprintf(stderr, fmt, ## __VA_ARGS__); fflush(stderr); } while (0)
+#else
+#define erst_debug(fmt, ...) do { } while (0)
+#endif
+
+/* See UEFI spec, Appendix N Common Platform Error Record */
+/* UEFI CPER allows for an OSPM book keeping area in the record */
+#define UEFI_CPER_RECORD_MIN_SIZE 128U
+#define UEFI_CPER_SIZE_OFFSET 20U
+#define UEFI_CPER_RECORD_ID_OFFSET 96U
+#define IS_UEFI_CPER_RECORD(ptr) \
+(((ptr)[0] == 'C') && \
+ ((ptr)[1] == 'P') && \
+ ((ptr)[2] == 'E') && \
+ ((ptr)[3] == 'R'))
+#define THE_UEFI_CPER_RECORD_ID(ptr) \
+(*(uint64_t *)(&(ptr)[UEFI_CPER_RECORD_ID_OFFSET]))
+
+#define ERST_INVALID_RECORD_ID (~0UL)
+#define ERST_EXECUTE_OPERATION_MAGIC 0x9CUL
+#define ERST_CSR_ACTION (0UL << 3) /* action (cmd) */
+#define ERST_CSR_VALUE  (1UL << 3) /* argument/value (data) */
+
+/*
+ * As ERST_IOMEM_SIZE is used to map the ERST into the guest,
+ * it should/must be an integer multiple of PAGE_SIZE.
+ * NOTE that any change to this value will make any pre-
+ * existing backing files, not of the same ERST_IOMEM_SIZE,
+ * unusable to the guest.
+ */
+#define ERST_IOMEM_SIZE (2UL * 4096)
+
+/*
+ * This implementation is an ACTION (cmd) and VALUE (data)
+ * interface consisting of just two 64-bit registers.
+ */
+#define ERST_REG_LEN (2UL * sizeof(uint64_t))
+
+/*
+ * The space not utilized by the register interface is the
+ * buffer for exchanging ERST record contents.
+ */
+#define ERST_RECORD_SIZE (ERST_IOMEM_SIZE - ERST_REG_LEN)
+
+/*
+ * Mode to be used for backing file
+ */
+#define ACPIERST(obj) \
+

[PATCH v4 1/6] ACPI ERST: bios-tables-test.c steps 1 and 2

2021-06-11 Thread Eric DeVolder

Following the guidelines in tests/qtest/bios-tables-test.c, this
change adds empty placeholder files for the new ERST table (step 1)
and lists the resulting changed files in bios-tables-test-allowed-diff.h
so that their differences are ignored (step 2).

Signed-off-by: Eric DeVolder 
---
 tests/data/acpi/microvm/ERST| 0
 tests/data/acpi/pc/ERST | 0
 tests/data/acpi/q35/ERST| 0
 tests/qtest/bios-tables-test-allowed-diff.h | 4 
 4 files changed, 4 insertions(+)
 create mode 100644 tests/data/acpi/microvm/ERST
 create mode 100644 tests/data/acpi/pc/ERST
 create mode 100644 tests/data/acpi/q35/ERST

diff --git a/tests/data/acpi/microvm/ERST b/tests/data/acpi/microvm/ERST
new file mode 100644
index 000..e69de29
diff --git a/tests/data/acpi/pc/ERST b/tests/data/acpi/pc/ERST
new file mode 100644
index 000..e69de29
diff --git a/tests/data/acpi/q35/ERST b/tests/data/acpi/q35/ERST
new file mode 100644
index 000..e69de29
diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h
index dfb8523..e004c71 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1 +1,5 @@
 /* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/pc/ERST",
+"tests/data/acpi/q35/ERST",
+"tests/data/acpi/microvm/ERST",
+
-- 
1.8.3.1

[PATCH v4 6/6] ACPI ERST: step 6 of bios-tables-test.c

2021-06-11 Thread Eric DeVolder

Following the guidelines in tests/qtest/bios-tables-test.c, this
is step 6, the re-generated ACPI tables binary blobs.

Signed-off-by: Eric DeVolder 
---
 tests/data/acpi/pc/ERST | Bin 0 -> 976 bytes
 tests/data/acpi/q35/ERST| Bin 0 -> 976 bytes
 tests/qtest/bios-tables-test-allowed-diff.h |   4 
 3 files changed, 4 deletions(-)

diff --git a/tests/data/acpi/pc/ERST b/tests/data/acpi/pc/ERST
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..7236018951f9d111d8cacaa93ee07a8dc3294f18 100644
GIT binary patch
literal 976
zcmaKqSq_3Q6h#Y^dE9^rOK=GWV{BUvZ#VzQD#NN_})srw9ZqJfYH@yl5T!0*@ExA}PsmTmxA~y7^f
z`=C;Mzb#Jlr!;>?JY$ah@BPi%Ksot29-5NhSucQ
c)srw9ZqJfYH@yl5T!0*@ExA}PsmTmxA~y7^f
z`=C;Mzb#Jlr!;>?JY$ah@BPi%Ksot29-5NhSucQ
c

Re: [PATCH] qemu-{img,nbd}: Don't report zeroed cluster as a hole

2021-06-11 Thread Eric Blake

On Fri, Jun 11, 2021 at 08:35:01PM +0300, Nir Soffer wrote:
> On Fri, Jun 11, 2021 at 4:28 PM Eric Blake  wrote:
> >
> > On Fri, Jun 11, 2021 at 10:09:09AM +0200, Kevin Wolf wrote:
> > > > Yes, that might work as well.  But we didn't previously document
> > > > depth to be optional.  Removing something from output risks breaking
> > > > more downstream tools that expect it to be non-optional, compared to
> > > > providing a new value.
> > >
> > > A negative value isn't any less unexpected than a missing key. I don't
> > > think any existing tool would be able to handle it. Encoding different
> > > meanings in a single value isn't very QAPI-like either. Usually strings
> > > that are parsed are the problem, but negative integers really isn't that
> > > much different. I don't really like this solution.
> > >
> > > Leaving out the depth feels like a better suggestion to me.
> > >
> > > But anyway, this seems to only happen at the end of the backing chain.
> > > So if the backing chain consists of n images, why not report 'depth':
> > > n + 1? So, in the above example, you would get 1. I think this has the
> > > best chances of tools actually working correctly with the new output,
> > > even though it's still not unlikely to break something.
> >
> > Ooh, I like that.  It is closer to reality - the file data really
> > comes from the next depth, even if we have no filename at that depth.
> > v2 of my patch coming up.
> 
> How do you know the number of the layer? this info is not presented in
> qemu-img map output.

qemu-img map has two output formats.

In --output=human, areas of the disk reading as zero are elided (and
this happens to include ALL areas that were not allocated anywhere in
the chain); all other areas list the filename of the element in the
chain where the data was found.  This mode also fails if compression
or encryption prevents easy access to actual data.  In other words,
it's fragile, so no one uses it for anything programmatic, even though
it's the default.

In --output=json, no file names are output.  Instead, "depth":N tells
you how deep in the backing chain you must traverse to find the data.
"depth":0 is obvious: the file you mapped (other than the bug that
this patch is fixing where we mistakenly used "depth":0 also for
unallocated regions).  If you use "backing":null to force a 1-layer
depth, then "depth":1 is unambiguous meaning the (non-present) backing
file.

Otherwise, you do have a point: "depth":1 in isolation is ambiguous
between "not allocated anywhere in this 1-element chain" and
"allocated at the first backing file in this chain of length 2 or
more".  At which point you can indeed use "qemu-img info" to determine
the backing chain depth.  How painful is that extra step?  Does it
justify the addition of a new optional "backing":true to any portion
of the file that was beyond the end of the chain (and omit that line
for all other regions, rather than printing "backing":false)?
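The ambiguity described above can be made concrete. Assuming a chain of `chain_len` images, counting the mapped image itself as depth 0, a region is provided by some image when its depth is below `chain_len` and is unallocated everywhere when its depth equals `chain_len`. Without `chain_len` (from `qemu-img info --backing-chain`), "depth":1 cannot be classified; a `"backing":true` flag removes the need for it. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    MAP_FROM_IMAGE,   /* data comes from the image at this depth */
    MAP_BEYOND_CHAIN, /* no image in the chain provides the data */
} MapDepthMeaning;

/* chain_len counts every image, including the one being mapped (depth 0). */
static MapDepthMeaning classify_depth(int depth, int chain_len)
{
    assert(chain_len >= 1 && depth >= 0 && depth <= chain_len);
    return depth < chain_len ? MAP_FROM_IMAGE : MAP_BEYOND_CHAIN;
}

/* With a "backing" flag in the output, the chain length is not needed. */
static MapDepthMeaning classify_with_flag(bool has_backing_flag)
{
    return has_backing_flag ? MAP_BEYOND_CHAIN : MAP_FROM_IMAGE;
}
```

For example, "depth":1 means "beyond the chain" for a 1-image chain but "first backing file" for a 2-image chain, which is exactly the case the extra field disambiguates.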

> 
> Users will have to run "qemu-img info --backing-chain" to understand the
> output of qemu-img map.

At any rate, it should be easy enough to output an additional field,
followup patch coming soon...

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

[PATCH v4 2/6] ACPI ERST: header file for ERST

2021-06-11 Thread Eric DeVolder

This change introduces the definitions for ACPI ERST.

Signed-off-by: Eric DeVolder 
---
 include/hw/acpi/erst.h | 79 ++
 1 file changed, 79 insertions(+)
 create mode 100644 include/hw/acpi/erst.h

diff --git a/include/hw/acpi/erst.h b/include/hw/acpi/erst.h
new file mode 100644
index 000..a18d58e
--- /dev/null
+++ b/include/hw/acpi/erst.h
@@ -0,0 +1,79 @@
+/*
+ * ACPI Error Record Serialization Table, ERST, Implementation
+ *
+ * Copyright (c) 2020 Oracle and/or its affiliates.
+ *
+ * See ACPI specification, "ACPI Platform Error Interfaces"
+ *  "Error Serialization"
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation;
+ * version 2 of the License.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see 
+ */
+#ifndef HW_ACPI_ERST_H
+#define HW_ACPI_ERST_H
+
+void build_erst(GArray *table_data, BIOSLinker *linker,
+const char *oem_id, const char *oem_table_id);
+
+#define TYPE_ACPI_ERST "acpi-erst"
+#define ACPI_ERST_MEMDEV_PROP "memdev"
+
+#define ACPI_ERST_ACTION_BEGIN_WRITE_OPERATION 0x0
+#define ACPI_ERST_ACTION_BEGIN_READ_OPERATION  0x1
+#define ACPI_ERST_ACTION_BEGIN_CLEAR_OPERATION 0x2
+#define ACPI_ERST_ACTION_END_OPERATION 0x3
+#define ACPI_ERST_ACTION_SET_RECORD_OFFSET 0x4
+#define ACPI_ERST_ACTION_EXECUTE_OPERATION 0x5
+#define ACPI_ERST_ACTION_CHECK_BUSY_STATUS 0x6
+#define ACPI_ERST_ACTION_GET_COMMAND_STATUS 0x7
+#define ACPI_ERST_ACTION_GET_RECORD_IDENTIFIER 0x8
+#define ACPI_ERST_ACTION_SET_RECORD_IDENTIFIER 0x9
+#define ACPI_ERST_ACTION_GET_RECORD_COUNT  0xA
+#define ACPI_ERST_ACTION_BEGIN_DUMMY_WRITE_OPERATION   0xB
+#define ACPI_ERST_ACTION_RESERVED  0xC
+#define ACPI_ERST_ACTION_GET_ERROR_LOG_ADDRESS_RANGE   0xD
+#define ACPI_ERST_ACTION_GET_ERROR_LOG_ADDRESS_LENGTH  0xE
+#define ACPI_ERST_ACTION_GET_ERROR_LOG_ADDRESS_RANGE_ATTRIBUTES 0xF
+#define ACPI_ERST_ACTION_GET_EXECUTE_OPERATION_TIMINGS 0x10
+#define ACPI_ERST_MAX_ACTIONS \
+(ACPI_ERST_ACTION_GET_EXECUTE_OPERATION_TIMINGS + 1)
+
+#define ACPI_ERST_STATUS_SUCCESS 0x00
+#define ACPI_ERST_STATUS_NOT_ENOUGH_SPACE   0x01
+#define ACPI_ERST_STATUS_HARDWARE_NOT_AVAILABLE 0x02
+#define ACPI_ERST_STATUS_FAILED 0x03
+#define ACPI_ERST_STATUS_RECORD_STORE_EMPTY 0x04
+#define ACPI_ERST_STATUS_RECORD_NOT_FOUND   0x05
+
+#define ACPI_ERST_INST_READ_REGISTER 0x00
+#define ACPI_ERST_INST_READ_REGISTER_VALUE   0x01
+#define ACPI_ERST_INST_WRITE_REGISTER 0x02
+#define ACPI_ERST_INST_WRITE_REGISTER_VALUE  0x03
+#define ACPI_ERST_INST_NOOP  0x04
+#define ACPI_ERST_INST_LOAD_VAR1 0x05
+#define ACPI_ERST_INST_LOAD_VAR2 0x06
+#define ACPI_ERST_INST_STORE_VAR1 0x07
+#define ACPI_ERST_INST_ADD   0x08
+#define ACPI_ERST_INST_SUBTRACT  0x09
+#define ACPI_ERST_INST_ADD_VALUE 0x0A
+#define ACPI_ERST_INST_SUBTRACT_VALUE 0x0B
+#define ACPI_ERST_INST_STALL 0x0C
+#define ACPI_ERST_INST_STALL_WHILE_TRUE  0x0D
+#define ACPI_ERST_INST_SKIP_NEXT_INSTRUCTION_IF_TRUE 0x0E
+#define ACPI_ERST_INST_GOTO  0x0F
+#define ACPI_ERST_INST_SET_SRC_ADDRESS_BASE  0x10
+#define ACPI_ERST_INST_SET_DST_ADDRESS_BASE  0x11
+#define ACPI_ERST_INST_MOVE_DATA 0x12
+
+#endif
+
-- 
1.8.3.1
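The header only defines numeric codes. Because the action codes are dense (0x0 to 0x10), ACPI_ERST_MAX_ACTIONS can size per-action tables. The following sketch shows that pattern, along with a status-to-name lookup. A few defines are copied locally so the block is self-contained. The table and both helper functions are hypothetical illustrations, not code from hw/acpi/erst.c.

```c
/* A few codes copied from the proposed include/hw/acpi/erst.h. */
#define ACPI_ERST_ACTION_BEGIN_WRITE_OPERATION         0x0
#define ACPI_ERST_ACTION_GET_EXECUTE_OPERATION_TIMINGS 0x10
#define ACPI_ERST_MAX_ACTIONS \
    (ACPI_ERST_ACTION_GET_EXECUTE_OPERATION_TIMINGS + 1)

#define ACPI_ERST_STATUS_SUCCESS                0x00
#define ACPI_ERST_STATUS_NOT_ENOUGH_SPACE       0x01
#define ACPI_ERST_STATUS_HARDWARE_NOT_AVAILABLE 0x02
#define ACPI_ERST_STATUS_FAILED                 0x03
#define ACPI_ERST_STATUS_RECORD_STORE_EMPTY     0x04
#define ACPI_ERST_STATUS_RECORD_NOT_FOUND       0x05

/* Hypothetical per-action table: one slot for each action code 0x0..0x10. */
static unsigned int erst_action_count[ACPI_ERST_MAX_ACTIONS];

/* Hypothetical helper: count an action, rejecting out-of-range codes. */
static int erst_note_action(unsigned int action)
{
    if (action >= ACPI_ERST_MAX_ACTIONS) {
        return -1;
    }
    erst_action_count[action]++;
    return 0;
}

/* Hypothetical helper: human-readable name for a command status. */
static const char *erst_status_name(unsigned int status)
{
    static const char *const names[] = {
        [ACPI_ERST_STATUS_SUCCESS]                = "SUCCESS",
        [ACPI_ERST_STATUS_NOT_ENOUGH_SPACE]       = "NOT_ENOUGH_SPACE",
        [ACPI_ERST_STATUS_HARDWARE_NOT_AVAILABLE] = "HARDWARE_NOT_AVAILABLE",
        [ACPI_ERST_STATUS_FAILED]                 = "FAILED",
        [ACPI_ERST_STATUS_RECORD_STORE_EMPTY]     = "RECORD_STORE_EMPTY",
        [ACPI_ERST_STATUS_RECORD_NOT_FOUND]       = "RECORD_NOT_FOUND",
    };

    if (status >= sizeof(names) / sizeof(names[0])) {
        return "UNKNOWN";
    }
    return names[status];
}
```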

[PATCH v4 0/6] acpi: Error Record Serialization Table, ERST, support for QEMU

2021-06-11 Thread Eric DeVolder

This patchset introduces support for the ACPI Error Record
Serialization Table, ERST.

Linux uses the persistent storage filesystem, pstore, to record
information (e.g. dmesg tail) upon panics and shutdowns.  Pstore is
independent of, and runs before, kdump.  In certain scenarios (e.g.
hosts/guests with root filesystems on NFS/iSCSI where networking
software and/or hardware fails), pstore may contain the only
information available for post-mortem debugging.

Two common storage backends for the pstore filesystem are ACPI ERST
and UEFI. Most BIOSes implement ACPI ERST; however, ACPI ERST is not
currently supported in QEMU, and UEFI is not utilized in all guests.
Implementing ACPI ERST within QEMU makes it a viable pstore storage
backend for virtual machines (as it already is for bare-metal
machines).

Enabling support for ACPI ERST facilitates a consistent method to
capture kernel panic information in a wide range of guests: from
resource-constrained microvms to very large guests, and in
particular, in direct-boot environments (which would lack UEFI
run-time services).

Note that Microsoft Windows also utilizes the ACPI ERST for certain
crash information, if available.

The ACPI ERST persistent storage is contained within a single backing
file. The size and location of the backing file are specified at QEMU
startup, when the ACPI ERST device is created.

The ACPI specification[1], in Chapter "ACPI Platform Error Interfaces
(APEI)", and specifically subsection "Error Serialization", outlines
a method for storing error records into persistent storage.

[1] "Advanced Configuration and Power Interface Specification",
version 6.2, May 2017.
https://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf

[2] "Unified Extensible Firmware Interface Specification",
version 2.8, March 2019.
https://uefi.org/sites/default/files/resources/UEFI_Spec_2_8_final.pdf

Suggested-by: Konrad Wilk 
Signed-off-by: Eric DeVolder 

---
v4: 11jun2021
 - Converted to a PCI device, per Igor.
 - Updated qtest.

v3: 28may2021
 - Converted to using a TYPE_MEMORY_BACKEND_FILE object rather than
   internal array with explicit file operations, per Igor.
 - Changed the way the qdev and base address are handled, allowing
   ERST to be disabled at run-time. Also aligns better with other
   existing code.

v2: 8feb2021
 - Added qtest/smoke test per Paolo Bonzini
 - Split patch into smaller chunks, per Igor Mammedov
 - Did away with use of ACPI packed structures, per Igor Mammedov

v1: 26oct2020
 - initial post

---
Eric DeVolder (6):
  ACPI ERST: bios-tables-test.c steps 1 and 2
  ACPI ERST: header file for ERST
  ACPI ERST: support for ACPI ERST feature
  ACPI ERST: create ACPI ERST table for pc/x86 machines.
  ACPI ERST: qtest for ERST
  ACPI ERST: step 6 of bios-tables-test.c

 hw/acpi/erst.c   | 880 +++
 hw/acpi/meson.build  |   1 +
 hw/i386/acpi-build.c |   5 +
 include/hw/acpi/erst.h   |  79 
 tests/data/acpi/microvm/ERST |   0
 tests/data/acpi/pc/ERST  | Bin 0 -> 976 bytes
 tests/data/acpi/q35/ERST | Bin 0 -> 976 bytes
 tests/qtest/erst-test.c  | 109 ++
 tests/qtest/meson.build  |   2 +
 9 files changed, 1076 insertions(+)
 create mode 100644 hw/acpi/erst.c
 create mode 100644 include/hw/acpi/erst.h
 create mode 100644 tests/data/acpi/microvm/ERST
 create mode 100644 tests/data/acpi/pc/ERST
 create mode 100644 tests/data/acpi/q35/ERST
 create mode 100644 tests/qtest/erst-test.c

-- 
1.8.3.1

Re: [PATCH v3 0/7] tests/acceptance: Handle tests with "cpu" tag

2021-06-11 Thread Wainer dos Santos Moschetta


Ping.

Only patches 06 and 07 did not get any review.

The series touches many files and was last rebased months ago, so I
will likely need to resolve rebase conflicts. But I would like to get
reviews of those patches before doing so.


Thanks!

- Wainer

On 4/30/21 10:34 AM, Wainer dos Santos Moschetta wrote:

Currently the acceptance tests tagged with "machine" have "-M TYPE"
automatically added to the list of arguments of the QEMUMachine object.
In other words, that option is passed to the launched QEMU. This series
implements the same feature for tests tagged with "cpu".

There is a caveat, however: if a test needs to pass additional arguments
to the CPU type, they cannot be passed via the tag, because the tags parser
splits values by comma (a limitation which Avocado plans to address, see
https://github.com/avocado-framework/avocado/issues/45410). For example, in
tests/acceptance/x86_cpu_model_versions.py, there are cases where:

   * -cpu is set to
     "Cascadelake-Server,x-force-features=on,check=off,enforce=off"
   * if it were tagged as
     "cpu:Cascadelake-Server,x-force-features=on,check=off,enforce=off",
     the parser would break it into 4 tags ("cpu:Cascadelake-Server",
     "x-force-features=on", "check=off", "enforce=off")
   * resulting in "-cpu Cascadelake-Server", with the remaining arguments
     ignored.
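The comma-splitting problem can be reproduced in a few lines. Avocado's tag parser is Python, so the following C function is only a hypothetical model of the behaviour described above. It splits a tag string on every ',' and keeps the pieces in place.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of a tag parser that splits on ','.
 * Stores up to max_tags pointers into buf (which is modified in place)
 * and returns how many tags were produced.
 */
static size_t split_tags(char *buf, char *tags[], size_t max_tags)
{
    size_t n = 0;
    char *p = buf;

    while (p != NULL && n < max_tags) {
        char *comma = strchr(p, ',');

        if (comma != NULL) {
            *comma = '\0';          /* terminate the current tag */
        }
        tags[n++] = p;
        p = comma != NULL ? comma + 1 : NULL;
    }
    return n;
}
```

Only tags[0] keeps the "cpu:" prefix, so a framework that looks for "cpu:VALUE" sees just "Cascadelake-Server". The three CPU properties become unrelated tags.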

The avocado_qemu.Test.set_vm_arg() method was introduced to deal with
cases like the example above: one can tag the test as "cpu:Cascadelake-Server"
AND call self.set_vm_arg('-cpu',
"Cascadelake-Server,x-force-features=on,check=off,enforce=off"),
which resets the initial value of -cpu.
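set_vm_arg() itself is Python. The following C sketch only models its replace-or-append behaviour on a plain argument array. If the option is already present and followed by a value, that value is overwritten. Otherwise the pair is appended. The sketch also includes the kind of bounds check that the v3 changelog's index-out-of-bounds fix is about. The function name and the fixed-capacity array are assumptions made for illustration.

```c
#include <string.h>

/*
 * Hypothetical model of set_vm_arg(): if arg occurs with a following
 * value, replace that value (first occurrence only); otherwise append
 * "arg value". Returns the new argc, or -1 if capacity is exceeded.
 * Only pointers are stored; the strings must outlive argv.
 */
static int vm_args_set(const char *argv[], int argc, int capacity,
                       const char *arg, const char *value)
{
    for (int i = 0; i < argc; i++) {
        /* i + 1 < argc guards against reading past the last element */
        if (strcmp(argv[i], arg) == 0 && i + 1 < argc) {
            argv[i + 1] = value;
            return argc;
        }
    }
    if (argc + 2 > capacity) {
        return -1;
    }
    argv[argc] = arg;
    argv[argc + 1] = value;
    return argc + 2;
}
```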

This series was tested on CI 
(https://gitlab.com/wainersm/qemu/-/pipelines/294640198)
and with the following code:

from avocado_qemu import Test

class CPUTest(Test):
    def test_cpu(self):
        """
        :avocado: tags=cpu:host
        """
        # The cpu property is set to the tag value, or None in its absence
        self.assertEqual(self.cpu, "host")
        # The created VM has the '-cpu host' option
        self.assertIn("-cpu host", " ".join(self.vm._args))
        self.vm.launch()

    def test_cpu_none(self):
        self.assertEqual(self.cpu, None)
        self.assertNotIn('-cpu', self.vm._args)

    def test_cpu_reset(self):
        """
        :avocado: tags=cpu:host
        """
        self.assertIn("-cpu host", " ".join(self.vm._args))
        self.set_vm_arg("-cpu", "Cascadelake-Server,x-force-features=on")
        self.assertNotIn("-cpu host", " ".join(self.vm._args))
        self.assertIn("-cpu Cascadelake-Server,x-force-features=on",
                      " ".join(self.vm._args))

Changes:
  - v2 -> v3:
    - The arg and value parameters of set_vm_arg() are now mandatory, and
      an index out of bounds bug was fixed [crosa]
    - Rebased. Needed to adapt the (new) boot_xen.py test (patch 03)
  - v1 -> v2:
    - Recognize the cpu value passed via test parameter [crosa]
    - Fixed tags (patch 02) in preparation for patch 03 [crosa]
    - Added QEMUMachine.args property (patch 04) so that _args could be handled
      without pylint complaining (protected property)
    - Added Test.set_vm_arg() (patch 05) to handle the corner case [crosa]

Wainer dos Santos Moschetta (7):
   tests/acceptance: Automatic set -cpu to the test vm
   tests/acceptance: Fix mismatch on cpu tagged tests
   tests/acceptance: Let the framework handle "cpu:VALUE" tagged tests
   tests/acceptance: Tagging tests with "cpu:VALUE"
   python/qemu: Add args property to the QEMUMachine class
   tests/acceptance: Add set_vm_arg() to the Test class
   tests/acceptance: Handle cpu tag on x86_cpu_model_versions tests

  docs/devel/testing.rst | 17 +
  python/qemu/machine.py |  5 +++
  tests/acceptance/avocado_qemu/__init__.py  | 26 ++
  tests/acceptance/boot_linux.py |  3 --
  tests/acceptance/boot_linux_console.py | 16 +
  tests/acceptance/boot_xen.py   |  1 -
  tests/acceptance/machine_mips_malta.py |  7 ++--
  tests/acceptance/pc_cpu_hotplug_props.py   |  2 +-
  tests/acceptance/replay_kernel.py  | 17 -
  tests/acceptance/reverse_debugging.py  |  2 +-
  tests/acceptance/tcg_plugins.py| 15 
  tests/acceptance/virtio-gpu.py |  4 +--
  tests/acceptance/x86_cpu_model_versions.py | 40 +-
  13 files changed, 112 insertions(+), 43 deletions(-)

Re: tb_flush() calls causing long Windows XP boot times

2021-06-11 Thread Alex Bennée

Paolo Bonzini  writes:

> On 11/06/21 17:01, Programmingkid wrote:
>> Hello Alex,
>> The good news is the source code to Windows XP is available
>> online: https://github.com/cryptoAlgorithm/nt5src
>
> It's leaked, so I doubt anybody who's paid to work on Linux or QEMU
> would touch that with a ten-foot pole.

Indeed.

Anyway what the OP could do is run QEMU with gdb and -d nochain and
stick a breakpoint (sic) in breakpoint_invalidate. Then each time it
hits you can examine the backtrace to cpu_loop_exec_tb and collect the
data from tb->pc. Then you will have a bunch of addresses in Windows
that keep triggering the behaviour. You can then re-run with -dfilter
and -d in_asm,cpu to get some sort of idea of what Windows is up to.

-- 
Alex Bennée

Re: [PATCH V3 12/22] vfio-pci: cpr part 1

2021-06-11 Thread Steven Sistare

On 5/24/2021 2:29 PM, Steven Sistare wrote:
> On 5/21/2021 6:24 PM, Alex Williamson wrote:
>> On Fri,  7 May 2021 05:25:10 -0700
>> Steve Sistare  wrote:
>>
>>>[...]
>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>> index 7a4fb6c..f7ac9f03 100644
>>> --- a/hw/vfio/pci.c
>>> +++ b/hw/vfio/pci.c
>>> @@ -29,6 +29,8 @@
>>>  #include "hw/qdev-properties.h"
>>>  #include "hw/qdev-properties-system.h"
>>>  #include "migration/vmstate.h"
>>> +#include "migration/cpr.h"
>>> +#include "qemu/env.h"
>>>  #include "qemu/error-report.h"
>>>  #include "qemu/main-loop.h"
>>>  #include "qemu/module.h"
>>> @@ -1612,6 +1614,14 @@ static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled)
>>>  }
>>>  }
>>>  
>>> +static void vfio_config_sync(VFIOPCIDevice *vdev, uint32_t offset, size_t len)
>>> +{
>>> +if (pread(vdev->vbasedev.fd, vdev->pdev.config + offset, len,
>>> +  vdev->config_offset + offset) != len) {
>>> +error_report("vfio_config_sync pread failed");
>>> +}
>>> +}
>>> +
>>>  static void vfio_bar_prepare(VFIOPCIDevice *vdev, int nr)
>>>  {
>>>  VFIOBAR *bar = &vdev->bars[nr];
>>> @@ -1652,6 +1662,7 @@ static void vfio_bars_prepare(VFIOPCIDevice *vdev)
>>>  static void vfio_bar_register(VFIOPCIDevice *vdev, int nr)
>>>  {
>>>  VFIOBAR *bar = &vdev->bars[nr];
>>> +PCIDevice *pdev = &vdev->pdev;
>>>  char *name;
>>>  
>>>  if (!bar->size) {
>>> @@ -1672,7 +1683,10 @@ static void vfio_bar_register(VFIOPCIDevice *vdev, int nr)
>>>  }
>>>  }
>>>  
>>> -pci_register_bar(&vdev->pdev, nr, bar->type, bar->mr);
>>> +pci_register_bar(pdev, nr, bar->type, bar->mr);
>>> +if (pdev->reused) {
>>> +vfio_config_sync(vdev, pci_bar(pdev, nr), 8);
>>
>> Assuming 64-bit BARs?  This might be the first case where we actually
>> rely on the kernel BAR values, IIRC we usually use QEMU's emulation.
> 
> No assumptions.  vfio_config_sync() preads a piece of config space using a single
> system call, copying directly to the qemu buffer, not looking at words or calling any
> action functions.
> 
>[...] 
>>> @@ -3098,6 +3115,11 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>>>  vfio_register_req_notifier(vdev);
>>>  vfio_setup_resetfn_quirk(vdev);
>>>  
>>> +vfio_config_sync(vdev, pdev->msix_cap + PCI_MSIX_FLAGS, 2);
>>> +if (pdev->reused) {
>>> +pci_update_mappings(pdev);
>>> +}
>>> +
>>
>> Are the msix flag sync and mapping update related?  They seem
>> independent to me.  A blank line and comment would be helpful.
> 
> OK.
> 
>> I expect we'd need to call msix_enabled() somewhere for the msix flag
>> sync to be effective.
> 
> Yes, vfio_pci_post_load in cpr part 2 calls msix_enabled.
> 
>> Is there an assumption here of msi-x only support or is it not needed
>> for msi or intx?
> 
> The code supports msi-x and msi.  However, I should only be sync'ing PCI_MSIX_FLAGS
> if pdev->cap_present & QEMU_PCI_CAP_MSIX.  And, I am missing a sync for PCI_MSI_FLAGS.
> I'll fix that.
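For readers unfamiliar with the pattern described above (one pread() of a config-space range straight into the emulated buffer), here is a minimal, self-contained C sketch against an ordinary file descriptor. It is not the vfio code, and the function name is hypothetical. It treats a short read as an error, as the quoted vfio_config_sync() does. It also keeps pread()'s ssize_t result signed, so a -1 return is checked explicitly instead of relying on its conversion to size_t.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for pread() and fileno() */
#endif
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy len bytes found at file offset base + offset into buf + offset
 * with a single pread(). Returns true only if the whole range was read.
 */
static bool config_range_sync(int fd, off_t base, unsigned char *buf,
                              size_t offset, size_t len)
{
    ssize_t n = pread(fd, buf + offset, len, base + (off_t)offset);

    if (n < 0 || (size_t)n != len) {
        fprintf(stderr, "config_range_sync: pread failed or was short\n");
        return false;
    }
    return true;
}
```

In the vfio case, fd would be the device fd and base the config region offset. Here, the usage is simply reading a byte range of a temporary file into a shadow buffer.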

Hi Alex, FYI, I am making more changes here.  The calls to vfio_config_sync fix the
pdev->config[] words that are initialized during vfio_realize(), by pread'ing them from
the live kernel config.  However, it makes more sense to suppress the undesired
re-initialization than to undo the damage later.  Thus I will add a few more
'if (!pdev->reused)' guards in the msix and pci bar init functions, and delete
vfio_config_sync.

Most of the config is preserved in the kernel across restart.  However, the bits that
are purely emulated (indicated by the emulated_config_bits mask) may be rejected when
they are written through to the kernel, and thus are currently lost on restart.  I need
to save pdev->config[] in the vmstate file, and in vfio_pci_post_load, merge it with
the kernel config using emulated_config_bits.
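The proposed merge can be sketched byte by byte. Wherever a bit is set in the emulated mask, the value saved in the vmstate is used. Elsewhere, the live kernel config is trusted. Only emulated_config_bits comes from the discussion. The function and parameter names are hypothetical, and this is not the eventual vfio implementation.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Per byte: dst = (saved & emulated) | (kernel & ~emulated).
 * 'emulated' plays the role of vfio's emulated_config_bits mask.
 */
static void merge_config(uint8_t *dst, const uint8_t *kernel,
                         const uint8_t *saved, const uint8_t *emulated,
                         size_t len)
{
    for (size_t i = 0; i < len; i++) {
        dst[i] = (uint8_t)((saved[i] & emulated[i]) |
                           (kernel[i] & (uint8_t)~emulated[i]));
    }
}
```

For example, with kernel byte 0xAA, saved byte 0x55 and mask 0xF0, the result is 0x5A: the emulated upper nibble comes from the saved state, and the lower nibble comes from the kernel.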

Sound sane?

- Steve

Re: [PATCH v2] qemu-img: Make unallocated part of backing chain obvious in map

2021-06-11 Thread Nir Soffer

On Fri, Jun 11, 2021 at 5:59 PM Eric Blake  wrote:
>
> On Fri, Jun 11, 2021 at 05:35:12PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > An obvious solution is to make 'qemu-img map --output=json'
> > > distinguish between clusters that have a local allocation from those
> > > that are found nowhere in the chain.  We already have a one-off
> > > mismatch between qemu-img map and NBD qemu:allocation-depth (the
> > > former chose 0, and the latter 1 for the local layer), so exposing the
> > > latter's choice of 0 for unallocated in the entire chain would mean
> > > using "depth":-1 in the former, but a negative depth may confuse
> > > existing tools.  But there is an easy out: for any chain of length N,
> > > we can simply represent an unallocated cluster as "depth":N+1.  This
> > > does have a slight risk of confusing any tool that might try to
> > > dereference NULL when finding the backing image for the last file in
> > > the backing chain, but that risk seems worth the more precise output.
> > > The iotests have several examples where this distinction demonstrates
> > > the additional accuracy.
> > >
> > > Signed-off-by: Eric Blake 
> > > ---
> > >
> > > Replaces v1: 20210610213906.1313440-1-ebl...@redhat.com
> > > (qemu-img: Use "depth":-1 to make backing probes obvious)
> > >
> > > Use N+1 instead of -1 for unallocated [Kevin]
> > >
> >
> > But in contrast with -1, or with a separate boolean flag, you lose the
> > possibility to distinguish the case when we have 3 layers and the cluster
> > is absent in all of them from the case when we have 4 layers and the
> > cluster is absent in the top 3 but is a qcow2 UNALLOCATED_ZERO cluster in
> > the 4th.
>
> Using just 'qemu-img map --output=json', you only see depth numbers.
> You also have to use 'qemu-img info --backing-chain' to see what file
> those depth numbers correspond to, at which point it becomes obvious
> whether "depth":4 meant unallocated (because the chain was length 3)
> or allocated at depth 4 (because the chain was length 4 or longer).
> But that's no worse than pre-patch, where you had to use qemu-img info
> --backing-chain to learn which file a particular "depth" maps to.
>
> >
> > So, if someone uses this API to reconstruct the chain, then for the original
> > 3 empty layers they will create 3 empty layers and an additional 4th ZERO
> > layer. And such a reconstructed chain would not be equal to the original
> > chain (if we take these two chains and add an additional backing file as a
> > new bottom layer, the effect would be different). I'm not sure whether it
> > is a problem for the task you are solving :\
>
> It should be fairly easy to optimize the case of a backing chain where
> EVERY listed cluster at the final depth was "data":false,"zero":true
> to omit that file after all.
>
> And in oVirt's case, Nir pointed out that we have one more tool at our
> disposal in recreating a backing chain: if you use
> json:{"driver":"qcow2", "backing":null, ...} as your image file, you
> don't have to worry about arbitrary files in the backing chain, only
> about recreating the top-most layer of a chain.  And in that case, it
> becomes very obvious that "depth":0 is something you must recreate,
> and "depth":1 would be a non-existent backing file because you just
> passed "backing":null.

Note that oVirt does not use qemu-img map; we use qemu-nbd to get
image extents, since it is used only in contexts where we already
connect to a qemu-nbd server or run qemu-nbd.

Management tools already know the image format (they should avoid
doing format probing anyway), and using a json: URI allows a single
command to get the needed info when you inspect a single layer.

But this change introduces a risk that some program using qemu-img map
will interpret the result the wrong way, assuming that there is an
(N+1)th layer.

I think adding a new flag for absent extents is better. It cannot break
any user and it is easier to understand and use.
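The collision case raised earlier in the thread and the flag alternative can be compared in a few lines of C. The structs and both encoders are hypothetical models of the two proposals, not qemu-img code. Depths are 0-based, num_images counts every image in the chain, and alloc_depth is -1 when no image has the data. For absent extents, the flag variant reports the deepest probed image. That is one possible choice, assumed here for illustration.

```c
#include <stdbool.h>

/* Hypothetical per-extent records under each proposal. */
struct extent_nplus1 { int depth; };                /* N+1 encoding */
struct extent_flag   { int depth; bool present; };  /* separate flag */

static struct extent_nplus1 encode_nplus1(int alloc_depth, int num_images)
{
    /* "Allocated nowhere" becomes the first depth past the chain. */
    struct extent_nplus1 e = { alloc_depth < 0 ? num_images : alloc_depth };
    return e;
}

static struct extent_flag encode_flag(int alloc_depth, int num_images)
{
    struct extent_flag e;

    if (alloc_depth < 0) {
        e.depth = num_images - 1;   /* deepest image probed (assumption) */
        e.present = false;
    } else {
        e.depth = alloc_depth;
        e.present = true;
    }
    return e;
}
```

For a 3-image chain with the cluster absent everywhere, and a 4-image chain with a zero cluster in the 4th image, the N+1 encoding yields the same "depth":3 for both. The flag encoding tells the two cases apart without consulting the chain length.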

Nir

Re: [PATCH v4 1/8] hw/intc: GICv3 ITS initial framework

2021-06-11 Thread Eric Auger

Hi,

On 6/2/21 8:00 PM, Shashi Mallela wrote:
> Added register definitions relevant to ITS, implemented overall
> ITS device framework with stubs for ITS control and translator
> regions read/write, extended ITS common to handle mmio init between
> existing kvm device and newer qemu device.
> 
> Signed-off-by: Shashi Mallela 
> ---
>  hw/intc/arm_gicv3_its.c| 240 +
>  hw/intc/arm_gicv3_its_common.c |   8 +-
>  hw/intc/arm_gicv3_its_kvm.c|   2 +-
>  hw/intc/gicv3_internal.h   |  88 +++--
>  hw/intc/meson.build|   1 +
>  include/hw/intc/arm_gicv3_its_common.h |   9 +-
>  6 files changed, 331 insertions(+), 17 deletions(-)
>  create mode 100644 hw/intc/arm_gicv3_its.c
> 
> diff --git a/hw/intc/arm_gicv3_its.c b/hw/intc/arm_gicv3_its.c
> new file mode 100644
> index 00..545cda3665
> --- /dev/null
> +++ b/hw/intc/arm_gicv3_its.c
> @@ -0,0 +1,240 @@
> +/*
> + * ITS emulation for a GICv3-based system
> + *
> + * Copyright Linaro.org 2021
> + *
> + * Authors:
> + *  Shashi Mallela 
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or (at your
> + * option) any later version.  See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/log.h"
> +#include "hw/qdev-properties.h"
> +#include "hw/intc/arm_gicv3_its_common.h"
> +#include "gicv3_internal.h"
> +#include "qom/object.h"
> +
> +typedef struct GICv3ITSClass GICv3ITSClass;
> +/* This is reusing the GICv3ITSState typedef from ARM_GICV3_ITS_COMMON */
> +DECLARE_OBJ_CHECKERS(GICv3ITSState, GICv3ITSClass,
> + ARM_GICV3_ITS, TYPE_ARM_GICV3_ITS)
> +
> +struct GICv3ITSClass {
> +GICv3ITSCommonClass parent_class;
> +void (*parent_reset)(DeviceState *dev);
> +};
> +
> +static MemTxResult gicv3_its_translation_write(void *opaque, hwaddr offset,
> +   uint64_t data, unsigned size,
> +   MemTxAttrs attrs)
> +{
> +MemTxResult result = MEMTX_OK;
> +
> +return result;
> +}
> +
> +static MemTxResult its_writel(GICv3ITSState *s, hwaddr offset,
> +  uint64_t value, MemTxAttrs attrs)
> +{
> +MemTxResult result = MEMTX_OK;
> +
> +return result;
> +}
> +
> +static MemTxResult its_readl(GICv3ITSState *s, hwaddr offset,
> + uint64_t *data, MemTxAttrs attrs)
> +{
> +MemTxResult result = MEMTX_OK;
> +
> +return result;
> +}
> +
> +static MemTxResult its_writell(GICv3ITSState *s, hwaddr offset,
> +   uint64_t value, MemTxAttrs attrs)
> +{
> +MemTxResult result = MEMTX_OK;
> +
> +return result;
> +}
> +
> +static MemTxResult its_readll(GICv3ITSState *s, hwaddr offset,
> +  uint64_t *data, MemTxAttrs attrs)
> +{
> +MemTxResult result = MEMTX_OK;
> +
> +return result;
> +}
> +
> +static MemTxResult gicv3_its_read(void *opaque, hwaddr offset, uint64_t *data,
> +  unsigned size, MemTxAttrs attrs)
> +{
> +GICv3ITSState *s = (GICv3ITSState *)opaque;
> +MemTxResult result;
> +
> +switch (size) {
> +case 4:
> +result = its_readl(s, offset, data, attrs);
> +break;
> +case 8:
> +result = its_readll(s, offset, data, attrs);
> +break;
> +default:
> +result = MEMTX_ERROR;
> +break;
> +}
> +
> +if (result == MEMTX_ERROR) {
> +qemu_log_mask(LOG_GUEST_ERROR,
> +  "%s: invalid guest read at offset " TARGET_FMT_plx
> +  "size %u\n", __func__, offset, size);
> +/*
> + * The spec requires that reserved registers are RAZ/WI;
> + * so use MEMTX_ERROR returns from leaf functions as a way to
> + * trigger the guest-error logging but don't return it to
> + * the caller, or we'll cause a spurious guest data abort.
> + */
> +result = MEMTX_OK;
> +*data = 0;
> +}
> +return result;
> +}
> +
> +static MemTxResult gicv3_its_write(void *opaque, hwaddr offset, uint64_t data,
> +   unsigned size, MemTxAttrs attrs)
> +{
> +GICv3ITSState *s = (GICv3ITSState *)opaque;
> +MemTxResult result;
> +
> +switch (size) {
> +case 4:
> +result = its_writel(s, offset, data, attrs);
> +break;
> +case 8:
> +result = its_writell(s, offset, data, attrs);
> +break;
> +default:
> +result = MEMTX_ERROR;
> +break;
> +}
> +
> +if (result == MEMTX_ERROR) {
> +qemu_log_mask(LOG_GUEST_ERROR,
> +  "%s: invalid guest write at offset " TARGET_FMT_plx
> +  "size %u\n", __func__, offset, size);
> +/*
> + * The spec requires that reserved registers are RAZ/WI;
> + * so use MEMTX_ERROR returns

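The comment in gicv3_its_read() describes a common MMIO pattern. The handler dispatches on access size and logs unsupported or reserved accesses as guest errors. It still reports success to the bus, so reserved registers read as zero and ignore writes (RAZ/WI). The following self-contained C model shows that dispatch. The enum, the fprintf() logging, and the function names are stand-ins for QEMU's MemTxResult, qemu_log_mask(), and the its_readl()/its_readll() leaves, not QEMU API.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

typedef enum { TX_OK, TX_ERROR } TxResult;   /* stand-in for MemTxResult */

/* Hypothetical 32-bit leaf: only offset 0 is implemented in this model. */
static TxResult model_readl(uint64_t offset, uint64_t *data)
{
    if (offset != 0) {
        return TX_ERROR;                     /* reserved register */
    }
    *data = 0x12345678;
    return TX_OK;
}

static TxResult model_read(uint64_t offset, uint64_t *data, unsigned size)
{
    TxResult result;

    switch (size) {
    case 4:
        result = model_readl(offset, data);
        break;
    default:                                 /* 8-byte leaf omitted here */
        result = TX_ERROR;
        break;
    }

    if (result == TX_ERROR) {
        fprintf(stderr, "%s: invalid guest read at offset 0x%" PRIx64
                " size %u\n", __func__, offset, size);
        /* RAZ/WI: hide the error from the bus and read as zero. */
        result = TX_OK;
        *data = 0;
    }
    return result;
}
```

Note the leading space in " size %u" above. In the quoted patch, "offset " TARGET_FMT_plx "size %u\n" concatenates with no space between the printed offset and "size".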
Re: [PATCH] qemu-{img,nbd}: Don't report zeroed cluster as a hole

2021-06-11 Thread Nir Soffer

On Fri, Jun 11, 2021 at 4:28 PM Eric Blake  wrote:
>
> On Fri, Jun 11, 2021 at 10:09:09AM +0200, Kevin Wolf wrote:
> > > Yes, that might work as well.  But we didn't previously document
> > > depth to be optional.  Removing something from output risks breaking
> > > more downstream tools that expect it to be non-optional, compared to
> > > providing a new value.
> >
> > A negative value isn't any less unexpected than a missing key. I don't
> > think any existing tool would be able to handle it. Encoding different
> > meanings in a single value isn't very QAPI-like either. Usually strings
> > that are parsed are the problem, but negative integers really aren't that
> > much different. I don't really like this solution.
> >
> > Leaving out the depth feels like a better suggestion to me.
> >
> > But anyway, this seems to only happen at the end of the backing chain.
> > So if the backing chain consists of n images, why not report 'depth':
> > n + 1? So, in the above example, you would get 1. I think this has the
> > best chances of tools actually working correctly with the new output,
> > even though it's still not unlikely to break something.
>
> Ooh, I like that.  It is closer to reality - the file data really
> comes from the next depth, even if we have no filename at that depth.
> v2 of my patch coming up.

How do you know the number of layers? This info is not present in
qemu-img map output.

Users will have to run "qemu-img info --backing-chain" to understand the
output of qemu-img map.

Re: [RFC PATCH 0/5] ebpf: Added ebpf helper for libvirtd.

2021-06-11 Thread Daniel P . Berrangé

On Fri, Jun 11, 2021 at 07:49:21PM +0300, Andrew Melnichenko wrote:
> Hi,
> 
> > So I think the series is for unprivileged_bpf disabled. If I'm not
> > wrong, I guess the policy is to grant CAP_BPF but do fine grain checks
> > via LSM.
> >
> 
> The main idea is to run eBPF RSS with qemu without any permission.
> Libvirt should handle everything and pass proper eBPF file descriptors.
> For the current eBPF RSS, CAP_SYS_ADMIN (to bypass some limitations) is
> also required, and other permissions may be required in the future.
> 
> > I'm not sure this is the best. We have several examples where libvirt
> > gets involved. Examples:
> >
> > 1) create TAP device (and the TUN_SETIFF)
> >
> > 2) open vhost devices
> >
> 
> Technically, TAP/vhost are not tied to a particular qemu emulator, so common
> TAP creation should fit any modern qemu. eBPF fds (programs and maps) must
> suit the interface of the current qemu, e.g. some qemu builds may have
> different map structures or a different number of maps. It's necessary that
> qemu gets fds prepared by the helper that was built with that qemu.
> 
> > I think we need an example of the detailed steps for how libvirt is
> > expected to use this.
> >
> 
> The simplified workflow looks like this:
> 
>    1. Libvirt gets "emulator" from the domain document.
>    2. Libvirt queries for qemu capabilities.
>    3. One of the capabilities is the "qemu-ebpf-rss-helper" path (if present).
>    4. On NIC preparation, Libvirt checks for virtio-net + rss configurations.
>    5. If required, "qemu-ebpf-rss-helper" is called and the fds are received
>       over a Unix socket.
>    6. Those eBPF RSS fds are passed to the child process, qemu.
>    7. Qemu is launched with the virtio-net-pci properties "rss" and "ebpf_rss_fds".
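Step 5 relies on passing file descriptors over a Unix domain socket using SCM_RIGHTS ancillary data. The following minimal sketch shows that mechanism only. It is not the qemu-ebpf-rss-helper protocol, and the function names are hypothetical. For brevity it sends one fd per message, whereas the helper hands over several eBPF program and map fds.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>    /* pipe(), read(), write(), close() for callers */

/* Send one fd plus a one-byte payload over a connected Unix socket. */
static int send_fd(int sock, int fd)
{
    char byte = 'F';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr align;               /* ensures proper alignment */
        char buf[CMSG_SPACE(sizeof(int))];
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof(msg));
    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd sent by send_fd(); returns the new fd, or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) != 1) {
        return -1;
    }
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg != NULL && cmsg->cmsg_level == SOL_SOCKET &&
        cmsg->cmsg_type == SCM_RIGHTS &&
        cmsg->cmsg_len == CMSG_LEN(sizeof(int))) {
        memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    }
    return fd;
}
```

The received descriptor is a new fd number in the receiving process that refers to the same open file. That is why libvirt can obtain the eBPF fds from the helper and then hand them to the qemu child it launches.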

So this basically works in the same way as the qemu bridge
helper, with the extra advantage that we can actually query
QEMU for the right helper instead of libvirt hardcoding the
helper path.  We should make your QMP query command also
return the paths for the existing QEMU helpers (bridge helper
and pr helper).

Anyway, this approach is obviously viable for libvirt, since
it matches what we already do for other features.


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH v4 1/8] hw/intc: GICv3 ITS initial framework

2021-06-11 Thread Shashi Mallela



On Jun 11 2021, at 12:21 pm, Eric Auger  wrote:
> Hi,
>
> On 6/2/21 8:00 PM, Shashi Mallela wrote:
> > Added register definitions relevant to ITS, implemented overall
> > ITS device framework with stubs for ITS control and translator
> > regions read/write, extended ITS common to handle mmio init between
> > existing kvm device and newer qemu device.
> >
> > Signed-off-by: Shashi Mallela 
> > ---
> > hw/intc/arm_gicv3_its.c | 240 +
> > hw/intc/arm_gicv3_its_common.c | 8 +-
> > hw/intc/arm_gicv3_its_kvm.c | 2 +-
> > hw/intc/gicv3_internal.h | 88 +++--
> > hw/intc/meson.build | 1 +
> > include/hw/intc/arm_gicv3_its_common.h | 9 +-
> > 6 files changed, 331 insertions(+), 17 deletions(-)
> > create mode 100644 hw/intc/arm_gicv3_its.c
> >
> > diff --git a/hw/intc/arm_gicv3_its.c b/hw/intc/arm_gicv3_its.c
> > new file mode 100644
> > index 00..545cda3665
> > --- /dev/null
> > +++ b/hw/intc/arm_gicv3_its.c
> > @@ -0,0 +1,240 @@
> > +/*
> > + * ITS emulation for a GICv3-based system
> > + *
> > + * Copyright Linaro.org 2021
> > + *
> > + * Authors:
> > + * Shashi Mallela 
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or (at your
> > + * option) any later version. See the COPYING file in the top-level directory.
> > + *
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +#include "qemu/log.h"
> > +#include "hw/qdev-properties.h"
> > +#include "hw/intc/arm_gicv3_its_common.h"
> > +#include "gicv3_internal.h"
> > +#include "qom/object.h"
> > +
> > +typedef struct GICv3ITSClass GICv3ITSClass;
> > +/* This is reusing the GICv3ITSState typedef from ARM_GICV3_ITS_COMMON */
> > +DECLARE_OBJ_CHECKERS(GICv3ITSState, GICv3ITSClass,
> > + ARM_GICV3_ITS, TYPE_ARM_GICV3_ITS)
> > +
> > +struct GICv3ITSClass {
> > + GICv3ITSCommonClass parent_class;
> > + void (*parent_reset)(DeviceState *dev);
> > +};
> > +
> > +static MemTxResult gicv3_its_translation_write(void *opaque, hwaddr offset,
> > + uint64_t data, unsigned size,
> > + MemTxAttrs attrs)
> > +{
> > + MemTxResult result = MEMTX_OK;
> > +
> > + return result;
> > +}
> > +
> > +static MemTxResult its_writel(GICv3ITSState *s, hwaddr offset,
> > + uint64_t value, MemTxAttrs attrs)
> > +{
> > + MemTxResult result = MEMTX_OK;
> > +
> > + return result;
> > +}
> > +
> > +static MemTxResult its_readl(GICv3ITSState *s, hwaddr offset,
> > + uint64_t *data, MemTxAttrs attrs)
> > +{
> > + MemTxResult result = MEMTX_OK;
> > +
> > + return result;
> > +}
> > +
> > +static MemTxResult its_writell(GICv3ITSState *s, hwaddr offset,
> > + uint64_t value, MemTxAttrs attrs)
> > +{
> > + MemTxResult result = MEMTX_OK;
> > +
> > + return result;
> > +}
> > +
> > +static MemTxResult its_readll(GICv3ITSState *s, hwaddr offset,
> > + uint64_t *data, MemTxAttrs attrs)
> > +{
> > + MemTxResult result = MEMTX_OK;
> > +
> > + return result;
> > +}
> > +
> > +static MemTxResult gicv3_its_read(void *opaque, hwaddr offset, uint64_t *data,
> > + unsigned size, MemTxAttrs attrs)
> > +{
> > + GICv3ITSState *s = (GICv3ITSState *)opaque;
> > + MemTxResult result;
> > +
> > + switch (size) {
> > + case 4:
> > + result = its_readl(s, offset, data, attrs);
> > + break;
> > + case 8:
> > + result = its_readll(s, offset, data, attrs);
> > + break;
> > + default:
> > + result = MEMTX_ERROR;
> > + break;
> > + }
> > +
> > + if (result == MEMTX_ERROR) {
> > + qemu_log_mask(LOG_GUEST_ERROR,
> > + "%s: invalid guest read at offset " TARGET_FMT_plx
> > + "size %u\n", __func__, offset, size);
> > + /*
> > + * The spec requires that reserved registers are RAZ/WI;
> > + * so use MEMTX_ERROR returns from leaf functions as a way to
> > + * trigger the guest-error logging but don't return it to
> > + * the caller, or we'll cause a spurious guest data abort.
> > + */
> > + result = MEMTX_OK;
> > + *data = 0;
> > + }
> > + return result;
> > +}
> > +
> > +static MemTxResult gicv3_its_write(void *opaque, hwaddr offset, uint64_t data,
> > + unsigned size, MemTxAttrs attrs)
> > +{
> > + GICv3ITSState *s = (GICv3ITSState *)opaque;
> > + MemTxResult result;
> > +
> > + switch (size) {
> > + case 4:
> > + result = its_writel(s, offset, data, attrs);
> > + break;
> > + case 8:
> > + result = its_writell(s, offset, data, attrs);
> > + break;
> > + default:
> > + result = MEMTX_ERROR;
> > + break;
> > + }
> > +
> > + if (result == MEMTX_ERROR) {
> > + qemu_log_mask(LOG_GUEST_ERROR,
> > + "%s: invalid guest write at offset " TARGET_FMT_plx
> > + "size %u\n", __func__, offset, size);
> > + /*
> > + * The spec requires that reserved registers are RAZ/WI;
> > + * so use MEMTX_ERROR returns from leaf functions as a way to
> > + * trigger the guest-error logging but don't return it to
> > + * the caller, or we'll cause a spurious guest data abort.
> > + */
> > + result = MEMTX_OK;
> > + }
> > + return result;
> > +}
> > +
> > +static const MemoryRegionOps gicv3_its_control_ops = {
> > + .read_with_attrs = gicv3_its_read,
> > + .write_with_attrs =

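The RAZ/WI convention described in the comments of gicv3_its_read() and gicv3_its_write() above can be sketched standalone: leaf handlers return MEMTX_ERROR only so the dispatcher can log the access, and the dispatcher then reports success, with zero data for reads. This is a minimal model, not QEMU code. MemTxResult and hwaddr are simplified stand-ins, and MODEL_CTLR and the model_* functions are hypothetical. The log format here also has a space before "size", which the quoted format strings lack.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified stand-ins for QEMU's types (assumed, not QEMU's headers). */
typedef enum { MEMTX_OK = 0, MEMTX_ERROR = 1 } MemTxResult;
typedef uint64_t hwaddr;

#define MODEL_CTLR 0x0    /* hypothetical: the only register modelled */

static uint32_t model_ctlr;

/* Leaf read handler: MEMTX_ERROR for any offset it does not implement. */
static MemTxResult model_readl(hwaddr offset, uint64_t *data)
{
    switch (offset) {
    case MODEL_CTLR:
        *data = model_ctlr;
        return MEMTX_OK;
    default:
        return MEMTX_ERROR;
    }
}

/* Leaf write handler: same convention. */
static MemTxResult model_writel(hwaddr offset, uint64_t value)
{
    switch (offset) {
    case MODEL_CTLR:
        model_ctlr = (uint32_t)value;
        return MEMTX_OK;
    default:
        return MEMTX_ERROR;
    }
}

/* Dispatcher: log bad reads, then turn them into read-as-zero. */
static MemTxResult model_read(hwaddr offset, uint64_t *data, unsigned size)
{
    MemTxResult result = size == 4 ? model_readl(offset, data) : MEMTX_ERROR;

    if (result == MEMTX_ERROR) {
        fprintf(stderr, "%s: invalid guest read at offset 0x%" PRIx64
                " size %u\n", __func__, offset, size);
        *data = 0;           /* RAZ */
        result = MEMTX_OK;   /* don't raise a guest data abort */
    }
    return result;
}

/* Dispatcher: log bad writes, then silently drop them. */
static MemTxResult model_write(hwaddr offset, uint64_t value, unsigned size)
{
    MemTxResult result = size == 4 ? model_writel(offset, value) : MEMTX_ERROR;

    if (result == MEMTX_ERROR) {
        fprintf(stderr, "%s: invalid guest write at offset 0x%" PRIx64
                " size %u\n", __func__, offset, size);
        result = MEMTX_OK;   /* WI */
    }
    return result;
}
```

The guest never sees an error either way; only the guest-error log records the bad access.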
Re: [PATCH v4 00/32] block/nbd: rework client connection

2021-06-11 Thread Vladimir Sementsov-Ogievskiy


11.06.2021 18:55, Eric Blake wrote:

On Thu, Jun 10, 2021 at 01:07:30PM +0300, Vladimir Sementsov-Ogievskiy wrote:

v4:

Now based on Paolo's new patch:
Based-on: <20210609122234.544153-1-pbonz...@redhat.com>

Also, I've dropped patch 33 for now, it's too much for this series.
I'll resend it later on top of this.

The series is also available at tag up-nbd-client-connection-v4 in
git https://src.openvz.org/scm/~vsementsov/qemu.git


I think everything has R-b now, so I'll queue this through my NBD tree
including folding in the grammar tweaks where I spotted them.



Thanks a lot!

--
Best regards,
Vladimir

Re: [RFC PATCH 4/5] qmp: Added qemu-ebpf-rss-path command.

2021-06-11 Thread Daniel P . Berrangé

On Fri, Jun 11, 2021 at 09:15:52AM -0500, Eric Blake wrote:
> On Wed, Jun 09, 2021 at 01:04:56PM +0300, Andrew Melnychenko wrote:
> > New QMP command to query the eBPF helper.
> > It's crucial that QEMU and the helper are in sync.
> > Technically, the helper should pass eBPF fds that QEMU can accept,
> > and different QEMU builds may have different eBPF programs and helpers.
> > QEMU returns the helper that "fits" its virtio-net.
> > 
> > Signed-off-by: Andrew Melnychenko 
> > ---
> >  monitor/qmp-cmds.c | 78 ++
> >  qapi/misc.json | 29 +
> >  2 files changed, 107 insertions(+)
> > 
> > diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
> > index f7d64a6457..5dd2a58ea2 100644
> > --- a/monitor/qmp-cmds.c
> > +++ b/monitor/qmp-cmds.c
> > @@ -351,3 +351,81 @@ void qmp_display_reload(DisplayReloadOptions *arg, Error **errp)
> >  abort();
> >  }
> >  }
> > +
> > +#ifdef CONFIG_LINUX
> > +
> > +static const char *get_dirname(char *path)
> > +{
> > +char *sep;
> > +
> > +sep = strrchr(path, '/');
> > +if (sep == path) {
> > +return "/";
> > +} else if (sep) {
> > +*sep = 0;
> > +return path;
> > +}
> > +return ".";
> > +}
> 
> Seems like this function is duplicating what glib should already be
> able to do.
> 
> > +
> > +static char *find_helper(const char *name)
> > +{
> > +char qemu_exec[PATH_MAX];
> 
> Stack-allocating a PATH_MAX array for readlink() is poor practice.
> Better is to use g_file_read_link().
> 
> > +const char *qemu_dir = NULL;
> > +char *helper = NULL;
> > +
> > +if (name == NULL) {
> > +return NULL;
> > +}
> > +
> > +if (readlink("/proc/self/exe", qemu_exec, PATH_MAX) > 0) {
> > +qemu_dir = get_dirname(qemu_exec);
> > +
> > +helper = g_strdup_printf("%s/%s", qemu_dir, name);
> > +if (access(helper, F_OK) == 0) {
> > +return helper;
> > +}
> > +g_free(helper);
> > +}
> > +
> > +helper = g_strdup_printf("%s/%s", CONFIG_QEMU_HELPERDIR, name);
> 
> Could we use a compile-time determination of where we were (supposed)
> to be installed, and therefore where our helper should be installed,
> rather than the dynamic /proc/self/exe munging?

Yeah I think avoiding /proc/self/exe is desirable, because I can
imagine scenarios where this can lead to picking the wrong helper.
Better to always use the compile time install directory.
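If the helper is looked up only in the compile-time install directory, the lookup reduces to one path check. The sketch below makes several assumptions. It expects the build to provide CONFIG_QEMU_HELPERDIR, and the /usr/local/libexec fallback only mirrors the example path in the series' cover letter. It uses plain POSIX calls so it stands alone. It also checks X_OK rather than F_OK, so a file that exists but is not executable is not reported as the helper. find_helper_in() is a hypothetical split that makes the directory testable.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Assumed fallback; a real build would define this at configure time. */
#ifndef CONFIG_QEMU_HELPERDIR
#define CONFIG_QEMU_HELPERDIR "/usr/local/libexec"
#endif

/* Return a malloc'd "<dir>/<name>" if it is executable by us, else NULL. */
static char *find_helper_in(const char *dir, const char *name)
{
    char *path;
    int len;

    if (dir == NULL || name == NULL || *name == '\0') {
        return NULL;
    }
    len = snprintf(NULL, 0, "%s/%s", dir, name);
    if (len < 0 || (path = malloc((size_t)len + 1)) == NULL) {
        return NULL;
    }
    snprintf(path, (size_t)len + 1, "%s/%s", dir, name);
    if (access(path, X_OK) == 0) {
        return path;
    }
    free(path);
    return NULL;
}

/* Only the compile-time install directory is consulted. */
static char *find_helper(const char *name)
{
    return find_helper_in(CONFIG_QEMU_HELPERDIR, name);
}
```

Inside QEMU itself, g_strdup_printf() as in the patch would replace the snprintf()/malloc() pair.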


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [PATCH 2/6] qapi/parser: Allow empty QAPIDoc Sections

2021-06-11 Thread John Snow


On 6/11/21 10:40 AM, Markus Armbruster wrote:

John Snow  writes:


On 5/21/21 1:35 AM, Markus Armbruster wrote:

Does not fire for qga/qapi-schema.json.  Can you help?


Odd.

I did:

if self._section:
 ...
else:
 raise QAPIWhicheverErrorItWas(...)

and then did a full build and found it to fail on QGA stuff. You may
need --enable-docs to make it happen.

It later failed on test cases, too.


PEBKAC: I tested with a --disable-docs tree.  Disabled, because the
conversion to reST restored the "touch anything, rebuild everything" for
docs, which slows me down too much when I mess with the schema.

This snippet triggers the error:

 ##
 # @GuestExec:
 # @pid: pid of child process in guest OS
 #
 # Since: 2.5
 ##
 { 'struct': 'GuestExec',
   'data': { 'pid': 'int'} }

This one doesn't:

 ##
 # @GuestExec:
 #
 # @pid: pid of child process in guest OS
 #
 # Since: 2.5
 ##
 { 'struct': 'GuestExec',
   'data': { 'pid': 'int'} }

The code dealing with sections is pretty impenetrable.



Yeah, that's what I thought too. I might need (or want?) to touch this 
soon to do the cross-reference Sphinx stuff, so I figured I'd be coming 
back here "soon".


I could open a GitLab issue to remind myself to come back to it, if you
think that's acceptable.


--js

Re: [PATCH 0/4] modules: add support for target-specific modules.

2021-06-11 Thread Paolo Bonzini


On 11/06/21 10:29, Gerd Hoffmann wrote:


Are there any pending patches to handle the remaining tcg dependencies
in qemu?  When trying to build tcg modular (more than only
tcg-accel-ops*) I get lots of unresolved symbols to tcg bits which are
referenced directly (in cpu.c, gdbstub.c, monitor, ...).


I suggest that you create a wiki page with a list.  Then we can either 
see if Claudio's makefile patches tackled them, or go through them one 
by one.


Paolo

Re: tb_flush() calls causing long Windows XP boot times

2021-06-11 Thread Paolo Bonzini


On 11/06/21 17:01, Programmingkid wrote:

Hello Alex,

The good news is the source code to Windows XP is available
online: https://github.com/cryptoAlgorithm/nt5src


It's leaked, so I doubt anybody who's paid to work on Linux or QEMU 
would touch that with a ten-foot pole.


Paolo

Re: [PATCH v2] Test comment for git-publish

2021-06-11 Thread Mauro Matteo Cascella

On Fri, Jun 11, 2021 at 6:43 PM Mauro Matteo Cascella wrote:
>
> ---
>  hw/rdma/vmw/pvrdma_main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/rdma/vmw/pvrdma_main.c b/hw/rdma/vmw/pvrdma_main.c
> index 84ae8024fc..e229c19564 100644
> --- a/hw/rdma/vmw/pvrdma_main.c
> +++ b/hw/rdma/vmw/pvrdma_main.c
> @@ -427,7 +427,7 @@ static void pvrdma_regs_write(void *opaque, hwaddr addr, uint64_t val,
>  case PVRDMA_REG_REQUEST:
>  if (val == 0) {
>  trace_pvrdma_regs_write(addr, val, "REQUEST", "");
> -pvrdma_exec_cmd(dev);
> +pvrdma_exec_cmd(dev); // this is a test comment
>  }
>  break;
>  default:
> --
> 2.31.1
>

Again, sorry for the spam. Can someone please explain how to *not* use
the profiles defined in .gitpublish? I used the following command,
evidently with no success:

$ git publish --override-cc --override-to --to=mcasc...@redhat.com

-- 
Mauro Matteo Cascella
Red Hat Product Security
PGP-Key ID: BB3410B0

Re: [PATCH] configure: map x32 to cpu_family x86_64 for meson

2021-06-11 Thread Paolo Bonzini


On 09/06/21 14:28, David Michael wrote:

The meson.build file defines supported_cpus which does not contain
x32, and x32 is not one of meson's stable built-in values:
https://mesonbuild.com/Reference-tables.html#cpu-families

Signed-off-by: David Michael 
---

Hi,

QEMU fails to build for x32 because that cpu_family is not defined in
supported_cpus.  Can something like this be applied?

Alternatively, maybe it could be added to supported_cpus and accepted
everywhere that matches x86 in meson.build, but upstream meson does not
define a CPU type for x32.

Thanks.

David

Queued, thanks.

Paolo


  configure | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure b/configure
index 8dcb9965b2..4478f3889a 100755
--- a/configure
+++ b/configure
@@ -6384,7 +6384,7 @@ if test "$skip_meson" = no; then
  i386)
  echo "cpu_family = 'x86'" >> $cross
  ;;
-x86_64)
+x86_64|x32)
  echo "cpu_family = 'x86_64'" >> $cross
  ;;
  ppc64le)

Re: [PATCH v2] semihosting/arm-compat: remove heuristic softmmu SYS_HEAPINFO

2021-06-11 Thread Alex Bennée



Peter Maydell  writes:

> On Thu, 10 Jun 2021 at 15:16, Alex Bennée  wrote:
>>
>>
>> Peter Maydell  writes:
>> > I'm told that the Arm C compiler C library always assumes that
>> > the "stack base" value is what it should set SP to, so reporting 0
>> > for that will break binaries that were built with it.
>> >
>> > As the TODO comment notes, the "heap base" is a bit of a guess,
>> > but putting stackbase at top-of-RAM seems generally sensible.
>> >
>> > What bug are we trying to fix here?
>>
>> Having newlib use a value that's wrong and therefore plant its heap in
>> the middle of the loaded code.
>>
>> > I think one possible implementation that might not be too
>> > hard to make work would be:
>> >
>> >  (1) find the guest physical address of the main machine
>> >  RAM (machine->ram). You can do this with flatview_for_each_range()
>> >  similar to what rom_ptr_for_as() does. (It might be mapped
>> >  more than once, we could just pick the first one.)
>>
>> Currently this is done by common_semi_find_region_base which pokes
>> around get_system_memory()->subregions to find a region containing an
>> initialised register pointer.
>
> Yes. I am suggesting we throw that code away, since (a) assuming
> any register happens to point into the main RAM is dubious and
> (b) iterating through the subregions of get_system_memory() is
> not guaranteed to work either (consider the case where the system
> memory is inside a container MR rather than a direct child of the
> system memory MR).
>
>> >  (2) find the largest contiguous extent of that RAM which
>> >  is not covered by a ROM blob, by iterating through the
>> >  ROM blob data. (This sounds like one of those slightly
>> >  irritating but entirely tractable algorithms questions :-))
>>
>> Does that assume that any rom blob (so anything from -kernel, -pflash or
>> -generic-loader?) will have also included space for guest data and bss?
>
> Yes; the elf loader code creates rom blobs whose rom->romsize
> covers both initialized data from the ELF file and space to
> be zeroed.
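Step (2) of Peter's proposal is an interval sweep: find the largest contiguous part of RAM not covered by any ROM blob. Sort the blobs by start address, walk them while tracking the highest address covered so far, and remember the widest gap. This minimal sketch assumes each blob is reduced to a start address and a size. For ELF-loaded blobs that size would be rom->romsize, which per Peter already includes the zeroed part. Extent and largest_free_extent() are hypothetical names, and step (1), locating the RAM base, is out of scope here.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical extent type: [start, start + size), assumed not to wrap. */
typedef struct {
    uint64_t start;
    uint64_t size;
} Extent;

static int cmp_extent_start(const void *a, const void *b)
{
    const Extent *x = a, *y = b;

    return (x->start > y->start) - (x->start < y->start);
}

/*
 * Return the largest sub-range of @ram not covered by any of the @n blobs.
 * Blobs may overlap each other or lie partly or wholly outside RAM.
 * @blobs is sorted in place. A result with size 0 means RAM is full.
 */
static Extent largest_free_extent(Extent ram, Extent *blobs, size_t n)
{
    uint64_t ram_end = ram.start + ram.size;
    uint64_t cursor = ram.start;  /* RAM below this is covered or measured */
    Extent best = { ram.start, 0 };

    if (n > 0) {
        qsort(blobs, n, sizeof(*blobs), cmp_extent_start);
    }
    for (size_t i = 0; i < n && cursor < ram_end; i++) {
        uint64_t b_start = blobs[i].start;
        uint64_t b_end = blobs[i].start + blobs[i].size;

        if (blobs[i].size == 0 || b_end <= cursor) {
            continue;              /* empty, already covered, or below RAM */
        }
        if (b_start > cursor) {    /* a free gap precedes this blob */
            uint64_t gap_end = b_start < ram_end ? b_start : ram_end;

            if (gap_end - cursor > best.size) {
                best.start = cursor;
                best.size = gap_end - cursor;
            }
        }
        cursor = b_end;            /* b_end > cursor here */
    }
    if (cursor < ram_end && ram_end - cursor > best.size) {
        best.start = cursor;       /* tail gap after the last blob */
        best.size = ram_end - cursor;
    }
    return best;
}
```

Sorting makes the walk a single O(n log n) pass, and overlapping blobs need no special handling because the cursor only moves forward.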

Hmm I'm not seeing the RAM get bifurcated by the loader. The flatview
only has one RAM block in my test case and it covers the whole of RAM.

  Semihosting Heap Info Test
  find_heap_cb: rom:1 romd_mode:1 ram:0 /:400
  find_heap_cb: rom:1 romd_mode:1 ram:0 0400/0400:400
  find_heap_cb: rom:0 romd_mode:1 ram:0 0800/0800:1000
  find_heap_cb: rom:0 romd_mode:1 ram:0 0801/0801:2000
  find_heap_cb: rom:0 romd_mode:1 ram:0 0802/0802:1000
  find_heap_cb: rom:0 romd_mode:1 ram:0 0900/0900:1000
  find_heap_cb: rom:0 romd_mode:1 ram:0 0901/0901:1000
  find_heap_cb: rom:0 romd_mode:1 ram:0 0902/0902:8
  find_heap_cb: rom:0 romd_mode:1 ram:0 09020008/09020008:2
  find_heap_cb: rom:0 romd_mode:1 ram:0 09020010/09020010:8
  find_heap_cb: rom:0 romd_mode:1 ram:0 0903/0903:1000
  find_heap_cb: rom:0 romd_mode:1 ram:0 0a00/0a00:200
  find_heap_cb: rom:0 romd_mode:1 ram:0 0a000200/0a000200:200
  find_heap_cb: rom:0 romd_mode:1 ram:0 0a000400/0a000400:200
  find_heap_cb: rom:0 romd_mode:1 ram:0 0a000600/0a000600:200
  find_heap_cb: rom:0 romd_mode:1 ram:0 0a000800/0a000800:200
  find_heap_cb: rom:0 romd_mode:1 ram:0 0a000a00/0a000a00:200
  find_heap_cb: rom:0 romd_mode:1 ram:0 0a000c00/0a000c00:200
  find_heap_cb: rom:0 romd_mode:1 ram:0 0a000e00/0a000e00:200
  find_heap_cb: rom:0 romd_mode:1 ram:0 0a001000/0a001000:200
  find_heap_cb: rom:0 romd_mode:1 ram:0 0a001200/0a001200:200
  find_heap_cb: rom:0 romd_mode:1 ram:0 0a001400/0a001400:200
  find_heap_cb: rom:0 romd_mode:1 ram:0 0a001600/0a001600:200
  find_heap_cb: rom:0 romd_mode:1 ram:0 0a001800/0a001800:200
  find_heap_cb: rom:0 romd_mode:1 ram:0 0a001a00/0a001a00:200
  find_heap_cb: rom:0 romd_mode:1 ram:0 0a001c00/0a001c00:200
  find_heap_cb: rom:0 romd_mode:1 ram:0 0a001e00/0a001e00:200
  find_heap_cb: rom:0 romd_mode:1 ram:0 0a002000/0a002000:200
  find_heap_cb: rom:0 romd_mode:1 ram:0 0a002200/0a002200:200
  find_heap_cb: rom:0 romd_mode:1 ram:0 0a002400/0a002400:200
  find_heap_cb: rom:0 romd_mode:1 ram:0 0a002600/0a002600:200
  find_heap_cb: rom:0 romd_mode:1 ram:0 0a002800/0a002800:200
  find_heap_cb: rom:0 romd_mode:1 ram:0 0a002a00/0a002a00:200
  find_heap_cb: rom:0 romd_mode:1 ram:0 0a002c00/0a002c00:200
  find_heap_cb: rom:0 romd_mode:1 ram:0 0a002e00/0a002e00:200

Re: [RFC PATCH 0/5] ebpf: Added ebpf helper for libvirtd.

2021-06-11 Thread Andrew Melnichenko

Hi,

> So I think the series is for unprivileged_bpf disabled. If I'm not
> wrong, I guess the policy is to grant CAP_BPF but do fine grain checks
> via LSM.
>

The main idea is to run eBPF RSS with QEMU without any privileges.
Libvirt should handle everything and pass the proper eBPF file descriptors.
The current eBPF RSS also requires CAP_SYS_ADMIN (to bypass some
limitations), and other permissions may be required in the future.

> I'm not sure this is the best. We have several examples that let libvirt
> to involve. Examples:
>
> 1) create TAP device (and the TUN_SETIFF)
>
> 2) open vhost devices
>

Technically, TAP/vhost are not tied to a particular QEMU emulator, so common
TAP creation should work with any modern QEMU. The eBPF fds (program and maps),
however, must match the interface of the running QEMU: e.g. some QEMU builds
may have different map structures or a different number of maps. So QEMU must
get fds prepared by the helper that was built together with it.

> I think we need an example on the detail steps for how libvirt is
> expected to use this.
>

The simplified workflow looks like this:

   1. Libvirt gets the "emulator" from the domain document.
   2. Libvirt queries the QEMU capabilities.
   3. One of the capabilities is the "qemu-ebpf-rss-helper" path (if present).
   4. During NIC preparation, libvirt checks for a virtio-net + RSS configuration.
   5. If required, libvirt calls "qemu-ebpf-rss-helper" and receives the fds
   through a unix socket.
   6. Those eBPF RSS fds are passed to the child process, QEMU.
   7. QEMU is launched with the virtio-net-pci properties "rss" and "ebpf_rss_fds".
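In step 5, the helper hands its program and map fds to libvirt over a unix socket. The standard mechanism for that is SCM_RIGHTS ancillary data on an AF_UNIX socket, and a minimal sketch of both ends follows. send_fds(), recv_fds() and MAX_FDS are hypothetical, and the actual wire format used by qemu-ebpf-rss-helper is not described in this thread.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

#define MAX_FDS 8   /* hypothetical upper bound on program + map fds */

/* Send nfds descriptors over a connected AF_UNIX socket (SCM_RIGHTS). */
static int send_fds(int sock, const int *fds, size_t nfds)
{
    char byte = 0;   /* some real data must accompany the ancillary data */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int) * MAX_FDS)];
        struct cmsghdr align;   /* forces correct alignment of buf */
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    if (nfds == 0 || nfds > MAX_FDS) {
        return -1;
    }
    memset(&u, 0, sizeof(u));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = CMSG_SPACE(sizeof(int) * nfds);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int) * nfds);
    memcpy(CMSG_DATA(cmsg), fds, sizeof(int) * nfds);

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive up to max_fds descriptors; returns the count, or -1 on error. */
static int recv_fds(int sock, int *fds, size_t max_fds)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int) * MAX_FDS)];
        struct cmsghdr align;
    } u;
    struct msghdr msg;
    struct cmsghdr *c;
    size_t n = 0;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) <= 0) {
        return -1;
    }
    for (c = CMSG_FIRSTHDR(&msg); c != NULL; c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level != SOL_SOCKET || c->cmsg_type != SCM_RIGHTS) {
            continue;
        }
        size_t count = (c->cmsg_len - CMSG_LEN(0)) / sizeof(int);
        for (size_t i = 0; i < count; i++) {
            int fd;

            memcpy(&fd, CMSG_DATA(c) + i * sizeof(int), sizeof(int));
            if (n < max_fds) {
                fds[n++] = fd;
            } else {
                close(fd);      /* no room for it: don't leak it */
            }
        }
    }
    if (msg.msg_flags & MSG_CTRUNC) {   /* kernel dropped some fds */
        while (n > 0) {
            close(fds[--n]);            /* treat a partial set as failure */
        }
        return -1;
    }
    return (int)n;
}
```

Each received descriptor is a new fd in the receiving process that refers to the same open file, so libvirt can hand it to the QEMU child process as-is.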


On Fri, Jun 11, 2021 at 8:36 AM Jason Wang  wrote:

>
> On 2021/6/10 2:55 PM, Yuri Benditovich wrote:
> > On Thu, Jun 10, 2021 at 9:41 AM Jason Wang  wrote:
> >> On 2021/6/9 6:04 PM, Andrew Melnychenko wrote:
> >>> Libvirt usually launches qemu with strict permissions.
> >>> To enable eBPF RSS steering, qemu-ebpf-rss-helper was added.
> >> A silly question:
> >>
> >> Kernel had the following permission checks in bpf syscall:
> >>
> >>  if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
> >>   return -EPERM;
> >> ...
> >>
> >>   err = security_bpf(cmd, &attr, size);
> >>   if (err < 0)
> >>   return err;
> >>
> >> So if I understand the code correctly, bpf syscall can only be done if:
> >>
> >> 1) unprivileged_bpf is enabled or
> >> 2) the caller has the capability and passes the LSM checks
> >>
> >> So I think the series is for unprivileged_bpf disabled. If I'm not
> >> wrong, I guess the policy is to grant CAP_BPF but do fine grain checks
> >> via LSM.
> >>
> >> If this is correct, need to describe it in the commit log.
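Jason's reading of the quoted checks can be modelled in a few lines. This is a hedged sketch, not kernel code. bpf_call_allowed() and read_unprivileged_bpf_disabled() are hypothetical names, bpf_capable stands for the kernel's CAP_BPF-or-CAP_SYS_ADMIN test, and the sysctl is read from /proc/sys/kernel/unprivileged_bpf_disabled. The model makes one detail explicit: in the quoted code, security_bpf() runs after the sysctl gate for every caller, so the LSM check applies in case 1) as well.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Read kernel.unprivileged_bpf_disabled: 0 means unprivileged bpf() is
 * allowed, non-zero means it is disabled. Returns -1 if the sysctl
 * cannot be read (e.g. not Linux, or /proc not mounted).
 */
static int read_unprivileged_bpf_disabled(void)
{
    FILE *f = fopen("/proc/sys/kernel/unprivileged_bpf_disabled", "r");
    int val = -1;

    if (f != NULL) {
        if (fscanf(f, "%d", &val) != 1) {
            val = -1;
        }
        fclose(f);
    }
    return val;
}

/* Model of the two quoted checks, applied in the same order. */
static bool bpf_call_allowed(int unpriv_disabled, bool bpf_capable, int lsm_ret)
{
    if (unpriv_disabled != 0 && !bpf_capable) {
        return false;            /* -EPERM from the sysctl gate */
    }
    return lsm_ret >= 0;         /* security_bpf() result, in every case */
}
```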
> >>
> >>
> >>> Added property "ebpf_rss_fds" for "virtio-net" that allows to
> >>> initialize eBPF RSS context with passed program & maps fds.
> >>>
> >>> Added qemu-ebpf-rss-helper - simple helper that loads eBPF
> >>> context and passes fds through unix socket.
> >>> Libvirt should call the helper and pass fds to qemu through
> >>> "ebpf_rss_fds" property.
> >>>
> >>> Added explicit target OS check for libbpf dependency in meson.
> >>> eBPF RSS works only with Linux TAP, so there is no reason to
> >>> build eBPF loader/helper for non-Linux.
> >>>
> >>> Overall, the libvirt process should not be aware of the "interface"
> >>> of eBPF RSS; it will not know the eBPF map/program "types" or
> >>> their quantity.
> >> I'm not sure this is the best. We have several examples that let libvirt
> >> to involve. Examples:
> >>
> >> 1) create TAP device (and the TUN_SETIFF)
> >>
> >> 2) open vhost devices
> >>
> >>
> >>> That's why qemu and the helper should be from
> >>> the same build and be "synchronized". Technically each qemu may
> >>> have its own helper. That's why "query-helper-paths" qmp command
> >>> was added. Qemu should return the path to the helper that suits
> >>> and libvirt should use "that" helper for "that" emulator.
> >>>
> >>> qmp sample:
> >>> C: { "execute": "query-helper-paths" }
> >>> S: { "return": [
> >>>{
> >>>  "name": "qemu-ebpf-rss-helper",
> >>>  "path": "/usr/local/libexec/qemu-ebpf-rss-helper"
> >>>}
> >>>   ]
> >>>  }
> >> I think we need an example on the detail steps for how libvirt is
> >> expected to use this.
> > The preliminary patches for libvirt are at
> > https://github.com/daynix/libvirt/tree/RSSv1
>
>
> Will have a look but it would be better if the assumption of the
> management is detailed here to ease the reviewers.
>
> Thanks
>
>
> >
>
>

Re: [PATCH v3] docs/devel: Explain in more detail the TB chaining mechanisms

2021-06-11 Thread Richard Henderson


On 6/1/21 5:51 AM, Luis Pires wrote:

Signed-off-by: Luis Pires
---
v3:
  - Dropped "most common" from the sentence introducing the chaining mechanisms
  - Changed wording about using the TB address returned by exit_tb


Thanks, queued.


r~
