Re: [PATCH v6 5/8] device/virtio-nsm: Support for Nitro Secure Module device

2024-09-15 Thread Michael S. Tsirkin
On Fri, Sep 06, 2024 at 01:57:32AM +0600, Dorjoy Chowdhury wrote:
> +const struct nsm_cmd nsm_cmds[] = {
> +{ "GetRandom",   CBOR_ROOT_TYPE_STRING,  handle_GetRandom },
> +{ "DescribeNSM", CBOR_ROOT_TYPE_STRING,  handle_DescribeNSM },
> +{ "DescribePCR", CBOR_ROOT_TYPE_MAP, handle_DescribePCR },
> +{ "ExtendPCR",   CBOR_ROOT_TYPE_MAP, handle_ExtendPCR },
> +{ "LockPCR", CBOR_ROOT_TYPE_MAP, handle_LockPCR },
> +{ "LockPCRs",CBOR_ROOT_TYPE_MAP, handle_LockPCRs },
> +{ "Attestation", CBOR_ROOT_TYPE_MAP, handle_Attestation },
> +};

I think we should stick to the coding style and avoid camel case
for functions. I know it is tempting to follow whatever naming
some spec uses, but the specs are all inconsistent. Putting the spec
name in a code comment before the function should be
good enough.
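
A minimal sketch of what that could look like. The handler signature, the
return values and the *_sketch names are hypothetical, since the patch's real
handler prototypes are not shown in this thread. The command strings keep the
spec spelling; only the C function names change.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical handler type; the real prototype is not shown in this thread. */
typedef int (*nsm_handler_fn)(void);

/* GetRandom (name as used in the NSM spec) */
static int handle_get_random(void)
{
    return 1;
}

/* DescribePCR (name as used in the NSM spec) */
static int handle_describe_pcr(void)
{
    return 2;
}

struct nsm_cmd_sketch {
    const char *name;        /* command string, spelled as in the spec */
    nsm_handler_fn handler;  /* snake_case C function */
};

static const struct nsm_cmd_sketch nsm_cmds_sketch[] = {
    { "GetRandom",   handle_get_random },
    { "DescribePCR", handle_describe_pcr },
};

/* Return the handler for a spec command name, or NULL if it is unknown. */
nsm_handler_fn nsm_lookup_sketch(const char *name)
{
    size_t n = sizeof(nsm_cmds_sketch) / sizeof(nsm_cmds_sketch[0]);

    for (size_t i = 0; i < n; i++) {
        if (strcmp(nsm_cmds_sketch[i].name, name) == 0) {
            return nsm_cmds_sketch[i].handler;
        }
    }
    return NULL;
}
```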




Re: [PATCH v6 5/8] device/virtio-nsm: Support for Nitro Secure Module device

2024-09-15 Thread Michael S. Tsirkin
On Mon, Sep 16, 2024 at 01:46:52AM +0600, Dorjoy Chowdhury wrote:
> > +    len = cbor_serialize(root, response->iov_base, response->iov_len);
> 
> As far as I can tell, all these also need to be switched to use
> iov_from_buf.
> 
> 
> Sorry I didn't understand this. The iovecs passed in these functions are not
> the iovecs from virtqueue. We make an iovec for the response and then pass it
> down. We do the "iov_from_buf" after calling "get_nsm_request_response" in
> "handle_input" function. Am I missing something?
> 
> Regards,
> Dorjoy


Oh, I misunderstood. Passing in a pointer and length might be clearer.
Not critical.

-- 
MST




Re: [PATCH v6 5/8] device/virtio-nsm: Support for Nitro Secure Module device

2024-09-15 Thread Michael S. Tsirkin
On Fri, Sep 06, 2024 at 01:57:32AM +0600, Dorjoy Chowdhury wrote:
> Nitro Secure Module (NSM)[1] device is used in AWS Nitro Enclaves[2]
> for stripped down TPM functionality like cryptographic attestation.
> The requests to and responses from NSM device are CBOR[3] encoded.
> 
> This commit adds support for NSM device in QEMU. Although related to
> AWS Nitro Enclaves, the virtio-nsm device is independent and can be
> used in other machine types as well. The libcbor[4] library has been
> used for the CBOR encoding and decoding functionalities.
> 
> [1] https://lists.oasis-open.org/archives/virtio-comment/202310/msg00387.html
> [2] https://docs.aws.amazon.com/enclaves/latest/user/nitro-enclave.html
> [3] http://cbor.io/
> [4] https://libcbor.readthedocs.io/en/latest/
> 
> Signed-off-by: Dorjoy Chowdhury 
> ---
>  MAINTAINERS  |   10 +
>  hw/virtio/Kconfig|5 +
>  hw/virtio/cbor-helpers.c |  326 ++
>  hw/virtio/meson.build|6 +
>  hw/virtio/virtio-nsm-pci.c   |   73 ++
>  hw/virtio/virtio-nsm.c   | 1665 ++
>  include/hw/virtio/cbor-helpers.h |   46 +
>  include/hw/virtio/virtio-nsm.h   |   59 ++
>  meson.build  |2 +
>  9 files changed, 2192 insertions(+)
>  create mode 100644 hw/virtio/cbor-helpers.c
>  create mode 100644 hw/virtio/virtio-nsm-pci.c
>  create mode 100644 hw/virtio/virtio-nsm.c
>  create mode 100644 include/hw/virtio/cbor-helpers.h
>  create mode 100644 include/hw/virtio/virtio-nsm.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index c14ac014e2..b371c24747 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2342,6 +2342,16 @@ F: include/sysemu/rng*.h
>  F: backends/rng*.c
>  F: tests/qtest/virtio-rng-test.c
>  
> +virtio-nsm
> +M: Alexander Graf 
> +M: Dorjoy Chowdhury 
> +S: Maintained
> +F: hw/virtio/cbor-helpers.c
> +F: hw/virtio/virtio-nsm.c
> +F: hw/virtio/virtio-nsm-pci.c
> +F: include/hw/virtio/cbor-helpers.h
> +F: include/hw/virtio/virtio-nsm.h
> +
>  vhost-user-stubs
>  M: Alex Bennée 
>  S: Maintained
> diff --git a/hw/virtio/Kconfig b/hw/virtio/Kconfig
> index aa63ff7fd4..29fee32035 100644
> --- a/hw/virtio/Kconfig
> +++ b/hw/virtio/Kconfig
> @@ -6,6 +6,11 @@ config VIRTIO_RNG
>  default y
>  depends on VIRTIO
>  
> +config VIRTIO_NSM
> +   bool
> +   default y
> +   depends on VIRTIO
> +
>  config VIRTIO_IOMMU
>  bool
>  default y
> diff --git a/hw/virtio/cbor-helpers.c b/hw/virtio/cbor-helpers.c
> new file mode 100644
> index 00..a0e58d6862
> --- /dev/null
> +++ b/hw/virtio/cbor-helpers.c
> @@ -0,0 +1,326 @@
> +/*
> + * QEMU CBOR helpers
> + *
> + * Copyright (c) 2024 Dorjoy Chowdhury 
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or
> + * (at your option) any later version.  See the COPYING file in the
> + * top-level directory.
> + */
> +
> +#include "hw/virtio/cbor-helpers.h"
> +
> +bool qemu_cbor_map_add(cbor_item_t *map, cbor_item_t *key, cbor_item_t *value)
> +{
> +bool success = false;
> +struct cbor_pair pair = (struct cbor_pair) {
> +.key = cbor_move(key),
> +.value = cbor_move(value)
> +};
> +
> +success = cbor_map_add(map, pair);
> +if (!success) {
> +cbor_incref(pair.key);
> +cbor_incref(pair.value);
> +}
> +
> +return success;
> +}
> +
> +bool qemu_cbor_array_push(cbor_item_t *array, cbor_item_t *value)
> +{
> +bool success = false;
> +
> +success = cbor_array_push(array, cbor_move(value));
> +if (!success) {
> +cbor_incref(value);
> +}
> +
> +return success;
> +}
> +
> +bool qemu_cbor_add_bool_to_map(cbor_item_t *map, const char *key, bool value)
> +{
> +cbor_item_t *key_cbor = NULL;
> +cbor_item_t *value_cbor = NULL;
> +
> +key_cbor = cbor_build_string(key);
> +if (!key_cbor) {
> +goto cleanup;
> +}
> +value_cbor = cbor_build_bool(value);
> +if (!value_cbor) {
> +goto cleanup;
> +}
> +if (!qemu_cbor_map_add(map, key_cbor, value_cbor)) {
> +goto cleanup;
> +}
> +
> +return true;
> +
> + cleanup:
> +if (key_cbor) {
> +cbor_decref(&key_cbor);
> +}
> +if (value_cbor) {
> +cbor_decref(&value_cbor);
> +}
> +return false;
> +}
> +
> +bool qemu_cbor_add_uint8_to_map(cbor_item_t *map, const char *key,
> +uint8_t value)
> +{
> +cbor_item_t *key_cbor = NULL;
> +cbor_item_t *value_cbor = NULL;
> +
> +key_cbor = cbor_build_string(key);
> +if (!key_cbor) {
> +goto cleanup;
> +}
> +value_cbor = cbor_build_uint8(value);
> +if (!value_cbor) {
> +goto cleanup;
> +}
> +if (!qemu_cbor_map_add(map, key_cbor, value_cbor)) {
> +goto cleanup;
> +}
> +
> +return true;
> +
> + cleanup:
> +if (key_cbor) {
> +cbor_decref(&key_cbor);
> +}
> +if (value_cbor) {
> +cbor_decref(&value_cbor);
> +

[PULL 10/18] tests/acpi: pc: allow DSDT acpi table changes

2024-09-11 Thread Michael S. Tsirkin
From: Ricardo Ribalda 

Signed-off-by: Ricardo Ribalda 
Message-Id: <20240814115736.1580337-2-riba...@chromium.org>
Reviewed-by: Igor Mammedov 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 tests/qtest/bios-tables-test-allowed-diff.h | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h
index dfb8523c8b..f81f4e2469 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1 +1,16 @@
 /* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/x86/pc/DSDT",
+"tests/data/acpi/x86/pc/DSDT.acpierst",
+"tests/data/acpi/x86/pc/DSDT.acpihmat",
+"tests/data/acpi/x86/pc/DSDT.bridge",
+"tests/data/acpi/x86/pc/DSDT.cphp",
+"tests/data/acpi/x86/pc/DSDT.dimmpxm",
+"tests/data/acpi/x86/pc/DSDT.hpbridge",
+"tests/data/acpi/x86/pc/DSDT.hpbrroot",
+"tests/data/acpi/x86/pc/DSDT.ipmikcs",
+"tests/data/acpi/x86/pc/DSDT.memhp",
+"tests/data/acpi/x86/pc/DSDT.nohpet",
+"tests/data/acpi/x86/pc/DSDT.numamem",
+"tests/data/acpi/x86/pc/DSDT.roothp",
+"tests/data/acpi/x86/q35/DSDT.cxl",
+"tests/data/acpi/x86/q35/DSDT.viot",
-- 
MST




[PULL 09/18] intel_iommu: Make PASID-cache and PIOTLB type invalid in legacy mode

2024-09-11 Thread Michael S. Tsirkin
From: Zhenzhong Duan 

In vtd_process_inv_desc(), VTD_INV_DESC_PC and VTD_INV_DESC_PIOTLB are
bypassed without a scalable mode check. These two types are not valid
in legacy mode, so we should report an error for them.
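
A minimal sketch of the resulting control flow, outside of QEMU. The enum
values and the function name are illustrative assumptions; the real constants
live in intel_iommu_internal.h, and the real code reports an error rather than
returning false.

```c
#include <stdbool.h>

/* Illustrative type values only, assumed for this sketch. */
enum {
    DESC_WAIT_SKETCH   = 0x5,
    DESC_PIOTLB_SKETCH = 0x6,
    DESC_PC_SKETCH     = 0x7,
};

/* Return true if the descriptor type is accepted in the given mode. */
bool inv_desc_type_ok_sketch(int type, bool scalable_mode)
{
    switch (type) {
    case DESC_WAIT_SKETCH:
        return true;

    case DESC_PC_SKETCH:
    case DESC_PIOTLB_SKETCH:
        if (scalable_mode) {
            /* Accepted as a no-op in scalable mode. */
            return true;
        }
        /* fallthrough */
    default:
        /* Legacy mode, or unknown type: treated as invalid. */
        return false;
    }
}
```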

Fixes: 4a4f219e8a10 ("intel_iommu: add scalable-mode option to make scalable mode work")
Suggested-by: Yi Liu 
Signed-off-by: Zhenzhong Duan 
Reviewed-by: Clément Mathieu--Drif
Reviewed-by: Yi Liu 
Message-Id: <20240814071321.2621384-3-zhenzhong.d...@intel.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/i386/intel_iommu.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 68cb72a481..90cd4e5044 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2763,17 +2763,6 @@ static bool vtd_process_inv_desc(IntelIOMMUState *s)
 }
 break;
 
-/*
- * TODO: the entity of below two cases will be implemented in future series.
- * To make guest (which integrates scalable mode support patch set in
- * iommu driver) work, just return true is enough so far.
- */
-case VTD_INV_DESC_PC:
-break;
-
-case VTD_INV_DESC_PIOTLB:
-break;
-
 case VTD_INV_DESC_WAIT:
 trace_vtd_inv_desc("wait", inv_desc.hi, inv_desc.lo);
 if (!vtd_process_wait_desc(s, &inv_desc)) {
@@ -2795,6 +2784,17 @@ static bool vtd_process_inv_desc(IntelIOMMUState *s)
 }
 break;
 
+/*
+ * TODO: the entity of below two cases will be implemented in future series.
+ * To make guest (which integrates scalable mode support patch set in
+ * iommu driver) work, just return true is enough so far.
+ */
+case VTD_INV_DESC_PC:
+case VTD_INV_DESC_PIOTLB:
+if (s->scalable_mode) {
+break;
+}
+/* fallthrough */
 default:
 error_report_once("%s: invalid inv desc: hi=%"PRIx64", lo=%"PRIx64
   " (unknown type)", __func__, inv_desc.hi,
-- 
MST




[PULL 03/18] hw: Move declaration of IRQState to header and add init function

2024-09-11 Thread Michael S. Tsirkin
From: BALATON Zoltan 

To allow embedding a qemu_irq in a struct, move its definition to the
header and add a function to init it in place without allocating it.
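
The in-place init pattern can be sketched with plain C structs. Everything
below is a stand-alone illustration with hypothetical *_sketch names; it does
not use QOM, and the real qemu_init_irq() additionally calls
object_initialize().

```c
#include <stddef.h>

typedef void (*irq_handler_sketch)(void *opaque, int n, int level);

/* Stand-in for IRQState, without the QOM parent object. */
struct irq_sketch {
    irq_handler_sketch handler;
    void *opaque;
    int n;
};

/* Initialize an IRQ that lives inside some other struct: no allocation. */
void irq_init_sketch(struct irq_sketch *irq, irq_handler_sketch handler,
                     void *opaque, int n)
{
    irq->handler = handler;
    irq->opaque = opaque;
    irq->n = n;
}

void irq_set_sketch(struct irq_sketch *irq, int level)
{
    if (!irq) {
        return;
    }
    irq->handler(irq->opaque, irq->n, level);
}

/* A device embedding the IRQ, so there is nothing separate to free. */
struct dev_sketch {
    struct irq_sketch irq;
    int last_level;
};

static void dev_irq_handler_sketch(void *opaque, int n, int level)
{
    struct dev_sketch *d = opaque;

    (void)n;
    d->last_level = level;
}

void dev_realize_sketch(struct dev_sketch *d)
{
    d->last_level = -1;
    irq_init_sketch(&d->irq, dev_irq_handler_sketch, d, 0);
}
```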

Signed-off-by: BALATON Zoltan 
Message-Id: 

Signed-off-by: BALATON Zoltan 
---
 include/hw/irq.h | 18 ++
 hw/core/irq.c| 25 +++--
 2 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/include/hw/irq.h b/include/hw/irq.h
index 645b73d251..c861c1debd 100644
--- a/include/hw/irq.h
+++ b/include/hw/irq.h
@@ -1,9 +1,20 @@
 #ifndef QEMU_IRQ_H
 #define QEMU_IRQ_H
 
+#include "qom/object.h"
+
 /* Generic IRQ/GPIO pin infrastructure.  */
 
 #define TYPE_IRQ "irq"
+OBJECT_DECLARE_SIMPLE_TYPE(IRQState, IRQ)
+
+struct IRQState {
+Object parent_obj;
+
+qemu_irq_handler handler;
+void *opaque;
+int n;
+};
 
 void qemu_set_irq(qemu_irq irq, int level);
 
@@ -23,6 +34,13 @@ static inline void qemu_irq_pulse(qemu_irq irq)
 qemu_set_irq(irq, 0);
 }
 
+/*
+ * Init a single IRQ. The irq is assigned with a handler, an opaque data
+ * and the interrupt number.
+ */
+void qemu_init_irq(IRQState *irq, qemu_irq_handler handler, void *opaque,
+   int n);
+
 /* Returns an array of N IRQs. Each IRQ is assigned the argument handler and
  * opaque data.
  */
diff --git a/hw/core/irq.c b/hw/core/irq.c
index 3f14e2dda7..db95ffc18f 100644
--- a/hw/core/irq.c
+++ b/hw/core/irq.c
@@ -26,16 +26,6 @@
 #include "hw/irq.h"
 #include "qom/object.h"
 
-OBJECT_DECLARE_SIMPLE_TYPE(IRQState, IRQ)
-
-struct IRQState {
-Object parent_obj;
-
-qemu_irq_handler handler;
-void *opaque;
-int n;
-};
-
 void qemu_set_irq(qemu_irq irq, int level)
 {
 if (!irq)
@@ -44,6 +34,15 @@ void qemu_set_irq(qemu_irq irq, int level)
 irq->handler(irq->opaque, irq->n, level);
 }
 
+void qemu_init_irq(IRQState *irq, qemu_irq_handler handler, void *opaque,
+   int n)
+{
+object_initialize(irq, sizeof(*irq), TYPE_IRQ);
+irq->handler = handler;
+irq->opaque = opaque;
+irq->n = n;
+}
+
 qemu_irq *qemu_extend_irqs(qemu_irq *old, int n_old, qemu_irq_handler handler,
void *opaque, int n)
 {
@@ -69,10 +68,8 @@ qemu_irq qemu_allocate_irq(qemu_irq_handler handler, void *opaque, int n)
 {
 IRQState *irq;
 
-irq = IRQ(object_new(TYPE_IRQ));
-irq->handler = handler;
-irq->opaque = opaque;
-irq->n = n;
+irq = g_new(IRQState, 1);
+qemu_init_irq(irq, handler, opaque, n);
 
 return irq;
 }
-- 
MST




[PULL 05/18] pci: don't skip function 0 occupancy verification for devfn auto assign

2024-09-11 Thread Michael S. Tsirkin
From: Dongli Zhang 

When the devfn is explicitly assigned on the command line,
do_pci_register_device() verifies whether function 0 is already occupied.

However, when devfn < 0 (auto assign), the verification is skipped because
it sits in the last "else if" of the chain.

For instance, suppose there is already a device at addr=00.00 of a port.

-device pcie-root-port,bus=pcie.0,chassis=115,id=port01,addr=0e.00 \
-device virtio-net-pci,bus=port01,id=vnet01,addr=00.00 \

When 'addr' is specified for the 2nd device, the hotplug is denied.

(qemu) device_add virtio-net-pci,bus=port01,id=vnet02,addr=01.00
Error: PCI: slot 0 function 0 already occupied by virtio-net-pci, new func virtio-net-pci cannot be exposed to guest.

When 'addr' is automatically assigned, the hotplug is not denied. This is
because the verification is skipped.

(qemu) device_add virtio-net-pci,bus=port01,id=vnet02
warning: PCI: slot 1 is not valid for virtio-net-pci, parent device only allows plugging into slot 0.

Fix the issue by moving the verification into an independent 'if'
statement.
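
A stripped-down sketch of why the chained form skips the check. The boolean
parameters and function names are hypothetical stand-ins for the conditions in
do_pci_register_device(); the pci_is_vf() test is left out for brevity.

```c
#include <stdbool.h>

/* Old shape: the function 0 check is the last arm of an if/else-if chain. */
bool reject_chained_sketch(bool auto_assign, bool devfn_in_use,
                           bool hotplugged, bool func0_present)
{
    if (auto_assign) {
        /* A free devfn is picked here; no later arm of the chain runs. */
    } else if (devfn_in_use) {
        return true;
    } else if (hotplugged && func0_present) {
        return true;
    }
    return false;
}

/* New shape: the function 0 check is an independent 'if'. */
bool reject_independent_sketch(bool auto_assign, bool devfn_in_use,
                               bool hotplugged, bool func0_present)
{
    if (auto_assign) {
        /* A free devfn is picked here. */
    } else if (devfn_in_use) {
        return true;
    }

    if (hotplugged && func0_present) {
        return true;
    }
    return false;
}
```

With auto assign, hotplug and an occupied function 0, only the second variant
rejects the device.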

Fixes: 3f1e1478db2d ("enable multi-function hot-add")
Reported-by: Aswin Unnikrishnan 
Signed-off-by: Dongli Zhang 
Message-Id: <20240708041056.54504-1-dongli.zh...@oracle.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/pci/pci.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index d2caf3ee8b..87da35ca9b 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -1181,14 +1181,15 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev,
PCI_SLOT(devfn), PCI_FUNC(devfn), name,
bus->devices[devfn]->name, bus->devices[devfn]->qdev.id);
 return NULL;
-} /*
-   * Populating function 0 triggers a scan from the guest that
-   * exposes other non-zero functions. Hence we need to ensure that
-   * function 0 wasn't added yet.
-   */
-else if (dev->hotplugged &&
- !pci_is_vf(pci_dev) &&
- pci_get_function_0(pci_dev)) {
+}
+
+/*
+ * Populating function 0 triggers a scan from the guest that
+ * exposes other non-zero functions. Hence we need to ensure that
+ * function 0 wasn't added yet.
+ */
+if (dev->hotplugged && !pci_is_vf(pci_dev) &&
+pci_get_function_0(pci_dev)) {
 error_setg(errp, "PCI: slot %d function 0 already occupied by %s,"
" new func %s cannot be exposed to guest.",
PCI_SLOT(pci_get_function_0(pci_dev)->devfn),
-- 
MST




[PULL 08/18] intel_iommu: Fix invalidation descriptor type field

2024-09-11 Thread Michael S. Tsirkin
From: Zhenzhong Duan 

According to the spec, the invalidation descriptor type is 7 bits wide: the
concatenation of bits[11:9] and bits[3:0] of the invalidation descriptor.

Currently we only pick bits[3:0] as the invalidation type and treat
bits[11:9] as reserved zero. This is not a problem for now, as bits[11:9]
are zero for all current invalidation types. But it will break if a newer
type occupies bits[11:9].

Fix it by taking bits[11:9] into the type and making the reserved bits check
accurate.
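
As a stand-alone illustration of the extraction described above: shifting
right by 5 moves bits[11:9] down to bits[6:4], which are then combined with
bits[3:0]. The function name is hypothetical; the arithmetic mirrors the
macro in the diff below.

```c
#include <stdint.h>

/* Build the 7-bit type from bits[11:9] (high part) and bits[3:0] (low part). */
uint64_t inv_desc_type_sketch(uint64_t lo)
{
    return ((lo >> 5) & 0x70ULL) | (lo & 0xfULL);
}
```

For example, a low qword with bit 9 set and bits[3:0] = 0x1 yields type 0x11,
while bits[8:4] never contribute to the type.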

Suggested-by: Clément Mathieu--Drif
Signed-off-by: Zhenzhong Duan 
Reviewed-by: Yi Liu 
Reviewed-by: Clément Mathieu--Drif
Message-Id: <20240814071321.2621384-2-zhenzhong.d...@intel.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/i386/intel_iommu_internal.h | 11 ++-
 hw/i386/intel_iommu.c  |  2 +-
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 5f32c36943..13d5d129ae 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -356,7 +356,8 @@ union VTDInvDesc {
 typedef union VTDInvDesc VTDInvDesc;
 
 /* Masks for struct VTDInvDesc */
-#define VTD_INV_DESC_TYPE   0xf
+#define VTD_INV_DESC_TYPE(val)  ((((val) >> 5) & 0x70ULL) | \
+ ((val) & 0xfULL))
 #define VTD_INV_DESC_CC 0x1 /* Context-cache Invalidate Desc */
 #define VTD_INV_DESC_IOTLB  0x2
 #define VTD_INV_DESC_DEVICE 0x3
@@ -372,7 +373,7 @@ typedef union VTDInvDesc VTDInvDesc;
 #define VTD_INV_DESC_WAIT_IF(1ULL << 4)
 #define VTD_INV_DESC_WAIT_FN(1ULL << 6)
 #define VTD_INV_DESC_WAIT_DATA_SHIFT32
-#define VTD_INV_DESC_WAIT_RSVD_LO   0Xff80ULL
+#define VTD_INV_DESC_WAIT_RSVD_LO   0Xf180ULL
 #define VTD_INV_DESC_WAIT_RSVD_HI   3ULL
 
 /* Masks for Context-cache Invalidation Descriptor */
@@ -383,7 +384,7 @@ typedef union VTDInvDesc VTDInvDesc;
 #define VTD_INV_DESC_CC_DID(val)(((val) >> 16) & VTD_DOMAIN_ID_MASK)
 #define VTD_INV_DESC_CC_SID(val)(((val) >> 32) & 0xUL)
 #define VTD_INV_DESC_CC_FM(val) (((val) >> 48) & 3UL)
-#define VTD_INV_DESC_CC_RSVD0xfffcffc0ULL
+#define VTD_INV_DESC_CC_RSVD0xfffcf1c0ULL
 
 /* Masks for IOTLB Invalidate Descriptor */
 #define VTD_INV_DESC_IOTLB_G(3ULL << 4)
@@ -393,7 +394,7 @@ typedef union VTDInvDesc VTDInvDesc;
 #define VTD_INV_DESC_IOTLB_DID(val) (((val) >> 16) & VTD_DOMAIN_ID_MASK)
 #define VTD_INV_DESC_IOTLB_ADDR(val)((val) & ~0xfffULL)
 #define VTD_INV_DESC_IOTLB_AM(val)  ((val) & 0x3fULL)
-#define VTD_INV_DESC_IOTLB_RSVD_LO  0xff00ULL
+#define VTD_INV_DESC_IOTLB_RSVD_LO  0xf100ULL
 #define VTD_INV_DESC_IOTLB_RSVD_HI  0xf80ULL
 #define VTD_INV_DESC_IOTLB_PASID_PASID  (2ULL << 4)
 #define VTD_INV_DESC_IOTLB_PASID_PAGE   (3ULL << 4)
@@ -406,7 +407,7 @@ typedef union VTDInvDesc VTDInvDesc;
 #define VTD_INV_DESC_DEVICE_IOTLB_SIZE(val) ((val) & 0x1)
 #define VTD_INV_DESC_DEVICE_IOTLB_SID(val) (((val) >> 32) & 0xULL)
 #define VTD_INV_DESC_DEVICE_IOTLB_RSVD_HI 0xffeULL
-#define VTD_INV_DESC_DEVICE_IOTLB_RSVD_LO 0xffe0fff8
+#define VTD_INV_DESC_DEVICE_IOTLB_RSVD_LO 0xffe0f1f0
 
 /* Rsvd field masks for spte */
 #define VTD_SPTE_SNP 0x800ULL
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 16d2885fcc..68cb72a481 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2744,7 +2744,7 @@ static bool vtd_process_inv_desc(IntelIOMMUState *s)
 return false;
 }
 
-desc_type = inv_desc.lo & VTD_INV_DESC_TYPE;
+desc_type = VTD_INV_DESC_TYPE(inv_desc.lo);
 /* FIXME: should update at first or at last? */
 s->iq_last_desc_type = desc_type;
 
-- 
MST




[PULL 18/18] hw/acpi/ich9: Add periodic and swsmi timer

2024-09-11 Thread Michael S. Tsirkin
From: Dominic Prinz 

This patch implements the periodic and the swsmi ICH9 chipset timers. They are
especially useful when prototyping UEFI firmware (e.g. with EDK2's OVMF)
using QEMU.

For backwards compatibility, the compat properties "x-smi-swsmi-timer",
and "x-smi-periodic-timer" are introduced.

Additionally, writes to the SMI_STS register are enabled for the
corresponding two bits using a write mask to make future work easier.
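
The write-mask update used for SMI_EN and SMI_STS follows a common pattern:
clear the writable bits, then copy them from the written value. A minimal
sketch, with a hypothetical helper name:

```c
#include <stdint.h>

/* Only bits set in wmask are taken from val; all other bits keep reg's value. */
uint32_t masked_write_sketch(uint32_t reg, uint32_t val, uint32_t wmask)
{
    return (reg & ~wmask) | (val & wmask);
}
```

With a write mask of 0, as set in ich9_pm_init() in this patch, a write leaves
the register unchanged.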

Signed-off-by: Dominic Prinz 
Message-Id: 
<1d90ea69e01ab71a0f2ced116801dc78e04f4448.1725991505.git....@dprinz.de>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 include/hw/acpi/ich9.h|  6 +++
 include/hw/acpi/ich9_timer.h  | 23 +
 include/hw/southbridge/ich9.h |  4 ++
 hw/acpi/ich9.c| 23 +
 hw/acpi/ich9_timer.c  | 93 +++
 hw/i386/pc.c  |  5 +-
 hw/isa/lpc_ich9.c | 14 ++
 hw/acpi/meson.build   |  2 +-
 8 files changed, 168 insertions(+), 2 deletions(-)
 create mode 100644 include/hw/acpi/ich9_timer.h
 create mode 100644 hw/acpi/ich9_timer.c

diff --git a/include/hw/acpi/ich9.h b/include/hw/acpi/ich9.h
index 2faf7f0cae..245fe08dc2 100644
--- a/include/hw/acpi/ich9.h
+++ b/include/hw/acpi/ich9.h
@@ -46,6 +46,7 @@ typedef struct ICH9LPCPMRegs {
 uint32_t smi_en;
 uint32_t smi_en_wmask;
 uint32_t smi_sts;
+uint32_t smi_sts_wmask;
 
 qemu_irq irq;  /* SCI */
 
@@ -68,6 +69,11 @@ typedef struct ICH9LPCPMRegs {
 bool smm_compat;
 bool enable_tco;
 TCOIORegs tco_regs;
+
+bool swsmi_timer_enabled;
+bool periodic_timer_enabled;
+QEMUTimer *swsmi_timer;
+QEMUTimer *periodic_timer;
 } ICH9LPCPMRegs;
 
 #define ACPI_PM_PROP_TCO_ENABLED "enable_tco"
diff --git a/include/hw/acpi/ich9_timer.h b/include/hw/acpi/ich9_timer.h
new file mode 100644
index 00..5112df4385
--- /dev/null
+++ b/include/hw/acpi/ich9_timer.h
@@ -0,0 +1,23 @@
+/*
+ * QEMU ICH9 Timer emulation
+ *
+ * Copyright (c) 2024 Dominic Prinz 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef HW_ACPI_ICH9_TIMER_H
+#define HW_ACPI_ICH9_TIMER_H
+
+#include "hw/acpi/ich9.h"
+
+void ich9_pm_update_swsmi_timer(ICH9LPCPMRegs *pm, bool enable);
+
+void ich9_pm_swsmi_timer_init(ICH9LPCPMRegs *pm);
+
+void ich9_pm_update_periodic_timer(ICH9LPCPMRegs *pm, bool enable);
+
+void ich9_pm_periodic_timer_init(ICH9LPCPMRegs *pm);
+
+#endif
diff --git a/include/hw/southbridge/ich9.h b/include/hw/southbridge/ich9.h
index fd01649d04..6c60017024 100644
--- a/include/hw/southbridge/ich9.h
+++ b/include/hw/southbridge/ich9.h
@@ -196,8 +196,12 @@ struct ICH9LPCState {
 #define ICH9_PMIO_GPE0_LEN  16
 #define ICH9_PMIO_SMI_EN0x30
 #define ICH9_PMIO_SMI_EN_APMC_EN(1 << 5)
+#define ICH9_PMIO_SMI_EN_SWSMI_EN   (1 << 6)
 #define ICH9_PMIO_SMI_EN_TCO_EN (1 << 13)
+#define ICH9_PMIO_SMI_EN_PERIODIC_EN(1 << 14)
 #define ICH9_PMIO_SMI_STS   0x34
+#define ICH9_PMIO_SMI_STS_SWSMI_STS (1 << 6)
+#define ICH9_PMIO_SMI_STS_PERIODIC_STS  (1 << 14)
 #define ICH9_PMIO_TCO_RLD   0x60
 #define ICH9_PMIO_TCO_LEN   32
 
diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c
index 02d8546bd3..c15e5b8281 100644
--- a/hw/acpi/ich9.c
+++ b/hw/acpi/ich9.c
@@ -35,6 +35,7 @@
 #include "sysemu/runstate.h"
 #include "hw/acpi/acpi.h"
 #include "hw/acpi/ich9_tco.h"
+#include "hw/acpi/ich9_timer.h"
 
 #include "hw/southbridge/ich9.h"
 #include "hw/mem/pc-dimm.h"
@@ -108,6 +109,18 @@ static void ich9_smi_writel(void *opaque, hwaddr addr, uint64_t val,
 }
 pm->smi_en &= ~pm->smi_en_wmask;
 pm->smi_en |= (val & pm->smi_en_wmask);
+if (pm->swsmi_timer_enabled) {
+ich9_pm_update_swsmi_timer(pm, pm->smi_en &
+   ICH9_PMIO_SMI_EN_SWSMI_EN);
+}
+if (pm->periodic_timer_enabled) {
+ich9_pm_update_periodic_timer(pm, pm->smi_en &
+  ICH9_PMIO_SMI_EN_PERIODIC_EN);
+}
+break;
+case 4:
+pm->smi_sts &= ~pm->smi_sts_wmask;
+pm->smi_sts |= (val & pm->smi_sts_wmask);
 break;
 }
 }
@@ -286,6 +299,8 @@ static void pm_powerdown_req(Notifier *n, void *opaque)
 
 void ich9_pm_init(PCIDevice *lpc_pci, ICH9LPCPMRegs *pm, qemu_irq sci_irq)
 {
+pm->smi_sts_wmask = 0;
+
 memory_region_init(&pm->io, OBJECT(lpc_pci), "ich9-pm", ICH9_PMIO_SIZE);
 memory_region_set_enabled(&pm->io, false);
 memor

[PULL 17/18] virtio-mem: don't warn about THP sizes on a kernel without THP support

2024-09-11 Thread Michael S. Tsirkin
From: David Hildenbrand 

If the config directory in sysfs does not exist at all, we are dealing
with a system that does not support THPs. Simply use 1 MiB block size
then, instead of warning "Could not detect THP size, falling back to
..." and falling back to the default THP size.
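
A rough sketch of the decision, assuming POSIX access() in place of the
g_file_test() call the patch uses. The function name, the parameters and the
1 MiB constant name are hypothetical.

```c
#include <stdint.h>
#include <unistd.h>

/* 1 MiB, the block size named in the commit message. */
#define MIN_BLOCK_SIZE_SKETCH (1024u * 1024u)

/*
 * If the THP sysfs directory does not exist, the kernel has no THP support,
 * so the minimum block size is used. Otherwise the (already probed or
 * default) THP size is used.
 */
uint32_t thp_block_size_sketch(const char *thp_dir, uint32_t thp_size)
{
    if (access(thp_dir, F_OK) != 0) {
        return MIN_BLOCK_SIZE_SKETCH;
    }
    return thp_size;
}
```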

Cc: "Michael S. Tsirkin" 
Cc: Gavin Shan 
Cc: Juraj Marcin 
Signed-off-by: David Hildenbrand 
Message-Id: <20240910163433.2100295-1-da...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio/virtio-mem.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index ef64bf1b4a..4075f3d4ce 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -88,6 +88,7 @@ static uint32_t virtio_mem_default_thp_size(void)
 static uint32_t thp_size;
 
 #define HPAGE_PMD_SIZE_PATH "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size"
+#define HPAGE_PATH "/sys/kernel/mm/transparent_hugepage/"
 static uint32_t virtio_mem_thp_size(void)
 {
 gchar *content = NULL;
@@ -98,6 +99,12 @@ static uint32_t virtio_mem_thp_size(void)
 return thp_size;
 }
 
+/* No THP -> no restrictions. */
+if (!g_file_test(HPAGE_PATH, G_FILE_TEST_EXISTS)) {
+thp_size = VIRTIO_MEM_MIN_BLOCK_SIZE;
+return thp_size;
+}
+
 /*
  * Try to probe the actual THP size, fallback to (sane but eventually
  * incorrect) default sizes.
-- 
MST




[PULL 11/18] hw/i386/acpi-build: Return a pre-computed _PRT table

2024-09-11 Thread Michael S. Tsirkin
From: Ricardo Ribalda 

When QEMU runs without KVM acceleration, ACPI method execution takes a great
amount of time. If a method takes longer than the default timeout (30 sec),
the ACPI call fails and the system might not behave correctly.

Currently the _PRT table is computed on the fly. We can drastically reduce
the execution time of the _PRT method if we return a pre-computed table.

Without this patch:
[   51.343484] ACPI Error: Aborting method \_SB.PCI0._PRT due to previous error (AE_AML_LOOP_TIMEOUT) (20230628/psparse-529)
[   51.527032] ACPI Error: Method execution failed \_SB.PCI0._PRT due to previous error (AE_AML_LOOP_TIMEOUT) (20230628/uteval-68)
[   51.530049] virtio-pci :00:02.0: can't derive routing for PCI INT A
[   51.530797] virtio-pci :00:02.0: PCI INT A: no GSI
[   81.922901] ACPI Error: Aborting method \_SB.PCI0._PRT due to previous error (AE_AML_LOOP_TIMEOUT) (20230628/psparse-529)
[   82.103534] ACPI Error: Method execution failed \_SB.PCI0._PRT due to previous error (AE_AML_LOOP_TIMEOUT) (20230628/uteval-68)
[   82.106088] virtio-pci :00:04.0: can't derive routing for PCI INT A
[   82.106761] virtio-pci :00:04.0: PCI INT A: no GSI
[  112.192568] ACPI Error: Aborting method \_SB.PCI0._PRT due to previous error (AE_AML_LOOP_TIMEOUT) (20230628/psparse-529)
[  112.486687] ACPI Error: Method execution failed \_SB.PCI0._PRT due to previous error (AE_AML_LOOP_TIMEOUT) (20230628/uteval-68)
[  112.489554] virtio-pci :00:05.0: can't derive routing for PCI INT A
[  112.490027] virtio-pci :00:05.0: PCI INT A: no GSI
[  142.559448] ACPI Error: Aborting method \_SB.PCI0._PRT due to previous error (AE_AML_LOOP_TIMEOUT) (20230628/psparse-529)
[  142.718596] ACPI Error: Method execution failed \_SB.PCI0._PRT due to previous error (AE_AML_LOOP_TIMEOUT) (20230628/uteval-68)
[  142.722889] virtio-pci :00:06.0: can't derive routing for PCI INT A
[  142.724578] virtio-pci :00:06.0: PCI INT A: no GSI

With this patch:
[   22.938076] ACPI: \_SB_.LNKB: Enabled at IRQ 10
[   24.214002] ACPI: \_SB_.LNKD: Enabled at IRQ 11
[   25.465170] ACPI: \_SB_.LNKA: Enabled at IRQ 10
[   27.944920] ACPI: \_SB_.LNKC: Enabled at IRQ 11

ACPI disassembly:
Scope (PCI0)
{
Method (_PRT, 0, NotSerialized)  // _PRT: PCI Routing Table
{
Return (Package (0x80)
{
Package (0x04)
{
0x,
Zero,
LNKD,
Zero
},

Package (0x04)
{
0x,
One,
LNKA,
Zero
},

Package (0x04)
{
0x,
0x02,
LNKB,
Zero
},

Package (0x04)
{
0x,
0x03,
LNKC,
Zero
},

Package (0x04)
{
0x0001,
Zero,
LNKS,
Zero
},
Context: https://lore.kernel.org/virtualization/20240417145544.38d7b...@imammedo.users.ipa.redhat.com/T/#t

Signed-off-by: Ricardo Ribalda 
Reviewed-by: Igor Mammedov 
Reviewed-by: Richard Henderson 
Message-Id: <20240814115736.1580337-3-riba...@chromium.org>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/i386/acpi-build.c | 118 ---
 1 file changed, 21 insertions(+), 97 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 5d4bd2b710..4967aa7459 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -724,120 +724,44 @@ static Aml *aml_pci_pdsm(void)
 return method;
 }
 
-/**
- * build_prt_entry:
- * @link_name: link name for PCI route entry
- *
- * build AML package containing a PCI route entry for @link_name
- */
-static Aml *build_prt_entry(const char *link_name)
-{
-Aml *a_zero = aml_int(0);
-Aml *pkg = aml_package(4);
-aml_append(pkg, a_zero);
-aml_append(pkg, a_zero);
-aml_append(pkg, aml_name("%s", link_name));
-aml_append(pkg, a_zero);
-return pkg;
-}
-
 /*
- * initialize_route - Initialize the interrupt routing rule
- * through a specific LINK:
- *  if (lnk_idx == idx)
- *  route using link 'link_name'
- */
-static Aml *initialize_route(Aml *route, const char *link_name,
- Aml *lnk_idx, int idx)
-{
-Aml *if_ctx = aml_if(aml_equal(lnk_idx, aml_int(idx)));
-Aml *pkg = build_prt_entry(link_name);
-
-aml_append(if_ctx, aml_store(pkg, route));
-
-return if_ctx;
-}
-

[PULL 04/18] hw/isa/vt82c686.c: Embed i8259 irq in device state instead of allocating

2024-09-11 Thread Michael S. Tsirkin
From: BALATON Zoltan 

To avoid a warning about an unfreed qemu_irq, embed the i8259 irq in the
device state instead of allocating it.

Signed-off-by: BALATON Zoltan 
Message-Id: 

Signed-off-by: BALATON Zoltan 
---
 hw/isa/vt82c686.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/hw/isa/vt82c686.c b/hw/isa/vt82c686.c
index 505b44c4e6..82591e3e07 100644
--- a/hw/isa/vt82c686.c
+++ b/hw/isa/vt82c686.c
@@ -592,6 +592,8 @@ OBJECT_DECLARE_SIMPLE_TYPE(ViaISAState, VIA_ISA)
 
 struct ViaISAState {
 PCIDevice dev;
+
+IRQState i8259_irq;
 qemu_irq cpu_intr;
 qemu_irq *isa_irqs_in;
 uint16_t irq_state[ISA_NUM_IRQS];
@@ -715,13 +717,12 @@ static void via_isa_realize(PCIDevice *d, Error **errp)
 ViaISAState *s = VIA_ISA(d);
 DeviceState *dev = DEVICE(d);
 PCIBus *pci_bus = pci_get_bus(d);
-qemu_irq *isa_irq;
 ISABus *isa_bus;
 int i;
 
 qdev_init_gpio_out_named(dev, &s->cpu_intr, "intr", 1);
 qdev_init_gpio_in_named(dev, via_isa_pirq, "pirq", PCI_NUM_PINS);
-isa_irq = qemu_allocate_irqs(via_isa_request_i8259_irq, s, 1);
+qemu_init_irq(&s->i8259_irq, via_isa_request_i8259_irq, s, 0);
 isa_bus = isa_bus_new(dev, pci_address_space(d), pci_address_space_io(d),
   errp);
 
@@ -729,7 +730,7 @@ static void via_isa_realize(PCIDevice *d, Error **errp)
 return;
 }
 
-s->isa_irqs_in = i8259_init(isa_bus, *isa_irq);
+s->isa_irqs_in = i8259_init(isa_bus, &s->i8259_irq);
 isa_bus_register_input_irqs(isa_bus, s->isa_irqs_in);
 i8254_pit_init(isa_bus, 0x40, 0, NULL);
 i8257_dma_init(OBJECT(d), isa_bus, 0);
-- 
MST




[PULL 02/18] virtio: Always reset vhost devices

2024-09-11 Thread Michael S. Tsirkin
From: Hanna Czenczek 

Requiring `vhost_started` to be true for resetting vhost devices in
`virtio_reset()` seems like the wrong condition: Most importantly, the
preceding `virtio_set_status(vdev, 0)` call will (for vhost devices) end
up in `vhost_dev_stop()` (through vhost devices' `.set_status`
implementations), setting `vdev->vhost_started = false`.  Therefore, the
gated `vhost_reset_device()` call is unreachable.

`vhost_started` is not documented, so it is hard to say what exactly it
is supposed to mean, but judging from the fact that `vhost_dev_start()`
sets it and `vhost_dev_stop()` clears it, it seems like it indicates
whether there is a vhost back-end, and whether that back-end is
currently running and processing virtio requests.

Making a reset conditional on whether the vhost back-end is processing
virtio requests seems wrong; in fact, it is probably better to reset it
only when it is not currently processing requests, which is exactly the
current order of operations in `virtio_reset()`: First, the back-end is
stopped through `virtio_set_status(vdev, 0)`, then we want to send a
reset.

Therefore, we should drop the `vhost_started` condition, but in its
stead we then have to verify that we can indeed send a reset to this
vhost device, by not just checking `k->get_vhost != NULL` (introduced by
commit 95e1019a4a9), but also that the vhost back-end is connected
(`hdev = k->get_vhost(); hdev != NULL && hdev->vhost_ops != NULL`).
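
The resulting condition can be sketched with stand-in types. All *_sketch
names are hypothetical; the getter simply returns its argument so that the
sketch stays self-contained.

```c
#include <stdbool.h>
#include <stddef.h>

struct vhost_ops_sketch {
    int unused;
};

struct vhost_dev_sketch {
    const struct vhost_ops_sketch *vhost_ops;  /* NULL: back-end not connected */
};

typedef struct vhost_dev_sketch *(*get_vhost_fn_sketch)(void *vdev);

/* Trivial getter for illustration: treats vdev itself as the vhost device. */
struct vhost_dev_sketch *get_vhost_identity_sketch(void *vdev)
{
    return vdev;
}

/*
 * A reset is sent only if the device class has a get_vhost callback, the
 * callback returns a device, and that device's back-end is connected.
 * Note that vhost_started is deliberately not part of the condition.
 */
bool should_reset_vhost_sketch(get_vhost_fn_sketch get_vhost, void *vdev)
{
    struct vhost_dev_sketch *hdev;

    if (!get_vhost) {
        return false;
    }
    hdev = get_vhost(vdev);
    return hdev != NULL && hdev->vhost_ops != NULL;
}
```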

Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Hanna Czenczek 
Message-Id: <20240723163941.48775-3-hre...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio/virtio.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 9e10cbc058..42589adf2c 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2331,8 +2331,12 @@ void virtio_reset(void *opaque)
 vdev->device_endian = virtio_default_endian();
 }
 
-if (vdev->vhost_started && k->get_vhost) {
-vhost_reset_device(k->get_vhost(vdev));
+if (k->get_vhost) {
+struct vhost_dev *hdev = k->get_vhost(vdev);
+/* Only reset when vhost back-end is connected */
+if (hdev && hdev->vhost_ops) {
+vhost_reset_device(hdev);
+}
 }
 
 if (k->reset) {
-- 
MST




Re: [PATCH v1 00/14] s390x: virtio-mem support

2024-09-11 Thread Michael S. Tsirkin
> > 
> > I'd rather have it in a shared and bigger repo than in your personal
> > gitlab one. Maybe there's a space somewhere in QEMU or the Virtio team's
> > repos that would be a good fit if the kernel's docu isn't the right place?
> 
> At this point, outside of kernel/QEMU feels like the right thing to do.
> Conny is already a co-maintainer of my "personal" (;)) gitlab.
> 
> 
> And now I realize that I CCed Heiko on the Linux series but not the QEMU
> series. My bad.
> 
> [1] https://lore.kernel.org/all/20200727114819.3f816010.coh...@redhat.com/


No prob. Or if you want it in virtio spec, that's also fine.




[PULL 14/18] virtio-pci: Add lookup subregion of VirtIOPCIRegion MR

2024-09-11 Thread Michael S. Tsirkin
From: Gao Shiyuan 

Currently virtio_address_space_lookup only looks up the
common/isr/device/notify MRs and excludes their subregions.

When VHOST_USER_PROTOCOL_F_HOST_NOTIFIER is enabled, the notify MR has
host-notifier subregions, and we need to use the host-notifier MR to
notify the hardware accelerator directly instead of notifying via eventfd.

Furthermore, the common/isr/device MRs may also gain subregions in
the future, so use memory_region_find for each MR, including
its subregions.

Look up subregions of the VirtIOPCIRegion MR instead of only the container MR.

Fixes: a93c8d8 ("virtio-pci: Replace modern_as with direct access to modern_bar")
Co-developed-by: Zuo Boqun 
Signed-off-by: Gao Shiyuan 
Signed-off-by: Zuo Boqun 
Message-Id: <20240903120304.97833-1-gaoshiy...@baidu.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio/virtio-pci.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 524b63e5c7..4d832fe845 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -615,8 +615,12 @@ static MemoryRegion 
*virtio_address_space_lookup(VirtIOPCIProxy *proxy,
 reg = &proxy->regs[i];
 if (*off >= reg->offset &&
 *off + len <= reg->offset + reg->size) {
-*off -= reg->offset;
-return &reg->mr;
+MemoryRegionSection mrs = memory_region_find(&reg->mr,
+*off - reg->offset, len);
+assert(mrs.mr);
+*off = mrs.offset_within_region;
+memory_region_unref(mrs.mr);
+return mrs.mr;
 }
 }
 
-- 
MST
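
Editorial sketch (not part of the patch): a much simplified model of the
lookup change, assuming a container region with non-overlapping subregions.
struct region and lookup_region() are hypothetical names; QEMU's
memory_region_find() additionally handles priorities, aliases and reference
counting, none of which is modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of a region that may contain subregions. */
struct region {
    const char *name;
    uint64_t offset;                 /* offset within the parent region */
    uint64_t size;
    const struct region *subregions; /* may be NULL when nr_subregions == 0 */
    size_t nr_subregions;
};

/*
 * Find the innermost region that fully covers [*off, *off + len) inside
 * 'container' and rewrite *off so that it is relative to that region.
 * Falls back to the container itself when no subregion matches.
 */
static const struct region *lookup_region(const struct region *container,
                                          uint64_t *off, uint64_t len)
{
    size_t i;

    assert(container != NULL && off != NULL);
    for (i = 0; i < container->nr_subregions; i++) {
        const struct region *sub = &container->subregions[i];

        if (*off >= sub->offset && *off + len <= sub->offset + sub->size) {
            *off -= sub->offset;
            return lookup_region(sub, off, len);
        }
    }
    return container;
}
```

With a host-notifier subregion placed inside the notify region, an access that
falls into the subregion is dispatched to it with an offset relative to the
subregion, while all other accesses still land on the container.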




[PULL 12/18] tests/acpi: pc: update golden masters for DSDT

2024-09-11 Thread Michael S. Tsirkin
From: Ricardo Ribalda 

Signed-off-by: Ricardo Ribalda 
Message-Id: <20240814115736.1580337-4-riba...@chromium.org>
Acked-by: Igor Mammedov 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 tests/qtest/bios-tables-test-allowed-diff.h |  15 ---
 tests/data/acpi/x86/pc/DSDT | Bin 6830 -> 8527 bytes
 tests/data/acpi/x86/pc/DSDT.acpierst| Bin 6741 -> 8438 bytes
 tests/data/acpi/x86/pc/DSDT.acpihmat| Bin 8155 -> 9852 bytes
 tests/data/acpi/x86/pc/DSDT.bridge  | Bin 13701 -> 15398 bytes
 tests/data/acpi/x86/pc/DSDT.cphp| Bin 7294 -> 8991 bytes
 tests/data/acpi/x86/pc/DSDT.dimmpxm | Bin 8484 -> 10181 bytes
 tests/data/acpi/x86/pc/DSDT.hpbridge| Bin 6781 -> 8478 bytes
 tests/data/acpi/x86/pc/DSDT.hpbrroot| Bin 3337 -> 5034 bytes
 tests/data/acpi/x86/pc/DSDT.ipmikcs | Bin 6902 -> 8599 bytes
 tests/data/acpi/x86/pc/DSDT.memhp   | Bin 8189 -> 9886 bytes
 tests/data/acpi/x86/pc/DSDT.nohpet  | Bin 6688 -> 8385 bytes
 tests/data/acpi/x86/pc/DSDT.numamem | Bin 6836 -> 8533 bytes
 tests/data/acpi/x86/pc/DSDT.roothp  | Bin 10623 -> 12320 bytes
 tests/data/acpi/x86/q35/DSDT.cxl| Bin 9714 -> 13148 bytes
 tests/data/acpi/x86/q35/DSDT.viot   | Bin 9464 -> 14615 bytes
 16 files changed, 15 deletions(-)

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h 
b/tests/qtest/bios-tables-test-allowed-diff.h
index f81f4e2469..dfb8523c8b 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1,16 +1 @@
 /* List of comma-separated changed AML files to ignore */
-"tests/data/acpi/x86/pc/DSDT",
-"tests/data/acpi/x86/pc/DSDT.acpierst",
-"tests/data/acpi/x86/pc/DSDT.acpihmat",
-"tests/data/acpi/x86/pc/DSDT.bridge",
-"tests/data/acpi/x86/pc/DSDT.cphp",
-"tests/data/acpi/x86/pc/DSDT.dimmpxm",
-"tests/data/acpi/x86/pc/DSDT.hpbridge",
-"tests/data/acpi/x86/pc/DSDT.hpbrroot",
-"tests/data/acpi/x86/pc/DSDT.ipmikcs",
-"tests/data/acpi/x86/pc/DSDT.memhp",
-"tests/data/acpi/x86/pc/DSDT.nohpet",
-"tests/data/acpi/x86/pc/DSDT.numamem",
-"tests/data/acpi/x86/pc/DSDT.roothp",
-"tests/data/acpi/x86/q35/DSDT.cxl",
-"tests/data/acpi/x86/q35/DSDT.viot",
diff --git a/tests/data/acpi/x86/pc/DSDT b/tests/data/acpi/x86/pc/DSDT
index 
c93ad6b7f83a168a1833d7dba1112dd2ab8a431f..92225236e717b2e522a2ee00492fb0ded418dc7b
 100644
GIT binary patch
delta 1914
zcmY+_OODep0DxgPNnd%jP12_CiUT0A;Q&p;V$`UN=xkV3oPpV!ldxiqgg66h&cP{+
zY{yalUETPX{l!W9^Znl{tnc;9$2UR@KK}~edh+S}<6b@H&Fk~!>0R;R_3`}aMf}K{
zui>{W3L>RMbGz2UejYTp>se&6{yy}Y+qH57zqws6B1?i;se#bVy&LJO5?N}mneO6-
zQkb-ECwnHXx{*%ZT+gO;n|ii9w{v|{x-IyobW?p>p4-y5rCCMBN(Cl$TL?_*<_0!(
z+YfB&mKmDTZD9!MkkTQcqZTUdEgL~PqI5**7}7DMV@k(8X{^))(1|W(C!J6_p>zuA
z6w)cBQ%b{DYM~*Zp^)Zra0eAq8bKOC8c`Zi8bcbJ(j95shf1Z#{iip9G=VgsG@&$w
zG=(&^rGusp+MfRm(ix;PN@tYLA)P}y?@N^)_Mr=1n)5er?E=yTr3*?kNHa(?N;4<5
ztK@*@Lh3i%-IbiuoYDf)0@8xgg3_fa-D|q+L#0|$x}XXxTxf
z`Bf|_T|>HtbWQ1+(ha1WzEo`p-I!A9J(=_0Lb`=?OX-%<9i%%*cb(KgcZBYRZ2w-&
v7WR&3@8YKQCul;%81oaj=^7L(1Z93hC)pF2U};b!Ak^BO@J!EO)E`
z8Q_hhIHYHpz=)r`$a7P
VSu!$E3k_1#Im4bk^4G2;e?PvR!t!3eyuDMZ_w#r7-qTM%9{2NOR=qx7o<8KCULVh&U(~Ox
z`WAjqqaZRyRkv%Y?B`x}yPi}U*WZU;b-SM3z^`uCi%KU!Y)ntp&AlDf)mf#9y(Zno
z4YiWAY$kh>p4(BKy1AZB%O>?~d1~hRQo1epQo5ibJ}V2`pD}1ksKa*IGw0GP_P8Zv9R^MC&L{dN;v;W{Q*1RR|$ZhmvgczF#x~
WkQE~XjnE)PQ!wn=qj+7c?eGW5el%79

diff --git a/tests/data/acpi/x86/pc/DSDT.acpihmat 
b/tests/data/acpi/x86/pc/DSDT.acpihmat
index 
9d3695ff289036856886a093733926667a32a058..73a9ce59e9426b180fea0ec5820c4841ebdb6700
 100644
GIT binary patch
delta 1914
zcmY+_OODzw0DxhWgjZfb62dFMuDeRTK?S6X5fK`d*`uDJv#jKvq^n+_>T!C4Zo3}I
zmF+mnzr)18>@QB>&&RhWtnc;9w|7De%3tAoPd@#4+|Q4B^ZI;w`cQm&eLR1D5kK?h
zYxq5jf=DUR+^)5-p9jtDdJs17FlYqneO6-
zQkb-ECwnG6cO#v;xt>kyHuY?IYUlc8AR&Jhi27OS6iMl?qJiwh)-q%?)ho
zwjbEkEi*Kw+rkjiA*Dk?M=ey^TQ-7pMCpjqF{EQi$CQqH(paeppc7rnPCB7-3G=?-br90BN50y%f`%iBIX##0NX+miV
zX$om-O9xFIv_1bBq%%lol+GxfLpq0a-j^yp>_Zp2H0N*L+6ANwN*9!7kYM4bk#wn
z`Bhv|x`uQO>6+3tr5i{$eW}_Ix-q5Hdot(0g>(z)meMVyJ4kns?mDS~?g*`fZ2w-&
u7Ai<9N-IkDknSPfQ@XdMgC6?OFX{m4fzkt|M@Wy59w|LK>F>YZzv@4Lfz1m5

delta 204
zcmX}gy$XU*7zW_ipT#c%mByf@H3({t)S*x!(p`_|T)-h{D2W`U_0Vq74MaE5J#-h%
zfzjglJm2U&qaY5F_W;md4;tyG#Wan*)DCIwFUP%r()A{RboDZmU^maYFe>GNk&avz
zTb6?i@Y+@!(zQ$=T7>~bJycw)9mU9OJBGM+Tg@SAM{&~I@%}SWoQ$@72zlJ+WTVqG
Ys01JjMg}UOL5eD4*s*)|JX@ds52i3S0{{R3

diff --git a/tests/data/acpi/x86/pc/DSDT.bridge 
b/tests/data/acpi/x86/pc/DSDT.bridge
index 
840b45f354ac14c858d0af8fbd31e97949a65d4b..4cef454e379e1009141694e0f4036a2a701c80d7
 100644
GIT binary patch
delta 1914
zcmY+_xsKC79DwnO?OZ-`*v@@*bStf>X_A

[PULL 15/18] hw/cxl: fix physical address field in get scan media results output

2024-09-11 Thread Michael S. Tsirkin
From: peng guo 

When using the mailbox command get scan media results, the scan media
restart physical address field in the output payload is not 64-byte
aligned.

This patch removes the source of the error in the restart physical address.

The Scan Media Restart Physical Address is the location from which the
host should restart the Scan Media operation. [5:0] bits are reserved.
Refer to CXL spec r3.1 Table 8-146

Fixes: 89b5cfcc31e6 ("hw/cxl: Add get scan media results cmd support")
Reviewed-by: Jonathan Cameron 
Link: 
https://lore.kernel.org/linux-cxl/20240819154206.16456-1-engguop...@buaa.edu.cn/
Signed-off-by: peng guo 
Message-Id: <20240825102212.3871-1-engguop...@buaa.edu.cn>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/cxl/cxl-mailbox-utils.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 3ebbd32e10..9258e48f95 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -2076,7 +2076,7 @@ static CXLRetCode cmd_media_get_scan_media_results(const 
struct cxl_cmd *cmd,
 
 start = ROUND_DOWN(ent->start, 64ull);
 stop = ROUND_DOWN(ent->start, 64ull) + ent->length;
-stq_le_p(&out->records[i].addr, start | (ent->type & 0x7));
+stq_le_p(&out->records[i].addr, start);
 stl_le_p(&out->records[i].length, (stop - start) / 
CXL_CACHE_LINE_SIZE);
 i++;
 
-- 
MST
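
Editorial sketch (not part of the patch): the arithmetic after the fix,
reduced to plain C. CACHE_LINE_SIZE is assumed to equal QEMU's
CXL_CACHE_LINE_SIZE (64), and struct scan_record and make_record() are
hypothetical names. Little-endian stores (stq_le_p/stl_le_p) are left out.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64ull /* assumed to match CXL_CACHE_LINE_SIZE */

/* Hypothetical, host-endian view of one output record. */
struct scan_record {
    uint64_t addr;   /* 64-byte aligned, bits [5:0] reserved (zero) */
    uint32_t length; /* in units of cache lines */
};

/* Round x down to the previous multiple of the cache line size. */
static uint64_t round_down_64(uint64_t x)
{
    return x & ~(CACHE_LINE_SIZE - 1);
}

/* Build a record the way the patched code does, without ORing in type bits. */
static struct scan_record make_record(uint64_t ent_start, uint64_t ent_length)
{
    struct scan_record rec;
    uint64_t start = round_down_64(ent_start);
    uint64_t stop = start + ent_length;

    rec.addr = start;
    rec.length = (uint32_t)((stop - start) / CACHE_LINE_SIZE);
    assert((rec.addr & 0x3f) == 0); /* reserved bits stay clear */
    return rec;
}
```

Before the fix, ORing (ent->type & 0x7) into the address could set bits
[2:0], which lie inside the reserved [5:0] range.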




[PULL 07/18] virtio: rename virtio_split_packed_update_used_idx

2024-09-11 Thread Michael S. Tsirkin
From: Wenyu Huang 

virtio_split_packed_update_used_idx should be named
virtio_queue_split_update_used_idx, consistent with
virtio_queue_packed_update_used_idx.

Signed-off-by: Wenyu Huang 
Message-Id: 

Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio/virtio.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 42589adf2c..a26f18908e 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -3669,7 +3669,7 @@ static void 
virtio_queue_packed_update_used_idx(VirtIODevice *vdev, int n)
 return;
 }
 
-static void virtio_split_packed_update_used_idx(VirtIODevice *vdev, int n)
+static void virtio_queue_split_update_used_idx(VirtIODevice *vdev, int n)
 {
 RCU_READ_LOCK_GUARD();
 if (vdev->vq[n].vring.desc) {
@@ -3682,7 +3682,7 @@ void virtio_queue_update_used_idx(VirtIODevice *vdev, int 
n)
 if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
 return virtio_queue_packed_update_used_idx(vdev, n);
 } else {
-return virtio_split_packed_update_used_idx(vdev, n);
+return virtio_queue_split_update_used_idx(vdev, n);
 }
 }
 
-- 
MST




[PULL 16/18] hw/audio/virtio-sound: fix heap buffer overflow

2024-09-11 Thread Michael S. Tsirkin
From: Volker Rümelin 

Currently, the guest may write to the device configuration space,
whereas the virtio sound device specification in chapter 5.14.4
clearly states that the fields in the device configuration space
are driver-read-only.

Remove the set_config function from the virtio_snd class.

This also prevents a heap buffer overflow. See QEMU issue #2296.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2296
Signed-off-by: Volker Rümelin 
Message-Id: <20240901130112.8242-1-vr_q...@t-online.de>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/audio/virtio-snd.c | 24 
 hw/audio/trace-events |  1 -
 2 files changed, 25 deletions(-)

diff --git a/hw/audio/virtio-snd.c b/hw/audio/virtio-snd.c
index d1cf5eb445..69838181dd 100644
--- a/hw/audio/virtio-snd.c
+++ b/hw/audio/virtio-snd.c
@@ -107,29 +107,6 @@ virtio_snd_get_config(VirtIODevice *vdev, uint8_t *config)
 
 }
 
-static void
-virtio_snd_set_config(VirtIODevice *vdev, const uint8_t *config)
-{
-VirtIOSound *s = VIRTIO_SND(vdev);
-const virtio_snd_config *sndconfig =
-(const virtio_snd_config *)config;
-
-
-   trace_virtio_snd_set_config(vdev,
-   s->snd_conf.jacks,
-   sndconfig->jacks,
-   s->snd_conf.streams,
-   sndconfig->streams,
-   s->snd_conf.chmaps,
-   sndconfig->chmaps);
-
-memcpy(&s->snd_conf, sndconfig, sizeof(virtio_snd_config));
-le32_to_cpus(&s->snd_conf.jacks);
-le32_to_cpus(&s->snd_conf.streams);
-le32_to_cpus(&s->snd_conf.chmaps);
-
-}
-
 static void
 virtio_snd_pcm_buffer_free(VirtIOSoundPCMBuffer *buffer)
 {
@@ -1400,7 +1377,6 @@ static void virtio_snd_class_init(ObjectClass *klass, 
void *data)
 vdc->realize = virtio_snd_realize;
 vdc->unrealize = virtio_snd_unrealize;
 vdc->get_config = virtio_snd_get_config;
-vdc->set_config = virtio_snd_set_config;
 vdc->get_features = get_features;
 vdc->reset = virtio_snd_reset;
 vdc->legacy_features = 0;
diff --git a/hw/audio/trace-events b/hw/audio/trace-events
index b1870ff224..b8ef572767 100644
--- a/hw/audio/trace-events
+++ b/hw/audio/trace-events
@@ -41,7 +41,6 @@ asc_update_irq(int irq, int a, int b) "set IRQ to %d (A: 0x%x 
B: 0x%x)"
 
 #virtio-snd.c
 virtio_snd_get_config(void *vdev, uint32_t jacks, uint32_t streams, uint32_t 
chmaps) "snd %p: get_config jacks=%"PRIu32" streams=%"PRIu32" chmaps=%"PRIu32""
-virtio_snd_set_config(void *vdev, uint32_t jacks, uint32_t new_jacks, uint32_t 
streams, uint32_t new_streams, uint32_t chmaps, uint32_t new_chmaps) "snd %p: 
set_config jacks from %"PRIu32"->%"PRIu32", streams from %"PRIu32"->%"PRIu32", 
chmaps from %"PRIu32"->%"PRIu32
 virtio_snd_get_features(void *vdev, uint64_t features) "snd %p: get_features 
0x%"PRIx64
 virtio_snd_vm_state_running(void) "vm state running"
 virtio_snd_vm_state_stopped(void) "vm state stopped"
-- 
MST




[PULL 06/18] hw/pci/pci-hmp-cmds: Avoid displaying bogus size in 'info pci'

2024-09-11 Thread Michael S. Tsirkin
From: Philippe Mathieu-Daudé 

When BARs aren't mapped, we get:

  (qemu) info pci
Bus  0, device   0, function 0:
  Host bridge: PCI device dead:beef
...
BAR4: 32 bit memory at 0x [0x0ffe].
BAR5: I/O at 0x [0x0ffe].

Check that the BAR is mapped by comparing its address to PCI_BAR_UNMAPPED,
which is what the PCI layer uses for unmapped BARs.
See the pci_bar_address and pci_update_mappings implementations and
in "hw/pci/pci.h":

  typedef struct PCIIORegion {
  pcibus_t addr; /* current PCI mapping address. -1 means not mapped */
  #define PCI_BAR_UNMAPPED (~(pcibus_t)0)
  ...

This improves the logging, not displaying bogus sizes:

  (qemu) info pci
Bus  0, device   0, function 0:
  Host bridge: PCI device dead:beef
...
  BAR4: 32 bit memory (not mapped)
  BAR5: I/O (not mapped)

Remove the trailing dot, which is not used in the output format of other commands.

Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <20240801131449.51328-1-phi...@linaro.org>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/pci/pci-hmp-cmds.c | 26 ++
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/hw/pci/pci-hmp-cmds.c b/hw/pci/pci-hmp-cmds.c
index b09fce9377..fdfe44435c 100644
--- a/hw/pci/pci-hmp-cmds.c
+++ b/hw/pci/pci-hmp-cmds.c
@@ -83,15 +83,25 @@ static void hmp_info_pci_device(Monitor *mon, const 
PciDeviceInfo *dev)
 monitor_printf(mon, "  BAR%" PRId64 ": ", region->value->bar);
 
 if (!strcmp(region->value->type, "io")) {
-monitor_printf(mon, "I/O at 0x%04" PRIx64
-" [0x%04" PRIx64 "].\n",
-   addr, addr + size - 1);
+if (addr != PCI_BAR_UNMAPPED) {
+monitor_printf(mon, "I/O at 0x%04" PRIx64
+" [0x%04" PRIx64 "]\n",
+   addr, addr + size - 1);
+} else {
+monitor_printf(mon, "I/O (not mapped)\n");
+}
 } else {
-monitor_printf(mon, "%d bit%s memory at 0x%08" PRIx64
-   " [0x%08" PRIx64 "].\n",
-   region->value->mem_type_64 ? 64 : 32,
-   region->value->prefetch ? " prefetchable" : "",
-   addr, addr + size - 1);
+if (addr != PCI_BAR_UNMAPPED) {
+monitor_printf(mon, "%d bit%s memory at 0x%08" PRIx64
+   " [0x%08" PRIx64 "]\n",
+   region->value->mem_type_64 ? 64 : 32,
+   region->value->prefetch ? " prefetchable" : "",
+   addr, addr + size - 1);
+} else {
+monitor_printf(mon, "%d bit%s memory (not mapped)\n",
+   region->value->mem_type_64 ? 64 : 32,
+   region->value->prefetch ? " prefetchable" : "");
+}
 }
 }
 
-- 
MST
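
Editorial sketch (not part of the patch): why the old output showed a bogus
range, and how the new check avoids it. PCI_BAR_UNMAPPED is defined as in the
commit message; pcibus_t is assumed to be a 64-bit unsigned type here, and
format_io_bar() is a hypothetical helper that writes to a buffer instead of
the monitor.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

typedef uint64_t pcibus_t; /* assumption: 64-bit PCI bus addresses */
#define PCI_BAR_UNMAPPED (~(pcibus_t)0)

/* Format an I/O BAR line; returns what snprintf() returns. */
static int format_io_bar(char *buf, size_t len, pcibus_t addr, pcibus_t size)
{
    assert(buf != NULL);
    if (addr != PCI_BAR_UNMAPPED) {
        return snprintf(buf, len, "I/O at 0x%04" PRIx64 " [0x%04" PRIx64 "]",
                        addr, addr + size - 1);
    }
    /* Unmapped: addr + size - 1 would wrap around and print a bogus end. */
    return snprintf(buf, len, "I/O (not mapped)");
}
```

For an unmapped BAR of size 0x1000, addr + size - 1 wraps to 0xffe, which
matches the "[0x0ffe]" shown in the old output above.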




[PULL 01/18] virtio: Allow .get_vhost() without vhost_started

2024-09-11 Thread Michael S. Tsirkin
From: Hanna Czenczek 

Historically, .get_vhost() was probably only called when
vdev->vhost_started is true.  However, we now decidedly want to call it
also when vhost_started is false, specifically so we can issue a reset
to the vhost back-end while device operation is stopped.

Some .get_vhost() implementations dereference some pointers (or return
offsets from them) that are probably guaranteed to be non-NULL when
vhost_started is true, but not necessarily otherwise.  This patch makes
all such implementations check all such pointers, returning NULL if any
is NULL.

Signed-off-by: Hanna Czenczek 
Message-Id: <20240723163941.48775-2-hre...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Stefan Hajnoczi 
---
 include/hw/virtio/virtio.h  |  1 +
 hw/display/vhost-user-gpu.c |  2 +-
 hw/net/virtio-net.c | 19 +--
 hw/virtio/virtio-crypto.c   | 18 +++---
 4 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 0fcbc5c0c6..f526ecc8fc 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -223,6 +223,7 @@ struct VirtioDeviceClass {
 int (*post_load)(VirtIODevice *vdev);
 const VMStateDescription *vmsd;
 bool (*primary_unplug_pending)(void *opaque);
+/* May be called even when vdev->vhost_started is false */
 struct vhost_dev *(*get_vhost)(VirtIODevice *vdev);
 void (*toggle_device_iotlb)(VirtIODevice *vdev);
 };
diff --git a/hw/display/vhost-user-gpu.c b/hw/display/vhost-user-gpu.c
index c0c66910f1..14548f1a57 100644
--- a/hw/display/vhost-user-gpu.c
+++ b/hw/display/vhost-user-gpu.c
@@ -642,7 +642,7 @@ vhost_user_gpu_device_realize(DeviceState *qdev, Error 
**errp)
 static struct vhost_dev *vhost_user_gpu_get_vhost(VirtIODevice *vdev)
 {
 VhostUserGPU *g = VHOST_USER_GPU(vdev);
-return &g->vhost->dev;
+return g->vhost ? &g->vhost->dev : NULL;
 }
 
 static Property vhost_user_gpu_properties[] = {
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index ed33a32877..fb84d142ee 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -3896,8 +3896,23 @@ static bool dev_unplug_pending(void *opaque)
 static struct vhost_dev *virtio_net_get_vhost(VirtIODevice *vdev)
 {
 VirtIONet *n = VIRTIO_NET(vdev);
-NetClientState *nc = qemu_get_queue(n->nic);
-struct vhost_net *net = get_vhost_net(nc->peer);
+NetClientState *nc;
+struct vhost_net *net;
+
+if (!n->nic) {
+return NULL;
+}
+
+nc = qemu_get_queue(n->nic);
+if (!nc) {
+return NULL;
+}
+
+net = get_vhost_net(nc->peer);
+if (!net) {
+return NULL;
+}
+
 return &net->dev;
 }
 
diff --git a/hw/virtio/virtio-crypto.c b/hw/virtio/virtio-crypto.c
index 5034768bff..0793f56965 100644
--- a/hw/virtio/virtio-crypto.c
+++ b/hw/virtio/virtio-crypto.c
@@ -1247,9 +1247,21 @@ static bool 
virtio_crypto_guest_notifier_pending(VirtIODevice *vdev, int idx)
 static struct vhost_dev *virtio_crypto_get_vhost(VirtIODevice *vdev)
 {
 VirtIOCrypto *vcrypto = VIRTIO_CRYPTO(vdev);
-CryptoDevBackend *b = vcrypto->cryptodev;
-CryptoDevBackendClient *cc = b->conf.peers.ccs[0];
-CryptoDevBackendVhost *vhost_crypto = cryptodev_get_vhost(cc, b, 0);
+CryptoDevBackend *b;
+CryptoDevBackendClient *cc;
+CryptoDevBackendVhost *vhost_crypto;
+
+b = vcrypto->cryptodev;
+if (!b) {
+return NULL;
+}
+
+cc = b->conf.peers.ccs[0];
+vhost_crypto = cryptodev_get_vhost(cc, b, 0);
+if (!vhost_crypto) {
+return NULL;
+}
+
 return &vhost_crypto->dev;
 }
 
-- 
MST




[PULL 13/18] vhost_net: configure all host notifiers in a single MR transaction

2024-09-11 Thread Michael S. Tsirkin
From: zuoboqun 

This allows a vhost_net device which has multiple virtqueues to batch
the setup of all its host notifiers. This significantly reduces the
vhost_net device start and stop time, e.g. the time spent
on enabling notifiers drops from 630ms to 75ms and the time spent on
disabling notifiers drops from 441ms to 45ms for a VM with 192 vCPUs
and 15 vhost-user-net devices (64vq per device) in our case.

Signed-off-by: zuoboqun 
Message-Id: <20240816070835.8309-1-zuobo...@baidu.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 include/hw/virtio/vhost.h |   4 +
 hw/net/vhost_net.c| 155 +++---
 hw/virtio/vhost.c |   6 +-
 3 files changed, 150 insertions(+), 15 deletions(-)

diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index d75faf46e9..c75be46c06 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -171,6 +171,10 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
  */
 void vhost_dev_cleanup(struct vhost_dev *hdev);
 
+void vhost_dev_disable_notifiers_nvqs(struct vhost_dev *hdev,
+  VirtIODevice *vdev,
+  unsigned int nvqs);
+
 /**
  * vhost_dev_enable_notifiers() - enable event notifiers
  * @hdev: common vhost_dev structure
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index dedf9ad7c2..997aab0557 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -162,6 +162,135 @@ void vhost_net_save_acked_features(NetClientState *nc)
 #endif
 }
 
+static void vhost_net_disable_notifiers_nvhosts(VirtIODevice *dev,
+NetClientState *ncs, int data_queue_pairs, int nvhosts)
+{
+VirtIONet *n = VIRTIO_NET(dev);
+BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(dev)));
+struct vhost_net *net;
+struct vhost_dev *hdev;
+int r, i, j;
+NetClientState *peer;
+
+/*
+ * Batch all the host notifiers in a single transaction to avoid
+ * quadratic time complexity in address_space_update_ioeventfds().
+ */
+memory_region_transaction_begin();
+
+for (i = 0; i < nvhosts; i++) {
+if (i < data_queue_pairs) {
+peer = qemu_get_peer(ncs, i);
+} else {
+peer = qemu_get_peer(ncs, n->max_queue_pairs);
+}
+
+net = get_vhost_net(peer);
+hdev = &net->dev;
+for (j = 0; j < hdev->nvqs; j++) {
+r = virtio_bus_set_host_notifier(VIRTIO_BUS(qbus),
+ hdev->vq_index + j,
+ false);
+if (r < 0) {
+error_report("vhost %d VQ %d notifier cleanup failed: %d",
+  i, j, -r);
+}
+assert(r >= 0);
+}
+}
+/*
+ * The transaction expects the ioeventfds to be open when it
+ * commits. Do it now, before the cleanup loop.
+ */
+memory_region_transaction_commit();
+
+for (i = 0; i < nvhosts; i++) {
+if (i < data_queue_pairs) {
+peer = qemu_get_peer(ncs, i);
+} else {
+peer = qemu_get_peer(ncs, n->max_queue_pairs);
+}
+
+net = get_vhost_net(peer);
+hdev = &net->dev;
+for (j = 0; j < hdev->nvqs; j++) {
+virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus),
+ hdev->vq_index + j);
+}
+virtio_device_release_ioeventfd(dev);
+}
+}
+
+static int vhost_net_enable_notifiers(VirtIODevice *dev,
+NetClientState *ncs, int data_queue_pairs, int cvq)
+{
+VirtIONet *n = VIRTIO_NET(dev);
+BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(dev)));
+int nvhosts = data_queue_pairs + cvq;
+struct vhost_net *net;
+struct vhost_dev *hdev;
+int r, i, j;
+NetClientState *peer;
+
+/*
+ * Batch all the host notifiers in a single transaction to avoid
+ * quadratic time complexity in address_space_update_ioeventfds().
+ */
+memory_region_transaction_begin();
+
+for (i = 0; i < nvhosts; i++) {
+if (i < data_queue_pairs) {
+peer = qemu_get_peer(ncs, i);
+} else {
+peer = qemu_get_peer(ncs, n->max_queue_pairs);
+}
+
+net = get_vhost_net(peer);
+hdev = &net->dev;
+/*
+ * We will pass the notifiers to the kernel, make sure that QEMU
+ * doesn't interfere.
+ */
+r = virtio_device_grab_ioeventfd(dev);
+if (r < 0) {
+error_report("binding does not support host notifiers");
+memory_region_transaction_commit();
+goto fail_nvhosts;
+}
+
+for (j = 0; j < hdev->nvqs; j++) {
+r = virtio_bus_set_host_notifier(VIRTIO_BUS(qbus),
+

[PULL 00/18] virtio,pc,pci: features, fixes, cleanups

2024-09-11 Thread Michael S. Tsirkin
The following changes since commit a66f28df650166ae8b50c992eea45e7b247f4143:

  Merge tag 'migration-20240909-pull-request' of https://gitlab.com/peterx/qemu 
into staging (2024-09-10 11:19:22 +0100)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream

for you to fetch changes up to 6e3c2d58e967cde3dadae298e81c5e8eb9fb9080:

  hw/acpi/ich9: Add periodic and swsmi timer (2024-09-11 09:46:14 -0400)


virtio,pc,pci: features, fixes, cleanups

i386 acpi speedup by precomputing _PRT by Ricardo Ribalda
vhost_net speedup by using MR transactions by Zuo Boqun
ich9 gained support for periodic and swsmi timer by Dominic Prinz

Fixes, cleanups all over the place.

Signed-off-by: Michael S. Tsirkin 


BALATON Zoltan (2):
  hw: Move declaration of IRQState to header and add init function
  hw/isa/vt82c686.c: Embed i8259 irq in device state instead of allocating

David Hildenbrand (1):
  virtio-mem: don't warn about THP sizes on a kernel without THP support

Dominic Prinz (1):
  hw/acpi/ich9: Add periodic and swsmi timer

Dongli Zhang (1):
  pci: don't skip function 0 occupancy verification for devfn auto assign

Gao Shiyuan (1):
  virtio-pci: Add lookup subregion of VirtIOPCIRegion MR

Hanna Czenczek (2):
  virtio: Allow .get_vhost() without vhost_started
  virtio: Always reset vhost devices

Philippe Mathieu-Daudé (1):
  hw/pci/pci-hmp-cmds: Avoid displaying bogus size in 'info pci'

Ricardo Ribalda (3):
  tests/acpi: pc: allow DSDT acpi table changes
  hw/i386/acpi-build: Return a pre-computed _PRT table
  tests/acpi: pc: update golden masters for DSDT

Volker Rümelin (1):
  hw/audio/virtio-sound: fix heap buffer overflow

Wenyu Huang (1):
  virtio: rename virtio_split_packed_update_used_idx

Zhenzhong Duan (2):
  intel_iommu: Fix invalidation descriptor type field
  intel_iommu: Make PASID-cache and PIOTLB type invalid in legacy mode

peng guo (1):
  hw/cxl: fix physical address field in get scan media results output

zuoboqun (1):
  vhost_net: configure all host notifiers in a single MR transaction

 hw/i386/intel_iommu_internal.h   |  11 +--
 include/hw/acpi/ich9.h   |   6 ++
 include/hw/acpi/ich9_timer.h |  23 ++
 include/hw/irq.h |  18 
 include/hw/southbridge/ich9.h|   4 +
 include/hw/virtio/vhost.h|   4 +
 include/hw/virtio/virtio.h   |   1 +
 hw/acpi/ich9.c   |  23 ++
 hw/acpi/ich9_timer.c |  93 +
 hw/audio/virtio-snd.c|  24 --
 hw/core/irq.c|  25 +++---
 hw/cxl/cxl-mailbox-utils.c   |   2 +-
 hw/display/vhost-user-gpu.c  |   2 +-
 hw/i386/acpi-build.c | 118 +-
 hw/i386/intel_iommu.c|  24 +++---
 hw/i386/pc.c |   5 +-
 hw/isa/lpc_ich9.c|  14 
 hw/isa/vt82c686.c|   7 +-
 hw/net/vhost_net.c   | 155 ---
 hw/net/virtio-net.c  |  19 -
 hw/pci/pci-hmp-cmds.c|  26 --
 hw/pci/pci.c |  17 ++--
 hw/virtio/vhost.c|   6 +-
 hw/virtio/virtio-crypto.c|  18 +++-
 hw/virtio/virtio-mem.c   |   7 ++
 hw/virtio/virtio-pci.c   |   8 +-
 hw/virtio/virtio.c   |  12 ++-
 hw/acpi/meson.build  |   2 +-
 hw/audio/trace-events|   1 -
 tests/data/acpi/x86/pc/DSDT  | Bin 6830 -> 8527 bytes
 tests/data/acpi/x86/pc/DSDT.acpierst | Bin 6741 -> 8438 bytes
 tests/data/acpi/x86/pc/DSDT.acpihmat | Bin 8155 -> 9852 bytes
 tests/data/acpi/x86/pc/DSDT.bridge   | Bin 13701 -> 15398 bytes
 tests/data/acpi/x86/pc/DSDT.cphp | Bin 7294 -> 8991 bytes
 tests/data/acpi/x86/pc/DSDT.dimmpxm  | Bin 8484 -> 10181 bytes
 tests/data/acpi/x86/pc/DSDT.hpbridge | Bin 6781 -> 8478 bytes
 tests/data/acpi/x86/pc/DSDT.hpbrroot | Bin 3337 -> 5034 bytes
 tests/data/acpi/x86/pc/DSDT.ipmikcs  | Bin 6902 -> 8599 bytes
 tests/data/acpi/x86/pc/DSDT.memhp| Bin 8189 -> 9886 bytes
 tests/data/acpi/x86/pc/DSDT.nohpet   | Bin 6688 -> 8385 bytes
 tests/data/acpi/x86/pc/DSDT.numamem  | Bin 6836 -> 8533 bytes
 tests/data/acpi/x86/pc/DSDT.roothp   | Bin 10623 -> 12320 bytes
 tests/data/acpi/x86/q35/DSDT.cxl | Bin 9714 -> 13148 bytes
 tests/data/acpi/x86/q35/DSDT.viot| Bin 9464 -> 14615 bytes
 44 files changed, 473 insertions(+), 202 deletions(-)
 create mode 100644 include/hw/acpi/ich9_timer.h
 create mode 100644 hw/acpi/ich9_timer.c




Re: [PATCH v2 5/8] qapi/pci: Supply missing member documentation

2024-09-11 Thread Michael S. Tsirkin
On Wed, Sep 11, 2024 at 01:25:42PM +0200, Markus Armbruster wrote:
> Since we neglect to document a member of PciMemoryRegion, its
> description in the QEMU QMP Reference manual is "Not documented".  Fix
> that.
> 
> Signed-off-by: Markus Armbruster 
> Reviewed-by: Philippe Mathieu-Daudé 

Reviewed-by: Michael S. Tsirkin 


can you merge this?

> ---
>  qapi/pci.json| 2 ++
>  qapi/pragma.json | 1 -
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/qapi/pci.json b/qapi/pci.json
> index 78bee57b77..dc85a41d28 100644
> --- a/qapi/pci.json
> +++ b/qapi/pci.json
> @@ -33,6 +33,8 @@
>  # - 'io' if the region is a PIO region
>  # - 'memory' if the region is a MMIO region
>  #
> +# @address: memory address
> +#
>  # @size: memory size
>  #
>  # @prefetch: if @type is 'memory', true if the memory is prefetchable
> diff --git a/qapi/pragma.json b/qapi/pragma.json
> index 7b0e12f42b..baeae5bf52 100644
> --- a/qapi/pragma.json
> +++ b/qapi/pragma.json
> @@ -62,7 +62,6 @@
>  'MemoryDeviceInfoKind',
>  'NetClientDriver',
>  'ObjectType',
> -'PciMemoryRegion',
>  'QCryptodevBackendServiceType',
>  'QKeyCode',
>  'RbdAuthMode',
> -- 
> 2.46.0




Re: [PATCH for-9.2 v15 04/11] s390x/pci: Check for multifunction after device realization

2024-09-11 Thread Michael S. Tsirkin
On Wed, Sep 11, 2024 at 07:58:15PM +0900, Akihiko Odaki wrote:
> On 2024/09/11 18:38, Cédric Le Goater wrote:
> > +Matthew +Eric
> > 
> > Side note for the maintainers :
> > 
> > Before this change, the igb device, which is multifunction, was working
> > fine under Linux.
> > 
> > Was there a fix in Linux since :
> > 
> >    57da367b9ec4 ("s390x/pci: forbid multifunction pci device")
> >    6069bcdeacee ("s390x/pci: Move some hotplug checks to the pre_plug
> > handler")
> > 
> > ?
> > 
> > s390 PCI devices do not have extended capabilities, so the igb device
> > does not expose the SRIOV capability and only the PF is accessible but
> > it doesn't seem to be an issue. (Btw, CONFIG_PCI_IOV is set to y in the
> > default Linux config which is unexpected)
> 
> Doesn't s390x really see extended capabilities? hw/s390x/s390-pci-inst.c has
> a call pci_config_size() and pci_host_config_write_common(), which means it
> is exposing the whole PCI Express configuration space. Why can't s390x use
> extended capabilities then?
> 
> The best option for fix would be to replace the SR-IOV implementation with
> stub if s390x cannot use the SR-IOV capability. However I still need to know
> at what level I should change the implementation (e.g., is it fine to remove
> the entire capability, or should I keep the capability while writes to its
> registers no-op?)
> 
> Regards,
> Akihiko Odaki

Note changing caps needs compat hacks for cross version migration to work.

> > 
> > Thanks,
> > 
> > C.
> > 
> > 
> > 
> > On 8/23/24 07:00, Akihiko Odaki wrote:
> > > The SR-IOV PFs set the multifunction bits during device realization so
> > > check them after that. This forbids adding SR-IOV devices to s390x.
> > > 
> > > Signed-off-by: Akihiko Odaki 
> > > ---
> > >   hw/s390x/s390-pci-bus.c | 14 ++
> > >   1 file changed, 6 insertions(+), 8 deletions(-)
> > > 
> > > diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
> > > index 3e57d5faca18..00b2c1f6157b 100644
> > > --- a/hw/s390x/s390-pci-bus.c
> > > +++ b/hw/s390x/s390-pci-bus.c
> > > @@ -971,14 +971,7 @@ static void
> > > s390_pcihost_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> > >   "this device");
> > >   }
> > > -    if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
> > > -    PCIDevice *pdev = PCI_DEVICE(dev);
> > > -
> > > -    if (pdev->cap_present & QEMU_PCI_CAP_MULTIFUNCTION) {
> > > -    error_setg(errp, "multifunction not supported in s390");
> > > -    return;
> > > -    }
> > > -    } else if (object_dynamic_cast(OBJECT(dev), TYPE_S390_PCI_DEVICE)) {
> > > +    if (object_dynamic_cast(OBJECT(dev), TYPE_S390_PCI_DEVICE)) {
> > >   S390PCIBusDevice *pbdev = S390_PCI_DEVICE(dev);
> > >   if (!s390_pci_alloc_idx(s, pbdev)) {
> > > @@ -1069,6 +1062,11 @@ static void s390_pcihost_plug(HotplugHandler
> > > *hotplug_dev, DeviceState *dev,
> > >   } else if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
> > >   pdev = PCI_DEVICE(dev);
> > > +    if (pdev->cap_present & QEMU_PCI_CAP_MULTIFUNCTION) {
> > > +    error_setg(errp, "multifunction not supported in s390");
> > > +    return;
> > > +    }
> > > +
> > >   if (!dev->id) {
> > >   /* In the case the PCI device does not define an id */
> > >   /* we generate one based on the PCI address */
> > > 
> > 




Re: [PATCH] hw/smbios: support for type 7 (cache information)

2024-09-11 Thread Michael S. Tsirkin
On Sun, Aug 11, 2024 at 10:45:38AM +, Hal Martin wrote:
> This patch adds support for SMBIOS type 7 (Cache Information) to qemu.
> 
> level: cache level (1-8)
> size: cache size in bytes
> 
> Example usage:
> -smbios type=7,level=1,size=0x8000
> 
> Signed-off-by: Hal Martin 

A bunch of style issues here:

> ---
>  hw/smbios/smbios.c   | 63 
>  include/hw/firmware/smbios.h | 18 +++
>  qemu-options.hx  |  2 ++
>  3 files changed, 83 insertions(+)
> 
> diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c
> index a394514264..65942f2354 100644
> --- a/hw/smbios/smbios.c
> +++ b/hw/smbios/smbios.c
> @@ -83,6 +83,12 @@ static struct {
>  .processor_family = 0x01, /* Other */
>  };
>  
> +struct type7_instance {
> +uint16_t level, size;
> +QTAILQ_ENTRY(type7_instance) next;
> +};
> +static QTAILQ_HEAD(, type7_instance) type7 = QTAILQ_HEAD_INITIALIZER(type7);
> +
>  struct type8_instance {
>  const char *internal_reference, *external_reference;
>  uint8_t connector_type, port_type;
> @@ -330,6 +336,23 @@ static const QemuOptDesc qemu_smbios_type4_opts[] = {
>  { /* end of list */ }
>  };
>  
> +static const QemuOptDesc qemu_smbios_type7_opts[] = {
> +{
> +.name = "type",
> +.type = QEMU_OPT_NUMBER,
> +.help = "SMBIOS element type",
> +},{
> +.name = "level",
> +.type = QEMU_OPT_NUMBER,
> +.help = "cache level",
> +},{
> +.name = "size",
> +.type = QEMU_OPT_NUMBER,
> +.help = "cache size",
> +},
> +{ /* end of list */ }
> +};
> +
>  static const QemuOptDesc qemu_smbios_type8_opts[] = {
>  {
>  .name = "type",
> @@ -733,6 +756,32 @@ static void smbios_build_type_4_table(MachineState *ms, unsigned instance,
>  smbios_type4_count++;
>  }
>  
> +static void smbios_build_type_7_table(void)
> +{
> +unsigned instance = 0;
> +struct type7_instance *t7;
> +char designation[20];
> +
> +QTAILQ_FOREACH(t7, &type7, next) {
> +SMBIOS_BUILD_TABLE_PRE(7, T0_BASE + instance, true);
> +sprintf(designation, "CPU Internal L%d", t7->level);
> +SMBIOS_TABLE_SET_STR(7, socket_designation, designation);
> +t->cache_configuration =  0x180 | (t7->level-1); /* not socketed, enabled, write back*/

bad comment style, line too long, bad math style
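
To make that concrete, here is a rough standalone sketch of the same computation with spaces around the operator and a block comment above the statement. The helper name and the macro are made up for illustration and are not part of the patch; the bit meanings in the comment follow the inline comment in the quoted line.

```c
#include <stdint.h>

/* Hypothetical name: "enabled" and "write back" bits, not socketed. */
#define T7_CFG_ENABLED_WRITE_BACK 0x180

/* Illustrative helper only; the patch assigns the field directly. */
static uint16_t type7_cache_configuration(uint16_t level)
{
    /*
     * Not socketed, enabled, write back.
     * The cache level is stored zero-based in the low bits.
     */
    return T7_CFG_ENABLED_WRITE_BACK | (level - 1);
}
```

In the patch itself this would simply be the assignment to t->cache_configuration with the comment moved above it.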

> +t->installed_size =  t7->size;
> +t->maximum_cache_size =  t7->size; /* set max to installed */
> +t->supported_sram_type = 0x10; /* pipeline burst */
> +t->current_sram_type = 0x10; /* pipeline burst */
> +t->cache_speed = 0x1; /* 1 ns */
> +t->error_correction_type = 0x6; /* Multi-bit ECC */
> +t->system_cache_type = 0x05; /* Unified */
> +t->associativity = 0x6; /* Fully Associative */
> +t->maximum_cache_size2 = t7->size;
> +t->installed_cache_size2 = t7->size;
> +SMBIOS_BUILD_TABLE_POST;
> +instance++;
> +}
> +}
> +
>  static void smbios_build_type_8_table(void)
>  {
>  unsigned instance = 0;
> @@ -1120,6 +1169,7 @@ static bool smbios_get_tables_ep(MachineState *ms,
>  }
>  }
>  
> +smbios_build_type_7_table();
>  smbios_build_type_8_table();
>  smbios_build_type_9_table(errp);
>  smbios_build_type_11_table();
> @@ -1478,6 +1528,19 @@ void smbios_entry_add(QemuOpts *opts, Error **errp)
> UINT16_MAX);
>  }
>  return;
> +case 7:
> +if (!qemu_opts_validate(opts, qemu_smbios_type7_opts, errp)) {
> +return;
> +}
> +struct type7_instance *t7_i;
> +t7_i = g_new0(struct type7_instance, 1);
> +t7_i->level = qemu_opt_get_number(opts,"level", 0x0);

bad comma style

> +t7_i->size = qemu_opt_get_number(opts, "size", 0x0200);
> +/* Only cache levels 1-8 are permitted */
> +if (t7_i->level > 0 && t7_i->level < 9) {
> +QTAILQ_INSERT_TAIL(&type7, t7_i, next);
> +}
> +return;
>  case 8:
>  if (!qemu_opts_validate(opts, qemu_smbios_type8_opts, errp)) {
>  return;
> diff --git a/include/hw/firmware/smbios.h b/include/hw/firmware/smbios.h
> index f066ab7262..1ea1506b46 100644
> --- a/include/hw/firmware/smbios.h
> +++ b/include/hw/firmware/smbios.h
> @@ -220,6 +220,24 @@ typedef enum smbios_type_4_len_ver {
>  SMBIOS_TYPE_4_LEN_V30 = offsetofend(struct smbios_type_4, thread_count2),
>  } smbios_type_4_len_ver;
>  
> +/* SMBIOS type 7 - Cache Information (v2.0+) */
> +struct smbios_type_7 {
> +struct smbios_structure_header header;
> +uint8_t socket_designation;
> +uint16_t cache_configuration;
> +uint16_t maximum_cache_size;
> +uint16_t installed_size;
> +uint16_t supported_sram_type;
> +uint16_t current_sram_type;
> +uint8_t cache_speed;
> +uint8_t e

Re: [PATCH v3 00/17] intel_iommu: Enable stage-1 translation for emulated device

2024-09-11 Thread Michael S. Tsirkin
On Wed, Sep 11, 2024 at 08:43:10AM +, Duan, Zhenzhong wrote:
> Hi Clement,
> 
> Thanks for your review. Hoping it could be accepted in the foreseeable future.
> 
> Thanks
> Zhenzhong

the comments are minor, so just keep iterating.

> >-Original Message-
> >From: CLEMENT MATHIEU--DRIF 
> >Subject: Re: [PATCH v3 00/17] intel_iommu: Enable stage-1 translation for
> >emulated device
> >
> >Hi Zhenzhong,
> >
> >Thanks for posting a new version.
> >I think it is starting to look good.
> >Just a few comments.
> >
> > >cmd
> >
> >On 11/09/2024 07:22, Zhenzhong Duan wrote:
> >>
> >>
> >> Hi,
> >>
> >> Per Jason Wang's suggestion, iommufd nesting series[1] is split into
> >> "Enable stage-1 translation for emulated device" series and
> >> "Enable stage-1 translation for passthrough device" series.
> >>
> >> This series enables stage-1 translation support for emulated device
> >> in intel iommu which we called "modern" mode.
> >>
> >> PATCH1-5:  Some preparing work before support stage-1 translation
> >> PATCH6-8:  Implement stage-1 translation for emulated device
> >> PATCH9-13: Emulate iotlb invalidation of stage-1 mapping
> >> PATCH14:   Set default aw_bits to 48 in scalable modern mode
> >> PATCH15-16:Expose scalable "modern" mode and "x-cap-fs1gp" to cmdline
> >> PATCH17:   Add qtest
> >>
> >> Note in spec revision 3.4, it renames "First-level" to "First-stage",
> >> "Second-level" to "Second-stage". But the scalable mode was added
> >> before that change. So we keep old favor using First-level/fl/Second-level/sl
> >> in code but change to use stage-1/stage-2 in commit log.
> >> But keep in mind First-level/fl/stage-1 all have same meaning,
> >> same for Second-level/sl/stage-2.
> >>
> >> Qemu code can be found at [2]
> >> The whole nesting series can be found at [3]
> >>
> >> [1] https://lists.gnu.org/archive/html/qemu-devel/2024-01/msg02740.html
> >> [2] https://github.com/yiliu1765/qemu/tree/zhenzhong/iommufd_stage1_emu_v3
> >> [3] https://github.com/yiliu1765/qemu/tree/zhenzhong/iommufd_nesting_rfcv2
> >>
> >> Thanks
> >> Zhenzhong
> >>
> >> Changelog:
> >> v3:
> >> - drop unnecessary !(s->ecap & VTD_ECAP_SMTS) (Clement)
> >> - simplify calculation of return value for vtd_iova_fl_check_canonical() (Liuyi)
> >> - make A/D bit setting atomic (Liuyi)
> >> - refine error msg (Clement, Liuyi)
> >>
> >> v2:
> >> - check ecap/cap bits instead of s->scalable_modern in vtd_pe_type_check() (Clement)
> >> - declare VTD_ECAP_FLTS/FS1GP after the feature is implemented (Clement)
> >> - define VTD_INV_DESC_PIOTLB_G (Clement)
> >> - make error msg consistent in vtd_process_piotlb_desc() (Clement)
> >> - refine commit log in patch16 (Clement)
> >> - add VTD_ECAP_IR to ECAP_MODERN_FIXED1 (Clement)
> >> - add a knob x-cap-fs1gp to control stage-1 1G paging capability
> >> - collect Clement's R-B
> >>
> >> v1:
> >> - define VTD_HOST_AW_AUTO (Clement)
> >> - passing pgtt as a parameter to vtd_update_iotlb (Clement)
> >> - prefix sl_/fl_ to second/first level specific functions (Clement)
> >> - pick reserved bit check from Clement, add his Co-developed-by
> >> - Update test without using libqtest-single.h (Thomas)
> >>
> >> rfcv2:
> >> - split from nesting series (Jason)
> >> - merged some commits from Clement
> >> - add qtest (jason)
> >>
> >>
> >> Clément Mathieu--Drif (4):
> >>intel_iommu: Check if the input address is canonical
> >>intel_iommu: Set accessed and dirty bits during first stage
> >>  translation
> >>intel_iommu: Add an internal API to find an address space with PASID
> >>intel_iommu: Add support for PASID-based device IOTLB invalidation
> >>
> >> Yi Liu (3):
> >>intel_iommu: Rename slpte to pte
> >>intel_iommu: Implement stage-1 translation
> >>intel_iommu: Modify x-scalable-mode to be string option to expose
> >>  scalable modern mode
> >>
> >> Yu Zhang (1):
> >>intel_iommu: Use the latest fault reasons defined by spec
> >>
> >> Zhenzhong Duan (9):
> >>intel_iommu: Make pasid entry type check accurate
> >>intel_iommu: Add a placeholder variable for scalable modern mode
> >>intel_iommu: Flush stage-2 cache in PASID-selective PASID-based iotlb
> >>  invalidation
> >>intel_iommu: Flush stage-1 cache in iotlb invalidation
> >>intel_iommu: Process PASID-based iotlb invalidation
> >>intel_iommu: piotlb invalidation should notify unmap
> >>intel_iommu: Set default aw_bits to 48 in scalable modern mode
> >>intel_iommu: Introduce a property to control FS1GP cap bit setting
> >>tests/qtest: Add intel-iommu test
> >>
> >>   MAINTAINERS|   1 +
> >>   hw/i386/intel_iommu_internal.h |  91 -
> >>   include/hw/i386/intel_iommu.h  |   9 +-
> >>   hw/i386/intel_iommu.c  | 694 +++
> >--
> >>   test

Re: [PATCH] softmmu: Support concurrent bounce buffers

2024-09-11 Thread Michael S. Tsirkin
On Tue, Sep 10, 2024 at 11:36:08PM +0200, Mattias Nissler wrote:
> On Tue, Sep 10, 2024 at 6:40 PM Michael S. Tsirkin  wrote:
> >
> > On Tue, Sep 10, 2024 at 06:10:50PM +0200, Mattias Nissler wrote:
> > > > On Tue, Sep 10, 2024 at 5:44 PM Peter Maydell wrote:
> > > > >
> > > > > On Tue, 10 Sept 2024 at 15:53, Michael S. Tsirkin wrote:
> > > > >
> > > > > On Mon, Aug 19, 2024 at 06:54:54AM -0700, Mattias Nissler wrote:
> > > > > > When DMA memory can't be directly accessed, as is the case when
> > > > > > running the device model in a separate process without shareable DMA
> > > > > > file descriptors, bounce buffering is used.
> > > > > >
> > > > > > > It is not uncommon for device models to request mapping of several DMA
> > > > > > > regions at the same time. Examples include:
> > > > > > >  * net devices, e.g. when transmitting a packet that is split across
> > > > > > >    several TX descriptors (observed with igb)
> > > > > > >  * USB host controllers, when handling a packet with multiple data TRBs
> > > > > > >    (observed with xhci)
> > > > > > >
> > > > > > > Previously, qemu only provided a single bounce buffer per AddressSpace
> > > > > > > and would fail DMA map requests while the buffer was already in use. In
> > > > > > > turn, this would cause DMA failures that ultimately manifest as hardware
> > > > > > > errors from the guest perspective.
> > > > > > >
> > > > > > > This change allocates DMA bounce buffers dynamically instead of
> > > > > > > supporting only a single buffer. Thus, multiple DMA mappings work
> > > > > > > correctly also when RAM can't be mmap()-ed.
> > > > > > >
> > > > > > > The total bounce buffer allocation size is limited individually for each
> > > > > > > AddressSpace. The default limit is 4096 bytes, matching the previous
> > > > > > > maximum buffer size. A new x-max-bounce-buffer-size parameter is
> > > > > > > provided to configure the limit for PCI devices.
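
For readers skimming the thread, the per-AddressSpace limit described in the commit message can be pictured roughly as below. This is a standalone sketch using C11 atomics, not the patch's code: the struct and function names are invented for illustration, and QEMU itself uses its own atomic helpers rather than <stdatomic.h>.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-address-space accounting; all names are illustrative. */
struct bounce_budget {
    size_t max_size;        /* limit, e.g. 4096 as in the default */
    _Atomic size_t used;    /* bytes currently handed out */
};

/* Try to account for a new bounce buffer of len bytes. */
static bool bounce_reserve(struct bounce_budget *b, size_t len)
{
    size_t old = atomic_load(&b->used);

    do {
        /* Refuse if the request would push usage past the limit. */
        if (len > b->max_size || old > b->max_size - len) {
            return false;
        }
        /* On failure, old is reloaded and the check is repeated. */
    } while (!atomic_compare_exchange_weak(&b->used, &old, old + len));

    return true;
}

/* Give the bytes back once the mapping is torn down. */
static void bounce_release(struct bounce_budget *b, size_t len)
{
    atomic_fetch_sub(&b->used, len);
}
```

The point of the compare-and-swap loop is that several mappings can be outstanding at once while the total never exceeds the configured limit, which is the behaviour the commit message describes.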
> > > > > >
> > > > > > Signed-off-by: Mattias Nissler 
> > > > > > Reviewed-by: Philippe Mathieu-Daudé 
> > > > > > Acked-by: Peter Xu 
> > > > > > ---
> > > > > > > This patch is split out from my "Support message-based DMA in vfio-user server"
> > > > > > > series. With the series having been partially applied, I'm splitting this one
> > > > > > > out as the only remaining patch to system emulation code in the hope to
> > > > > > > simplify getting it landed. The code has previously been reviewed by Stefan
> > > > > > > Hajnoczi and Peter Xu. This latest version includes changes to switch the
> > > > > > > bounce buffer size bookkeeping to `size_t` as requested and LGTM'd by Phil in
> > > > > > > v9.
> > > > > > ---
> > > > > >  hw/pci/pci.c|  8 
> > > > > >  include/exec/memory.h   | 14 +++
> > > > > >  include/hw/pci/pci_device.h |  3 ++
> > > > > >  system/memory.c |  5 ++-
> > > > > >  system/physmem.c| 82 
> > > > > > ++---
> > > > > >  5 files changed, 76 insertions(+), 36 deletions(-)
> > > > > >
> > > > > > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> > > > > > index fab86d0567..d2caf3ee8b 100644
> > > > > > --- a/hw/pci/pci.c
> > > > > > +++ b/hw/pci/pci.c
> > > > > > @@ -85,6 +85,8 @@ static Property pci_props[] = {
> > > > > >  QEMU_PCIE_ERR_UNC_MASK_BITNR, true),
> > > > > >  DEFINE_PROP_BIT("x-pcie-ari-nextfn-1", PCIDevice, cap_present,
> > > > > >  QEMU_PCIE_ARI_NEXTFN_1_BITNR, false),
> > > > > > +DEFINE_PROP_SIZE32("x-max-bounce-buffer-size"

Re: [PATCH for-9.2 v15 00/11] hw/pci: SR-IOV related fixes and improvements

2024-09-11 Thread Michael S. Tsirkin
On Wed, Sep 11, 2024 at 12:05:46PM +0900, Akihiko Odaki wrote:
> On 2024/09/11 0:27, Michael S. Tsirkin wrote:
> > On Tue, Sep 10, 2024 at 04:13:14PM +0200, Cédric Le Goater wrote:
> > > On 9/10/24 15:34, Michael S. Tsirkin wrote:
> > > > On Tue, Sep 10, 2024 at 03:21:54PM +0200, Cédric Le Goater wrote:
> > > > > On 9/10/24 11:33, Akihiko Odaki wrote:
> > > > > > On 2024/09/10 18:21, Michael S. Tsirkin wrote:
> > > > > > > On Fri, Aug 23, 2024 at 02:00:37PM +0900, Akihiko Odaki wrote:
> > > > > > > > Supersedes: <20240714-rombar-v2-0-af1504ef5...@daynix.com>
> > > > > > > > ("[PATCH v2 0/4] hw/pci: Convert rom_bar into OnOffAuto")
> > > > > > > > 
> > > > > > > > I submitted a RFC series[1] to add support for SR-IOV emulation to
> > > > > > > > virtio-net-pci. During the development of the series, I fixed some
> > > > > > > > trivial bugs and made improvements that I think are independently
> > > > > > > > useful. This series extracts those fixes and improvements from the RFC
> > > > > > > > series.
> > > > > > > > 
> > > > > > > > [1]: https://patchew.org/QEMU/20231210-sriov-v2-0-b959e8a6d...@daynix.com/
> > > > > > > > 
> > > > > > > > Signed-off-by: Akihiko Odaki 
> > > > > > > 
> > > > > > > I don't think Cédric's issues have been addressed, am I wrong?
> > > > > > > Cédric, what is your take?
> > > > > > 
> > > > > > I put the URI to Cédric's report here:
> > > > > > https://lore.kernel.org/r/75cbc7d9-b48e-4235-85cf-49dacf3c7...@redhat.com
> > > > > > 
> > > > > > This issue was dealt with by patch "s390x/pci: Check for multifunction
> > > > > > after device realization". I found that s390x on QEMU does not
> > > > > > support multifunction and SR-IOV devices accidentally circumvent
> > > > > > this restriction, which means igb was never supposed to work with
> > > > > > s390x. The patch prevents adding SR-IOV devices to s390x to ensure
> > > > > > the restriction is properly enforced.
> > > > > 
> > > > > yes, indeed and it seems to fix :
> > > > > 
> > > > > 6069bcdeacee ("s390x/pci: Move some hotplug checks to the pre_plug handler")
> > > > > 
> > > > > I will update patch 4.
> > > > > 
> > > > > 
> > > > > Thanks,
> > > > > 
> > > > > C.
> > > > > 
> > > > > 
> > > > > That said, the igb device worked perfectly fine under the s390x machine.
> > > > 
> > > > And it works for you after this patchset, yes?
> > > 
> > > ah no, IGB is not an available device for the s390x machine anymore :
> > > 
> > >qemu-system-s390x: -device igb,netdev=net1,mac=C0:FF:EE:00:00:13: multifunction not supported in s390
> > 
> > 
> > So patch 4 didn't really help.
> > 
> > 
> > > This is what commit 57da367b9ec4 ("s390x/pci: forbid multifunction
> > > pci device") initially required (and later broken by 6069bcdeacee).
> > > So I guess we are fine with the expected behavior.
> > > 
> > > Thanks,
> > > 
> > > C.
> > 
> > Better to fix than to guess if there are users, I think.
> 
> Yes, but it will require some knowledge of s390x, which I cannot provide.
> 
> Commit 57da367b9ec4 ("s390x/pci: forbid multifunction pci device") says
> having a multifunction device will make the guest spin forever. That is not
> what Cédric observed with igb so it may no longer be relevant, but I cannot
> be sure that the problem is resolved without understanding how multifunction
> devices are supposed to work with s390x.
> 
> Ideally someone with s390x expertise should check relevant hardware
> documentation and confirm QEMU properly implements multifunction devices.

The fact is, QEMU already does what most users want from it. So the
first rule whenever adding a new feature is not breaking old
functionality.  I know, it's annoying, as some of it is held together by
duct tape.  We have a friendly community so if you ask nicely, you
usually can get help. You'd probably have to be a bit more specific
with your questions.

> Let's keep the restriction until then.
> 
> Regards,
> Akihiko Odaki


Not introducing regressions is a hard rule, sorry.

-- 
MST




Re: [PATCH v5 0/3] vhost-user-blk: live resize additional APIs

2024-09-11 Thread Michael S. Tsirkin
On Tue, Jun 25, 2024 at 03:18:40PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> v5:
> 03: drop extra check on is runstate running

Causes build failures when generating qdoc.

https://gitlab.com/mstredhat/qemu/-/jobs/7792086965


> 
> Vladimir Sementsov-Ogievskiy (3):
>   qdev-monitor: add option to report GenericError from find_device_state
>   vhost-user-blk: split vhost_user_blk_sync_config()
>   qapi: introduce device-sync-config
> 
>  hw/block/vhost-user-blk.c | 27 ++--
>  hw/virtio/virtio-pci.c|  9 +++
>  include/hw/qdev-core.h|  3 +++
>  qapi/qdev.json| 24 ++
>  system/qdev-monitor.c | 53 ---
>  5 files changed, 105 insertions(+), 11 deletions(-)
> 
> -- 
> 2.34.1




Re: [PATCH v2 0/2] Postcopy migration and vhost-user errors

2024-09-11 Thread Michael S. Tsirkin
On Wed, Sep 11, 2024 at 12:44:59PM +0530, Prasad Pandit wrote:
> Hello Michael,
> 
> On Tue, 10 Sept 2024 at 22:40, Michael S. Tsirkin  wrote:
> > So are we going to see a version with BQL?
> 
> ===
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index cdf9af4a4b..96ac0ed93b 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -2079,6 +2079,7 @@ static int vhost_user_postcopy_end(struct
> vhost_dev *dev, Error **errp)
> 
>  trace_vhost_user_postcopy_end_entry();
> 
> +BQL_LOCK_GUARD();
>  ret = vhost_user_write(dev, &msg, NULL, 0);
>  if (ret < 0) {
>  error_setg(errp, "Failed to send postcopy_end to vhost");
> -- 
> 2.43.5
> ===
> 
> * We ran the test with the above BQL patch, but it did not help to fix
> race errors. I'm continuing to debug it, will update here soon.
> 
> Thank you.
> ---
>   - Prasad


Thanks! I have a suspicion there's a path where we wait for the
migration thread until BQL.

-- 
MST




Re: [PATCH] MAINTAINERS: Add myself as a reviewer of VT-d

2024-09-10 Thread Michael S. Tsirkin
On Tue, Sep 10, 2024 at 08:29:25PM +0200, Philippe Mathieu-Daudé wrote:
> On 10/9/24 19:14, Michael S. Tsirkin wrote:
> > On Tue, Aug 20, 2024 at 09:51:47AM +, CLEMENT MATHIEU--DRIF wrote:
> > > Signed-off-by: Clément Mathieu--Drif 
> > 
> > Using index info to reconstruct a base tree...
> > error: patch failed: MAINTAINERS:3672
> > error: MAINTAINERS: patch does not apply
> > error: Did you hand edit your patch?
> > It does not apply to blobs recorded in its index.
> 
> I suppose you process your mailbox in FIFO order so you might
> notice it before reading this reply, but I already queued this
> patch since it was languishing and trivial:
> https://lore.kernel.org/qemu-devel/9f4cba51-5aa0-4942-a7a0-6bd3eb29a...@linaro.org/

Thanks! You can add
Acked-by: Michael S. Tsirkin 
if you like.

> > 
> > > ---
> > >   MAINTAINERS | 1 +
> > >   1 file changed, 1 insertion(+)
> 




Re: [PATCH 0/5] Interrupt Remap support for emulated amd viommu

2024-09-10 Thread Michael S. Tsirkin
On Wed, Sep 04, 2024 at 05:02:52AM -0500, Santosh Shukla wrote:
> Series adds following feature support for emulated amd vIOMMU
> 1) Pass Through(PT) mode
> 2) Interrupt Remapping(IR) mode
> 
> 1) PT mode
> Introducing the shared 'nodma' memory region that can be aliased
> by all the devices in the PT mode. Shared memory with aliasing
> approach will help run VM faster when a lot of devices are attached
> to the VM.
> 
> 2) IR mode
> Shared IR memory region with aliasing approach proposed for the
> reason mentioned in 1). Also add support to invalidate Interrupt
> remapping table(IRT).
> 
> Series based on f259e4cb8a8b4ef5463326fc214a7d8d7703d5de.


Fails build on non-kvm:

https://gitlab.com/mstredhat/qemu/-/jobs/7791357916

/usr/lib/gcc-cross/i686-linux-gnu/10/../../../../i686-linux-gnu/bin/ld: libqemu-x86_64-softmmu.a.p/hw_i386_amd_iommu.c.o: in function `amdvi_sysbus_realize':
/builds/mstredhat/qemu/build/../hw/i386/amd_iommu.c:1660: undefined reference to `kvm_enable_x2apic'
collect2: error: ld returned 1 exit status


> Testing:
> 1. nvme/fio testing for VM with > 255 vCPU with xtsup=on and x2apic
> enabled
> 2. Windows Server 2022 VM testing for > 255 vCPU.
> 
> Suravee Suthikulpanit (5):
>   amd_iommu: Rename variable mmio to mr_mmio
>   amd_iommu: Add support for pass though mode
>   amd_iommu: Use shared memory region for Interrupt Remapping
>   amd_iommu: Send notification when invaldate interrupt entry cache
>   amd_iommu: Check APIC ID > 255 for XTSup
> 
>  hw/i386/acpi-build.c |  4 +-
>  hw/i386/amd_iommu.c  | 98 +++-
>  hw/i386/amd_iommu.h  |  5 ++-
>  3 files changed, 85 insertions(+), 22 deletions(-)
> 
> -- 
> 2.43.5




Re: [PATCH v1 00/14] s390x: virtio-mem support

2024-09-10 Thread Michael S. Tsirkin
On Tue, Sep 10, 2024 at 07:57:55PM +0200, David Hildenbrand wrote:
> This series is based on:
> [PATCH v2] virtio: kconfig: memory devices are PCI only [1]
> 
> I finally found the time (IOW forced myself) to finish virtio-mem
> support on s390x. The last RFC was from 2020, so I won't talk about
> what changed -- a lot changed in the meantime :)
> 
> There is really not much left to do on s390x, because virtio-mem already
> implements most things we need today (e.g., early-migration,
> unplugged-inaccessible). The biggest part of this series is just doing what
> we do with virtio-pci, wiring it up in the machine hotplug handler and ...
> well, messing with the physical memory layout where we can now exceed
> initial RAM size and have sparsity (memory holes).
> 
> I tested a lot of things, including:
>  * Memory hotplug/unplug
>  * Device hotplug/unplug
>  * System resets / reboots
>  * Migrate to/from file (including storage attributes under KVM)
>  * Basic live migration
>  * Basic postcopy live migration
> 
> More details on how to use it on s390x -- which is pretty much how
> we use it on other architectures, except
> s/virtio-mem-pci/virtio-mem-ccw/ --- is in the last patch.
> 
> This series introduces a new diag(500) "STORAGE LIMIT" subcode that will
> be documented at [2] once this+kernel part go upstream.
> 
> There are not many s390x-specific virtio-mem future work items, except:
> * Storage attribute migration might be improved
> * We might want to reset storage attributes of unplugged memory
>   (might or might not be required for upcoming page table reclaim in
>Linux; TBD)


I don't see anything virtio-specific needing attention here, let me know
if I missed anything.
From a quick look it seems fine, so I guess you can add
Acked-by: Michael S. Tsirkin 



> I'll send out the kernel driver bits soon.
> 
> [1] https://lkml.kernel.org/r/20240906101658.514470-1-pbonz...@redhat.com
> [2] https://gitlab.com/davidhildenbrand/s390x-os-virt-spec
> 
> Cc: Paolo Bonzini 
> Cc: Thomas Huth 
> Cc: Halil Pasic 
> Cc: Christian Borntraeger 
> Cc: Eric Farman 
> Cc: Richard Henderson 
> Cc: David Hildenbrand 
> Cc: Ilya Leoshkevich 
> Cc: Janosch Frank 
> Cc: "Michael S. Tsirkin" 
> Cc: Cornelia Huck 
> 
> David Hildenbrand (14):
>   s390x/s390-virtio-ccw: don't crash on weird RAM sizes
>   s390x/s390-virtio-hcall: remove hypercall registration mechanism
>   s390x/s390-virtio-hcall: prepare for more diag500 hypercalls
>   s390x: rename s390-virtio-hcall* to s390-hypercall*
>   s390x/s390-virtio-ccw: move setting the maximum guest size from sclp
> to machine code
>   s390x: introduce s390_get_memory_limit()
>   s390x/s390-hypercall: introduce DIAG500 STORAGE_LIMIT
>   s390x/s390-stattrib-kvm: prepare memory devices and sparse memory
> layouts
>   s390x/s390-skeys: prepare for memory devices
>   s390x/pv: check initial, not maximum RAM size
>   s390x/s390-virtio-ccw: prepare for memory devices
>   s390x: introduce s390_get_max_pagesize()
>   s390x/virtio-ccw: add support for virtio based memory devices
>   s390x: virtio-mem support
> 
>  MAINTAINERS|   4 +
>  hw/s390x/Kconfig   |   1 +
>  hw/s390x/meson.build   |   4 +-
>  hw/s390x/s390-hypercall.c  |  77 +++
>  hw/s390x/s390-hypercall.h  |  25 
>  hw/s390x/s390-skeys.c  |   4 +-
>  hw/s390x/s390-stattrib-kvm.c   |  63 +
>  hw/s390x/s390-virtio-ccw.c | 143 +
>  hw/s390x/s390-virtio-hcall.c   |  41 --
>  hw/s390x/s390-virtio-hcall.h   |  25 
>  hw/s390x/sclp.c|  17 +--
>  hw/s390x/virtio-ccw-md.c   | 153 ++
>  hw/s390x/virtio-ccw-md.h   |  44 +++
>  hw/s390x/virtio-ccw-mem.c  | 226 +
>  hw/s390x/virtio-ccw-mem.h  |  34 +
>  hw/virtio/Kconfig  |   1 +
>  hw/virtio/virtio-mem.c |   4 +-
>  target/s390x/cpu-sysemu.c  |  35 -
>  target/s390x/cpu.h |   2 +
>  target/s390x/kvm/kvm.c |  12 +-
>  target/s390x/kvm/pv.c  |   2 +-
>  target/s390x/tcg/misc_helper.c |   6 +-
>  22 files changed, 746 insertions(+), 177 deletions(-)
>  create mode 100644 hw/s390x/s390-hypercall.c
>  create mode 100644 hw/s390x/s390-hypercall.h
>  delete mode 100644 hw/s390x/s390-virtio-hcall.c
>  delete mode 100644 hw/s390x/s390-virtio-hcall.h
>  create mode 100644 hw/s390x/virtio-ccw-md.c
>  create mode 100644 hw/s390x/virtio-ccw-md.h
>  create mode 100644 hw/s390x/virtio-ccw-mem.c
>  create mode 100644 hw/s390x/virtio-ccw-mem.h
> 
> -- 
> 2.46.0




Re: [PATCH v1] virtio-mem: don't warn about THP sizes on a kernel without THP support

2024-09-10 Thread Michael S. Tsirkin
On Tue, Sep 10, 2024 at 06:34:33PM +0200, David Hildenbrand wrote:
> If the config directory in sysfs does not exist at all, we are dealing
> with a system that does not support THPs. Simply use 1 MiB block size
> then, instead of warning "Could not detect THP size, falling back to
> ..." and falling back to the default THP size.
> 
> Cc: "Michael S. Tsirkin" 
> Cc: Gavin Shan 
> Cc: Juraj Marcin 
> Signed-off-by: David Hildenbrand 

Okay, I picked it up. But I have a question:

> ---
>  hw/virtio/virtio-mem.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
> index ef64bf1b4a..4075f3d4ce 100644
> --- a/hw/virtio/virtio-mem.c
> +++ b/hw/virtio/virtio-mem.c
> @@ -88,6 +88,7 @@ static uint32_t virtio_mem_default_thp_size(void)
>  static uint32_t thp_size;
>  
>  #define HPAGE_PMD_SIZE_PATH "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size"
> +#define HPAGE_PATH "/sys/kernel/mm/transparent_hugepage/"


If this code runs on e.g. Windows, it will poke at the root of the cwd's
drive, with unpredictable results.
It doesn't look like this is Linux-specific. Did I miss anything?
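
To illustrate the concern, a guard along these lines would keep the sysfs probe from running on other hosts. This is only a sketch in plain C with an invented helper name; the patch uses g_file_test(), and whether this file can be built for non-Linux hosts at all is exactly the open question.

```c
#include <stdbool.h>
#if defined(__linux__)
#include <sys/stat.h>
#endif

/* Hypothetical helper: report whether a sysfs directory exists. */
static bool thp_sysfs_dir_present(const char *path)
{
#if defined(__linux__)
    struct stat st;

    /* Only on Linux does this path have the expected meaning. */
    return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
#else
    /* Never touch the filesystem on other hosts. */
    (void)path;
    return false;
#endif
}
```

On a non-Linux host the helper reports the directory as absent without looking at the filesystem, so the caller would take the "no THP" path.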

>  static uint32_t virtio_mem_thp_size(void)
>  {
>  gchar *content = NULL;
> @@ -98,6 +99,12 @@ static uint32_t virtio_mem_thp_size(void)
>  return thp_size;
>  }
>  
> +/* No THP -> no restrictions. */
> +if (!g_file_test(HPAGE_PATH, G_FILE_TEST_EXISTS)) {
> +thp_size = VIRTIO_MEM_MIN_BLOCK_SIZE;
> +return thp_size;
> +}
> +
>  /*
>   * Try to probe the actual THP size, fallback to (sane but eventually
>   * incorrect) default sizes.
> -- 
> 2.46.0




Re: [PATCH v2 0/7] Introduce SMP Cache Topology

2024-09-10 Thread Michael S. Tsirkin
On Sun, Sep 08, 2024 at 08:59:13PM +0800, Zhao Liu wrote:
> Hi all,
> 
> Compared with previous Patch v1 [1], I've put the cache properties list
> into -machine, this is to meet current needs and also remain compatible
> with my future topology support (more discussion details, pls refer [2]).
> 
> This series is based on the commit 1581a0bc928d ("Merge tag 'pull-ufs-
> 20240906' of https://gitlab.com/jeuk20.kim/qemu into staging ufs
> queue").


Needs review from QOM maintainers.

> Background
> ==
> 
> The x86 and ARM (RISCV) need to allow user to configure cache properties
> (current only topology):
>  * For x86, the default cache topology model (of max/host CPU) does not
>always match the Host's real physical cache topology. Performance can
>increase when the configured virtual topology is closer to the
>physical topology than a default topology would be.
>  * For ARM, QEMU can't get the cache topology information from the CPU
>registers, then user configuration is necessary. Additionally, the
>cache information is also needed for MPAM emulation (for TCG) to
>build the right PPTT. (Originally from Jonathan)
> 
> 
> About smp-cache
> ===
> 
> In this version, smp-cache is implemented as an array integrated in
> -machine. Though -machine currently can't support JSON format, this is
> one of the directions for the future.
> 
> An example is as follows:
> 
> smp_cache=smp-cache.0.cache=l1i,smp-cache.0.topology=core,smp-cache.1.cache=l1d,smp-cache.1.topology=core,smp-cache.2.cache=l2,smp-cache.2.topology=module,smp-cache.3.cache=l3,smp-cache.3.topology=die
> 
> "cache" specifies the cache that the properties will be applied on. This
> field is the combination of cache level and cache type. Now it supports
> "l1d" (L1 data cache), "l1i" (L1 instruction cache), "l2" (L2 unified
> cache) and "l3" (L3 unified cache).
> 
> "topology" field accepts CPU topology levels including "thread", "core",
> "module", "cluster", "die", "socket", "book", "drawer" and a special
> value "default".
> 
> The "default" is introduced to make it easier for libvirt to set a
> default parameter value without having to care about the specific
> machine (because currently there is no proper way for machine to
> expose supported topology levels and caches).
> 
> If "default" is set, then the cache topology will follow the
> architecture's default cache topology model. If other CPU topology level
> is set, the cache will be shared at corresponding CPU topology level.
> 
> 
> Welcome your comment!
> 
> 
> [1]: Patch v1: https://lore.kernel.org/qemu-devel/20240704031603.1744546-1-zhao1@intel.com/
> [2]: API discussion: https://lore.kernel.org/qemu-devel/8734ndj33j@pond.sub.org/
> 
> Thanks and Best Regards,
> Zhao
> ---
> Changelog:
> 
> Main changes since Patch v1:
>  * Dropped handwritten smp-cache object and integrated cache properties
>list into MachineState and used -machine to configure SMP cache
>properties. (Markus)
>  * Dropped prefix of CpuTopologyLevel enumeration. (Markus)
>  * Rename CPU_TOPO_LEVEL_* to CPU_TOPOLOGY_LEVEL_* to match the QAPI's
>generated code. (Markus)
>  * Renamed SMPCacheProperty/SMPCacheProperties (QAPI structures) to
>SmpCacheProperties/SmpCachePropertiesWrapper. (Markus)
>  * Renamed SMPCacheName (QAPI structure) to SmpCacheLevelAndType and
>dropped prefix. (Markus)
>  * Renamed 'name' field in SmpCacheProperties to 'cache', since the
>type and level of the cache in SMP system could be able to specify
>all of these kinds of cache explicitly enough.
>  * Renamed 'topo' field in SmpCacheProperties to 'topology'. (Markus)
>  * Returned error information when user repeats setting cache
>properties. (Markus)
>  * Renamed SmpCacheLevelAndType to CacheLevelAndType, since this
>representation is general across SMP or hybrid system.
>  * Dropped machine_check_smp_cache_support() and did the check when
>-machine parses smp-cache in machine_parse_smp_cache().
> 
> Main changes since RFC v2:
>  * Dropped cpu-topology.h and cpu-topology.c since QAPI has the helper
>    (CpuTopologyLevel_str) to convert enum to string. (Markus)
>  * Fixed text format in machine.json (CpuTopologyLevel naming, 2 spaces
>    between sentences). (Markus)
>  * Added a new level "default" to de-compatibilize some arch-specific
>    topo settings. (Daniel)
>  * Moved CpuTopologyLevel to qapi/machine-common.json, where the
>    cache enumeration and smp-cache object would be added.
>    - If smp-cache object is defined in qapi/machine.json, storage-daemon
>      will complain about the qmp cmds in qapi/machine.json during
>      compiling.
>  * Referred to Daniel's suggestion to introduce cache JSON list, though
>    as a standalone object since -smp/-machine can't support JSON.
>  * Linked machine's smp_cache to smp-cache object instead of a builtin
>    structure. This is to get around the fact that the keyval format of
>    -machine can't supp

Re: [PATCH 1/2] acpi: ged: Add macro for acpi ged sleep register

2024-09-10 Thread Michael S. Tsirkin
On Fri, Sep 06, 2024 at 10:19:42AM +0800, Bibo Mao wrote:
> Macro definition is added for acpi ged sleep register, so that ged
> emulation driver can use this, also it can be used in FDT table if
> ged is exposed with FDT table.
> 
> Signed-off-by: Bibo Mao 
> ---
>  hw/acpi/generic_event_device.c | 6 +++---
>  include/hw/acpi/generic_event_device.h | 3 +++
>  2 files changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
> index 15b4c3ebbf..10a338877c 100644
> --- a/hw/acpi/generic_event_device.c
> +++ b/hw/acpi/generic_event_device.c
> @@ -201,9 +201,9 @@ static void ged_regs_write(void *opaque, hwaddr addr, 
> uint64_t data,
>  
>  switch (addr) {
>  case ACPI_GED_REG_SLEEP_CTL:
> -slp_typ = (data >> 2) & 0x07;
> -slp_en  = (data >> 5) & 0x01;
> -if (slp_en && slp_typ == 5) {
> +slp_typ = (data & ACPI_GED_SLP_TYP_MASK) >> ACPI_GED_SLP_TYP_SHIFT;
> +slp_en  = !!(data  & ACPI_GED_SLP_ENABLE);
> +if (slp_en && slp_typ == ACPI_GED_SLP_TYP_S5) {
>  qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
>  }
>  return;
> diff --git a/include/hw/acpi/generic_event_device.h 
> b/include/hw/acpi/generic_event_device.h
> index 40af3550b5..526fea6efe 100644
> --- a/include/hw/acpi/generic_event_device.h
> +++ b/include/hw/acpi/generic_event_device.h
> @@ -82,7 +82,10 @@ OBJECT_DECLARE_SIMPLE_TYPE(AcpiGedState, ACPI_GED)
>  #define ACPI_GED_RESET_VALUE   0x42
>  
>  /* ACPI_GED_REG_SLEEP_CTL.SLP_TYP value for S5 (aka poweroff) */
> +#define ACPI_GED_SLP_TYP_SHIFT 0x02
>  #define ACPI_GED_SLP_TYP_S5    0x05
> +#define ACPI_GED_SLP_TYP_MASK  0x1C
> +#define ACPI_GED_SLP_ENABLE    0x20

The comment is wrong now, isn't it?
Pls document each value, copying name from spec verbatim.
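
One possible shape for that, as a sketch only. The field names (SLP_TYPx,
SLP_EN) and bit positions follow the ACPI Sleep Control Register layout that
the old open-coded shifts in the hunk above already imply; the helper
ged_sleep_ctl_is_s5() is hypothetical and not part of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * ACPI_GED_REG_SLEEP_CTL layout (ACPI "Sleep Control Register"):
 *   bits 4:2  SLP_TYPx - sleep type to enter once SLP_EN is set
 *   bit  5    SLP_EN   - setting it starts the transition
 */
/* SLP_TYPx: bit position of the field within the register */
#define ACPI_GED_SLP_TYP_SHIFT 0x02
/* SLP_TYPx: mask of the field within the register (bits 4:2) */
#define ACPI_GED_SLP_TYP_MASK  0x1C
/* SLP_TYPx value for S5 (aka poweroff) */
#define ACPI_GED_SLP_TYP_S5    0x05
/* SLP_EN: bit 5 */
#define ACPI_GED_SLP_ENABLE    0x20

/* Hypothetical helper: true if a register write requests S5. */
static bool ged_sleep_ctl_is_s5(uint64_t data)
{
    uint8_t slp_typ = (data & ACPI_GED_SLP_TYP_MASK) >> ACPI_GED_SLP_TYP_SHIFT;
    bool slp_en = !!(data & ACPI_GED_SLP_ENABLE);

    return slp_en && slp_typ == ACPI_GED_SLP_TYP_S5;
}
```

With one comment per macro, the existing "value for S5" comment sits only on
the value it describes.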

>  
>  #define GED_DEVICE  "GED"
>  #define AML_GED_EVT_REG "EREG"
> -- 
> 2.39.3




Re: [PATCH] docs: fix vhost-user protocol doc

2024-09-10 Thread Michael S. Tsirkin
On Sun, Sep 08, 2024 at 10:49:54PM +0800, luzhixing12345 wrote:
> >On Fri, Sep 06, 2024 at 10:10:45AM +0800, luzhixing12345 wrote:
> >> Hi, can someone help review this patch?
> >> 
> >> Signed-off-by: luzhixing12345 
> >
> >You got comments Aug 5, pls address them.
> 
> ok, the comments are addressed.

Pls post v2, list the comments addressed in the changelog,
coming after ---

Thanks!


> >On Sun, Aug 04, 2024 at 01:04:20PM GMT, luzhixing12345 wrote:
> >>add a ref link to Memory region description
> >>
> >>add extra type(64 bits) to Log description structure fields
> >>
> >>fix ’s to 's
> >>
> >>---
> >> docs/interop/vhost-user.rst | 22 +-
> >> 1 file changed, 13 insertions(+), 9 deletions(-)
> >
> >Please run `scripts/checkpatch.pl` before sending.
> >
> >S-o-b missing here.
> >
> >>
> >>diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> >>index d8419fd2f1..e34b305bd9 100644
> >>--- a/docs/interop/vhost-user.rst
> >>+++ b/docs/interop/vhost-user.rst
> >>@@ -167,6 +167,8 @@ A vring address description
> >> Note that a ring address is an IOVA if ``VIRTIO_F_IOMMU_PLATFORM`` has
> >> been negotiated. Otherwise it is a user address.
> >>
> >>+.. _memory_region_description:
> >>+
> >> Memory region description
> >> ^^^^^^^^^^^^^^^^^^^^^^^^^
> >>
> >>@@ -180,7 +182,7 @@ Memory region description
> >>
> >> :user address: a 64-bit user address
> >>
> >>-:mmap offset: 64-bit offset where region starts in the mapped memory
> >>+:mmap offset: a 64-bit offset where region starts in the mapped memory
> >>
> >> When the ``VHOST_USER_PROTOCOL_F_XEN_MMAP`` protocol feature has been
> >> successfully negotiated, the memory region description contains two extra
> >>@@ -190,7 +192,7 @@ fields at the end.
> >> | guest address | size | user address | mmap offset | xen mmap flags | domid |
> >> +---------------+------+--------------+-------------+----------------+-------+
> >>
> >>-:xen mmap flags: 32-bit bit field
> >>+:xen mmap flags: a 32-bit bit field
> >>
> >> - Bit 0 is set for Xen foreign memory mapping.
> >> - Bit 1 is set for Xen grant memory mapping.
> >>@@ -211,6 +213,8 @@ Single memory region description
> >>
> >> :padding: 64-bit
> >>
> >>+:region: :ref:`Memory region description <memory_region_description>`
> >>+
> >> A region is represented by Memory region description.
> >
> >Should we merge this line with the one added?
> 
> Descriptions about memory regions are merged into one line.
> 
> >>
> >> Multiple Memory regions description
> >
> >Should we extend also the Multiple Memory region description?
> 
> Yes, this patch adds a ref link to it.
> 
> >>@@ -233,9 +237,9 @@ Log description
> >> | log size | log offset |
> >> +----------+------------+
> >>
> >>-:log size: size of area used for logging
> >>+:log size: a 64-bit size of area used for logging
> >>
> >>-:log offset: offset from start of supplied file descriptor where
> >>+:log offset: a 64-bit offset from start of supplied file descriptor where
> >>  logging starts (i.e. where guest address 0 would be
> >>  logged)
> >>
> >>@@ -382,7 +386,7 @@ the kernel implementation.
> >>
> >> The communication consists of the *front-end* sending message requests and
> >> the *back-end* sending message replies. Most of the requests don't require
> >>-replies. Here is a list of the ones that do:
> >>+replies, except for the following requests:
> >>
> >> * ``VHOST_USER_GET_FEATURES``
> >> * ``VHOST_USER_GET_PROTOCOL_FEATURES``
> >>@@ -1239,11 +1243,11 @@ Front-end message types
> >>   (*a vring descriptor index for split virtqueues* vs. *vring descriptor
> >>   indices for packed virtqueues*).
> >>
> >>-  When and as long as all of a device's vrings are stopped, it is
> >>+  When and as long as all of a device's vrings are stopped, it is
> >>   *suspended*, see :ref:`Suspended device state
> >>   `.
> >>
> >>-  The request payload's *num* field is currently reserved and must be
> >>+  The request payload's *num* field is currently reserved and must be
> >>   set to 0.
> >>
> >> ``VHOST_USER_SET_VRING_KICK``
> >>@@ -1662,7 +1666,7 @@ Front-end message types
> >>   :reply payload: ``u64``
> >>
> >>   Front-end and back-end negotiate a channel over which to transfer the
> >>-  back-end's internal state during migration.  Either side (front-end or
> >>+  back-end's internal state during migration.  Either side (front-end or
> >>   back-end) may create the channel.  The nature of this channel is not
> >>   restricted or defined in this document, but whichever side creates it
> >>   must create a file descriptor that is provided to the respectively
> >>@@ -1714,7 +1718,7 @@ Front-end message types
> >>   :request payload: N/A
> >>   :reply payload: ``u64``
> >>
> >>-  After transferring the back-end's internal state during migration (see
> >>+  After transferring the back-end's internal state during migration (see
> >>   the :ref:`Migrating back-end state `
> >>   section), check whether the back-end was able to successfully fully

Re: [PATCH] docs/devel: Prohibit calling object_unparent() for memory region

2024-09-10 Thread Michael S. Tsirkin
On Thu, Aug 29, 2024 at 02:46:48PM +0900, Akihiko Odaki wrote:
> Previously it was allowed to call object_unparent() for a memory region
> in instance_finalize() of its parent. However, such a call typically
> has no effect because child objects get unparented before
> instance_finalize().
> 
> Worse, memory regions typically get finalized when they get unparented
> before instance_finalize(). This means calling object_unparent() for
> them in instance_finalize() is to call the function for an object
> already finalized, which should be avoided.
> 
> Signed-off-by: Akihiko Odaki 


Acked-by: Michael S. Tsirkin 

who's applying this? Paolo?

> ---
>  docs/devel/memory.rst | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/docs/devel/memory.rst b/docs/devel/memory.rst
> index 69c5e3f914ac..83760279e3db 100644
> --- a/docs/devel/memory.rst
> +++ b/docs/devel/memory.rst
> @@ -168,11 +168,10 @@ and VFIOQuirk in hw/vfio/pci.c.
>  
>  You must not destroy a memory region as long as it may be in use by a
>  device or CPU.  In order to do this, as a general rule do not create or
> -destroy memory regions dynamically during a device's lifetime, and only
> -call object_unparent() in the memory region owner's instance_finalize
> -callback.  The dynamically allocated data structure that contains the
> -memory region then should obviously be freed in the instance_finalize
> -callback as well.
> +destroy memory regions dynamically during a device's lifetime, and do not
> +call object_unparent().  The dynamically allocated data structure that 
> contains
> +the memory region then should be freed in the instance_finalize callback, 
> which
> +is called after it gets unparented.
>  
>  If you break this rule, the following situation can happen:
>  
> @@ -199,8 +198,9 @@ but nevertheless it is used in a few places.
>  
>  For regions that "have no owner" (NULL is passed at creation time), the
>  machine object is actually used as the owner.  Since instance_finalize is
> -never called for the machine object, you must never call object_unparent
> -on regions that have no owner, unless they are aliases or containers.
> +never called for the machine object, you must never free regions that have no
> +owner, unless they are aliases or containers, which you can manually call
> +object_unparent() for.
>  
>  
>  Overlapping regions and priority
> 
> ---
> base-commit: 31669121a01a14732f57c49400bc239cf9fd505f
> change-id: 20240829-memory-cfd3ee0af44d
> 
> Best regards,
> -- 
> Akihiko Odaki 




Re: [PATCH v5 0/3] Upgrade ACPI SPCR table to support SPCR table revision 4 format

2024-09-10 Thread Michael S. Tsirkin
On Wed, Aug 28, 2024 at 06:59:17PM -0700, Sia Jee Heng wrote:
> Update the SPCR table to accommodate the SPCR Table revision 4 [1].
> The SPCR table has been modified to adhere to the revision 4 format [2].
> 
> Meanwhile, the virt SPCR golden reference file for RISC-V has been updated to
> accommodate the SPCR Table revision 4.
> 
> [1]: 
> https://learn.microsoft.com/en-us/windows-hardware/drivers/serports/serial-port-console-redirection-table
> [2]: https://github.com/acpica/acpica/pull/931

Seems most appropriate for the riscv tree.
The code looks ok.

Reviewed-by: Michael S. Tsirkin 


> Changes in v5:
> - Reverted the SPCR table revision history for the ARM architecture.
> - Corrected the output of the SPCR Table diff.
> 
> Changes in v4:
> - Remove the SPCR table revision 4 update for the ARM architecture.
> 
> Changes in v3:
> - Rebased on the latest QEMU.
> - Added Acked-by: Alistair Francis 
> 
> Changes in v2:
> - Utilizes a three-patch approach to modify the ACPI pre-built binary
>   files required by the Bios-Table-Test.
> - Rebases and incorporates changes to support both ARM and RISC-V ACPI
>   pre-built binary files.
> 
> Sia Jee Heng (3):
>   qtest: allow SPCR acpi table changes
>   hw/acpi: Upgrade ACPI SPCR table to support SPCR table revision 4
> format
>   tests/qtest/bios-tables-test: Update virt SPCR golden reference for
> RISC-V
> 
>  hw/acpi/aml-build.c   |  20 
>  hw/arm/virt-acpi-build.c  |   8 ++--
>  hw/riscv/virt-acpi-build.c|  12 +---
>  include/hw/acpi/acpi-defs.h   |   7 +--
>  include/hw/acpi/aml-build.h   |   2 +-
>  tests/data/acpi/riscv64/virt/SPCR | Bin 80 -> 90 bytes
>  6 files changed, 37 insertions(+), 12 deletions(-)
> 
> 
> base-commit: cec99171931ea79215c79661d33423ac84e63b6e
> -- 
> 2.34.1




Re: [PATCH] MAINTAINERS: Add myself as a reviewer of VT-d

2024-09-10 Thread Michael S. Tsirkin
On Tue, Aug 20, 2024 at 09:51:47AM +, CLEMENT MATHIEU--DRIF wrote:
> Signed-off-by: Clément Mathieu--Drif 

Using index info to reconstruct a base tree...
error: patch failed: MAINTAINERS:3672
error: MAINTAINERS: patch does not apply
error: Did you hand edit your patch?
It does not apply to blobs recorded in its index.

> ---
>  MAINTAINERS | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 3584d6a6c6..b12973f595 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3672,6 +3672,7 @@ VT-d Emulation
>  M: Michael S. Tsirkin 
>  R: Jason Wang 
>  R: Yi Liu 
> +R: Clément Mathieu--Drif 
>  S: Supported
>  F: hw/i386/intel_iommu.c
>  F: hw/i386/intel_iommu_internal.h


I did it manually, for once.

> -- 
> 2.45.2




Re: [PATCH v2 0/2] Postcopy migration and vhost-user errors

2024-09-10 Thread Michael S. Tsirkin
On Wed, Aug 28, 2024 at 03:39:12PM +0530, Prasad Pandit wrote:
> From: Prasad Pandit 
> 
> Hello,
> 
> * virsh(1) offers multiple options to initiate Postcopy migration:
> 
> 1) virsh migrate --postcopy --postcopy-after-precopy
> 2) virsh migrate --postcopy + virsh migrate-postcopy
> 3) virsh migrate --postcopy --timeout  --timeout-postcopy
> 
> When Postcopy migration is invoked via method (2) or (3) above,
> the migrated guest on the destination host hangs sometimes.
> 
> * During Postcopy migration, multiple threads are spawned on the destination
> host to start the guest and setup devices. One such thread starts vhost
> device via vhost_dev_start() function and another called fault_thread handles
> page faults in user space using kernel's userfaultfd(2) system.
> 
> * When fault_thread exits upon completion of Postcopy migration, it sends a
> 'postcopy_end' message to the vhost-user device. But sometimes 'postcopy_end'
> message is sent while vhost device is being setup via vhost_dev_start().
> 
>  Thread-1                                Thread-2
> 
> vhost_dev_start                          postcopy_ram_incoming_cleanup
>  vhost_device_iotlb_miss                  postcopy_notify
>   vhost_backend_update_device_iotlb        vhost_user_postcopy_notifier
>    vhost_user_send_device_iotlb_msg         vhost_user_postcopy_end
>     process_message_reply                    process_message_reply
>      vhost_user_read                          vhost_user_read
>       vhost_user_read_header                   vhost_user_read_header
>        "Fail to update device iotlb"            "Failed to receive reply to postcopy_end"
> 
> This creates confusion when vhost device receives 'postcopy_end' message while
> it is still trying to update IOTLB entries.
> 
> This seems to leave the guest in a stranded/hung state because fault_thread
> has exited saying Postcopy migration has ended, but vhost-device is probably
> still expecting updates. QEMU logs the following errors on the destination host:
> ===
> ...
> qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header. 
> Flags 0x0 instead of 0x5.
> qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
> qemu-kvm: vhost_user_postcopy_end: 700871,700900: Failed to receive reply to 
> postcopy_end
> qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header. 
> Flags 0x0 instead of 0x5.
> qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
> qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header. 
> Flags 0x8 instead of 0x5.
> qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
> qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header. 
> Flags 0x16 instead of 0x5.
> qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
> qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header. 
> Flags 0x0 instead of 0x5.
> qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
> ===


So are we going to see a version with BQL?
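
Whichever lock ends up being used (a dedicated one as in patch 2/2, or the
BQL), the property needed is that sending a request and reading its reply form
one critical section, so two threads cannot pick up each other's replies. A
minimal sketch with plain pthreads follows. DemoChannel and the demo_* helpers
are hypothetical stand-ins, not vhost-user code, and the "wire" is modelled as
a single slot.

```c
#include <pthread.h>
#include <stdint.h>

/* Stand-in for the shared vhost-user socket. */
typedef struct DemoChannel {
    pthread_mutex_t lock;   /* held across send + matching reply read */
    uint32_t pending;       /* request currently "on the wire" */
} DemoChannel;

typedef struct DemoWorker {
    DemoChannel *chan;
    uint32_t req;           /* request id this thread keeps sending */
    int errors;             /* replies that did not match the request */
} DemoWorker;

/* Send a request and read its reply as one critical section. */
static int demo_request(DemoChannel *c, uint32_t req)
{
    uint32_t reply;

    pthread_mutex_lock(&c->lock);
    c->pending = req;       /* "send" */
    reply = c->pending;     /* "read reply": echoes the pending request */
    pthread_mutex_unlock(&c->lock);

    return reply == req ? 0 : -1;
}

static void *demo_worker(void *opaque)
{
    DemoWorker *w = opaque;

    for (int i = 0; i < 100000; i++) {
        if (demo_request(w->chan, w->req) != 0) {
            w->errors++;
        }
    }
    return NULL;
}

/* Run two requesters concurrently; returns the number of mismatched replies. */
static int demo_run(void)
{
    DemoChannel chan = { .pending = 0 };
    DemoWorker a = { &chan, 1, 0 };   /* think: iotlb update */
    DemoWorker b = { &chan, 2, 0 };   /* think: postcopy_end */
    pthread_t ta, tb;

    pthread_mutex_init(&chan.lock, NULL);
    pthread_create(&ta, NULL, demo_worker, &a);
    pthread_create(&tb, NULL, demo_worker, &b);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    pthread_mutex_destroy(&chan.lock);

    return a.errors + b.errors;
}
```

Dropping the lock from demo_request() is what would let one thread's read
observe the other thread's request, which is the shape of the "Flags 0x0
instead of 0x5" errors quoted above.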

> * Couple of patches here help to fix/handle these errors.
> 
> Thank you.
> ---
> Prasad Pandit (2):
>   vhost: fail device start if iotlb update fails
>   vhost-user: add a request-reply lock
> 
>  hw/virtio/vhost-user.c | 74 ++
>  hw/virtio/vhost.c  |  6 ++-
>  include/hw/virtio/vhost-user.h |  3 ++
>  3 files changed, 82 insertions(+), 1 deletion(-)
> 
> --
> 2.46.0




Re: [PATCH] softmmu: Support concurrent bounce buffers

2024-09-10 Thread Michael S. Tsirkin
On Tue, Sep 10, 2024 at 06:10:50PM +0200, Mattias Nissler wrote:
> On Tue, Sep 10, 2024 at 5:44 PM Peter Maydell  
> wrote:
> >
> > On Tue, 10 Sept 2024 at 15:53, Michael S. Tsirkin  wrote:
> > >
> > > On Mon, Aug 19, 2024 at 06:54:54AM -0700, Mattias Nissler wrote:
> > > > When DMA memory can't be directly accessed, as is the case when
> > > > running the device model in a separate process without shareable DMA
> > > > file descriptors, bounce buffering is used.
> > > >
> > > > It is not uncommon for device models to request mapping of several DMA
> > > > regions at the same time. Examples include:
> > > >  * net devices, e.g. when transmitting a packet that is split across
> > > >several TX descriptors (observed with igb)
> > > >  * USB host controllers, when handling a packet with multiple data TRBs
> > > >(observed with xhci)
> > > >
> > > > Previously, qemu only provided a single bounce buffer per AddressSpace
> > > > and would fail DMA map requests while the buffer was already in use. In
> > > > turn, this would cause DMA failures that ultimately manifest as hardware
> > > > errors from the guest perspective.
> > > >
> > > > This change allocates DMA bounce buffers dynamically instead of
> > > > supporting only a single buffer. Thus, multiple DMA mappings work
> > > > correctly also when RAM can't be mmap()-ed.
> > > >
> > > > The total bounce buffer allocation size is limited individually for each
> > > > AddressSpace. The default limit is 4096 bytes, matching the previous
> > > > maximum buffer size. A new x-max-bounce-buffer-size parameter is
> > > > provided to configure the limit for PCI devices.
> > > >
> > > > Signed-off-by: Mattias Nissler 
> > > > Reviewed-by: Philippe Mathieu-Daudé 
> > > > Acked-by: Peter Xu 
> > > > ---
> > > > This patch is split out from my "Support message-based DMA in vfio-user 
> > > > server"
> > > > series. With the series having been partially applied, I'm splitting 
> > > > this one
> > > > out as the only remaining patch to system emulation code in the hope to
> > > > simplify getting it landed. The code has previously been reviewed by 
> > > > Stefan
> > > > Hajnoczi and Peter Xu. This latest version includes changes to switch 
> > > > the
> > > > bounce buffer size bookkeeping to `size_t` as requested and LGTM'd by 
> > > > Phil in
> > > > v9.
> > > > ---
> > > >  hw/pci/pci.c|  8 
> > > >  include/exec/memory.h   | 14 +++
> > > >  include/hw/pci/pci_device.h |  3 ++
> > > >  system/memory.c |  5 ++-
> > > >  system/physmem.c| 82 ++---
> > > >  5 files changed, 76 insertions(+), 36 deletions(-)
> > > >
> > > > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> > > > index fab86d0567..d2caf3ee8b 100644
> > > > --- a/hw/pci/pci.c
> > > > +++ b/hw/pci/pci.c
> > > > @@ -85,6 +85,8 @@ static Property pci_props[] = {
> > > >  QEMU_PCIE_ERR_UNC_MASK_BITNR, true),
> > > >  DEFINE_PROP_BIT("x-pcie-ari-nextfn-1", PCIDevice, cap_present,
> > > >  QEMU_PCIE_ARI_NEXTFN_1_BITNR, false),
> > > > +DEFINE_PROP_SIZE32("x-max-bounce-buffer-size", PCIDevice,
> > > > + max_bounce_buffer_size, 
> > > > DEFAULT_MAX_BOUNCE_BUFFER_SIZE),
> > > >  DEFINE_PROP_END_OF_LIST()
> > > >  };
> > > >
> > >
> > > I'm a bit puzzled by now there being two fields named
> > > max_bounce_buffer_size, one directly controllable by
> > > a property.
> 
> One is on the PCI device, the other is on the address space. The
> former can be set via a command line parameter, and that value is used
> to initialize the field on the address space, which is then consulted
> when allocating bounce buffers.
> 
> I'm not sure which aspect of this is unclear and/or deserves
> additional commenting - let me know and I'll be happy to send a patch.

I'd document what each field does.
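
Something along these lines, as a sketch. The structs below are simplified
stand-ins (DemoPciDevice, DemoAddressSpace and the demo_* helpers are made up
for illustration), the relationship between the two fields is taken from the
explanation quoted above, and the real code would need atomic bookkeeping that
is left out here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the address space. */
typedef struct DemoAddressSpace {
    /* Limit on the total size of all live bounce buffers in this AS. */
    size_t max_bounce_buffer_size;
    /* Bytes currently allocated to bounce buffers in this AS. */
    size_t bounce_buffer_size;
} DemoAddressSpace;

/* Simplified stand-in for the PCI device. */
typedef struct DemoPciDevice {
    /*
     * Value of the "x-max-bounce-buffer-size" property. Only used to
     * initialize bus_master_as.max_bounce_buffer_size; the limit that is
     * checked when mapping lives in the address space.
     */
    uint32_t max_bounce_buffer_size;
    DemoAddressSpace bus_master_as;
} DemoPciDevice;

/* Copy the per-device knob into the per-address-space limit. */
static void demo_pci_init_as(DemoPciDevice *d)
{
    d->bus_master_as.max_bounce_buffer_size = d->max_bounce_buffer_size;
    d->bus_master_as.bounce_buffer_size = 0;
}

/* Account for a new bounce buffer; fails if the limit would be exceeded. */
static bool demo_bounce_reserve(DemoAddressSpace *as, size_t len)
{
    if (len > as->max_bounce_buffer_size - as->bounce_buffer_size) {
        return false;
    }
    as->bounce_buffer_size += len;
    return true;
}

/* Return a bounce buffer's bytes to the budget. */
static void demo_bounce_release(DemoAddressSpace *as, size_t len)
{
    as->bounce_buffer_size -= len;
}
```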

> > >
> > > Pls add code comments explaining how they are related.
> > >
> > >
> > > Also, what is the point of adding a property without
> > > making it part of an API? No one will be able to rely on
> > > it working.
> >
> > Note that this patch is already upstream as commit 637b0aa13.
> >
> > thanks
> > -- PMM

Maybe you can answer this?




Re: [PATCH for-9.2] hw: add compat machines for 9.2

2024-09-10 Thread Michael S. Tsirkin
On Thu, Sep 05, 2024 at 08:05:14PM +0100, Peter Maydell wrote:
> On Thu, 5 Sept 2024 at 19:22, Daniel P. Berrangé  wrote:
> >
> > On Fri, Aug 16, 2024 at 11:47:16AM +0100, Daniel P. Berrangé wrote:
> > > On Fri, Aug 16, 2024 at 12:37:23PM +0200, Cornelia Huck wrote:
> > > > Add 9.2 machine types for arm/i440fx/m68k/q35/s390x/spapr.
> > > >
> > > > Signed-off-by: Cornelia Huck 
> > > > ---
> > > >  hw/arm/virt.c  |  9 -
> > > >  hw/core/machine.c  |  3 +++
> > > >  hw/i386/pc.c   |  3 +++
> > > >  hw/i386/pc_piix.c  | 15 ---
> > > >  hw/i386/pc_q35.c   | 13 +++--
> > > >  hw/m68k/virt.c |  9 -
> > > >  hw/ppc/spapr.c | 15 +--
> > > >  hw/s390x/s390-virtio-ccw.c | 14 +-
> > > >  include/hw/boards.h|  3 +++
> > > >  include/hw/i386/pc.h   |  3 +++
> > > >  10 files changed, 77 insertions(+), 10 deletions(-)
> > >
> > > Reviewed-by: Daniel P. Berrangé 
> > >
> > >
> > > > diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
> > > > index d9e69243b4a7..746bfe05d386 100644
> > > > --- a/hw/i386/pc_piix.c
> > > > +++ b/hw/i386/pc_piix.c
> > > > @@ -479,13 +479,24 @@ static void 
> > > > pc_i440fx_machine_options(MachineClass *m)
> > > >   "Use a different south bridge 
> > > > than PIIX3");
> > > >  }
> > > >
> > > > -static void pc_i440fx_machine_9_1_options(MachineClass *m)
> > > > +static void pc_i440fx_machine_9_2_options(MachineClass *m)
> > > >  {
> > > >  pc_i440fx_machine_options(m);
> > > >  m->alias = "pc";
> > > >  m->is_default = true;
> > > >  }
> > > >
> > > > +DEFINE_I440FX_MACHINE(9, 2);
> > > > +
> > > > +static void pc_i440fx_machine_9_1_options(MachineClass *m)
> > > > +{
> > > > +pc_i440fx_machine_9_2_options(m);
> > > > +m->alias = NULL;
> > > > +m->is_default = false;
> > > > +compat_props_add(m->compat_props, hw_compat_9_1, 
> > > > hw_compat_9_1_len);
> > > > +compat_props_add(m->compat_props, pc_compat_9_1, 
> > > > pc_compat_9_1_len);
> > > > +}
> > > > +
> > > >  DEFINE_I440FX_MACHINE(9, 1);
> > > >
> > > >  static void pc_i440fx_machine_9_0_options(MachineClass *m)
> > > > @@ -493,8 +504,6 @@ static void 
> > > > pc_i440fx_machine_9_0_options(MachineClass *m)
> > > >  PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
> > > >
> > > >  pc_i440fx_machine_9_1_options(m);
> > > > -m->alias = NULL;
> > > > -m->is_default = false;
> > > >  m->smbios_memory_device_size = 16 * GiB;
> > >
> > > Feels like we should be adding an "_AS_LATEST" macro
> > > variant for piix/q35 too, so it matches the pattern
> > > in other targets for handling alias & is_default.
> > >
> > > Not a thing your patch needs to do though.
> >
> > I've just a patch that does that now. If it looks good & you want to include
> > it as a pre-requisite for your patch here feel free to grab, otherwise I can
> > rebase it after your patch merges.
> 
> I have this patch in my target-arm pullreq that's currently posted
> and pending merge, by the way.
> 
> -- PMM

Ok feel free to tack on

Reviewed-by: Michael S. Tsirkin 
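
For readers following the piix hunk above, the chaining it relies on can be
modelled in a few lines. This is a toy sketch: DemoMachineClass and the
demo_* functions are invented stand-ins, and compat_sets merely counts how
many compat property arrays were applied.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for MachineClass. */
typedef struct DemoMachineClass {
    const char *alias;
    bool is_default;
    int compat_sets;    /* number of compat property arrays applied */
} DemoMachineClass;

/* Newest version: owns the "pc" alias and is the default. */
static void demo_machine_9_2_options(DemoMachineClass *m)
{
    m->alias = "pc";
    m->is_default = true;
}

/* Previous version: inherit from the newest, then undo "latest" status. */
static void demo_machine_9_1_options(DemoMachineClass *m)
{
    demo_machine_9_2_options(m);
    m->alias = NULL;
    m->is_default = false;
    m->compat_sets += 2;    /* hw_compat_9_1 + pc_compat_9_1 */
}

/* Older versions chain to 9.1, so they no longer clear alias themselves. */
static void demo_machine_9_0_options(DemoMachineClass *m)
{
    demo_machine_9_1_options(m);
}
```

Each release therefore moves the two "undo" lines one version down, which is
the boilerplate an _AS_LATEST style macro would absorb.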





Re: [PATCH for-9.2 v15 00/11] hw/pci: SR-IOV related fixes and improvements

2024-09-10 Thread Michael S. Tsirkin
On Tue, Sep 10, 2024 at 04:13:14PM +0200, Cédric Le Goater wrote:
> On 9/10/24 15:34, Michael S. Tsirkin wrote:
> > On Tue, Sep 10, 2024 at 03:21:54PM +0200, Cédric Le Goater wrote:
> > > On 9/10/24 11:33, Akihiko Odaki wrote:
> > > > On 2024/09/10 18:21, Michael S. Tsirkin wrote:
> > > > > On Fri, Aug 23, 2024 at 02:00:37PM +0900, Akihiko Odaki wrote:
> > > > > > Supersedes: <20240714-rombar-v2-0-af1504ef5...@daynix.com>
> > > > > > ("[PATCH v2 0/4] hw/pci: Convert rom_bar into OnOffAuto")
> > > > > > 
> > > > > > I submitted a RFC series[1] to add support for SR-IOV emulation to
> > > > > > virtio-net-pci. During the development of the series, I fixed some
> > > > > > trivial bugs and made improvements that I think are independently
> > > > > > useful. This series extracts those fixes and improvements from the 
> > > > > > RFC
> > > > > > series.
> > > > > > 
> > > > > > [1]: 
> > > > > > https://patchew.org/QEMU/20231210-sriov-v2-0-b959e8a6d...@daynix.com/
> > > > > > 
> > > > > > Signed-off-by: Akihiko Odaki 
> > > > > 
> > > > > I don't think Cédric's issues have been addressed, am I wrong?
> > > > > Cédric, what is your take?
> > > > 
> > > > I put the URI to Cédric's report here:
> > > > https://lore.kernel.org/r/75cbc7d9-b48e-4235-85cf-49dacf3c7...@redhat.com
> > > > 
> > > > This issue was dealt with patch "s390x/pci: Check for multifunction 
> > > > after device realization". I found that s390x on QEMU does not support 
> > > > multifunction and SR-IOV devices accidentally circumvent this 
> > > > restriction, which means igb was never supposed to work with s390x. The 
> > > > patch prevents adding SR-IOV devices to s390x to ensure the restriction 
> > > > is properly enforced.
> > > 
> > > yes, indeed and it seems to fix :
> > > 
> > >6069bcdeacee ("s390x/pci: Move some hotplug checks to the pre_plug 
> > > handler")
> > > 
> > > I will update patch 4.
> > > 
> > > 
> > > Thanks,
> > > 
> > > C.
> > > 
> > > 
> > > That said, the igb device worked perfectly fine under the s390x machine.
> > 
> > And it works for you after this patchset, yes?
> 
> ah no, IGB is not an available device for the s390x machine anymore :
> 
>   qemu-system-s390x: -device igb,netdev=net1,mac=C0:FF:EE:00:00:13: 
> multifunction not supported in s390


So patch 4 didn't really help.


> This is what commit 57da367b9ec4 ("s390x/pci: forbid multifunction
> pci device") initially required (and later broken by 6069bcdeacee).
> So I guess we are fine with the expected behavior.
> 
> Thanks,
> 
> C.

Better to fix than to guess if there are users, I think.

-- 
MST




Re: [PATCH] vhost-user: rewrite vu_dispatch with if-else

2024-09-10 Thread Michael S. Tsirkin
On Sun, Aug 04, 2024 at 10:23:53PM +0800, luzhixing12345 wrote:
> rewrite with if-else instead of goto
> 
> and I have a question, in two incorrect cases
> 
> - need reply but no reply_requested
> - no need reply but has reply_requested
> 
> should we call vu_panic or print warning message?

this is not how you post a patch to the list.


> ---
>  subprojects/libvhost-user/libvhost-user.c | 39 +--
>  subprojects/libvhost-user/libvhost-user.h |  6 ++--
>  2 files changed, 27 insertions(+), 18 deletions(-)
> 
> diff --git a/subprojects/libvhost-user/libvhost-user.c 
> b/subprojects/libvhost-user/libvhost-user.c
> index 9c630c2170..187e25f9bb 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -2158,32 +2158,39 @@ vu_dispatch(VuDev *dev)
>  {
>  VhostUserMsg vmsg = { 0, };
>  int reply_requested;
> -bool need_reply, success = false;
> +bool need_reply, success = true;
>  
>  if (!dev->read_msg(dev, dev->sock, &vmsg)) {
> -goto end;
> +success = false;
> +free(vmsg.data);
> +return success;
>  }
>  
>  need_reply = vmsg.flags & VHOST_USER_NEED_REPLY_MASK;
>  
>  reply_requested = vu_process_message(dev, &vmsg);
> -if (!reply_requested && need_reply) {
> -vmsg_set_reply_u64(&vmsg, 0);
> -reply_requested = 1;
> -}
> -
> -if (!reply_requested) {
> -success = true;
> -goto end;
> -}
>  
> -if (!vu_send_reply(dev, dev->sock, &vmsg)) {
> -goto end;
> +if (need_reply) {
> +if (reply_requested) {
> +if (!vu_send_reply(dev, dev->sock, &vmsg)) {
> +success = false;
> +}
> +} else {
> +// need reply but no reply requested, return 0(u64)
> +vmsg_set_reply_u64(&vmsg, 0);
> +if (!vu_send_reply(dev, dev->sock, &vmsg)) {
> +success = false;
> +}
> +}
> +} else {
> +// no need reply but reply requested, send a reply
> +if (reply_requested) {
> +if (!vu_send_reply(dev, dev->sock, &vmsg)) {
> +success = false;
> +}
> +}
>  }
>  
> -success = true;
> -
> -end:
>  free(vmsg.data);
>  return success;
>  }
> diff --git a/subprojects/libvhost-user/libvhost-user.h 
> b/subprojects/libvhost-user/libvhost-user.h
> index deb40e77b3..2daf8578f6 100644
> --- a/subprojects/libvhost-user/libvhost-user.h
> +++ b/subprojects/libvhost-user/libvhost-user.h
> @@ -238,6 +238,8 @@ typedef struct VuDev VuDev;
>  
>  typedef uint64_t (*vu_get_features_cb) (VuDev *dev);
>  typedef void (*vu_set_features_cb) (VuDev *dev, uint64_t features);
> +typedef uint64_t (*vu_get_protocol_features_cb) (VuDev *dev);
> +typedef void (*vu_set_protocol_features_cb) (VuDev *dev, uint64_t features);
>  typedef int (*vu_process_msg_cb) (VuDev *dev, VhostUserMsg *vmsg,
>int *do_reply);
>  typedef bool (*vu_read_msg_cb) (VuDev *dev, int sock, VhostUserMsg *vmsg);
> @@ -256,9 +258,9 @@ typedef struct VuDevIface {
>  vu_set_features_cb set_features;
>  /* get the protocol feature bitmask from the underlying vhost
>   * implementation */
> -vu_get_features_cb get_protocol_features;
> +vu_get_protocol_features_cb get_protocol_features;
>  /* enable protocol features in the underlying vhost implementation. */
> -vu_set_features_cb set_protocol_features;
> +vu_set_protocol_features_cb set_protocol_features;
>  /* process_msg is called for each vhost-user message received */
>  /* skip libvhost-user processing if return value != 0 */
>  vu_process_msg_cb process_msg;
> -- 
> 2.34.1
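
For comparison, the four need_reply/reply_requested combinations in the
if-else version collapse to the two decisions the removed lines already made.
A standalone sketch, where demo_should_reply() is a hypothetical helper and
the payload pointer stands in for the message body:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether a reply goes out.
 *  - handler produced a reply               -> send it unchanged
 *  - peer asked for an ack, handler silent  -> send u64 0 as the ack
 *  - neither                                -> send nothing
 */
static bool demo_should_reply(bool need_reply, bool reply_requested,
                              uint64_t *payload)
{
    if (need_reply && !reply_requested) {
        *payload = 0;           /* like vmsg_set_reply_u64(&vmsg, 0) */
        reply_requested = true;
    }
    return reply_requested;
}
```

Written this way there is a single send site, so the success/failure handling
does not need to be repeated in three branches.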




Re: [PATCH] vhost-user: add NEED_REPLY flag

2024-09-10 Thread Michael S. Tsirkin
On Sun, Aug 04, 2024 at 11:48:59PM +0800, luzhixing12345 wrote:
> Front-end message requests that need a reply should set NEED_REPLY_MASK
> in flags, and the response from the slave needs to clear NEED_REPLY_MASK.


neither this.

> ---
>  hw/virtio/vhost-user.c| 2 +-
>  subprojects/libvhost-user/libvhost-user.c | 1 +
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 00561daa06..edf2271e0a 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -1082,7 +1082,7 @@ static int vhost_user_get_u64(struct vhost_dev *dev, 
> int request, uint64_t *u64)
>  int ret;
>  VhostUserMsg msg = {
>  .hdr.request = request,
> -.hdr.flags = VHOST_USER_VERSION,
> +.hdr.flags = VHOST_USER_VERSION | VHOST_USER_NEED_REPLY_MASK,
>  };
>  
>  if (vhost_user_per_device_request(request) && dev->vq_index != 0) {
> diff --git a/subprojects/libvhost-user/libvhost-user.c 
> b/subprojects/libvhost-user/libvhost-user.c
> index 9c630c2170..40f665bd7f 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -667,6 +667,7 @@ vu_send_reply(VuDev *dev, int conn_fd, VhostUserMsg *vmsg)
>  {
>  /* Set the version in the flags when sending the reply */
>  vmsg->flags &= ~VHOST_USER_VERSION_MASK;
> +vmsg->flags &= ~VHOST_USER_NEED_REPLY_MASK;
>  vmsg->flags |= VHOST_USER_VERSION;
>  vmsg->flags |= VHOST_USER_REPLY_MASK;
>  
> -- 
> 2.34.1
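
The effect of the libvhost-user hunk on the header flags can be checked in
isolation. In this sketch the DEMO_* constants are assumed to mirror the
vhost-user header layout (version in bits 0-1, reply in bit 2, need_reply in
bit 3), and demo_reply_flags() is a made-up helper rather than library code.

```c
#include <stdint.h>

#define DEMO_VERSION          0x1u          /* protocol version */
#define DEMO_VERSION_MASK     0x3u          /* bits 0-1 */
#define DEMO_REPLY_MASK       (0x1u << 2)   /* message is a reply */
#define DEMO_NEED_REPLY_MASK  (0x1u << 3)   /* sender wants a reply */

/* Turn the flags of a received request into the flags of its reply. */
static uint32_t demo_reply_flags(uint32_t request_flags)
{
    uint32_t flags = request_flags;

    flags &= ~DEMO_VERSION_MASK;
    flags &= ~DEMO_NEED_REPLY_MASK;     /* the line the patch adds */
    flags |= DEMO_VERSION;
    flags |= DEMO_REPLY_MASK;
    return flags;
}
```

A request carrying version | need_reply (0x9) then yields 0x5, the same value
a request without need_reply produces.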




Re: [PATCH v2] hw/acpi/ich9: Add periodic and swsmi timer

2024-09-10 Thread Michael S. Tsirkin
On Wed, Jul 31, 2024 at 10:28:34AM +0200, Dominic Prinz wrote:
> This patch implements the periodic and the swsmi ICH9 chipset timer. They are
> especially useful when prototyping UEFI firmware (e.g. with EDK2's OVMF)
> using QEMU.
> 
> For backwards compatibility, the compat properties "x-smi-swsmi-timer",
> and "x-smi-periodic-timer" are introduced.
> 
> Additionally, writes to the SMI_STS register are enabled for the
> corresponding two bits.
> 
> Signed-off-by: Dominic Prinz 
> ---
> Changes since previous version:
>   - Ensured backwards compatibility by introducing two compat properties
>   - Introduced write mask for SMI_STS register to make future work easier
> 
>  hw/acpi/ich9.c| 23 +
>  hw/acpi/ich9_timer.c  | 93 +++
>  hw/acpi/meson.build   |  2 +-
>  hw/i386/pc.c  |  2 +
>  hw/isa/lpc_ich9.c | 14 ++
>  include/hw/acpi/ich9.h|  6 +++
>  include/hw/acpi/ich9_timer.h  | 23 +
>  include/hw/southbridge/ich9.h |  4 ++
>  8 files changed, 166 insertions(+), 1 deletion(-)
>  create mode 100644 hw/acpi/ich9_timer.c
>  create mode 100644 include/hw/acpi/ich9_timer.h
> 
> diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c
> index 02d8546bd3..c15e5b8281 100644
> --- a/hw/acpi/ich9.c
> +++ b/hw/acpi/ich9.c
> @@ -35,6 +35,7 @@
>  #include "sysemu/runstate.h"
>  #include "hw/acpi/acpi.h"
>  #include "hw/acpi/ich9_tco.h"
> +#include "hw/acpi/ich9_timer.h"
>  
>  #include "hw/southbridge/ich9.h"
>  #include "hw/mem/pc-dimm.h"
> @@ -108,6 +109,18 @@ static void ich9_smi_writel(void *opaque, hwaddr addr, 
> uint64_t val,
>  }
>  pm->smi_en &= ~pm->smi_en_wmask;
>  pm->smi_en |= (val & pm->smi_en_wmask);
> +if (pm->swsmi_timer_enabled) {
> +ich9_pm_update_swsmi_timer(pm, pm->smi_en &
> +   ICH9_PMIO_SMI_EN_SWSMI_EN);
> +}
> +if (pm->periodic_timer_enabled) {
> +ich9_pm_update_periodic_timer(pm, pm->smi_en &
> +  
> ICH9_PMIO_SMI_EN_PERIODIC_EN);
> +}
> +break;
> +case 4:
> +pm->smi_sts &= ~pm->smi_sts_wmask;
> +pm->smi_sts |= (val & pm->smi_sts_wmask);
>  break;
>  }
>  }
> @@ -286,6 +299,8 @@ static void pm_powerdown_req(Notifier *n, void *opaque)
>  
>  void ich9_pm_init(PCIDevice *lpc_pci, ICH9LPCPMRegs *pm, qemu_irq sci_irq)
>  {
> +pm->smi_sts_wmask = 0;
> +
>  memory_region_init(&pm->io, OBJECT(lpc_pci), "ich9-pm", ICH9_PMIO_SIZE);
>  memory_region_set_enabled(&pm->io, false);
>  memory_region_add_subregion(pci_address_space_io(lpc_pci),
> @@ -305,6 +320,14 @@ void ich9_pm_init(PCIDevice *lpc_pci, ICH9LPCPMRegs *pm, 
> qemu_irq sci_irq)
>"acpi-smi", 8);
>  memory_region_add_subregion(&pm->io, ICH9_PMIO_SMI_EN, &pm->io_smi);
>  
> +if (pm->swsmi_timer_enabled) {
> +ich9_pm_swsmi_timer_init(pm);
> +}
> +
> +if (pm->periodic_timer_enabled) {
> +ich9_pm_periodic_timer_init(pm);
> +}
> +
>  if (pm->enable_tco) {
>  acpi_pm_tco_init(&pm->tco_regs, &pm->io);
>  }
> diff --git a/hw/acpi/ich9_timer.c b/hw/acpi/ich9_timer.c
> new file mode 100644
> index 00..5b1c910156
> --- /dev/null
> +++ b/hw/acpi/ich9_timer.c
> @@ -0,0 +1,93 @@
> +/*
> + * QEMU ICH9 Timer emulation
> + *
> + * Copyright (c) 2024 Dominic Prinz 
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "hw/core/cpu.h"
> +#include "hw/pci/pci.h"
> +#include "hw/southbridge/ich9.h"
> +#include "qemu/timer.h"
> +
> +#include "hw/acpi/ich9_timer.h"
> +
> +void ich9_pm_update_swsmi_timer(ICH9LPCPMRegs *pm, bool enable)
> +{
> +uint16_t swsmi_rate_sel;
> +int64_t expire_time;
> +ICH9LPCState *lpc;
> +
> +if (enable) {
> +lpc = container_of(pm, ICH9LPCState, pm);
> +swsmi_rate_sel =
> +(pci_get_word(lpc->d.config + ICH9_LPC_GEN_PMCON_3) & 0xc0) >> 6;
> +
> +if (swsmi_rate_sel == 0) {
> +expire_time = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + 150LL;
> +} else {
> +expire_time = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
> +  8 * (1 << swsmi_rate_sel) * 100LL;
> +}
> +
> +timer_mod(pm->swsmi_timer, expire_time);
> +} else {
> +timer_del(pm->swsmi_timer);
> +}
> +}
> +
> +static void ich9_pm_swsmi_timer_expired(void *opaque)
> +{
> +ICH9LPCPMRegs *pm = opaque;
> +
> +pm->smi_sts |= ICH9_PMIO_SMI_STS_SWSMI_STS;
> +ich9_generate_smi();
> +
> +ich9_pm_update_swsmi_timer(pm, pm->smi_en & ICH9_PMIO_SMI_EN_SWSMI_EN);
> +}
> +
> +void ich9_pm_swsmi_timer_init(ICH9LPCPMRegs *pm)
> +{
> +pm->smi_sts_wmask |= ICH9_PMIO_SMI_STS_SWSMI_STS;
> +pm->swsmi_tim

Re: [PATCH v4] pci-bridge: avoid linking a single downstream port more than once

2024-09-10 Thread Michael S. Tsirkin
On Thu, Jul 25, 2024 at 05:38:19AM -0400, Yao Xingtao wrote:
> Since the downstream port is not checked, two slots can be linked to
> a single port. However, this can prevent the driver from detecting the
> device properly.
> 
> It is necessary to ensure that a downstream port is not linked more than
> once.
> 
> Links: 
> https://lore.kernel.org/qemu-devel/oszpr01mb6453bc61d2ff4035f18084ef8d...@oszpr01mb6453.jpnprd01.prod.outlook.com
> Signed-off-by: Yao Xingtao 
> 
> ---
> V3[3] -> V4:
>  - make the error message more readable
>  - fix the downstream port check error
> 
> V2[2] -> V3:
>  - Move this check into pcie_cap_init()
> 
> V1[1] -> V2:
>  - Move downstream port check forward
> 
> [1] 
> https://lore.kernel.org/qemu-devel/20240704033834.3362-1-yaoxt.f...@fujitsu.com
> [2] 
> https://lore.kernel.org/qemu-devel/20240717085621.55315-1-yaoxt.f...@fujitsu.com
> [3] 
> https://lore.kernel.org/qemu-devel/20240725032731.13032-1-yaoxt.f...@fujitsu.com
> ---
>  hw/pci/pcie.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> index 4b2f0805c6e0..1e53be1bc7c5 100644
> --- a/hw/pci/pcie.c
> +++ b/hw/pci/pcie.c
> @@ -192,6 +192,13 @@ int pcie_cap_init(PCIDevice *dev, uint8_t offset,
>  
>  assert(pci_is_express(dev));
>  
> +if ((type == PCI_EXP_TYPE_DOWNSTREAM || type == PCI_EXP_TYPE_ROOT_PORT) 
> &&
> +pcie_find_port_by_pn(pci_get_bus(dev), port)) {
> +error_setg(errp, "The port %d is already in use, please select "
> +   "another port", port);
> +return -EBUSY;
> +}
> +
>  pos = pci_add_capability(dev, PCI_CAP_ID_EXP, offset,
>   PCI_EXP_VER2_SIZEOF, errp);
>  if (pos < 0) {


But can't there be two functions of a multi-function device,
sharing a port?

> -- 
> 2.41.0




Re: [PATCH] softmmu: Support concurrent bounce buffers

2024-09-10 Thread Michael S. Tsirkin
On Mon, Aug 19, 2024 at 06:54:54AM -0700, Mattias Nissler wrote:
> When DMA memory can't be directly accessed, as is the case when
> running the device model in a separate process without shareable DMA
> file descriptors, bounce buffering is used.
> 
> It is not uncommon for device models to request mapping of several DMA
> regions at the same time. Examples include:
>  * net devices, e.g. when transmitting a packet that is split across
>several TX descriptors (observed with igb)
>  * USB host controllers, when handling a packet with multiple data TRBs
>(observed with xhci)
> 
> Previously, qemu only provided a single bounce buffer per AddressSpace
> and would fail DMA map requests while the buffer was already in use. In
> turn, this would cause DMA failures that ultimately manifest as hardware
> errors from the guest perspective.
> 
> This change allocates DMA bounce buffers dynamically instead of
> supporting only a single buffer. Thus, multiple DMA mappings work
> correctly also when RAM can't be mmap()-ed.
> 
> The total bounce buffer allocation size is limited individually for each
> AddressSpace. The default limit is 4096 bytes, matching the previous
> maximum buffer size. A new x-max-bounce-buffer-size parameter is
> provided to configure the limit for PCI devices.
> 
> Signed-off-by: Mattias Nissler 
> Reviewed-by: Philippe Mathieu-Daudé 
> Acked-by: Peter Xu 
> ---
> This patch is split out from my "Support message-based DMA in vfio-user 
> server"
> series. With the series having been partially applied, I'm splitting this one
> out as the only remaining patch to system emulation code in the hope to
> simplify getting it landed. The code has previously been reviewed by Stefan
> Hajnoczi and Peter Xu. This latest version includes changes to switch the
> bounce buffer size bookkeeping to `size_t` as requested and LGTM'd by Phil in
> v9.
> ---
>  hw/pci/pci.c|  8 
>  include/exec/memory.h   | 14 +++
>  include/hw/pci/pci_device.h |  3 ++
>  system/memory.c |  5 ++-
>  system/physmem.c| 82 ++---
>  5 files changed, 76 insertions(+), 36 deletions(-)
> 
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index fab86d0567..d2caf3ee8b 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -85,6 +85,8 @@ static Property pci_props[] = {
>  QEMU_PCIE_ERR_UNC_MASK_BITNR, true),
>  DEFINE_PROP_BIT("x-pcie-ari-nextfn-1", PCIDevice, cap_present,
>  QEMU_PCIE_ARI_NEXTFN_1_BITNR, false),
> +DEFINE_PROP_SIZE32("x-max-bounce-buffer-size", PCIDevice,
> + max_bounce_buffer_size, DEFAULT_MAX_BOUNCE_BUFFER_SIZE),
>  DEFINE_PROP_END_OF_LIST()
>  };
>  

I'm a bit puzzled by now there being two fields named
max_bounce_buffer_size, one directly controllable by
a property.

Pls add code comments explaining how they are related.


Also, what is the point of adding a property without
making it part of an API? No one will be able to rely on
it working.
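
To make the question concrete, here is a minimal sketch of how the two fields appear to relate, judging only from the do_pci_register_device() hunk in this patch: the PCIDevice field holds the property value, and it is copied into the AddressSpace field, which is where a limit would be enforced. All struct and function names below are hypothetical stand-ins, and the accounting is a simplified single-threaded model, not the patch's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the AddressSpace fields involved. */
struct fake_address_space {
    size_t max_bounce_buffer_size;  /* limit enforced at map time */
    size_t bounce_buffer_size;      /* bytes currently reserved */
};

/* Hypothetical stand-in for the PCIDevice fields involved. */
struct fake_pci_device {
    uint32_t max_bounce_buffer_size;  /* value set through the property */
    struct fake_address_space bus_master_as;
};

/* Mirrors the quoted registration hunk: property value -> address space. */
static void register_device(struct fake_pci_device *dev)
{
    dev->bus_master_as.max_bounce_buffer_size = dev->max_bounce_buffer_size;
    dev->bus_master_as.bounce_buffer_size = 0;
}

/* Reserve len bytes of bounce buffer budget; fail if the limit is exceeded. */
static bool bounce_reserve(struct fake_address_space *as, size_t len)
{
    if (len > as->max_bounce_buffer_size - as->bounce_buffer_size) {
        return false;
    }
    as->bounce_buffer_size += len;
    return true;
}

/* Return len bytes of budget when a mapping is released. */
static void bounce_release(struct fake_address_space *as, size_t len)
{
    assert(len <= as->bounce_buffer_size);
    as->bounce_buffer_size -= len;
}
```

In this model the device-level field is only configuration input, while the address-space field is the one consulted on every map request. A code comment along those lines is the kind of explanation being asked for.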




> @@ -1204,6 +1206,8 @@ static PCIDevice *do_pci_register_device(PCIDevice 
> *pci_dev,
> "bus master container", UINT64_MAX);
>  address_space_init(&pci_dev->bus_master_as,
> &pci_dev->bus_master_container_region, pci_dev->name);
> +pci_dev->bus_master_as.max_bounce_buffer_size =
> +pci_dev->max_bounce_buffer_size;
>  
>  if (phase_check(PHASE_MACHINE_READY)) {
>  pci_init_bus_master(pci_dev);
> @@ -2633,6 +2637,10 @@ static void pci_device_class_init(ObjectClass *klass, 
> void *data)
>  k->unrealize = pci_qdev_unrealize;
>  k->bus_type = TYPE_PCI_BUS;
>  device_class_set_props(k, pci_props);
> +object_class_property_set_description(
> +klass, "x-max-bounce-buffer-size",
> +"Maximum buffer size allocated for bounce buffers used for mapped "
> +"access to indirect DMA memory");
>  }
>  
>  static void pci_device_class_base_init(ObjectClass *klass, void *data)
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 296fd068c0..e5e865d1a9 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -1084,13 +1084,7 @@ typedef struct AddressSpaceMapClient {
>  QLIST_ENTRY(AddressSpaceMapClient) link;
>  } AddressSpaceMapClient;
>  
> -typedef struct {
> -MemoryRegion *mr;
> -void *buffer;
> -hwaddr addr;
> -hwaddr len;
> -bool in_use;
> -} BounceBuffer;
> +#define DEFAULT_MAX_BOUNCE_BUFFER_SIZE (4096)
>  
>  /**
>   * struct AddressSpace: describes a mapping of addresses to #MemoryRegion 
> objects
> @@ -1110,8 +1104,10 @@ struct AddressSpace {
>  QTAILQ_HEAD(, MemoryListener) listeners;
>  QTAILQ_ENTRY(AddressSpace) address_spaces_link;
>  
> -/* Bounce buffer to use for this address space. */
> -BounceBuffer bounce;
> +/* Maximum DMA bounce buffer size used for indirect memory map requests 
> */
> +size_t max_bounce_buffer_size;
> +/

Re: [PATCH v3] vhost-user: Do not wait for reply for not sent VHOST_USER_SET_LOG_BASE

2024-09-10 Thread Michael S. Tsirkin
On Thu, Aug 01, 2024 at 08:45:40PM +0800, BillXiang wrote:
> From: BillXiang 
> 
> Currently, we have added VHOST_USER_SET_LOG_BASE to 
> vhost_user_per_device_request in commit 7c211eb078c4 
> ("vhost-user: Skip unnecessary duplicated VHOST_USER_SET_LOG_BASE requests"), 
> as a result, VHOST_USER_SET_LOG_BASE will be sent only once 
> when 'vq_index == 0'.
> In this patch we add the check of 'vq_index == 0' before 
> vhost_user_read, such that we do not wait for reply for not
> sent VHOST_USER_SET_LOG_BASE.
> 
> Signed-off-by: BillXiang 

If you still want this in, pls rewrite the commit log to make
it clear what is going on: e.g. "cleanup X which does not do
Y with Z, which does" and repost.


> ---
>  hw/virtio/vhost-user.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 00561daa06..fd12992d15 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -460,7 +460,7 @@ static int vhost_user_set_log_base(struct vhost_dev *dev, 
> uint64_t base,
>  return ret;
>  }
>  
> -if (shmfd) {
> +if (shmfd && (dev->vq_index == 0)) {
>  msg.hdr.size = 0;
>  ret = vhost_user_read(dev, &msg);
>  if (ret < 0) {
> -- 
> 2.30.0
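
A minimal standalone model of the condition the patch adds, for readers following along. It assumes, as the commit message states, that a per-device request is only written when vq_index == 0, so a reply should only be read in that same case. struct fake_dev and the helpers are hypothetical and only simulate the socket with counters.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the parts of struct vhost_dev used here. */
struct fake_dev {
    int vq_index;      /* index of the first virtqueue of this vhost_dev */
    int sent;          /* requests written to the simulated socket */
    int replies_read;  /* replies read from the simulated socket */
};

/* Per-device request: only the vhost_dev with vq_index 0 transmits. */
static bool send_per_device_request(struct fake_dev *dev)
{
    if (dev->vq_index != 0) {
        return false;  /* skipped, nothing goes on the wire */
    }
    dev->sent++;
    return true;
}

/* Reading a reply only makes sense if a request is outstanding. */
static void read_reply(struct fake_dev *dev)
{
    assert(dev->replies_read < dev->sent);
    dev->replies_read++;
}

/* Model of the fixed flow: wait for a reply only if one was requested. */
static void set_log_base(struct fake_dev *dev, bool shmfd)
{
    bool sent = send_per_device_request(dev);

    if (shmfd && sent) {
        read_reply(dev);
    }
}
```

Dropping the `sent` check in set_log_base() makes the assertion in read_reply() fire for any device with vq_index != 0, which is the model's equivalent of waiting for a reply that will never arrive.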




Re: [PATCH for-9.2 v15 00/11] hw/pci: SR-IOV related fixes and improvements

2024-09-10 Thread Michael S. Tsirkin
On Tue, Sep 10, 2024 at 03:21:54PM +0200, Cédric Le Goater wrote:
> On 9/10/24 11:33, Akihiko Odaki wrote:
> > On 2024/09/10 18:21, Michael S. Tsirkin wrote:
> > > On Fri, Aug 23, 2024 at 02:00:37PM +0900, Akihiko Odaki wrote:
> > > > Supersedes: <20240714-rombar-v2-0-af1504ef5...@daynix.com>
> > > > ("[PATCH v2 0/4] hw/pci: Convert rom_bar into OnOffAuto")
> > > > 
> > > > I submitted a RFC series[1] to add support for SR-IOV emulation to
> > > > virtio-net-pci. During the development of the series, I fixed some
> > > > trivial bugs and made improvements that I think are independently
> > > > useful. This series extracts those fixes and improvements from the RFC
> > > > series.
> > > > 
> > > > [1]: 
> > > > https://patchew.org/QEMU/20231210-sriov-v2-0-b959e8a6d...@daynix.com/
> > > > 
> > > > Signed-off-by: Akihiko Odaki 
> > > 
> > > I don't think Cédric's issues have been addressed, am I wrong?
> > > Cédric, what is your take?
> > 
> > I put the URI to Cédric's report here:
> > https://lore.kernel.org/r/75cbc7d9-b48e-4235-85cf-49dacf3c7...@redhat.com
> > 
> > This issue was dealt with patch "s390x/pci: Check for multifunction after 
> > device realization". I found that s390x on QEMU does not support 
> > multifunction and SR-IOV devices accidentally circumvent this restriction, 
> > which means igb was never supposed to work with s390x. The patch prevents 
> > adding SR-IOV devices to s390x to ensure the restriction is properly 
> > enforced.
> 
> yes, indeed and it seems to fix :
> 
>   6069bcdeacee ("s390x/pci: Move some hotplug checks to the pre_plug handler")
> 
> I will update patch 4.
> 
> 
> Thanks,
> 
> C.
> 
> 
> That said, the igb device worked perfectly fine under the s390x machine.

And it works for you after this patchset, yes?




Re: [PATCH v2] hw/i386/intel_iommu: Block CFI when necessary

2024-09-10 Thread Michael S. Tsirkin
On Mon, Jul 08, 2024 at 06:08:16PM +0800, Yuke Peng wrote:
> According to Intel VT-d specification 5.1.4, CFI must be blocked when
> Extended Interrupt Mode is enabled or Compatibility format interrupts
> are disabled.
> 
> Signed-off-by: Yuke Peng 

The rename is fine.
The issue with the patch is the extra section.
We need to avoid adding it for compat machine types.

Do you have the time to address this?
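
A minimal sketch of one way to do that, assuming the usual pattern of gating the subsection's .needed callback on a device property that older machine types turn off through their compat property lists. The struct below is a standalone stand-in for IntelIOMMUState, and the migrate_cfi field is a hypothetical name, not an existing QEMU property.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Standalone stand-in for the IntelIOMMUState fields that matter here. */
struct iommu_state {
    bool intr_enabled;
    bool intr_eime;
    bool cfi_enabled;
    bool migrate_cfi;  /* hypothetical knob, set to false by compat machine types */
};

/*
 * Model of a .needed callback: old machine types never emit the
 * subsection, so their migration stream stays unchanged; new machine
 * types emit it under the same condition as in the quoted patch.
 */
static bool cfi_subsection_needed(const struct iommu_state *s)
{
    assert(s != NULL);

    if (!s->migrate_cfi) {
        return false;
    }
    return s->intr_enabled && !s->intr_eime;
}
```

With this shape, a guest started with an older machine type migrates exactly as before, at the cost of not carrying cfi_enabled across.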

> ---
> Changes in v2:
> - Use subsections for the cfi_enabled field.
> - Link to v1: 
> https://lore.kernel.org/qemu-devel/20240625112819.862282-1-pykfi...@gmail.com/
> 
> ---
>  hw/i386/intel_iommu.c | 53 +--
>  hw/i386/trace-events  |  1 +
>  include/hw/i386/intel_iommu.h |  1 +
>  3 files changed, 53 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 5085a6fee3..af9c864bde 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -2410,6 +2410,22 @@ static void vtd_handle_gcmd_ire(IntelIOMMUState *s, 
> bool en)
>  }
>  }
>  
> +/* Handle Compatibility Format Interrupts Enable/Disable */
> +static void vtd_handle_gcmd_cfi(IntelIOMMUState *s, bool en)
> +{
> +trace_vtd_cfi_enable(en);
> +
> +if (en) {
> +s->cfi_enabled = true;
> +/* Ok - report back to driver */
> +vtd_set_clear_mask_long(s, DMAR_GSTS_REG, 0, VTD_GSTS_CFIS);
> +} else {
> +s->cfi_enabled = false;
> +/* Ok - report back to driver */
> +vtd_set_clear_mask_long(s, DMAR_GSTS_REG, VTD_GSTS_CFIS, 0);
> +}
> +}
> +
>  /* Handle write to Global Command Register */
>  static void vtd_handle_gcmd_write(IntelIOMMUState *s)
>  {
> @@ -2440,6 +2456,10 @@ static void vtd_handle_gcmd_write(IntelIOMMUState *s)
>  /* Interrupt remap enable/disable */
>  vtd_handle_gcmd_ire(s, val & VTD_GCMD_IRE);
>  }
> +if (changed & VTD_GCMD_CFI) {
> +/* Compatibility format interrupts enable/disable */
> +vtd_handle_gcmd_cfi(s, val & VTD_GCMD_CFI);
> +}
>  }
>  
>  /* Handle write to Context Command Register */
> @@ -3283,7 +3303,25 @@ static int vtd_post_load(void *opaque, int version_id)
>  return 0;
>  }
>  
> -static const VMStateDescription vtd_vmstate = {
> +static bool vtd_cfi_needed(void *opaque)
> +{
> +IntelIOMMUState *iommu = opaque;
> +
> +return iommu->intr_enabled && !iommu->intr_eime;
> +}
> +
> +static const VMStateDescription vmstate_vtd_cfi = {
> +.name = "iommu-intel/cfi",
> +.version_id = 1,
> +.minimum_version_id = 1,
> +.needed = vtd_cfi_needed,
> +.fields = (VMStateField[]) {
> +VMSTATE_BOOL(cfi_enabled, IntelIOMMUState),
> +VMSTATE_END_OF_LIST()
> +}
> +};
> +
> +static const VMStateDescription vmstate_vtd = {
>  .name = "iommu-intel",
>  .version_id = 1,
>  .minimum_version_id = 1,
> @@ -3306,6 +3344,10 @@ static const VMStateDescription vtd_vmstate = {
>  VMSTATE_BOOL(intr_enabled, IntelIOMMUState),
>  VMSTATE_BOOL(intr_eime, IntelIOMMUState),
>  VMSTATE_END_OF_LIST()
> +},
> +.subsections = (const VMStateDescription * []) {
> +&vmstate_vtd_cfi,
> +NULL
>  }
>  };
>  
> @@ -3525,6 +3567,12 @@ static int vtd_interrupt_remap_msi(IntelIOMMUState 
> *iommu,
>  
>  /* This is compatible mode. */
>  if (addr.addr.int_mode != VTD_IR_INT_FORMAT_REMAP) {
> +if (iommu->intr_eime || !iommu->cfi_enabled) {
> +if (do_fault) {
> +vtd_report_ir_fault(iommu, sid, VTD_FR_IR_REQ_COMPAT, 0);
> +}
> +return -EINVAL;
> +}
>  memcpy(translated, origin, sizeof(*origin));
>  goto out;
>  }
> @@ -3950,6 +3998,7 @@ static void vtd_init(IntelIOMMUState *s)
>  s->root_scalable = false;
>  s->dmar_enabled = false;
>  s->intr_enabled = false;
> +s->cfi_enabled = false;
>  s->iq_head = 0;
>  s->iq_tail = 0;
>  s->iq = 0;
> @@ -4243,7 +4292,7 @@ static void vtd_class_init(ObjectClass *klass, void 
> *data)
>  X86IOMMUClass *x86_class = X86_IOMMU_DEVICE_CLASS(klass);
>  
>  dc->reset = vtd_reset;
> -dc->vmsd = &vtd_vmstate;
> +dc->vmsd = &vmstate_vtd;
>  device_class_set_props(dc, vtd_properties);
>  dc->hotpluggable = false;
>  x86_class->realize = vtd_realize;
> diff --git a/hw/i386/trace-events b/hw/i386/trace-events
> index 53c02d7ac8..ffd87db65f 100644
> --- a/hw/i386/trace-events
> +++ b/hw/i386/trace-events
> @@ -57,6 +57,7 @@ vtd_dmar_translate(uint8_t bus, uint8_t slot, uint8_t func, 
> uint64_t iova, uint6
>  vtd_dmar_enable(bool en) "enable %d"
>  vtd_dmar_fault(uint16_t sid, int fault, uint64_t addr, bool is_write) "sid 
> 0x%"PRIx16" fault %d addr 0x%"PRIx64" write %d"
>  vtd_ir_enable(bool en) "enable %d"
> +vtd_cfi_enable(bool en) "enable %d"
>  vtd_ir_irte_get(int index, uint64_t lo, uint64_t hi) "index %d low 
> 0x%"PRIx64" high 0x%"PRIx64
>  vtd_ir_remap(int index, int 

Re: [PATCH for-9.2 v15 00/11] hw/pci: SR-IOV related fixes and improvements

2024-09-10 Thread Michael S. Tsirkin
On Tue, Sep 10, 2024 at 06:33:01PM +0900, Akihiko Odaki wrote:
> On 2024/09/10 18:21, Michael S. Tsirkin wrote:
> > On Fri, Aug 23, 2024 at 02:00:37PM +0900, Akihiko Odaki wrote:
> > > Supersedes: <20240714-rombar-v2-0-af1504ef5...@daynix.com>
> > > ("[PATCH v2 0/4] hw/pci: Convert rom_bar into OnOffAuto")
> > > 
> > > I submitted a RFC series[1] to add support for SR-IOV emulation to
> > > virtio-net-pci. During the development of the series, I fixed some
> > > trivial bugs and made improvements that I think are independently
> > > useful. This series extracts those fixes and improvements from the RFC
> > > series.
> > > 
> > > [1]: https://patchew.org/QEMU/20231210-sriov-v2-0-b959e8a6d...@daynix.com/
> > > 
> > > Signed-off-by: Akihiko Odaki 
> > 
> > I don't think Cédric's issues have been addressed, am I wrong?
> > Cédric, what is your take?
> 
> I put the URI to Cédric's report here:
> https://lore.kernel.org/r/75cbc7d9-b48e-4235-85cf-49dacf3c7...@redhat.com
> 
> This issue was dealt with patch "s390x/pci: Check for multifunction after
> device realization". I found that s390x on QEMU does not support
> multifunction and SR-IOV devices accidentally circumvent this restriction,
> which means igb was never supposed to work with s390x. The patch prevents
> adding SR-IOV devices to s390x to ensure the restriction is properly
> enforced.
> 
> Regards,
> Akihiko Odaki

Cédric would appreciate your Tested-by/Reviewed-by.

-- 
MST




Re: [PATCH for-9.2 v15 00/11] hw/pci: SR-IOV related fixes and improvements

2024-09-10 Thread Michael S. Tsirkin
On Fri, Aug 23, 2024 at 02:00:37PM +0900, Akihiko Odaki wrote:
> Supersedes: <20240714-rombar-v2-0-af1504ef5...@daynix.com>
> ("[PATCH v2 0/4] hw/pci: Convert rom_bar into OnOffAuto")
> 
> I submitted a RFC series[1] to add support for SR-IOV emulation to
> virtio-net-pci. During the development of the series, I fixed some
> trivial bugs and made improvements that I think are independently
> useful. This series extracts those fixes and improvements from the RFC
> series.
> 
> [1]: https://patchew.org/QEMU/20231210-sriov-v2-0-b959e8a6d...@daynix.com/
> 
> Signed-off-by: Akihiko Odaki 

I don't think Cédric's issues have been addressed, am I wrong?
Cédric, what is your take?



> ---
> Changes in v15:
> - Fixed variable shadowing in patch
>   "pcie_sriov: Remove num_vfs from PCIESriovPF"
> - Link to v14: 
> https://lore.kernel.org/r/20240813-reuse-v14-0-4c15bc6ee...@daynix.com
> 
> Changes in v14:
> - Dropped patch "pcie_sriov: Ensure VF function number does not
>   overflow" as I found the restriction it imposes is unnecessary.
> - Link to v13: 
> https://lore.kernel.org/r/20240805-reuse-v13-0-aaeaa4d7d...@daynix.com
> 
> Changes in v13:
> - Added patch "s390x/pci: Check for multifunction after device
>   realization". I found SR-IOV devices, which are multifunction devices,
>   are not supposed to work at all with s390x on QEMU.
> - Link to v12: 
> https://lore.kernel.org/r/20240804-reuse-v12-0-d3930c411...@daynix.com
> 
> Changes in v12:
> - Changed to ignore invalid PCI_SRIOV_NUM_VF writes as done for
>   PCI_SRIOV_CTRL_VFE.
> - Updated the message for patch "hw/pci: Use -1 as the default value for
>   rombar". (Markus Armbruster)
> - Link to v11: 
> https://lore.kernel.org/r/20240802-reuse-v11-0-fb83bb8c1...@daynix.com
> 
> Changes in v11:
> - Rebased.
> - Dropped patch "hw/pci: Convert rom_bar into OnOffAuto".
> - Added patch "hw/pci: Use -1 as the default value for rombar".
> - Added for-9.2 to give a testing period for possible breakage with
>   libvirt/s390x.
> - Link to v10: 
> https://lore.kernel.org/r/20240627-reuse-v10-0-7ca0b8ed3...@daynix.com
> 
> Changes in v10:
> - Added patch "hw/ppc/spapr_pci: Do not reject VFs created after a PF".
> - Added patch "hw/ppc/spapr_pci: Do not create DT for disabled PCI device".
> - Added patch "hw/pci: Convert rom_bar into OnOffAuto".
> - Dropped patch "hw/pci: Determine if rombar is explicitly enabled".
> - Dropped patch "hw/qdev: Remove opts member".
> - Link to v9: 
> https://lore.kernel.org/r/20240315-reuse-v9-0-67aa69af4...@daynix.com
> 
> Changes in v9:
> - Rebased.
> - Restored '#include "qapi/error.h"' (Michael S. Tsirkin)
> - Added patch "pcie_sriov: Ensure VF function number does not overflow"
>   to fix abortion with wrong PF addr.
> - Link to v8: 
> https://lore.kernel.org/r/20240228-reuse-v8-0-282660281...@daynix.com
> 
> Changes in v8:
> - Clarified that "hw/pci: Replace -1 with UINT32_MAX for romsize" is
>   not a bug fix. (Markus Armbruster)
> - Squashed patch "vfio: Avoid inspecting option QDict for rombar" into
>   "hw/pci: Determine if rombar is explicitly enabled".
>   (Markus Armbruster)
> - Noted the minor semantics change for patch "hw/pci: Determine if
>   rombar is explicitly enabled". (Markus Armbruster)
> - Link to v7: 
> https://lore.kernel.org/r/20240224-reuse-v7-0-29c14bcb9...@daynix.com
> 
> Changes in v7:
> - Replaced -1 with UINT32_MAX when expressing uint32_t.
>   (Markus Armbruster)
> - Added patch "hw/pci: Replace -1 with UINT32_MAX for romsize".
> - Link to v6: 
> https://lore.kernel.org/r/20240220-reuse-v6-0-2e42a28b0...@daynix.com
> 
> Changes in v6:
> - Fixed migration.
> - Added patch "pcie_sriov: Do not manually unrealize".
> - Restored patch "pcie_sriov: Release VFs failed to realize" that was
>   missed in v5.
> - Link to v5: 
> https://lore.kernel.org/r/20240218-reuse-v5-0-e4fc1c19b...@daynix.com
> 
> Changes in v5:
> - Added patch "hw/pci: Always call pcie_sriov_pf_reset()".
> - Added patch "pcie_sriov: Reset SR-IOV extended capability".
> - Removed a reference to PCI_SRIOV_CTRL_VFE in hw/nvme.
>   (Michael S. Tsirkin)
> - Noted the impact on the guest of patch "pcie_sriov: Do not reset
>   NumVFs after unregistering VFs". (Michael S. Tsirkin)
> - Changed to use pcie_sriov_num_vfs().
> - Restored pci_set_power() and changed it to call pci_set_enabled() only
>   for PFs with an explanation. (Michael S. Tsirkin)
> - Reordered patches.
> - Link to

Re: [PATCH 0/2] Solve vt82c686 qemu_irq leak.

2024-09-10 Thread Michael S. Tsirkin
On Sat, Jun 29, 2024 at 10:01:52PM +0200, BALATON Zoltan wrote:
> This is an alternative approach to solve the qemu_irq leak in
> vt82c686. Allowing embedding an irq and init it in place like done
> with other objects may allow cleaner fix for similar issues and I also
> plan to use this for adding qemu_irq to pegasos2 machine state for
> which gpio would not work.
> 
> BALATON Zoltan (2):
>   hw: Move declaration of IRQState to header and add init function
>   hw/isa/vt82c686.c: Embed i8259 irq in device state instead of
> allocating

This looked like a simpler approach to shut up analyzer warnings, so I
picked this one.



>  hw/core/irq.c | 25 +++--
>  hw/isa/vt82c686.c |  7 ---
>  include/hw/irq.h  | 18 ++
>  3 files changed, 33 insertions(+), 17 deletions(-)
> 
> -- 
> 2.30.9
> 
> 




Re: [PATCH v2 0/7] Report fatal errors from failure with pre-opened eBPF RSS FDs

2024-09-06 Thread Michael S. Tsirkin
On Thu, Sep 05, 2024 at 07:13:23PM +0100, Daniel P. Berrangé wrote:
> The virtio-net code for eBPF RSS is still ignoring errors when
> failing to load the eBPF RSS program passed in by the mgmt app
> via pre-opened FDs.
> 
> This series re-factors the eBPF common code so that it actually
> reports using "Error" objects. Then it makes virtio-net treat
> a failure to load pre-opened FDs as a fatal problem. When doing
> speculative opening of eBPF FDs, QEMU merely prints a warning,
> and allows the software fallback to continue.
> 
> Trace event coverage is significantly expanded to make this all
> much more debuggable too.


looks good
Reviewed-by: Michael S. Tsirkin 

Jason's tree.

> Changed in v2:
> 
>  - Split 'ebpf_error' probe into multiple probes
> 
> Daniel P. Berrangé (7):
>   hw/net: fix typo s/epbf/ebpf/ in virtio-net
>   ebpf: drop redundant parameter checks in static methods
>   ebpf: improve error trace events
>   ebpf: add formal error reporting to all APIs
>   hw/net: report errors from failing to use eBPF RSS FDs
>   ebpf: improve trace event coverage to all key operations
>   hw/net: improve tracing of eBPF RSS setup
> 
>  ebpf/ebpf_rss.c | 118 
>  ebpf/ebpf_rss.h |  10 ++--
>  ebpf/trace-events   |   8 ++-
>  hw/net/trace-events |   8 +--
>  hw/net/virtio-net.c |  63 +++
>  5 files changed, 137 insertions(+), 70 deletions(-)
> 
> -- 
> 2.45.2




Re: [PATCH v2] docs: fix vhost-user protocol doc

2024-09-06 Thread Michael S. Tsirkin
On Fri, Sep 06, 2024 at 10:10:45AM +0800, luzhixing12345 wrote:
> Hi, can someone help review this patch?
> 
> Signed-off-by: luzhixing12345 

You got comments Aug 5, pls address them.




Re: [PATCH v5 5/8] device/virtio-nsm: Support for Nitro Secure Module device

2024-09-04 Thread Michael S. Tsirkin
On Thu, Sep 05, 2024 at 12:30:07AM +0600, Dorjoy Chowdhury wrote:
> On Wed, Sep 4, 2024 at 2:47 AM Dorjoy Chowdhury  
> wrote:
> >
> >
> >
> > On Wed, Sep 4, 2024, 2:32 AM Michael S. Tsirkin  wrote:
> >>
> >> On Wed, Sep 04, 2024 at 01:58:15AM +0600, Dorjoy Chowdhury wrote:
> >> > On Thu, Aug 29, 2024 at 1:11 AM Michael S. Tsirkin  
> >> > wrote:
> >> > >
> >> > > On Thu, Aug 29, 2024 at 01:04:05AM +0600, Dorjoy Chowdhury wrote:
> >> > > > On Thu, Aug 29, 2024 at 12:28 AM Michael S. Tsirkin 
> >> > > >  wrote:
> >> > > > >
> >> > > > > On Thu, Aug 22, 2024 at 09:08:46PM +0600, Dorjoy Chowdhury wrote:
> >> > > > > > Nitro Secure Module (NSM)[1] device is used in AWS Nitro 
> >> > > > > > Enclaves[2]
> >> > > > > > for stripped down TPM functionality like cryptographic 
> >> > > > > > attestation.
> >> > > > > > The requests to and responses from NSM device are CBOR[3] 
> >> > > > > > encoded.
> >> > > > > >
> >> > > > > > This commit adds support for NSM device in QEMU. Although 
> >> > > > > > related to
> >> > > > > > AWS Nitro Enclaves, the virtio-nsm device is independent and can 
> >> > > > > > be
> >> > > > > > used in other machine types as well. The libcbor[4] library has 
> >> > > > > > been
> >> > > > > > used for the CBOR encoding and decoding functionalities.
> >> > > > > >
> >> > > > > > [1] 
> >> > > > > > https://lists.oasis-open.org/archives/virtio-comment/202310/msg00387.html
> >> > > > > > [2] 
> >> > > > > > https://docs.aws.amazon.com/enclaves/latest/user/nitro-enclave.html
> >> > > > > > [3] http://cbor.io/
> >> > > > > > [4] https://libcbor.readthedocs.io/en/latest/
> >> > > > > >
> >> > > > > > Signed-off-by: Dorjoy Chowdhury 
> >> > > > > > ---
> >> > > > > >  MAINTAINERS  |   10 +
> >> > > > > >  hw/virtio/Kconfig|5 +
> >> > > > > >  hw/virtio/cbor-helpers.c |  326 ++
> >> > > > > >  hw/virtio/meson.build|6 +
> >> > > > > >  hw/virtio/virtio-nsm-pci.c   |   73 ++
> >> > > > > >  hw/virtio/virtio-nsm.c   | 1638 
> >> > > > > > ++
> >> > > > > >  include/hw/virtio/cbor-helpers.h |   46 +
> >> > > > > >  include/hw/virtio/virtio-nsm.h   |   59 ++
> >> > > > > >  meson.build  |2 +
> >> > > > > >  9 files changed, 2165 insertions(+)
> >> > > >
> >> > > > [...]
> >> > > >
> >> > > > > > +static void handle_input(VirtIODevice *vdev, VirtQueue *vq)
> >> > > > > > +{
> >> > > > > > +g_autofree VirtQueueElement *out_elem = NULL;
> >> > > > > > +g_autofree VirtQueueElement *in_elem = NULL;
> >> > > > > > +VirtIONSM *vnsm = VIRTIO_NSM(vdev);
> >> > > > > > +Error *err = NULL;
> >> > > > > > +
> >> > > > > > +out_elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
> >> > > > > > +if (!out_elem) {
> >> > > > > > +/* nothing in virtqueue */
> >> > > > > > +return;
> >> > > > > > +}
> >> > > > > > +
> >> > > > > > +if (out_elem->out_num != 1) {
> >> > > > > > +virtio_error(vdev, "Expected one request buffer first 
> >> > > > > > in virtqueue");
> >> > > > > > +goto cleanup;
> >> > > > > > +}
> >> > > > >
> >> > > > > Seems to assume request in a single s/g element?
> >> > > > > We generally avoid this kind of thing.
> >> > > > >
> >> > > > > Applies equally elsewhere.
> >> > > > >
> 

Re: [PATCH v5 5/8] device/virtio-nsm: Support for Nitro Secure Module device

2024-09-03 Thread Michael S. Tsirkin
On Wed, Sep 04, 2024 at 01:58:15AM +0600, Dorjoy Chowdhury wrote:
> On Thu, Aug 29, 2024 at 1:11 AM Michael S. Tsirkin  wrote:
> >
> > On Thu, Aug 29, 2024 at 01:04:05AM +0600, Dorjoy Chowdhury wrote:
> > > On Thu, Aug 29, 2024 at 12:28 AM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Thu, Aug 22, 2024 at 09:08:46PM +0600, Dorjoy Chowdhury wrote:
> > > > > Nitro Secure Module (NSM)[1] device is used in AWS Nitro Enclaves[2]
> > > > > for stripped down TPM functionality like cryptographic attestation.
> > > > > The requests to and responses from NSM device are CBOR[3] encoded.
> > > > >
> > > > > This commit adds support for NSM device in QEMU. Although related to
> > > > > > AWS Nitro Enclaves, the virtio-nsm device is independent and can be
> > > > > used in other machine types as well. The libcbor[4] library has been
> > > > > used for the CBOR encoding and decoding functionalities.
> > > > >
> > > > > [1] 
> > > > > https://lists.oasis-open.org/archives/virtio-comment/202310/msg00387.html
> > > > > [2] 
> > > > > https://docs.aws.amazon.com/enclaves/latest/user/nitro-enclave.html
> > > > > [3] http://cbor.io/
> > > > > [4] https://libcbor.readthedocs.io/en/latest/
> > > > >
> > > > > Signed-off-by: Dorjoy Chowdhury 
> > > > > ---
> > > > >  MAINTAINERS  |   10 +
> > > > >  hw/virtio/Kconfig|5 +
> > > > >  hw/virtio/cbor-helpers.c |  326 ++
> > > > >  hw/virtio/meson.build|6 +
> > > > >  hw/virtio/virtio-nsm-pci.c   |   73 ++
> > > > >  hw/virtio/virtio-nsm.c   | 1638 
> > > > > ++
> > > > >  include/hw/virtio/cbor-helpers.h |   46 +
> > > > >  include/hw/virtio/virtio-nsm.h   |   59 ++
> > > > >  meson.build  |2 +
> > > > >  9 files changed, 2165 insertions(+)
> > >
> > > [...]
> > >
> > > > > +static void handle_input(VirtIODevice *vdev, VirtQueue *vq)
> > > > > +{
> > > > > +g_autofree VirtQueueElement *out_elem = NULL;
> > > > > +g_autofree VirtQueueElement *in_elem = NULL;
> > > > > +VirtIONSM *vnsm = VIRTIO_NSM(vdev);
> > > > > +Error *err = NULL;
> > > > > +
> > > > > +out_elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
> > > > > +if (!out_elem) {
> > > > > +/* nothing in virtqueue */
> > > > > +return;
> > > > > +}
> > > > > +
> > > > > +if (out_elem->out_num != 1) {
> > > > > > +virtio_error(vdev, "Expected one request buffer first in virtqueue");
> > > > > +goto cleanup;
> > > > > +}
> > > >
> > > > Seems to assume request in a single s/g element?
> > > > We generally avoid this kind of thing.
> > > >
> > > > > Applies equally elsewhere.
> > > >
> > >
> > > > Thank you for reviewing. I think I did it this way (first virtqueue_pop
> > > > gives out_elem with out_num == 1 and the next virtqueue_pop gives
> > > > in_elem with in_num == 1) after seeing what the virtqueue contains
> > > (using printfs) when running in a VM and sending some NSM requests and
> > > I noticed the above. Can you give me a bit more details about what
> > > this should be like? Is there any existing virtio device code I can
> > > look at for example?
> > > Thanks!
> >
> >
> > Use iov_to_buf / iov_from_buf
> >
> > there are many examples in the tree, I'd look for some recent ones.
> >
> 
> I am a bit stuck at this and I would appreciate some help. I looked at
> other "iov_to_buf" and "iov_from_buf" examples in QEMU and in those I
> see there are known request and response "structs" associated with it.
> But in the case of NSM, the request and responses can be arbitrary
> CBOR objects i.e., no specific structs or lengths associated.


Take whatever you want to access, move it to a buffer with iov_to_buf,
then access the buffer.

The reverse is even easier: put the data in a buffer, then copy it out
with iov_from_buf.
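
As a rough illustration of the pattern (not code from the patch): the
sketch below uses plain POSIX struct iovec and three hypothetical helpers,
sg_size, sg_to_buf and sg_from_buf. They only mimic the gather/scatter
behaviour of QEMU's iov helpers; their names and exact semantics here are
assumptions. The point is that the request can be split across any number
of s/g elements, and no fixed struct is needed, only a byte count.

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical helper: total number of bytes described by an s/g list. */
static size_t sg_size(const struct iovec *iov, unsigned int iov_cnt)
{
    size_t len = 0;
    for (unsigned int i = 0; i < iov_cnt; i++) {
        len += iov[i].iov_len;
    }
    return len;
}

/* Hypothetical helper: gather up to `bytes` bytes, starting at `offset`
 * within the s/g list, into one linear buffer. Returns bytes copied. */
static size_t sg_to_buf(const struct iovec *iov, unsigned int iov_cnt,
                        size_t offset, void *buf, size_t bytes)
{
    size_t done = 0;
    for (unsigned int i = 0; i < iov_cnt && done < bytes; i++) {
        if (offset >= iov[i].iov_len) {
            offset -= iov[i].iov_len;   /* skip elements before offset */
            continue;
        }
        size_t n = iov[i].iov_len - offset;
        if (n > bytes - done) {
            n = bytes - done;
        }
        memcpy((char *)buf + done, (const char *)iov[i].iov_base + offset, n);
        done += n;
        offset = 0;
    }
    return done;
}

/* Hypothetical helper: scatter a linear buffer into an s/g list. */
static size_t sg_from_buf(const struct iovec *iov, unsigned int iov_cnt,
                          size_t offset, const void *buf, size_t bytes)
{
    size_t done = 0;
    for (unsigned int i = 0; i < iov_cnt && done < bytes; i++) {
        if (offset >= iov[i].iov_len) {
            offset -= iov[i].iov_len;
            continue;
        }
        size_t n = iov[i].iov_len - offset;
        if (n > bytes - done) {
            n = bytes - done;
        }
        memcpy((char *)iov[i].iov_base + offset, (const char *)buf + done, n);
        done += n;
        offset = 0;
    }
    return done;
}

/* Usage: a request split over three elements, a response over two.
 * Returns 1 if the round trip preserved the bytes, 0 otherwise. */
static int sg_demo(void)
{
    char p0[] = { 'C', 'B' };
    char p1[] = { 'O', 'R', '-', 'r', 'e' };
    char p2[] = { 'q', 'u', 'e', 's', 't' };
    struct iovec req[] = {
        { .iov_base = p0, .iov_len = sizeof(p0) },
        { .iov_base = p1, .iov_len = sizeof(p1) },
        { .iov_base = p2, .iov_len = sizeof(p2) },
    };
    char buf[16];
    size_t len = sg_size(req, 3);

    if (len > sizeof(buf) || sg_to_buf(req, 3, 0, buf, len) != len) {
        return 0;
    }
    if (len != 12 || memcmp(buf, "CBOR-request", 12) != 0) {
        return 0;
    }

    char r0[4], r1[8];
    struct iovec resp[] = {
        { .iov_base = r0, .iov_len = sizeof(r0) },
        { .iov_base = r1, .iov_len = sizeof(r1) },
    };
    if (sg_from_buf(resp, 2, 0, buf, len) != len) {
        return 0;
    }
    return memcmp(r0, "CBOR", 4) == 0 && memcmp(r1, "-request", 8) == 0;
}
```

With this shape the device code never has to check that out_num or in_num
equals 1; it only has to bound the total length it is willing to copy.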

> So I am
> not sure using "iov_to_buf" / "iov_from_buf" makes sense here.
> And about the request response being in a single s/g element, I think
> it's because of how the NSM driver is in drivers/misc/nsm.c (see
> nsm_sendrecv_msg_locked function)in the linux kernel tree.

Yes, but the driver is free to change this.
Isn't there a spec for this device to consult?
Sending that to the virtio TC will be needed before we add this to QEMU.

> I am not sure what changes are needed in the current code if any. Do
> you have any suggestions on this?
> 
> Regards,
> Dorjoy




Re: [PATCH V2 1/1] virtio-pci: Add lookup subregion of VirtIOPCIRegion MR

2024-09-03 Thread Michael S. Tsirkin
On Tue, Aug 20, 2024 at 07:56:31PM +0800, Gao Shiyuan wrote:
> When VHOST_USER_PROTOCOL_F_HOST_NOTIFIER feature negotiated and
> virtio_queue_set_host_notifier_mr success on system blk
> device's queue, the VM can't load MBR if the notify region's
> address above 4GB.
> 
> Assign the address of notify region in the modern bar above 4G, the vp_notify
> in SeaBIOS will use PCI Cfg Capability to write notify region. This will trap
> into QEMU and be handled by the host bridge when we don't enable mmconfig.
> QEMU will call virtio_write_config and since it writes to the BAR region
> through the PCI Cfg Capability, it will call virtio_address_space_write.
> 
> virtio_queue_set_host_notifier_mr add host notifier subregion of notify region
> MR, QEMU need write the mmap address instead of eventfd notify the hardware
> accelerator at the vhost-user backend. So virtio_address_space_lookup in
> virtio_address_space_write need return a host-notifier subregion of notify MR
> instead of notify MR.
> 
> Add lookup subregion of VirtIOPCIRegion MR instead of only lookup container MR.
> 
> Fixes: a93c8d8 ("virtio-pci: Replace modern_as with direct access to modern_bar")
> 
> Co-developed-by: Zuo Boqun 
> Signed-off-by: Gao Shiyuan 
> Signed-off-by: Zuo Boqun 
> ---
>  hw/virtio/virtio-pci.c | 14 --
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> ---
> v1 -> v2:
> * modify commit message
> * replace direct iteration over subregions with memory_region_find.
> 
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index 9534730bba..5d2d27a6a3 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -610,19 +610,29 @@ static MemoryRegion *virtio_address_space_lookup(VirtIOPCIProxy *proxy,
>  {
>  int i;
>  VirtIOPCIRegion *reg;
> +MemoryRegion *mr = NULL;
> +MemoryRegionSection mrs;
>  
>  for (i = 0; i < ARRAY_SIZE(proxy->regs); ++i) {
>  reg = &proxy->regs[i];
>  if (*off >= reg->offset &&
>  *off + len <= reg->offset + reg->size) {
> -*off -= reg->offset;
> -return &reg->mr;
> +mrs = memory_region_find(&reg->mr, *off - reg->offset, len);
> +if (!mrs.mr) {
> +error_report("Failed to find memory region for address"
> + "0x%" PRIx64 "", *off);
> +return NULL;
> +}


I'm not sure when this can happen. If it can't, an assert will do.


> +*off = mrs.offset_within_region;
> +memory_region_unref(mrs.mr);
> +return mrs.mr;
>  }
>  }
>  
>  return NULL;
>  }
>  
> +
>  /* Below are generic functions to do memcpy from/to an address space,
>   * without byteswaps, with input validation.
>   *
> -- 
> 2.39.3 (Apple Git-146)




Re: [PATCH v2 2/2] vhost-user: add a request-reply lock

2024-08-29 Thread Michael S. Tsirkin
On Thu, Aug 29, 2024 at 10:29:24AM -0400, Peter Xu wrote:
> On Thu, Aug 29, 2024 at 02:45:45PM +0530, Prasad Pandit wrote:
> > Hello Michael,
> > 
> > On Thu, 29 Aug 2024 at 13:12, Michael S. Tsirkin  wrote:
> > > Weird.  Seems to indicate some kind of deadlock?
> > 
> > * Such a deadlock should occur across all environments I guess, not
> > sure why it happens selectively. It is strange.
> > 
> > > So maybe vhost_user_postcopy_end should take the BQL?
> > ===
> > diff --git a/migration/savevm.c b/migration/savevm.c
> > index e7c1215671..31acda3818 100644
> > --- a/migration/savevm.c
> > +++ b/migration/savevm.c
> > @@ -2050,7 +2050,9 @@ static void *postcopy_ram_listen_thread(void *opaque)
> >   */
> >  qemu_event_wait(&mis->main_thread_load_event);
> >  }
> > +bql_lock();
> >  postcopy_ram_incoming_cleanup(mis);
> > +bql_unlock();
> > 
> >  if (load_res < 0) {
> >  /*
> > ===
> > 
> > * Actually a BQL patch above was tested and it worked fine. But not
> > sure if it is an acceptable solution. Another contention was taking
> > BQL could make things more complicated, so a local vhost-user specific
> > lock should be better.
> > 
> > ...wdyt?
> 
> I think Michael was suggesting taking bql in vhost_user_postcopy_end(), not
> in postcopy code directly.

maybe that's better, ok.

>  I'm recently looking at how to make precopy
> load even take less bql and even make it a separate thread. Above is
> definitely going backwards, per we discussed already internally.


At the same time a small bugfix is better, can be backported.


> I cherish postcopy doesn't need to take bql on its own in most paths, and
> we shouldn't add unnecessary bql requirement even if vhost-user isn't used.
> 
> Personally I still prefer we look into why a separate mutex won't work and
> why that timed out; that could be part of whoever is going to investigate
> the whole issue (including the hang later on). Otherwise I'm ok from
> migration pov that we take bql in the vhost-user hook, but not in savevm.c.
> 
> Thanks,

ok

> -- 
> Peter Xu




Re: [PATCH v2 2/2] vhost-user: add a request-reply lock

2024-08-29 Thread Michael S. Tsirkin
On Thu, Aug 29, 2024 at 02:45:45PM +0530, Prasad Pandit wrote:
> Hello Michael,
> 
> On Thu, 29 Aug 2024 at 13:12, Michael S. Tsirkin  wrote:
> > Weird.  Seems to indicate some kind of deadlock?
> 
> * Such a deadlock should occur across all environments I guess, not
> sure why it happens selectively. It is strange.

Some kind of race?

> > So maybe vhost_user_postcopy_end should take the BQL?
> ===
> diff --git a/migration/savevm.c b/migration/savevm.c
> index e7c1215671..31acda3818 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -2050,7 +2050,9 @@ static void *postcopy_ram_listen_thread(void *opaque)
>   */
>  qemu_event_wait(&mis->main_thread_load_event);
>  }
> +bql_lock();
>  postcopy_ram_incoming_cleanup(mis);
> +bql_unlock();
> 
>  if (load_res < 0) {
>  /*
> ===
> 
> * Actually a BQL patch above was tested and it worked fine. But not
> sure if it is an acceptable solution. Another contention was taking
> BQL could make things more complicated, so a local vhost-user specific
> lock should be better.
> 
> ...wdyt?
> ---
>   - Prasad

Keep it simple, is my advice. Not causing regressions is good.

-- 
MST




Re: [PATCH v2 2/2] vhost-user: add a request-reply lock

2024-08-29 Thread Michael S. Tsirkin
On Wed, Aug 28, 2024 at 03:39:14PM +0530, Prasad Pandit wrote:
> From: Prasad Pandit 
> 
> QEMU threads use vhost_user_write/read calls to send
> and receive request/reply messages from a vhost-user
> device. When multiple threads communicate with the
> same vhost-user device, they can receive each other's
> messages, resulting in an erroneous state.
> 
> When fault_thread exits upon completion of Postcopy
> migration, it sends a 'postcopy_end' message to the
> vhost-user device. But sometimes 'postcopy_end' message
> is sent while vhost device is being setup via
> vhost_dev_start().

So maybe vhost_user_postcopy_end should take the BQL?

>  Thread-1   Thread-2
> 
>  vhost_dev_startpostcopy_ram_incoming_cleanup
>  vhost_device_iotlb_misspostcopy_notify
>  vhost_backend_update_device_iotlb  vhost_user_postcopy_notifier
>  vhost_user_send_device_iotlb_msg   vhost_user_postcopy_end
>  process_message_reply  process_message_reply
>  vhost_user_readvhost_user_read
>  vhost_user_read_header vhost_user_read_header
>  "Fail to update device iotlb"  "Failed to receive reply to postcopy_end"
> 
> This creates confusion when vhost-user device receives
> 'postcopy_end' message while it is trying to update IOTLB entries.
> 
>  vhost_user_read_header:
>   700871,700871: Failed to read msg header. Flags 0x0 instead of 0x5.
>  vhost_device_iotlb_miss:
>   700871,700871: Fail to update device iotlb
>  vhost_user_postcopy_end:
>   700871,700900: Failed to receive reply to postcopy_end
>  vhost_user_read_header:
>   700871,700871: Failed to read msg header. Flags 0x0 instead of 0x5.
> 
> Here fault thread seems to end the postcopy migration
> while another thread is starting the vhost-user device.
> 
> Add a mutex lock to hold for one request-reply cycle
> and avoid such race condition.
> 
> Fixes: 46343570c06e ("vhost+postcopy: Wire up POSTCOPY_END notify")
> Suggested-by: Peter Xu 
> Signed-off-by: Prasad Pandit 


CC Author and reviewer of the offending commit.
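
To make the request-reply cycle from the commit message concrete, here is
a minimal standalone sketch, assuming a POSIX threads environment. struct
channel, channel_write, channel_read and request_reply are hypothetical
stand-ins, not vhost-user code: the "device" is reduced to a single
in-flight slot, so a thread that reads after another thread's write would
see the wrong reply. Holding the lock across both the write and the read
is what prevents that.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a channel with one in-flight message. */
struct channel {
    pthread_mutex_t request_reply_lock;
    int in_flight;
};

static void channel_write(struct channel *c, int request)
{
    c->in_flight = request;
}

static int channel_read(struct channel *c)
{
    return c->in_flight;
}

/* One request-reply cycle: the lock covers the write AND the read,
 * so no other thread's message can land in between. */
static int request_reply(struct channel *c, int request)
{
    int reply;

    pthread_mutex_lock(&c->request_reply_lock);
    channel_write(c, request);
    reply = channel_read(c);
    pthread_mutex_unlock(&c->request_reply_lock);
    return reply;
}

struct worker {
    struct channel *c;
    int id;
    int mismatches;
};

static void *worker_fn(void *opaque)
{
    struct worker *w = opaque;

    for (int i = 0; i < 100000; i++) {
        if (request_reply(w->c, w->id) != w->id) {
            w->mismatches++;    /* received another thread's reply */
        }
    }
    return NULL;
}

/* Runs two threads against one channel; returns the mismatch count. */
static int run_two_threads(void)
{
    struct channel c = { .in_flight = 0 };
    struct worker a = { &c, 1, 0 };
    struct worker b = { &c, 2, 0 };
    pthread_t ta, tb;

    pthread_mutex_init(&c.request_reply_lock, NULL);
    pthread_create(&ta, NULL, worker_fn, &a);
    pthread_create(&tb, NULL, worker_fn, &b);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    pthread_mutex_destroy(&c.request_reply_lock);
    return a.mismatches + b.mismatches;
}
```

The sketch says nothing about lock ordering against other locks such as
the BQL, which is the open question in this thread.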


> ---
>  hw/virtio/vhost-user.c | 74 ++
>  include/hw/virtio/vhost-user.h |  3 ++
>  2 files changed, 77 insertions(+)
> 
> v2:
>  - Place QEMU_LOCK_GUARD near the vhost_user_write() calls, holding
>the lock for longer fails some tests during rpmbuild(8).
>  - rpmbuild(8) fails for some SRPMs, not all. RHEL-9 SRPM builds with
>this patch, whereas Fedora SRPM does not build.
>  - The host OS also seems to affect rpmbuild(8). Some SRPMs build well
>on RHEL-9, but not on Fedora-40 machine.
>  - koji builds successful with this patch
>https://koji.fedoraproject.org/koji/taskinfo?taskID=122254011
>https://koji.fedoraproject.org/koji/taskinfo?taskID=122252369
> 
> v1: Use QEMU_LOCK_GUARD(), rename lock variable
>  - https://lore.kernel.org/qemu-devel/20240808095147.291626-3-ppan...@redhat.com/
> 
> v0:
>  - https://lore.kernel.org/all/Zo_9OlX0pV0paFj7@x1n/
>  - https://lore.kernel.org/all/20240720153808-mutt-send-email-...@kernel.org/
> 
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 00561daa06..7b030ae2cd 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -24,6 +24,7 @@
>  #include "qemu/main-loop.h"
>  #include "qemu/uuid.h"
>  #include "qemu/sockets.h"
> +#include "qemu/lockable.h"
>  #include "sysemu/runstate.h"
>  #include "sysemu/cryptodev.h"
>  #include "migration/postcopy-ram.h"
> @@ -446,6 +447,10 @@ static int vhost_user_set_log_base(struct vhost_dev 
> *dev, uint64_t base,
>  .hdr.size = sizeof(msg.payload.log),
>  };
> 
> +struct vhost_user *u = dev->opaque;
> +struct VhostUserState *us = u->user;
> +QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> +
>  /* Send only once with first queue pair */
>  if (dev->vq_index != 0) {
>  return 0;
> @@ -664,6 +669,7 @@ static int send_remove_regions(struct vhost_dev *dev,
> bool reply_supported)
>  {
>  struct vhost_user *u = dev->opaque;
> +struct VhostUserState *us = u->user;
>  struct vhost_memory_region *shadow_reg;
>  int i, fd, shadow_reg_idx, ret;
>  ram_addr_t offset;
> @@ -685,6 +691,8 @@ static int send_remove_regions(struct vhost_dev *dev,
>  vhost_user_fill_msg_region(&region_buffer, shadow_reg, 0);
>  msg->payload.mem_reg.region = region_buffer;
> 
> +QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> +
>  ret = vhost_user_write(dev, msg, NULL, 0);
>  if (ret < 0) {
>  return ret;
> @@ -718,6 +726,7 @@ static int send_add_regions(struct vhost_dev *dev,
>  bool reply_supported, bool track_ramblocks)
>  {
>  struct vhost_user *u = dev->opaque;
> +struct VhostUserState *us = u->user;
>  int i, fd, ret, reg_idx, reg_fd_idx;
>  struct vhost_memory_region *reg;
>  MemoryRegion *

Re: [PATCH v2 2/2] vhost-user: add a request-reply lock

2024-08-28 Thread Michael S. Tsirkin
On Thu, Aug 29, 2024 at 11:09:44AM +0530, Prasad Pandit wrote:
> On Wed, 28 Aug 2024 at 16:45, Michael S. Tsirkin  wrote:
> > >  - Place QEMU_LOCK_GUARD near the vhost_user_write() calls, holding
> > >the lock for longer fails some tests during rpmbuild(8).
> >
> > what do you mean fails rpmbuild? that qemu with this patch can not be compiled?
> 
> * In V1 of this patch, QEMU_LOCK_GUARD was placed near beginning of
> the function. But that caused some unit tests to fail reporting
> TIMEOUT errors. In this V2, QEMU_LOCK_GUARD is placed near
> vhost_user_write() calls, to reduce the time that lock is held.
> 
> * Both (V1 & V2) compile well, but fail at '%check' stage while
> running unit tests (on some machines), ie. rpm package is not built.
> rpmbuild(8) on F40 machine failed, but koji scratch build with the
> same SRPM worked fine. Those scratch builds are shared above. RHEL-9
> SRPM built well on RHEL-9 host, but failed to build on F40 machine
> reporting failure at '%check' stage of rpmbuild(8).
> 
> Thank you.
> ---
>   - Prasad

Weird.  Seems to indicate some kind of deadlock?

-- 
MST




Re: [PATCH v5 5/8] device/virtio-nsm: Support for Nitro Secure Module device

2024-08-28 Thread Michael S. Tsirkin
On Thu, Aug 29, 2024 at 01:04:05AM +0600, Dorjoy Chowdhury wrote:
> On Thu, Aug 29, 2024 at 12:28 AM Michael S. Tsirkin  wrote:
> >
> > On Thu, Aug 22, 2024 at 09:08:46PM +0600, Dorjoy Chowdhury wrote:
> > > Nitro Secure Module (NSM)[1] device is used in AWS Nitro Enclaves[2]
> > > for stripped down TPM functionality like cryptographic attestation.
> > > The requests to and responses from NSM device are CBOR[3] encoded.
> > >
> > > This commit adds support for NSM device in QEMU. Although related to
> > > > AWS Nitro Enclaves, the virtio-nsm device is independent and can be
> > > used in other machine types as well. The libcbor[4] library has been
> > > used for the CBOR encoding and decoding functionalities.
> > >
> > > [1] 
> > > https://lists.oasis-open.org/archives/virtio-comment/202310/msg00387.html
> > > [2] https://docs.aws.amazon.com/enclaves/latest/user/nitro-enclave.html
> > > [3] http://cbor.io/
> > > [4] https://libcbor.readthedocs.io/en/latest/
> > >
> > > Signed-off-by: Dorjoy Chowdhury 
> > > ---
> > >  MAINTAINERS  |   10 +
> > >  hw/virtio/Kconfig|5 +
> > >  hw/virtio/cbor-helpers.c |  326 ++
> > >  hw/virtio/meson.build|6 +
> > >  hw/virtio/virtio-nsm-pci.c   |   73 ++
> > >  hw/virtio/virtio-nsm.c   | 1638 ++
> > >  include/hw/virtio/cbor-helpers.h |   46 +
> > >  include/hw/virtio/virtio-nsm.h   |   59 ++
> > >  meson.build  |2 +
> > >  9 files changed, 2165 insertions(+)
> 
> [...]
> 
> > > +static void handle_input(VirtIODevice *vdev, VirtQueue *vq)
> > > +{
> > > +g_autofree VirtQueueElement *out_elem = NULL;
> > > +g_autofree VirtQueueElement *in_elem = NULL;
> > > +VirtIONSM *vnsm = VIRTIO_NSM(vdev);
> > > +Error *err = NULL;
> > > +
> > > +out_elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
> > > +if (!out_elem) {
> > > +/* nothing in virtqueue */
> > > +return;
> > > +}
> > > +
> > > +if (out_elem->out_num != 1) {
> > > > +virtio_error(vdev, "Expected one request buffer first in virtqueue");
> > > +goto cleanup;
> > > +}
> >
> > Seems to assume request in a single s/g element?
> > We generally avoid this kind of thing.
> >
> > > Applies equally elsewhere.
> >
> 
> > Thank you for reviewing. I think I did it this way (first virtqueue_pop
> > gives out_elem with out_num == 1 and the next virtqueue_pop gives
> > in_elem with in_num == 1) after seeing what the virtqueue contains
> (using printfs) when running in a VM and sending some NSM requests and
> I noticed the above. Can you give me a bit more details about what
> this should be like? Is there any existing virtio device code I can
> look at for example?
> Thanks!


Use iov_to_buf / iov_from_buf

there are many examples in the tree, I'd look for some recent ones.


> > > +
> > > +in_elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
> > > +if (!in_elem) {
> > > > +virtio_error(vdev, "Expected response buffer after request buffer "
> > > + "in virtqueue");
> > > +goto cleanup;
> > > +}
> > > +if (in_elem->in_num != 1) {
> > > > +virtio_error(vdev, "Expected one response buffer after request buffer "
> > > + "in virtqueue");
> > > +goto cleanup;
> > > +}
> > > +
> > > +if (!get_nsm_request_response(vnsm, out_elem->out_sg, in_elem->in_sg,
> > > +  &err)) {
> > > +error_report_err(err);
> > > +virtio_error(vdev, "Failed to get NSM request response");
> > > +goto cleanup;
> > > +}
> > > +
> > > +virtqueue_push(vq, out_elem, 0);
> > > +virtqueue_push(vq, in_elem, in_elem->in_sg->iov_len);
> > > +virtio_notify(vdev, vq);
> > > +return;
> > > +
> > > + cleanup:
> > > +if (out_elem) {
> > > +virtqueue_detach_element(vq, out_elem, 0);
> > > +}
> > > +if (in_elem) {
> > > +virtqueue_detach_element(vq, in_elem, 0);
> > > +}
> > > +return;
> > > +}
> > > +
> > > > +static uint64_t get_features(VirtIODevice *vdev, uint64_t f, Error **errp)
> > > +{
> > > +return f;
> > > +}
> > > +
> 
> [...]
> 
> Regards,
> Dorjoy




Re: [PATCH v5 5/8] device/virtio-nsm: Support for Nitro Secure Module device

2024-08-28 Thread Michael S. Tsirkin
On Thu, Aug 22, 2024 at 09:08:46PM +0600, Dorjoy Chowdhury wrote:
> Nitro Secure Module (NSM)[1] device is used in AWS Nitro Enclaves[2]
> for stripped down TPM functionality like cryptographic attestation.
> The requests to and responses from NSM device are CBOR[3] encoded.
> 
> This commit adds support for NSM device in QEMU. Although related to
> AWS Nitro Enclaves, the virtio-nsm device is independent and can be
> used in other machine types as well. The libcbor[4] library has been
> used for the CBOR encoding and decoding functionalities.
> 
> [1] https://lists.oasis-open.org/archives/virtio-comment/202310/msg00387.html
> [2] https://docs.aws.amazon.com/enclaves/latest/user/nitro-enclave.html
> [3] http://cbor.io/
> [4] https://libcbor.readthedocs.io/en/latest/
> 
> Signed-off-by: Dorjoy Chowdhury 
> ---
>  MAINTAINERS  |   10 +
>  hw/virtio/Kconfig|5 +
>  hw/virtio/cbor-helpers.c |  326 ++
>  hw/virtio/meson.build|6 +
>  hw/virtio/virtio-nsm-pci.c   |   73 ++
>  hw/virtio/virtio-nsm.c   | 1638 ++
>  include/hw/virtio/cbor-helpers.h |   46 +
>  include/hw/virtio/virtio-nsm.h   |   59 ++
>  meson.build  |2 +
>  9 files changed, 2165 insertions(+)
>  create mode 100644 hw/virtio/cbor-helpers.c
>  create mode 100644 hw/virtio/virtio-nsm-pci.c
>  create mode 100644 hw/virtio/virtio-nsm.c
>  create mode 100644 include/hw/virtio/cbor-helpers.h
>  create mode 100644 include/hw/virtio/virtio-nsm.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 3584d6a6c6..da4f698137 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2340,6 +2340,16 @@ F: include/sysemu/rng*.h
>  F: backends/rng*.c
>  F: tests/qtest/virtio-rng-test.c
>  
> +virtio-nsm
> +M: Alexander Graf 
> +M: Dorjoy Chowdhury 
> +S: Maintained
> +F: hw/virtio/cbor-helpers.c
> +F: hw/virtio/virtio-nsm.c
> +F: hw/virtio/virtio-nsm-pci.c
> +F: include/hw/virtio/cbor-helpers.h
> +F: include/hw/virtio/virtio-nsm.h
> +
>  vhost-user-stubs
>  M: Alex Bennée 
>  S: Maintained
> diff --git a/hw/virtio/Kconfig b/hw/virtio/Kconfig
> index aa63ff7fd4..29fee32035 100644
> --- a/hw/virtio/Kconfig
> +++ b/hw/virtio/Kconfig
> @@ -6,6 +6,11 @@ config VIRTIO_RNG
>  default y
>  depends on VIRTIO
>  
> +config VIRTIO_NSM
> +   bool
> +   default y
> +   depends on VIRTIO
> +
>  config VIRTIO_IOMMU
>  bool
>  default y
> diff --git a/hw/virtio/cbor-helpers.c b/hw/virtio/cbor-helpers.c
> new file mode 100644
> index 00..a0e58d6862
> --- /dev/null
> +++ b/hw/virtio/cbor-helpers.c
> @@ -0,0 +1,326 @@
> +/*
> + * QEMU CBOR helpers
> + *
> + * Copyright (c) 2024 Dorjoy Chowdhury 
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or
> + * (at your option) any later version.  See the COPYING file in the
> + * top-level directory.
> + */
> +
> +#include "hw/virtio/cbor-helpers.h"
> +
> +bool qemu_cbor_map_add(cbor_item_t *map, cbor_item_t *key, cbor_item_t *value)
> +{
> +bool success = false;
> +struct cbor_pair pair = (struct cbor_pair) {
> +.key = cbor_move(key),
> +.value = cbor_move(value)
> +};
> +
> +success = cbor_map_add(map, pair);
> +if (!success) {
> +cbor_incref(pair.key);
> +cbor_incref(pair.value);
> +}
> +
> +return success;
> +}
> +
> +bool qemu_cbor_array_push(cbor_item_t *array, cbor_item_t *value)
> +{
> +bool success = false;
> +
> +success = cbor_array_push(array, cbor_move(value));
> +if (!success) {
> +cbor_incref(value);
> +}
> +
> +return success;
> +}
> +
> +bool qemu_cbor_add_bool_to_map(cbor_item_t *map, const char *key, bool value)
> +{
> +cbor_item_t *key_cbor = NULL;
> +cbor_item_t *value_cbor = NULL;
> +
> +key_cbor = cbor_build_string(key);
> +if (!key_cbor) {
> +goto cleanup;
> +}
> +value_cbor = cbor_build_bool(value);
> +if (!value_cbor) {
> +goto cleanup;
> +}
> +if (!qemu_cbor_map_add(map, key_cbor, value_cbor)) {
> +goto cleanup;
> +}
> +
> +return true;
> +
> + cleanup:
> +if (key_cbor) {
> +cbor_decref(&key_cbor);
> +}
> +if (value_cbor) {
> +cbor_decref(&value_cbor);
> +}
> +return false;
> +}
> +
> +bool qemu_cbor_add_uint8_to_map(cbor_item_t *map, const char *key,
> +uint8_t value)
> +{
> +cbor_item_t *key_cbor = NULL;
> +cbor_item_t *value_cbor = NULL;
> +
> +key_cbor = cbor_build_string(key);
> +if (!key_cbor) {
> +goto cleanup;
> +}
> +value_cbor = cbor_build_uint8(value);
> +if (!value_cbor) {
> +goto cleanup;
> +}
> +if (!qemu_cbor_map_add(map, key_cbor, value_cbor)) {
> +goto cleanup;
> +}
> +
> +return true;
> +
> + cleanup:
> +if (key_cbor) {
> +cbor_decref(&key_cbor);
> +}
> +if (value_cbor) {
> +cbor_decref(&value_cbor);
> +

Re: [PATCH v2 2/2] vhost-user: add a request-reply lock

2024-08-28 Thread Michael S. Tsirkin
On Wed, Aug 28, 2024 at 03:39:14PM +0530, Prasad Pandit wrote:
> From: Prasad Pandit 
> 
> QEMU threads use vhost_user_write/read calls to send
> and receive request/reply messages from a vhost-user
> device. When multiple threads communicate with the
> same vhost-user device, they can receive each other's
> messages, resulting in an erroneous state.
> 
> When fault_thread exits upon completion of Postcopy
> migration, it sends a 'postcopy_end' message to the
> vhost-user device. But sometimes 'postcopy_end' message
> is sent while vhost device is being setup via
> vhost_dev_start().
> 
>  Thread-1   Thread-2
> 
>  vhost_dev_startpostcopy_ram_incoming_cleanup
>  vhost_device_iotlb_misspostcopy_notify
>  vhost_backend_update_device_iotlb  vhost_user_postcopy_notifier
>  vhost_user_send_device_iotlb_msg   vhost_user_postcopy_end
>  process_message_reply  process_message_reply
>  vhost_user_readvhost_user_read
>  vhost_user_read_header vhost_user_read_header
>  "Fail to update device iotlb"  "Failed to receive reply to postcopy_end"
> 
> This creates confusion when vhost-user device receives
> 'postcopy_end' message while it is trying to update IOTLB entries.
> 
>  vhost_user_read_header:
>   700871,700871: Failed to read msg header. Flags 0x0 instead of 0x5.
>  vhost_device_iotlb_miss:
>   700871,700871: Fail to update device iotlb
>  vhost_user_postcopy_end:
>   700871,700900: Failed to receive reply to postcopy_end
>  vhost_user_read_header:
>   700871,700871: Failed to read msg header. Flags 0x0 instead of 0x5.
> 
> Here fault thread seems to end the postcopy migration
> while another thread is starting the vhost-user device.
> 
> Add a mutex lock to hold for one request-reply cycle
> and avoid such race condition.
> 
> Fixes: 46343570c06e ("vhost+postcopy: Wire up POSTCOPY_END notify")
> Suggested-by: Peter Xu 
> Signed-off-by: Prasad Pandit 
> ---
>  hw/virtio/vhost-user.c | 74 ++
>  include/hw/virtio/vhost-user.h |  3 ++
>  2 files changed, 77 insertions(+)
> 
> v2:
>  - Place QEMU_LOCK_GUARD near the vhost_user_write() calls, holding
>the lock for longer fails some tests during rpmbuild(8).

What do you mean by "fails rpmbuild"? That QEMU with this
patch cannot be compiled?

>  - rpmbuild(8) fails for some SRPMs, not all. RHEL-9 SRPM builds with
>this patch, whereas Fedora SRPM does not build.
>  - The host OS also seems to affect rpmbuild(8). Some SRPMs build well
>on RHEL-9, but not on Fedora-40 machine.
>  - koji builds successful with this patch
>https://koji.fedoraproject.org/koji/taskinfo?taskID=122254011
>https://koji.fedoraproject.org/koji/taskinfo?taskID=122252369
> 
> v1: Use QEMU_LOCK_GUARD(), rename lock variable
>  - https://lore.kernel.org/qemu-devel/20240808095147.291626-3-ppan...@redhat.com/
> 
> v0:
>  - https://lore.kernel.org/all/Zo_9OlX0pV0paFj7@x1n/
>  - https://lore.kernel.org/all/20240720153808-mutt-send-email-...@kernel.org/
> 
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 00561daa06..7b030ae2cd 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -24,6 +24,7 @@
>  #include "qemu/main-loop.h"
>  #include "qemu/uuid.h"
>  #include "qemu/sockets.h"
> +#include "qemu/lockable.h"
>  #include "sysemu/runstate.h"
>  #include "sysemu/cryptodev.h"
>  #include "migration/postcopy-ram.h"
> @@ -446,6 +447,10 @@ static int vhost_user_set_log_base(struct vhost_dev 
> *dev, uint64_t base,
>  .hdr.size = sizeof(msg.payload.log),
>  };
> 
> +struct vhost_user *u = dev->opaque;
> +struct VhostUserState *us = u->user;
> +QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> +
>  /* Send only once with first queue pair */
>  if (dev->vq_index != 0) {
>  return 0;
> @@ -664,6 +669,7 @@ static int send_remove_regions(struct vhost_dev *dev,
> bool reply_supported)
>  {
>  struct vhost_user *u = dev->opaque;
> +struct VhostUserState *us = u->user;
>  struct vhost_memory_region *shadow_reg;
>  int i, fd, shadow_reg_idx, ret;
>  ram_addr_t offset;
> @@ -685,6 +691,8 @@ static int send_remove_regions(struct vhost_dev *dev,
>  vhost_user_fill_msg_region(&region_buffer, shadow_reg, 0);
>  msg->payload.mem_reg.region = region_buffer;
> 
> +QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> +
>  ret = vhost_user_write(dev, msg, NULL, 0);
>  if (ret < 0) {
>  return ret;
> @@ -718,6 +726,7 @@ static int send_add_regions(struct vhost_dev *dev,
>  bool reply_supported, bool track_ramblocks)
>  {
>  struct vhost_user *u = dev->opaque;
> +struct VhostUserState *us = u->user;
>  int i, fd, ret, reg_idx, reg_fd_idx;
>  struct vhost_memory_region *reg;
>  MemoryRegion *mr;
> @@ -746,6 +755,8 

Re: [PATCH 0/6] refactor RDMA live migration based on rsocket API

2024-08-27 Thread Michael S. Tsirkin
On Tue, Aug 27, 2024 at 04:15:42PM -0400, Peter Xu wrote:
> On Tue, Jun 04, 2024 at 08:14:06PM +0800, Gonglei wrote:
> > From: Jialin Wang 
> > 
> > Hi,
> > 
> > This patch series attempts to refactor RDMA live migration by
> > introducing a new QIOChannelRDMA class based on the rsocket API.
> > 
> > The /usr/include/rdma/rsocket.h provides a higher level rsocket API
> > that is a 1-1 match of the normal kernel 'sockets' API, which hides the
> > detail of rdma protocol into rsocket and allows us to add support for
> > some modern features like multifd more easily.
> > 
> > Here is the previous discussion on refactoring RDMA live migration using
> > the rsocket API:
> > 
> > https://lore.kernel.org/qemu-devel/20240328130255.52257-1-phi...@linaro.org/
> > 
> > We have encountered some bugs when using rsocket and plan to submit them to
> > the rdma-core community.
> > 
> > In addition, the use of rsocket makes our programming more convenient,
> > but it must be noted that this method introduces multiple memory copies,
> > which can be imagined that there will be a certain performance degradation,
> > hoping that friends with RDMA network cards can help verify, thank you!
> > 
> > Jialin Wang (6):
> >   migration: remove RDMA live migration temporarily
> >   io: add QIOChannelRDMA class
> >   io/channel-rdma: support working in coroutine
> >   tests/unit: add test-io-channel-rdma.c
> >   migration: introduce new RDMA live migration
> >   migration/rdma: support multifd for RDMA migration
> 
> This series has been idle for a while; we still need to know how to move
> forward.


What exactly is the question? This got a bunch of comments,
the first thing to do would be to address them.


>  I guess I lost the latest status quo..
> 
> Any update (from anyone..) on what stage are we in?
> 
> Thanks,
> -- 
> Peter Xu




Re: [PATCH v3] vhost-user: Do not wait for reply for not sent VHOST_USER_SET_LOG_BASE

2024-08-27 Thread Michael S. Tsirkin
On Tue, Aug 27, 2024 at 09:00:35PM +0800, BillXiang wrote:
> 
> > From: "Prasad Pandit"
> > Date:  Tue, Aug 27, 2024, 20:37
> > Subject:  Re: [PATCH v3] vhost-user: Do not wait for reply for not sent 
> > VHOST_USER_SET_LOG_BASE
> > To: "BillXiang"
> > Cc: "Michael S. Tsirkin", 
> > On Tue, 27 Aug 2024 at 16:50, BillXiang  wrote:
> > > it's better to be consistent to use vhost_user_per_device_request for those per-device messages, right?
> > 
> > * ...consistent to use? Could you please elaborate a little?
> > 
> > Thank you.
> > ---
> >   - Prasad
> 
> That was elaborated in commit b931bfbf0429 ("vhost-user: add multiple queue
> support"). We have added vhost_user_one_time_request() to send those
> per-device messages only once for multi-queue device. Which was then changed
> to vhost_user_per_device_request() in commit 0dcb4172f2ce ("vhost-user:
> Change one_time to per_device request"). And VHOST_USER_SET_LOG_BASE should
> be one of those per-device messages that only be sent once for multi-queue
> device.

Bill,
it's important to make it clear, in the commit message, what is the
current behaviour and what is the effect of the patch.
For example: currently qemu hangs waiting for , to fix,
 so we never wait for  .
At the moment, I'm not really sure if this is a bugfix, or
a cleanup, or what.


-- 
MST




[PULL 1/3] vhost: Add VIRTIO_NET_F_RSC_EXT to vhost feature bits

2024-08-20 Thread Michael S. Tsirkin
From: Akihiko Odaki 

VIRTIO_NET_F_RSC_EXT is implemented in the rx data path, which vhost
implements, so vhost needs to support the feature if it is ever to be
enabled with vhost. The feature must be disabled otherwise.
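
A rough sketch of why the list entry matters, assuming the feature-bit
arrays are used to mask off features the backend lacks (that is how I read
vhost_get_features(), but treat it as an assumption). mask_features,
INVALID_FEATURE_BIT and example_feature_bits are hypothetical names, and
the bit numbers are placeholders rather than real virtio feature bits.

```c
#include <stdint.h>

#define INVALID_FEATURE_BIT (-1)    /* hypothetical list terminator */

/* Placeholder bit numbers, for illustration only. */
static const int example_feature_bits[] = { 3, 5, INVALID_FEATURE_BIT };

/* For every bit in the list, clear it from `features` unless the
 * backend supports it. Bits that are NOT in the list are never
 * examined, so they pass through unchanged. */
static uint64_t mask_features(uint64_t backend_features,
                              const int *feature_bits,
                              uint64_t features)
{
    for (const int *bit = feature_bits; *bit != INVALID_FEATURE_BIT; bit++) {
        uint64_t mask = UINT64_C(1) << *bit;

        if (!(backend_features & mask)) {
            features &= ~mask;
        }
    }
    return features;
}
```

In this model a feature missing from the list stays enabled even when the
backend cannot handle it, which matches the problem the commit message
describes.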

Fixes: 2974e916df87 ("virtio-net: support RSC v4/v6 tcp traffic for Windows HCK")
Reported-by: Jason Wang 
Signed-off-by: Akihiko Odaki 
Message-Id: <20240802-rsc-v1-1-2b607bd2f...@daynix.com>
Acked-by: Jason Wang 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/net/vhost_net.c | 2 ++
 net/vhost-vdpa.c   | 1 +
 2 files changed, 3 insertions(+)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index a788e6937e..dedf9ad7c2 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -50,6 +50,7 @@ static const int kernel_feature_bits[] = {
 VIRTIO_F_RING_RESET,
 VIRTIO_F_IN_ORDER,
 VIRTIO_F_NOTIFICATION_DATA,
+VIRTIO_NET_F_RSC_EXT,
 VIRTIO_NET_F_HASH_REPORT,
 VHOST_INVALID_FEATURE_BIT
 };
@@ -81,6 +82,7 @@ static const int user_feature_bits[] = {
 VIRTIO_F_RING_RESET,
 VIRTIO_F_IN_ORDER,
 VIRTIO_NET_F_RSS,
+VIRTIO_NET_F_RSC_EXT,
 VIRTIO_NET_F_HASH_REPORT,
 VIRTIO_NET_F_GUEST_USO4,
 VIRTIO_NET_F_GUEST_USO6,
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 03457ead66..46b02c50be 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -88,6 +88,7 @@ const int vdpa_feature_bits[] = {
 VIRTIO_NET_F_MQ,
 VIRTIO_NET_F_MRG_RXBUF,
 VIRTIO_NET_F_MTU,
+VIRTIO_NET_F_RSC_EXT,
 VIRTIO_NET_F_RSS,
 VIRTIO_NET_F_STATUS,
 VIRTIO_RING_F_EVENT_IDX,
-- 
MST
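
Why adding a bit to these lists can disable a feature may not be obvious, so
here is a simplified model of list-driven feature filtering. It is a sketch
under assumptions: the demo_* names are made up, and the DEMO_* bit numbers
are illustrative values that should be checked against the real headers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; check the real headers before use. */
#define DEMO_F_RSC_EXT            61
#define DEMO_F_HASH_REPORT        57
#define DEMO_INVALID_FEATURE_BIT  0xff

/*
 * Every bit named in `bits` is cleared from `features` unless the backend
 * also offers it. Bits that are not in the list pass through untouched.
 */
static uint64_t demo_filter_features(uint64_t backend_features,
                                     const int *bits, uint64_t features)
{
    for (; *bits != DEMO_INVALID_FEATURE_BIT; bits++) {
        assert(*bits >= 0 && *bits < 64);
        uint64_t mask = UINT64_C(1) << *bits;
        if (!(backend_features & mask)) {
            features &= ~mask;
        }
    }
    return features;
}
```

In this model a bit that is missing from the list is never checked against
the backend, so it stays set even when the backend cannot handle it. Listing
the bit is what makes "disabled otherwise" happen.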




[PULL 2/3] hw/audio/virtio-snd: fix invalid param check

2024-08-20 Thread Michael S. Tsirkin
From: Volker Rümelin 

Commit 9b6083465f ("virtio-snd: check for invalid param shift
operands") tries to prevent invalid parameters specified by the
guest. However, the code is not correct.

Change the code so that the parameters format and rate, which are
bit numbers, are compared with the bit size of the data type.

Fixes: 9b6083465f ("virtio-snd: check for invalid param shift operands")
Signed-off-by: Volker Rümelin 
Message-Id: <20240802071805.7123-1-vr_q...@t-online.de>
Reviewed-by: Manos Pitsidianakis 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/audio/virtio-snd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/audio/virtio-snd.c b/hw/audio/virtio-snd.c
index e5196aa4bb..d1cf5eb445 100644
--- a/hw/audio/virtio-snd.c
+++ b/hw/audio/virtio-snd.c
@@ -282,12 +282,12 @@ uint32_t virtio_snd_set_pcm_params(VirtIOSound *s,
 error_report("Number of channels is not supported.");
 return cpu_to_le32(VIRTIO_SND_S_NOT_SUPP);
 }
-if (BIT(params->format) > sizeof(supported_formats) ||
+if (params->format >= sizeof(supported_formats) * BITS_PER_BYTE ||
 !(supported_formats & BIT(params->format))) {
 error_report("Stream format is not supported.");
 return cpu_to_le32(VIRTIO_SND_S_NOT_SUPP);
 }
-if (BIT(params->rate) > sizeof(supported_rates) ||
+if (params->rate >= sizeof(supported_rates) * BITS_PER_BYTE ||
 !(supported_rates & BIT(params->rate))) {
 error_report("Stream rate is not supported.");
 return cpu_to_le32(VIRTIO_SND_S_NOT_SUPP);
-- 
MST
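
The difference between the old and new checks in this patch can be shown
with a small standalone sketch. It assumes a 32-bit mask, uses CHAR_BIT in
place of BITS_PER_BYTE, and DEMO_BIT() is a local stand-in for BIT(). The
demo_* helpers are hypothetical and are not part of virtio-snd.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_BIT(n) (UINT32_C(1) << (n))

/* Old-style check: compares a bit *mask* against a size in *bytes*. */
static bool demo_old_check_rejects(uint32_t index, uint32_t supported)
{
    /* Guard the shift so the demo itself never shifts by 32 or more. */
    if (index >= 32) {
        return true;
    }
    return DEMO_BIT(index) > sizeof(supported) ||
           !(supported & DEMO_BIT(index));
}

/* New-style check: compares the bit *index* against the width in bits. */
static bool demo_new_check_rejects(uint32_t index, uint32_t supported)
{
    /* The index test comes first, so the shift only runs when it is valid. */
    return index >= sizeof(supported) * CHAR_BIT ||
           !(supported & DEMO_BIT(index));
}
```

With a 4-byte mask the old form rejects every index from 3 upwards, whether
or not the bit is set in the supported mask. The new form only rejects an
index that is out of range or not set.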




[PULL 3/3] virtio-pci: Fix the use of an uninitialized irqfd

2024-08-20 Thread Michael S. Tsirkin
From: Cindy Lu 

The crash was reported on macOS and NixOS. The bug reports are:
https://gitlab.com/qemu-project/qemu/-/issues/2334
https://gitlab.com/qemu-project/qemu/-/issues/2321

In these reports the virtio_input device is in use. The guest notifier is
not supported for this device, so virtio_pci_set_guest_notifiers() was
never called and vector_irqfd was never initialized.

The fix adds a check for vector_irqfd in virtio_pci_get_notifier().

virtio_pci_get_notifier() can be used by various devices, and it can also
be called while VIRTIO_CONFIG_S_DRIVER_OK is not set. In that situation a
NULL vector_irqfd is acceptable, and the device is allowed to continue
booting.

If vector_irqfd is still uninitialized after VIRTIO_CONFIG_S_DRIVER_OK is
set, set_guest_notifiers was not called before the driver started, which
indicates that the device is not using the notifier. In that case the
check fails.

This fix was verified on Vyatta, macOS, NixOS and Fedora systems.

The backtrace for this bug is:
Thread 6 "CPU 0/KVM" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7c817be006c0 (LWP 1269146)]
kvm_virtio_pci_vq_vector_use () at ../qemu-9.0.0/hw/virtio/virtio-pci.c:817
817 if (irqfd->users == 0) {
(gdb) thread apply all bt
...
Thread 6 (Thread 0x7c817be006c0 (LWP 1269146) "CPU 0/KVM"):
0  kvm_virtio_pci_vq_vector_use () at ../qemu-9.0.0/hw/virtio/virtio-pci.c:817
1  kvm_virtio_pci_vector_use_one () at ../qemu-9.0.0/hw/virtio/virtio-pci.c:893
2  0x5983657045e2 in memory_region_write_accessor () at ../qemu-9.0.0/system/memory.c:497
3  0x598365704ba6 in access_with_adjusted_size () at ../qemu-9.0.0/system/memory.c:573
4  0x598365705059 in memory_region_dispatch_write () at ../qemu-9.0.0/system/memory.c:1528
5  0x5983659b8e1f in flatview_write_continue_step.isra.0 () at ../qemu-9.0.0/system/physmem.c:2713
6  0x59836570ba7d in flatview_write_continue () at ../qemu-9.0.0/system/physmem.c:2743
7  flatview_write () at ../qemu-9.0.0/system/physmem.c:2774
8  0x59836570bb76 in address_space_write () at ../qemu-9.0.0/system/physmem.c:2894
9  0x598365763afe in address_space_rw () at ../qemu-9.0.0/system/physmem.c:2904
10 kvm_cpu_exec () at ../qemu-9.0.0/accel/kvm/kvm-all.c:2917
11 0x59836576656e in kvm_vcpu_thread_fn () at ../qemu-9.0.0/accel/kvm/kvm-accel-ops.c:50
12 0x598365926ca8 in qemu_thread_start () at ../qemu-9.0.0/util/qemu-thread-posix.c:541
13 0x7c8185bcd1cf in ??? () at /usr/lib/libc.so.6
14 0x7c8185c4e504 in clone () at /usr/lib/libc.so.6

Fixes: 2ce6cff94d ("virtio-pci: fix use of a released vector")
Cc: qemu-sta...@nongnu.org
Signed-off-by: Cindy Lu 
Message-Id: <20240806093715.65105-1-l...@redhat.com>
Acked-by: Jason Wang 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio/virtio-pci.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 9534730bba..524b63e5c7 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -866,6 +866,9 @@ static int virtio_pci_get_notifier(VirtIOPCIProxy *proxy, 
int queue_no,
 VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
 VirtQueue *vq;
 
+if (!proxy->vector_irqfd && vdev->status & VIRTIO_CONFIG_S_DRIVER_OK)
+return -1;
+
 if (queue_no == VIRTIO_CONFIG_IRQ_IDX) {
 *n = virtio_config_get_guest_notifier(vdev);
 *vector = vdev->config_vector;
-- 
MST
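
The condition added by this patch mixes `!`, `&&` and `&`, so a small
decision-table sketch may help when reading it. The struct and names are
hypothetical (in the patch the status lives in the VirtIODevice, not the
proxy), and DEMO_S_DRIVER_OK is assumed to mirror VIRTIO_CONFIG_S_DRIVER_OK.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_S_DRIVER_OK 0x4  /* assumed to mirror VIRTIO_CONFIG_S_DRIVER_OK */

struct demo_proxy {
    void *vector_irqfd;   /* NULL until guest notifiers are set up */
    uint8_t status;       /* device status bits */
};

/* Returns true when the lookup should fail early, as in the added check. */
static bool demo_should_bail(const struct demo_proxy *p)
{
    /* `&` binds tighter than `&&`: (!irqfd) && (status & DRIVER_OK). */
    return !p->vector_irqfd && (p->status & DEMO_S_DRIVER_OK);
}
```

Only the combination "no irqfd table, driver already running" bails out. A
NULL table before DRIVER_OK and an initialized table at any time both fall
through to the normal lookup.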




[PULL 0/3] virtio: regression fixes

2024-08-20 Thread Michael S. Tsirkin
The following changes since commit 76277cf82f0e1123bd69ec59d22014b8f78485ec:

  Merge tag 'hw-misc-20240820' of https://github.com/philmd/qemu into staging 
(2024-08-20 09:17:41 +1000)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream

for you to fetch changes up to a8e63ff289d137197ad7a701a587cc432872d798:

  virtio-pci: Fix the use of an uninitialized irqfd (2024-08-20 06:57:47 -0400)


virtio: regression fixes

3 small patches to make sure we don't ship regressions.

Signed-off-by: Michael S. Tsirkin 


Akihiko Odaki (1):
  vhost: Add VIRTIO_NET_F_RSC_EXT to vhost feature bits

Cindy Lu (1):
  virtio-pci: Fix the use of an uninitialized irqfd

Volker Rümelin (1):
  hw/audio/virtio-snd: fix invalid param check

 hw/audio/virtio-snd.c  | 4 ++--
 hw/net/vhost_net.c | 2 ++
 hw/virtio/virtio-pci.c | 3 +++
 net/vhost-vdpa.c   | 1 +
 4 files changed, 8 insertions(+), 2 deletions(-)




Re: [PATCH v2] hw/virtio/vdpa-dev: Check returned value instead of dereferencing @errp

2024-08-20 Thread Michael S. Tsirkin
On Wed, Jul 17, 2024 at 12:26:15AM +0800, Zhao Liu wrote:
> As the comment in qapi/error, dereferencing @errp requires
> ERRP_GUARD():
> 
> * = Why, when and how to use ERRP_GUARD() =
> *
> * Without ERRP_GUARD(), use of the @errp parameter is restricted:
> * - It must not be dereferenced, because it may be null.
> ...
> * ERRP_GUARD() lifts these restrictions.
> *
> * To use ERRP_GUARD(), add it right at the beginning of the function.
> * @errp can then be used without worrying about the argument being
> * NULL or &error_fatal.
> *
> * Using it when it's not needed is safe, but please avoid cluttering
> * the source with useless code.
> 
> Though vhost_vdpa_device_realize() is called at DeviceClass.realize()
> context and won't get NULL @errp, it's still better to follow the
> requirement to add the ERRP_GUARD().
> 
> But qemu_open() and vhost_vdpa_device_get_u32()'s return values can
> distinguish between successful and unsuccessful calls, so check the
> return values directly without dereferencing @errp, which eliminates
> the need of ERRP_GUARD().
> 
> Cc: "Michael S. Tsirkin" 
> Cc: Jason Wang 
> Acked-by: Eugenio Pérez 
> Signed-off-by: Zhao Liu 
> ---
> v2:
>  * Added a/b from Eugenio.
>  * Deleted unnecessary ERRP_GUARD(). (Eugenio)
> ---
>  hw/virtio/vdpa-dev.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/virtio/vdpa-dev.c b/hw/virtio/vdpa-dev.c
> index 64b96b226c39..8a1e16fce3de 100644
> --- a/hw/virtio/vdpa-dev.c
> +++ b/hw/virtio/vdpa-dev.c
> @@ -63,19 +63,19 @@ static void vhost_vdpa_device_realize(DeviceState *dev, 
> Error **errp)
>  }
>  
>  v->vhostfd = qemu_open(v->vhostdev, O_RDWR, errp);
> -if (*errp) {
> +if (v->vhostfd < 0) {
>  return;
>  }
>  
>  v->vdev_id = vhost_vdpa_device_get_u32(v->vhostfd,
> VHOST_VDPA_GET_DEVICE_ID, errp);
> -if (*errp) {
> +if (v->vdev_id < 0) {
>  goto out;
>  }

vdev_id is unsigned, no idea how this is supposed to work.

>  
>  max_queue_size = vhost_vdpa_device_get_u32(v->vhostfd,
> VHOST_VDPA_GET_VRING_NUM, 
> errp);
> -if (*errp) {
> +if (max_queue_size < 0) {
>  goto out;
>  }
>  
max_queue_size is unsigned, too.

> @@ -89,7 +89,7 @@ static void vhost_vdpa_device_realize(DeviceState *dev, 
> Error **errp)
>  
>  v->num_queues = vhost_vdpa_device_get_u32(v->vhostfd,
>VHOST_VDPA_GET_VQS_COUNT, 
> errp);
> -if (*errp) {
> +if (v->num_queues < 0) {
>  goto out;
>  }
>  

num_queues is unsigned, too.

> @@ -127,7 +127,7 @@ static void vhost_vdpa_device_realize(DeviceState *dev, 
> Error **errp)
>  v->config_size = vhost_vdpa_device_get_u32(v->vhostfd,
> VHOST_VDPA_GET_CONFIG_SIZE,
> errp);
> -if (*errp) {
> +if (v->config_size < 0) {
>  goto vhost_cleanup;
>  }
>  
> -- 
> 2.34.1




Re: [RFC-PATCH v2] vhost-user: add a request-reply lock

2024-08-19 Thread Michael S. Tsirkin
On Mon, Aug 19, 2024 at 11:42:02AM -0400, Michael S. Tsirkin wrote:
> On Mon, Aug 19, 2024 at 05:32:48PM +0530, Prasad Pandit wrote:
> > From: Prasad Pandit 
> > 
> > QEMU threads use vhost_user_write/read calls to send
> > and receive request/reply messages from a vhost-user
> > device. When multiple threads communicate with the
> > same vhost-user device, they can receive each other's
> > messages, resulting in an erroneous state.
> > 
> > When fault_thread exits upon completion of Postcopy
> > migration, it sends a 'postcopy_end' message to the
> > vhost-user device. But sometimes 'postcopy_end' message
> > is sent while vhost device is being setup via
> > vhost_dev_start().
> > 
> > >  Thread-1                           Thread-2
> > > 
> > >  vhost_dev_start                    postcopy_ram_incoming_cleanup
> > >  vhost_device_iotlb_miss            postcopy_notify
> > >  vhost_backend_update_device_iotlb  vhost_user_postcopy_notifier
> > >  vhost_user_send_device_iotlb_msg   vhost_user_postcopy_end
> > >  process_message_reply              process_message_reply
> > >  vhost_user_read                    vhost_user_read
> > >  vhost_user_read_header             vhost_user_read_header
> > >  "Fail to update device iotlb"      "Failed to receive reply to postcopy_end"
> > 
> > This creates confusion when vhost-user device receives
> > 'postcopy_end' message while it is trying to update
> > IOTLB entries.
> > 
> >  vhost_user_read_header:
> >   700871,700871: Failed to read msg header. Flags 0x0 instead of 0x5.
> >  vhost_device_iotlb_miss:
> >   700871,700871: Fail to update device iotlb
> >  vhost_user_postcopy_end:
> >   700871,700900: Failed to receive reply to postcopy_end
> >  vhost_user_read_header:
> >   700871,700871: Failed to read msg header. Flags 0x0 instead of 0x5.
> > 
> > Here fault thread seems to end the postcopy migration
> > while another thread is starting the vhost-user device.
> > 
> > Add a mutex lock to hold for one request-reply cycle
> > and avoid such race condition.
> > 
> > Fixes: 46343570c06e ("vhost+postcopy: Wire up POSTCOPY_END notify")
> > Suggested-by: Peter Xu 
> > Signed-off-by: Prasad Pandit 
> 
> makes sense.
> Acked-by: Michael S. Tsirkin 
> But do not post v2 as reply to v1 pls.


Also, looks like this will replace Message-Id: 
<20240801124540.38774-1-xiangwench...@dayudpu.com>
correct?

> > ---
> >  hw/virtio/vhost-user.c | 74 ++
> >  include/hw/virtio/vhost-user.h |  3 ++
> >  2 files changed, 77 insertions(+)
> > 
> > v2:
> >  - Place QEMU_LOCK_GUARD near the vhost_user_write() calls, holding
> >the lock for longer fails some tests during rpmbuild(8).
> >  - rpmbuild(8) fails for some SRPMs, not all. RHEL-9 SRPM builds with
> >this patch, whereas Fedora SRPM does not build.
> >  - The host OS also seems to affect rpmbuild(8). Some SRPMs build well
> >on RHEL-9, but not on Fedora-40 machine.
> > 
> > v1: 
> > https://lore.kernel.org/qemu-devel/20240808095147.291626-3-ppan...@redhat.com/#R
> > 
> > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > index 00561daa06..7b030ae2cd 100644
> > --- a/hw/virtio/vhost-user.c
> > +++ b/hw/virtio/vhost-user.c
> > @@ -24,6 +24,7 @@
> >  #include "qemu/main-loop.h"
> >  #include "qemu/uuid.h"
> >  #include "qemu/sockets.h"
> > +#include "qemu/lockable.h"
> >  #include "sysemu/runstate.h"
> >  #include "sysemu/cryptodev.h"
> >  #include "migration/postcopy-ram.h"
> > @@ -446,6 +447,10 @@ static int vhost_user_set_log_base(struct vhost_dev 
> > *dev, uint64_t base,
> >  .hdr.size = sizeof(msg.payload.log),
> >  };
> >  
> > +struct vhost_user *u = dev->opaque;
> > +struct VhostUserState *us = u->user;
> > +QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> > +
> >  /* Send only once with first queue pair */
> >  if (dev->vq_index != 0) {
> >  return 0;
> > @@ -664,6 +669,7 @@ static int send_remove_regions(struct vhost_dev *dev,
> > bool reply_supported)
> >  {
> >  struct vhost_user *u = dev->opaque;
> > +struct VhostUserState *us = u->user;
> >  struct vhost_memory_region *shadow_reg;
> >  int i, fd, shadow_reg_idx, ret;
> >  

Re: [RFC-PATCH v2] vhost-user: add a request-reply lock

2024-08-19 Thread Michael S. Tsirkin
On Mon, Aug 19, 2024 at 05:32:48PM +0530, Prasad Pandit wrote:
> From: Prasad Pandit 
> 
> QEMU threads use vhost_user_write/read calls to send
> and receive request/reply messages from a vhost-user
> device. When multiple threads communicate with the
> same vhost-user device, they can receive each other's
> messages, resulting in an erroneous state.
> 
> When fault_thread exits upon completion of Postcopy
> migration, it sends a 'postcopy_end' message to the
> vhost-user device. But sometimes 'postcopy_end' message
> is sent while vhost device is being setup via
> vhost_dev_start().
> 
>  Thread-1                           Thread-2
> 
>  vhost_dev_start                    postcopy_ram_incoming_cleanup
>  vhost_device_iotlb_miss            postcopy_notify
>  vhost_backend_update_device_iotlb  vhost_user_postcopy_notifier
>  vhost_user_send_device_iotlb_msg   vhost_user_postcopy_end
>  process_message_reply              process_message_reply
>  vhost_user_read                    vhost_user_read
>  vhost_user_read_header             vhost_user_read_header
>  "Fail to update device iotlb"      "Failed to receive reply to postcopy_end"
> 
> This creates confusion when vhost-user device receives
> 'postcopy_end' message while it is trying to update
> IOTLB entries.
> 
>  vhost_user_read_header:
>   700871,700871: Failed to read msg header. Flags 0x0 instead of 0x5.
>  vhost_device_iotlb_miss:
>   700871,700871: Fail to update device iotlb
>  vhost_user_postcopy_end:
>   700871,700900: Failed to receive reply to postcopy_end
>  vhost_user_read_header:
>   700871,700871: Failed to read msg header. Flags 0x0 instead of 0x5.
> 
> Here fault thread seems to end the postcopy migration
> while another thread is starting the vhost-user device.
> 
> Add a mutex lock to hold for one request-reply cycle
> and avoid such race condition.
> 
> Fixes: 46343570c06e ("vhost+postcopy: Wire up POSTCOPY_END notify")
> Suggested-by: Peter Xu 
> Signed-off-by: Prasad Pandit 

makes sense.
Acked-by: Michael S. Tsirkin 
But do not post v2 as reply to v1 pls.
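
As a rough illustration of "hold a lock for one request-reply cycle", here
is a toy sketch using plain pthread calls. It is not the vhost-user code:
the channel struct and demo_* names are hypothetical, and the explicit
lock/unlock pair stands in for the scope-based QEMU_LOCK_GUARD used in the
patch.

```c
#include <pthread.h>
#include <stdint.h>

/* Toy stand-in for a backend that answers requests strictly in order. */
struct demo_channel {
    pthread_mutex_t lock;    /* held for one whole request-reply cycle */
    uint32_t pending;        /* id of the request awaiting its reply */
};

static void demo_channel_init(struct demo_channel *ch)
{
    pthread_mutex_init(&ch->lock, NULL);
    ch->pending = 0;
}

/* Send `id` and read the reply without letting another thread interleave. */
static uint32_t demo_request_reply(struct demo_channel *ch, uint32_t id)
{
    uint32_t reply;

    pthread_mutex_lock(&ch->lock);
    ch->pending = id;        /* "write" the request */
    reply = ch->pending;     /* "read" the reply; always ours under the lock */
    pthread_mutex_unlock(&ch->lock);

    return reply;
}
```

The point is that the write and the matching read sit inside the same
critical section, so a second thread cannot consume the first thread's
reply.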

> ---
>  hw/virtio/vhost-user.c | 74 ++
>  include/hw/virtio/vhost-user.h |  3 ++
>  2 files changed, 77 insertions(+)
> 
> v2:
>  - Place QEMU_LOCK_GUARD near the vhost_user_write() calls, holding
>the lock for longer fails some tests during rpmbuild(8).
>  - rpmbuild(8) fails for some SRPMs, not all. RHEL-9 SRPM builds with
>this patch, whereas Fedora SRPM does not build.
>  - The host OS also seems to affect rpmbuild(8). Some SRPMs build well
>on RHEL-9, but not on Fedora-40 machine.
> 
> v1: 
> https://lore.kernel.org/qemu-devel/20240808095147.291626-3-ppan...@redhat.com/#R
> 
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 00561daa06..7b030ae2cd 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -24,6 +24,7 @@
>  #include "qemu/main-loop.h"
>  #include "qemu/uuid.h"
>  #include "qemu/sockets.h"
> +#include "qemu/lockable.h"
>  #include "sysemu/runstate.h"
>  #include "sysemu/cryptodev.h"
>  #include "migration/postcopy-ram.h"
> @@ -446,6 +447,10 @@ static int vhost_user_set_log_base(struct vhost_dev 
> *dev, uint64_t base,
>  .hdr.size = sizeof(msg.payload.log),
>  };
>  
> +struct vhost_user *u = dev->opaque;
> +struct VhostUserState *us = u->user;
> +QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> +
>  /* Send only once with first queue pair */
>  if (dev->vq_index != 0) {
>  return 0;
> @@ -664,6 +669,7 @@ static int send_remove_regions(struct vhost_dev *dev,
> bool reply_supported)
>  {
>  struct vhost_user *u = dev->opaque;
> +struct VhostUserState *us = u->user;
>  struct vhost_memory_region *shadow_reg;
>  int i, fd, shadow_reg_idx, ret;
>  ram_addr_t offset;
> @@ -685,6 +691,8 @@ static int send_remove_regions(struct vhost_dev *dev,
>  vhost_user_fill_msg_region(&region_buffer, shadow_reg, 0);
>  msg->payload.mem_reg.region = region_buffer;
>  
> +QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> +
>  ret = vhost_user_write(dev, msg, NULL, 0);
>  if (ret < 0) {
>  return ret;
> @@ -718,6 +726,7 @@ static int send_add_regions(struct vhost_dev *dev,
>  bool reply_supported, bool track_ramblocks)
>  {
>  struct vhost_user *u = dev->opaque;
> +struct 

Re: [PATCH v2 4/4] virtio-net: Add support for USO features

2024-08-18 Thread Michael S. Tsirkin
On Sun, Aug 18, 2024 at 02:04:29PM +0900, Akihiko Odaki wrote:
> On 2024/08/09 21:50, Fabiano Rosas wrote:
> > Peter Xu  writes:
> > 
> > > On Thu, Aug 08, 2024 at 10:47:28AM -0400, Michael S. Tsirkin wrote:
> > > > On Thu, Aug 08, 2024 at 10:15:36AM -0400, Peter Xu wrote:
> > > > > On Thu, Aug 08, 2024 at 07:12:14AM -0400, Michael S. Tsirkin wrote:
> > > > > > This is too big of a hammer. People already use what you call "cross
> > > > > > migrate" and have for years. We are not going to stop developing
> > > > > > features just because someone suddenly became aware of some such 
> > > > > > bit.
> > > > > > If you care, you will have to work to solve the problem properly -
> > > > > > nacking half baked hacks is the only tool maintainers have to make
> > > > > > people work on hard problems.
> > > > > 
> > > > > IMHO this is totally different thing.  It's not about proposing a new
> > > > > feature yet so far, it's about how we should fix a breakage first.
> > > > > 
> > > > > > And that's why I think we should fix it even in the simple way first,
> > > > > > then we consider anything more beneficial from perf side without
> > > > > > breaking anything, which should be on top of that.
> > > > > 
> > > > > Thanks,
> > > > 
> > > > As I said, once the quick hack is merged people stop caring.
> > > 
> > > IMHO it's not a hack. It's a proper fix to me to disable it by default for
> > > now.
> > > 
> > > OTOH, having it ON always even knowing it can break migration is a hack to
> > > me, when we don't have anything else to guard the migration.
> > > 
> > > > Mixing different kernel versions in migration is esoteric enough for
> > > > this not to matter to most people. There's no rush I think, address
> > > > it properly.
> > > 
> > > > Exactly mixing kernel versions will be tricky to users to identify, but
> > > > that's, AFAICT, exactly happening everywhere.  We can't urge user to
> > > > always use the exact same kernels when we're talking about a VM cluster.
> > > > That's why I think allowing migration to work across those kernels matter.
> > 
> > I also worry a bit about the scenario where the cluster changes slightly
> > and now all VMs are already restricted by some option that requires the
> > exact same kernel. Specifically, kernel changes in a cloud environment
> > also happen due to factors completely unrelated to migration. I'm not
> > sure the people managing the infra (who care about migration) will be
> > gating kernel changes just because QEMU has been configured in a
> > specific manner.
> 
> I have written a bit about the expectation on the platform earlier[1], but let
> me summarize it here.
> 
> 1. I expect the user will not downgrade the platform of hosts after setting
> up a VM. This is essential to enable any platform feature.
> 
> 2. The user is allowed to upgrade the platform of hosts gradually. This
> results in a situation with mixed platforms. The oldest platform is still
> not older than the platform the VM is set up for. This enables the gradual
> deployment strategy.
> 
> 3. The user is allowed to downgrade the platform of hosts to the version
> used when setting up the VM. This enables rollbacks in case of regression.
> 
> With these expectations, we can ensure migratability by a) enabling platform
> features available on all hosts when setting up the VM and b) saving the
> enabled features. This is covered with my
> -dump-platform/-merge-platform/-use-platform proposal[2].

I really like [2]. Do you plan to work on it? Does anyone else?

> Regards,
> Akihiko Odaki
> 
> [1]
> https://lore.kernel.org/r/2b62780c-a6cb-4262-beb5-81d54c14f...@daynix.com
> [2]
> https://lore.kernel.org/all/2da4ebcd-2058-49c3-a4ec-8e60536e5...@daynix.com/




Re: [PATCH 1/1] virtio-pci: return RAM device MR when set host notifier success

2024-08-16 Thread Michael S. Tsirkin
On Mon, Aug 12, 2024 at 08:20:27PM +0800, Gao Shiyuan wrote:
> When vhost-user backend register memory region based host notifiers,
> we should return RAM device MR of notify region MR's subregion in
> virtio_address_space_lookup.
> 
> In seabios, it will use virtio PCI Configration Access Capability
> access notify region when assign notify region above 4GB. This will
> exit to QEMU and invoke virtio_address_space_write. When vhost-user
> backend register memory region based host notifiers, return RAM device
> MR instead of notify region MR is suitable.


I can't really parse this.

> Co-developed-by: Zuo Boqun 
> Signed-off-by: Gao Shiyuan 
> Signed-off-by: Zuo Boqun 

CC Jason

> ---
>  hw/virtio/virtio-pci.c | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index 9534730bba..167ac9718a 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -610,13 +610,22 @@ static MemoryRegion 
> *virtio_address_space_lookup(VirtIOPCIProxy *proxy,
>  {
>  int i;
>  VirtIOPCIRegion *reg;
> +MemoryRegion *mr, *submr;
>  
>  for (i = 0; i < ARRAY_SIZE(proxy->regs); ++i) {
>  reg = &proxy->regs[i];
>  if (*off >= reg->offset &&
>  *off + len <= reg->offset + reg->size) {
>  *off -= reg->offset;
> -return &reg->mr;
> +mr = &reg->mr;
> +QTAILQ_FOREACH(submr, &mr->subregions, subregions_link) {
> +if (*off >= submr->addr &&
> +*off + len < submr->addr + submr->size) {
> +*off -= submr->addr;
> +return submr;
> +}
> +}
> +return mr;
>  }
>  }

Poking at internals of MR like this is not nice.
Doesn't memory_region_find work for this?



>  
> -- 
> 2.39.3 (Apple Git-146)




Re: [PATCH-for-9.1? v2] hw/pci/pci-hmp-cmds: Avoid displaying bogus size in 'info pci'

2024-08-16 Thread Michael S. Tsirkin
On Fri, Aug 16, 2024 at 08:16:15AM +0200, Philippe Mathieu-Daudé wrote:
> ping

tagged this now, thanks!




Re: [PATCH] vhost_net: configure all host notifiers in a single MR transaction

2024-08-16 Thread Michael S. Tsirkin
On Fri, Aug 16, 2024 at 03:08:35PM +0800, zuoboqun wrote:
> This allows the vhost_net device which has multiple virtqueues to batch
> the setup of all its host notifiers. This significantly reduces the
> vhost_net device starting and stopping time, e.g. the time spent
> on enabling notifiers drops from 630ms to 75ms and the time spent on
> disabling notifiers drops from 441ms to 45ms for a VM with 192 vCPUs
> and 15 vhost-user-net devices (64vq per device) in our case.
> 
> Signed-off-by: zuoboqun 

Looks good, tagged for past the release.
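
The batching argument in the commit message can be pictured with a toy
transaction counter. This is a model, not the memory API. It assumes that
each notifier change opens and commits its own nested transaction and that
the expensive rebuild runs only when the outermost commit happens. All
demo_* names are hypothetical.

```c
#include <assert.h>

/* Toy model: `depth` tracks nesting, `rebuilds` counts expensive updates. */
struct demo_txn {
    int depth;
    int pending;
    int rebuilds;
};

static void demo_begin(struct demo_txn *t)
{
    t->depth++;
}

static void demo_commit(struct demo_txn *t)
{
    assert(t->depth > 0);
    if (--t->depth == 0 && t->pending) {
        t->pending = 0;
        t->rebuilds++;       /* stands in for the costly ioeventfd update */
    }
}

/* Each notifier change is wrapped in its own begin/commit pair. */
static void demo_set_notifier(struct demo_txn *t)
{
    demo_begin(t);
    t->pending = 1;
    demo_commit(t);
}

/* Configure `n` notifiers, optionally inside one outer transaction. */
static int demo_configure(int n, int batched)
{
    struct demo_txn t = { 0, 0, 0 };

    if (batched) {
        demo_begin(&t);
    }
    for (int i = 0; i < n; i++) {
        demo_set_notifier(&t);
    }
    if (batched) {
        demo_commit(&t);
    }
    return t.rebuilds;
}
```

In this model, 64 unbatched changes trigger 64 rebuilds, while the same
changes inside one outer transaction trigger a single rebuild. That is the
effect the patch relies on to avoid the quadratic cost.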

> ---
>  hw/net/vhost_net.c| 155 +++---
>  hw/virtio/vhost.c |   6 +-
>  include/hw/virtio/vhost.h |   4 +
>  3 files changed, 150 insertions(+), 15 deletions(-)
> 
> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index a788e6937e..28a9aca1a7 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -160,6 +160,135 @@ void vhost_net_save_acked_features(NetClientState *nc)
>  #endif
>  }
>  
> +static void vhost_net_disable_notifiers_nvhosts(VirtIODevice *dev,
> +NetClientState *ncs, int data_queue_pairs, int nvhosts)
> +{
> +VirtIONet *n = VIRTIO_NET(dev);
> +BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(dev)));
> +struct vhost_net *net;
> +struct vhost_dev *hdev;
> +int r, i, j;
> +NetClientState *peer;
> +
> +/*
> + * Batch all the host notifiers in a single transaction to avoid
> + * quadratic time complexity in address_space_update_ioeventfds().
> + */
> +memory_region_transaction_begin();
> +
> +for (i = 0; i < nvhosts; i++) {
> +if (i < data_queue_pairs) {
> +peer = qemu_get_peer(ncs, i);
> +} else {
> +peer = qemu_get_peer(ncs, n->max_queue_pairs);
> +}
> +
> +net = get_vhost_net(peer);
> +hdev = &net->dev;
> +for (j = 0; j < hdev->nvqs; j++) {
> +r = virtio_bus_set_host_notifier(VIRTIO_BUS(qbus),
> + hdev->vq_index + j,
> + false);
> +if (r < 0) {
> +error_report("vhost %d VQ %d notifier cleanup failed: %d",
> +  i, j, -r);
> +}
> +assert(r >= 0);
> +}
> +}
> +/*
> + * The transaction expects the ioeventfds to be open when it
> + * commits. Do it now, before the cleanup loop.
> + */
> +memory_region_transaction_commit();
> +
> +for (i = 0; i < nvhosts; i++) {
> +if (i < data_queue_pairs) {
> +peer = qemu_get_peer(ncs, i);
> +} else {
> +peer = qemu_get_peer(ncs, n->max_queue_pairs);
> +}
> +
> +net = get_vhost_net(peer);
> +hdev = &net->dev;
> +for (j = 0; j < hdev->nvqs; j++) {
> +virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus),
> + hdev->vq_index + j);
> +}
> +virtio_device_release_ioeventfd(dev);
> +}
> +}
> +
> +static int vhost_net_enable_notifiers(VirtIODevice *dev,
> +NetClientState *ncs, int data_queue_pairs, int cvq)
> +{
> +VirtIONet *n = VIRTIO_NET(dev);
> +BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(dev)));
> +int nvhosts = data_queue_pairs + cvq;
> +struct vhost_net *net;
> +struct vhost_dev *hdev;
> +int r, i, j;
> +NetClientState *peer;
> +
> +/*
> + * Batch all the host notifiers in a single transaction to avoid
> + * quadratic time complexity in address_space_update_ioeventfds().
> + */
> +memory_region_transaction_begin();
> +
> +for (i = 0; i < nvhosts; i++) {
> +if (i < data_queue_pairs) {
> +peer = qemu_get_peer(ncs, i);
> +} else {
> +peer = qemu_get_peer(ncs, n->max_queue_pairs);
> +}
> +
> +net = get_vhost_net(peer);
> +hdev = &net->dev;
> +/*
> + * We will pass the notifiers to the kernel, make sure that QEMU
> + * doesn't interfere.
> + */
> +r = virtio_device_grab_ioeventfd(dev);
> +if (r < 0) {
> +error_report("binding does not support host notifiers");
> +memory_region_transaction_commit();
> +goto fail_nvhosts;
> +}
> +
> +for (j = 0; j < hdev->nvqs; j++) {
> +r = virtio_bus_set_host_notifier(VIRTIO_BUS(qbus),
> + hdev->vq_index + j,
> + true);
> +if (r < 0) {
> +error_report("vhost %d VQ %d notifier binding failed: %d",
> +  i, j, -r);
> +memory_region_transaction_commit();
> +vhost_dev_disable_notifiers_nvqs(hdev, dev, j);
> +goto fail_nvhosts;
> +}
> +}
> +}
> +
> +memory_region_transaction_commit();
> +
> +r

Re: [RFC PATCH 0/2] async: rework async event API for replay

2024-08-16 Thread Michael S. Tsirkin
On Fri, Aug 16, 2024 at 12:23:50PM +1000, Nicholas Piggin wrote:
> On Fri Aug 16, 2024 at 1:30 AM AEST, Michael S. Tsirkin wrote:
> > On Thu, Aug 15, 2024 at 11:28:35PM +1000, Nicholas Piggin wrote:
> > > Continuing the conversation from the thread about record/replay
> > > virtio fix. Here is a sketch of how we could improve the naming
> > > convention so users of bh don't have to know about record/replay.
> > > 
> > > Thanks,
> > > Nick
> >
> > The API looks ok to me.
> 
> Thanks for taking a look. In that case let's go with the fixes for
> now so we have rr regression tests in a bit better state, and I will
> resend for 9.2.
> 
> Thanks,
> Nick

No, this is big because you are structuring this wrong.
Work, then rework ... I do not really like this kind of thing.

Add sane APIs and use them keeping old ones around, reworking old code
can wait for 9.2.

-- 
MST




Re: [RFC PATCH 0/2] async: rework async event API for replay

2024-08-15 Thread Michael S. Tsirkin
On Thu, Aug 15, 2024 at 11:28:35PM +1000, Nicholas Piggin wrote:
> Continuing the conversation from the thread about record/replay
> virtio fix. Here is a sketch of how we could improve the naming
> convention so users of bh don't have to know about record/replay.
> 
> Thanks,
> Nick

The API looks ok to me.

> Nicholas Piggin (2):
>   async: rework async event API for replay
>   async: add debugging assertions for record/replay in bh APIs
> 
>  docs/devel/replay.rst  |  7 ++--
>  include/block/aio.h| 35 +++--
>  include/sysemu/replay.h|  2 +-
>  block.c|  4 +-
>  block/block-backend.c  | 24 +++-
>  block/io.c |  5 ++-
>  block/iscsi.c  |  5 ++-
>  block/nfs.c| 10 +++--
>  block/null.c   |  4 +-
>  block/nvme.c   |  8 ++--
>  hw/ide/core.c  |  7 ++--
>  hw/scsi/scsi-bus.c |  6 +--
>  monitor/monitor.c  |  2 +-
>  monitor/qmp.c  |  5 ++-
>  qapi/qmp-dispatch.c|  4 +-
>  replay/replay-events.c | 25 ++--
>  stubs/replay-tools.c   |  2 +-
>  util/aio-wait.c|  2 +-
>  util/async.c   | 63 --
>  util/main-loop.c   |  2 +-
>  util/thread-pool.c |  8 ++--
>  scripts/block-coroutine-wrapper.py |  2 +-
>  22 files changed, 164 insertions(+), 68 deletions(-)
> 
> -- 
> 2.45.2




Re: [PATCH v2 16/21] virtio-net: Use replay_schedule_bh_event for bhs that affect machine state

2024-08-15 Thread Michael S. Tsirkin
On Thu, Aug 15, 2024 at 05:12:32PM +1000, Nicholas Piggin wrote:
> Could be a good idea. Although I'm not sure what to do with
> all types, maybe we can restrict what is supported.
> 
> > Is this wider re-factoring something that can wait for the next
> > developer cycle?
> 
> I would say so. It's not quite trivial to do nicely since
> things are a bit tangled between util/async and replay.
> 
> > >> I had started on a conversion once but not completed it.
> > >> I could resurrect if there is agreement on the API?
> >
> > I would certainly welcome it being cleaned up. The supported replay
> > devices are very piecemeal at the moment.
> 
> I'll tidy up and post an RFC for how the new API might look.
> 
> Thanks,
> Nick

Fundamentally it's virtio net, up to Jason. I don't like messy
APIs and people tend to get distracted and not fix them up
if one does not make this a blocker.

-- 
MST




Re: [PATCH v7 01/10] acpi/generic_event_device: add an APEI error device

2024-08-14 Thread Michael S. Tsirkin
On Wed, Aug 14, 2024 at 01:23:23AM +0200, Mauro Carvalho Chehab wrote:
> Adds a generic error device to handle generic hardware error
> events as specified at ACPI 6.5 specification at 18.3.2.7.2:
> https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#event-notification-for-generic-error-sources
> using HID PNP0C33.
> 
> The PNP0C33 device is used to report hardware errors to
> the guest via ACPI APEI Generic Hardware Error Source (GHES).
> 
> Co-authored-by: Mauro Carvalho Chehab 
> Co-authored-by: Jonathan Cameron 
> Signed-off-by: Jonathan Cameron 
> Signed-off-by: Mauro Carvalho Chehab 
> Reviewed-by: Igor Mammedov 
> ---
>  hw/acpi/aml-build.c| 10 ++
>  hw/acpi/generic_event_device.c |  8 
>  include/hw/acpi/acpi_dev_interface.h   |  1 +
>  include/hw/acpi/aml-build.h|  2 ++
>  include/hw/acpi/generic_event_device.h |  1 +
>  5 files changed, 22 insertions(+)
> 
> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> index 6d4517cfbe3d..cb167523859f 100644
> --- a/hw/acpi/aml-build.c
> +++ b/hw/acpi/aml-build.c
> @@ -2520,3 +2520,13 @@ Aml *aml_i2c_serial_bus_device(uint16_t address, const 
> char *resource_source)
>  
>  return var;
>  }
> +
> +/* ACPI 5.0: 18.3.2.6.2 Event Notification For Generic Error Sources */



I could not find Event Notification For Generic Error Sources in ACPI
5.0.



Instead, I see
18.3.2.6.2 SCI Notification For Generic Error Sources

What did I miss?


> +Aml *aml_error_device(void)
> +{
> +Aml *dev = aml_device(ACPI_APEI_ERROR_DEVICE);
> +aml_append(dev, aml_name_decl("_HID", aml_string("PNP0C33")));




> +aml_append(dev, aml_name_decl("_UID", aml_int(0)));
> +

Add a comment on why this is here?

> +return dev;
> +}
> diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
> index 15b4c3ebbf24..1673e9695be3 100644
> --- a/hw/acpi/generic_event_device.c
> +++ b/hw/acpi/generic_event_device.c
> @@ -26,6 +26,7 @@ static const uint32_t ged_supported_events[] = {
>  ACPI_GED_PWR_DOWN_EVT,
>  ACPI_GED_NVDIMM_HOTPLUG_EVT,
>  ACPI_GED_CPU_HOTPLUG_EVT,
> +ACPI_GED_ERROR_EVT
>  };
>  
>  /*
> @@ -116,6 +117,11 @@ void build_ged_aml(Aml *table, const char *name, 
> HotplugHandler *hotplug_dev,
> aml_notify(aml_name(ACPI_POWER_BUTTON_DEVICE),
>aml_int(0x80)));
>  break;
> +case ACPI_GED_ERROR_EVT:
> +aml_append(if_ctx,
> +   aml_notify(aml_name(ACPI_APEI_ERROR_DEVICE),
> +  aml_int(0x80)));
> +break;
>  case ACPI_GED_NVDIMM_HOTPLUG_EVT:
>  aml_append(if_ctx,
> aml_notify(aml_name("\\_SB.NVDR"),
> @@ -295,6 +301,8 @@ static void acpi_ged_send_event(AcpiDeviceIf *adev, 
> AcpiEventStatusBits ev)
>  sel = ACPI_GED_MEM_HOTPLUG_EVT;
>  } else if (ev & ACPI_POWER_DOWN_STATUS) {
>  sel = ACPI_GED_PWR_DOWN_EVT;
> +} else if (ev & ACPI_GENERIC_ERROR) {
> +sel = ACPI_GED_ERROR_EVT;
>  } else if (ev & ACPI_NVDIMM_HOTPLUG_STATUS) {
>  sel = ACPI_GED_NVDIMM_HOTPLUG_EVT;
>  } else if (ev & ACPI_CPU_HOTPLUG_STATUS) {
> diff --git a/include/hw/acpi/acpi_dev_interface.h 
> b/include/hw/acpi/acpi_dev_interface.h
> index 68d9d15f50aa..8294f8f0ccca 100644
> --- a/include/hw/acpi/acpi_dev_interface.h
> +++ b/include/hw/acpi/acpi_dev_interface.h
> @@ -13,6 +13,7 @@ typedef enum {
>  ACPI_NVDIMM_HOTPLUG_STATUS = 16,
>  ACPI_VMGENID_CHANGE_STATUS = 32,
>  ACPI_POWER_DOWN_STATUS = 64,
> +ACPI_GENERIC_ERROR = 128,
>  } AcpiEventStatusBits;
>  
>  #define TYPE_ACPI_DEVICE_IF "acpi-device-interface"
> diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
> index a3784155cb33..44d1a6af0c69 100644
> --- a/include/hw/acpi/aml-build.h
> +++ b/include/hw/acpi/aml-build.h
> @@ -252,6 +252,7 @@ struct CrsRangeSet {
>  /* Consumer/Producer */
>  #define AML_SERIAL_BUS_FLAG_CONSUME_ONLY(1 << 1)
>  
> +#define ACPI_APEI_ERROR_DEVICE   "GEDD"
>  /**
>   * init_aml_allocator:
>   *
> @@ -382,6 +383,7 @@ Aml *aml_dma(AmlDmaType typ, AmlDmaBusMaster bm, 
> AmlTransferSize sz,
>   uint8_t channel);
>  Aml *aml_sleep(uint64_t msec);
>  Aml *aml_i2c_serial_bus_device(uint16_t address, const char 
> *resource_source);
> +Aml *aml_error_device(void);
>  
>  /* Block AML object primitives */
>  Aml *aml_scope(const char *name_format, ...) G_GNUC_PRINTF(1, 2);
> diff --git a/include/hw/acpi/generic_event_device.h 
> b/include/hw/acpi/generic_event_device.h
> index 40af3550b56d..9ace8fe70328 100644
> --- a/include/hw/acpi/generic_event_device.h
> +++ b/include/hw/acpi/generic_event_device.h
> @@ -98,6 +98,7 @@ OBJECT_DECLARE_SIMPLE_TYPE(AcpiGedState, ACPI_GED)
>  #define ACPI_GED_PWR_DOWN_EVT  0x2
>  #define ACPI_GED_NVDIMM_HOTPLUG_EVT 0x4
>  #define ACPI_GED_CPU_HO

Re: [PATCH v2 16/21] virtio-net: Use replay_schedule_bh_event for bhs that affect machine state

2024-08-14 Thread Michael S. Tsirkin
On Wed, Aug 14, 2024 at 04:05:34PM +1000, Nicholas Piggin wrote:
> On Wed Aug 14, 2024 at 6:48 AM AEST, Michael S. Tsirkin wrote:
> > On Tue, Aug 13, 2024 at 09:23:24PM +0100, Alex Bennée wrote:
> > > From: Nicholas Piggin 
> > > 
> > > The regular qemu_bh_schedule() calls result in non-deterministic
> > > execution of the bh in record-replay mode, which causes replay failure.
> > > 
> > > Reviewed-by: Alex Bennée 
> > > Reviewed-by: Pavel Dovgalyuk 
> > > Signed-off-by: Nicholas Piggin 
> > > Message-Id: <20240813050638.446172-9-npig...@gmail.com>
> > > Signed-off-by: Alex Bennée 
> > > ---
> > >  hw/net/virtio-net.c | 11 ++-
> > >  1 file changed, 6 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > > index 08aa0b65e3..10ebaae5e2 100644
> > > --- a/hw/net/virtio-net.c
> > > +++ b/hw/net/virtio-net.c
> > > @@ -40,6 +40,7 @@
> > >  #include "migration/misc.h"
> > >  #include "standard-headers/linux/ethtool.h"
> > >  #include "sysemu/sysemu.h"
> > > +#include "sysemu/replay.h"
> > >  #include "trace.h"
> > >  #include "monitor/qdev.h"
> > >  #include "monitor/monitor.h"
> > > @@ -417,7 +418,7 @@ static void virtio_net_set_status(struct VirtIODevice 
> > > *vdev, uint8_t status)
> > >  timer_mod(q->tx_timer,
> > > qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + 
> > > n->tx_timeout);
> > >  } else {
> > > -qemu_bh_schedule(q->tx_bh);
> > > +replay_bh_schedule_event(q->tx_bh);
> > >  }
> > >  } else {
> > >  if (q->tx_timer) {
> > > @@ -2672,7 +2673,7 @@ static void virtio_net_tx_complete(NetClientState 
> > > *nc, ssize_t len)
> > >   */
> > >  virtio_queue_set_notification(q->tx_vq, 0);
> > >  if (q->tx_bh) {
> > > -qemu_bh_schedule(q->tx_bh);
> > > +replay_bh_schedule_event(q->tx_bh);
> > >  } else {
> > >  timer_mod(q->tx_timer,
> > >qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + 
> > > n->tx_timeout);
> > > @@ -2838,7 +2839,7 @@ static void virtio_net_handle_tx_bh(VirtIODevice 
> > > *vdev, VirtQueue *vq)
> > >  return;
> > >  }
> > >  virtio_queue_set_notification(vq, 0);
> > > -qemu_bh_schedule(q->tx_bh);
> > > +replay_bh_schedule_event(q->tx_bh);
> > >  }
> > >  
> > >  static void virtio_net_tx_timer(void *opaque)
> > > @@ -2921,7 +2922,7 @@ static void virtio_net_tx_bh(void *opaque)
> > >  /* If we flush a full burst of packets, assume there are
> > >   * more coming and immediately reschedule */
> > >  if (ret >= n->tx_burst) {
> > > -qemu_bh_schedule(q->tx_bh);
> > > +replay_bh_schedule_event(q->tx_bh);
> > >  q->tx_waiting = 1;
> > >  return;
> > >  }
> > > @@ -2935,7 +2936,7 @@ static void virtio_net_tx_bh(void *opaque)
> > >  return;
> > >  } else if (ret > 0) {
> > >  virtio_queue_set_notification(q->tx_vq, 0);
> > > -qemu_bh_schedule(q->tx_bh);
> > > +replay_bh_schedule_event(q->tx_bh);
> > >  q->tx_waiting = 1;
> > >  }
> > >  }
> > > -- 
> > > 2.39.2
> >
> >
> > Is this really the only way to fix this? I do not think
> > virtio has any business knowing about replay.
> > What does this API do, even? BH but not broken with replay?
> > Do we ever want replay broken? Why not fix qemu_bh_schedule?
> > And when we add another feature which we do not want to break
> > will we do foo_bar_replay_bh_schedule_event or what?
> 
> I agree with you. We need to do this (a couple of other hw
> subsystems already do and likely some are still broken vs
> replay and would need to be converted), but I think it's
> mostly a case of bad naming. You're right the caller should
> not know about replay at all, what it should be is whether
> the event is for the target machine or the host harness,
> same as timers are VIRTUAL / HOST.
> So I think we just need to make a qemu_bh_schedule_,
> or qemu_bh_schedule_event(... QEMU_EVENT_VIRTUAL/HOST/etc).

Or just pass QEMUClockType?
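
For illustration only, a toy sketch of what passing a clock type at the
call site could look like. None of these names exist in QEMU:
toy_bh_schedule_event, struct toy_sched and the TOY_CLOCK_* values are made
up here, and real bh scheduling and replay logging are not modelled. The
only point is that the caller says whether the event belongs to the target
machine or to the host, and the scheduler decides what replay has to do.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a clock/event domain; not QEMUClockType. */
enum toy_clock_type { TOY_CLOCK_HOST, TOY_CLOCK_VIRTUAL };

/* Hypothetical scheduler state, reduced to two counters. */
struct toy_sched {
    bool replay_active;   /* record/replay mode is on */
    unsigned recorded;    /* events routed through the replay log */
    unsigned plain;       /* events scheduled the ordinary way */
};

/*
 * The caller only states which domain the event belongs to.
 * Machine-visible (VIRTUAL) events go through the replay log when
 * replay is active; everything else is scheduled normally.
 */
static void toy_bh_schedule_event(struct toy_sched *s,
                                  enum toy_clock_type type)
{
    if (s->replay_active && type == TOY_CLOCK_VIRTUAL) {
        s->recorded++;
    } else {
        s->plain++;
    }
}
```

With something of this shape a device would pass the VIRTUAL type for its
tx bh and would not need to include a replay header at all.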

> I had started on a conversion once but not completed it.
> I could resurrect if there is agreement on the API?
> 
> Thanks,
> Nick




Re: [PATCH v3 2/2] intel_iommu: Make PASID-cache and PIOTLB type invalid in legacy mode

2024-08-14 Thread Michael S. Tsirkin
On Wed, Aug 14, 2024 at 03:05:33AM +, Duan, Zhenzhong wrote:
> 
> 
> >-Original Message-
> >From: Liu, Yi L 
> >Subject: Re: [PATCH v3 2/2] intel_iommu: Make PASID-cache and PIOTLB
> >type invalid in legacy mode
> >
> >On 2024/8/14 10:26, Zhenzhong Duan wrote:
> >> In vtd_process_inv_desc(), VTD_INV_DESC_PC and VTD_INV_DESC_PIOTLB
> >are
> >> bypassed without scalable mode check. These two types are not valid
> >> in legacy mode and we should report error.
> >>
> >> Fixes: 4a4f219e8a1 ("intel_iommu: add scalable-mode option to make
> >scalable mode work")
> >
> >4a4f219e8a10 would be better. :)
> 
> Ah, OK, Michael, let me know if you want me send a new version.
> 
> Thanks
> Zhenzhong


Yes pls, also pls Cc me on the cover letter.

> >
> >> Suggested-by: Yi Liu 
> >> Signed-off-by: Zhenzhong Duan 
> >> Reviewed-by: Clément Mathieu--Drif
> >> Reviewed-by: Yi Liu 
> >> ---
> >>   hw/i386/intel_iommu.c | 22 +++---
> >>   1 file changed, 11 insertions(+), 11 deletions(-)
> >>
> >> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> >> index 68cb72a481..90cd4e5044 100644
> >> --- a/hw/i386/intel_iommu.c
> >> +++ b/hw/i386/intel_iommu.c
> >> @@ -2763,17 +2763,6 @@ static bool
> >vtd_process_inv_desc(IntelIOMMUState *s)
> >>   }
> >>   break;
> >>
> >> -/*
> >> - * TODO: the entity of below two cases will be implemented in future
> >series.
> >> - * To make guest (which integrates scalable mode support patch set in
> >> - * iommu driver) work, just return true is enough so far.
> >> - */
> >> -case VTD_INV_DESC_PC:
> >> -break;
> >> -
> >> -case VTD_INV_DESC_PIOTLB:
> >> -break;
> >> -
> >>   case VTD_INV_DESC_WAIT:
> >>   trace_vtd_inv_desc("wait", inv_desc.hi, inv_desc.lo);
> >>   if (!vtd_process_wait_desc(s, &inv_desc)) {
> >> @@ -2795,6 +2784,17 @@ static bool
> >vtd_process_inv_desc(IntelIOMMUState *s)
> >>   }
> >>   break;
> >>
> >> +/*
> >> + * TODO: the entity of below two cases will be implemented in future
> >series.
> >> + * To make guest (which integrates scalable mode support patch set in
> >> + * iommu driver) work, just return true is enough so far.
> >> + */
> >> +case VTD_INV_DESC_PC:
> >> +case VTD_INV_DESC_PIOTLB:
> >> +if (s->scalable_mode) {
> >> +break;
> >> +}
> >> +/* fallthrough */
> >>   default:
> >>   error_report_once("%s: invalid inv desc: hi=%"PRIx64", 
> >> lo=%"PRIx64
> >> " (unknown type)", __func__, inv_desc.hi,
> >
> >--
> >Regards,
> >Yi Liu




Re: [PATCH v2 17/21] virtio-net: Use virtual time for RSC timers

2024-08-13 Thread Michael S. Tsirkin
On Tue, Aug 13, 2024 at 09:23:25PM +0100, Alex Bennée wrote:
> From: Nicholas Piggin 
> 
> Receive coalescing is visible to the target machine, so its timers
> should use virtual time like other timers in virtio-net, to be
> compatible with record-replay.
> 
> Signed-off-by: Nicholas Piggin 
> Message-Id: <20240813050638.446172-10-npig...@gmail.com>
> Signed-off-by: Alex Bennée 

Acked-by: Michael S. Tsirkin 

> ---
>  hw/net/virtio-net.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 10ebaae5e2..ed33a32877 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -2124,7 +2124,7 @@ static void virtio_net_rsc_purge(void *opq)
>  chain->stat.timer++;
>  if (!QTAILQ_EMPTY(&chain->buffers)) {
>  timer_mod(chain->drain_timer,
> -  qemu_clock_get_ns(QEMU_CLOCK_HOST) + chain->n->rsc_timeout);
> +  qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + chain->n->rsc_timeout);
>  }
>  }
>  
> @@ -2360,7 +2360,7 @@ static size_t 
> virtio_net_rsc_do_coalesce(VirtioNetRscChain *chain,
>  chain->stat.empty_cache++;
>  virtio_net_rsc_cache_buf(chain, nc, buf, size);
>  timer_mod(chain->drain_timer,
> -  qemu_clock_get_ns(QEMU_CLOCK_HOST) + chain->n->rsc_timeout);
> +  qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + chain->n->rsc_timeout);
>  return size;
>  }
>  
> @@ -2598,7 +2598,7 @@ static VirtioNetRscChain 
> *virtio_net_rsc_lookup_chain(VirtIONet *n,
>  chain->max_payload = VIRTIO_NET_MAX_IP6_PAYLOAD;
>  chain->gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
>  }
> -chain->drain_timer = timer_new_ns(QEMU_CLOCK_HOST,
> +chain->drain_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
>virtio_net_rsc_purge, chain);
>  memset(&chain->stat, 0, sizeof(chain->stat));
>  
> -- 
> 2.39.2




Re: [PATCH v2 16/21] virtio-net: Use replay_schedule_bh_event for bhs that affect machine state

2024-08-13 Thread Michael S. Tsirkin
On Tue, Aug 13, 2024 at 09:23:24PM +0100, Alex Bennée wrote:
> From: Nicholas Piggin 
> 
> The regular qemu_bh_schedule() calls result in non-deterministic
> execution of the bh in record-replay mode, which causes replay failure.
> 
> Reviewed-by: Alex Bennée 
> Reviewed-by: Pavel Dovgalyuk 
> Signed-off-by: Nicholas Piggin 
> Message-Id: <20240813050638.446172-9-npig...@gmail.com>
> Signed-off-by: Alex Bennée 
> ---
>  hw/net/virtio-net.c | 11 ++-
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 08aa0b65e3..10ebaae5e2 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -40,6 +40,7 @@
>  #include "migration/misc.h"
>  #include "standard-headers/linux/ethtool.h"
>  #include "sysemu/sysemu.h"
> +#include "sysemu/replay.h"
>  #include "trace.h"
>  #include "monitor/qdev.h"
>  #include "monitor/monitor.h"
> @@ -417,7 +418,7 @@ static void virtio_net_set_status(struct VirtIODevice 
> *vdev, uint8_t status)
>  timer_mod(q->tx_timer,
> qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + 
> n->tx_timeout);
>  } else {
> -qemu_bh_schedule(q->tx_bh);
> +replay_bh_schedule_event(q->tx_bh);
>  }
>  } else {
>  if (q->tx_timer) {
> @@ -2672,7 +2673,7 @@ static void virtio_net_tx_complete(NetClientState *nc, 
> ssize_t len)
>   */
>  virtio_queue_set_notification(q->tx_vq, 0);
>  if (q->tx_bh) {
> -qemu_bh_schedule(q->tx_bh);
> +replay_bh_schedule_event(q->tx_bh);
>  } else {
>  timer_mod(q->tx_timer,
>qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + n->tx_timeout);
> @@ -2838,7 +2839,7 @@ static void virtio_net_handle_tx_bh(VirtIODevice *vdev, 
> VirtQueue *vq)
>  return;
>  }
>  virtio_queue_set_notification(vq, 0);
> -qemu_bh_schedule(q->tx_bh);
> +replay_bh_schedule_event(q->tx_bh);
>  }
>  
>  static void virtio_net_tx_timer(void *opaque)
> @@ -2921,7 +2922,7 @@ static void virtio_net_tx_bh(void *opaque)
>  /* If we flush a full burst of packets, assume there are
>   * more coming and immediately reschedule */
>  if (ret >= n->tx_burst) {
> -qemu_bh_schedule(q->tx_bh);
> +replay_bh_schedule_event(q->tx_bh);
>  q->tx_waiting = 1;
>  return;
>  }
> @@ -2935,7 +2936,7 @@ static void virtio_net_tx_bh(void *opaque)
>  return;
>  } else if (ret > 0) {
>  virtio_queue_set_notification(q->tx_vq, 0);
> -qemu_bh_schedule(q->tx_bh);
> +replay_bh_schedule_event(q->tx_bh);
>  q->tx_waiting = 1;
>  }
>  }
> -- 
> 2.39.2


Is this really the only way to fix this? I do not think
virtio has any business knowing about replay.
What does this API do, even? BH but not broken with replay?
Do we ever want replay broken? Why not fix qemu_bh_schedule?
And when we add another feature which we do not want to break
will we do foo_bar_replay_bh_schedule_event or what?

-- 
MST




Re: [PATCH v4 00/17] Introduce support for IGVM files

2024-08-13 Thread Michael S. Tsirkin
On Tue, Aug 13, 2024 at 10:53:58AM +0100, Roy Hopkins wrote:
> On Sat, 2024-07-20 at 14:26 -0400, Michael S. Tsirkin wrote:
> > On Wed, Jul 03, 2024 at 12:05:38PM +0100, Roy Hopkins wrote:
> > > Here is v4 of the set of patches to add support for IGVM files to QEMU.
> > > This is based on commit 1a2d52c7fc of qemu.
> > > 
> > > This version addresses all of the review comments from v3 along with a
> > > couple of small bug fixes. This is a much smaller increment than in the
> > > previous version of the series [1]. Thanks once again to the reviewers
> > > that have been looking at this series. This v4 patch series is also
> > > available on github: [2]
> > > 
> > > The previous version had a build issue when building without debug
> > > enabled. Patch 8/17 has been added to fix this and I've updated my own
> > > process to test both debug and release builds of QEMU.
> > > 
> > > For testing IGVM support in QEMU you need to generate an IGVM file that is
> > > configured for the platform you want to launch. You can use the `buildigvm`
> > > test tool [3] to allow generation of IGVM files for all currently supported
> > > platforms. Patch 11/17 contains information on how to generate an IGVM file
> > > using this tool.
> > 
> > PC things:
> > 
> > Acked-by: Michael S. Tsirkin 
> > 
> > 
> 
> Hi Michael,
> 
> Thanks for this. Can I add your ack to all commits, or just the PC specific
> ones?
> 
> Regards,
> Roy


I reviewed the pc things and skimmed the rest. So reviewed-by
for pc things and Ack for the rest.

> > > Changes in v4:
> > > 
> > > * Remove unused '#ifdef CONFIG_IGVM' sections
> > > * Add "'if': 'CONFIG_IGVM'" for IgvmCfgProperties in qom.json
> > > * Use error_fatal instead of error_abort in suggested locations
> > > * Prevent addition of bios code when an IGVM file is provided and
> > > pci_enabled is false
> > > * Add patch 6/17 to fix error handling from sev_encrypt_flash()
> > > * Revert unrequired changes to return values in sev/*_launch_update()
> > > functions
> > > * Add documentation to igvm.rst to describe how to use 'buildigvm'
> > > * Various convention and code style changes as suggested in reviews
> > > * Fix handling of sev_features for kernels that do not support 
> > > KVM_SEV_INIT2
> > > * Move igvm-cfg from MachineState to X86MachineState
> > > 
> > > Patch summary:
> > > 
> > > 1-12: Add support and documentation for processing IGVM files for SEV,
> > > SEV-ES, SEV-SNP and native platforms.
> > > 
> > > 13-16: Processing of policy and SEV-SNP ID_BLOCK from IGVM file.
> > > 
> > > 17: Add pre-processing of IGVM file to support synchronization of
> > > 'SEV_FEATURES' from IGVM VMSA to KVM.
> > > 
> > > [1] Link to v3:
> > > https://lore.kernel.org/qemu-devel/cover.1718979106.git.roy.hopk...@suse.com/
> > > 
> > > [2] v4 patches also available here:
> > > https://github.com/roy-hopkins/qemu/tree/igvm_master_v4
> > > 
> > > [3] `buildigvm` tool v0.2.0
> > > https://github.com/roy-hopkins/buildigvm/releases/tag/v0.2.0
> > > 
> > > Roy Hopkins (17):
> > >   meson: Add optional dependency on IGVM library
> > >   backends/confidential-guest-support: Add functions to support IGVM
> > >   backends/igvm: Add IGVM loader and configuration
> > >   hw/i386: Add igvm-cfg object and processing for IGVM files
> > >   i386/pc_sysfw: Ensure sysfw flash configuration does not conflict with
> > >     IGVM
> > >   sev: Fix error handling in sev_encrypt_flash()
> > >   sev: Update launch_update_data functions to use Error handling
> > >   target/i386: Allow setting of R_LDTR and R_TR with
> > >     cpu_x86_load_seg_cache()
> > >   i386/sev: Refactor setting of reset vector and initial CPU state
> > >   i386/sev: Implement ConfidentialGuestSupport functions for SEV
> > >   docs/system: Add documentation on support for IGVM
> > >   docs/interop/firmware.json: Add igvm to FirmwareDevice
> > >   backends/confidential-guest-support: Add set_guest_policy() function
> > >   backends/igvm: Process initialization sections in IGVM file
> > >

Re: [PATCH v2] Update event idx if guest has made extra buffers during double check

2024-08-11 Thread Michael S. Tsirkin
On Mon, Jun 17, 2024 at 01:45:51PM +0800, thomas wrote:
> If guest has made some buffers available during double check,
> but the total buffer size available is lower than @bufsize,
> notify the guest with the latest available idx(event idx)
> seen by the host.
> 
> Fixes: 06b12970174 ("virtio-net: fix network stall under load")
> Signed-off-by: wencheng Yang 
> ---
>  hw/net/virtio-net.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 9c7e85caea..23c6c8c898 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -1654,6 +1654,7 @@ static int virtio_net_has_buffers(VirtIONetQueue *q, 
> int bufsize)
>  if (virtio_queue_empty(q->rx_vq) ||
>  (n->mergeable_rx_bufs &&
>   !virtqueue_avail_bytes(q->rx_vq, bufsize, 0))) {
> +virtio_queue_set_notification(q->rx_vq, 1);


This raises a lot of questions, but first of all, virtio_queue_set_notification
does not notify the guest; it enables guest notifications.
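
To spell out the distinction, a toy model follows. It is not QEMU code:
struct toy_vq and both functions are hypothetical stand-ins, and the real
virtqueue carries far more state. It only shows that flipping the
"notifications enabled" setting delivers nothing to the guest, while an
explicit notify does.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy virtqueue; NOT the QEMU VirtQueue structure. */
struct toy_vq {
    bool notification_enabled;  /* device wants to hear from the guest */
    unsigned irqs_sent;         /* interrupts actually sent to the guest */
};

/* Analogue of enabling or disabling notifications: only flips a setting. */
static void toy_set_notification(struct toy_vq *vq, bool enable)
{
    vq->notification_enabled = enable;
}

/* Analogue of notifying the guest: the only path that sends anything. */
static void toy_notify_guest(struct toy_vq *vq)
{
    vq->irqs_sent++;
}
```

So a patch whose stated goal is to tell the guest about the latest available
index would need something on the notify side, not just the enable call.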

>  return 0;
>  }
>  }
> -- 
> 2.39.0




Re: [PATCH v6] virtio-pci: Fix the use of an uninitialized irqfd

2024-08-11 Thread Michael S. Tsirkin
On Sun, Aug 11, 2024 at 11:06:09AM +0300, Michael Tokarev wrote:
> 07.08.2024 07:02, Jason Wang wrote:
> 
> > Acked-by: Jason Wang 
> 
> Jason, would you mind picking this up together with -net help bugfix
> and sending a pull request?
> 
> This particular change has been (re)tried multiple times by Cindy Lu
> already, and the bug is still not fixed and affects users.  Both
> this and -net help fix are a must for 9.1 and for stable series.
> 
> Thanks,
> 
> /mjt
> 
> -- 
> GPG Key transition (from rsa2048 to rsa4096) since 2024-04-24.
> New key: rsa4096/61AD3D98ECDF2C8E  9D8B E14E 3F2A 9DD7 9199  28F1 61AD 3D98 
> ECDF 2C8E
> Old key: rsa2048/457CE0A0804465C5  6EE1 95D1 886E 8FFB 810D  4324 457C E0A0 
> 8044 65C5
> Transition statement: http://www.corpit.ru/mjt/gpg-transition-2024.txt

I have it tagged, just didn't do a pull with it yet. Should make 9.1
with no issues.




Re: [PATCH v2 4/4] virtio-net: Add support for USO features

2024-08-11 Thread Michael S. Tsirkin
On Thu, Aug 08, 2024 at 11:25:29AM -0400, Peter Xu wrote:
> On Thu, Aug 08, 2024 at 10:47:28AM -0400, Michael S. Tsirkin wrote:
> > On Thu, Aug 08, 2024 at 10:15:36AM -0400, Peter Xu wrote:
> > > On Thu, Aug 08, 2024 at 07:12:14AM -0400, Michael S. Tsirkin wrote:
> > > > This is too big of a hammer. People already use what you call "cross
> > > > migrate" and have for years. We are not going to stop developing
> > > > features just because someone suddenly became aware of some such bit.
> > > > If you care, you will have to work to solve the problem properly -
> > > > nacking half baked hacks is the only tool maintainers have to make
> > > > people work on hard problems.
> > > 
> > > IMHO this is totally different thing.  It's not about proposing a new
> > > feature yet so far, it's about how we should fix a breakage first.
> > > 
> > > And that's why I think we should fix it even in the simple way first, then
> > > we consider anything more benefitial from perf side without breaking
> > > anything, which should be on top of that.
> > > 
> > > Thanks,
> > 
> > As I said, once the quick hack is merged people stop caring.
> 
> IMHO it's not a hack. It's a proper fix to me to disable it by default for
> now.
> 
> OTOH, having it ON always even knowing it can break migration is a hack to
> me, when we don't have anything else to guard the migration.

It's a hack in the sense that it's specific to this option.
But hack or not, it's the only way I have to make people work on
a full solution.

> > Mixing different kernel versions in migration is esoteric enough for
> > this not to matter to most people. There's no rush I think, address
> > it properly.
> 
> Exactly mixing kernel versions will be tricky to users to identify, but
> that's, AFAICT, exactly happening everywhere.  We can't urge user to always
> use the exact same kernels when we're talking about a VM cluster.  That's
> why I think allowing migration to work across those kernels matter.
> 
> I will agree there's no rush iff RHEL9 kernel won't backport TAP at all,
> otherwise this will trigger between y-stream after people upgrades partial
> of the clusters.
> 
> Thanks,
> 
> -- 
> Peter Xu




Re: [PATCH v2 4/4] virtio-net: Add support for USO features

2024-08-08 Thread Michael S. Tsirkin
On Thu, Aug 08, 2024 at 10:15:36AM -0400, Peter Xu wrote:
> On Thu, Aug 08, 2024 at 07:12:14AM -0400, Michael S. Tsirkin wrote:
> > This is too big of a hammer. People already use what you call "cross
> > migrate" and have for years. We are not going to stop developing
> > features just because someone suddenly became aware of some such bit.
> > If you care, you will have to work to solve the problem properly -
> > nacking half baked hacks is the only tool maintainers have to make
> > people work on hard problems.
> 
> IMHO this is totally different thing.  It's not about proposing a new
> feature yet so far, it's about how we should fix a breakage first.
> 
> And that's why I think we should fix it even in the simple way first, then
> we consider anything more benefitial from perf side without breaking
> anything, which should be on top of that.
> 
> Thanks,

As I said, once the quick hack is merged people stop caring.
Mixing different kernel versions in migration is esoteric enough for
this not to matter to most people. There's no rush I think, address
it properly.

-- 
MST




Re: [PATCH v2 4/4] virtio-net: Add support for USO features

2024-08-08 Thread Michael S. Tsirkin
On Thu, Aug 08, 2024 at 09:55:49AM -0400, Peter Xu wrote:
> On Thu, Aug 08, 2024 at 08:43:22PM +0900, Akihiko Odaki wrote:
> > On 2024/08/07 5:41, Peter Xu wrote:
> > > On Mon, Aug 05, 2024 at 04:27:43PM +0900, Akihiko Odaki wrote:
> > > > On 2024/08/04 22:08, Peter Xu wrote:
> > > > > On Sun, Aug 04, 2024 at 03:49:45PM +0900, Akihiko Odaki wrote:
> > > > > > On 2024/08/03 1:26, Peter Xu wrote:
> > > > > > > On Sat, Aug 03, 2024 at 12:54:51AM +0900, Akihiko Odaki wrote:
> > > > > > > > > > > I'm not sure if I read it right.  Perhaps you meant 
> > > > > > > > > > > something more generic
> > > > > > > > > > > than -platform but similar?
> > > > > > > > > > > 
> > > > > > > > > > > For example, "-profile [PROFILE]" qemu cmdline, where 
> > > > > > > > > > > PROFILE can be either
> > > > > > > > > > > "perf" or "compat", while by default to "compat"?
> > > > > > > > > > 
> > > > > > > > > > "perf" would cover 4) and "compat" will cover 1). However 
> > > > > > > > > > neither of them
> > > > > > > > > > will cover 2) because an enum is not enough to know about 
> > > > > > > > > > all hosts. I
> > > > > > > > > > presented a design that will cover 2) in:
> > > > > > > > > > https://lore.kernel.org/r/2da4ebcd-2058-49c3-a4ec-8e60536e5...@daynix.com
> > > > > > > > > 
> > > > > > > > > "-merge-platform" shouldn't be a QEMU parameter, but should 
> > > > > > > > > be something
> > > > > > > > > separate.
> > > > > > > > 
> > > > > > > > Do you mean merging platform dumps should be done with another 
> > > > > > > > command? I
> > > > > > > > think we will want to know the QOM tree is in use when 
> > > > > > > > implementing
> > > > > > > > -merge-platform. For example, you cannot define a "platform" 
> > > > > > > > when e.g., you
> > > > > > > > don't know what netdev backend (e.g., user, vhost-net, 
> > > > > > > > vhost-vdpa) is
> > > > > > > > connected to virtio-net devices. Of course we can include those 
> > > > > > > > information
> > > > > > > > in dumps, but we don't do so for VMState.
> > > > > > > 
> > > > > > > What I was thinking is the generated platform dump shouldn't care 
> > > > > > > about
> > > > > > > what is used as backend: it should try to probe whatever is 
> > > > > > > specified in
> > > > > > > the qemu cmdline, and it's the user's job to make sure the exact 
> > > > > > > same qemu
> > > > > > > cmdline is used in other hosts to dump this information.
> > > > > > > 
> > > > > > > IOW, the dump will only contain the information that was based on 
> > > > > > > the qemu
> > > > > > > cmdline.  E.g., if it doesn't include virtio device at all, and 
> > > > > > > if we only
> > > > > > > support such dump for virtio, it should dump nothing.
> > > > > > > 
> > > > > > > Then the -merge-platform will expect all dumps to look the same 
> > > > > > > too,
> > > > > > > merging them with AND on each field.
> > > > > > 
> > > > > > I think we will still need the QOM tree in that case. I think the 
> > > > > > platform
> > > > > > information will look somewhat similar to VMState, which requires 
> > > > > > the QOM
> > > > > > tree to interpret.
> > > > > 
> > > > > Ah yes, I assume you meant when multiple devices can report different 
> > > > > thing
> > > > > even if with the same frontend / device type.  QOM should work, or 
> > > > > anything
> > > > > that can identify a device, e.g. with id / instance_id attached along 
> > > > > with
> > > > > the device class.
> > > > > 
> > > > > One thing that I still don't know how it works is how it interacts 
> > > > > with new
> > > > > hosts being added.
> > > > > 
> > > > > This idea is based on the fact that the cluster is known before 
> > > > > starting
> > > > > any VM.  However in reality I think it can happen when VMs started 
> > > > > with a
> > > > > small cluster but then cluster extended, when the -merge-platform has 
> > > > > been
> > > > > done on the smaller set.
> > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > Said that, I actually am still not clear on how / whether it 
> > > > > > > should work at
> > > > > > > last.  At least my previous concern (1) didn't has a good answer 
> > > > > > > yet, on
> > > > > > > what we do when profile collisions with qemu cmdlines.  So far I 
> > > > > > > actually
> > > > > > > still think it more straightforward that in migration we 
> > > > > > > handshake on these
> > > > > > > capabilities if possible.
> > > > > > > 
> > > > > > > And that's why I was thinking (where I totally agree with you on 
> > > > > > > this) that
> > > > > > > whether we should settle a short term plan first to be on the 
> > > > > > > safe side
> > > > > > > that we start with migration always being compatible, then we 
> > > > > > > figure the
> > > > > > > other approach.  That seems easier to me, and it's also a matter 
> > > > > > > of whether
> > > > > > > we want to do something for 9.1, or leaving that for 9.2 for USO*.
> > > > > > 
> > > > > > I suggest disabling all offload features of virtio-net wi

Re: [PATCH] arm/virt: place power button pin number on a define

2024-08-08 Thread Michael S. Tsirkin
On Thu, Aug 08, 2024 at 02:54:52PM +0200, Mauro Carvalho Chehab wrote:
> Having magic numbers inside the code is not a good idea, as it
> is error-prone. So, instead, create a macro with the number
> definition.
> 
> Link: 
> https://lore.kernel.org/qemu-devel/CAFEAcA-PYnZ-32MRX+PgvzhnoAV80zBKMYg61j2f=ohagfw...@mail.gmail.com/
> 
> Signed-off-by: Mauro Carvalho Chehab 
> Suggested-by: Peter Maydell 
> Reviewed-by: Jonathan Cameron 
> Reviewed-by: Igor Mammedov 

ack, but note we do things like that only if something
is repeated.

> ---
>  hw/arm/virt-acpi-build.c | 6 +++---
>  hw/arm/virt.c| 7 ---
>  include/hw/arm/virt.h| 3 +++
>  3 files changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index e10cad86dd73..f76fb117adff 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -154,10 +154,10 @@ static void acpi_dsdt_add_gpio(Aml *scope, const 
> MemMapEntry *gpio_memmap,
>  aml_append(dev, aml_name_decl("_CRS", crs));
>  
>  Aml *aei = aml_resource_template();
> -/* Pin 3 for power button */
> -const uint32_t pin_list[1] = {3};
> +
> +const uint32_t pin = GPIO_PIN_POWER_BUTTON;
>  aml_append(aei, aml_gpio_int(AML_CONSUMER, AML_EDGE, AML_ACTIVE_HIGH,
> - AML_EXCLUSIVE, AML_PULL_UP, 0, pin_list, 1,
> + AML_EXCLUSIVE, AML_PULL_UP, 0, &pin, 1,
>   "GPO0", NULL, 0));
>  aml_append(dev, aml_name_decl("_AEI", aei));
>  
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 719e83e6a1e7..687fe0bb8bc9 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -1004,7 +1004,7 @@ static void virt_powerdown_req(Notifier *n, void 
> *opaque)
>  if (s->acpi_dev) {
>  acpi_send_event(s->acpi_dev, ACPI_POWER_DOWN_STATUS);
>  } else {
> -/* use gpio Pin 3 for power button event */
> +/* use gpio Pin for power button event */
>  qemu_set_irq(qdev_get_gpio_in(gpio_key_dev, 0), 1);
>  }
>  }
> @@ -1013,7 +1013,8 @@ static void create_gpio_keys(char *fdt, DeviceState *pl061_dev,
>   uint32_t phandle)
>  {
>  gpio_key_dev = sysbus_create_simple("gpio-key", -1,
> -qdev_get_gpio_in(pl061_dev, 3));
> +qdev_get_gpio_in(pl061_dev,
> + GPIO_PIN_POWER_BUTTON));
>  
>  qemu_fdt_add_subnode(fdt, "/gpio-keys");
>  qemu_fdt_setprop_string(fdt, "/gpio-keys", "compatible", "gpio-keys");
> @@ -1024,7 +1025,7 @@ static void create_gpio_keys(char *fdt, DeviceState *pl061_dev,
>  qemu_fdt_setprop_cell(fdt, "/gpio-keys/poweroff", "linux,code",
>KEY_POWER);
>  qemu_fdt_setprop_cells(fdt, "/gpio-keys/poweroff",
> -   "gpios", phandle, 3, 0);
> +   "gpios", phandle, GPIO_PIN_POWER_BUTTON, 0);
>  }
>  
>  #define SECURE_GPIO_POWEROFF 0
> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
> index ab961bb6a9b8..a4d937ed45ac 100644
> --- a/include/hw/arm/virt.h
> +++ b/include/hw/arm/virt.h
> @@ -47,6 +47,9 @@
>  /* See Linux kernel arch/arm64/include/asm/pvclock-abi.h */
>  #define PVTIME_SIZE_PER_CPU 64
>  
> +/* GPIO pins */
> +#define GPIO_PIN_POWER_BUTTON  3
> +
>  enum {
>  VIRT_FLASH,
>  VIRT_MEM,
> -- 
> 2.45.2




Re: [PATCH v2 4/4] virtio-net: Add support for USO features

2024-08-08 Thread Michael S. Tsirkin
On Thu, Aug 08, 2024 at 08:03:25PM +0900, Akihiko Odaki wrote:
> On 2024/08/08 19:54, Michael S. Tsirkin wrote:
> > On Thu, Aug 08, 2024 at 07:52:37PM +0900, Akihiko Odaki wrote:
> > > On 2024/08/06 22:29, Michael S. Tsirkin wrote:
> > > > On Tue, Aug 06, 2024 at 04:35:44PM +0900, Akihiko Odaki wrote:
> > > > > On 2024/08/05 19:08, Michael S. Tsirkin wrote:
> > > > > > On Mon, Aug 05, 2024 at 06:37:58PM +0900, Akihiko Odaki wrote:
> > > > > > > If cross-migrate=off, QEMU can still migrate on the same host
> > > > > > > (checkpoint and restart). QEMU can also migrate across hosts if the
> > > > > > > user ensures they are on the same platform.
> > > > > > 
> > > > > > What is so special about checkpoint/restart? I guess we hope that
> > > > > > downgrades are uncommon, but they are possible...
> > > > > 
> > > > > > Downgrades will not work with cross-migrate=off. Users who want
> > > > > > downgrades should use cross-migrate=on.
> > > > 
> > > > We also don't know that upgrades do not disable a feature:
> > > > can happen if e.g. there's a serious bug in the feature.
> > > > Basically, this makes the feature too fragile, in my opinion.
> > > 
> > > > We can do nothing in such a case. Whether it is on a single host or
> > > > multiple hosts, we cannot support migration if features once enabled disappear.
> > > 
> > > Regards,
> > > Akihiko Odaki
> > 
> > It does not follow from "we have to do something" and "this is
> > something" that we have to do this.
> > 
> > This is just a reason not to handle checkpoint/restart any differently
> > from any other migration.
> 
> Whether it is checkpoint/restart or any other migration, I expect platform
> features won't disappear from the host(s); we can't readily support
> migration in such a situation.


We can if we mask the features from the guest before starting the VM.

Or, if we didn't, we can fail the migration gracefully.

> When platform features won't disappear, for checkpoint/restart, we can
> enable all available features without disrupting migration;
> cross-migrate=off will instruct that.
> 
> However, if we are migrating a VM across hosts and the user doesn't ensure
> they are on the same platform, we cannot enable platform features even if we
> are sure that platform features already present on a host won't disappear
> because some hosts may not have features in the first place. We can set
> cross-migrate=on in such a case to disable optional platform features.
> 
> Regards,
> Akihiko Odaki


This is too big of a hammer. People already use what you call "cross
migrate" and have for years. We are not going to stop developing
features just because someone suddenly became aware of some such bit.
If you care, you will have to work to solve the problem properly -
nacking half-baked hacks is the only tool maintainers have to make
people work on hard problems.


-- 
MST




Re: [PATCH v2 4/4] virtio-net: Add support for USO features

2024-08-08 Thread Michael S. Tsirkin
On Thu, Aug 08, 2024 at 07:52:37PM +0900, Akihiko Odaki wrote:
> On 2024/08/06 22:29, Michael S. Tsirkin wrote:
> > On Tue, Aug 06, 2024 at 04:35:44PM +0900, Akihiko Odaki wrote:
> > > On 2024/08/05 19:08, Michael S. Tsirkin wrote:
> > > > On Mon, Aug 05, 2024 at 06:37:58PM +0900, Akihiko Odaki wrote:
> > > > > If cross-migrate=off, QEMU can still migrate on the same host
> > > > > (checkpoint and restart). QEMU can also migrate across hosts if the
> > > > > user ensures they are on the same platform.
> > > > 
> > > > What is so special about checkpoint/restart? I guess we hope that
> > > > downgrades are uncommon, but they are possible...
> > > 
> > > Downgrades will not work with cross-migrate=off. Users who want downgrades
> > > should use cross-migrate=on.
> > 
> > We also don't know that upgrades do not disable a feature:
> > can happen if e.g. there's a serious bug in the feature.
> > Basically, this makes the feature too fragile, in my opinion.
> 
> We can do nothing in such a case. Whether it is on a single host or multiple
> hosts, we cannot support migration if features once enabled disappear.
> 
> Regards,
> Akihiko Odaki

It does not follow from "we have to do something" and "this is
something" that we have to do this.

This is just a reason not to handle checkpoint/restart any differently
from any other migration.

-- 
MST




Re: [PATCH 1/3] virtio_net: Add the check for vdpa's mac address

2024-08-06 Thread Michael S. Tsirkin
On Tue, Aug 06, 2024 at 08:58:01AM +0800, Cindy Lu wrote:
> When using a VDPA device, it is important to ensure that
> the MAC address in the hardware matches the MAC address
> from the QEMU command line.
> This will allow the device to boot.
> 
> Signed-off-by: Cindy Lu 

Always post threads with a cover letter please.

> ---
>  hw/net/virtio-net.c | 33 +++++++++++++++++++++++++++++----
>  1 file changed, 29 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 9c7e85caea..7f51bd0dd3 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -3579,12 +3579,36 @@ static bool failover_hide_primary_device(DeviceListener *listener,
>  /* failover_primary_hidden is set during feature negotiation */
>  return qatomic_read(&n->failover_primary_hidden);
>  }
> +static bool virtio_net_check_vdpa_mac(NetClientState *nc, VirtIONet *n, MACAddr *cmdline_mac,
> +   Error **errp)
> +{
> + struct virtio_net_config hwcfg = {};
> + static const MACAddr zero = { .a = { 0, 0, 0, 0, 0, 0 } };
> +
> + vhost_net_get_config(get_vhost_net(nc->peer), (uint8_t *)&hwcfg, ETH_ALEN);
> +
> +/*
> + * For VDPA device: Only two situations are acceptable:
> + * 1.The hardware MAC address is the same as the QEMU command line MAC
> + *   address, and both of them are not 0.
> + */
> +
> + if (memcmp(&hwcfg.mac, &zero, sizeof(MACAddr)) != 0) {
> + if ((memcmp(&hwcfg.mac, cmdline_mac, sizeof(MACAddr)) == 0)) {
> + return true;
> + }
> + }
> + error_setg(errp, "vDPA device's mac != the mac address from qemu cmdline"
> +  "Please check the the vdpa device's setting.");
>  
> + return false;
> +}
>  static void virtio_net_device_realize(DeviceState *dev, Error **errp)
>  {
>  VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>  VirtIONet *n = VIRTIO_NET(dev);
>  NetClientState *nc;
> +MACAddr macaddr_cmdline;
>  int i;
>  
>  if (n->net_conf.mtu) {
> @@ -3692,6 +3716,7 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
>  virtio_net_add_queue(n, 0);
>  
>  n->ctrl_vq = virtio_add_queue(vdev, 64, virtio_net_handle_ctrl);
> +memcpy(&macaddr_cmdline, &n->nic_conf.macaddr, sizeof(n->mac));
>  qemu_macaddr_default_if_unset(&n->nic_conf.macaddr);
>  memcpy(&n->mac[0], &n->nic_conf.macaddr, sizeof(n->mac));
>  n->status = VIRTIO_NET_S_LINK_UP;
> @@ -3739,10 +3764,10 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
>  nc->rxfilter_notify_enabled = 1;
>  
> if (nc->peer && nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
> -struct virtio_net_config netcfg = {};
> -memcpy(&netcfg.mac, &n->nic_conf.macaddr, ETH_ALEN);
> -vhost_net_set_config(get_vhost_net(nc->peer),
> -(uint8_t *)&netcfg, 0, ETH_ALEN, VHOST_SET_CONFIG_TYPE_FRONTEND);
> +if (!virtio_net_check_vdpa_mac(nc, n, &macaddr_cmdline, errp)) {
> +virtio_cleanup(vdev);
> +return;
> +}
>  }
>  QTAILQ_INIT(&n->rsc_chains);
>  n->qdev = dev;
> -- 
> 2.45.0
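
For reference, the acceptance rule in the quoted patch boils down to the standalone sketch below. This is an illustration only, not QEMU code: `mac_addr` and `vdpa_mac_matches()` are hypothetical stand-ins for `MACAddr` and `virtio_net_check_vdpa_mac()`, and the config read from the device is assumed to have already happened.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical stand-in for QEMU's MACAddr. */
typedef struct {
    uint8_t a[MAC_LEN];
} mac_addr;

/*
 * Accept only when the hardware MAC is non-zero and identical to the MAC
 * given on the command line. Every other combination is rejected,
 * including the case where both addresses are all zeroes.
 */
bool vdpa_mac_matches(const mac_addr *hw, const mac_addr *cmdline)
{
    static const mac_addr zero = { .a = { 0 } };

    assert(hw != NULL && cmdline != NULL);

    if (memcmp(hw, &zero, sizeof(zero)) == 0) {
        return false; /* device reports no MAC at all */
    }
    return memcmp(hw, cmdline, sizeof(*hw)) == 0;
}
```

Note that an unset command-line MAC compares as all zeroes here, so it can never match a valid hardware address under this rule.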




Re: [PATCH v2 4/4] virtio-net: Add support for USO features

2024-08-06 Thread Michael S. Tsirkin
On Tue, Aug 06, 2024 at 04:35:44PM +0900, Akihiko Odaki wrote:
> On 2024/08/05 19:08, Michael S. Tsirkin wrote:
> > On Mon, Aug 05, 2024 at 06:37:58PM +0900, Akihiko Odaki wrote:
> > > If cross-migrate=off, QEMU can still migrate on the same host (checkpoint
> > > and restart). QEMU can also migrate across hosts if the user ensures they
> > > are on the same platform.
> > 
> > What is so special about checkpoint/restart? I guess we hope that
> > downgrades are uncommon, but they are possible...
> 
> Downgrades will not work with cross-migrate=off. Users who want downgrades
> should use cross-migrate=on.

We also don't know that upgrades do not disable a feature:
can happen if e.g. there's a serious bug in the feature.
Basically, this makes the feature too fragile, in my opinion.

-- 
MST



