Re: [PATCH] MAINTAINERS: Update my email address
On 29/04/2024 16:49, Anthony PERARD wrote: From: Anthony PERARD Signed-off-by: Anthony PERARD --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Acked-by: Paul Durrant
Re: [PATCH] xen-hvm: Avoid livelock while handling buffered ioreqs
On 04/04/2024 15:08, Ross Lagerwall wrote: A malicious or buggy guest may generated buffered ioreqs faster than QEMU can process them in handle_buffered_iopage(). The result is a livelock - QEMU continuously processes ioreqs on the main thread without iterating through the main loop which prevents handling other events, processing timers, etc. Without QEMU handling other events, it often results in the guest becoming unsable and makes it difficult to stop the source of buffered ioreqs. To avoid this, if we process a full page of buffered ioreqs, stop and reschedule an immediate timer to continue processing them. This lets QEMU go back to the main loop and catch up. Do PV backends potentially cause the same scheduling issue (if not using io threads)? Signed-off-by: Ross Lagerwall --- hw/xen/xen-hvm-common.c | 26 +- 1 file changed, 17 insertions(+), 9 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH v3 13/28] hw/xen: automatically assign device index to block devices
On 27/10/2023 11:25, David Woodhouse wrote: On Fri, 2023-10-27 at 10:01 +0100, Durrant, Paul wrote: This code is allocating a name automatically so I think the onus is on it not create a needless clash which is likely to have unpredictable results depending on what the guest is. Just avoid any aliasing in the first place and things will be fine. Yeah, fair enough. In which case I'll probably switch to using xs_directory() and then processing those results to find a free slot, instead of going out to XenStore for every existence check. This isn't exactly fast path and I'm prepared to tolerate a little bit of O(n²), but only within reason :) Yes, doing an xs_directory() and then using the code xen_block_get_vdev() to populate a list of existent disks will be neater.
Re: [PATCH v3 27/28] hw/xen: use qemu_create_nic_bus_devices() to instantiate Xen NICs
On 25/10/2023 15:50, David Woodhouse wrote: From: David Woodhouse When instantiating XenBus itself, for each NIC which is configured with either the model unspecified, or set to to "xen" or "xen-net-device", create a corresponding xen-net-device for it. Now we can launch emulated Xen guests with '-nic user', and this fixes the setup for Xen PV guests, which was previously broken in various ways and never actually managed to peer with the netdev. Signed-off-by: David Woodhouse --- hw/xen/xen-bus.c| 4 hw/xen/xen_devconfig.c | 25 - hw/xenpv/xen_machine_pv.c | 9 - include/hw/xen/xen-legacy-backend.h | 1 - 4 files changed, 4 insertions(+), 35 deletions(-) Yay! I've been wanting this for years but ETIME. Reviewed-by: Paul Durrant
Re: [PATCH v3 26/28] hw/i386/pc: use qemu_get_nic_info() and pci_init_nic_devices()
On 25/10/2023 15:50, David Woodhouse wrote: From: David Woodhouse Eliminate direct access to nd_table[] and nb_nics by processing the the ISA NICs first and then calling pci_init_nic_devices() for the test. It's important to do this *before* the subsequent patch which registers the Xen PV network devices, because the code being remove here didn't check whether nd->instantiated was already set before using each entry. Signed-off-by: David Woodhouse --- hw/i386/pc.c| 21 +++-- include/hw/net/ne2000-isa.h | 2 -- 2 files changed, 11 insertions(+), 12 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH v3 25/28] hw/pci: add pci_init_nic_devices(), pci_init_nic_in_slot()
On 25/10/2023 15:50, David Woodhouse wrote: From: David Woodhouse The loop over nd_table[] to add PCI NICs is repeated in quite a few places. Add a helper function to do it. Some platforms also try to instantiate a specific model in a specific slot, to match the real hardware. Add pci_init_nic_in_slot() for that purpose. Signed-off-by: David Woodhouse --- hw/pci/pci.c | 45 include/hw/pci/pci.h | 4 +++- 2 files changed, 48 insertions(+), 1 deletion(-) Reviewed-by: Paul Durrant
Re: [PATCH v3 24/28] net: add qemu_create_nic_bus_devices()
On 25/10/2023 15:50, David Woodhouse wrote: From: David Woodhouse This will instantiate any NICs which live on a given bus type. Each bus is allowed *one* substitution (for PCI it's virtio → virtio-net-pci, for Xen it's xen → xen-net-device; no point in overengineering it unless we actually want more). Signed-off-by: David Woodhouse --- include/net/net.h | 3 +++ net/net.c | 53 +++ 2 files changed, 56 insertions(+) Reviewed-by: Paul Durrant
Re: [PATCH v3 23/28] net: report list of available models according to platform
On 25/10/2023 15:50, David Woodhouse wrote: From: David Woodhouse By noting the models for which a configuration was requested, we can give the user an accurate list of which NIC models were actually available on the platform/configuration that was otherwise chosen. Signed-off-by: David Woodhouse --- net/net.c | 94 +++ 1 file changed, 94 insertions(+) Reviewed-by: Paul Durrant
Re: [PATCH v3 22/28] net: add qemu_{configure,create}_nic_device(), qemu_find_nic_info()
On 25/10/2023 15:50, David Woodhouse wrote: From: David Woodhouse Most code which directly accesses nd_table[] and nb_nics uses them for one of two things. Either "I have created a NIC device and I'd like a configuration for it", or "I will create a NIC device *if* there is a configuration for it". With some variants on the theme around whether they actually *check* if the model specified in the configuration is the right one. Provide functions which perform both of those, allowing platforms to be a little more consistent and as a step towards making nd_table[] and nb_nics private to the net code. Also export the qemu_find_nic_info() helper, as some platforms have special cases they need to handle. Signed-off-by: David Woodhouse --- include/net/net.h | 7 ++- net/net.c | 51 +++ 2 files changed, 57 insertions(+), 1 deletion(-) Reviewed-by: Paul Durrant
Re: [PATCH v3 21/28] xen-platform: unplug AHCI disks
On 25/10/2023 15:50, David Woodhouse wrote: From: David Woodhouse To support Xen guests using the Q35 chipset, the unplug protocol needs to also remove AHCI disks. Make pci_xen_ide_unplug() more generic, iterating over the children of the PCI device and destroying the "ide-hd" devices. That works the same for both AHCI and IDE, as does the detection of the primary disk as unit 0 on the bus named "ide.0". Then pci_xen_ide_unplug() can be used for both AHCI and IDE devices. Signed-off-by: David Woodhouse --- hw/i386/xen/xen_platform.c | 68 +- 1 file changed, 45 insertions(+), 23 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH v3 13/28] hw/xen: automatically assign device index to block devices
On 27/10/2023 09:45, David Woodhouse wrote: On Fri, 2023-10-27 at 08:30 +0100, Durrant, Paul wrote: + if (blockdev->props.vdev.type == XEN_BLOCK_VDEV_TYPE_INVALID) { + XenBus *xenbus = XEN_BUS(qdev_get_parent_bus(DEVICE(xendev))); + char fe_path[XENSTORE_ABS_PATH_MAX + 1]; + char *value; + int disk = 0; + unsigned long idx; + + /* Find an unoccupied device name */ Not sure this is going to work is it? What happens if 'hda' or 'sda', or 'd0' exists? I think you need to use the core of the code in xen_block_set_vdev() to generate names and search all possible encodings for each disk. Do we care? You're allowed to have *all* of "hda", "sda" and "xvda" at the same time. If a user explicitly provides "sda" and then provides another disk without giving it a name, we're allowed to use "xvda". Maybe sda and xvda can co-exist, but https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/man/xen-vbd-interface.7.pandoc;h=ba0d159dfa7eaf359922583ccd6d2b413acddb13;hb=HEAD#l125 says that you'll likely run into trouble if hda exists and you happen to create xvda. Hell, you can also have *separate* backing stores provided as "hda1", "sda1" and "xvda1". I *might* have tolerated a heckle that this function should check for at least the latter of those, but when I was first coding it up I was more inclined to argue "Don't Do That Then". This code is allocating a name automatically so I think the onus is on it not create a needless clash which is likely to have unpredictable results depending on what the guest is. Just avoid any aliasing in the first place and things will be fine. Paul
Re: [PATCH v3 20/28] net: do not delete nics in net_cleanup()
On 25/10/2023 15:50, David Woodhouse wrote: From: David Woodhouse In net_cleanup() we only need to delete the netdevs, as those may have state which outlives Qemu when it exits, and thus may actually need to be cleaned up on exit. The nics, on the other hand, are owned by the device which created them. Most devices don't bother to clean up on exit because they don't have any state which will outlive Qemu... but XenBus devices do need to clean up their nodes in XenStore, and do have an exit handler to delete them. When the XenBus exit handler destroys the xen-net-device, it attempts to delete its nic after net_cleanup() had already done so. And crashes. Fix this by only deleting netdevs as we walk the list. As the comment notes, we can't use QTAILQ_FOREACH_SAFE() as each deletion may remove *multiple* entries, including the "safely" saved 'next' pointer. But we can store the *previous* entry, since nics are safe. Signed-off-by: David Woodhouse --- net/net.c | 28 ++-- 1 file changed, 22 insertions(+), 6 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH v3 19/28] hw/xen: update Xen PV NIC to XenDevice model
On 25/10/2023 15:50, David Woodhouse wrote: From: David Woodhouse This allows us to use Xen PV networking with emulated Xen guests, and to add them on the command line or hotplug. Signed-off-by: David Woodhouse --- hw/net/meson.build| 2 +- hw/net/trace-events | 11 + hw/net/xen_nic.c | 484 +- hw/xenpv/xen_machine_pv.c | 1 - 4 files changed, 381 insertions(+), 117 deletions(-) diff --git a/hw/net/meson.build b/hw/net/meson.build index 2632634df3..f64651c467 100644 --- a/hw/net/meson.build +++ b/hw/net/meson.build @@ -1,5 +1,5 @@ system_ss.add(when: 'CONFIG_DP8393X', if_true: files('dp8393x.c')) -system_ss.add(when: 'CONFIG_XEN', if_true: files('xen_nic.c')) +system_ss.add(when: 'CONFIG_XEN_BUS', if_true: files('xen_nic.c')) system_ss.add(when: 'CONFIG_NE2000_COMMON', if_true: files('ne2000.c')) # PCI network cards diff --git a/hw/net/trace-events b/hw/net/trace-events index 3abfd65e5b..3097742cc0 100644 --- a/hw/net/trace-events +++ b/hw/net/trace-events @@ -482,3 +482,14 @@ dp8393x_receive_oversize(int size) "oversize packet, pkt_size is %d" dp8393x_receive_not_netcard(void) "packet not for netcard" dp8393x_receive_packet(int crba) "Receive packet at 0x%"PRIx32 dp8393x_receive_write_status(int crba) "Write status at 0x%"PRIx32 + +# xen_nic.c +xen_netdev_realize(int dev, const char *info, const char *peer) "vif%u info '%s' peer '%s'" +xen_netdev_unrealize(int dev) "vif%u" +xen_netdev_create(int dev) "vif%u" +xen_netdev_destroy(int dev) "vif%u" +xen_netdev_disconnect(int dev) "vif%u" +xen_netdev_connect(int dev, unsigned int tx, unsigned int rx, int port) "vif%u tx %u rx %u port %u" +xen_netdev_frontend_changed(const char *dev, int state) "vif%s state %d" +xen_netdev_tx(int dev, int ref, int off, int len, unsigned int flags, const char *c, const char *d, const char *m, const char *e) "vif%u ref %u off %u len %u flags 0x%x%s%s%s%s" +xen_netdev_rx(int dev, int idx, int status, int flags) "vif%u idx %d status %d flags 0x%x" diff --git a/hw/net/xen_nic.c b/hw/net/xen_nic.c index 9bbf6599fc..af4ba3f1e6 100644 --- a/hw/net/xen_nic.c +++ b/hw/net/xen_nic.c @@ -20,6 +20,13 @@ */ #include "qemu/osdep.h" +#include "qemu/main-loop.h" +#include "qemu/cutils.h" +#include "qemu/log.h" +#include "qemu/qemu-print.h" +#include "qapi/qmp/qdict.h" +#include "qapi/error.h" + #include #include #include @@ -27,18 +34,26 @@ #include "net/net.h" #include "net/checksum.h" #include "net/util.h" -#include "hw/xen/xen-legacy-backend.h" + +#include "hw/xen/xen-backend.h" +#include "hw/xen/xen-bus-helper.h" +#include "hw/qdev-properties.h" +#include "hw/qdev-properties-system.h" #include "hw/xen/interface/io/netif.h" +#include "hw/xen/interface/io/xs_wire.h" + +#include "trace.h" /* - */ struct XenNetDev { -struct XenLegacyDevice xendev; /* must be first */ -char *mac; +struct XenDevice xendev; /* must be first */ +XenEventChannel *event_channel; +int dev; int tx_work; -int tx_ring_ref; -int rx_ring_ref; +unsigned int tx_ring_ref; +unsigned int rx_ring_ref; struct netif_tx_sring *txs; struct netif_rx_sring *rxs; netif_tx_back_ring_t tx_ring; @@ -47,6 +62,11 @@ struct XenNetDev { NICState *nic; }; +typedef struct XenNetDev XenNetDev; + +#define TYPE_XEN_NET_DEVICE "xen-net-device" +OBJECT_DECLARE_SIMPLE_TYPE(XenNetDev, XEN_NET_DEVICE) + /* - */ static void net_tx_response(struct XenNetDev *netdev, netif_tx_request_t *txp, int8_t st) @@ -68,7 +88,8 @@ static void net_tx_response(struct XenNetDev *netdev, netif_tx_request_t *txp, i netdev->tx_ring.rsp_prod_pvt = ++i; RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(>tx_ring, notify); if (notify) { -xen_pv_send_notify(>xendev); +xen_device_notify_event_channel(XEN_DEVICE(netdev), +netdev->event_channel, NULL); } if (i == netdev->tx_ring.req_cons) { @@ -104,13 +125,16 @@ static void net_tx_error(struct XenNetDev *netdev, netif_tx_request_t *txp, RING #endif } -static void net_tx_packets(struct XenNetDev *netdev) +static bool net_tx_packets(struct XenNetDev *netdev) { +bool done_something = false; netif_tx_request_t txreq; RING_IDX rc, rp; void *page; void *tmpbuf = NULL; +assert(qemu_mutex_iothread_locked()); + for (;;) { rc = netdev->tx_ring.req_cons; rp = netdev->tx_ring.sring->req_prod; @@ -122,49 +146,52 @@ static void net_tx_packets(struct XenNetDev *netdev) } memcpy(, RING_GET_REQUEST(>tx_ring, rc), sizeof(txreq));
Re: [PATCH v3 18/28] hw/xen: only remove peers of PCI NICs on unplug
On 25/10/2023 15:50, David Woodhouse wrote: From: David Woodhouse When the Xen guest asks to unplug *emulated* NICs, it's kind of unhelpful also to unplug the peer of the *Xen* PV NIC. Signed-off-by: David Woodhouse --- hw/i386/xen/xen_platform.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH v3 17/28] hw/xen: add support for Xen primary console in emulated mode
On 25/10/2023 15:50, David Woodhouse wrote: From: David Woodhouse The primary console is special because the toolstack maps a page into the guest for its ring, and also allocates the guest-side event channel. The guest's grant table is even primed to export that page using a known grant ref#. Add support for all that in emulated mode, so that we can have a primary console. For reasons unclear, the backends running under real Xen don't just use a mapping of the well-known GNTTAB_RESERVED_CONSOLE grant ref (which would also be in the ring-ref node in XenStore). Instead, the toolstack sets the ring-ref node of the primary console to the GFN of the guest page. The backend is expected to handle that special case and map it with foreignmem operations instead. We don't have an implementation of foreignmem ops for emulated Xen mode, so just make it map GNTTAB_RESERVED_CONSOLE instead. This would probably work for real Xen too, but we can't work out how to make real Xen create a primary console of type "ioemu" to make QEMU drive it, so we can't test that; might as well leave it as it is for now under Xen. Now at last we can boot the Xen PV shim and run PV kernels in QEMU. Signed-off-by: David Woodhouse --- hw/char/xen_console.c | 78 hw/i386/kvm/meson.build | 1 + hw/i386/kvm/trace-events | 2 + hw/i386/kvm/xen-stubs.c | 8 ++ hw/i386/kvm/xen_gnttab.c | 7 +- hw/i386/kvm/xen_primary_console.c | 193 ++ hw/i386/kvm/xen_primary_console.h | 23 hw/i386/kvm/xen_xenstore.c| 10 ++ hw/xen/xen-bus.c | 5 + include/hw/xen/xen-bus.h | 1 + target/i386/kvm/xen-emu.c | 23 +++- 11 files changed, 328 insertions(+), 23 deletions(-) create mode 100644 hw/i386/kvm/xen_primary_console.c create mode 100644 hw/i386/kvm/xen_primary_console.h Reviewed-by: Paul Durrant
Re: [PATCH v3 14/28] hw/xen: add get_frontend_path() method to XenDeviceClass
On 25/10/2023 15:50, David Woodhouse wrote: From: David Woodhouse The primary Xen console is special. The guest's side is set up for it by the toolstack automatically and not by the standard PV init sequence. Accordingly, its *frontend* doesn't appear in …/device/console/0 either; instead it appears under …/console in the guest's XenStore node. To allow the Xen console driver to override the frontend path for the primary console, add a method to the XenDeviceClass which can be used instead of the standard xen_device_get_frontend_path() Signed-off-by: David Woodhouse --- hw/xen/xen-bus.c | 11 ++- include/hw/xen/xen-bus.h | 2 ++ 2 files changed, 12 insertions(+), 1 deletion(-) Reviewed-by: Paul Durrant
Re: [PATCH v3 13/28] hw/xen: automatically assign device index to block devices
On 25/10/2023 15:50, David Woodhouse wrote: From: David Woodhouse There's no need to force the user to assign a vdev. We can automatically assign one, starting at xvda and searching until we find the first disk name that's unused. This means we can now allow '-drive if=xen,file=xxx' to work without an explicit separate -driver argument, just like if=virtio. Rip out the legacy handling from the xenpv machine, which was scribbling over any disks configured by the toolstack, and didn't work with anything but raw images. Signed-off-by: David Woodhouse Acked-by: Kevin Wolf --- blockdev.c | 15 +--- hw/block/xen-block.c| 38 + hw/xen/xen_devconfig.c | 28 - hw/xenpv/xen_machine_pv.c | 9 --- include/hw/xen/xen-legacy-backend.h | 1 - 5 files changed, 50 insertions(+), 41 deletions(-) diff --git a/blockdev.c b/blockdev.c index a01c62596b..5d9f2e5395 100644 --- a/blockdev.c +++ b/blockdev.c @@ -255,13 +255,13 @@ void drive_check_orphaned(void) * Ignore default drives, because we create certain default * drives unconditionally, then leave them unclaimed. Not the * users fault. - * Ignore IF_VIRTIO, because it gets desugared into -device, - * so we can leave failing to -device. + * Ignore IF_VIRTIO or IF_XEN, because it gets desugared into + * -device, so we can leave failing to -device. * Ignore IF_NONE, because leaving unclaimed IF_NONE remains * available for device_add is a feature. */ if (dinfo->is_default || dinfo->type == IF_VIRTIO -|| dinfo->type == IF_NONE) { +|| dinfo->type == IF_XEN || dinfo->type == IF_NONE) { continue; } if (!blk_get_attached_dev(blk)) { @@ -977,6 +977,15 @@ DriveInfo *drive_new(QemuOpts *all_opts, BlockInterfaceType block_default_type, qemu_opt_set(devopts, "driver", "virtio-blk", _abort); qemu_opt_set(devopts, "drive", qdict_get_str(bs_opts, "id"), _abort); +} else if (type == IF_XEN) { +QemuOpts *devopts; +devopts = qemu_opts_create(qemu_find_opts("device"), NULL, 0, + _abort); +qemu_opt_set(devopts, "driver", + (media == MEDIA_CDROM) ? "xen-cdrom" : "xen-disk", + _abort); +qemu_opt_set(devopts, "drive", qdict_get_str(bs_opts, "id"), + _abort); } filename = qemu_opt_get(legacy_opts, "file"); diff --git a/hw/block/xen-block.c b/hw/block/xen-block.c index 64470fc579..5011fe9430 100644 --- a/hw/block/xen-block.c +++ b/hw/block/xen-block.c @@ -27,6 +27,7 @@ #include "sysemu/block-backend.h" #include "sysemu/iothread.h" #include "dataplane/xen-block.h" +#include "hw/xen/interface/io/xs_wire.h" #include "trace.h" static char *xen_block_get_name(XenDevice *xendev, Error **errp) @@ -34,6 +35,43 @@ static char *xen_block_get_name(XenDevice *xendev, Error **errp) XenBlockDevice *blockdev = XEN_BLOCK_DEVICE(xendev); XenBlockVdev *vdev = >props.vdev; +if (blockdev->props.vdev.type == XEN_BLOCK_VDEV_TYPE_INVALID) { +XenBus *xenbus = XEN_BUS(qdev_get_parent_bus(DEVICE(xendev))); +char fe_path[XENSTORE_ABS_PATH_MAX + 1]; +char *value; +int disk = 0; +unsigned long idx; + +/* Find an unoccupied device name */ Not sure this is going to work is it? What happens if 'hda' or 'sda', or 'd0' exists? I think you need to use the core of the code in xen_block_set_vdev() to generate names and search all possible encodings for each disk. Paul +while (disk < (1 << 20)) { +if (disk < (1 << 4)) { +idx = (202 << 8) | (disk << 4); +} else { +idx = (1 << 28) | (disk << 8); +} +snprintf(fe_path, sizeof(fe_path), + "/local/domain/%u/device/vbd/%lu", + xendev->frontend_id, idx); +value = qemu_xen_xs_read(xenbus->xsh, XBT_NULL, fe_path, NULL); +if (!value) { +if (errno == ENOENT) { +vdev->type = XEN_BLOCK_VDEV_TYPE_XVD; +vdev->partition = 0; +vdev->disk = disk; +vdev->number = idx; +goto found; +} +error_setg(errp, "cannot read %s: %s", fe_path, + strerror(errno)); +return NULL; +} +free(value); +disk++; +} +error_setg(errp, "cannot find device vdev for block device"); +return NULL; +} + found: return g_strdup_printf("%lu", vdev->number); } diff --git a/hw/xen/xen_devconfig.c b/hw/xen/xen_devconfig.c index
Re: [PATCH v3 08/28] i386/xen: Ignore VCPU_SSHOTTMR_future flag in set_singleshot_timer()
On 25/10/2023 15:50, David Woodhouse wrote: From: David Woodhouse Upstream Xen now ignores this flag¹, since the only guest kernel ever to use it was buggy. ¹ https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=19c6cbd909 Signed-off-by: David Woodhouse Reviewed-by: Paul Durrant --- target/i386/kvm/xen-emu.c | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH v3 07/28] hw/xen: use correct default protocol for xen-block on x86
On 25/10/2023 15:50, David Woodhouse wrote: From: David Woodhouse Even on x86_64 the default protocol is the x86-32 one if the guest doesn't specifically ask for x86-64. Fixes: b6af8926fb85 ("xen: add implementations of xen-block connect and disconnect functions...") Signed-off-by: David Woodhouse --- hw/block/xen-block.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) Reviewed-by: Paul Durrant
Re: [PATCH v3 06/28] hw/xen: take iothread mutex in xen_evtchn_reset_op()
On 25/10/2023 15:50, David Woodhouse wrote: From: David Woodhouse The xen_evtchn_soft_reset() function requires the iothread mutex, but is also called for the EVTCHNOP_reset hypercall. Ensure the mutex is taken in that case. Fixes: a15b10978fe6 ("hw/xen: Implement EVTCHNOP_reset") Signed-off-by: David Woodhouse --- hw/i386/kvm/xen_evtchn.c | 1 + 1 file changed, 1 insertion(+) Reviewed-by: Paul Durrant
Re: [PATCH v3 05/28] hw/xen: fix XenStore watch delivery to guest
On 25/10/2023 15:50, David Woodhouse wrote: From: David Woodhouse When fire_watch_cb() found the response buffer empty, it would call deliver_watch() to generate the XS_WATCH_EVENT message in the response buffer and send an event channel notification to the guest… without actually *copying* the response buffer into the ring. So there was nothing for the guest to see. The pending response didn't actually get processed into the ring until the guest next triggered some activity from its side. Add the missing call to put_rsp(). It might have been slightly nicer to call xen_xenstore_event() here, which would *almost* have worked. Except for the fact that it calls xen_be_evtchn_pending() to check that it really does have an event pending (and clear the eventfd for next time). And under Xen it's defined that setting that fd to O_NONBLOCK isn't guaranteed to work, so the emu implementation follows suit. This fixes Xen device hot-unplug. Fixes: 0254c4d19df ("hw/xen: Add xenstore wire implementation and implementation stubs") Signed-off-by: David Woodhouse --- hw/i386/kvm/xen_xenstore.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) Reviewed-by: Paul Durrant
RE: aio_set_event_notifier(is_external=true) in Xen code?
> -Original Message- > From: Stefan Hajnoczi > Sent: 28 March 2023 16:51 > To: Woodhouse, David ; Durrant, Paul > > Cc: qemu-devel@nongnu.org; qemu-bl...@nongnu.org > Subject: [EXTERNAL] aio_set_event_notifier(is_external=true) in Xen code? > > Hi, > I'm removing the aio_disable_external() API from QEMU and noticed that > Xen code calls aio_set_fd_handler(is_external=true) in hw/xen/xen-bus.c > and hw/i386/kvm/xen_xenstore.c. > > It wasn't clear to me whether is_external=true is necessary here. > is_external=true is mainly used to temporarily pause I/O submission in > the QEMU block layer. Maybe is_external=true was chosen out of caution > but actually has no effect in this code. > > Does the Xen code rely on is_external=true? That's a good question. The call in xen-bus.c has been there since commit 83361a8a1f932, which was when it substituted the old call to qemu_set_fd_handler(). I suspect this was out of caution (or possibly misunderstanding) at the time, although setting the call to xen_device_set_event_channel_context() in xen_block_dataplane_stop() does suggest it may be happening while I/O could be in progress so it could have been in response to problems caught in testing. I suspect the code in xen_xenstore.c just copied what xen-bus.c did. Sorry I can't give you a definitive answer... it's all rather a long time ago. Cheers, Paul
RE: [PATCH v2 28/32] contrib/gitdm: add Amazon to the domain map
> -Original Message- > From: Alex Bennée > Sent: 15 March 2023 17:43 > To: qemu-devel@nongnu.org > Cc: Akihiko Odaki ; Marc-André Lureau > ; qemu-ri...@nongnu.org; Riku Voipio > ; Igor Mammedov ; Xiao Guangrong > ; Thomas Huth ; Wainer dos > Santos Moschetta ; Dr. David Alan Gilbert > ; Alex Williamson ; Hao > Wu ; Cleber Rosa ; Daniel Henrique > Barboza ; Jan Kiszka ; Aurelien > Jarno ; qemu-...@nongnu.org; Marcelo Tosatti > ; Eduardo Habkost ; Alexandre > Iooss ; Gerd Hoffmann ; Palmer > Dabbelt ; Ilya Leoshkevich ; qemu- > p...@nongnu.org; Juan Quintela ; Cédric Le Goater > ; Darren Kenny ; > k...@vger.kernel.org; Marcel Apfelbaum ; Peter > Maydell ; Richard Henderson > ; Stafford Horne ; Weiwei > Li ; Sunil V L ; Stefan > Hajnoczi ; Thomas Huth ; Vijai > Kumar K ; Liu Zhiwei > ; David Gibson > ; Song Gao ; Paolo > Bonzini ; Michael S. Tsirkin ; Niek > Linnenbank ; Greg Kurz ; Laurent > Vivier ; Qiuhao Li ; Philippe > Mathieu-Daudé ; Xiaojuan Yang > ; Mahmoud Mandour ; > Alexander Bulekov ; Jiaxun Yang ; > qemu-bl...@nongnu.org; Yanan Wang ; David > Woodhouse ; qemu-s3...@nongnu.org; Strahinja Jankovic > ; Bandan Das ; Alistair > Francis ; Aleksandar Rikalo > ; Tyrone Ting ; Kevin > Wolf ; David Hildenbrand ; Beraldo > Leal ; Beniamino Galvani ; Paul > Durrant ; Bin Meng ; Sunil > Muthuswamy ; Hanna Reitz ; > Peter Xu ; Alex Bennée ; Graf > (AWS), Alexander ; Durrant, Paul ; > Woodhouse, David > Subject: [EXTERNAL] [PATCH v2 28/32] contrib/gitdm: add Amazon to the > domain map > > CAUTION: This email originated from outside of the organization. Do not > click links or open attachments unless you can confirm the sender and know > the content is safe. > > > > We have multiple contributors from both .co.uk and .com versions of > the address. > > Signed-off-by: Alex Bennée > Cc: Alexander Graf > Cc: Paul Durrant > Cc: David Wooodhouse > Reviewed-by: Philippe Mathieu-Daudé > Message-Id: <20230310180332.2274827-7-alex.ben...@linaro.org> > --- > contrib/gitdm/domain-map | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/contrib/gitdm/domain-map b/contrib/gitdm/domain-map > index 4a988c5b5f..8dce276a1c 100644 > --- a/contrib/gitdm/domain-map > +++ b/contrib/gitdm/domain-map > @@ -4,6 +4,8 @@ > # This maps email domains to nice easy to read company names > # > > +amazon.com Amazon > +amazon.co.ukAmazon You might want 'amazon.de' too but as far as it goes... Reviewed-by: Paul Durrant > amd.com AMD > aspeedtech.com ASPEED Technology Inc. > baidu.com Baidu > -- > 2.39.2
Re: [PATCH 2/2] hw/i386/xen/xen-hvm: Inline xen_piix_pci_write_config_client() and remove it
On 26/06/2022 10:46, Bernhard Beschow wrote: xen_piix_pci_write_config_client() is implemented in the xen sub tree and uses PIIX constants internally, thus creating a direct dependency on PIIX. Now that xen_set_pci_link_route() is stubbable, the logic of xen_piix_pci_write_config_client() can be moved to PIIX which resolves the dependency. Signed-off-by: Bernhard Beschow Reviewed-by: Paul Durrant
Re: [PATCH 1/2] hw/i386/xen/xen-hvm: Allow for stubbing xen_set_pci_link_route()
On 26/06/2022 10:46, Bernhard Beschow wrote: The only user of xen_set_pci_link_route() is xen_piix_pci_write_config_client() which implements PIIX-specific logic in the xen namespace. This makes xen-hvm depend on PIIX which could be avoided if xen_piix_pci_write_config_client() was implemented in PIIX. In order to do this, xen_set_pci_link_route() needs to be stubbable which this patch addresses. Signed-off-by: Bernhard Beschow Reviewed-by: Paul Durrant
Re: [PATCH] block: get rid of blk->guest_block_size
On 18/05/2022 14:09, Stefan Hajnoczi wrote: Commit 1b7fd729559c ("block: rename buffer_alignment to guest_block_size") noted: At this point, the field is set by the device emulation, but completely ignored by the block layer. The last time the value of buffer_alignment/guest_block_size was actually used was before commit 339064d50639 ("block: Don't use guest sector size for qemu_blockalign()"). This value has not been used since 2013. Get rid of it. Cc: Xie Yongji Signed-off-by: Stefan Hajnoczi Xen part... Reviewed-by: Paul Durrant
Re: [PATCH 3/3] include/hw/ide: Unexport pci_piix3_xen_ide_unplug()
On 08/05/2022 11:34, Bernhard Beschow wrote: This function was declared in a generic and public header, implemented in a device-specific source file but only used in xen_platform. Given its 'aux' parameter, this function is more xen-specific than piix-specific. Also, the hardcoded magic constants seem to be generic and related to PCIIDEState and IDEBus rather than piix. Therefore, move this function to xen_platform, unexport it, and drop the "piix3" in the function name as well. Signed-off-by: Bernhard Beschow Reviewed-by: Paul Durrant ... with one suggestion... --- hw/i386/xen/xen_platform.c | 49 +- hw/ide/piix.c | 46 --- include/hw/ide.h | 3 --- 3 files changed, 48 insertions(+), 50 deletions(-) diff --git a/hw/i386/xen/xen_platform.c b/hw/i386/xen/xen_platform.c index 72028449ba..124ffeae35 100644 --- a/hw/i386/xen/xen_platform.c +++ b/hw/i386/xen/xen_platform.c @@ -26,6 +26,7 @@ #include "qemu/osdep.h" #include "qapi/error.h" #include "hw/ide.h" +#include "hw/ide/pci.h" #include "hw/pci/pci.h" #include "hw/xen/xen_common.h" #include "migration/vmstate.h" @@ -134,6 +135,52 @@ static void pci_unplug_nics(PCIBus *bus) pci_for_each_device(bus, 0, unplug_nic, NULL); } +/* + * The Xen HVM unplug protocol [1] specifies a mechanism to allow guests to + * request unplug of 'aux' disks (which is stated to mean all IDE disks, + * except the primary master). + * + * NOTE: The semantics of what happens if unplug of all disks and 'aux' disks + * is simultaneously requested is not clear. The implementation assumes + * that an 'all' request overrides an 'aux' request. + * + * [1] https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/misc/hvm-emulated-unplug.pandoc + */ +static int pci_xen_ide_unplug(DeviceState *dev, bool aux) +{ +PCIIDEState *pci_ide; +int i; +IDEDevice *idedev; +IDEBus *idebus; +BlockBackend *blk; + +pci_ide = PCI_IDE(dev); + +for (i = aux ? 1 : 0; i < 4; i++) { +idebus = _ide->bus[i / 2]; +blk = idebus->ifs[i % 2].blk; + +if (blk && idebus->ifs[i % 2].drive_kind != IDE_CD) { +if (!(i % 2)) { +idedev = idebus->master; +} else { +idedev = idebus->slave; +} + +blk_drain(blk); +blk_flush(blk); + +blk_detach_dev(blk, DEVICE(idedev)); +idebus->ifs[i % 2].blk = NULL; +idedev->conf.blk = NULL; +monitor_remove_blk(blk); +blk_unref(blk); +} +} +qdev_reset_all(dev); +return 0; The return value is ignored so you may as well make this a static void function. Paul +} + static void unplug_disks(PCIBus *b, PCIDevice *d, void *opaque) { uint32_t flags = *(uint32_t *)opaque; @@ -147,7 +194,7 @@ static void unplug_disks(PCIBus *b, PCIDevice *d, void *opaque) switch (pci_get_word(d->config + PCI_CLASS_DEVICE)) { case PCI_CLASS_STORAGE_IDE: -pci_piix3_xen_ide_unplug(DEVICE(d), aux); +pci_xen_ide_unplug(DEVICE(d), aux); break; case PCI_CLASS_STORAGE_SCSI: diff --git a/hw/ide/piix.c b/hw/ide/piix.c index bc1b37512a..9a9b28078e 100644 --- a/hw/ide/piix.c +++ b/hw/ide/piix.c @@ -173,52 +173,6 @@ static void pci_piix_ide_realize(PCIDevice *dev, Error **errp) } } -/* - * The Xen HVM unplug protocol [1] specifies a mechanism to allow guests to - * request unplug of 'aux' disks (which is stated to mean all IDE disks, - * except the primary master). - * - * NOTE: The semantics of what happens if unplug of all disks and 'aux' disks - * is simultaneously requested is not clear. The implementation assumes - * that an 'all' request overrides an 'aux' request. - * - * [1] https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/misc/hvm-emulated-unplug.pandoc - */ -int pci_piix3_xen_ide_unplug(DeviceState *dev, bool aux) -{ -PCIIDEState *pci_ide; -int i; -IDEDevice *idedev; -IDEBus *idebus; -BlockBackend *blk; - -pci_ide = PCI_IDE(dev); - -for (i = aux ? 1 : 0; i < 4; i++) { -idebus = _ide->bus[i / 2]; -blk = idebus->ifs[i % 2].blk; - -if (blk && idebus->ifs[i % 2].drive_kind != IDE_CD) { -if (!(i % 2)) { -idedev = idebus->master; -} else { -idedev = idebus->slave; -} - -blk_drain(blk); -blk_flush(blk); - -blk_detach_dev(blk, DEVICE(idedev)); -idebus->ifs[i % 2].blk = NULL; -idedev->conf.blk = NULL; -monitor_remove_blk(blk); -blk_unref(blk); -} -} -qdev_reset_all(dev); -return 0; -} - static void pci_piix_ide_exitfn(PCIDevice *dev) { PCIIDEState *d = PCI_IDE(dev); diff --git a/include/hw/ide.h b/include/hw/ide.h index
Re: [PATCH] xen-hvm: Allow disabling buffer_io_timer
On 26/01/2022 13:43, Jason Andryuk wrote: On Tue, Dec 14, 2021 at 8:40 AM Durrant, Paul wrote: On 10/12/2021 11:34, Jason Andryuk wrote: commit f37f29d31488 "xen: slightly simplify bufioreq handling" hard coded setting req.count = 1 during initial field setup before the main loop. This missed a subtlety that an early exit from the loop when there are no ioreqs to process, would have req.count == 0 for the return value. handle_buffered_io() would then remove state->buffered_io_timer. Instead handle_buffered_iopage() is basically always returning true and handle_buffered_io() always re-setting the timer. Restore the disabling of the timer by introducing a new handled_ioreq boolean and use as the return value. The named variable will more clearly show the intent of the code. Signed-off-by: Jason Andryuk Reviewed-by: Paul Durrant Thanks, Paul. What is the next step for getting this into QEMU? Anthony, can you queue this? Paul To re-state more plainly, this patch fixes a bug to let QEMU go idle for longer stretches of time. Without it, buffer_io_timer continues to re-arm and fire every 100ms even if there is nothing to do. Regards, Jason
Re: [PATCH v2] xen-mapcache: Avoid entry->lock overflow
On 24/01/2022 10:44, Ross Lagerwall wrote: In some cases, a particular mapcache entry may be mapped 256 times causing the lock field to wrap to 0. For example, this may happen when using emulated NVME and the guest submits a large scatter-gather write. At this point, the entry map be remapped causing QEMU to write the wrong data or crash (since remap is not atomic). Avoid this overflow by increasing the lock field to a uint32_t and also detect it and abort rather than continuing regardless. Signed-off-by: Ross Lagerwall Reviewed-by: Paul Durrant --- Changes in v2: Change type to uint32_t since there is a hole there anyway. The struct size remains at 48 bytes on x86_64. hw/i386/xen/xen-mapcache.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/hw/i386/xen/xen-mapcache.c b/hw/i386/xen/xen-mapcache.c index bd47c3d672..f2ef977963 100644 --- a/hw/i386/xen/xen-mapcache.c +++ b/hw/i386/xen/xen-mapcache.c @@ -52,7 +52,7 @@ typedef struct MapCacheEntry { hwaddr paddr_index; uint8_t *vaddr_base; unsigned long *valid_mapping; -uint8_t lock; +uint32_t lock; #define XEN_MAPCACHE_ENTRY_DUMMY (1 << 0) uint8_t flags; hwaddr size; @@ -355,6 +355,12 @@ tryagain: if (lock) { MapCacheRev *reventry = g_malloc0(sizeof(MapCacheRev)); entry->lock++; +if (entry->lock == 0) { +fprintf(stderr, +"mapcache entry lock overflow: "TARGET_FMT_plx" -> %p\n", +entry->paddr_index, entry->vaddr_base); +abort(); +} reventry->dma = dma; reventry->vaddr_req = mapcache->last_entry->vaddr_base + address_offset; reventry->paddr_index = mapcache->last_entry->paddr_index;
Re: [PATCH] xen-hvm: Allow disabling buffer_io_timer
On 10/12/2021 11:34, Jason Andryuk wrote: commit f37f29d31488 "xen: slightly simplify bufioreq handling" hard coded setting req.count = 1 during initial field setup before the main loop. This missed a subtlety that an early exit from the loop when there are no ioreqs to process, would have req.count == 0 for the return value. handle_buffered_io() would then remove state->buffered_io_timer. Instead handle_buffered_iopage() is basically always returning true and handle_buffered_io() always re-setting the timer. Restore the disabling of the timer by introducing a new handled_ioreq boolean and use as the return value. The named variable will more clearly show the intent of the code. Signed-off-by: Jason Andryuk Reviewed-by: Paul Durrant
RE: [Xen-devel] [PATCH v3 03/20] exec: Let qemu_ram_*() functions take a const pointer argument
> -Original Message- > From: Xen-devel On Behalf Of > Philippe Mathieu-Daudé > Sent: 20 February 2020 13:06 > To: Peter Maydell ; qemu-devel@nongnu.org > Cc: Fam Zheng ; Dmitry Fleytman > ; k...@vger.kernel.org; Michael S. Tsirkin > ; Jason Wang ; Gerd Hoffmann > ; Edgar E. Iglesias ; Stefano > Stabellini ; Matthew Rosato > ; qemu-bl...@nongnu.org; David Hildenbrand > ; Halil Pasic ; Christian > Borntraeger ; Hervé Poussineau > ; Marcel Apfelbaum ; > Anthony Perard ; xen- > de...@lists.xenproject.org; Aleksandar Rikalo rk.com>; Richard Henderson ; Philippe Mathieu-Daudé > ; Laurent Vivier ; Thomas Huth > ; Eduardo Habkost ; Stefan Weil > ; Alistair Francis ; Richard > Henderson ; Paul Durrant ; > Eric Auger ; qemu-s3...@nongnu.org; qemu- > a...@nongnu.org; Cédric Le Goater ; John Snow > ; David Gibson ; Igor > Mitsyanko ; Cornelia Huck ; > Michael Walle ; qemu-...@nongnu.org; Paolo Bonzini > > Subject: [Xen-devel] [PATCH v3 03/20] exec: Let qemu_ram_*() functions > take a const pointer argument > > Signed-off-by: Philippe Mathieu-Daudé Reviewed-by: Paul Durrant
RE: [Xen-devel] [PATCH v3 19/20] Let cpu_[physical]_memory() calls pass a boolean 'is_write' argument
> -Original Message- > From: Xen-devel On Behalf Of > Philippe Mathieu-Daudé > Sent: 20 February 2020 13:06 > To: Peter Maydell ; qemu-devel@nongnu.org > Cc: Fam Zheng ; Dmitry Fleytman > ; k...@vger.kernel.org; Michael S. Tsirkin > ; Jason Wang ; Gerd Hoffmann > ; Edgar E. Iglesias ; Stefano > Stabellini ; Matthew Rosato > ; qemu-bl...@nongnu.org; David Hildenbrand > ; Halil Pasic ; Christian > Borntraeger ; Hervé Poussineau > ; Marcel Apfelbaum ; > Anthony Perard ; xen- > de...@lists.xenproject.org; Aleksandar Rikalo rk.com>; Richard Henderson ; Philippe Mathieu-Daudé > ; Laurent Vivier ; Thomas Huth > ; Eduardo Habkost ; Stefan Weil > ; Alistair Francis ; Richard > Henderson ; Paul Durrant ; > Eric Auger ; qemu-s3...@nongnu.org; qemu- > a...@nongnu.org; Cédric Le Goater ; John Snow > ; David Gibson ; Igor > Mitsyanko ; Cornelia Huck ; > Michael Walle ; qemu-...@nongnu.org; Paolo Bonzini > > Subject: [Xen-devel] [PATCH v3 19/20] Let cpu_[physical]_memory() calls > pass a boolean 'is_write' argument > > Use an explicit boolean type. > > This commit was produced with the included Coccinelle script > scripts/coccinelle/exec_rw_const. > > Signed-off-by: Philippe Mathieu-Daudé Reviewed-by: Paul Durrant
RE: [Xen-devel] [PATCH 6/6] xen-pt: Round pci regions sizes to XEN_PAGE_SIZE
> -Original Message- > > > Linux PCI subsytem has an option resource_alignment that can be > > applied to either a single device or all devices. Booting with > > pci=resource_aligment=4096 will align each device to a page. Do you > > think pciback should force resource_alignment=4096 for dom0? > That sounds like a good idea. > Ideally Xen should keep track of the BARs position and size and refuse > to passthrough devices that have BARs sharing a page with other > devices BARs. > > > Are > > there other MMIO ranges to be concerned about adjacent to BARs? > > IIRC you can have two BARs of different devices in the same 4K page, > BARs are only aligned to it's size, so BARs smaller than 4K are not > required to be page aligned. If we had a notion of assignment groups for this, as well as devices sharing requester id, then Xen would not need to refuse pass-through, it would just require that all devices sharing the page were passed through as a unit. Paul
RE: [Xen-devel] xen-block: race condition when stopping the device (WAS: Re: [xen-4.13-testing test] 144736: regressions - FAIL)
> -Original Message- > From: Durrant, Paul > Sent: 16 December 2019 09:50 > To: Durrant, Paul ; Julien Grall ; > Ian Jackson > Cc: Jürgen Groß ; Stefano Stabellini > ; qemu-devel@nongnu.org; osstest service owner > ; Anthony Perard > ; xen-de...@lists.xenproject.org > Subject: RE: [Xen-devel] xen-block: race condition when stopping the > device (WAS: Re: [xen-4.13-testing test] 144736: regressions - FAIL) > > > -Original Message----- > > From: Xen-devel On Behalf Of > > Durrant, Paul > > Sent: 16 December 2019 09:34 > > To: Julien Grall ; Ian Jackson > > Cc: Jürgen Groß ; Stefano Stabellini > > ; qemu-devel@nongnu.org; osstest service owner > > ; Anthony Perard > > ; xen-de...@lists.xenproject.org > > Subject: Re: [Xen-devel] xen-block: race condition when stopping the > > device (WAS: Re: [xen-4.13-testing test] 144736: regressions - FAIL) > > > > > -Original Message- > > [snip] > > > >> > > > >> This feels like a race condition between the init/free code with > > > >> handler. Anthony, does it ring any bell? > > > >> > > > > > > > > From that stack bt it looks like an iothread managed to run after > the > > > sring was NULLed. This should not be able happen as the dataplane > should > > > have been moved back onto QEMU's main thread context before the ring > is > > > unmapped. > > > > > > My knowledge of this code is fairly limited, so correct me if I am > > wrong. > > > > > > blk_set_aio_context() would set the context for the block aio. AFAICT, > > > the only aio for the block is xen_block_complete_aio(). > > > > Not quite. xen_block_dataplane_start() calls > > xen_device_bind_event_channel() and that will add an event channel fd > into > > the aio context, so the shared ring is polled by the iothread as well as > > block i/o completion. > > > > > > > > In the stack above, we are not dealing with a block aio but an aio tie > > > to the event channel (see the call from xen_device_poll). So I don't > > > think the blk_set_aio_context() would affect the aio. > > > > > > > For the reason I outline above, it does. > > > > > So it would be possible to get the iothread running because we > received > > > a notification on the event channel while we are stopping the block > (i.e > > > xen_block_dataplane_stop()). > > > > > > > We should assume an iothread can essentially run at any time, as it is a > > polling entity. It should eventually block polling on fds assign to its > > aio context but I don't think the abstraction guarantees that it cannot > be > > awoken for other reasons (e.g. off a timeout). However and event from > the > > frontend will certainly cause the evtchn fd poll to wake up. > > > > > If xen_block_dataplane_stop() grab the context lock first, then the > > > iothread dealing with the event may wait on the lock until its > released. > > > > > > By the time the lock is grabbed, we may have free all the resources > > > (including srings). So the event iothread will end up to dereference a > > > NULL pointer. > > > > > > > I think the problem may actually be that xen_block_dataplane_event() > does > > not acquire the context and thus is not synchronized with > > xen_block_dataplane_stop(). The documentation in multiple-iothreads.txt > is > > not clear whether a poll handler called by an iothread needs to acquire > > the context though; TBH I would not have thought it necessary. > > > > > It feels to me we need a way to quiesce all the iothreads (blk, > > > event,...) before continuing. But I am a bit unsure how to do this in > > > QEMU. > > > > > > > Looking at virtio-blk.c I see that it does seem to close off its evtchn > > equivalent from iothread context via aio_wait_bh_oneshot(). So I wonder > > whether the 'right' thing to do is to call > > xen_device_unbind_event_channel() using the same mechanism to ensure > > xen_block_dataplane_event() can't race. > > Digging around the virtio-blk history I see: > > commit 1010cadf62332017648abee0d7a3dc7f2eef9632 > Author: Stefan Hajnoczi > Date: Wed Mar 7 14:42:03 2018 + > > virtio-blk: fix race between .ioeventfd_stop() and vq handler > > If the main loop thread invokes .ioeventfd_stop() just as the vq > handler > function begins in the IOThread then the handler may lose the race for > the AioCont
RE: [Xen-devel] xen-block: race condition when stopping the device (WAS: Re: [xen-4.13-testing test] 144736: regressions - FAIL)
> -Original Message- > From: Xen-devel On Behalf Of > Durrant, Paul > Sent: 16 December 2019 09:34 > To: Julien Grall ; Ian Jackson > Cc: Jürgen Groß ; Stefano Stabellini > ; qemu-devel@nongnu.org; osstest service owner > ; Anthony Perard > ; xen-de...@lists.xenproject.org > Subject: Re: [Xen-devel] xen-block: race condition when stopping the > device (WAS: Re: [xen-4.13-testing test] 144736: regressions - FAIL) > > > -Original Message- > [snip] > > >> > > >> This feels like a race condition between the init/free code with > > >> handler. Anthony, does it ring any bell? > > >> > > > > > > From that stack bt it looks like an iothread managed to run after the > > sring was NULLed. This should not be able happen as the dataplane should > > have been moved back onto QEMU's main thread context before the ring is > > unmapped. > > > > My knowledge of this code is fairly limited, so correct me if I am > wrong. > > > > blk_set_aio_context() would set the context for the block aio. AFAICT, > > the only aio for the block is xen_block_complete_aio(). > > Not quite. xen_block_dataplane_start() calls > xen_device_bind_event_channel() and that will add an event channel fd into > the aio context, so the shared ring is polled by the iothread as well as > block i/o completion. > > > > > In the stack above, we are not dealing with a block aio but an aio tie > > to the event channel (see the call from xen_device_poll). So I don't > > think the blk_set_aio_context() would affect the aio. > > > > For the reason I outline above, it does. > > > So it would be possible to get the iothread running because we received > > a notification on the event channel while we are stopping the block (i.e > > xen_block_dataplane_stop()). > > > > We should assume an iothread can essentially run at any time, as it is a > polling entity. It should eventually block polling on fds assign to its > aio context but I don't think the abstraction guarantees that it cannot be > awoken for other reasons (e.g. off a timeout). However and event from the > frontend will certainly cause the evtchn fd poll to wake up. > > > If xen_block_dataplane_stop() grab the context lock first, then the > > iothread dealing with the event may wait on the lock until its released. > > > > By the time the lock is grabbed, we may have free all the resources > > (including srings). So the event iothread will end up to dereference a > > NULL pointer. > > > > I think the problem may actually be that xen_block_dataplane_event() does > not acquire the context and thus is not synchronized with > xen_block_dataplane_stop(). The documentation in multiple-iothreads.txt is > not clear whether a poll handler called by an iothread needs to acquire > the context though; TBH I would not have thought it necessary. > > > It feels to me we need a way to quiesce all the iothreads (blk, > > event,...) before continuing. But I am a bit unsure how to do this in > > QEMU. > > > > Looking at virtio-blk.c I see that it does seem to close off its evtchn > equivalent from iothread context via aio_wait_bh_oneshot(). So I wonder > whether the 'right' thing to do is to call > xen_device_unbind_event_channel() using the same mechanism to ensure > xen_block_dataplane_event() can't race. Digging around the virtio-blk history I see: commit 1010cadf62332017648abee0d7a3dc7f2eef9632 Author: Stefan Hajnoczi Date: Wed Mar 7 14:42:03 2018 + virtio-blk: fix race between .ioeventfd_stop() and vq handler If the main loop thread invokes .ioeventfd_stop() just as the vq handler function begins in the IOThread then the handler may lose the race for the AioContext lock. By the time the vq handler is able to acquire the AioContext lock the ioeventfd has already been removed and the handler isn't supposed to run anymore! Use the new aio_wait_bh_oneshot() function to perform ioeventfd removal from within the IOThread. This way no races with the vq handler are possible. Signed-off-by: Stefan Hajnoczi Reviewed-by: Fam Zheng Acked-by: Paolo Bonzini Message-id: 20180307144205.20619-3-stefa...@redhat.com Signed-off-by: Stefan Hajnoczi ...so I think xen-block has exactly the same problem. I think we may also be missing a qemu_bh_cancel() to make sure block aio completions are stopped. I'll prep a patch. Paul > > Paul > > > Cheers, > > > > -- > > Julien Grall > ___ > Xen-devel mailing list > xen-de...@lists.xenproject.org > https://lists.xenproject.org/mailman/listinfo/xen-devel
RE: xen-block: race condition when stopping the device (WAS: Re: [Xen-devel] [xen-4.13-testing test] 144736: regressions - FAIL)
> -Original Message- [snip] > >> > >> This feels like a race condition between the init/free code with > >> handler. Anthony, does it ring any bell? > >> > > > > From that stack bt it looks like an iothread managed to run after the > sring was NULLed. This should not be able happen as the dataplane should > have been moved back onto QEMU's main thread context before the ring is > unmapped. > > My knowledge of this code is fairly limited, so correct me if I am wrong. > > blk_set_aio_context() would set the context for the block aio. AFAICT, > the only aio for the block is xen_block_complete_aio(). Not quite. xen_block_dataplane_start() calls xen_device_bind_event_channel() and that will add an event channel fd into the aio context, so the shared ring is polled by the iothread as well as block i/o completion. > > In the stack above, we are not dealing with a block aio but an aio tie > to the event channel (see the call from xen_device_poll). So I don't > think the blk_set_aio_context() would affect the aio. > For the reason I outline above, it does. > So it would be possible to get the iothread running because we received > a notification on the event channel while we are stopping the block (i.e > xen_block_dataplane_stop()). > We should assume an iothread can essentially run at any time, as it is a polling entity. It should eventually block polling on fds assign to its aio context but I don't think the abstraction guarantees that it cannot be awoken for other reasons (e.g. off a timeout). However and event from the frontend will certainly cause the evtchn fd poll to wake up. > If xen_block_dataplane_stop() grab the context lock first, then the > iothread dealing with the event may wait on the lock until its released. > > By the time the lock is grabbed, we may have free all the resources > (including srings). So the event iothread will end up to dereference a > NULL pointer. > I think the problem may actually be that xen_block_dataplane_event() does not acquire the context and thus is not synchronized with xen_block_dataplane_stop(). The documentation in multiple-iothreads.txt is not clear whether a poll handler called by an iothread needs to acquire the context though; TBH I would not have thought it necessary. > It feels to me we need a way to quiesce all the iothreads (blk, > event,...) before continuing. But I am a bit unsure how to do this in > QEMU. > Looking at virtio-blk.c I see that it does seem to close off its evtchn equivalent from iothread context via aio_wait_bh_oneshot(). So I wonder whether the 'right' thing to do is to call xen_device_unbind_event_channel() using the same mechanism to ensure xen_block_dataplane_event() can't race. Paul > Cheers, > > -- > Julien Grall
RE: [Xen-devel] [PATCH-for-5.0 v3 5/6] hw/pci-host/i440fx: Extract the IGD passthrough host bridge device
> -Original Message- > From: Xen-devel On Behalf Of > Philippe Mathieu-Daudé > Sent: 09 December 2019 09:50 > To: qemu-devel@nongnu.org > Cc: Thomas Huth ; Stefano Stabellini > ; Michael S. Tsirkin ; Paul > Durrant ; Markus Armbruster ; Alex > Williamson ; Marcel Apfelbaum > ; Paolo Bonzini ; Anthony > Perard ; xen-de...@lists.xenproject.org; > Philippe Mathieu-Daudé > Subject: [Xen-devel] [PATCH-for-5.0 v3 5/6] hw/pci-host/i440fx: Extract > the IGD passthrough host bridge device > > We can use a i440FX without the IGD passthrough host bridge. > Extract it into a new file, 'hw/pci-host/igd_pt.c'. > > Signed-off-by: Philippe Mathieu-Daudé Acked-by: Paul Durrant > --- > v3: > - Rename as 'xen_igd_pt.c' (Alex Williamson) > - Add an entry in MAINTAINERS::Xen > --- > hw/pci-host/i440fx.c | 84 -- > hw/pci-host/xen_igd_pt.c | 120 ++ > MAINTAINERS | 1 + > hw/pci-host/Makefile.objs | 1 + > 4 files changed, 122 insertions(+), 84 deletions(-) > create mode 100644 hw/pci-host/xen_igd_pt.c > > diff --git a/hw/pci-host/i440fx.c b/hw/pci-host/i440fx.c > index 414138595b..bae7b42327 100644 > --- a/hw/pci-host/i440fx.c > +++ b/hw/pci-host/i440fx.c > @@ -368,89 +368,6 @@ static const TypeInfo i440fx_info = { > }, > }; > > -/* IGD Passthrough Host Bridge. */ > -typedef struct { > -uint8_t offset; > -uint8_t len; > -} IGDHostInfo; > - > -/* Here we just expose minimal host bridge offset subset. */ > -static const IGDHostInfo igd_host_bridge_infos[] = { > -{PCI_REVISION_ID, 2}, > -{PCI_SUBSYSTEM_VENDOR_ID, 2}, > -{PCI_SUBSYSTEM_ID,2}, > -{0x50,2}, /* SNB: processor graphics control > register */ > -{0x52,2}, /* processor graphics control register > */ > -{0xa4,4}, /* SNB: graphics base of stolen memory > */ > -{0xa8,4}, /* SNB: base of GTT stolen memory */ > -}; > - > -static void host_pci_config_read(int pos, int len, uint32_t *val, Error > **errp) > -{ > -int rc, config_fd; > -/* Access real host bridge. */ > -char *path = > g_strdup_printf("/sys/bus/pci/devices/%04x:%02x:%02x.%d/%s", > - 0, 0, 0, 0, "config"); > - > -config_fd = open(path, O_RDWR); > -if (config_fd < 0) { > -error_setg_errno(errp, errno, "Failed to open: %s", path); > -goto out; > -} > - > -if (lseek(config_fd, pos, SEEK_SET) != pos) { > -error_setg_errno(errp, errno, "Failed to seek: %s", path); > -goto out_close_fd; > -} > - > -do { > -rc = read(config_fd, (uint8_t *)val, len); > -} while (rc < 0 && (errno == EINTR || errno == EAGAIN)); > -if (rc != len) { > -error_setg_errno(errp, errno, "Failed to read: %s", path); > -} > - > -out_close_fd: > -close(config_fd); > -out: > -g_free(path); > -} > - > -static void igd_pt_i440fx_realize(PCIDevice *pci_dev, Error **errp) > -{ > -uint32_t val = 0; > -size_t i; > -int pos, len; > -Error *local_err = NULL; > - > -for (i = 0; i < ARRAY_SIZE(igd_host_bridge_infos); i++) { > -pos = igd_host_bridge_infos[i].offset; > -len = igd_host_bridge_infos[i].len; > -host_pci_config_read(pos, len, , _err); > -if (local_err) { > -error_propagate(errp, local_err); > -return; > -} > -pci_default_write_config(pci_dev, pos, val, len); > -} > -} > - > -static void igd_passthrough_i440fx_class_init(ObjectClass *klass, void > *data) > -{ > -DeviceClass *dc = DEVICE_CLASS(klass); > -PCIDeviceClass *k = PCI_DEVICE_CLASS(klass); > - > -k->realize = igd_pt_i440fx_realize; > -dc->desc = "IGD Passthrough Host bridge"; > -} > - > -static const TypeInfo igd_passthrough_i440fx_info = { > -.name = TYPE_IGD_PASSTHROUGH_I440FX_PCI_DEVICE, > -.parent= TYPE_I440FX_PCI_DEVICE, > -.instance_size = sizeof(PCII440FXState), > -.class_init= igd_passthrough_i440fx_class_init, > -}; > - > static const char *i440fx_pcihost_root_bus_path(PCIHostState > *host_bridge, > PCIBus *rootbus) > { > @@ -495,7 +412,6 @@ static const TypeInfo i440fx_pcihost_info = { > static void i440fx_register_types(void) > { > type_register_static(_info); > -type_register_static(_passthrough_i440fx_info); > type_register_static(_pcihost_info); > } > > diff --git a/hw/pci-host/xen_igd_pt.c b/hw/pci-host/xen_igd_pt.c > new file mode 100644 > index 00..efcc9347ff > --- /dev/null > +++ b/hw/pci-host/xen_igd_pt.c > @@ -0,0 +1,120 @@ > +/* > + * QEMU Intel IGD Passthrough Host Bridge Emulation > + * > + * Copyright (c) 2006 Fabrice Bellard > + * > + * SPDX-License-Identifier: MIT > + * > + * Permission is hereby granted, free of charge, to any person obtaining > a copy > + * of this software
RE: [Xen-devel] [PATCH-for-5.0 v3 6/6] hw/pci-host: Add Kconfig entry to select the IGD Passthrough Host Bridge
> -Original Message- > From: Xen-devel On Behalf Of > Philippe Mathieu-Daudé > Sent: 09 December 2019 09:50 > To: qemu-devel@nongnu.org > Cc: Thomas Huth ; Stefano Stabellini > ; Michael S. Tsirkin ; Paul > Durrant ; Markus Armbruster ; Alex > Williamson ; Marcel Apfelbaum > ; Paolo Bonzini ; Anthony > Perard ; xen-de...@lists.xenproject.org; > Philippe Mathieu-Daudé > Subject: [Xen-devel] [PATCH-for-5.0 v3 6/6] hw/pci-host: Add Kconfig entry > to select the IGD Passthrough Host Bridge > > Add the XEN_IGD_PASSTHROUGH Kconfig option. > > Xen build has that option selected by default. Non-Xen builds now > have to select this feature manually. > > Signed-off-by: Philippe Mathieu-Daudé > --- > v3: Only default with Xen (Alex Williamson) > > I did not used 'depends on XEN' as suggested by Alex but > 'default y if XEN', so one can build XEN without this feature > (for example, on other ARCH than X86). Allowing it to be compiled out for Xen builds is quite reasonable IMO. I don't believe it is widely used. Acked-by: Paul Durrant > --- > hw/pci-host/Kconfig | 5 + > hw/pci-host/Makefile.objs | 2 +- > 2 files changed, 6 insertions(+), 1 deletion(-) > > diff --git a/hw/pci-host/Kconfig b/hw/pci-host/Kconfig > index b0aa8351c4..24ba8ea046 100644 > --- a/hw/pci-host/Kconfig > +++ b/hw/pci-host/Kconfig > @@ -1,6 +1,11 @@ > config PAM > bool > > +config XEN_IGD_PASSTHROUGH > +bool > +default y if XEN > +select PCI_I440FX > + > config PREP_PCI > bool > select PCI > diff --git a/hw/pci-host/Makefile.objs b/hw/pci-host/Makefile.objs > index fa6d1556c0..9c466fab01 100644 > --- a/hw/pci-host/Makefile.objs > +++ b/hw/pci-host/Makefile.objs > @@ -14,7 +14,7 @@ common-obj-$(CONFIG_VERSATILE_PCI) += versatile.o > common-obj-$(CONFIG_PCI_SABRE) += sabre.o > common-obj-$(CONFIG_FULONG) += bonito.o > common-obj-$(CONFIG_PCI_I440FX) += i440fx.o > -common-obj-$(CONFIG_PCI_I440FX) += xen_igd_pt.o > +common-obj-$(CONFIG_XEN_IGD_PASSTHROUGH) += xen_igd_pt.o > common-obj-$(CONFIG_PCI_EXPRESS_Q35) += q35.o > common-obj-$(CONFIG_PCI_EXPRESS_GENERIC_BRIDGE) += gpex.o > common-obj-$(CONFIG_PCI_EXPRESS_XILINX) += xilinx-pcie.o > -- > 2.21.0 > > > ___ > Xen-devel mailing list > xen-de...@lists.xenproject.org > https://lists.xenproject.org/mailman/listinfo/xen-devel