Re: [PATCH] MAINTAINERS: Update my email address

2024-04-29 Thread Durrant, Paul

On 29/04/2024 16:49, Anthony PERARD wrote:

From: Anthony PERARD 

Signed-off-by: Anthony PERARD 
---
  MAINTAINERS | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)



Acked-by: Paul Durrant 




Re: [PATCH] xen-hvm: Avoid livelock while handling buffered ioreqs

2024-04-06 Thread Durrant, Paul

On 04/04/2024 15:08, Ross Lagerwall wrote:

A malicious or buggy guest may generated buffered ioreqs faster than
QEMU can process them in handle_buffered_iopage(). The result is a
livelock - QEMU continuously processes ioreqs on the main thread without
iterating through the main loop which prevents handling other events,
processing timers, etc. Without QEMU handling other events, it often
results in the guest becoming unsable and makes it difficult to stop the
source of buffered ioreqs.

To avoid this, if we process a full page of buffered ioreqs, stop and
reschedule an immediate timer to continue processing them. This lets
QEMU go back to the main loop and catch up.



Do PV backends potentially cause the same scheduling issue (if not using 
io threads)?



Signed-off-by: Ross Lagerwall 
---
  hw/xen/xen-hvm-common.c | 26 +-
  1 file changed, 17 insertions(+), 9 deletions(-)



Reviewed-by: Paul Durrant 




Re: [PATCH v3 13/28] hw/xen: automatically assign device index to block devices

2023-10-27 Thread Durrant, Paul

On 27/10/2023 11:25, David Woodhouse wrote:

On Fri, 2023-10-27 at 10:01 +0100, Durrant, Paul wrote:


This code is allocating a name automatically so I think the onus is on
it not create a needless clash which is likely to have unpredictable
results depending on what the guest is. Just avoid any aliasing in the
first place and things will be fine.



Yeah, fair enough. In which case I'll probably switch to using
xs_directory() and then processing those results to find a free slot,
instead of going out to XenStore for every existence check.

This isn't exactly fast path and I'm prepared to tolerate a little bit
of O(n²), but only within reason :)


Yes, doing an xs_directory() and then using the code 
xen_block_get_vdev() to populate a list of existent disks will be neater.





Re: [PATCH v3 27/28] hw/xen: use qemu_create_nic_bus_devices() to instantiate Xen NICs

2023-10-27 Thread Durrant, Paul

On 25/10/2023 15:50, David Woodhouse wrote:

From: David Woodhouse 

When instantiating XenBus itself, for each NIC which is configured with
either the model unspecified, or set to to "xen" or "xen-net-device",
create a corresponding xen-net-device for it.

Now we can launch emulated Xen guests with '-nic user', and this fixes
the setup for Xen PV guests, which was previously broken in various
ways and never actually managed to peer with the netdev.

Signed-off-by: David Woodhouse 
---
  hw/xen/xen-bus.c|  4 
  hw/xen/xen_devconfig.c  | 25 -
  hw/xenpv/xen_machine_pv.c   |  9 -
  include/hw/xen/xen-legacy-backend.h |  1 -
  4 files changed, 4 insertions(+), 35 deletions(-)



Yay! I've been wanting this for years but ETIME.

Reviewed-by: Paul Durrant 





Re: [PATCH v3 26/28] hw/i386/pc: use qemu_get_nic_info() and pci_init_nic_devices()

2023-10-27 Thread Durrant, Paul

On 25/10/2023 15:50, David Woodhouse wrote:

From: David Woodhouse 

Eliminate direct access to nd_table[] and nb_nics by processing the the
ISA NICs first and then calling pci_init_nic_devices() for the test.

It's important to do this *before* the subsequent patch which registers
the Xen PV network devices, because the code being remove here didn't
check whether nd->instantiated was already set before using each entry.

Signed-off-by: David Woodhouse 
---
  hw/i386/pc.c| 21 +++--
  include/hw/net/ne2000-isa.h |  2 --
  2 files changed, 11 insertions(+), 12 deletions(-)



Reviewed-by: Paul Durrant 




Re: [PATCH v3 25/28] hw/pci: add pci_init_nic_devices(), pci_init_nic_in_slot()

2023-10-27 Thread Durrant, Paul

On 25/10/2023 15:50, David Woodhouse wrote:

From: David Woodhouse 

The loop over nd_table[] to add PCI NICs is repeated in quite a few
places. Add a helper function to do it.

Some platforms also try to instantiate a specific model in a specific
slot, to match the real hardware. Add pci_init_nic_in_slot() for that
purpose.

Signed-off-by: David Woodhouse 
---
  hw/pci/pci.c | 45 
  include/hw/pci/pci.h |  4 +++-
  2 files changed, 48 insertions(+), 1 deletion(-)



Reviewed-by: Paul Durrant 




Re: [PATCH v3 24/28] net: add qemu_create_nic_bus_devices()

2023-10-27 Thread Durrant, Paul

On 25/10/2023 15:50, David Woodhouse wrote:

From: David Woodhouse 

This will instantiate any NICs which live on a given bus type. Each bus
is allowed *one* substitution (for PCI it's virtio → virtio-net-pci, for
Xen it's xen → xen-net-device; no point in overengineering it unless we
actually want more).

Signed-off-by: David Woodhouse 
---
  include/net/net.h |  3 +++
  net/net.c | 53 +++
  2 files changed, 56 insertions(+)



Reviewed-by: Paul Durrant 




Re: [PATCH v3 23/28] net: report list of available models according to platform

2023-10-27 Thread Durrant, Paul

On 25/10/2023 15:50, David Woodhouse wrote:

From: David Woodhouse 

By noting the models for which a configuration was requested, we can give
the user an accurate list of which NIC models were actually available on
the platform/configuration that was otherwise chosen.

Signed-off-by: David Woodhouse 
---
  net/net.c | 94 +++
  1 file changed, 94 insertions(+)



Reviewed-by: Paul Durrant 




Re: [PATCH v3 22/28] net: add qemu_{configure,create}_nic_device(), qemu_find_nic_info()

2023-10-27 Thread Durrant, Paul

On 25/10/2023 15:50, David Woodhouse wrote:

From: David Woodhouse 

Most code which directly accesses nd_table[] and nb_nics uses them for
one of two things. Either "I have created a NIC device and I'd like a
configuration for it", or "I will create a NIC device *if* there is a
configuration for it".  With some variants on the theme around whether
they actually *check* if the model specified in the configuration is
the right one.

Provide functions which perform both of those, allowing platforms to
be a little more consistent and as a step towards making nd_table[]
and nb_nics private to the net code.

Also export the qemu_find_nic_info() helper, as some platforms have
special cases they need to handle.

Signed-off-by: David Woodhouse 
---
  include/net/net.h |  7 ++-
  net/net.c | 51 +++
  2 files changed, 57 insertions(+), 1 deletion(-)



Reviewed-by: Paul Durrant 




Re: [PATCH v3 21/28] xen-platform: unplug AHCI disks

2023-10-27 Thread Durrant, Paul

On 25/10/2023 15:50, David Woodhouse wrote:

From: David Woodhouse 

To support Xen guests using the Q35 chipset, the unplug protocol needs
to also remove AHCI disks.

Make pci_xen_ide_unplug() more generic, iterating over the children
of the PCI device and destroying the "ide-hd" devices. That works the
same for both AHCI and IDE, as does the detection of the primary disk
as unit 0 on the bus named "ide.0".

Then pci_xen_ide_unplug() can be used for both AHCI and IDE devices.

Signed-off-by: David Woodhouse 
---
  hw/i386/xen/xen_platform.c | 68 +-
  1 file changed, 45 insertions(+), 23 deletions(-)



Reviewed-by: Paul Durrant 




Re: [PATCH v3 13/28] hw/xen: automatically assign device index to block devices

2023-10-27 Thread Durrant, Paul

On 27/10/2023 09:45, David Woodhouse wrote:

On Fri, 2023-10-27 at 08:30 +0100, Durrant, Paul wrote:



+    if (blockdev->props.vdev.type == XEN_BLOCK_VDEV_TYPE_INVALID) {
+    XenBus *xenbus = XEN_BUS(qdev_get_parent_bus(DEVICE(xendev)));
+    char fe_path[XENSTORE_ABS_PATH_MAX + 1];
+    char *value;
+    int disk = 0;
+    unsigned long idx;
+
+    /* Find an unoccupied device name */


Not sure this is going to work is it? What happens if 'hda' or 'sda', or
'd0' exists? I think you need to use the core of the code in
xen_block_set_vdev() to generate names and search all possible encodings
for each disk.


Do we care? You're allowed to have *all* of "hda", "sda" and "xvda" at
the same time. If a user explicitly provides "sda" and then provides
another disk without giving it a name, we're allowed to use "xvda".



Maybe sda and xvda can co-exist, but

https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/man/xen-vbd-interface.7.pandoc;h=ba0d159dfa7eaf359922583ccd6d2b413acddb13;hb=HEAD#l125

says that you'll likely run into trouble if hda exists and you happen to 
create xvda.



Hell, you can also have *separate* backing stores provided as "hda1",
"sda1" and "xvda1". I *might* have tolerated a heckle that this
function should check for at least the latter of those, but when I was
first coding it up I was more inclined to argue "Don't Do That Then".


This code is allocating a name automatically so I think the onus is on 
it not create a needless clash which is likely to have unpredictable 
results depending on what the guest is. Just avoid any aliasing in the 
first place and things will be fine.


  Paul




Re: [PATCH v3 20/28] net: do not delete nics in net_cleanup()

2023-10-27 Thread Durrant, Paul

On 25/10/2023 15:50, David Woodhouse wrote:

From: David Woodhouse 

In net_cleanup() we only need to delete the netdevs, as those may have
state which outlives Qemu when it exits, and thus may actually need to
be cleaned up on exit.

The nics, on the other hand, are owned by the device which created them.
Most devices don't bother to clean up on exit because they don't have
any state which will outlive Qemu... but XenBus devices do need to clean
up their nodes in XenStore, and do have an exit handler to delete them.

When the XenBus exit handler destroys the xen-net-device, it attempts
to delete its nic after net_cleanup() had already done so. And crashes.

Fix this by only deleting netdevs as we walk the list. As the comment
notes, we can't use QTAILQ_FOREACH_SAFE() as each deletion may remove
*multiple* entries, including the "safely" saved 'next' pointer. But
we can store the *previous* entry, since nics are safe.

Signed-off-by: David Woodhouse 
---
  net/net.c | 28 ++--
  1 file changed, 22 insertions(+), 6 deletions(-)



Reviewed-by: Paul Durrant 




Re: [PATCH v3 19/28] hw/xen: update Xen PV NIC to XenDevice model

2023-10-27 Thread Durrant, Paul

On 25/10/2023 15:50, David Woodhouse wrote:

From: David Woodhouse 

This allows us to use Xen PV networking with emulated Xen guests, and to
add them on the command line or hotplug.

Signed-off-by: David Woodhouse 
---
  hw/net/meson.build|   2 +-
  hw/net/trace-events   |  11 +
  hw/net/xen_nic.c  | 484 +-
  hw/xenpv/xen_machine_pv.c |   1 -
  4 files changed, 381 insertions(+), 117 deletions(-)

diff --git a/hw/net/meson.build b/hw/net/meson.build
index 2632634df3..f64651c467 100644
--- a/hw/net/meson.build
+++ b/hw/net/meson.build
@@ -1,5 +1,5 @@
  system_ss.add(when: 'CONFIG_DP8393X', if_true: files('dp8393x.c'))
-system_ss.add(when: 'CONFIG_XEN', if_true: files('xen_nic.c'))
+system_ss.add(when: 'CONFIG_XEN_BUS', if_true: files('xen_nic.c'))
  system_ss.add(when: 'CONFIG_NE2000_COMMON', if_true: files('ne2000.c'))
  
  # PCI network cards

diff --git a/hw/net/trace-events b/hw/net/trace-events
index 3abfd65e5b..3097742cc0 100644
--- a/hw/net/trace-events
+++ b/hw/net/trace-events
@@ -482,3 +482,14 @@ dp8393x_receive_oversize(int size) "oversize packet, pkt_size 
is %d"
  dp8393x_receive_not_netcard(void) "packet not for netcard"
  dp8393x_receive_packet(int crba) "Receive packet at 0x%"PRIx32
  dp8393x_receive_write_status(int crba) "Write status at 0x%"PRIx32
+
+# xen_nic.c
+xen_netdev_realize(int dev, const char *info, const char *peer) "vif%u info '%s' 
peer '%s'"
+xen_netdev_unrealize(int dev) "vif%u"
+xen_netdev_create(int dev) "vif%u"
+xen_netdev_destroy(int dev) "vif%u"
+xen_netdev_disconnect(int dev) "vif%u"
+xen_netdev_connect(int dev, unsigned int tx, unsigned int rx, int port) "vif%u tx 
%u rx %u port %u"
+xen_netdev_frontend_changed(const char *dev, int state) "vif%s state %d"
+xen_netdev_tx(int dev, int ref, int off, int len, unsigned int flags, const char *c, 
const char *d, const char *m, const char *e) "vif%u ref %u off %u len %u flags 
0x%x%s%s%s%s"
+xen_netdev_rx(int dev, int idx, int status, int flags) "vif%u idx %d status %d 
flags 0x%x"
diff --git a/hw/net/xen_nic.c b/hw/net/xen_nic.c
index 9bbf6599fc..af4ba3f1e6 100644
--- a/hw/net/xen_nic.c
+++ b/hw/net/xen_nic.c
@@ -20,6 +20,13 @@
   */
  
  #include "qemu/osdep.h"

+#include "qemu/main-loop.h"
+#include "qemu/cutils.h"
+#include "qemu/log.h"
+#include "qemu/qemu-print.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/error.h"
+
  #include 
  #include 
  #include 
@@ -27,18 +34,26 @@
  #include "net/net.h"
  #include "net/checksum.h"
  #include "net/util.h"
-#include "hw/xen/xen-legacy-backend.h"
+
+#include "hw/xen/xen-backend.h"
+#include "hw/xen/xen-bus-helper.h"
+#include "hw/qdev-properties.h"
+#include "hw/qdev-properties-system.h"
  
  #include "hw/xen/interface/io/netif.h"

+#include "hw/xen/interface/io/xs_wire.h"
+
+#include "trace.h"
  
  /* - */
  
  struct XenNetDev {

-struct XenLegacyDevice  xendev;  /* must be first */
-char  *mac;
+struct XenDevice  xendev;  /* must be first */
+XenEventChannel   *event_channel;
+int   dev;
  int   tx_work;
-int   tx_ring_ref;
-int   rx_ring_ref;
+unsigned int  tx_ring_ref;
+unsigned int  rx_ring_ref;
  struct netif_tx_sring *txs;
  struct netif_rx_sring *rxs;
  netif_tx_back_ring_t  tx_ring;
@@ -47,6 +62,11 @@ struct XenNetDev {
  NICState  *nic;
  };
  
+typedef struct XenNetDev XenNetDev;

+
+#define TYPE_XEN_NET_DEVICE "xen-net-device"
+OBJECT_DECLARE_SIMPLE_TYPE(XenNetDev, XEN_NET_DEVICE)
+
  /* - */
  
  static void net_tx_response(struct XenNetDev *netdev, netif_tx_request_t *txp, int8_t st)

@@ -68,7 +88,8 @@ static void net_tx_response(struct XenNetDev *netdev, 
netif_tx_request_t *txp, i
  netdev->tx_ring.rsp_prod_pvt = ++i;
  RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(>tx_ring, notify);
  if (notify) {
-xen_pv_send_notify(>xendev);
+xen_device_notify_event_channel(XEN_DEVICE(netdev),
+netdev->event_channel, NULL);
  }
  
  if (i == netdev->tx_ring.req_cons) {

@@ -104,13 +125,16 @@ static void net_tx_error(struct XenNetDev *netdev, 
netif_tx_request_t *txp, RING
  #endif
  }
  
-static void net_tx_packets(struct XenNetDev *netdev)

+static bool net_tx_packets(struct XenNetDev *netdev)
  {
+bool done_something = false;
  netif_tx_request_t txreq;
  RING_IDX rc, rp;
  void *page;
  void *tmpbuf = NULL;
  
+assert(qemu_mutex_iothread_locked());

+
  for (;;) {
  rc = netdev->tx_ring.req_cons;
  rp = netdev->tx_ring.sring->req_prod;
@@ -122,49 +146,52 @@ static void net_tx_packets(struct XenNetDev *netdev)
  }
  memcpy(, RING_GET_REQUEST(>tx_ring, rc), 
sizeof(txreq));
  

Re: [PATCH v3 18/28] hw/xen: only remove peers of PCI NICs on unplug

2023-10-27 Thread Durrant, Paul

On 25/10/2023 15:50, David Woodhouse wrote:

From: David Woodhouse 

When the Xen guest asks to unplug *emulated* NICs, it's kind of unhelpful
also to unplug the peer of the *Xen* PV NIC.

Signed-off-by: David Woodhouse 
---
  hw/i386/xen/xen_platform.c | 9 +++--
  1 file changed, 7 insertions(+), 2 deletions(-)



Reviewed-by: Paul Durrant 



Re: [PATCH v3 17/28] hw/xen: add support for Xen primary console in emulated mode

2023-10-27 Thread Durrant, Paul

On 25/10/2023 15:50, David Woodhouse wrote:

From: David Woodhouse 

The primary console is special because the toolstack maps a page into
the guest for its ring, and also allocates the guest-side event channel.
The guest's grant table is even primed to export that page using a known
grant ref#. Add support for all that in emulated mode, so that we can
have a primary console.

For reasons unclear, the backends running under real Xen don't just use
a mapping of the well-known GNTTAB_RESERVED_CONSOLE grant ref (which
would also be in the ring-ref node in XenStore). Instead, the toolstack
sets the ring-ref node of the primary console to the GFN of the guest
page. The backend is expected to handle that special case and map it
with foreignmem operations instead.

We don't have an implementation of foreignmem ops for emulated Xen mode,
so just make it map GNTTAB_RESERVED_CONSOLE instead. This would probably
work for real Xen too, but we can't work out how to make real Xen create
a primary console of type "ioemu" to make QEMU drive it, so we can't
test that; might as well leave it as it is for now under Xen.

Now at last we can boot the Xen PV shim and run PV kernels in QEMU.

Signed-off-by: David Woodhouse 
---
  hw/char/xen_console.c |  78 
  hw/i386/kvm/meson.build   |   1 +
  hw/i386/kvm/trace-events  |   2 +
  hw/i386/kvm/xen-stubs.c   |   8 ++
  hw/i386/kvm/xen_gnttab.c  |   7 +-
  hw/i386/kvm/xen_primary_console.c | 193 ++
  hw/i386/kvm/xen_primary_console.h |  23 
  hw/i386/kvm/xen_xenstore.c|  10 ++
  hw/xen/xen-bus.c  |   5 +
  include/hw/xen/xen-bus.h  |   1 +
  target/i386/kvm/xen-emu.c |  23 +++-
  11 files changed, 328 insertions(+), 23 deletions(-)
  create mode 100644 hw/i386/kvm/xen_primary_console.c
  create mode 100644 hw/i386/kvm/xen_primary_console.h



Reviewed-by: Paul Durrant 





Re: [PATCH v3 14/28] hw/xen: add get_frontend_path() method to XenDeviceClass

2023-10-27 Thread Durrant, Paul

On 25/10/2023 15:50, David Woodhouse wrote:

From: David Woodhouse 

The primary Xen console is special. The guest's side is set up for it by
the toolstack automatically and not by the standard PV init sequence.

Accordingly, its *frontend* doesn't appear in …/device/console/0 either;
instead it appears under …/console in the guest's XenStore node.

To allow the Xen console driver to override the frontend path for the
primary console, add a method to the XenDeviceClass which can be used
instead of the standard xen_device_get_frontend_path()

Signed-off-by: David Woodhouse 
---
  hw/xen/xen-bus.c | 11 ++-
  include/hw/xen/xen-bus.h |  2 ++
  2 files changed, 12 insertions(+), 1 deletion(-)



Reviewed-by: Paul Durrant 




Re: [PATCH v3 13/28] hw/xen: automatically assign device index to block devices

2023-10-27 Thread Durrant, Paul

On 25/10/2023 15:50, David Woodhouse wrote:

From: David Woodhouse 

There's no need to force the user to assign a vdev. We can automatically
assign one, starting at xvda and searching until we find the first disk
name that's unused.

This means we can now allow '-drive if=xen,file=xxx' to work without an
explicit separate -driver argument, just like if=virtio.

Rip out the legacy handling from the xenpv machine, which was scribbling
over any disks configured by the toolstack, and didn't work with anything
but raw images.

Signed-off-by: David Woodhouse 
Acked-by: Kevin Wolf 
---
  blockdev.c  | 15 +---
  hw/block/xen-block.c| 38 +
  hw/xen/xen_devconfig.c  | 28 -
  hw/xenpv/xen_machine_pv.c   |  9 ---
  include/hw/xen/xen-legacy-backend.h |  1 -
  5 files changed, 50 insertions(+), 41 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index a01c62596b..5d9f2e5395 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -255,13 +255,13 @@ void drive_check_orphaned(void)
   * Ignore default drives, because we create certain default
   * drives unconditionally, then leave them unclaimed.  Not the
   * users fault.
- * Ignore IF_VIRTIO, because it gets desugared into -device,
- * so we can leave failing to -device.
+ * Ignore IF_VIRTIO or IF_XEN, because it gets desugared into
+ * -device, so we can leave failing to -device.
   * Ignore IF_NONE, because leaving unclaimed IF_NONE remains
   * available for device_add is a feature.
   */
  if (dinfo->is_default || dinfo->type == IF_VIRTIO
-|| dinfo->type == IF_NONE) {
+|| dinfo->type == IF_XEN || dinfo->type == IF_NONE) {
  continue;
  }
  if (!blk_get_attached_dev(blk)) {
@@ -977,6 +977,15 @@ DriveInfo *drive_new(QemuOpts *all_opts, 
BlockInterfaceType block_default_type,
  qemu_opt_set(devopts, "driver", "virtio-blk", _abort);
  qemu_opt_set(devopts, "drive", qdict_get_str(bs_opts, "id"),
   _abort);
+} else if (type == IF_XEN) {
+QemuOpts *devopts;
+devopts = qemu_opts_create(qemu_find_opts("device"), NULL, 0,
+   _abort);
+qemu_opt_set(devopts, "driver",
+ (media == MEDIA_CDROM) ? "xen-cdrom" : "xen-disk",
+ _abort);
+qemu_opt_set(devopts, "drive", qdict_get_str(bs_opts, "id"),
+ _abort);
  }
  
  filename = qemu_opt_get(legacy_opts, "file");

diff --git a/hw/block/xen-block.c b/hw/block/xen-block.c
index 64470fc579..5011fe9430 100644
--- a/hw/block/xen-block.c
+++ b/hw/block/xen-block.c
@@ -27,6 +27,7 @@
  #include "sysemu/block-backend.h"
  #include "sysemu/iothread.h"
  #include "dataplane/xen-block.h"
+#include "hw/xen/interface/io/xs_wire.h"
  #include "trace.h"
  
  static char *xen_block_get_name(XenDevice *xendev, Error **errp)

@@ -34,6 +35,43 @@ static char *xen_block_get_name(XenDevice *xendev, Error 
**errp)
  XenBlockDevice *blockdev = XEN_BLOCK_DEVICE(xendev);
  XenBlockVdev *vdev = >props.vdev;
  
+if (blockdev->props.vdev.type == XEN_BLOCK_VDEV_TYPE_INVALID) {

+XenBus *xenbus = XEN_BUS(qdev_get_parent_bus(DEVICE(xendev)));
+char fe_path[XENSTORE_ABS_PATH_MAX + 1];
+char *value;
+int disk = 0;
+unsigned long idx;
+
+/* Find an unoccupied device name */


Not sure this is going to work is it? What happens if 'hda' or 'sda', or 
'd0' exists? I think you need to use the core of the code in 
xen_block_set_vdev() to generate names and search all possible encodings 
for each disk.


  Paul


+while (disk < (1 << 20)) {
+if (disk < (1 << 4)) {
+idx = (202 << 8) | (disk << 4);
+} else {
+idx = (1 << 28) | (disk << 8);
+}
+snprintf(fe_path, sizeof(fe_path),
+ "/local/domain/%u/device/vbd/%lu",
+ xendev->frontend_id, idx);
+value = qemu_xen_xs_read(xenbus->xsh, XBT_NULL, fe_path, NULL);
+if (!value) {
+if (errno == ENOENT) {
+vdev->type = XEN_BLOCK_VDEV_TYPE_XVD;
+vdev->partition = 0;
+vdev->disk = disk;
+vdev->number = idx;
+goto found;
+}
+error_setg(errp, "cannot read %s: %s", fe_path,
+   strerror(errno));
+return NULL;
+}
+free(value);
+disk++;
+}
+error_setg(errp, "cannot find device vdev for block device");
+return NULL;
+}
+ found:
  return g_strdup_printf("%lu", vdev->number);
  }
  
diff --git a/hw/xen/xen_devconfig.c b/hw/xen/xen_devconfig.c

index 

Re: [PATCH v3 08/28] i386/xen: Ignore VCPU_SSHOTTMR_future flag in set_singleshot_timer()

2023-10-27 Thread Durrant, Paul

On 25/10/2023 15:50, David Woodhouse wrote:

From: David Woodhouse 

Upstream Xen now ignores this flag¹, since the only guest kernel ever to
use it was buggy.

¹ https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=19c6cbd909

Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
  target/i386/kvm/xen-emu.c | 20 ++--
  1 file changed, 10 insertions(+), 10 deletions(-)



Reviewed-by: Paul Durrant 




Re: [PATCH v3 07/28] hw/xen: use correct default protocol for xen-block on x86

2023-10-27 Thread Durrant, Paul

On 25/10/2023 15:50, David Woodhouse wrote:

From: David Woodhouse 

Even on x86_64 the default protocol is the x86-32 one if the guest doesn't
specifically ask for x86-64.

Fixes: b6af8926fb85 ("xen: add implementations of xen-block connect and disconnect 
functions...")
Signed-off-by: David Woodhouse 
---
  hw/block/xen-block.c | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)



Reviewed-by: Paul Durrant 




Re: [PATCH v3 06/28] hw/xen: take iothread mutex in xen_evtchn_reset_op()

2023-10-27 Thread Durrant, Paul

On 25/10/2023 15:50, David Woodhouse wrote:

From: David Woodhouse 

The xen_evtchn_soft_reset() function requires the iothread mutex, but is
also called for the EVTCHNOP_reset hypercall. Ensure the mutex is taken
in that case.

Fixes: a15b10978fe6 ("hw/xen: Implement EVTCHNOP_reset")
Signed-off-by: David Woodhouse 
---
  hw/i386/kvm/xen_evtchn.c | 1 +
  1 file changed, 1 insertion(+)



Reviewed-by: Paul Durrant 




Re: [PATCH v3 05/28] hw/xen: fix XenStore watch delivery to guest

2023-10-27 Thread Durrant, Paul

On 25/10/2023 15:50, David Woodhouse wrote:

From: David Woodhouse 

When fire_watch_cb() found the response buffer empty, it would call
deliver_watch() to generate the XS_WATCH_EVENT message in the response
buffer and send an event channel notification to the guest… without
actually *copying* the response buffer into the ring. So there was
nothing for the guest to see. The pending response didn't actually get
processed into the ring until the guest next triggered some activity
from its side.

Add the missing call to put_rsp().

It might have been slightly nicer to call xen_xenstore_event() here,
which would *almost* have worked. Except for the fact that it calls
xen_be_evtchn_pending() to check that it really does have an event
pending (and clear the eventfd for next time). And under Xen it's
defined that setting that fd to O_NONBLOCK isn't guaranteed to work,
so the emu implementation follows suit.

This fixes Xen device hot-unplug.

Fixes: 0254c4d19df ("hw/xen: Add xenstore wire implementation and implementation 
stubs")
Signed-off-by: David Woodhouse 
---
  hw/i386/kvm/xen_xenstore.c | 8 +---
  1 file changed, 5 insertions(+), 3 deletions(-)



Reviewed-by: Paul Durrant 




RE: aio_set_event_notifier(is_external=true) in Xen code?

2023-03-28 Thread Durrant, Paul
> -Original Message-
> From: Stefan Hajnoczi 
> Sent: 28 March 2023 16:51
> To: Woodhouse, David ; Durrant, Paul
> 
> Cc: qemu-devel@nongnu.org; qemu-bl...@nongnu.org
> Subject: [EXTERNAL] aio_set_event_notifier(is_external=true) in Xen code?
> 
> Hi,
> I'm removing the aio_disable_external() API from QEMU and noticed that
> Xen code calls aio_set_fd_handler(is_external=true) in hw/xen/xen-bus.c
> and hw/i386/kvm/xen_xenstore.c.
> 
> It wasn't clear to me whether is_external=true is necessary here.
> is_external=true is mainly used to temporarily pause I/O submission in
> the QEMU block layer. Maybe is_external=true was chosen out of caution
> but actually has no effect in this code.
> 
> Does the Xen code rely on is_external=true?


That's a good question. The call in xen-bus.c has been there since commit 
83361a8a1f932, which was when it substituted the old call to 
qemu_set_fd_handler(). I suspect this was out of caution (or possibly 
misunderstanding) at the time, although setting the call to 
xen_device_set_event_channel_context() in xen_block_dataplane_stop() does 
suggest it may be happening while I/O could be in progress so it could have 
been in response to problems caught in testing.
I suspect the code in xen_xenstore.c just copied what xen-bus.c did.

Sorry I can't give you a definitive answer... it's all rather a long time ago.

  Cheers,

Paul



RE: [PATCH v2 28/32] contrib/gitdm: add Amazon to the domain map

2023-03-15 Thread Durrant, Paul
> -Original Message-
> From: Alex Bennée 
> Sent: 15 March 2023 17:43
> To: qemu-devel@nongnu.org
> Cc: Akihiko Odaki ; Marc-André Lureau
> ; qemu-ri...@nongnu.org; Riku Voipio
> ; Igor Mammedov ; Xiao Guangrong
> ; Thomas Huth ; Wainer dos
> Santos Moschetta ; Dr. David Alan Gilbert
> ; Alex Williamson ; Hao
> Wu ; Cleber Rosa ; Daniel Henrique
> Barboza ; Jan Kiszka ; Aurelien
> Jarno ; qemu-...@nongnu.org; Marcelo Tosatti
> ; Eduardo Habkost ; Alexandre
> Iooss ; Gerd Hoffmann ; Palmer
> Dabbelt ; Ilya Leoshkevich ; qemu-
> p...@nongnu.org; Juan Quintela ; Cédric Le Goater
> ; Darren Kenny ;
> k...@vger.kernel.org; Marcel Apfelbaum ; Peter
> Maydell ; Richard Henderson
> ; Stafford Horne ; Weiwei
> Li ; Sunil V L ; Stefan
> Hajnoczi ; Thomas Huth ; Vijai
> Kumar K ; Liu Zhiwei
> ; David Gibson
> ; Song Gao ; Paolo
> Bonzini ; Michael S. Tsirkin ; Niek
> Linnenbank ; Greg Kurz ; Laurent
> Vivier ; Qiuhao Li ; Philippe
> Mathieu-Daudé ; Xiaojuan Yang
> ; Mahmoud Mandour ;
> Alexander Bulekov ; Jiaxun Yang ;
> qemu-bl...@nongnu.org; Yanan Wang ; David
> Woodhouse ; qemu-s3...@nongnu.org; Strahinja Jankovic
> ; Bandan Das ; Alistair
> Francis ; Aleksandar Rikalo
> ; Tyrone Ting ; Kevin
> Wolf ; David Hildenbrand ; Beraldo
> Leal ; Beniamino Galvani ; Paul
> Durrant ; Bin Meng ; Sunil
> Muthuswamy ; Hanna Reitz ;
> Peter Xu ; Alex Bennée ; Graf
> (AWS), Alexander ; Durrant, Paul ;
> Woodhouse, David 
> Subject: [EXTERNAL] [PATCH v2 28/32] contrib/gitdm: add Amazon to the
> domain map
> 
> CAUTION: This email originated from outside of the organization. Do not
> click links or open attachments unless you can confirm the sender and know
> the content is safe.
> 
> 
> 
> We have multiple contributors from both .co.uk and .com versions of
> the address.
> 
> Signed-off-by: Alex Bennée 
> Cc: Alexander Graf 
> Cc: Paul Durrant 
> Cc: David Wooodhouse 
> Reviewed-by: Philippe Mathieu-Daudé 
> Message-Id: <20230310180332.2274827-7-alex.ben...@linaro.org>
> ---
>  contrib/gitdm/domain-map | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/contrib/gitdm/domain-map b/contrib/gitdm/domain-map
> index 4a988c5b5f..8dce276a1c 100644
> --- a/contrib/gitdm/domain-map
> +++ b/contrib/gitdm/domain-map
> @@ -4,6 +4,8 @@
>  # This maps email domains to nice easy to read company names
>  #
> 
> +amazon.com  Amazon
> +amazon.co.ukAmazon

You might want 'amazon.de' too but as far as it goes...

Reviewed-by: Paul Durrant 

>  amd.com AMD
>  aspeedtech.com  ASPEED Technology Inc.
>  baidu.com   Baidu
> --
> 2.39.2



Re: [PATCH 2/2] hw/i386/xen/xen-hvm: Inline xen_piix_pci_write_config_client() and remove it

2022-06-27 Thread Durrant, Paul

On 26/06/2022 10:46, Bernhard Beschow wrote:

xen_piix_pci_write_config_client() is implemented in the xen sub tree and
uses PIIX constants internally, thus creating a direct dependency on
PIIX. Now that xen_set_pci_link_route() is stubbable, the logic of
xen_piix_pci_write_config_client() can be moved to PIIX which resolves
the dependency.

Signed-off-by: Bernhard Beschow 


Reviewed-by: Paul Durrant 



Re: [PATCH 1/2] hw/i386/xen/xen-hvm: Allow for stubbing xen_set_pci_link_route()

2022-06-27 Thread Durrant, Paul

On 26/06/2022 10:46, Bernhard Beschow wrote:

The only user of xen_set_pci_link_route() is
xen_piix_pci_write_config_client() which implements PIIX-specific logic in
the xen namespace. This makes xen-hvm depend on PIIX which could be
avoided if xen_piix_pci_write_config_client() was implemented in PIIX. In
order to do this, xen_set_pci_link_route() needs to be stubbable which
this patch addresses.

Signed-off-by: Bernhard Beschow 


Reviewed-by: Paul Durrant 



Re: [PATCH] block: get rid of blk->guest_block_size

2022-05-18 Thread Durrant, Paul

On 18/05/2022 14:09, Stefan Hajnoczi wrote:

Commit 1b7fd729559c ("block: rename buffer_alignment to
guest_block_size") noted:

   At this point, the field is set by the device emulation, but completely
   ignored by the block layer.

The last time the value of buffer_alignment/guest_block_size was
actually used was before commit 339064d50639 ("block: Don't use guest
sector size for qemu_blockalign()").

This value has not been used since 2013. Get rid of it.

Cc: Xie Yongji 
Signed-off-by: Stefan Hajnoczi 


Xen part...

Reviewed-by: Paul Durrant 



Re: [PATCH 3/3] include/hw/ide: Unexport pci_piix3_xen_ide_unplug()

2022-05-09 Thread Durrant, Paul

On 08/05/2022 11:34, Bernhard Beschow wrote:

This function was declared in a generic and public header, implemented
in a device-specific source file but only used in xen_platform. Given its
'aux' parameter, this function is more xen-specific than piix-specific.
Also, the hardcoded magic constants seem to be generic and related to
PCIIDEState and IDEBus rather than piix.

Therefore, move this function to xen_platform, unexport it, and drop the
"piix3" in the function name as well.

Signed-off-by: Bernhard Beschow 


Reviewed-by: Paul Durrant 

... with one suggestion...


---
  hw/i386/xen/xen_platform.c | 49 +-
  hw/ide/piix.c  | 46 ---
  include/hw/ide.h   |  3 ---
  3 files changed, 48 insertions(+), 50 deletions(-)

diff --git a/hw/i386/xen/xen_platform.c b/hw/i386/xen/xen_platform.c
index 72028449ba..124ffeae35 100644
--- a/hw/i386/xen/xen_platform.c
+++ b/hw/i386/xen/xen_platform.c
@@ -26,6 +26,7 @@
  #include "qemu/osdep.h"
  #include "qapi/error.h"
  #include "hw/ide.h"
+#include "hw/ide/pci.h"
  #include "hw/pci/pci.h"
  #include "hw/xen/xen_common.h"
  #include "migration/vmstate.h"
@@ -134,6 +135,52 @@ static void pci_unplug_nics(PCIBus *bus)
  pci_for_each_device(bus, 0, unplug_nic, NULL);
  }
  
+/*

+ * The Xen HVM unplug protocol [1] specifies a mechanism to allow guests to
+ * request unplug of 'aux' disks (which is stated to mean all IDE disks,
+ * except the primary master).
+ *
+ * NOTE: The semantics of what happens if unplug of all disks and 'aux' disks
+ *   is simultaneously requested is not clear. The implementation assumes
+ *   that an 'all' request overrides an 'aux' request.
+ *
+ * [1] 
https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/misc/hvm-emulated-unplug.pandoc
+ */
+static int pci_xen_ide_unplug(DeviceState *dev, bool aux)
+{
+PCIIDEState *pci_ide;
+int i;
+IDEDevice *idedev;
+IDEBus *idebus;
+BlockBackend *blk;
+
+pci_ide = PCI_IDE(dev);
+
+for (i = aux ? 1 : 0; i < 4; i++) {
+idebus = _ide->bus[i / 2];
+blk = idebus->ifs[i % 2].blk;
+
+if (blk && idebus->ifs[i % 2].drive_kind != IDE_CD) {
+if (!(i % 2)) {
+idedev = idebus->master;
+} else {
+idedev = idebus->slave;
+}
+
+blk_drain(blk);
+blk_flush(blk);
+
+blk_detach_dev(blk, DEVICE(idedev));
+idebus->ifs[i % 2].blk = NULL;
+idedev->conf.blk = NULL;
+monitor_remove_blk(blk);
+blk_unref(blk);
+}
+}
+qdev_reset_all(dev);
+return 0;


The return value is ignored so you may as well make this a static void 
function.


  Paul


+}
+
  static void unplug_disks(PCIBus *b, PCIDevice *d, void *opaque)
  {
  uint32_t flags = *(uint32_t *)opaque;
@@ -147,7 +194,7 @@ static void unplug_disks(PCIBus *b, PCIDevice *d, void 
*opaque)
  
  switch (pci_get_word(d->config + PCI_CLASS_DEVICE)) {

  case PCI_CLASS_STORAGE_IDE:
-pci_piix3_xen_ide_unplug(DEVICE(d), aux);
+pci_xen_ide_unplug(DEVICE(d), aux);
  break;
  
  case PCI_CLASS_STORAGE_SCSI:

diff --git a/hw/ide/piix.c b/hw/ide/piix.c
index bc1b37512a..9a9b28078e 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -173,52 +173,6 @@ static void pci_piix_ide_realize(PCIDevice *dev, Error 
**errp)
  }
  }
  
-/*

- * The Xen HVM unplug protocol [1] specifies a mechanism to allow guests to
- * request unplug of 'aux' disks (which is stated to mean all IDE disks,
- * except the primary master).
- *
- * NOTE: The semantics of what happens if unplug of all disks and 'aux' disks
- *   is simultaneously requested is not clear. The implementation assumes
- *   that an 'all' request overrides an 'aux' request.
- *
- * [1] 
https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/misc/hvm-emulated-unplug.pandoc
- */
-int pci_piix3_xen_ide_unplug(DeviceState *dev, bool aux)
-{
-PCIIDEState *pci_ide;
-int i;
-IDEDevice *idedev;
-IDEBus *idebus;
-BlockBackend *blk;
-
-pci_ide = PCI_IDE(dev);
-
-for (i = aux ? 1 : 0; i < 4; i++) {
-idebus = _ide->bus[i / 2];
-blk = idebus->ifs[i % 2].blk;
-
-if (blk && idebus->ifs[i % 2].drive_kind != IDE_CD) {
-if (!(i % 2)) {
-idedev = idebus->master;
-} else {
-idedev = idebus->slave;
-}
-
-blk_drain(blk);
-blk_flush(blk);
-
-blk_detach_dev(blk, DEVICE(idedev));
-idebus->ifs[i % 2].blk = NULL;
-idedev->conf.blk = NULL;
-monitor_remove_blk(blk);
-blk_unref(blk);
-}
-}
-qdev_reset_all(dev);
-return 0;
-}
-
  static void pci_piix_ide_exitfn(PCIDevice *dev)
  {
  PCIIDEState *d = PCI_IDE(dev);
diff --git a/include/hw/ide.h b/include/hw/ide.h
index 

Re: [PATCH] xen-hvm: Allow disabling buffer_io_timer

2022-01-26 Thread Durrant, Paul

On 26/01/2022 13:43, Jason Andryuk wrote:

On Tue, Dec 14, 2021 at 8:40 AM Durrant, Paul  wrote:


On 10/12/2021 11:34, Jason Andryuk wrote:

commit f37f29d31488 "xen: slightly simplify bufioreq handling" hard
coded setting req.count = 1 during initial field setup before the main
loop.  This missed a subtlety that an early exit from the loop when
there are no ioreqs to process, would have req.count == 0 for the return
value.  handle_buffered_io() would then remove state->buffered_io_timer.
Instead handle_buffered_iopage() is basically always returning true and
handle_buffered_io() always re-setting the timer.

Restore the disabling of the timer by introducing a new handled_ioreq
boolean and use as the return value.  The named variable will more
clearly show the intent of the code.

Signed-off-by: Jason Andryuk 


Reviewed-by: Paul Durrant 


Thanks, Paul.

What is the next step for getting this into QEMU?



Anthony, can you queue this?

  Paul


To re-state more plainly, this patch fixes a bug to let QEMU go idle
for longer stretches of time.  Without it, buffer_io_timer continues
to re-arm and fire every 100ms even if there is nothing to do.

Regards,
Jason





Re: [PATCH v2] xen-mapcache: Avoid entry->lock overflow

2022-01-25 Thread Durrant, Paul

On 24/01/2022 10:44, Ross Lagerwall wrote:

In some cases, a particular mapcache entry may be mapped 256 times
causing the lock field to wrap to 0. For example, this may happen when
using emulated NVME and the guest submits a large scatter-gather write.
At this point, the entry map be remapped causing QEMU to write the wrong
data or crash (since remap is not atomic).

Avoid this overflow by increasing the lock field to a uint32_t and also
detect it and abort rather than continuing regardless.

Signed-off-by: Ross Lagerwall 


Reviewed-by: Paul Durrant 


---
Changes in v2: Change type to uint32_t since there is a hole there
anyway. The struct size remains at 48 bytes on x86_64.

  hw/i386/xen/xen-mapcache.c | 8 +++-
  1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/i386/xen/xen-mapcache.c b/hw/i386/xen/xen-mapcache.c
index bd47c3d672..f2ef977963 100644
--- a/hw/i386/xen/xen-mapcache.c
+++ b/hw/i386/xen/xen-mapcache.c
@@ -52,7 +52,7 @@ typedef struct MapCacheEntry {
  hwaddr paddr_index;
  uint8_t *vaddr_base;
  unsigned long *valid_mapping;
-uint8_t lock;
+uint32_t lock;
  #define XEN_MAPCACHE_ENTRY_DUMMY (1 << 0)
  uint8_t flags;
  hwaddr size;
@@ -355,6 +355,12 @@ tryagain:
  if (lock) {
  MapCacheRev *reventry = g_malloc0(sizeof(MapCacheRev));
  entry->lock++;
+if (entry->lock == 0) {
+fprintf(stderr,
+"mapcache entry lock overflow: "TARGET_FMT_plx" -> %p\n",
+entry->paddr_index, entry->vaddr_base);
+abort();
+}
  reventry->dma = dma;
  reventry->vaddr_req = mapcache->last_entry->vaddr_base + 
address_offset;
  reventry->paddr_index = mapcache->last_entry->paddr_index;





Re: [PATCH] xen-hvm: Allow disabling buffer_io_timer

2021-12-14 Thread Durrant, Paul

On 10/12/2021 11:34, Jason Andryuk wrote:

commit f37f29d31488 "xen: slightly simplify bufioreq handling" hard
coded setting req.count = 1 during initial field setup before the main
loop.  This missed a subtlety that an early exit from the loop when
there are no ioreqs to process, would have req.count == 0 for the return
value.  handle_buffered_io() would then remove state->buffered_io_timer.
Instead handle_buffered_iopage() is basically always returning true and
handle_buffered_io() always re-setting the timer.

Restore the disabling of the timer by introducing a new handled_ioreq
boolean and use as the return value.  The named variable will more
clearly show the intent of the code.

Signed-off-by: Jason Andryuk 


Reviewed-by: Paul Durrant 



RE: [Xen-devel] [PATCH v3 03/20] exec: Let qemu_ram_*() functions take a const pointer argument

2020-02-20 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of
> Philippe Mathieu-Daudé
> Sent: 20 February 2020 13:06
> To: Peter Maydell ; qemu-devel@nongnu.org
> Cc: Fam Zheng ; Dmitry Fleytman
> ; k...@vger.kernel.org; Michael S. Tsirkin
> ; Jason Wang ; Gerd Hoffmann
> ; Edgar E. Iglesias ; Stefano
> Stabellini ; Matthew Rosato
> ; qemu-bl...@nongnu.org; David Hildenbrand
> ; Halil Pasic ; Christian
> Borntraeger ; Hervé Poussineau
> ; Marcel Apfelbaum ;
> Anthony Perard ; xen-
> de...@lists.xenproject.org; Aleksandar Rikalo  rk.com>; Richard Henderson ; Philippe Mathieu-Daudé
> ; Laurent Vivier ; Thomas Huth
> ; Eduardo Habkost ; Stefan Weil
> ; Alistair Francis ; Richard
> Henderson ; Paul Durrant ;
> Eric Auger ; qemu-s3...@nongnu.org; qemu-
> a...@nongnu.org; Cédric Le Goater ; John Snow
> ; David Gibson ; Igor
> Mitsyanko ; Cornelia Huck ;
> Michael Walle ; qemu-...@nongnu.org; Paolo Bonzini
> 
> Subject: [Xen-devel] [PATCH v3 03/20] exec: Let qemu_ram_*() functions
> take a const pointer argument
> 
> Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: Paul Durrant 


RE: [Xen-devel] [PATCH v3 19/20] Let cpu_[physical]_memory() calls pass a boolean 'is_write' argument

2020-02-20 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of
> Philippe Mathieu-Daudé
> Sent: 20 February 2020 13:06
> To: Peter Maydell ; qemu-devel@nongnu.org
> Cc: Fam Zheng ; Dmitry Fleytman
> ; k...@vger.kernel.org; Michael S. Tsirkin
> ; Jason Wang ; Gerd Hoffmann
> ; Edgar E. Iglesias ; Stefano
> Stabellini ; Matthew Rosato
> ; qemu-bl...@nongnu.org; David Hildenbrand
> ; Halil Pasic ; Christian
> Borntraeger ; Hervé Poussineau
> ; Marcel Apfelbaum ;
> Anthony Perard ; xen-
> de...@lists.xenproject.org; Aleksandar Rikalo  rk.com>; Richard Henderson ; Philippe Mathieu-Daudé
> ; Laurent Vivier ; Thomas Huth
> ; Eduardo Habkost ; Stefan Weil
> ; Alistair Francis ; Richard
> Henderson ; Paul Durrant ;
> Eric Auger ; qemu-s3...@nongnu.org; qemu-
> a...@nongnu.org; Cédric Le Goater ; John Snow
> ; David Gibson ; Igor
> Mitsyanko ; Cornelia Huck ;
> Michael Walle ; qemu-...@nongnu.org; Paolo Bonzini
> 
> Subject: [Xen-devel] [PATCH v3 19/20] Let cpu_[physical]_memory() calls
> pass a boolean 'is_write' argument
> 
> Use an explicit boolean type.
> 
> This commit was produced with the included Coccinelle script
> scripts/coccinelle/exec_rw_const.
> 
> Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: Paul Durrant 


RE: [Xen-devel] [PATCH 6/6] xen-pt: Round pci regions sizes to XEN_PAGE_SIZE

2020-01-15 Thread Durrant, Paul
> -Original Message-
> 
> > Linux PCI subsytem has an option resource_alignment that can be
> > applied to either a single device or all devices.  Booting with
> > pci=resource_aligment=4096 will align each device to a page.  Do you
> > think pciback should force resource_alignment=4096 for dom0?
>

That sounds like a good idea.
 
> Ideally Xen should keep track of the BARs position and size and refuse
> to passthrough devices that have BARs sharing a page with other
> devices BARs.
> 
> > Are
> > there other MMIO ranges to be concerned about adjacent to BARs?
> 
> IIRC you can have two BARs of different devices in the same 4K page,
> BARs are only aligned to it's size, so BARs smaller than 4K are not
> required to be page aligned.

If we had a notion of assignment groups for this, as well as devices sharing 
requester id, then Xen would not need to refuse pass-through, it would just 
require that all devices sharing the page were passed through as a unit.

  Paul



RE: [Xen-devel] xen-block: race condition when stopping the device (WAS: Re: [xen-4.13-testing test] 144736: regressions - FAIL)

2019-12-16 Thread Durrant, Paul
> -Original Message-
> From: Durrant, Paul 
> Sent: 16 December 2019 09:50
> To: Durrant, Paul ; Julien Grall ;
> Ian Jackson 
> Cc: Jürgen Groß ; Stefano Stabellini
> ; qemu-devel@nongnu.org; osstest service owner
> ; Anthony Perard
> ; xen-de...@lists.xenproject.org
> Subject: RE: [Xen-devel] xen-block: race condition when stopping the
> device (WAS: Re: [xen-4.13-testing test] 144736: regressions - FAIL)
> 
> > -Original Message-----
> > From: Xen-devel  On Behalf Of
> > Durrant, Paul
> > Sent: 16 December 2019 09:34
> > To: Julien Grall ; Ian Jackson 
> > Cc: Jürgen Groß ; Stefano Stabellini
> > ; qemu-devel@nongnu.org; osstest service owner
> > ; Anthony Perard
> > ; xen-de...@lists.xenproject.org
> > Subject: Re: [Xen-devel] xen-block: race condition when stopping the
> > device (WAS: Re: [xen-4.13-testing test] 144736: regressions - FAIL)
> >
> > > -Original Message-
> > [snip]
> > > >>
> > > >> This feels like a race condition between the init/free code with
> > > >> handler. Anthony, does it ring any bell?
> > > >>
> > > >
> > > >  From that stack bt it looks like an iothread managed to run after
> the
> > > sring was NULLed. This should not be able happen as the dataplane
> should
> > > have been moved back onto QEMU's main thread context before the ring
> is
> > > unmapped.
> > >
> > > My knowledge of this code is fairly limited, so correct me if I am
> > wrong.
> > >
> > > blk_set_aio_context() would set the context for the block aio. AFAICT,
> > > the only aio for the block is xen_block_complete_aio().
> >
> > Not quite. xen_block_dataplane_start() calls
> > xen_device_bind_event_channel() and that will add an event channel fd
> into
> > the aio context, so the shared ring is polled by the iothread as well as
> > block i/o completion.
> >
> > >
> > > In the stack above, we are not dealing with a block aio but an aio tie
> > > to the event channel (see the call from xen_device_poll). So I don't
> > > think the blk_set_aio_context() would affect the aio.
> > >
> >
> > For the reason I outline above, it does.
> >
> > > So it would be possible to get the iothread running because we
> received
> > > a notification on the event channel while we are stopping the block
> (i.e
> > > xen_block_dataplane_stop()).
> > >
> >
> > We should assume an iothread can essentially run at any time, as it is a
> > polling entity. It should eventually block polling on fds assign to its
> > aio context but I don't think the abstraction guarantees that it cannot
> be
> > awoken for other reasons (e.g. off a timeout). However and event from
> the
> > frontend will certainly cause the evtchn fd poll to wake up.
> >
> > > If xen_block_dataplane_stop() grab the context lock first, then the
> > > iothread dealing with the event may wait on the lock until its
> released.
> > >
> > > By the time the lock is grabbed, we may have free all the resources
> > > (including srings). So the event iothread will end up to dereference a
> > > NULL pointer.
> > >
> >
> > I think the problem may actually be that xen_block_dataplane_event()
> does
> > not acquire the context and thus is not synchronized with
> > xen_block_dataplane_stop(). The documentation in multiple-iothreads.txt
> is
> > not clear whether a poll handler called by an iothread needs to acquire
> > the context though; TBH I would not have thought it necessary.
> >
> > > It feels to me we need a way to quiesce all the iothreads (blk,
> > > event,...) before continuing. But I am a bit unsure how to do this in
> > > QEMU.
> > >
> >
> > Looking at virtio-blk.c I see that it does seem to close off its evtchn
> > equivalent from iothread context via aio_wait_bh_oneshot(). So I wonder
> > whether the 'right' thing to do is to call
> > xen_device_unbind_event_channel() using the same mechanism to ensure
> > xen_block_dataplane_event() can't race.
> 
> Digging around the virtio-blk history I see:
> 
> commit 1010cadf62332017648abee0d7a3dc7f2eef9632
> Author: Stefan Hajnoczi 
> Date:   Wed Mar 7 14:42:03 2018 +
> 
> virtio-blk: fix race between .ioeventfd_stop() and vq handler
> 
> If the main loop thread invokes .ioeventfd_stop() just as the vq
> handler
> function begins in the IOThread then the handler may lose the race for
> the AioCont

RE: [Xen-devel] xen-block: race condition when stopping the device (WAS: Re: [xen-4.13-testing test] 144736: regressions - FAIL)

2019-12-16 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of
> Durrant, Paul
> Sent: 16 December 2019 09:34
> To: Julien Grall ; Ian Jackson 
> Cc: Jürgen Groß ; Stefano Stabellini
> ; qemu-devel@nongnu.org; osstest service owner
> ; Anthony Perard
> ; xen-de...@lists.xenproject.org
> Subject: Re: [Xen-devel] xen-block: race condition when stopping the
> device (WAS: Re: [xen-4.13-testing test] 144736: regressions - FAIL)
> 
> > -Original Message-
> [snip]
> > >>
> > >> This feels like a race condition between the init/free code with
> > >> handler. Anthony, does it ring any bell?
> > >>
> > >
> > >  From that stack bt it looks like an iothread managed to run after the
> > sring was NULLed. This should not be able happen as the dataplane should
> > have been moved back onto QEMU's main thread context before the ring is
> > unmapped.
> >
> > My knowledge of this code is fairly limited, so correct me if I am
> wrong.
> >
> > blk_set_aio_context() would set the context for the block aio. AFAICT,
> > the only aio for the block is xen_block_complete_aio().
> 
> Not quite. xen_block_dataplane_start() calls
> xen_device_bind_event_channel() and that will add an event channel fd into
> the aio context, so the shared ring is polled by the iothread as well as
> block i/o completion.
> 
> >
> > In the stack above, we are not dealing with a block aio but an aio tie
> > to the event channel (see the call from xen_device_poll). So I don't
> > think the blk_set_aio_context() would affect the aio.
> >
> 
> For the reason I outline above, it does.
> 
> > So it would be possible to get the iothread running because we received
> > a notification on the event channel while we are stopping the block (i.e
> > xen_block_dataplane_stop()).
> >
> 
> We should assume an iothread can essentially run at any time, as it is a
> polling entity. It should eventually block polling on fds assign to its
> aio context but I don't think the abstraction guarantees that it cannot be
> awoken for other reasons (e.g. off a timeout). However and event from the
> frontend will certainly cause the evtchn fd poll to wake up.
> 
> > If xen_block_dataplane_stop() grab the context lock first, then the
> > iothread dealing with the event may wait on the lock until its released.
> >
> > By the time the lock is grabbed, we may have free all the resources
> > (including srings). So the event iothread will end up to dereference a
> > NULL pointer.
> >
> 
> I think the problem may actually be that xen_block_dataplane_event() does
> not acquire the context and thus is not synchronized with
> xen_block_dataplane_stop(). The documentation in multiple-iothreads.txt is
> not clear whether a poll handler called by an iothread needs to acquire
> the context though; TBH I would not have thought it necessary.
> 
> > It feels to me we need a way to quiesce all the iothreads (blk,
> > event,...) before continuing. But I am a bit unsure how to do this in
> > QEMU.
> >
> 
> Looking at virtio-blk.c I see that it does seem to close off its evtchn
> equivalent from iothread context via aio_wait_bh_oneshot(). So I wonder
> whether the 'right' thing to do is to call
> xen_device_unbind_event_channel() using the same mechanism to ensure
> xen_block_dataplane_event() can't race.

Digging around the virtio-blk history I see:

commit 1010cadf62332017648abee0d7a3dc7f2eef9632
Author: Stefan Hajnoczi 
Date:   Wed Mar 7 14:42:03 2018 +

virtio-blk: fix race between .ioeventfd_stop() and vq handler

If the main loop thread invokes .ioeventfd_stop() just as the vq handler
function begins in the IOThread then the handler may lose the race for
the AioContext lock.  By the time the vq handler is able to acquire the
AioContext lock the ioeventfd has already been removed and the handler
isn't supposed to run anymore!

Use the new aio_wait_bh_oneshot() function to perform ioeventfd removal
from within the IOThread.  This way no races with the vq handler are
possible.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Fam Zheng 
Acked-by: Paolo Bonzini 
Message-id: 20180307144205.20619-3-stefa...@redhat.com
Signed-off-by: Stefan Hajnoczi 

...so I think xen-block has exactly the same problem. I think we may also be 
missing a qemu_bh_cancel() to make sure block aio completions are stopped. I'll 
prep a patch.

  Paul

> 
>   Paul
> 
> > Cheers,
> >
> > --
> > Julien Grall
> ___
> Xen-devel mailing list
> xen-de...@lists.xenproject.org
> https://lists.xenproject.org/mailman/listinfo/xen-devel


RE: xen-block: race condition when stopping the device (WAS: Re: [Xen-devel] [xen-4.13-testing test] 144736: regressions - FAIL)

2019-12-16 Thread Durrant, Paul
> -Original Message-
[snip]
> >>
> >> This feels like a race condition between the init/free code with
> >> handler. Anthony, does it ring any bell?
> >>
> >
> >  From that stack bt it looks like an iothread managed to run after the
> sring was NULLed. This should not be able happen as the dataplane should
> have been moved back onto QEMU's main thread context before the ring is
> unmapped.
> 
> My knowledge of this code is fairly limited, so correct me if I am wrong.
> 
> blk_set_aio_context() would set the context for the block aio. AFAICT,
> the only aio for the block is xen_block_complete_aio().

Not quite. xen_block_dataplane_start() calls xen_device_bind_event_channel() 
and that will add an event channel fd into the aio context, so the shared ring 
is polled by the iothread as well as block i/o completion.

> 
> In the stack above, we are not dealing with a block aio but an aio tie
> to the event channel (see the call from xen_device_poll). So I don't
> think the blk_set_aio_context() would affect the aio.
> 

For the reason I outline above, it does.

> So it would be possible to get the iothread running because we received
> a notification on the event channel while we are stopping the block (i.e
> xen_block_dataplane_stop()).
> 

We should assume an iothread can essentially run at any time, as it is a 
polling entity. It should eventually block polling on fds assign to its aio 
context but I don't think the abstraction guarantees that it cannot be awoken 
for other reasons (e.g. off a timeout). However and event from the frontend 
will certainly cause the evtchn fd poll to wake up.

> If xen_block_dataplane_stop() grab the context lock first, then the
> iothread dealing with the event may wait on the lock until its released.
> 
> By the time the lock is grabbed, we may have free all the resources
> (including srings). So the event iothread will end up to dereference a
> NULL pointer.
> 

I think the problem may actually be that xen_block_dataplane_event() does not 
acquire the context and thus is not synchronized with 
xen_block_dataplane_stop(). The documentation in multiple-iothreads.txt is not 
clear whether a poll handler called by an iothread needs to acquire the context 
though; TBH I would not have thought it necessary.

> It feels to me we need a way to quiesce all the iothreads (blk,
> event,...) before continuing. But I am a bit unsure how to do this in
> QEMU.
> 

Looking at virtio-blk.c I see that it does seem to close off its evtchn 
equivalent from iothread context via aio_wait_bh_oneshot(). So I wonder whether 
the 'right' thing to do is to call xen_device_unbind_event_channel() using the 
same mechanism to ensure xen_block_dataplane_event() can't race.

  Paul

> Cheers,
> 
> --
> Julien Grall


RE: [Xen-devel] [PATCH-for-5.0 v3 5/6] hw/pci-host/i440fx: Extract the IGD passthrough host bridge device

2019-12-09 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of
> Philippe Mathieu-Daudé
> Sent: 09 December 2019 09:50
> To: qemu-devel@nongnu.org
> Cc: Thomas Huth ; Stefano Stabellini
> ; Michael S. Tsirkin ; Paul
> Durrant ; Markus Armbruster ; Alex
> Williamson ; Marcel Apfelbaum
> ; Paolo Bonzini ; Anthony
> Perard ; xen-de...@lists.xenproject.org;
> Philippe Mathieu-Daudé 
> Subject: [Xen-devel] [PATCH-for-5.0 v3 5/6] hw/pci-host/i440fx: Extract
> the IGD passthrough host bridge device
> 
> We can use a i440FX without the IGD passthrough host bridge.
> Extract it into a new file, 'hw/pci-host/igd_pt.c'.
> 
> Signed-off-by: Philippe Mathieu-Daudé 

Acked-by: Paul Durrant 

> ---
> v3:
> - Rename as 'xen_igd_pt.c' (Alex Williamson)
> - Add an entry in MAINTAINERS::Xen
> ---
>  hw/pci-host/i440fx.c  |  84 --
>  hw/pci-host/xen_igd_pt.c  | 120 ++
>  MAINTAINERS   |   1 +
>  hw/pci-host/Makefile.objs |   1 +
>  4 files changed, 122 insertions(+), 84 deletions(-)
>  create mode 100644 hw/pci-host/xen_igd_pt.c
> 
> diff --git a/hw/pci-host/i440fx.c b/hw/pci-host/i440fx.c
> index 414138595b..bae7b42327 100644
> --- a/hw/pci-host/i440fx.c
> +++ b/hw/pci-host/i440fx.c
> @@ -368,89 +368,6 @@ static const TypeInfo i440fx_info = {
>  },
>  };
> 
> -/* IGD Passthrough Host Bridge. */
> -typedef struct {
> -uint8_t offset;
> -uint8_t len;
> -} IGDHostInfo;
> -
> -/* Here we just expose minimal host bridge offset subset. */
> -static const IGDHostInfo igd_host_bridge_infos[] = {
> -{PCI_REVISION_ID, 2},
> -{PCI_SUBSYSTEM_VENDOR_ID, 2},
> -{PCI_SUBSYSTEM_ID,2},
> -{0x50,2}, /* SNB: processor graphics control
> register */
> -{0x52,2}, /* processor graphics control register
> */
> -{0xa4,4}, /* SNB: graphics base of stolen memory
> */
> -{0xa8,4}, /* SNB: base of GTT stolen memory */
> -};
> -
> -static void host_pci_config_read(int pos, int len, uint32_t *val, Error
> **errp)
> -{
> -int rc, config_fd;
> -/* Access real host bridge. */
> -char *path =
> g_strdup_printf("/sys/bus/pci/devices/%04x:%02x:%02x.%d/%s",
> - 0, 0, 0, 0, "config");
> -
> -config_fd = open(path, O_RDWR);
> -if (config_fd < 0) {
> -error_setg_errno(errp, errno, "Failed to open: %s", path);
> -goto out;
> -}
> -
> -if (lseek(config_fd, pos, SEEK_SET) != pos) {
> -error_setg_errno(errp, errno, "Failed to seek: %s", path);
> -goto out_close_fd;
> -}
> -
> -do {
> -rc = read(config_fd, (uint8_t *)val, len);
> -} while (rc < 0 && (errno == EINTR || errno == EAGAIN));
> -if (rc != len) {
> -error_setg_errno(errp, errno, "Failed to read: %s", path);
> -}
> -
> -out_close_fd:
> -close(config_fd);
> -out:
> -g_free(path);
> -}
> -
> -static void igd_pt_i440fx_realize(PCIDevice *pci_dev, Error **errp)
> -{
> -uint32_t val = 0;
> -size_t i;
> -int pos, len;
> -Error *local_err = NULL;
> -
> -for (i = 0; i < ARRAY_SIZE(igd_host_bridge_infos); i++) {
> -pos = igd_host_bridge_infos[i].offset;
> -len = igd_host_bridge_infos[i].len;
> -host_pci_config_read(pos, len, , _err);
> -if (local_err) {
> -error_propagate(errp, local_err);
> -return;
> -}
> -pci_default_write_config(pci_dev, pos, val, len);
> -}
> -}
> -
> -static void igd_passthrough_i440fx_class_init(ObjectClass *klass, void
> *data)
> -{
> -DeviceClass *dc = DEVICE_CLASS(klass);
> -PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
> -
> -k->realize = igd_pt_i440fx_realize;
> -dc->desc = "IGD Passthrough Host bridge";
> -}
> -
> -static const TypeInfo igd_passthrough_i440fx_info = {
> -.name  = TYPE_IGD_PASSTHROUGH_I440FX_PCI_DEVICE,
> -.parent= TYPE_I440FX_PCI_DEVICE,
> -.instance_size = sizeof(PCII440FXState),
> -.class_init= igd_passthrough_i440fx_class_init,
> -};
> -
>  static const char *i440fx_pcihost_root_bus_path(PCIHostState
> *host_bridge,
>  PCIBus *rootbus)
>  {
> @@ -495,7 +412,6 @@ static const TypeInfo i440fx_pcihost_info = {
>  static void i440fx_register_types(void)
>  {
>  type_register_static(_info);
> -type_register_static(_passthrough_i440fx_info);
>  type_register_static(_pcihost_info);
>  }
> 
> diff --git a/hw/pci-host/xen_igd_pt.c b/hw/pci-host/xen_igd_pt.c
> new file mode 100644
> index 00..efcc9347ff
> --- /dev/null
> +++ b/hw/pci-host/xen_igd_pt.c
> @@ -0,0 +1,120 @@
> +/*
> + * QEMU Intel IGD Passthrough Host Bridge Emulation
> + *
> + * Copyright (c) 2006 Fabrice Bellard
> + *
> + * SPDX-License-Identifier: MIT
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining
> a copy
> + * of this software 

RE: [Xen-devel] [PATCH-for-5.0 v3 6/6] hw/pci-host: Add Kconfig entry to select the IGD Passthrough Host Bridge

2019-12-09 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of
> Philippe Mathieu-Daudé
> Sent: 09 December 2019 09:50
> To: qemu-devel@nongnu.org
> Cc: Thomas Huth ; Stefano Stabellini
> ; Michael S. Tsirkin ; Paul
> Durrant ; Markus Armbruster ; Alex
> Williamson ; Marcel Apfelbaum
> ; Paolo Bonzini ; Anthony
> Perard ; xen-de...@lists.xenproject.org;
> Philippe Mathieu-Daudé 
> Subject: [Xen-devel] [PATCH-for-5.0 v3 6/6] hw/pci-host: Add Kconfig entry
> to select the IGD Passthrough Host Bridge
> 
> Add the XEN_IGD_PASSTHROUGH Kconfig option.
> 
> Xen build has that option selected by default. Non-Xen builds now
> have to select this feature manually.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> v3: Only default with Xen (Alex Williamson)
> 
> I did not used 'depends on XEN' as suggested by Alex but
> 'default y if XEN', so one can build XEN without this feature
> (for example, on other ARCH than X86).

Allowing it to be compiled out for Xen builds is quite reasonable IMO. I don't 
believe it is widely used.

Acked-by: Paul Durrant 

> ---
>  hw/pci-host/Kconfig   | 5 +
>  hw/pci-host/Makefile.objs | 2 +-
>  2 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/pci-host/Kconfig b/hw/pci-host/Kconfig
> index b0aa8351c4..24ba8ea046 100644
> --- a/hw/pci-host/Kconfig
> +++ b/hw/pci-host/Kconfig
> @@ -1,6 +1,11 @@
>  config PAM
>  bool
> 
> +config XEN_IGD_PASSTHROUGH
> +bool
> +default y if XEN
> +select PCI_I440FX
> +
>  config PREP_PCI
>  bool
>  select PCI
> diff --git a/hw/pci-host/Makefile.objs b/hw/pci-host/Makefile.objs
> index fa6d1556c0..9c466fab01 100644
> --- a/hw/pci-host/Makefile.objs
> +++ b/hw/pci-host/Makefile.objs
> @@ -14,7 +14,7 @@ common-obj-$(CONFIG_VERSATILE_PCI) += versatile.o
>  common-obj-$(CONFIG_PCI_SABRE) += sabre.o
>  common-obj-$(CONFIG_FULONG) += bonito.o
>  common-obj-$(CONFIG_PCI_I440FX) += i440fx.o
> -common-obj-$(CONFIG_PCI_I440FX) += xen_igd_pt.o
> +common-obj-$(CONFIG_XEN_IGD_PASSTHROUGH) += xen_igd_pt.o
>  common-obj-$(CONFIG_PCI_EXPRESS_Q35) += q35.o
>  common-obj-$(CONFIG_PCI_EXPRESS_GENERIC_BRIDGE) += gpex.o
>  common-obj-$(CONFIG_PCI_EXPRESS_XILINX) += xilinx-pcie.o
> --
> 2.21.0
> 
> 
> ___
> Xen-devel mailing list
> xen-de...@lists.xenproject.org
> https://lists.xenproject.org/mailman/listinfo/xen-devel